Friday, February 09, 2007

Google does the Impossible!

Google does the Impossible

I often wondered whether Google would hire me, until now I had doubts. Google only hire people who are smarter than the average Googler, and given my lack of a CS background I would feel at a further disadvantage given a question like "What is the most efficient way to sort a million integers?"; I might blurt out "With a computer". With this kind of attitude you might start to think they are a bit arrogant.

However this paper on the Chubby Lock Service gives me hope! I was interested in it because it uses the Paxos algorithm, and I am pretty interested in Paxos from our work on HARC. This section in the Chubby paper introduction caught my eye.

Readers familiar with distributed computing will recognize the election of a primary among peers as an instance of the distributed consensus problem, and realize we require a solution using asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered. (Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.) Asynchronous consensus is solved by the Paxos protocol [12, 13]. The same protocol was used by Oki and Liskov (see their paper on viewstamped replication [19]), an equivalence noted by others [14]. Indeed, all working protocols for asynchronous consensus we have so far encountered have Paxos at their core. Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5].

Lets start with the first sentence, "asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered.". This definition of asynchronous communications, in the context of a discussion on Consensus, is wrong. Let's grab the Fischer et al. paper for our reference. This paper, famously known by the initials of the authors names as FLP, is one of the most important in distributed systems literature; we will see why it is so important in a bit. The full title of the paper is "Impossibility of Distributed Consensus with One Faulty Process",pretty arresting stuff! From the paper:

"In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable - it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement."

The paper goes on to describe an asynchronous system:

"Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speeds of processes or about the delay time in delivering a message. We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used. (In particular, the solutions in [6] are not applicable.) Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly."

So "asynchronous communication" actually means that there is no upper bound on how long a message can take to be delivered. You can have asynchronous communications and a perfect network: messages are always delivered, they are delivered in order and delivered only once, but it can take an infinite time for them to be delivered. This is the FLP model. (For previous rants on asynchronous, synchronous and partially synchronous see this blog entry: Asynchronous?.) Of course in synchronous communications you can have message losses, message delays and message re ordering, but it is easier to deal with in this environment: if I don't get a message in the next 30 seconds I know there has been a fault.

The FLP paper is important for two reasons: it says something fundamental about dealing with faults in distributed systems, and it demonstrated how to show some problems in distributed computing are impossible to solve. Dealing with faults in an asynchronous system is very difficult, because you cannot tell whether a processor is very slow or faulted. You need to use time, or fault detectors. To show that a problem is impossible to solve FLP uses proof by contradiction: show what properties an algorithm that solves the problem must have, and then show that there is some contradiction in the properties. After FLP people had lots of fun showing stuff was impossible.

The next phrase in the Chubby paper is: "(Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.)" This seems a bit rich! With respect to timing models, you can choose between synchronous, asynchronous or partially synchronous, then you can choose the fault model, for example Byzantine or non-Byzantine. As we will see Chubby system actually does make stronger assumptions about the timing model, it is actually depending on the system being partially synchronous.

The next line in the Chubby paper is: "Asynchronous consensus is solved by the Paxos protocol [12, 13].". The wording is a bit loose here, there are various versions of the consensus problem with different fault models and requirements, for example Uniform Consensus or even the Byzantine Generals, however most people would take this statement to mean the consensus problem as defined by FLP. But FLP showed consensus in an asynchronous system is impossible with only one faulty process! (Consensus is straight forward without faults, so is not really considered a problem).

Now skipping to the last sentence from the Chubby extract: "Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5]." The logic of the last part implies that Fischer et al. are wrong, or else Google have achieved the impossible! However if we wind back a bit, they introduce clocks and once you do that the system is no longer completely asynchronous. They don't overcome the impossibility result of Fischer et al., they are using a different model which isn't completely asynchronous.

So why all the confusion? Lamport clarifies it as follows: "Asynchronous consensus algorithms like Paxos maintain safety despite asynchrony, but are guaranteed to make progress only when the system becomes synchronous - meaning that messages are delivered in a bounded length of time." Paxos is an asynchronous algorithm because it doesn't use time, but the timing of events in the system must conspire for it to terminate.

Butler Lampson summarized the timing requirements nicely in "How to build a highly available system using consensus.":

"It [Paxos] terminates if there is a single leader for a long enough time during which the leader can talk to a majority of the agent processes twice. It may not terminate if there are always too many leaders (fortunate, since we know that guaranteed termination is impossible ).". The last part is FLP.

There is a pathological case in Paxos were two leaders compete to get a value chosen and you effectively end up with livelock, however it is not difficult in practice to avoid this. Lampson describes how to use a sloppy leadership election to try and make sure there is only one leader, it doesn't matter if the sloppy leadership election screws up because we know Paxos is safe with multiple leaders. The pathological case corresponds to the FLP result.

"Consensus in the Presence of Partial Synchrony" provides details on the conditions that are required in order to reach consensus.

I have only a practical interest in Paxos in order to solve the problems addressed by HARC. I am not sure I fully understand the proofs behind it, or FLP, but the main ideas behind it are important to get right. Which is why I am so disappointed by the Chubby paper, more people will read it than FLP, or the Paxos papers, and it will only cause confusion. As if the "The Part-Time Parliament" doesn't do enough of that!


Blogger 422d said...

Good points about the google article and the other papers. I've read all of them too...several times...and it's always nice to hear somebody else explain the concepts. I work on distributed systems professionally and definitely understand the need to get the theory right before selling a product.

9:55 PM  
Blogger Marco P said...

I have a question about the consensus problem. Are the proposed alternatives always binary? Or can there be more than two alternatives?

5:30 AM  

Post a Comment

<< Home