Friday, February 23, 2007

Beta Defining SOA, or adding to the pollution.

Defining SOA, or adding to the pollution.

Someone sent me this link to a movie on SOA. Those gold disks look pretty expensive. And I hope the red bouncing balls don't have to change colour, or shape; it could break the expensive gold disks.

My immediate response to anything about SOA is SnakeOil. This is because I have never found a good definition of SOA that I really liked; something really visceral that you can chew on. Mostly I have found the definitions to have too much hand waving, with high minded terms that I don't really understand. Or else, too obvious to be of any use.

REST is more tangible, for example, anything that can have an identity can be a resource. OK then, I have this thing, not really sure what this thing is, but at least I can give it a URI. However, REST does have a few glib phrases: "hypermedia as the engine of application state", but they add spice.

I have been challenged to provide my own definition of SOA. Normally, I am loathe to create definitions as it pollutes the world and causes confusion. It is always better to reference an existing definition. Even this concept is RESTful, if you reference an existing definition you get the same network effects as linking; the more people that "link" to that definition the more authoritative it becomes. This doesn't always work though, the definitive definition of REST, Dr. Fielding's thesis, is not the top hit in Google for the term REST. However, in any debate Dr. Fielding always has the last word on what REST means, though he may of course be wrong in what he asserts in his thesis; REST is his word, and his definition is the only correct one. Of course his definition may change over time, but this is just as unhealthy as creating new definitions in terms of confusing people.

So what is my definition of SOA. An architecture for a distributed system based on services, is nice and vapid, though logically sound. The key, then is to define a service.

A service interface is defined by the set of messages it can process, and the set of messages it emits. The set of messages it can process, MessagesIn, and the set of messages it emits, MessagesOut, can change over time. There may exist a mapping of a subset of MessagesIn to a subset of MessagesOut, such that if the service receives a message from this subset of MessagesIn it will emit a message from the subset of MessagesOut. A service interface description describes a subset of the MessagesIn that a service can process, a subset of MessagesOut that a service can emit and any mapping between the subsets of MessagesIn and MessagesOut that it describes. A service interface description only defines sets of messages at a particular time, so a service may have a series of service interface descriptions. A service interface description is incomplete if it does not include the compete set of MessagesIn and MessagesOut for a service at a particular time. Since it is only possible to communicate using messages, and as messages must take a finite time to travel between sender and receiver, a client can never know if a service interface description is still valid (assuming the service interface description is in the MessagesOut set). A service interface description is faulty if it describes a set of input or output messages that includes messages which are not in MessagesIn or MessagesOut.

A service may also have internal state, and can be modelled by a state machine. This state machine may or may not be deterministic. The mapping of input to output messages may depend on the internal state of the machine, and the mapping may change over time.

Service developers should make the set of messages MessagesIn as large as possible, preferably it should be an infinite set to make interoperability and service description easier. Service developers should constrain the size of MessagesOut to be as small as possible, preferably a set with no members, as it makes interoperability and service description easier. (The last two clauses are a form of Postal's Law). Service developers should make the internal state of their service as large as possible, preferably of infinite size because then they will be better than Google, and can charge a lot for advertising. If a service developer achieves this, an infinite set of MessagesIn, a zero size set of MessagesOut with an infinite amount of internal state, then they will have created God.

Tuesday, February 20, 2007

The WSRF CurrentTime is 5.41pm

CurrentTime kills caching for WSRF
We have been fixing a bug in our AJAX to REST/WSRF stuff.
The bug was that IE6 did not update the properties properly. Bruno discovered that IE6 was not invoking the HTTP GET because the URI had not changed, we had not included any cache control HTTP Headers, but I assumed that would mean "don't cache". The HTTP spec is hard going wrt caching but Section 13.4 states "If there is neither a cache validator nor an explicit expiration time associated with a response, we do not expect it to be cached, but certain caches MAY violate this expectation (for example, when little or no network connectivity is available)".The solution was to add HTTP Headers which explicitly said don't cache this. Adding metadata cannot be a bad thing I guess.

The next issue was should we do caching properly? There are a lot of advantages, especially for the AJAX client, as we could support conditional GETs. I had a vague idea how to do it too, put in a hook for the developer to attach his caching policy/code. But then I remembered CurrentTime; WSRF defines a ResourceProperty, CurrentTime, so that a client can find out the time when the WS-Resource had a certain set of properties. So one of our properties is always changing; so we cannot cache!

CurrentTime is a bad name, because almost by definition it is not the current time -- its some time later, Lamport's clocks. HTTP uses a HTTP Header, Date, to convey the same concept, and the concept really is the time at which the message was created. Its metadata about the message, it should not really be part of the message.

The good thing is I do not have to write the hooks for supporting caching, the bad thing is that we cannot do caching.

Wednesday, February 14, 2007

A Solution to a Distributed Systems Problem

The solution to this puzzle is:

Any protocol that solves this problem is equivalent to one
in which there are rounds of message exchanges: first A (say)
sends to B, next B sends to A, then A sends to B, and so on. We
show that in assuming the existence of a protocol to solve the
problem, we are able to derive a contradiction. This
establishes that no such protocol exists.

Select the protocol that solves the problem using the fewest
rounds. By assumption, such a protocol must exist and, by
construction, no protocol solving the problem using fewer rounds
exists. Without loss of generality suppose that m, the last
message sent by either process, is sent by A.

Observe that the action ultimately taken by A cannot depend on
whether m is recieved by B, because its receipt could never be
learned by A (since it is the last message). Thus, A's choice of
action alpha or beta does not depend on m. Next, observe that the
action ultimately taken by B cannot depend on whether m is
recieved by B, because B must make the same choice of action alpha
or beta even if m is lost (due to channel failure).

Having argued that the action chosen by A and B does not depend on
m, we conclude that m is superfluous. Thus, we can construct a new
protocol in which one fewer message is sent. However, the existence
of such a shorter protocol contradicts the assumption that our
original protocol used the fewest number of rounds.

A Distributed Systems Puzzle...

A Distributed Systems Puzzle
I like this little puzzle taken from Distributed Systems:

Two processes, A and B, communicate by sending and
receiving messages on a bidirectional channel. Neither
process can fail. However, the channel can experience
transient failures, resulting in loss of a subset
of the messages that have been sent. Devise a protocol
where either of two actions alpha or beta are possible,
but (i) both processes take the same action and (ii)
neither takes both actions.

Can you show that there is no solution?

Friday, February 09, 2007

Google does the Impossible!

Google does the Impossible

I often wondered whether Google would hire me, until now I had doubts. Google only hire people who are smarter than the average Googler, and given my lack of a CS background I would feel at a further disadvantage given a question like "What is the most efficient way to sort a million integers?"; I might blurt out "With a computer". With this kind of attitude you might start to think they are a bit arrogant.

However this paper on the Chubby Lock Service gives me hope! I was interested in it because it uses the Paxos algorithm, and I am pretty interested in Paxos from our work on HARC. This section in the Chubby paper introduction caught my eye.

Readers familiar with distributed computing will recognize the election of a primary among peers as an instance of the distributed consensus problem, and realize we require a solution using asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered. (Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.) Asynchronous consensus is solved by the Paxos protocol [12, 13]. The same protocol was used by Oki and Liskov (see their paper on viewstamped replication [19]), an equivalence noted by others [14]. Indeed, all working protocols for asynchronous consensus we have so far encountered have Paxos at their core. Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5].

Lets start with the first sentence, "asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered.". This definition of asynchronous communications, in the context of a discussion on Consensus, is wrong. Let's grab the Fischer et al. paper for our reference. This paper, famously known by the initials of the authors names as FLP, is one of the most important in distributed systems literature; we will see why it is so important in a bit. The full title of the paper is "Impossibility of Distributed Consensus with One Faulty Process",pretty arresting stuff! From the paper:

"In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable - it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement."

The paper goes on to describe an asynchronous system:

"Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speeds of processes or about the delay time in delivering a message. We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used. (In particular, the solutions in [6] are not applicable.) Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly."

So "asynchronous communication" actually means that there is no upper bound on how long a message can take to be delivered. You can have asynchronous communications and a perfect network: messages are always delivered, they are delivered in order and delivered only once, but it can take an infinite time for them to be delivered. This is the FLP model. (For previous rants on asynchronous, synchronous and partially synchronous see this blog entry: Asynchronous?.) Of course in synchronous communications you can have message losses, message delays and message re ordering, but it is easier to deal with in this environment: if I don't get a message in the next 30 seconds I know there has been a fault.

The FLP paper is important for two reasons: it says something fundamental about dealing with faults in distributed systems, and it demonstrated how to show some problems in distributed computing are impossible to solve. Dealing with faults in an asynchronous system is very difficult, because you cannot tell whether a processor is very slow or faulted. You need to use time, or fault detectors. To show that a problem is impossible to solve FLP uses proof by contradiction: show what properties an algorithm that solves the problem must have, and then show that there is some contradiction in the properties. After FLP people had lots of fun showing stuff was impossible.

The next phrase in the Chubby paper is: "(Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.)" This seems a bit rich! With respect to timing models, you can choose between synchronous, asynchronous or partially synchronous, then you can choose the fault model, for example Byzantine or non-Byzantine. As we will see Chubby system actually does make stronger assumptions about the timing model, it is actually depending on the system being partially synchronous.

The next line in the Chubby paper is: "Asynchronous consensus is solved by the Paxos protocol [12, 13].". The wording is a bit loose here, there are various versions of the consensus problem with different fault models and requirements, for example Uniform Consensus or even the Byzantine Generals, however most people would take this statement to mean the consensus problem as defined by FLP. But FLP showed consensus in an asynchronous system is impossible with only one faulty process! (Consensus is straight forward without faults, so is not really considered a problem).

Now skipping to the last sentence from the Chubby extract: "Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5]." The logic of the last part implies that Fischer et al. are wrong, or else Google have achieved the impossible! However if we wind back a bit, they introduce clocks and once you do that the system is no longer completely asynchronous. They don't overcome the impossibility result of Fischer et al., they are using a different model which isn't completely asynchronous.

So why all the confusion? Lamport clarifies it as follows: "Asynchronous consensus algorithms like Paxos maintain safety despite asynchrony, but are guaranteed to make progress only when the system becomes synchronous - meaning that messages are delivered in a bounded length of time." Paxos is an asynchronous algorithm because it doesn't use time, but the timing of events in the system must conspire for it to terminate.

Butler Lampson summarized the timing requirements nicely in "How to build a highly available system using consensus.":

"It [Paxos] terminates if there is a single leader for a long enough time during which the leader can talk to a majority of the agent processes twice. It may not terminate if there are always too many leaders (fortunate, since we know that guaranteed termination is impossible ).". The last part is FLP.

There is a pathological case in Paxos were two leaders compete to get a value chosen and you effectively end up with livelock, however it is not difficult in practice to avoid this. Lampson describes how to use a sloppy leadership election to try and make sure there is only one leader, it doesn't matter if the sloppy leadership election screws up because we know Paxos is safe with multiple leaders. The pathological case corresponds to the FLP result.

"Consensus in the Presence of Partial Synchrony" provides details on the conditions that are required in order to reach consensus.

I have only a practical interest in Paxos in order to solve the problems addressed by HARC. I am not sure I fully understand the proofs behind it, or FLP, but the main ideas behind it are important to get right. Which is why I am so disappointed by the Chubby paper, more people will read it than FLP, or the Paxos papers, and it will only cause confusion. As if the "The Part-Time Parliament" doesn't do enough of that!

Wednesday, February 07, 2007

Perfect Code

Jim Gray's disappearance has lead me to reflect on the things he has written that have stayed with me. One them is perfect code. In
Transaction Processing: Concepts and Techniques Jim talks about perfect code; code without bugs. He even gives an example, a simple function of a few lines that adds two numbers together. In fact it took him a couple of attempts to get it right! Jim makes the point that perfect code is possible, just very expensive. He goes on to talk about the cost of code in terms of dollars per line. NASA pay the most, but their code still has bugs! The astronauts board the shuttle with a bug list, an example of which was not to use the two keyboards on the shuttle at the same time because the inputs would be OR'd! Perfect code is possible, just really expensive.

Most people are surprised when I say that perfect code is possible, they just assume that all code contains bugs. I was reminded of this while following the Nearly All Binary Searches and Mergesorts are Broken thread on binary search implementation bugs. At the time I hit the library to check Knuth's version of binary search; he didn't let me down. Someone who invents their own ISA to express their algorithms isn't going to get caught by an overflow.

I am sure the attitude that all code contains bugs must have a detrimental effect on programmers, it has on me. It is also re-enforced by the open source mantra "release early, release often". Testing doesn't solve all the problems either: "Testing can only prove the existence of bugs", Dijkstra.

I recently re-read the History of System R. The name Franco Putzolu cropped up, Jim mentioned him in the acknowledgements of "Transaction Processing: Concepts and Techniques" as someone who had made huge contributions to the field of transactions and databases but who never got the public recognition as his name was not on a lot of the famous papers.
From the System R history (Copyright attached):

Mike Blasgen: The RSS didn't have any bugs. [laughter] No, it's true, the reason is because much of the RSS was written by Franco. No, it's really true; Franco never wrote a bug. Except for one,
right, Bruce? Did you find one?

Bruce Lindsay: One.

Mike Blasgen: He wrote about half of RSS, and I think we found one bug. And that was after nine years.

RSS was the storage part of System R and used B-Trees. It was 35,000 lines of code to solve some pretty hairy problems, and had only one bug! Remember too that the tool support back then must have been terrible compared to today. Pretty perfect then.

Copyright (c) 1995, 1997 by Paul McJones, Roger Bamford, Mike Blasgen, Don Chamberlin, Josephine Cheng, Jean-Jacques Daudenarde, Shel Finkelstein, Jim Gray, Bob Jolls, Bruce Lindsay, Raymond Lorie, Jim Mehl, Roger Miller, C. Mohan, John Nauman, Mike Pong, Tom Price, Franco Putzolu, Mario Schkolnick, Bob Selinger, Pat Selinger, Don Slutz, Irv Traiger, Brad Wade, and Bob Yost. You may copy this document in whole or in part without payment of fee provided that you acknowledge the authors and include this notice.