Wednesday, June 13, 2007

A brief history of Consensus, 2PC and Transaction Commit.

This is a potted history of consensus, transactions and 2PC. Reading the literature on consensus is difficult because the language changes (consensus was originally called agreement), the results come in an order that isn't logical, and the whole framework for describing distributed algorithms evolved in parallel with the work. Also, there are few books other than Lynch's Distributed Algorithms that cover the subject.

Papers are discussed in the order that makes most sense, not in the order they were published.

The first instance of the consensus problem that I am aware of is in Lamport's "Time, Clocks and the Ordering of Events in a Distributed System" (1978), though it is not explicitly declared as a consensus or agreement problem. In this paper Lamport discusses how messages take a finite time to travel between processors and draws an analogy with Einstein's special relativity. Discussing Einstein's theory with respect to distributed systems is popular recently in the blogsphere, but in 1978 Lamport give a complete analysis with space-time diagrams and all. The issue is that in a distributed system you cannot tell if event A happened before event B, unless A caused B in some way. Each observer can see events happen in a different order, except for events that cause each other, ie there is only a partial ordering of events in a distributed system. Lamport defines the "happens before" relationship and operator, and goes on to give an algorithm that provides a total ordering of events in a distributed system, so that each process sees events in the same order as every other process.

Lamport also introduces the concept of a distributed state machine: start a set of deterministic state machines in the same state and then make sure they process the same messages in the same order. Each machine is now a replica of the others. The key problem is making each replica agree what is the next message to process: a consensus problem. This is what the algorithm for creating a total ordering of events does, it provides an agreed ordering for the delivery of messages. However, the system is not fault tolerant; if one process fails that others have to wait for it to recover.

Around the same time as this paper, Gray described 2PC in "Notes on Database Operating Systems" (1979). Unfortunately 2PC would block if the TM (Transaction Manager) fails at the wrong time. Skeen showed in "NonBlocking Commit Protocols" (1981)that for a distributed transactions you needed a 3 phrase commit algorithm to avoid the blocking problems associated with 2PC. The problem was coming up with a nice 3PC algorithm, this would only take nearly 25 years!

Fischer, Lynch and Paterson showed that distributed consensus was impossible in an asynchronous system with just one faulty process in "Impossibility of distributed consensus with one faulty process" (1985), this famous result is known as the "FLP" result. By this time "consensus" was the name given to the problem of getting a bunch of processors to agree a value. In an asynchronous system (where processors run at arbitrary speeds and messages can take an arbitrarily long time to travel between processors) with a perfect network (all messages are delivered, messages arrive in order and can not be duplicated) distributed consensus is impossible with just one faulty process (even just a fail-stop). The kernel of the problem is that you cannot tell the difference between a process that has stopped and one that is running very slowly, making dealing with faults in an asynchronous system almost impossible. The paper was also important because it demonstrated how to show something was impossible: show that all algorithms that solve the problem must have some property, then show that this property is impossible, ie proof by contradiction. (This approach was only re-learned as Turing used it in the halting problem)

By this stage people realized that a distributed algorithm has two properties: safety and liveness. Safety means nothing bad happens, while liveness means that something good eventually happens. 2PC is an asynchronous consensus algorithm, all processes must agree on either commit or abort for a transaction. 2PC is safe: no bad data is ever written to the databases, but its liveness properties aren't great: if the TM fails at the wrong point the system will block.

Also by this stage people thought of distributed systems as being synchronous (processes run at known rates, and messages are delivered in known bounds of time) or asynchronous (processes run at unknown and arbitrary rates, and messages can take unbounded time to be delivered). The asynchronous case is more general than the synchronous case: an algorithm that works for an asynchronous system will also work for a synchronous system, but not vice versa. You can treat a synchronous system as a special case of an asynchronous system that just happens to have bounds on the time it takes to deliver a message.

Before FLP, there was the "The Byzantine Generals Problem" (1982) paper. In this form of the consensus problem the processes can lie, and they can actively try to deceive other processes. This problem looks harder than the FLP result, but it does have a solution for the synchronous case (though when the Byzantine Generals paper was written the distinction between asynchronous and synchronous systems was not explicit). The solution is expensive in the number of messages exchanged, and the number of rounds of messages required. The problem originally came from the aerospace industry: what would happen if sensors gave false information on an plane (clearly the system could be treated as synchronous).

In 1986 there was a get together of the distributed systems people who were interested in consensus and the transaction people. At the time the best consensus algorithm was the Byzantine Generals, but this was too expensive to use for transactions. Jim Gray wrote up a note on the meeting: "A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem." (1987) .

The paper contains this in the introduction :-)

"Prior to the conference, it was widely believed that the transaction commit problem faced by distributed systems is a degenerate form of the Byzantine Generals Problem studied by academe. Perhaps the most useful consequence of the conference was to show that these two problems have little in common."

Eventually distributed transactions would be seen as a version of consensus, called uniform consensus (see "Uniform consensus is harder than consensus" (2000)). With uniform consensus all processes must agree on a value, even the faulty ones - a transaction should only commit if all RMs are prepared to commit. Most forms of consensus are only concerned with having the non-faulty processes agree. Uniform consensus is more difficult than general consensus.

Eventually Lamport came up with the Paxos consensus algorithm, described in "The Part-Time Parliament" (submitted in 1990, published 1998). Unfortunately the analogy with Greek democracy failed badly with people finding the paper very difficult to understand, and the paper was ignored until its case was taken up by Butler Lampson in "How to Build a Highly Availability System using Consensus" (1996). This paper provides a good introduction to building fault tolerant systems and Paxos. Later Lamport would publish "Paxos Made Simple (2001).

The kernel of Paxos is that given a fixed number of processes, any majority of them must have at least one process in common. For example given three processes A, B and C the possible majorities are: AB, AC, or BC. If a decision is made when one majority is present eg AB, then at any time in the future when another majority is available at least one of the processes can remember what the previous majority decided. If the majority is AB then both processes will remember, if AC is present then A will remember and if BC is present then B will remember.

Paxos can tolerate lost messages, delayed messages, repeated messages, and messages delivered out of order. It will reach consensus if there is a single leader for long enough that the leader can talk to a majority of processes twice. Any process, including leaders, can fail and restart; in fact all processes can fail at the same time, the algorithm is still safe. There can be more than one leader at a time.

Paxos is an asynchronous algorithm; there are no explicit timeouts. However, it only reaches consensus when the system is behaving in a synchronous way, ie messages are delivered in a bounded period of time; otherwise it is safe. There is a pathological case where Paxos will not reach consensus, in accordance to FLP, but this scenario is relatively easy to avoid in practice.

Clearly dividing systems into synchronous and asynchronous is too broad a distinction, and Dwork, Lynch and Stockmeyer defined partially synchronous systems in "Consensus in the presence of partial synchrony" (1988) . There are two versions of partial synchronous system: in one processes run at speeds within a known range and messages are delivered in bounded time but the actual values are not known a priori; in the other version the range of speeds of the processes and the upper bound for message deliver are known a priori, but they will only start holding at some unknown time in the future. The partial synchronous model is a better model for the real world than either the synchronous or asynchronous model; networks function in a predicatable way most of the time, but occasionally go crazy.

Lamport and Gray went on to apply Paxos to the distributed transaction commit problem in "Consensus on Transaction Commit" (2005). They used Paxos to effectively replicate the TM of 2PC, and used an instance of Paxos for each RM involved in the transaction to agree whether that RM could commit the transaction. On the face of it, using an instance of Paxos per RM looks expensive, but it turns out that it is not. Paxos Commit will complete in two phases for the fault free case, ie it has the same message delay as 2PC, though more messages are exchanged. A third phase is only required if there is a fault, in accordance to the Skeen result. Given 2n+1 TM replicas Paxos Commit will complete with up to n faulty replicas. Paxos Commit does not use Paxos to solve the transaction commit problem directly, ie it is not used to solve uniform consensus, rather it is used to make the system fault tolerant.


Any argument that distributed transactions should not be used because 2PC is blocking is a void, because Paxos Commit addresses the blocking issue.

Recently there has been some discussion of the CAP conjecture: Consistency, Availability and Partition. The conjecture asserts that you cannot have all three in a distributed system: a system that is consistent, that can have faulty processes and that can handle a network partition.

We can examine CAP by equating consistency with consensus. For an asynchronous system we cannot reach consensus with one faulty process, FLP, so we cannot have consistency and availability for an asynchronous system!

Now take a Paxos system with three nodes: A, B and C. We can reach consensus if two nodes are working, ie we can have consistency and availability. Now if C becomes partitioned and C is queried, it cannot respond because it cannot communicate with the other nodes; it doesn't know whether it has been partitioned, or if the other two nodes are down, or if the network is being very slow. The other two nodes can carry on, because they can talk to each other and they form a majority. So for the CAP conjecture, Paxos does not handle a partition because C cannot respond to queries. However, we could engineer our way around this. If we are inside a data center we can use two independent networks (Paxos doesn't mind if messages are repeated). If we are on the internet, then we could have our client query all nodes A, B and C, and if C is partitioned the client can query A or B unless it is partitioned in a similar way to C.

For a synchronous network, if C is partitioned it can learn that it is partitioned if it does not receive messages in a fixed period of time, and thus can declare itself down to the client.

Paxos, Paxos Commit and HTTP/REST have been combined to build a highly available co-allocation system for Grid computing, details of which can be found here HARC, there are also more references in this paper: "Co-Allocation, Fault Tolerance and Grid Computing" (2006).

Thursday, March 01, 2007

What are ReferenceParameters for?

What are ReferenceParameters for?


This mini-rant was inspired by the W3C Workshop on Enterprise computing

You get ReferenceParameters in WS-Addressing EndpointReferences, also known as EPRs. Kinda like this:

<wsa:EndpointReference>
  <wsa:Address>http://www.crapola.com/SoapSink</wsa:Address>
  <wsa:ReferenceParameters>
    <SessionID>25324523</SessionID>
    <ResourceID>1434123421</ResourceID>
    <UserID>mark</UserID>
    <UserServiceRating>Gold</UserServiceRating>
  </wsa:ReferenceParameters>
</wsa:EndpointReference>

A few things about ReferenceParameters. First, given an EPR you should treat the ReferenceParameters as opaque; you should not reason about them. Second, the ReferenceParameters get stuffed into SOAP Headers, and are sent along to the service that the EPR addresses. Third, the thing in the wsa:Address is an identifier, not a location, you de-reference the identifier to a location before sending the message.

So what might you do with ReferenceParameters? Well, if you are using WSRF you might use them for a bit of identification. Have all the SOAP messages arrive at the endpoint identified by the wsa:Address element, then direct the messages to particular WS-Resources based on the ReferenceParameters; ResourceID in the example.

Or you might decide to use ReferenceParameters to support sessions. You want to track a particular client's usage of a service, then give him an EPR with a session identifier, SessionID in the example, embedded in the ReferenceParameters. The session identifier will be carried in the SOAP header each time the client sends a message to the service. You could even carry some information to identify the client; UserID or UserServiceRating in the example.

All these uses of ReferenceParameters are possible, though the W3C TAG frowns on the first. ReferenceParameters are a bit like HTTP cookies in functionality; cookies can do all the above for HTTP. So, what is the problem?

Well, given an EPR that has ReferenceParameters you should NEVER share it with anyone else. You cannot know what those ReferenceParameters are for. They could be there for some identification purpose, in which case it would be OK to share them, but you cannot know that for sure. They could actually be for identifying a particular session, or client. Sharing EPRs with ReferenceParameters would be like sharing your HTTP cookies; you simply wouldn't do it. Now, imagine a Web were you were not able to share URIs.

Friday, February 23, 2007

Beta Defining SOA, or adding to the pollution.

Defining SOA, or adding to the pollution.

Someone sent me this link to a movie on SOA. Those gold disks look pretty expensive. And I hope the red bouncing balls don't have to change colour, or shape; it could break the expensive gold disks.

My immediate response to anything about SOA is SnakeOil. This is because I have never found a good definition of SOA that I really liked; something really visceral that you can chew on. Mostly I have found the definitions to have too much hand waving, with high minded terms that I don't really understand. Or else, too obvious to be of any use.

REST is more tangible, for example, anything that can have an identity can be a resource. OK then, I have this thing, not really sure what this thing is, but at least I can give it a URI. However, REST does have a few glib phrases: "hypermedia as the engine of application state", but they add spice.

I have been challenged to provide my own definition of SOA. Normally, I am loathe to create definitions as it pollutes the world and causes confusion. It is always better to reference an existing definition. Even this concept is RESTful, if you reference an existing definition you get the same network effects as linking; the more people that "link" to that definition the more authoritative it becomes. This doesn't always work though, the definitive definition of REST, Dr. Fielding's thesis, is not the top hit in Google for the term REST. However, in any debate Dr. Fielding always has the last word on what REST means, though he may of course be wrong in what he asserts in his thesis; REST is his word, and his definition is the only correct one. Of course his definition may change over time, but this is just as unhealthy as creating new definitions in terms of confusing people.

So what is my definition of SOA. An architecture for a distributed system based on services, is nice and vapid, though logically sound. The key, then is to define a service.

A service interface is defined by the set of messages it can process, and the set of messages it emits. The set of messages it can process, MessagesIn, and the set of messages it emits, MessagesOut, can change over time. There may exist a mapping of a subset of MessagesIn to a subset of MessagesOut, such that if the service receives a message from this subset of MessagesIn it will emit a message from the subset of MessagesOut. A service interface description describes a subset of the MessagesIn that a service can process, a subset of MessagesOut that a service can emit and any mapping between the subsets of MessagesIn and MessagesOut that it describes. A service interface description only defines sets of messages at a particular time, so a service may have a series of service interface descriptions. A service interface description is incomplete if it does not include the compete set of MessagesIn and MessagesOut for a service at a particular time. Since it is only possible to communicate using messages, and as messages must take a finite time to travel between sender and receiver, a client can never know if a service interface description is still valid (assuming the service interface description is in the MessagesOut set). A service interface description is faulty if it describes a set of input or output messages that includes messages which are not in MessagesIn or MessagesOut.

A service may also have internal state, and can be modelled by a state machine. This state machine may or may not be deterministic. The mapping of input to output messages may depend on the internal state of the machine, and the mapping may change over time.

Service developers should make the set of messages MessagesIn as large as possible, preferably it should be an infinite set to make interoperability and service description easier. Service developers should constrain the size of MessagesOut to be as small as possible, preferably a set with no members, as it makes interoperability and service description easier. (The last two clauses are a form of Postal's Law). Service developers should make the internal state of their service as large as possible, preferably of infinite size because then they will be better than Google, and can charge a lot for advertising. If a service developer achieves this, an infinite set of MessagesIn, a zero size set of MessagesOut with an infinite amount of internal state, then they will have created God.

Tuesday, February 20, 2007

The WSRF CurrentTime is 5.41pm

CurrentTime kills caching for WSRF
We have been fixing a bug in our AJAX to REST/WSRF stuff.
The bug was that IE6 did not update the properties properly. Bruno discovered that IE6 was not invoking the HTTP GET because the URI had not changed, we had not included any cache control HTTP Headers, but I assumed that would mean "don't cache". The HTTP spec is hard going wrt caching but Section 13.4 states "If there is neither a cache validator nor an explicit expiration time associated with a response, we do not expect it to be cached, but certain caches MAY violate this expectation (for example, when little or no network connectivity is available)".The solution was to add HTTP Headers which explicitly said don't cache this. Adding metadata cannot be a bad thing I guess.

The next issue was should we do caching properly? There are a lot of advantages, especially for the AJAX client, as we could support conditional GETs. I had a vague idea how to do it too, put in a hook for the developer to attach his caching policy/code. But then I remembered CurrentTime; WSRF defines a ResourceProperty, CurrentTime, so that a client can find out the time when the WS-Resource had a certain set of properties. So one of our properties is always changing; so we cannot cache!

CurrentTime is a bad name, because almost by definition it is not the current time -- its some time later, Lamport's clocks. HTTP uses a HTTP Header, Date, to convey the same concept, and the concept really is the time at which the message was created. Its metadata about the message, it should not really be part of the message.

The good thing is I do not have to write the hooks for supporting caching, the bad thing is that we cannot do caching.

Wednesday, February 14, 2007

A Solution to a Distributed Systems Problem

The solution to this puzzle is:

Any protocol that solves this problem is equivalent to one
in which there are rounds of message exchanges: first A (say)
sends to B, next B sends to A, then A sends to B, and so on. We
show that in assuming the existence of a protocol to solve the
problem, we are able to derive a contradiction. This
establishes that no such protocol exists.

Select the protocol that solves the problem using the fewest
rounds. By assumption, such a protocol must exist and, by
construction, no protocol solving the problem using fewer rounds
exists. Without loss of generality suppose that m, the last
message sent by either process, is sent by A.

Observe that the action ultimately taken by A cannot depend on
whether m is recieved by B, because its receipt could never be
learned by A (since it is the last message). Thus, A's choice of
action alpha or beta does not depend on m. Next, observe that the
action ultimately taken by B cannot depend on whether m is
recieved by B, because B must make the same choice of action alpha
or beta even if m is lost (due to channel failure).

Having argued that the action chosen by A and B does not depend on
m, we conclude that m is superfluous. Thus, we can construct a new
protocol in which one fewer message is sent. However, the existence
of such a shorter protocol contradicts the assumption that our
original protocol used the fewest number of rounds.

A Distributed Systems Puzzle...

A Distributed Systems Puzzle
I like this little puzzle taken from Distributed Systems:

Two processes, A and B, communicate by sending and
receiving messages on a bidirectional channel. Neither
process can fail. However, the channel can experience
transient failures, resulting in loss of a subset
of the messages that have been sent. Devise a protocol
where either of two actions alpha or beta are possible,
but (i) both processes take the same action and (ii)
neither takes both actions.

Can you show that there is no solution?

Friday, February 09, 2007

Google does the Impossible!

Google does the Impossible

I often wondered whether Google would hire me, until now I had doubts. Google only hire people who are smarter than the average Googler, and given my lack of a CS background I would feel at a further disadvantage given a question like "What is the most efficient way to sort a million integers?"; I might blurt out "With a computer". With this kind of attitude you might start to think they are a bit arrogant.

However this paper on the Chubby Lock Service gives me hope! I was interested in it because it uses the Paxos algorithm, and I am pretty interested in Paxos from our work on HARC. This section in the Chubby paper introduction caught my eye.

Readers familiar with distributed computing will recognize the election of a primary among peers as an instance of the distributed consensus problem, and realize we require a solution using asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered. (Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.) Asynchronous consensus is solved by the Paxos protocol [12, 13]. The same protocol was used by Oki and Liskov (see their paper on viewstamped replication [19]), an equivalence noted by others [14]. Indeed, all working protocols for asynchronous consensus we have so far encountered have Paxos at their core. Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5].

Lets start with the first sentence, "asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered.". This definition of asynchronous communications, in the context of a discussion on Consensus, is wrong. Let's grab the Fischer et al. paper for our reference. This paper, famously known by the initials of the authors names as FLP, is one of the most important in distributed systems literature; we will see why it is so important in a bit. The full title of the paper is "Impossibility of Distributed Consensus with One Faulty Process",pretty arresting stuff! From the paper:

"In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable - it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement."

The paper goes on to describe an asynchronous system:

"Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speeds of processes or about the delay time in delivering a message. We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used. (In particular, the solutions in [6] are not applicable.) Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly."

So "asynchronous communication" actually means that there is no upper bound on how long a message can take to be delivered. You can have asynchronous communications and a perfect network: messages are always delivered, they are delivered in order and delivered only once, but it can take an infinite time for them to be delivered. This is the FLP model. (For previous rants on asynchronous, synchronous and partially synchronous see this blog entry: Asynchronous?.) Of course in synchronous communications you can have message losses, message delays and message re ordering, but it is easier to deal with in this environment: if I don't get a message in the next 30 seconds I know there has been a fault.

The FLP paper is important for two reasons: it says something fundamental about dealing with faults in distributed systems, and it demonstrated how to show some problems in distributed computing are impossible to solve. Dealing with faults in an asynchronous system is very difficult, because you cannot tell whether a processor is very slow or faulted. You need to use time, or fault detectors. To show that a problem is impossible to solve FLP uses proof by contradiction: show what properties an algorithm that solves the problem must have, and then show that there is some contradiction in the properties. After FLP people had lots of fun showing stuff was impossible.

The next phrase in the Chubby paper is: "(Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.)" This seems a bit rich! With respect to timing models, you can choose between synchronous, asynchronous or partially synchronous, then you can choose the fault model, for example Byzantine or non-Byzantine. As we will see Chubby system actually does make stronger assumptions about the timing model, it is actually depending on the system being partially synchronous.

The next line in the Chubby paper is: "Asynchronous consensus is solved by the Paxos protocol [12, 13].". The wording is a bit loose here, there are various versions of the consensus problem with different fault models and requirements, for example Uniform Consensus or even the Byzantine Generals, however most people would take this statement to mean the consensus problem as defined by FLP. But FLP showed consensus in an asynchronous system is impossible with only one faulty process! (Consensus is straight forward without faults, so is not really considered a problem).

Now skipping to the last sentence from the Chubby extract: "Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5]." The logic of the last part implies that Fischer et al. are wrong, or else Google have achieved the impossible! However if we wind back a bit, they introduce clocks and once you do that the system is no longer completely asynchronous. They don't overcome the impossibility result of Fischer et al., they are using a different model which isn't completely asynchronous.

So why all the confusion? Lamport clarifies it as follows: "Asynchronous consensus algorithms like Paxos maintain safety despite asynchrony, but are guaranteed to make progress only when the system becomes synchronous - meaning that messages are delivered in a bounded length of time." Paxos is an asynchronous algorithm because it doesn't use time, but the timing of events in the system must conspire for it to terminate.

Butler Lampson summarized the timing requirements nicely in "How to build a highly available system using consensus.":

"It [Paxos] terminates if there is a single leader for a long enough time during which the leader can talk to a majority of the agent processes twice. It may not terminate if there are always too many leaders (fortunate, since we know that guaranteed termination is impossible ).". The last part is FLP.

There is a pathological case in Paxos were two leaders compete to get a value chosen and you effectively end up with livelock, however it is not difficult in practice to avoid this. Lampson describes how to use a sloppy leadership election to try and make sure there is only one leader, it doesn't matter if the sloppy leadership election screws up because we know Paxos is safe with multiple leaders. The pathological case corresponds to the FLP result.

"Consensus in the Presence of Partial Synchrony" provides details on the conditions that are required in order to reach consensus.

I have only a practical interest in Paxos in order to solve the problems addressed by HARC. I am not sure I fully understand the proofs behind it, or FLP, but the main ideas behind it are important to get right. Which is why I am so disappointed by the Chubby paper, more people will read it than FLP, or the Paxos papers, and it will only cause confusion. As if the "The Part-Time Parliament" doesn't do enough of that!

Wednesday, February 07, 2007

Perfect Code

Jim Gray's disappearance has lead me to reflect on the things he has written that have stayed with me. One them is perfect code. In
Transaction Processing: Concepts and Techniques Jim talks about perfect code; code without bugs. He even gives an example, a simple function of a few lines that adds two numbers together. In fact it took him a couple of attempts to get it right! Jim makes the point that perfect code is possible, just very expensive. He goes on to talk about the cost of code in terms of dollars per line. NASA pay the most, but their code still has bugs! The astronauts board the shuttle with a bug list, an example of which was not to use the two keyboards on the shuttle at the same time because the inputs would be OR'd! Perfect code is possible, just really expensive.

Most people are surprised when I say that perfect code is possible, they just assume that all code contains bugs. I was reminded of this while following the Nearly All Binary Searches and Mergesorts are Broken thread on binary search implementation bugs. At the time I hit the library to check Knuth's version of binary search; he didn't let me down. Someone who invents their own ISA to express their algorithms isn't going to get caught by an overflow.

I am sure the attitude that all code contains bugs must have a detrimental effect on programmers, it has on me. It is also re-enforced by the open source mantra "release early, release often". Testing doesn't solve all the problems either: "Testing can only prove the existence of bugs", Dijkstra.

I recently re-read the History of System R. The name Franco Putzolu cropped up, Jim mentioned him in the acknowledgements of "Transaction Processing: Concepts and Techniques" as someone who had made huge contributions to the field of transactions and databases but who never got the public recognition as his name was not on a lot of the famous papers.
From the System R history (Copyright attached):

Mike Blasgen: The RSS didn't have any bugs. [laughter] No, it's true, the reason is because much of the RSS was written by Franco. No, it's really true; Franco never wrote a bug. Except for one,
right, Bruce? Did you find one?

Bruce Lindsay: One.

Mike Blasgen: He wrote about half of RSS, and I think we found one bug. And that was after nine years.


RSS was the storage part of System R and used B-Trees. It was 35,000 lines of code to solve some pretty hairy problems, and had only one bug! Remember too that the tool support back then must have been terrible compared to today. Pretty perfect then.
 


Copyright (c) 1995, 1997 by Paul McJones, Roger Bamford, Mike Blasgen, Don Chamberlin, Josephine Cheng, Jean-Jacques Daudenarde, Shel Finkelstein, Jim Gray, Bob Jolls, Bruce Lindsay, Raymond Lorie, Jim Mehl, Roger Miller, C. Mohan, John Nauman, Mike Pong, Tom Price, Franco Putzolu, Mario Schkolnick, Bob Selinger, Pat Selinger, Don Slutz, Irv Traiger, Brad Wade, and Bob Yost. You may copy this document in whole or in part without payment of fee provided that you acknowledge the authors and include this notice.