Friday, January 19, 2007

Idempotent Messages

There has been quite a bit of internal discussion of Pat Helland's paper at Mancs. Although Pat sees the usefulness of idempotent messages, I wonder if he has considered the fact that though a message may be idempotent, a sequence of idempotent messages may not be idempotent. Lamport thought about this in his paper on Generalized Consensus and Paxos, he also considered if messages could commute before bundling them. A HTTP GET will commute with another HTTP GET, but I am not sure about the other methods.

Below is a fragment of the discuss we are having on idempotent and at-least-once messaging...

Say a state machine has a possible sequence of state changes
A->B->C and once it reaches C it cannot change state, and A is the
inital state. The state machine accepts two types of message
<a->B and <b->C>. If the state machine is in state B, it can
ignore messages <a->B> because it inherently knows it has processed such a message before. Therefore it doesn't need to store message IDs to achieve at-least-once delivery.

In the above system the two messages are inherently idempotent,
as designed. Out of order delivery is interesting, if the
sequence <(A->B),(B->C)> is delivered out of order then both
messages will fail to cause a state change. Nothing bad happens?

If we replaced the two messages with a single <ChangeState> message, then it is not idempotent, and we would need to include message IDs
to achieve at-least-once delivery. Out of order delivery MIGHT now
cause unwanted side effects, depending on the system and what clients expect.

The second approach to messaging appears simpler because there is
only one message no matter how many states the state machine has,
the first case has N-1 messages were N is the number of states.

Clearly message design has a big impact on the behaviour of the system!!!

For an application to "know" that it has processed a message, it
must be in a state that it can only have reached by processing that
message. There seems to be two approaches to achieve this: design your system to use idempotent messages, or design it so that progress
from one state to another makes messages idempotent. An interesting question is whether these two approaches are equivalent? If you
used both approaches would you come up with the same design.

The messages <a->B> and <b->C> map to PUT(B) and PUT(C) in HTTP, while <ChangeState> maps to POST(ChangeState).

In a programming language the operation might be implemented as
stateMachine++. stateMachine++ is interesting because it looks like ,
but the underlying system (CPU and memory) views it differently:


retrieve value stored at address "stateMachine",
increment value,
put new value back to address "stateMachine"


However latency is zero, or at least is masked to appear as zero.
Partial failures are not possible. In a multi-thread environment
locks would have to be introduced, but without partial failures this
isn't a problem (a processor holding a lock cannot die without the
rest of the system noticing, or failing).

Asynchronous?


As I do not have a CS background I have always been confused
when people used the term asynchronous. This entry has been
prompted by Dave Orchard's description of  SOA

HTTP is a network protocol. HTTP is asynchronous. It uses a
Request-Response pattern, a client sends request and the server
sends a response. However there is nothing in the protocol about
how long the response will take, therefore it is an asynchronous
protocol. TCP is a synchronous protocol, there are various timeouts
specified in the standard and, agents are supposed to react in
specified ways to these timeouts.

However most HTTP libraries use synchronous function calls: a client
makes a call to a function in the HTTP library, the library sends a
request to a HTTP server and returns the response message to the
client using the return mechanism of the function. Most HTTP
libraries allow the client to pass a timeout value to the library,
if the server does not respond within the timeout then the library
reports an error to the client. The timeout is specified by the client,
not by the HTTP standard.

Some HTTP libraries support asynchronous function calls: for example
the client makes a function call to the HTTP library, the call does
not block but returns immediately, the client can do some other processing
before making another function call to pick up the HTTP response.
LWP::Parallel is an example of such a library. There are also terms like
blocking/non-blocking etc which can be used to describe function calls.

So there are two concepts of asynchronous/synchronous at play here: one
that is used by people like Lamport and Lynch to describe distributed systems
and one used by people to talk about programming models. As I don't have a
background in CS this confused me for a long time, I was never sure how people
were using the term. The issue is further complicated by the fact that if you
are building an asynchronous system, per Lamport and Lynch, then using an
asynchronous programming model is more suited to the task.

So reading Dave Orchard's blog entry on SOA I am wondering what he means
by advocating "asynchronous". He is describing an approach to
distributed computing called SOA, so I guess he must be talking about
the Lynch/Lamport definition of asynchronous. This has consequences,
namely that you won't be able to do very much.

In an asynchronous system it is impossible to reach consensus with just one
faulty processor, even with a perfect network. Consensus is the problem of
getting a set of processors to agree a value. Consider the case of submitting
a purchase order in an asynchronous system, you send the request but the
request can take an infinite time to reach the server, the server can take
an infinite time to process the message and the response can take an infinite
time to return. How can you tell if the server failed? On the Web this is
equivalent to hitting the submit button and nothing happening - what do you
do? Wait a bit longer, wait for an e-mail, re-try the submit button,
ring/e-mail the server administrator etc, some internal clock triggers an action.
In an asynchronous system you just wait, if you include timeouts then it is
no longer asynchronous.

There are three timing models in distributed systems: synchronous,
partially-synchronous, and asynchronous. For definitions see

Consensus in the Presence of Partial Synchrony
:



  • synchronous: In a synchronous system, there is a known fixed upper
    bound on the time required for a message to be sent from one processor
    to another and a known fixed upper bound on the relative speeds of
    different processors.


  • asynchronous: In an asynchronous system no fixed upper bounds exist
    on the time required for a message to be sent from one processor
    to another, nor on the relative speeds of different processors.


  • partially-synchronous: Fixed bounds are known to exist but are not
    know a priori, or in another version the fixed bounds are known
    but are only guaranteed to hold starting at some unknown time.


Designing algorithms for asynchronous systems is attractive because
they will work in any system, synchronous or partially-synchronous.
So using an asynchronous model for the internet makes sense, but you
are restricted in what you can do. It is interestingly to note that
some Web Service specifications are synchronous. For example WS-Security
uses time ranges to define how long a message is valid for,
messages that arrive outside this range are not processed. This causes
problems when systems that have a large clock skew, I have seen many weird
bugs arise due to this including messages arriving before they were sent!

So choosing an asynchronous timing model for your distributed system
may not be so attractive after all. Perhaps Dave meant an asynchronous
programming model? At first this doesn't make sense, since he is talking
about an architecture for a distributed system, how you write code
for such a system should be an orthogonal issue. However looking through
the discussions of SOA I can see a pattern.

Remote Procedure Call, RPC, as introduced in IETF RFC 707, introduced an
abstraction which allowed programmers to think that invoking a remote
service was the same as invoking a local function call. The abstraction
was extended by the concept of distributed objects, the programmer could
think of a remote object as being just like a local object. The idea
behind this thinking is that programmers are familiar with procedure
calls and objects, and so RPC and distributed objects will be an easy
path into distributed computing for a programmer already familiar with
procedure calls and objects. However this abstraction was shown to be
a bad one, perhaps most famously by Waldo et al. By pretending remote
things are just like local things, the programmer is lead to ignore
latency and partial failures. (Though it is hard to imagine people like
Jim Gray and Butler Lampson would fall into this trap.)

There has been a lot of debate in the past with issues like SOA != RPC
and WS != Distributed Objects. However I think there are two different
issues, one is the architecture of a distributed system, the other is
the programming model. I can take a good architecture, for example REST,
and use a bad programming model, for example RPC, to build a client
library.

A Web services toolkit can attempt to hide the complexity associated
with distributed systems, and it may initially be easy and familiar to use,
but ultimately it will betray the programmer by misleading him just like RPC.
Or you can design a better programming model, which does not mislead the
programmer but would probably be more a difficult programming environment.
A naive programmer might think the first is better because it is easier
to use. There is a balance to be struck.

An asynchronous programming style forces the programmer to
think about the system more carefully as a distributed system,
perhaps this is why Dave is advocating asynchronous. However to
me, someone who uses raw sockets a lot, it muddies the discussion
on architectures of distributed systems.

REST doesn't prescribe a programming model it is focused purely on
the architecture of the system, you can write bad HTTP clients and
services, or you can write good ones like LWP. SOA/WS-* has had its feet
stuck in the debate about programming models, which has caused
me confusion.

Monday, January 08, 2007

Pat Helland Discovers REST - the hard way!

Pat Helland has an interesting paper on scalability and transactions called: Life beyond Distributed Transactions: an Apostate’s Opinion. First, it raises questions about our use of Paxos Commit to solve the co-allocation problem in HARC. However, we don't expect co-allocation to ever involve more than a few resources and we do not expect a heavy demand on the service, also we are quite loose about serializability.

The more interesting aspect of the paper is the similarity of the ideas in it and the concepts in REST. The paper discusses "infinite scalability", yet never mentions the Web, which must be the best example of infinite scalability in a distributed application (Fielding uses the term anarchic scalability which I prefer as it sounds even more daunting). I guess this reflects Pat's background as a transaction guru; he is interested in scaling out the systems he knows.

The paper introduces the concept of "entities" which are identified by keys and to which messages are sent. This maps to resources and URIs, the issues of primary keys and secondary keys wrt to scalability can be handled more elegantly and transparently using DNS trickery and re-directs. The paper also talks about the usefulness of the concept of recognising that some messages are idempotent, for example reads - hello GET and PUT! He misses the usefulness of caching though, I am sure he might like it.

Also the seperation of the application into two layers, with only the lower layer needing to be scale aware maps to the Web. The developer writes the scale agnostic layer that interfaces with the scale aware layer through the "Scale Agnostic Programming Abstraction".
This means he does not have to worry about scaling issues and can leave them to the developer of the scale aware layer. On the Web, a Web developer can use a Yahoo API (the Scale Agnostic Programming Abstraction, which is of course RESTful), to write applciations which have no idea about the scale of Yahoo or how the internals of Yahoo works.