Idempotent Messages
There has been quite a bit of internal discussion of Pat Helland's paper at Mancs. Although Pat sees the usefulness of idempotent messages, I wonder if he has considered the fact that though a message may be idempotent, a sequence of idempotent messages may not be idempotent. Lamport thought about this in his paper on Generalized Consensus and Paxos, he also considered if messages could commute before bundling them. A HTTP GET will commute with another HTTP GET, but I am not sure about the other methods.
Below is a fragment of the discuss we are having on idempotent and at-least-once messaging...
Say a state machine has a possible sequence of state changes
A->B->C and once it reaches C it cannot change state, and A is the
inital state. The state machine accepts two types of message
<a->B and <b->C>. If the state machine is in state B, it can
ignore messages <a->B> because it inherently knows it has processed such a message before. Therefore it doesn't need to store message IDs to achieve at-least-once delivery.
In the above system the two messages are inherently idempotent,
as designed. Out of order delivery is interesting, if the
sequence <(A->B),(B->C)> is delivered out of order then both
messages will fail to cause a state change. Nothing bad happens?
If we replaced the two messages with a single <ChangeState> message, then it is not idempotent, and we would need to include message IDs
to achieve at-least-once delivery. Out of order delivery MIGHT now
cause unwanted side effects, depending on the system and what clients expect.
The second approach to messaging appears simpler because there is
only one message no matter how many states the state machine has,
the first case has N-1 messages were N is the number of states.
Clearly message design has a big impact on the behaviour of the system!!!
For an application to "know" that it has processed a message, it
must be in a state that it can only have reached by processing that
message. There seems to be two approaches to achieve this: design your system to use idempotent messages, or design it so that progress
from one state to another makes messages idempotent. An interesting question is whether these two approaches are equivalent? If you
used both approaches would you come up with the same design.
The messages <a->B> and <b->C> map to PUT(B) and PUT(C) in HTTP, while <ChangeState> maps to POST(ChangeState).
In a programming language the operation might be implemented as
stateMachine++. stateMachine++ is interesting because it looks like,
but the underlying system (CPU and memory) views it differently:
retrieve value stored at address "stateMachine",
increment value,
put new value back to address "stateMachine"
However latency is zero, or at least is masked to appear as zero.
Partial failures are not possible. In a multi-thread environment
locks would have to be introduced, but without partial failures this
isn't a problem (a processor holding a lock cannot die without the
rest of the system noticing, or failing).
Below is a fragment of the discuss we are having on idempotent and at-least-once messaging...
Say a state machine has a possible sequence of state changes
A->B->C and once it reaches C it cannot change state, and A is the
inital state. The state machine accepts two types of message
<a->B and <b->C>. If the state machine is in state B, it can
ignore messages <a->B> because it inherently knows it has processed such a message before. Therefore it doesn't need to store message IDs to achieve at-least-once delivery.
In the above system the two messages are inherently idempotent,
as designed. Out of order delivery is interesting, if the
sequence <(A->B),(B->C)> is delivered out of order then both
messages will fail to cause a state change. Nothing bad happens?
If we replaced the two messages with a single <ChangeState> message, then it is not idempotent, and we would need to include message IDs
to achieve at-least-once delivery. Out of order delivery MIGHT now
cause unwanted side effects, depending on the system and what clients expect.
The second approach to messaging appears simpler because there is
only one message no matter how many states the state machine has,
the first case has N-1 messages were N is the number of states.
Clearly message design has a big impact on the behaviour of the system!!!
For an application to "know" that it has processed a message, it
must be in a state that it can only have reached by processing that
message. There seems to be two approaches to achieve this: design your system to use idempotent messages, or design it so that progress
from one state to another makes messages idempotent. An interesting question is whether these two approaches are equivalent? If you
used both approaches would you come up with the same design.
The messages <a->B> and <b->C> map to PUT(B) and PUT(C) in HTTP, while <ChangeState> maps to POST(ChangeState).
In a programming language the operation might be implemented as
stateMachine++. stateMachine++ is interesting because it looks like
but the underlying system (CPU and memory) views it differently:
retrieve value stored at address "stateMachine",
increment value,
put new value back to address "stateMachine"
However latency is zero, or at least is masked to appear as zero.
Partial failures are not possible. In a multi-thread environment
locks would have to be introduced, but without partial failures this
isn't a problem (a processor holding a lock cannot die without the
rest of the system noticing, or failing).