<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-37470176</id><updated>2011-11-22T07:51:37.139-08:00</updated><title type='text'>Beta Thoughts</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>32</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-37470176.post-4919726269748387237</id><published>2010-03-04T02:45:00.000-08:00</published><updated>2010-03-04T02:58:11.744-08:00</updated><title type='text'>SIGPIPE</title><content type='html'>I was changing a &lt;a href="http://linux.die.net/man/2/send"&gt;send&lt;/a&gt; call to a &lt;a href="http://linux.die.net/man/2/writev"&gt;writev&lt;/a&gt; and ran into an annoying problem. 
On the send call I had set the MSG_NOSIGNAL flag to suppress the SIGPIPE signal raised when the socket I was writing to had been closed. Unfortunately writev takes no flags argument, so I cannot set the same flag there! (I had originally switched from write to send precisely to be able to set the flag.) On some systems (BSD) you can set the socket option SO_NOSIGPIPE to get a similar effect, but on Linux it doesn't look like this option is supported. The other option is to have the process ignore SIGPIPE, but there are issues with this if you do it naively, see &lt;a href="http://krokisplace.blogspot.com/2010/02/suppressing-sigpipe-in-library.html"&gt;Suppressing SIGPIPE in a library&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-4919726269748387237?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/4919726269748387237/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=4919726269748387237' title='44 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/4919726269748387237'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/4919726269748387237'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/03/sigpipe.html' title='SIGPIPE'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>44</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-6199139409987163191</id><published>2010-03-03T04:00:00.000-08:00</published><updated>2010-03-03T04:07:18.073-08:00</updated><title type='text'>iphones and people</title><content type='html'>I have noticed the number of iPhones increasing among the people I know. What is particularly interesting is that a lot of non-geeks now have iPhones. This is because the iPhone is now available on cheaper mobile plans; the geeks were prepared to pay the premium to get the iPhone early. I know of one person who now reads their e-mail because they have an iPhone; previously they were too busy to sit down at a computer. This means that the iPhone is actually increasing the number of people who are actively using the internet and the WWW!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-6199139409987163191?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/6199139409987163191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=6199139409987163191' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/6199139409987163191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/6199139409987163191'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/03/iphones-and-people.html' title='iphones and people'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-3041486269599760063</id><published>2010-02-24T02:52:00.000-08:00</published><updated>2010-02-24T03:05:09.546-08:00</updated><title type='text'>ExtraHop</title><content type='html'>Some guys I worked with at &lt;a href="http://www.f5.com/"&gt;F5 Networks&lt;/a&gt; created a start-up called &lt;a href="http://www.f5.com/"&gt;ExtraHop&lt;/a&gt;. It was clear that whatever these guys did was going to be impressive and run very fast - they did not disappoint. They built an appliance to monitor your network: hang it off your network, spend 15 minutes configuring it and there you go, it will learn all about your network and tell you when things are not running smoothly and why. It works at layer 7, so it knows when a database is not running properly, or when a CIFS server is misbehaving, or when an HTTP server's TPS has dropped. The really amazing thing - it runs at 10GbE! Years of talking to &lt;a href="http://www.f5.com/products/big-ip/"&gt;BIG-IP&lt;/a&gt; customers were not wasted. 
They also have a cool &lt;a href="https://www.networktimeout.com/"&gt;service&lt;/a&gt; where you can upload your pcaps and their magic software will analyse them for you!&lt;br /&gt;&lt;br /&gt;There are also some &lt;a href="https://www.networktimeout.com/"&gt;videos&lt;/a&gt; of their stuff in action.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-3041486269599760063?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/3041486269599760063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=3041486269599760063' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/3041486269599760063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/3041486269599760063'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/02/extrahop.html' title='ExtraHop'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-476545747271048385</id><published>2010-02-23T02:47:00.000-08:00</published><updated>2010-02-23T03:27:57.110-08:00</updated><title type='text'>A History of malloc</title><content type='html'>I was reading up on &lt;a href="http://www.opengroup.org/onlinepubs/009695399/functions/malloc.html"&gt;malloc&lt;/a&gt;. I had naively assumed malloc was a system call; it's not. 
Under the covers malloc uses &lt;a href="http://linux.about.com/library/cmd/blcmdl2_brk.htm"&gt;brk or sbrk&lt;/a&gt; to request memory from the kernel. However, malloc does not always have to call brk/sbrk: if it has memory on its "free list" that can fulfil the request then no system call is needed. So you do not always have to pay the price of a system call when you use malloc.&lt;br /&gt;&lt;br /&gt;Another interesting thing about a brk-based malloc is that it cannot return memory to the kernel unless the memory that is freed is at the top of the heap. If you malloc some memory at the start of your program, malloc some more later and then free the original memory, malloc/free cannot return the original chunk of memory to the kernel until the second piece of memory is freed.&lt;br /&gt;&lt;br /&gt;A classic early malloc is the one in "The Old Testament" - &lt;a href="http://en.wikipedia.org/wiki/The_C_Programming_Language_%28book%29"&gt;K&amp;amp;R&lt;/a&gt; - and it's about 200 lines of code. It keeps the free-list bookkeeping in a header, declared as a union, at the start of each free block itself - this saved space, an important requirement when the amount of memory available was very limited.&lt;br /&gt;&lt;br /&gt;Poul-Henning Kamp re-wrote malloc for FreeBSD 2.2 and documented it in &lt;a href="http://www.google.co.uk/url?sa=t&amp;amp;source=web&amp;amp;ct=res&amp;amp;cd=1&amp;amp;ved=0CAkQFjAA&amp;amp;url=http%3A%2F%2Fwww.freebsd.dk%2Fpubs%2Fmalloc.pdf&amp;amp;ei=9rWDS4r1CJD-mQOrkMWSAg&amp;amp;usg=AFQjCNGd9Hy2Tg5VtEJOJj6lJASz6ednUQ&amp;amp;sig2=jCfgta9Dj448MuNHTNLj1g"&gt;Malloc(3) Revisited&lt;/a&gt;; this malloc is known as phkmalloc. By this time systems were using virtual memory. In the K&amp;amp;R approach the free list is embedded in the free chunks themselves, and those chunks could have been paged out to disk, so when malloc walked the free list looking for memory it would have to page all of them back in, killing performance!! 
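&lt;br /&gt;&lt;br /&gt;To make the free-list and top-of-heap behaviour described earlier concrete, here is a toy model in Python. It is only a sketch: ToyHeap, sbrk_calls and the block sizes are made up for illustration and do not correspond to any real allocator, which would also split and merge blocks.

```python
class ToyHeap:
    """Toy model of a brk-style heap: blocks sit in address order."""

    def __init__(self):
        self.blocks = []      # [size, in_use] pairs, lowest address first
        self.sbrk_calls = 0   # how often we had to "ask the kernel"

    def malloc(self, size):
        # First fit: reuse a free block if one is big enough.
        for index, block in enumerate(self.blocks):
            if not block[1] and block[0] >= size:
                block[1] = True
                return index
        # Nothing suitable on the free list, so grow the heap.
        self.sbrk_calls += 1
        self.blocks.append([size, True])
        return len(self.blocks) - 1

    def free(self, index):
        self.blocks[index][1] = False
        # Only free blocks at the top of the heap can go back to the kernel.
        while self.blocks and not self.blocks[-1][1]:
            self.blocks.pop()

    def heap_size(self):
        return sum(size for size, _ in self.blocks)


heap = ToyHeap()
first = heap.malloc(100)
second = heap.malloc(50)
heap.free(first)                  # stuck below the second block
size_after_first_free = heap.heap_size()
reused = heap.malloc(80)          # served from the free list, no sbrk
heap.free(reused)
heap.free(second)                 # top block freed, everything is released
size_at_end = heap.heap_size()
```

In this model the heap stays at its full size after the first free, the third request costs no "system call", and the heap only shrinks once the block at the top is freed.&lt;br /&gt;&lt;br /&gt;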
Kamp's version of malloc was 1136 lines of code long and had a good reputation for performance.&lt;br /&gt;&lt;br /&gt;Then came fast multi-processor machines with large memory, and another re-write of malloc. Jason Evans re-wrote malloc for FreeBSD; his version is known as jemalloc, and he wrote about it in &lt;a href="http://www.google.co.uk/url?sa=t&amp;amp;source=web&amp;amp;ct=res&amp;amp;cd=1&amp;amp;ved=0CAYQFjAA&amp;amp;url=http%3A%2F%2Fpeople.freebsd.org%2F%7Ejasone%2Fjemalloc%2Fbsdcan2006%2Fjemalloc.pdf&amp;amp;ei=7LeDS7jSH5bqmwP09PScAg&amp;amp;usg=AFQjCNHNlXYH4AKW4abWzTY9XrU1fEet7Q&amp;amp;sig2=5sAvaAW96xlGl0B5qGEnYg"&gt;A Scalable Concurrent Malloc(3) Implementation for FreeBSD&lt;/a&gt;. Now the issues are less about paging to disk and more about fast locks and NUMA (trying to allocate memory close to the CPU that you think will be using it). Firefox is attempting to use jemalloc internally for its memory management.&lt;br /&gt;&lt;br /&gt;There are quite a few malloc implementations out there; Google have one called tcmalloc in their &lt;a href="http://code.google.com/p/google-perftools/"&gt;perftools&lt;/a&gt; bundle. It's very easy to swap the malloc your code uses: all you have to do is link against the library with the new malloc. 
Though this can sometimes lead to &lt;a href="http://jpipes.com/index.php?/archives/296-Drizzle-Performance-Regression-Solved-TCMalloc-vs.-No-TCMalloc.html"&gt;trouble&lt;/a&gt; :-)&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-476545747271048385?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/476545747271048385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=476545747271048385' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/476545747271048385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/476545747271048385'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/02/history-of-malloc.html' title='A History of malloc'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-7788578300847953550</id><published>2010-02-19T03:14:00.000-08:00</published><updated>2010-02-19T03:19:46.744-08:00</updated><title type='text'>Ubuntu + D-Link + linkedin == Trouble</title><content type='html'>Hit this &lt;a href="https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/314713"&gt;bug&lt;/a&gt; yesterday when trying to access LinkedIn. Weirdly, nearly all the other web sites I visit don't exhibit this problem! Went with the "set MTU to 1360" hack. 
If I have time I will look into this some more - network bugs can be very weird...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-7788578300847953550?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/7788578300847953550/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=7788578300847953550' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/7788578300847953550'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/7788578300847953550'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/02/ubuntu-d-link-linkedin-trouble.html' title='Ubuntu + D-Link + linkedin == Trouble'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-2332990692726863122</id><published>2010-02-18T03:36:00.000-08:00</published><updated>2010-02-18T03:48:37.649-08:00</updated><title type='text'>Post Modern Programming</title><content type='html'>&lt;a href="http://www.varnish-cache.org/"&gt;Varnish&lt;/a&gt; is a reverse Web proxy cache. What makes it interesting is how it was &lt;a href="http://varnish-cache.org/wiki/ArchitectNotes"&gt;designed&lt;/a&gt;. The argument is that people are programming like it's 1975, treating RAM and disk as two separate memory pools; instead, Varnish views them as a single memory pool with the RAM acting just like a cache. 
It does this by mmap'ing a large file; the threads just read and write to this memory, happily unaware that it is being backed by disk. This has the advantages of reduced complexity (no requirement to manage separate RAM and disk caches) and far fewer system calls (no read/write to disk). It also means that they have to use lots of threads, because any memory access could block the thread on a page fault - so they have a thread-per-connection model. This is contrary to how a lot of people develop these kinds of applications: they have an event-driven system with only a couple of threads, or just one thread, and they focus on making sure that thread never blocks.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-2332990692726863122?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/2332990692726863122/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=2332990692726863122' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/2332990692726863122'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/2332990692726863122'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/02/post-modern-programming.html' title='Post Modern Programming'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-1657212861788947290</id><published>2010-02-17T04:23:00.000-08:00</published><updated>2010-02-17T04:29:22.120-08:00</updated><title type='text'>Am I a bad Web Citizen?</title><content type='html'>I noticed in my previous post I did not include any links. There were many things in the post I could have referenced, but I did not, because I was lazy. I am relying on the extra level of indirection that Google provides - if someone is interested in something then they can Google it. This is wrong for two reasons.&lt;br /&gt;&lt;br /&gt;First, I can include links that back up my case. Google may turn up links that weaken my case and reduce my credibility.&lt;br /&gt;&lt;br /&gt;Secondly, the Web depends on links and they are of fundamental importance to the Web.&lt;br /&gt;&lt;br /&gt;Mea Culpa&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-1657212861788947290?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/1657212861788947290/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=1657212861788947290' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/1657212861788947290'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/1657212861788947290'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/02/am-i-bad-web-citizen.html' title='Am I a bad Web Citizen?'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-263683166898563572</id><published>2010-02-17T03:15:00.000-08:00</published><updated>2010-02-17T04:21:46.117-08:00</updated><title type='text'>What does it mean?</title><content type='html'>I was reading Warren Buffett's entry on Wikipedia the other day and came across this phrase: "&lt;span style="font-style: italic;"&gt;Price is what you pay, value is what you get&lt;/span&gt;".&lt;br /&gt;&lt;br /&gt;I don't understand the phrase, or rather I do understand the phrase, but don't get the message it is trying to convey. I haven't Googled the phrase but left it to stew in my brain to see if I could come up with its significance; unfortunately it is still stewing.&lt;br /&gt;&lt;br /&gt;The phrase is a tautology: it states a definition of two words, price and value, that is in no way controversial. So why does it make Mr Buffett's Wikipedia entry?  &lt;br /&gt;&lt;br /&gt;I really like phrases like this. They can convey so much information with so few words. I would not be surprised to find a book with the title "&lt;span style="font-style: italic;"&gt;Price is what you pay, value is what you get&lt;/span&gt;", or that it's an essay question on some Economics degree course. You can use phrases like these as names for ideas, names that are self-explanatory.&lt;br /&gt;&lt;br /&gt;An equivalent phrase from the software development world is "&lt;span style="font-style: italic;"&gt;Premature optimisation is the root of all evil&lt;/span&gt;" - you just have to quote this and everyone understands and rolls their eyes. 
(As an aside, people usually attribute this to Knuth, who wrote it in "Structured Programming with go to Statements", his riposte to Dijkstra's "Go To Statement Considered Harmful", though Knuth himself later credited Tony Hoare.)&lt;br /&gt;&lt;br /&gt;An interesting difference between the two phrases is that one is (nearly) a tautology and the other is obviously untrue - premature optimisation did not throw up Hitler. The fact that it is a tautology, I think, adds value: it almost says that the message it carries is also a tautology - which it probably isn't.&lt;br /&gt;&lt;br /&gt;I look forward to the day when in some meeting I can say "&lt;span style="font-style: italic;"&gt;Price is what you pay, value is what you get&lt;/span&gt;"; hopefully no one will say "WTF do you mean?"&lt;br /&gt;&lt;br /&gt;A personal favorite phrase of mine is "&lt;span style="font-style: italic;"&gt;The talent is in the choices&lt;/span&gt;". I have used it in a couple of talks I have given and I am pretty sure that is all people remember from the talks; that they remember anything from one of my talks I see as a success. I have never been able to find out who first said this, but I know Robert De Niro used it. The choices an actor makes reflect his talent, talented actors make good choices - apparently he agonized for three months as to whether or not to have a moustache in Godfather part II. A software engineer is often faced with choices about how to design or implement something; good engineers make good choices and clearly experience plays a part in this. And sometimes an obvious choice is not the correct one, and good engineers will recognize this. This is also why Agile development is so successful: you make a bunch of choices, they turn out to be bad, you refactor.&lt;br /&gt;&lt;br /&gt;Phrases like these can also be used in a kind of harmful way. 
I remember one meeting where we were talking about using a new technology that was rapidly evolving at the time (Web Services :-() and someone said we were building on sand; this was quickly and cleverly countered with "&lt;span style="font-style: italic;"&gt;Yes, but it's a better quality sand&lt;/span&gt;". By the time people had processed this statement and realized it was complete crap the debate had moved on.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-263683166898563572?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/263683166898563572/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=263683166898563572' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/263683166898563572'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/263683166898563572'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2010/02/what-does-it-mean.html' title='What does it mean?'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-4188776420268450409</id><published>2007-06-13T14:42:00.000-07:00</published><updated>2007-06-13T15:17:23.119-07:00</updated><title type='text'>A brief history of Consensus, 2PC and Transaction Commit.</title><content type='html'>This is a potted history of consensus, transactions and 2PC. 
Reading the literature on consensus is difficult because the language changes (consensus was originally called agreement), the results come in an order that isn't logical, and the whole framework for describing distributed algorithms evolved in parallel with the work. Also, there are few books other than Lynch's Distributed Algorithms that cover the subject.&lt;br /&gt;&lt;br /&gt;Papers are discussed in the order that makes most sense, not in the order they were published.&lt;br /&gt;&lt;br /&gt;The first instance of the consensus problem that I am aware of is in Lamport's &lt;a href="http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf"&gt;"Time, Clocks and the Ordering of Events in a Distributed System" (1978)&lt;/a&gt;, though it is not explicitly declared as a consensus or agreement problem. In this paper Lamport discusses how messages take a finite time to travel between processors and draws an analogy with Einstein's special relativity. Discussing Einstein's theory with respect to distributed systems has recently become popular in the blogosphere, but in 1978 Lamport gave a complete analysis with space-time diagrams and all. The issue is that in a distributed system you cannot tell if event A happened before event B, unless A caused B in some way. Each observer can see events happen in a different order, except for events that cause each other, i.e. there is only a partial ordering of events in a distributed system. Lamport defines the "happens before" relationship and operator, and goes on to give an algorithm that provides a total ordering of events in a distributed system, so that each process sees events in the same order as every other process.&lt;br /&gt;&lt;br /&gt;Lamport also introduces the concept of a distributed state machine: start a set of deterministic state machines in the same state and then make sure they process the same messages in the same order. Each machine is now a replica of the others. 
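&lt;br /&gt;&lt;br /&gt;Both ideas, logical timestamps that respect "happens before" and replicas that apply the same messages in the same order, can be sketched in a few lines of Python. This is an illustration only: the names LamportClock, tick, receive and replay are chosen here and are not from the paper, and ties between equal times are broken by process id to get a total order.

```python
class LamportClock:
    """Logical clock for a single process, identified by pid."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        # Any local event, including a send, advances the clock.
        self.time += 1
        return (self.time, self.pid)

    def receive(self, stamp):
        # Move past the sender's time so the receive is ordered after the send.
        self.time = max(self.time, stamp[0]) + 1
        return (self.time, self.pid)


def replay(stamped_messages):
    # Deterministic state machine: apply messages in timestamp order.
    state = 0
    for _stamp, amount in sorted(stamped_messages):
        state = state * 2 + amount
    return state


a, b = LamportClock("A"), LamportClock("B")
sent = a.tick()              # A sends a message stamped (1, "A")
local = b.tick()             # concurrent local event on B, stamped (1, "B")
received = b.receive(sent)   # the receive is stamped after the send

# Two replicas see the same stamped messages in different arrival orders.
replica_one = replay([(sent, 3), (local, 5)])
replica_two = replay([(local, 5), (sent, 3)])
```

Because both replicas sort by the same (time, pid) stamps before applying anything, they end up in the same state even though the messages arrived in different orders.&lt;br /&gt;&lt;br /&gt;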
The key problem is making each replica agree on the next message to process: a consensus problem. This is what the algorithm for creating a total ordering of events does; it provides an agreed ordering for the delivery of messages. However, the system is not fault tolerant; if one process fails the others have to wait for it to recover.&lt;br /&gt;&lt;br /&gt;Around the same time as this paper, Gray described 2PC in &lt;a href="http://research.microsoft.com/%7EGray/papers/DBOS.pdf"&gt;"Notes on Database Operating Systems" (1979)&lt;/a&gt;. Unfortunately 2PC blocks if the TM (Transaction Manager) fails at the wrong time. Skeen showed in &lt;a href="http://www.cs.cornell.edu/courses/cs614/2004sp/papers/Ske81.pdf"&gt;"NonBlocking Commit Protocols" (1981)&lt;/a&gt; that for distributed transactions you need a 3-phase commit algorithm to avoid the blocking problems associated with 2PC. The problem was coming up with a nice 3PC algorithm; this would take nearly 25 years!&lt;br /&gt;&lt;br /&gt;Fischer, Lynch and Paterson showed that distributed consensus was impossible in an asynchronous system with just one faulty process in &lt;a href="http://theory.lcs.mit.edu/tds/papers/Lynch/jacm85.pdf"&gt;"Impossibility of distributed consensus with one faulty process" (1985)&lt;/a&gt;; this famous result is known as the "FLP" result. By this time "consensus" was the name given to the problem of getting a bunch of processors to agree on a value. In an asynchronous system (where processors run at arbitrary speeds and messages can take an arbitrarily long time to travel between processors) with a perfect network (all messages are delivered, messages arrive in order and cannot be duplicated) distributed consensus is impossible with just one faulty process (even just a fail-stop). 
The kernel of the problem is that you cannot tell the difference between a process that has stopped and one that is running very slowly, making dealing with faults in an asynchronous system almost impossible. The paper was also important because it demonstrated how to show something was impossible: show that all algorithms that solve the problem must have some property, then show that this property is impossible, i.e. proof by contradiction. (This approach was really a rediscovery: Turing used it for the halting problem.)&lt;br /&gt;&lt;br /&gt;By this stage people realized that a distributed algorithm has two properties: safety and liveness. Safety means nothing bad happens, while liveness means that something good eventually happens. 2PC is an asynchronous consensus algorithm: all processes must agree on either commit or abort for a transaction. 2PC is safe: no bad data is ever written to the databases, but its liveness properties aren't great: if the TM fails at the wrong point the system will block.&lt;br /&gt;&lt;br /&gt;Also by this stage people thought of distributed systems as being synchronous (processes run at known rates, and messages are delivered in known bounds of time) or asynchronous (processes run at unknown and arbitrary rates, and messages can take unbounded time to be delivered). The asynchronous case is more general than the synchronous case: an algorithm that works for an asynchronous system will also work for a synchronous system, but not vice versa. You can treat a synchronous system as a special case of an asynchronous system that just happens to have bounds on the time it takes to deliver a message.&lt;br /&gt;&lt;br /&gt;Before FLP, there was &lt;a href="http://research.microsoft.com/users/lamport/pubs/byz.pdf"&gt;"The Byzantine Generals Problem" (1982)&lt;/a&gt; paper. In this form of the consensus problem the processes can lie, and they can actively try to deceive other processes. 
This problem looks harder than the one FLP addresses, but it does have a solution for the synchronous case (though when the Byzantine Generals paper was written the distinction between asynchronous and synchronous systems was not explicit). The solution is expensive in the number of messages exchanged, and the number of rounds of messages required. The problem originally came from the aerospace industry: what would happen if sensors gave false information on a plane (clearly the system could be treated as synchronous).&lt;br /&gt;&lt;br /&gt;In 1986 there was a get-together of the distributed systems people who were interested in consensus and the transaction people. At the time the best consensus algorithm was the Byzantine Generals, but this was too expensive to use for transactions. Jim Gray wrote up a note on the meeting: &lt;a href="http://research.microsoft.com/%7EGray/papers/TandemTR88.6_ComparisonOfByzantineAgreementAndTwoPhaseCommit.pdf"&gt;"A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem." (1987)&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The paper contains this in the introduction :-)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;"Prior to the conference, it was widely believed that the transaction commit problem faced by distributed systems is a degenerate form of the Byzantine Generals Problem studied by academe. Perhaps the most useful consequence of the conference was to show that these two problems have little in common."&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Eventually distributed transactions would be seen as a version of consensus, called uniform consensus (see &lt;a href="http://infoscience.epfl.ch/getfile.py?recid=88273&amp;amp;mode=best"&gt;"Uniform consensus is harder than consensus" (2000)&lt;/a&gt;). With uniform consensus all processes must agree on a value, even the faulty ones - a transaction should only commit if all RMs are prepared to commit. 
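&lt;br /&gt;&lt;br /&gt;As a rough illustration of that commit rule, here is a sketch in Python of the decision a 2PC transaction manager makes. The function name decide and the vote strings are hypothetical; None stands for an RM that has not answered, and a real TM would also log its decision and, in practice, abort after a timeout.

```python
def decide(votes):
    """Return the outcome for a transaction given one vote per RM.

    votes maps an RM name to "prepared", "abort", or None (no reply yet).
    """
    if any(vote == "abort" for vote in votes.values()):
        return "abort"
    if all(vote == "prepared" for vote in votes.values()):
        return "commit"
    # Some RM is silent: it may be slow or dead, so nothing is decided yet.
    return "undecided"


all_ready = decide({"rm1": "prepared", "rm2": "prepared"})
one_refuses = decide({"rm1": "prepared", "rm2": "abort"})
one_silent = decide({"rm1": "prepared", "rm2": None})
```

The "undecided" branch is where the trouble lives: a silent RM cannot be told apart from a slow one, so the outcome hangs on every single participant.&lt;br /&gt;&lt;br /&gt;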
Most forms of consensus are only concerned with having the non-faulty processes agree. Uniform consensus is more difficult than general consensus.&lt;br /&gt;&lt;br /&gt;Eventually Lamport came up with the Paxos consensus algorithm, described in &lt;a href="http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf"&gt;"The Part-Time Parliament" (submitted in 1990, published 1998)&lt;/a&gt;. Unfortunately the analogy with Greek democracy failed badly, with people finding the paper very difficult to understand, and the paper was ignored until its case was taken up by Butler Lampson in &lt;a href="http://research.microsoft.com/lampson/58-Consensus/Acrobat.pdf"&gt;"How to Build a Highly Available System using Consensus" (1996)&lt;/a&gt;. This paper provides a good introduction to building fault tolerant systems and Paxos. Later Lamport would publish &lt;a href="http://research.microsoft.com/users/lamport/pubs/paxos-simple.pdf"&gt;"Paxos Made Simple" (2001)&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The kernel of Paxos is that given a fixed number of processes, any two majorities of them must have at least one process in common. For example, given three processes A, B and C the possible two-process majorities are: AB, AC, or BC. If a decision is made when one majority is present, eg AB, then at any time in the future when another majority is available at least one of the processes can remember what the previous majority decided. If the majority is AB then both processes will remember, if AC is present then A will remember and if BC is present then B will remember.&lt;br /&gt;&lt;br /&gt;Paxos can tolerate lost messages, delayed messages, repeated messages, and messages delivered out of order. It will reach consensus if there is a single leader for long enough that the leader can talk to a majority of processes twice. Any process, including leaders, can fail and restart; in fact all processes can fail at the same time and the algorithm is still safe. 
There can be more than one leader at a time.&lt;br /&gt;&lt;br /&gt;Paxos is an asynchronous algorithm; there are no explicit timeouts. However, it only reaches consensus when the system is behaving in a synchronous way, ie messages are delivered in a bounded period of time; otherwise it remains safe but may not make progress. There is a pathological case where Paxos will not reach consensus, in accordance with FLP, but this scenario is relatively easy to avoid in practice.&lt;br /&gt;&lt;br /&gt;Clearly dividing systems into synchronous and asynchronous is too broad a distinction, and Dwork, Lynch and Stockmeyer defined partially synchronous systems in &lt;a href="http://theory.lcs.mit.edu/tds/papers/Lynch/jacm88.pdf"&gt;"Consensus in the presence of partial synchrony" (1988)&lt;/a&gt;. There are two versions of a partially synchronous system: in one, processes run at speeds within a fixed range and messages are delivered in bounded time, but the actual bounds are not known a priori; in the other version the range of speeds of the processes and the upper bound for message delivery are known a priori, but they will only start holding at some unknown time in the future. The partially synchronous model is a better model for the real world than either the synchronous or asynchronous model; networks function in a predictable way most of the time, but occasionally go crazy.&lt;br /&gt;&lt;br /&gt;Lamport and Gray went on to apply Paxos to the distributed transaction commit problem in &lt;a href="http://research.microsoft.com/research/pubs/view.aspx?tr_id=701"&gt;"Consensus on Transaction Commit" (2005)&lt;/a&gt;. They used Paxos to effectively replicate the TM of 2PC, and used an instance of Paxos for each RM involved in the transaction to agree whether that RM could commit the transaction. On the face of it, using an instance of Paxos per RM looks expensive, but it turns out that it is not. 
Paxos Commit will complete in two phases for the fault-free case, ie it has the same message delay as 2PC, though more messages are exchanged. A third phase is only required if there is a fault, in accordance with the Skeen result. Given 2n+1 TM replicas Paxos Commit will complete with up to n faulty replicas. Paxos Commit does not use Paxos to solve the transaction commit problem directly, ie it is not used to solve uniform consensus; rather, it is used to make the system fault tolerant.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Any argument that distributed transactions should not be used because 2PC is blocking is void, because Paxos Commit addresses the blocking issue.&lt;br /&gt;&lt;br /&gt;Recently there has been some discussion of the CAP conjecture: Consistency, Availability and Partition tolerance. The conjecture asserts that you cannot have all three in a distributed system: a system that is consistent, that can tolerate faulty processes and that can handle a network partition.&lt;br /&gt;&lt;br /&gt;We can examine CAP by equating consistency with consensus. For an asynchronous system we cannot reach consensus with even one faulty process (FLP), so we cannot have consistency and availability for an asynchronous system!&lt;br /&gt;&lt;br /&gt;Now take a Paxos system with three nodes: A, B and C. We can reach consensus if two nodes are working, ie we can have consistency and availability. Now if C becomes partitioned and C is queried, it cannot respond because it cannot communicate with the other nodes; it doesn't know whether it has been partitioned, or if the other two nodes are down, or if the network is being very slow. The other two nodes can carry on, because they can talk to each other and they form a majority. So for the CAP conjecture, Paxos does not handle a partition because C cannot respond to queries. However, we could engineer our way around this. If we are inside a data center we can use two independent networks (Paxos doesn't mind if messages are repeated). 
If we are on the internet, then we could have our client query all nodes A, B and C, and if C is partitioned the client can still query A or B, unless the client is itself partitioned in a similar way to C.&lt;br /&gt;&lt;br /&gt;For a synchronous network, if C is partitioned it can learn that it is partitioned if it does not receive messages in a fixed period of time, and thus can declare itself down to the client.&lt;br /&gt;&lt;br /&gt;Paxos, Paxos Commit and HTTP/REST have been combined to build a highly available co-allocation system for Grid computing, details of which can be found here: &lt;a href="http://www.cct.lsu.edu/%7Emaclaren/HARC/"&gt;HARC&lt;/a&gt;. There are also more references in this paper: &lt;a href="http://www.allhands.org.uk/2006/proceedings/papers/624.pdf"&gt;"Co-Allocation, Fault Tolerance and Grid Computing" (2006)&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-4188776420268450409?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/4188776420268450409/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=4188776420268450409' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/4188776420268450409'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/4188776420268450409'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/06/brief-history-of-consensus-2pc-and.html' title='A brief history of Consensus, 2PC and Transaction Commit.'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' 
height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117279289674883959</id><published>2007-03-01T15:48:00.000-08:00</published><updated>2007-03-01T16:00:52.926-08:00</updated><title type='text'>What are ReferenceParameters for?</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;What are ReferenceParameters for?&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;This mini-rant was inspired by the &lt;a href="http://www.w3.org/2007/01/wos-ec-program.html"&gt;W3C Workshop on Enterprise computing&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;You get ReferenceParameters in WS-Addressing EndpointReferences, also known as EPRs. Kinda like this:&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;&amp;lt;wsa:EndpointReference&amp;gt;&lt;br /&gt;  &amp;lt;wsa:Address&amp;gt;http://www.crapola.com/SoapSink&amp;lt;/wsa:Address&amp;gt;&lt;br /&gt;  &amp;lt;wsa:ReferenceParameters&amp;gt;&lt;br /&gt;      &amp;lt;SessionID&amp;gt;25324523&amp;lt;/SessionID&amp;gt;&lt;br /&gt;      &amp;lt;ResourceID&amp;gt;1434123421&amp;lt;/ResourceID&amp;gt;&lt;br /&gt;      &amp;lt;UserID&amp;gt;mark&amp;lt;/UserID&amp;gt;&lt;br /&gt;      &amp;lt;UserServiceRating&amp;gt;Gold&amp;lt;/UserServiceRating&amp;gt;&lt;br /&gt;  &amp;lt;/wsa:ReferenceParameters&amp;gt;&lt;br /&gt;&amp;lt;/wsa:EndpointReference&amp;gt;&lt;br /&gt;&lt;/tt&gt;&lt;br /&gt;A few things about ReferenceParameters. First, given an EPR you should treat the ReferenceParameters as opaque; you should not reason about them. Second, the ReferenceParameters get stuffed into SOAP Headers, and are sent along to the service that the EPR addresses. Third, the thing in the wsa:Address is an identifier, not a location; you de-reference the identifier to a location before sending the message.&lt;br /&gt;&lt;br /&gt;So what might you do with ReferenceParameters? Well, if you are using WSRF you might use them for a bit of identification.  
Have all the SOAP messages arrive at the endpoint identified by the wsa:Address element, then direct the messages to particular WS-Resources based on the ReferenceParameters; ResourceID in the example.&lt;br /&gt;&lt;br /&gt;Or you might decide to use ReferenceParameters to support sessions. If you want to track a particular client's usage of a service, give the client an EPR with a session identifier, SessionID in the example, embedded in the ReferenceParameters. The session identifier will be carried in the SOAP header each time the client sends a message to the service. You could even carry some information to identify the client; UserID or UserServiceRating in the example.&lt;br /&gt;&lt;br /&gt;All these uses of ReferenceParameters are possible, though the W3C TAG frowns on the first. ReferenceParameters are a bit like HTTP cookies in functionality; cookies can do all the above for HTTP. So, what is the problem?&lt;br /&gt;&lt;br /&gt;Well, given an EPR that has ReferenceParameters you should NEVER share it with anyone else. You cannot know what those ReferenceParameters are for. They could be there for some identification purpose, in which case it would be OK to share them, but you cannot know that for sure. They could actually be for identifying a particular session, or client. Sharing EPRs with ReferenceParameters would be like sharing your HTTP cookies; you simply wouldn't do it. 
Now, imagine a Web where you were not able to share URIs.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117279289674883959?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117279289674883959/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117279289674883959' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117279289674883959'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117279289674883959'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/03/what-are-referenceparameters-for.html' title='What are ReferenceParameters for?'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117223760634829829</id><published>2007-02-23T05:33:00.001-08:00</published><updated>2007-02-23T05:34:31.713-08:00</updated><title type='text'>Beta Defining SOA, or adding to the pollution.</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;Defining SOA, or adding to the pollution.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Someone sent me this link to a movie on &lt;a href="http://www.jisc.ac.uk/media/avfiles/programmes/eframework/eframework_soa_animation.mov"&gt;SOA&lt;/a&gt;. Those gold disks look pretty expensive. And I hope the red bouncing balls don't have to change colour, or shape; it could break the expensive gold disks. 
&lt;br /&gt;&lt;br /&gt;My immediate response to anything about SOA is SnakeOil. This is because I have never found a good definition of SOA that I really liked; something really visceral that you can chew on. Mostly I have found the definitions to have too much hand waving, with high minded terms that I don't really understand. Or else, too obvious to be of any use. &lt;br /&gt;&lt;br /&gt;REST is more tangible, for example, anything that can have an identity can be a resource. OK then, I have this thing, not really sure what this thing is, but at least I can give it a URI. However, REST does have a few glib phrases: "hypermedia as the engine of application state", but they add spice.&lt;br /&gt;&lt;br /&gt;I have been challenged to provide my own definition of SOA. Normally, I am loath to create definitions as it pollutes the world and causes confusion. It is always better to reference an existing definition. Even this concept is RESTful: if you reference an existing definition you get the same network effects as linking; the more people that "link" to that definition the more authoritative it becomes. This doesn't always work though; the definitive definition of REST, Dr. Fielding's thesis, is not the top hit in Google for the term REST. However, in any debate Dr. Fielding always has the last word on what REST means, though he may of course be wrong in what he asserts in his thesis; REST is his word, and his definition is the only correct one. Of course his definition may change over time, but this is just as unhealthy as creating new definitions in terms of confusing people.  &lt;br /&gt;&lt;br /&gt;So what is my definition of SOA? "An architecture for a distributed system based on services" is nice and vapid, though logically sound. The key, then, is to define a service. &lt;br /&gt;&lt;br /&gt;A service interface is defined by the set of messages it can process, and the set of messages it emits. 
The set of messages it can process, MessagesIn, and the set of messages it emits, MessagesOut, can change over time. There may exist a mapping of a subset of MessagesIn to a subset of MessagesOut, such that if the service receives a message from this subset of MessagesIn it will emit a message from the subset of MessagesOut. A service interface description describes a subset of the MessagesIn that a service can process, a subset of MessagesOut that a service can emit and any mapping between the subsets of MessagesIn and MessagesOut that it describes. A service interface description only defines sets of messages at a particular time, so a service may have a series of service interface descriptions. A service interface description is incomplete if it does not include the complete set of MessagesIn and MessagesOut for a service at a particular time. Since it is only possible to communicate using messages, and as messages must take a finite time to travel between sender and receiver, a client can never know if a service interface description is still valid (assuming the service interface description is in the MessagesOut set). A service interface description is faulty if it describes a set of input or output messages that includes messages which are not in MessagesIn or MessagesOut.&lt;br /&gt;&lt;br /&gt;A service may also have internal state, and can be modelled by a state machine. This state machine may or may not be deterministic. The mapping of input to output messages may depend on the internal state of the machine, and the mapping may change over time.   &lt;br /&gt;&lt;br /&gt;Service developers should make the set of messages MessagesIn as large as possible, preferably it should be an infinite set to make interoperability and service description easier. Service developers should constrain the size of MessagesOut to be as small as possible, preferably a set with no members, as it makes interoperability and service description easier. 
(The last two clauses are a form of &lt;a href="http://en.wikipedia.org/wiki/Postel's_Law"&gt;Postel's Law&lt;/a&gt;). Service developers should make the internal state of their service as large as possible, preferably of infinite size because then they will be better than Google, and can charge a lot for advertising. If a service developer achieves this, an infinite set of MessagesIn, a zero size set of MessagesOut with an infinite amount of internal state, then they will have created God.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117223760634829829?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117223760634829829/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117223760634829829' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117223760634829829'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117223760634829829'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/02/beta-defining-soa-or-adding-to.html' title='Beta Defining SOA, or adding to the pollution.'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117199512529188460</id><published>2007-02-20T10:12:00.000-08:00</published><updated>2007-02-20T10:12:05.380-08:00</updated><title type='text'>The WSRF CurrentTime is 5.41pm</title><content type='html'>&lt;a 
href="http://betathoughts.blogspot.com/"&gt;CurrentTime kills caching for WSRF&lt;/a&gt;&lt;br /&gt;We have been fixing a bug in our &lt;a href="http://www.realitygrid.org/publications/wsrfajax.pdf"&gt;AJAX to REST/WSRF stuff&lt;/a&gt;.&lt;br /&gt;The bug was that IE6 did not update the properties properly. &lt;a href="http://blog.harbulot.com/"&gt;Bruno&lt;/a&gt; discovered that IE6 was not invoking the HTTP GET because the URI had not changed. We had not included any cache control HTTP Headers, and I had assumed that would mean "don't cache". The HTTP spec is hard going wrt caching but &lt;a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4"&gt;Section 13.4&lt;/a&gt; states &lt;i&gt;"If there is neither a cache validator nor an explicit expiration time associated with a response, we do not expect it to be cached, but certain caches MAY violate this expectation (for example, when little or no network connectivity is available)"&lt;/i&gt;. The solution was to add HTTP Headers which explicitly said don't cache this. Adding metadata cannot be a bad thing I guess.&lt;br /&gt;&lt;br /&gt;The next issue was should we do caching properly? There are a lot of advantages, especially for the AJAX client, as we could support conditional GETs. I had a vague idea how to do it too, put in a hook for the developer to attach his caching policy/code. But then I remembered CurrentTime; WSRF defines a ResourceProperty, CurrentTime, so that a client can find out the time when the WS-Resource had a certain set of properties. So one of our properties is always changing; so we cannot cache!&lt;br /&gt;&lt;br /&gt;CurrentTime is a bad name, because almost by definition it is not the current time -- by the time the client reads it, it's some time later, see &lt;a href="http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf"&gt;Lamport's clocks&lt;/a&gt;. HTTP uses an HTTP Header, Date, to convey the same concept, and the concept really is the time at which the message was created. 
It's metadata about the message; it should not really be part of the message.&lt;br /&gt;&lt;br /&gt;The good thing is I do not have to write the hooks for supporting caching; the bad thing is that we cannot do caching.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117199512529188460?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117199512529188460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117199512529188460' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117199512529188460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117199512529188460'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/02/wsrf-currenttime-is-541pm.html' title='The WSRF CurrentTime is 5.41pm'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117147300386939266</id><published>2007-02-14T09:10:00.000-08:00</published><updated>2007-02-14T09:10:03.873-08:00</updated><title type='text'>A Solution to a Distributed Systems Problem</title><content type='html'>The solution to this &lt;a href="http://betathoughts.blogspot.com/2007/02/distributed-systems-puzzle.html"&gt;puzzle&lt;/a&gt; is:&lt;br /&gt;&lt;br /&gt;  Any protocol that solves this problem is equivalent to one&lt;br /&gt;in which there are rounds of message exchanges: first A (say)&lt;br 
/&gt;sends to B, next B sends to A, then A sends to B, and so on. We&lt;br /&gt;show that in assuming the existence of a protocol to solve the&lt;br /&gt;problem, we are able to derive a contradiction. This &lt;br /&gt;establishes that no such protocol exists.&lt;br /&gt;&lt;br /&gt; Select the protocol that solves the problem using the fewest&lt;br /&gt;rounds. By assumption, such a protocol must exist and, by&lt;br /&gt;construction, no protocol solving the problem using fewer rounds&lt;br /&gt;exists. Without loss of generality suppose that m, the last &lt;br /&gt;message sent by either process, is sent by A.&lt;br /&gt;&lt;br /&gt; Observe that the action ultimately taken by A cannot depend on&lt;br /&gt;whether m is received by B, because its receipt could never be&lt;br /&gt;learned by A (since it is the last message). Thus, A's choice of &lt;br /&gt;action alpha or beta does not depend on m. Next, observe that the &lt;br /&gt;action ultimately taken by B cannot depend on whether m is &lt;br /&gt;received by B, because B must make the same choice of action alpha &lt;br /&gt;or beta even if m is lost (due to channel failure).&lt;br /&gt;&lt;br /&gt;  Having argued that the action chosen by A and B does not depend on&lt;br /&gt;m, we conclude that m is superfluous. Thus, we can construct a new &lt;br /&gt;protocol in which one fewer message is sent. 
However, the existence &lt;br /&gt;of such a shorter protocol contradicts the assumption that our &lt;br /&gt;original protocol used the fewest number of rounds.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117147300386939266?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117147300386939266/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117147300386939266' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117147300386939266'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117147300386939266'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/02/solution-to-distributed-systems.html' title='A Solution to a Distributed Systems Problem'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117147265935982917</id><published>2007-02-14T09:04:00.000-08:00</published><updated>2007-02-14T09:04:19.466-08:00</updated><title type='text'>A Distributed Systems Puzzle...</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;A Distributed Systems Puzzle&lt;/a&gt;&lt;br /&gt;I like this little puzzle taken from &lt;a href="http://www.amazon.com/Distributed-Systems-2nd-Sape-Mullender/dp/0201624273"&gt;Distributed Systems&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;Two processes, A and B, communicate by sending and&lt;br 
/&gt;receiving messages on a bidirectional channel. Neither&lt;br /&gt;process can fail. However, the channel can experience&lt;br /&gt;transient failures, resulting in loss of a subset&lt;br /&gt;of the messages that have been sent. Devise a protocol&lt;br /&gt;where either of two actions alpha or beta are possible,&lt;br /&gt;but (i) both processes take the same action and (ii)&lt;br /&gt;neither takes both actions.&lt;br /&gt;&lt;br /&gt;Can you show that there is no solution?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117147265935982917?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117147265935982917/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117147265935982917' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117147265935982917'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117147265935982917'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/02/distributed-systems-puzzle.html' title='A Distributed Systems Puzzle...'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117106307438066873</id><published>2007-02-09T15:17:00.000-08:00</published><updated>2007-02-09T16:11:55.576-08:00</updated><title type='text'>Google does the Impossible!</title><content type='html'>&lt;a 
href="http://betathoughts.blogspot.com/"&gt;Google does the Impossible&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I often wondered whether Google would hire me; until now I had doubts. Google only hire people who are &lt;a href="http://googleresearch.blogspot.com/2006/03/hiring-lake-wobegonstrategy.html"&gt;smarter than the average Googler&lt;/a&gt;, and given my lack of a CS background I would feel at a further disadvantage given a question like &lt;a href="http://www.theregister.co.uk/2007/01/05/google_interview_tales/"&gt;"What is the most efficient way to sort a million integers?"&lt;/a&gt;; I might blurt out "With a computer". With this kind of attitude you might start to think they are a bit arrogant.&lt;br /&gt;&lt;br /&gt;However this paper on the &lt;a href="http://labs.google.com/papers/chubby.html"&gt;Chubby Lock Service&lt;/a&gt; gives me hope! I was interested in it because it uses the Paxos algorithm, and I am pretty interested in Paxos from our work on &lt;a href="http://www.allhands.org.uk/2006/proceedings/papers/624.pdf"&gt;HARC&lt;/a&gt;. This section in the Chubby paper introduction caught my eye.&lt;br /&gt;&lt;it&gt;&lt;br /&gt;&lt;/it&gt;&lt;i&gt;&lt;it&gt;Readers familiar with distributed computing will recognize the election of a primary among peers as an instance of the distributed consensus problem, and realize we require a solution using asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered. (Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.) Asynchronous consensus is solved by the Paxos protocol [12, 13]. The same protocol was used by Oki and Liskov (see their paper on viewstamped replication [19]), an equivalence noted by others [14]. Indeed, all working protocols for asynchronous consensus we have so far encountered have Paxos at their core. 
Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5].&lt;/it&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Let's start with the first sentence, &lt;i&gt;&lt;it&gt;"asynchronous communication; this term describes the behavior of the vast majority of real networks, such as Ethernet or the Internet, which allow packets to be lost, delayed, and reordered."&lt;/it&gt;.&lt;/i&gt; This definition of asynchronous communications, in the context of a discussion on Consensus, is wrong. Let's grab the Fischer et al. paper for our reference. This paper, famously known by the initials of the authors' names as FLP, is one of the most important in distributed systems literature; we will see why it is so important in a bit. The full title of the paper is &lt;a href="http://theory.lcs.mit.edu/tds/papers/Lynch/pods83-flp.pdf"&gt;"Impossibility of Distributed Consensus with One Faulty Process"&lt;/a&gt;, pretty arresting stuff! From the paper:&lt;br /&gt;&lt;it&gt;&lt;br /&gt;&lt;/it&gt;&lt;i&gt;&lt;it&gt;"In this paper, we show the surprising result that no completely asynchronous consensus protocol can tolerate even a single unannounced process death. We do not consider Byzantine failures, and we assume that the message system is reliable - it delivers all messages correctly and exactly once. Nevertheless, even with these assumptions, the stopping of a single process at an inopportune time can cause any distributed commit protocol to fail to reach agreement."&lt;/it&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;The paper goes on to describe an asynchronous system:&lt;br /&gt;&lt;it&gt;&lt;br /&gt;&lt;/it&gt;&lt;i&gt;&lt;it&gt;"Crucial to our proof is that processing is completely asynchronous; that is, we make no assumptions about the relative speeds of processes or about the delay time in delivering a message. 
We also assume that processes do not have access to synchronized clocks, so algorithms based on time-outs, for example, cannot be used. (In particular, the solutions in [6] are not applicable.) Finally, we do not postulate the ability to detect the death of a process, so it is impossible for one process to tell whether another has died (stopped entirely) or is just running very slowly."&lt;/it&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;So "asynchronous communication" actually means that there is no upper bound on how long a message can take to be delivered. You &lt;b&gt;can&lt;/b&gt; have asynchronous communications and a perfect network: messages are always delivered, they are delivered in order and delivered only once, but delivery can take arbitrarily long. This is the FLP model. (For previous rants on asynchronous, synchronous and partially synchronous see this blog entry: &lt;a href="http://betathoughts.blogspot.com/2007/01/asynchronous.html"&gt;Asynchronous?&lt;/a&gt;.) Of course, in synchronous communications you can have message losses, message delays and message reordering, but they are easier to deal with in this environment: if I don't get a message in the next 30 seconds I know there has been a fault.&lt;br /&gt;&lt;br /&gt;The FLP paper is important for two reasons: it says something fundamental about dealing with faults in distributed systems, and it demonstrated how to show that some problems in distributed computing are impossible to solve. Dealing with faults in an asynchronous system is very difficult, because you cannot tell whether a processor is very slow or has failed. You need to use time, or fault detectors. To show that a problem is impossible to solve, FLP uses proof by contradiction: show what properties an algorithm that solves the problem must have, and then show that there is some contradiction in the properties. 
After FLP, people had lots of fun showing stuff was impossible.&lt;br /&gt;&lt;br /&gt;The next phrase in the Chubby paper is: &lt;i&gt;&lt;it&gt;"(Practitioners should normally beware of protocols based on models that make stronger assumptions on the environment.)"&lt;/it&gt;&lt;/i&gt;&lt;it&gt; &lt;it&gt;This seems a bit rich! With respect to timing models, you can choose between synchronous, asynchronous or partially synchronous; then you can choose the fault model, for example Byzantine or non-Byzantine. As we will see, the Chubby system actually does make stronger assumptions about the timing model: it depends on the system being partially synchronous.&lt;br /&gt;&lt;br /&gt;The next line in the Chubby paper is: &lt;i&gt;&lt;it&gt;"Asynchronous consensus is solved by the Paxos protocol [12, 13]."&lt;/it&gt;&lt;/i&gt; The wording is a bit loose here: there are various versions of the consensus problem with different fault models and requirements, for example &lt;a href="http://linkinghub.elsevier.com/retrieve/pii/S0196677403001652"&gt;Uniform Consensus &lt;/a&gt;or even the &lt;a href="http://research.microsoft.com/users/lamport/pubs/pubs.html#byz"&gt;Byzantine Generals&lt;/a&gt;. However, most people would take this statement to mean the consensus problem as defined by FLP. But FLP showed that consensus in an asynchronous system is impossible with only one faulty process! (Consensus is straightforward without faults, so it is not really considered a problem.)&lt;br /&gt;&lt;br /&gt;Now skipping to the last sentence from the Chubby extract: &lt;it&gt;"&lt;/it&gt;&lt;i&gt;&lt;it&gt;Paxos maintains safety without timing assumptions, but clocks must be introduced to ensure liveness; this overcomes the impossibility result of Fischer et al. [5]."&lt;/it&gt;&lt;/i&gt; The logic of the last part implies that Fischer et al. are wrong, or else Google have achieved the impossible! 
However, if we wind back a bit, they introduce clocks, and once you do that the system is no longer completely asynchronous. They don't overcome the impossibility result of Fischer et al.; they are using a different model which isn't completely asynchronous.&lt;br /&gt;&lt;br /&gt;So why all the confusion? &lt;a href="http://research.microsoft.com/users/lamport/pubs/pubs.html#PaxosGST"&gt;Lamport clarifies it as follows&lt;/a&gt;: &lt;i&gt;&lt;it&gt;"Asynchronous consensus algorithms like Paxos maintain safety despite asynchrony, but are guaranteed to make progress only when the system becomes synchronous - meaning that messages are delivered in a bounded length of time."&lt;/it&gt;&lt;/i&gt;&lt;it&gt; &lt;/it&gt;Paxos is an asynchronous algorithm because its safety doesn't depend on time, but the timing of events in the system must conspire for it to terminate.&lt;br /&gt;&lt;br /&gt;Butler Lampson summarized the timing requirements nicely in &lt;a href="http://research.microsoft.com/Lampson/58-Consensus/Acrobat.pdf"&gt;"How to build a highly available system using consensus."&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;it&gt;&lt;i&gt;"It [Paxos] terminates if there is a single leader for a long enough time during which the leader can talk to a majority of the agent processes twice. It may not terminate if there are always too many leaders (fortunate, since we know that guaranteed termination is impossible)."&lt;/i&gt; &lt;/it&gt;The last part is FLP.&lt;br /&gt;&lt;br /&gt;There is a pathological case in Paxos where two leaders compete to get a value chosen and you effectively end up with livelock; however, it is not difficult in practice to avoid this. &lt;a href="http://research.microsoft.com/Lampson/58-Consensus/Acrobat.pdf"&gt;Lampson &lt;/a&gt;describes how to use a &lt;it&gt;sloppy&lt;/it&gt; leadership election to try to make sure there is only one leader; it doesn't matter if the sloppy leadership election screws up because we know Paxos is safe with multiple leaders. 
The pathological case corresponds to the FLP result.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://theory.lcs.mit.edu/tds/papers/Lynch/jacm88.pdf"&gt;"Consensus in the Presence of Partial Synchrony"&lt;/a&gt; provides details on the conditions that are required in order to reach consensus.&lt;br /&gt;&lt;br /&gt;I have only a practical interest in Paxos, in order to solve the problems addressed by HARC. I am not sure I fully understand the proofs behind it, or FLP, but the main ideas behind them are important to get right. That is why I am so disappointed by the Chubby paper: more people will read it than FLP or the Paxos papers, and it will only cause confusion. As if &lt;a href="http://research.microsoft.com/users/lamport/pubs/lamport-paxos.pdf"&gt;"The Part-Time Parliament"&lt;/a&gt; doesn't do enough of that!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117106307438066873?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117106307438066873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117106307438066873' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117106307438066873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117106307438066873'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/02/google-does-impossible.html' title='Google does the Impossible!'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-117086170057831571</id><published>2007-02-07T07:21:00.000-08:00</published><updated>2007-02-07T07:30:46.346-08:00</updated><title type='text'>Perfect Code</title><content type='html'>&lt;a href="http://www.helpfindjim.com/"&gt;Jim Gray's&lt;/a&gt; disappearance has led me to reflect on the things he has written that have stayed with me. One of them is perfect code. In &lt;a href="http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902"&gt;Transaction Processing: Concepts and Techniques&lt;/a&gt; Jim talks about perfect code: code without bugs. He even gives an example, a simple function of a few lines that adds two numbers together. In fact it took him a couple of attempts to get it right! Jim makes the point that perfect code is possible, just very expensive. He goes on to talk about the cost of code in terms of dollars per line. NASA pay the most, but their code still has bugs! The astronauts board the shuttle with a bug list, an example of which was not to use the two keyboards on the shuttle at the same time because the inputs would be OR'd! Perfect code is possible, just really expensive.&lt;br /&gt;&lt;br /&gt;Most people are surprised when I say that perfect code is possible; they just assume that all code contains bugs. I was reminded of this while following the &lt;a href="http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html"&gt;Nearly All Binary Searches and Mergesorts are Broken&lt;/a&gt; thread on binary search implementation bugs. At the time I hit the library to check Knuth's version of binary search; he didn't let me down. 
Someone who invents their own ISA to express their algorithms isn't going to get caught by an overflow.&lt;br /&gt;&lt;br /&gt;I am sure the attitude that all code contains bugs must have a detrimental effect on programmers; it has on me. It is also reinforced by the open source mantra "release early, release often". Testing doesn't solve all the problems either: as Dijkstra observed, testing can show the presence of bugs, but never their absence.&lt;br /&gt;&lt;br /&gt;I recently re-read the &lt;a href="http://www.mcjones.org/System_R/"&gt;History of System R&lt;/a&gt;. The name Franco Putzolu cropped up; Jim mentioned him in the acknowledgements of "Transaction Processing: Concepts and Techniques" as someone who had made huge contributions to the field of transactions and databases but who never got the public recognition, as his name was not on a lot of the famous papers.&lt;br /&gt;From the System R history (Copyright attached):&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;Mike Blasgen: The RSS didn't have any bugs. [laughter] No, it's true, the reason is because much of the RSS was written by Franco. No, it's really true; Franco never wrote a bug. Except for one, right, Bruce? Did you find one?&lt;br /&gt;&lt;br /&gt;Bruce Lindsay: One.&lt;br /&gt;&lt;br /&gt;Mike Blasgen: He wrote about half of RSS, and I think we found one bug. And that was after nine years.&lt;br /&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;RSS was the storage part of System R and used B-Trees. It was 35,000 lines of code to solve some pretty hairy problems, and had only one bug! Remember too that the tool support back then must have been terrible compared to today. Pretty perfect then.&lt;br /&gt; &lt;br /&gt;&lt;p&gt;&lt;br /&gt;Copyright (c) 1995, 1997 by Paul McJones, Roger Bamford, Mike Blasgen, Don Chamberlin, Josephine Cheng, Jean-Jacques Daudenarde, Shel Finkelstein, Jim Gray, Bob Jolls, Bruce Lindsay, Raymond Lorie, Jim Mehl, Roger Miller, C. 
Mohan, John Nauman, Mike Pong, Tom Price, Franco Putzolu, Mario Schkolnick, Bob Selinger, Pat Selinger, Don Slutz, Irv Traiger, Brad Wade, and Bob Yost. You may copy this document in whole or in part without payment of fee provided that you acknowledge the authors and include this notice.&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-117086170057831571?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/117086170057831571/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=117086170057831571' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117086170057831571'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/117086170057831571'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/02/perfect-code.html' title='Perfect Code'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116922809598492426</id><published>2007-01-19T09:34:00.000-08:00</published><updated>2007-01-19T09:40:56.316-08:00</updated><title type='text'>Idempotent Messages</title><content type='html'>There has been quite a bit of internal discussion of &lt;a href="http://betathoughts.blogspot.com/2007/01/pat-helland-discovers-rest-hard-way.html"&gt;Pat Helland's paper&lt;/a&gt; at Mancs. 
Although Pat sees the usefulness of idempotent messages, I wonder if he has considered the fact that though a message may be idempotent, a sequence of idempotent messages may not be idempotent. Lamport thought about this in his paper on &lt;a href="http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&amp;id=884"&gt;Generalized Consensus and Paxos&lt;/a&gt;; he also considered whether messages could commute before bundling them. An HTTP GET will commute with another HTTP GET, but I am not sure about the other methods.&lt;br /&gt;&lt;br /&gt;Below is a fragment of the discussion we are having on idempotent and at-least-once messaging...&lt;br /&gt;&lt;br /&gt;Say a state machine has a possible sequence of state changes&lt;br /&gt;A-&amp;gt;B-&amp;gt;C and once it reaches C it cannot change state, and A is the&lt;br /&gt;initial state. The state machine accepts two types of message&lt;br /&gt;&amp;lt;A-&amp;gt;B&amp;gt; and &amp;lt;B-&amp;gt;C&amp;gt;. If the state machine is in state B, it can&lt;br /&gt;ignore messages &amp;lt;A-&amp;gt;B&amp;gt; because it inherently knows it has processed such a message before. Therefore it doesn't need to store message IDs to cope with at-least-once delivery.&lt;br /&gt;&lt;br /&gt;In the above system the two messages are inherently idempotent,&lt;br /&gt;as designed. Out of order delivery is interesting: if the&lt;br /&gt;sequence &amp;lt;(A-&amp;gt;B),(B-&amp;gt;C)&amp;gt; is delivered out of order then &amp;lt;B-&amp;gt;C&amp;gt; arrives first&lt;br /&gt;and is ignored, and the machine stops in state B until &amp;lt;B-&amp;gt;C&amp;gt; is retried. Nothing bad happens?&lt;br /&gt;&lt;br /&gt;If we replaced the two messages with a single &amp;lt;ChangeState&amp;gt; message, then it is not idempotent, and we would need to include message IDs&lt;br /&gt;to cope with at-least-once delivery. 
Out of order delivery MIGHT now&lt;br /&gt;cause unwanted side effects, depending on the system and what clients expect.&lt;br /&gt;&lt;br /&gt;The second approach to messaging appears simpler because there is&lt;br /&gt;only one message no matter how many states the state machine has;&lt;br /&gt;the first case has N-1 messages, where N is the number of states.&lt;br /&gt;&lt;br /&gt;Clearly message design has a big impact on the behaviour of the system!!!&lt;br /&gt;&lt;br /&gt;For an application to "know" that it has processed a message, it&lt;br /&gt;must be in a state that it can only have reached by processing that&lt;br /&gt;message. There seem to be two approaches to achieve this: design your system to use idempotent messages, or design it so that progress&lt;br /&gt;from one state to another makes messages idempotent. An interesting question is whether these two approaches are equivalent. If you&lt;br /&gt;used both approaches, would you come up with the same design?&lt;br /&gt;&lt;br /&gt;The messages &amp;lt;A-&amp;gt;B&amp;gt; and &amp;lt;B-&amp;gt;C&amp;gt; map to PUT(B) and PUT(C) in HTTP, while &amp;lt;ChangeState&amp;gt; maps to POST(ChangeState).&lt;br /&gt;&lt;br /&gt;In a programming language the operation might be implemented as&lt;br /&gt;stateMachine++. stateMachine++ is interesting because it looks like &amp;lt;ChangeState&amp;gt;,&lt;br /&gt;but the underlying system (CPU and memory) views it differently:&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;retrieve value stored at address "stateMachine",&lt;br /&gt;increment value,&lt;br /&gt;put new value back to address "stateMachine"&lt;br /&gt;&lt;/tt&gt;&lt;br /&gt;&lt;br /&gt;However, latency is zero, or at least is masked to appear as zero.&lt;br /&gt;Partial failures are not possible. 
In a multi-threaded environment&lt;br /&gt;locks would have to be introduced, but without partial failures this&lt;br /&gt;isn't a problem (a processor holding a lock cannot die without the&lt;br /&gt;rest of the system noticing, or failing).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116922809598492426?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116922809598492426/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116922809598492426' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116922809598492426'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116922809598492426'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/01/idempotent-messages.html' title='Idempotent Messages'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116921339583114288</id><published>2007-01-19T05:29:00.000-08:00</published><updated>2007-01-19T05:36:35.113-08:00</updated><title type='text'>Asynchronous?</title><content type='html'>&lt;div align="justify"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div align="justify"&gt;As I do not have a CS background I have always been confused&lt;br /&gt;when people use the term asynchronous. 
This entry has been&lt;br /&gt;prompted by Dave Orchard's description of &lt;a href="http://www.pacificspirit.com/blog/2007/01/11/soa_principles"&gt;SOA&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;HTTP is a network protocol. HTTP is asynchronous. It uses a&lt;br /&gt;Request-Response pattern: a client sends a request and the server&lt;br /&gt;sends a response. However, there is nothing in the protocol about&lt;br /&gt;how long the response will take; therefore it is an asynchronous&lt;br /&gt;protocol. TCP is a synchronous protocol: there are various timeouts&lt;br /&gt;specified in the standard, and agents are supposed to react in&lt;br /&gt;specified ways to these timeouts.&lt;br /&gt;&lt;br /&gt;However, most HTTP libraries use synchronous function calls: a client&lt;br /&gt;makes a call to a function in the HTTP library, the library sends a&lt;br /&gt;request to an HTTP server and returns the response message to the&lt;br /&gt;client using the return mechanism of the function. Most HTTP&lt;br /&gt;libraries allow the client to pass a timeout value to the library;&lt;br /&gt;if the server does not respond within the timeout then the library&lt;br /&gt;reports an error to the client. The timeout is specified by the client,&lt;br /&gt;not by the HTTP standard.&lt;br /&gt;&lt;br /&gt;Some HTTP libraries support asynchronous function calls: for example,&lt;br /&gt;the client makes a function call to the HTTP library, the call does&lt;br /&gt;not block but returns immediately, and the client can do some other processing&lt;br /&gt;before making another function call to pick up the HTTP response.&lt;br /&gt;&lt;a href="http://search.cpan.org/~marclang/ParallelUserAgent-2.57/lib/LWP/Parallel.pm"&gt;LWP::Parallel&lt;/a&gt; is an example of such a library. 
There are also terms like&lt;br /&gt;blocking/non-blocking etc., which can be used to describe function calls.&lt;br /&gt;&lt;br /&gt;So there are two concepts of asynchronous/synchronous at play here: one&lt;br /&gt;that is used by people like &lt;a href="http://research.microsoft.com/users/lamport/"&gt;Lamport&lt;/a&gt; and &lt;a href="http://theory.lcs.mit.edu/~lynch/"&gt;Lynch&lt;/a&gt; to describe distributed systems&lt;br /&gt;and one used by people to talk about programming models. As I don't have a&lt;br /&gt;background in CS this confused me for a long time; I was never sure how people&lt;br /&gt;were using the term. The issue is further complicated by the fact that if you&lt;br /&gt;are building an asynchronous system, per Lamport and Lynch, then an&lt;br /&gt;asynchronous programming model is better suited to the task. &lt;br /&gt;&lt;br /&gt;So reading Dave Orchard's blog entry on &lt;a href="http://www.pacificspirit.com/blog/2007/01/11/soa_principles"&gt;SOA &lt;/a&gt;I am wondering what he means&lt;br /&gt;by advocating "asynchronous". He is describing an approach to&lt;br /&gt;distributed computing called SOA, so I guess he must be talking about&lt;br /&gt;the Lynch/Lamport definition of asynchronous. This has consequences,&lt;br /&gt;namely that you won't be able to do very much.&lt;br /&gt;&lt;br /&gt;In an asynchronous system it is impossible to reach consensus with just one&lt;br /&gt;faulty processor, even with a perfect network. Consensus is the problem of&lt;br /&gt;getting a set of processors to agree on a value. Consider the case of submitting&lt;br /&gt;a purchase order in an asynchronous system: you send the request but the&lt;br /&gt;request can take arbitrarily long to reach the server, the server can take&lt;br /&gt;arbitrarily long to process the message and the response can take arbitrarily&lt;br /&gt;long to return. How can you tell if the server failed? 
On the Web this is&lt;br /&gt;equivalent to hitting the submit button and nothing happening - what do you&lt;br /&gt;do? Wait a bit longer, wait for an e-mail, re-try the submit button,&lt;br /&gt;ring/e-mail the server administrator, etc.; in each case some internal clock triggers an action.&lt;br /&gt;In an asynchronous system you just wait; if you include timeouts then it is&lt;br /&gt;no longer asynchronous.&lt;br /&gt;&lt;br /&gt;There are three timing models in distributed systems: synchronous,&lt;br /&gt;partially-synchronous, and asynchronous. For definitions see&lt;br /&gt;&lt;a href="http://theory.lcs.mit.edu/tds/papers/Lynch/jacm88.pdf"&gt;&lt;br /&gt;Consensus in the Presence of Partial Synchrony&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;br /&gt;&lt;b&gt;synchronous:&lt;/b&gt; In a synchronous system, there is a known fixed upper&lt;br /&gt;bound on the time required for a message to be sent from one processor&lt;br /&gt;to another and a known fixed upper bound on the relative speeds of&lt;br /&gt;different processors.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;br /&gt;&lt;b&gt;asynchronous:&lt;/b&gt; In an asynchronous system no fixed upper bounds exist&lt;br /&gt;on the time required for a message to be sent from one processor&lt;br /&gt;to another, nor on the relative speeds of different processors.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;br /&gt;&lt;b&gt;partially-synchronous:&lt;/b&gt; Fixed bounds exist but are not&lt;br /&gt;known a priori, or in another version the fixed bounds are known&lt;br /&gt;but are only guaranteed to hold starting at some unknown time.&lt;br /&gt;&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;br /&gt;Designing algorithms for asynchronous systems is attractive because&lt;br /&gt;they will also work in any synchronous or partially-synchronous system.&lt;br /&gt;So using an asynchronous model for the internet makes sense, but you&lt;br /&gt;are restricted in what you can do. 
It is interesting to note that&lt;br /&gt;some Web Service specifications are synchronous. For example, WS-Security&lt;br /&gt;uses time ranges to define how long a message is valid for;&lt;br /&gt;messages that arrive outside this range are not processed. This causes&lt;br /&gt;problems when systems have a large clock skew; I have seen many weird&lt;br /&gt;bugs arise due to this, including messages arriving before they were sent!&lt;br /&gt;&lt;br /&gt;So choosing an asynchronous timing model for your distributed system&lt;br /&gt;may not be so attractive after all. Perhaps Dave meant an asynchronous&lt;br /&gt;programming model? At first this doesn't make sense: he is talking&lt;br /&gt;about an architecture for a distributed system, and how you write code&lt;br /&gt;for such a system should be an orthogonal issue. However, looking through&lt;br /&gt;the discussions of SOA I can see a pattern.&lt;br /&gt;&lt;br /&gt;Remote Procedure Call, RPC, as described in &lt;a href="http://www.faqs.org/rfcs/rfc707.html"&gt;IETF RFC 707&lt;/a&gt;, introduced an&lt;br /&gt;abstraction which allowed programmers to think that invoking a remote&lt;br /&gt;service was the same as invoking a local function call. The abstraction&lt;br /&gt;was extended by the concept of distributed objects: the programmer could&lt;br /&gt;think of a remote object as being just like a local object. The idea&lt;br /&gt;behind this thinking is that programmers are familiar with procedure&lt;br /&gt;calls and objects, and so RPC and distributed objects will be an easy&lt;br /&gt;path into distributed computing for them.&lt;br /&gt;However, this abstraction was shown to be&lt;br /&gt;a bad one, perhaps most famously by &lt;a href="http://research.sun.com/techrep/1994/abstract-29.html"&gt;Waldo et al&lt;/a&gt;. 
By pretending remote&lt;br /&gt;things are just like local things, the programmer is led to ignore&lt;br /&gt;latency and partial failures. (Though it is hard to imagine people like&lt;br /&gt;&lt;a href="http://research.microsoft.com/~Gray/"&gt;Jim Gray&lt;/a&gt; and &lt;a href="http://research.microsoft.com/Lampson/"&gt;Butler Lampson&lt;/a&gt; would fall into this trap.)&lt;br /&gt;&lt;br /&gt;There has been a lot of debate in the past over issues like SOA != RPC&lt;br /&gt;and WS != Distributed Objects. However, I think there are two different&lt;br /&gt;issues: one is the architecture of a distributed system, the other is&lt;br /&gt;the programming model. I can take a good architecture, for example REST,&lt;br /&gt;and use a bad programming model, for example RPC, to build a client&lt;br /&gt;library.&lt;br /&gt;&lt;br /&gt;A Web services toolkit can attempt to hide the complexity associated&lt;br /&gt;with distributed systems, and it may initially be easy and familiar to use,&lt;br /&gt;but ultimately it will betray the programmer by misleading him just like RPC.&lt;br /&gt;Or you can design a better programming model, which does not mislead the&lt;br /&gt;programmer but would probably be a more difficult programming environment.&lt;br /&gt;A naive programmer might think the first is better because it is easier&lt;br /&gt;to use. There is a balance to be struck.&lt;br /&gt;&lt;br /&gt;An asynchronous programming style forces the programmer to&lt;br /&gt;think about the system more carefully as a distributed system;&lt;br /&gt;perhaps this is why Dave is advocating asynchronous. However, to&lt;br /&gt;me, someone who uses raw sockets a lot, it muddies the discussion&lt;br /&gt;on architectures of distributed systems. 
&lt;br /&gt;&lt;br /&gt;REST doesn't prescribe a programming model; it is focused purely on&lt;br /&gt;the architecture of the system. You can write bad HTTP clients and&lt;br /&gt;services, or you can write good ones like &lt;a href="http://search.cpan.org/perldoc?LWP"&gt;LWP&lt;/a&gt;. SOA/WS-* has had its feet&lt;br /&gt;stuck in the debate about programming models, which has caused&lt;br /&gt;me confusion.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116921339583114288?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116921339583114288/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116921339583114288' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116921339583114288'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116921339583114288'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/01/asynchronous.html' title='Asynchronous?'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116827810112626328</id><published>2007-01-08T08:57:00.000-08:00</published><updated>2007-01-08T09:41:41.170-08:00</updated><title type='text'>Pat Helland Discovers REST - the hard way!</title><content type='html'>Pat Helland has an interesting paper on scalability and transactions called: &lt;A 
HREF="http://www-db.cs.wisc.edu/cidr/cidr2007/papers/P15.pdf"&gt; Life beyond Distributed Transactions: an Apostate’s Opinion&lt;/A&gt;. First, it raises questions about our use of Paxos Commit to solve the co-allocation problem in HARC. However, we don't expect co-allocation to ever involve more than a few resources, we do not expect a heavy demand on the service, and we are quite loose about serializability. &lt;br /&gt;&lt;br /&gt;The more interesting aspect of the paper is the similarity between the ideas in it and the concepts in REST. The paper discusses "infinite scalability", yet never mentions the Web, which must be the best example of infinite scalability in a distributed application (Fielding uses the term anarchic scalability, which I prefer as it sounds even more daunting). I guess this reflects Pat's background as a transaction guru; he is interested in scaling out the systems he knows.&lt;br /&gt;&lt;br /&gt;The paper introduces the concept of "entities" which are identified by keys and to which messages are sent. This maps to resources and URIs; the issues of primary keys and secondary keys with respect to scalability can be handled more elegantly and transparently using DNS trickery and re-directs. The paper also talks about the usefulness of recognising that some messages are idempotent, for example reads - hello GET and PUT! He misses the usefulness of caching though; I am sure he would like it.&lt;br /&gt;&lt;br /&gt;Also, the separation of the application into two layers, with only the lower layer needing to be scale aware, maps to the Web. The developer writes the scale agnostic layer that interfaces with the scale aware layer through the "Scale Agnostic Programming Abstraction".&lt;br /&gt;This means he does not have to worry about scaling issues and can leave them to the developer of the scale aware layer. 
On the Web, a Web developer can use a Yahoo API (the Scale Agnostic Programming Abstraction, which is of course RESTful) to write applications which have no idea about the scale of Yahoo or how the internals of Yahoo work.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116827810112626328?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116827810112626328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116827810112626328' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116827810112626328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116827810112626328'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2007/01/pat-helland-discovers-rest-hard-way.html' title='Pat Helland Discovers REST - the hard way!'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116559474315530179</id><published>2006-12-08T08:19:00.000-08:00</published><updated>2006-12-08T08:19:04.626-08:00</updated><title type='text'>SAML and URNs</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;SAML&lt;/a&gt;&lt;br /&gt;I am reading through the &lt;A HREF="http://www.oasis-open.org/committees/security/"&gt;SAML&lt;/A&gt; specifications in an effort to understand &lt;A HREF="http://shibboleth.internet2.edu/"&gt;Shibboleth&lt;/A&gt;. 
&lt;A HREF="http://openid.net/"&gt;OpenID&lt;/A&gt; seems to be a competing technology to solve the &lt;A HREF="http://en.wikipedia.org/wiki/Single_sign_on"&gt;single sign on (SSO)&lt;/A&gt; problem. OpenID is a grassroots effort, while Shibboleth is industry led, so it will be interesting to see who wins this one. Shibboleth offers more functionality by going beyond just supporting identity: it can also provide attributes about the user that can be used by a service for making authorization decisions. The cost of this is complexity and performance: lots of XML that has to be signed and encrypted.&lt;br /&gt;&lt;br /&gt;One interesting thing in the SAML specs is that they state that Web proxies should not cache certain messages. Is it worthwhile saying this? Can you really trust proxies to do what you ask them? (Byzantine faults) What are the consequences if they do cache them?&lt;br /&gt;&lt;br /&gt;The real point of this post is this URI from SAML 2.0:      &lt;br /&gt;&lt;br /&gt;   urn:oasis:names:tc:SAML:2.0:status:Success&lt;br /&gt;&lt;br /&gt;Tried clicking on it? &lt;A HREF="http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.html"&gt;First it should be a http URI&lt;/A&gt;&lt;br /&gt;&lt;br /&gt;Second I am concerned about the 2.0. Has success been redefined since SAML 1.0? &lt;br /&gt;&lt;br /&gt;Now for a bit of HTTP bashing. &lt;A HREF="http://diveintomark.org/archives/2006/12/07/rest-for-toddlers"&gt;Even toddlers know what HTTP 200, 404, 201 etc mean&lt;/A&gt;. But can we reuse them in other specs? Maybe if they were URIs! Why are the error codes in HTTP not URIs? 
Anything that can have a URI can be a resource; surely 404 deserves to have a URI.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116559474315530179?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116559474315530179/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116559474315530179' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116559474315530179'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116559474315530179'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/12/saml-and-urns.html' title='SAML and URNs'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116506352369830020</id><published>2006-12-02T04:45:00.000-08:00</published><updated>2006-12-02T04:45:24.626-08:00</updated><title type='text'>WS-Security, simple enough to shoot yourself in the foot with.</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;WS-Security, simple enough to shoot yourself in the foot with.&lt;/a&gt;&lt;br /&gt;Gunnar Peterson critiques &lt;A HREF="http://1raindrop.typepad.com/1_raindrop/2006/12/rest_security_o.html"&gt;REST security in terms of WS-Security&lt;/A&gt;, and I just have to comment. First, I have helped implement WS-Security in Perl, in particular signing, so I got to know the specs pretty well. 
Some good things about message level security and WS-Security:&lt;br /&gt;&lt;br /&gt;1) The WS-Security spec is pretty good and well thought out by people who know the issues. It was a learning experience for me.&lt;br /&gt;&lt;br /&gt;2) Message level security is much more flexible than transport layer security. You can go beyond simple client/server architecture,&lt;br /&gt;so you can go beyond what REST offers. For example in the Grid world you could use message level security to build a resource broker:&lt;br /&gt;client submits a job submission message to a resource broker, resource broker picks resource and sends message to resource, resource authenticates message and runs job. (OK there is a whole issue of what the hell you do with the WS-Addressing headers: if the client signs the wsa:To then how do you forward the message on?)&lt;br /&gt;&lt;br /&gt;Now the difficult stuff: how many people know why you must sign the X509 token you use to sign a message? Reason: multiple certificates &lt;br /&gt;can share the same public key but have different authorization levels, and you don't want someone switching certificates and getting different access rights. How many people know you need to include a timestamp to avoid replay attacks when signing messages? You will also need a unique message ID along with that timestamp, luckily WS-Addressing provides one - did you know that you need to sign WS-Addressing headers to secure messages? You better record those timestamps &amp; message IDs, so you are gonna need transactional writes. Also signing/encrypting at the XML level is pretty slow - so many bits to sign, all that XML Canonicalization etc, so you might need a few extra servers; of course with them all accessing the same stable storage for every message to check message IDs and timestamps. Oh, and better make sure the client/server clocks are synch'd, cause it can be a pain to debug if not.&lt;br /&gt;&lt;br /&gt;Of course the toolkits will handle this for you. Wrong. 
WSRF::Lite automatically included and signed the X509 cert and timestamp, as the WS-Security spec strongly recommends. Then someone from a huge(TM) company asked if this could be turned off on the client side, as the service, developed using some big name toolkit, didn't use this. I pointed out that this could be a security vulnerability and that they should also be signing WS-Addressing headers; I got a "well I duno anything about that, but the client wants it this way" - well I hope they are using that gigantic hole in the firewall, AKA SSL. People too often think that just turning on WS-Security will make their application secure. &lt;br /&gt;&lt;br /&gt;As for REST, XML signature and encryption can be used in REST. In fact it might be easier as you don't get caught in the flat structure of SOAP Header/Body - you can use a Russian doll model (or onion model) of embedding XML in XML, where each service peels off the layer it is interested in before passing the message on to the next service. No service has to understand the XML in any other layer but its own. 
This is the model that we had planned for &lt;A HREF="http://www.allhands.org.uk/2006/proceedings/papers/624.pdf"&gt;HARC&lt;/A&gt;, it will be interesting to see if it works ;-)&lt;br /&gt;&lt;br /&gt;For more pain on this subject see: &lt;A HREF="http://betathoughts.blogspot.com/2006/11/taverna-and-security.html"&gt;Taverna and Security&lt;/A&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116506352369830020?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116506352369830020/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116506352369830020' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116506352369830020'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116506352369830020'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/12/ws-security-simple-enough-to-shoot.html' title='WS-Security, simple enough to shoot yourself in the foot with.'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116429339739424131</id><published>2006-11-23T06:49:00.000-08:00</published><updated>2006-11-23T06:49:57.443-08:00</updated><title type='text'>A REST hall of shame...</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;More GET madness!&lt;/a&gt;&lt;br /&gt;You might think I am looking for examples of the misuse 
of GET but they just keep appearing, and from Apache products too. This time it is Tomcat which can be managed, and even parts of it undeployed, with HTTP &lt;A HREF="http://tomcat.apache.org/tomcat-6.0-doc/manager-howto.html#Supported Manager Commands"&gt;GET&lt;/A&gt;. Maybe there should be a REST hall of shame.&lt;br /&gt;&lt;br /&gt;The GETs are meant to be used from scripts written by sys admins. In a universe far, far away Grid people are trying to solve this using &lt;A HREF="http://msdn.microsoft.com/library/en-us/dnglobspec/html/ws-management.pdf"&gt;WS-Management&lt;/A&gt; or &lt;A HREF="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsdm"&gt;WSDM&lt;/A&gt;, which without looking at either I can guarantee are going to be pretty complex. There must be some middle ground.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116429339739424131?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116429339739424131/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116429339739424131' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116429339739424131'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116429339739424131'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/rest-hall-of-shame.html' title='A REST hall of shame...'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116421242925812072</id><published>2006-11-22T08:20:00.000-08:00</published><updated>2006-11-22T08:20:29.576-08:00</updated><title type='text'>There is no reference implementation of WSRF</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;There is NO reference implementation of WSRF.&lt;/a&gt;&lt;br /&gt;Reading this &lt;A HREF="http://sc06.supercomputing.org/schedule/event_detail.php?evid=5017"&gt;paper&lt;/A&gt; on co-allocation presented at SC06 I came across this:&lt;br /&gt;&lt;br /&gt;&lt;it&gt;We have implemented a prototype of the proposed system&lt;br /&gt;as a set of cooperative Grid services using Globus Toolkit 4&lt;br /&gt;(GT4) [Globus ], that is a reference implementation of WS-&lt;br /&gt;Resource Framework [WSRF ],...&lt;/it&gt;&lt;br /&gt;&lt;br /&gt;GT4 is not a reference implementation of WSRF, there is no reference implementation and GT4 is not WSRF compliant. 
The reviewers should have caught this.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116421242925812072?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116421242925812072/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116421242925812072' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116421242925812072'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116421242925812072'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/there-is-no-reference-implementation.html' title='There is no reference implementation of WSRF'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116421057517326313</id><published>2006-11-22T07:49:00.000-08:00</published><updated>2006-11-22T07:49:35.736-08:00</updated><title type='text'>Bruno Blogs</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;Bruno Blogs&lt;/a&gt;&lt;br /&gt;Bruno from &lt;A HREF="http://www.mcc.ac.uk/"&gt;Manchester Computing&lt;/A&gt; has started blogging with a &lt;A HREF="http://blog.harbulot.com/post/2006/11/22/Experiences-with-WSRF"&gt;salvo at WSRF&lt;/A&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116421057517326313?l=betathoughts.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116421057517326313/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116421057517326313' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116421057517326313'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116421057517326313'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/bruno-blogs.html' title='Bruno Blogs'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116404208791323382</id><published>2006-11-20T09:01:00.000-08:00</published><updated>2006-11-20T09:01:27.913-08:00</updated><title type='text'>The Lost Update and HTTP</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;The Lost Update and HTTP&lt;/a&gt;&lt;br /&gt;This is a very good description of how to handle the "lost updates" problem when using HTTP. I have never understood etags until now: &lt;a href="http://www.w3.org/1999/04/Editing/"&gt;Editing the Web - Detecting the Lost Update Problem Using Unreserved Checkout&lt;/a&gt;. 
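&lt;br /&gt;&lt;br /&gt;As a rough sketch of the unreserved checkout idea (a toy example, not code from the W3C note): the client remembers the ETag it got with its GET and sends it back as If-Match on the PUT, and the server refuses the write if the resource has changed in between. The Store class, the SHA-256 based tag, and the choice to answer 428 when no If-Match is supplied are all assumptions made for illustration.

```python
import hashlib


class Store:
    """Toy in-memory resource store keyed by path (hypothetical, for illustration)."""

    def __init__(self):
        self._docs = {}

    @staticmethod
    def _etag(body):
        # Derive the entity tag from the content; any change yields a new tag.
        return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'

    def get(self, path):
        # Like GET: return status, ETag and the current representation.
        body = self._docs[path]
        return 200, self._etag(body), body

    def put(self, path, body, if_match=None):
        # Like PUT with If-Match: refuse the write if the resource has
        # changed since the client fetched it.
        current = self._docs.get(path)
        if current is not None:
            if if_match is None:
                return 428, self._etag(current)  # Precondition Required
            if if_match != self._etag(current):
                return 412, self._etag(current)  # Precondition Failed
        self._docs[path] = body
        return 204, self._etag(body)


def demo():
    store = Store()
    store.put("/page", "v1")                # create the resource
    _, tag_a, _ = store.get("/page")        # Alice checks out
    _, tag_b, _ = store.get("/page")        # Bob checks out the same version
    first = store.put("/page", "v2 by Alice", if_match=tag_a)
    second = store.put("/page", "v2 by Bob", if_match=tag_b)
    return first[0], second[0]
```

In this sketch Alice's write goes through and Bob's stale write is rejected with 412, so he has to re-fetch and merge rather than silently overwriting her change.&lt;br /&gt;&lt;br /&gt;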
WS-RF and WS-RT of course do not have support for this; you would need to compose with WS-AtomicTransaction or WS-BA.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116404208791323382?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116404208791323382/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116404208791323382' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404208791323382'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404208791323382'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/lost-update-and-http.html' title='The Lost Update and HTTP'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116404199431050011</id><published>2006-11-20T08:59:00.000-08:00</published><updated>2006-11-20T08:59:54.313-08:00</updated><title type='text'>Me, A Web Fundamentalist!</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;Web Fundamentalist&lt;/a&gt;&lt;br /&gt;I think I might be one of Ian's &lt;A HREF="http://ianfoster.typepad.com/blog/2006/11/the_web_thought.html"&gt;Web Fundamentalists&lt;/a&gt;!&lt;br /&gt;&lt;br /&gt;I will take it as a compliment :-)&lt;br /&gt;&lt;br /&gt;His definition of fundamentalism is worth repeating:&lt;br /&gt;&lt;br/&gt;&lt;br /&gt;&lt;it&gt;&lt;br 
/&gt;Fundamentalism is a continuing historical phenomenon, characterized by a sense of embattled alienation in the midst of the surrounding culture, even where the culture may be nominally influenced by the adherents' religion. The term can also refer specifically to the belief that one's religious texts are infallible ...&lt;br /&gt;&lt;/it&gt;&lt;br /&gt;&lt;br/&gt;&lt;br /&gt;Within the Web community I guess that is a good way to categorise the REST people: they think a lot of the Web is being done badly, e.g. cookies and GETs with side effects. We also have our &lt;A HREF="http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm"&gt;religious texts&lt;/A&gt;, which we think are infallible.&lt;br /&gt;&lt;br /&gt;However Ian is mixing the &lt;A HREF="http://jim.webber.name/"&gt;MEST&lt;/A&gt; &lt;A HREF="http://savas.parastatidis.name/"&gt;heads&lt;/A&gt; (aka POST everything through the one URL) with the REST people, and tarring them with the one brush. Sorry Ian, REST v MEST is a separate battle ;-)&lt;br /&gt;&lt;br /&gt;Unfortunately Ian doesn't 'get' REST. The uniform interface is not just a set of HTTP methods. First, HTTP is just a protocol to whose design REST has been applied, not REST itself; second, the uniform interface goes beyond just operations: it includes things such as how you identify resources|objects|Grid services. Many of the ideas behind OGSI can be mapped to REST; it was just a question of application. Ian, you're a RESTafarian, you just don't know it ;-)&lt;br /&gt;&lt;br /&gt;In the end there is not much difference between OGSI, WS-RF and WS-RT; as if to illustrate the point people are even talking about using XSLT to create interoperability between WS-RF and WS-RT! There is not enough difference to waste all those years on anyway. 
In fact the point could be made that OGSI is better than WS-RF/WS-RT because it has support for identification which the others lack.&lt;br /&gt;&lt;br /&gt;The whole stateful/stateless argument was a red herring. OGSI is a stateless protocol, just like HTTP and NFS. If you needed stateful interactions on top of OGSI you could compose with WS-Context, just like you can use cookies with HTTP.&lt;br /&gt;&lt;br /&gt;Is REST the right approach to Grid computing? For some parts yes, for others no (Jabber is an interesting option that we have been playing with). REST is an architectural style for building a large scale distributed hyper-media system; where that overlaps with Grid computing it should be used, and where it does not a different approach should be used. Would a Grid architectural style be possible? That is the huge challenge which I guess the &lt;A HREF="http://www.ogf.org/gf/group_info/view.php?group=ogsa-wg"&gt;OGSA&lt;/A&gt; group is facing. &lt;br /&gt;&lt;br /&gt;For me, I have demonstrated that for &lt;A HREF="http://www.realitygrid.org/"&gt;RealityGrid&lt;/A&gt; a REST approach was better than the WSRF approach: &lt;A HREF="http://www.realitygrid.org/publications/wsrfajax.pdf"&gt;Combining AJAX and WSRF for Web-browser based Grid clients&lt;/A&gt;.  I have also demonstrated that REST can be used to solve complex Grid problems like the co-allocation problem in a fault tolerant way that is beyond the functionality provided by the WS-* specs: &lt;A HREF="http://www.allhands.org.uk/2006/proceedings/papers/624.pdf"&gt;Co-Allocation, Fault Tolerance and Grid Computing&lt;/A&gt;, addressing the charge that it is only suitable for trivial problems. &lt;br /&gt;&lt;br /&gt;Well at least Ian called me a Web fundamentalist in a RESTful way. 
I might use it in my e-mail signature, maybe he could include it in a reference for me too :-)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116404199431050011?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116404199431050011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116404199431050011' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404199431050011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404199431050011'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/me-web-fundamentalist.html' title='Me, A Web Fundamentalist!'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116404181479650156</id><published>2006-11-20T08:56:00.000-08:00</published><updated>2006-11-20T08:56:54.796-08:00</updated><title type='text'>Axis2: This Ain't REST</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;This Ain't REST&lt;/a&gt;&lt;br /&gt;I have been playing around with Axis2 to create some sample services for the Taverna people to chew on when I looked closer at the "REST support"....&lt;br /&gt;&lt;br /&gt;&lt;tt&gt;&lt;br /&gt;Running the Client&lt;br /&gt;==================&lt;br /&gt;- From your browser, If you point to the following URL:&lt;br 
/&gt;http://localhost:8080/axis2/rest/StockQuoteService/getPrice?symbol=IBM&lt;br /&gt;&lt;br /&gt;You will get the following response:&lt;br /&gt;&lt;ns:getPriceResponse&gt;&lt;ns:return&gt;42.0&lt;/ns:return&gt;&lt;/ns:getPriceResponse&gt;&lt;br /&gt;&lt;br /&gt;- If you invoke the update method like so:&lt;br /&gt;http://localhost:8080/axis2/rest/StockQuoteService/update?symbol=IBM&amp;price=100&lt;br /&gt;&lt;br /&gt;And then execute the first getPrice url. You can see that the price got updated.&lt;br /&gt;&lt;/tt&gt; &lt;br /&gt;&lt;br /&gt;So you use GET to update the price! I think not. GET is safe, it should have no&lt;br /&gt;side effects. This is bad because it is a toolkit advocating an approach, and&lt;br /&gt;to think that it comes from the Apache foundation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116404181479650156?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116404181479650156/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116404181479650156' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404181479650156'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404181479650156'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/axis2-this-aint-rest.html' title='Axis2: This Ain&apos;t REST'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116404168806810114</id><published>2006-11-20T08:54:00.000-08:00</published><updated>2006-11-20T08:54:48.100-08:00</updated><title type='text'>Beta Thoughts</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;Taverna Meeting&lt;/a&gt;&lt;br /&gt;Had a good meeting with June and some of the developers of Taverna to discuss security and the requirements. They are keen to improve support for security within Taverna but will be doing it on a case by case basis, i.e. you provide a service that uses some form of security and they will add support for it to Taverna. &lt;br /&gt;&lt;br /&gt;I will create a service that uses HTTPS with mutual authentication and WS-Addressing EPRs and see if the Taverna guys can connect to it. Maybe later decorate the WSDL with some WS-SecurityPolicy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116404168806810114?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116404168806810114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116404168806810114' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404168806810114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116404168806810114'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/beta-thoughts.html' title='Beta Thoughts'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116317814767351566</id><published>2006-11-10T09:02:00.000-08:00</published><updated>2006-11-10T09:02:27.686-08:00</updated><title type='text'>Three weeks down, two to go....</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;Three weeks down, two to go&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Elise has completed her third week of radiotherapy! The time has flown by so fast it is hard to believe, only two more weeks to go. She has had no side effects yet: no skin damage, bowel problems, eating problems or tiredness, which is great. And to think how worried we were at the start...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116317814767351566?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116317814767351566/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116317814767351566' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116317814767351566'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116317814767351566'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/three-weeks-down-two-to-go.html' title='Three weeks down, two to go....'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116317501413260408</id><published>2006-11-10T08:10:00.000-08:00</published><updated>2006-11-10T08:10:14.133-08:00</updated><title type='text'>Preview doesn' work too good...</title><content type='html'>&lt;a href="http://betathoughts.blogspot.com/"&gt;Beta Thoughts&lt;/a&gt;&lt;br /&gt;Mmmh, looking at my &lt;A HREF="http://betathoughts.blogspot.com/2006/11/taverna-and-security.html"&gt;last post&lt;/a&gt; it seems the preview doesn't work too well. Must find something better...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116317501413260408?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116317501413260408/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116317501413260408' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116317501413260408'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116317501413260408'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/preview-doesn-work-too-good.html' title='Preview doesn&apos; work too good...'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116317260857193791</id><published>2006-11-10T07:28:00.000-08:00</published><updated>2006-11-10T08:06:52.233-08:00</updated><title type='text'>Taverna and Security</title><content type='html'>What does Taverna need to support security?&lt;br/&gt;&lt;br/&gt;Read over 200 pages of WS-* specs to try and understand what is required for &lt;a href="http://taverna.sourceforge.net/"&gt;Taverna &lt;/a&gt;to support security. The horror, the horror.&lt;br/&gt;&lt;br/&gt;Taverna is a workflow tool that came out of the &lt;a href="http://www.mygrid.org.uk/"&gt;MyGrid &lt;/a&gt;project that&lt;br/&gt; has got a lot of good press. We are thinking about using it for the &lt;a href="http://www.nanocmos.ac.uk/"&gt;NanoCMOS&lt;/a&gt; project. Unfortunately Taverna doesn't do security, and we have a strong requirement for security; the good news is that they are working on it!&lt;br/&gt;&lt;br/&gt;Downloaded and played with Taverna, and it looks pretty good. &lt;br/&gt;Noticed that many of the services for which Taverna is pre-configured have lots of get* type operations. 
We need one &lt;a href="http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm"&gt;GET&lt;/a&gt; to rule them all!&lt;br/&gt;&lt;br/&gt;I have been asked to look into what is needed by Taverna to support&lt;br/&gt;security given my experience implementing &lt;a href="http://www.sve.man.ac.uk/Research/AtoZ/ILCT"&gt;WS-Security for Perl&lt;/a&gt;.&lt;br/&gt;Beyond support for &lt;a href="http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wss"&gt;WS-Security&lt;/a&gt;, &lt;a href="http://www.ibm.com/developerworks/library/ws-secon/"&gt;WS-SecureConversation&lt;/a&gt; and &lt;a href="http://www.ietf.org/rfc/rfc2818.txt"&gt;HTTPS&lt;/a&gt; it looks like the following specs are  important: &lt;br/&gt;&lt;br/&gt;&lt;a href="http://www.w3.org/2002/ws/addr/"&gt;WS-Addressing&lt;/a&gt; (signed SOAP Headers for end-2-end security, also &lt;br/&gt;policy can be stuck in the  meta-data of the WS-Addressing EPRs), &lt;br/&gt;&lt;br/&gt;&lt;a href="http://www.w3.org/Submission/WS-Policy/"&gt;WS-Policy&lt;/a&gt; and &lt;a href="http://www.oasis-open.org/committees/download.php/15979/oasis-wssx-ws-securitypolicy-1.0.pdf"&gt;WS-SecurityPolicy&lt;/a&gt; (the language for declaring security policy, I thought this was pretty cool as it included support for saying things like "I want mutual authentication over HTTPS" etc.),&lt;br/&gt;&lt;a href="http://www.w3.org/Submission/WS-PolicyAttachment/"&gt;&lt;br/&gt;WS-PolicyAttachments&lt;/a&gt; (where you find policy; for example, it tells you how to extend WSDL to include policy)&lt;br/&gt;&lt;br/&gt;&lt;A href="http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-mex/"&gt;WS-MetaDataExchange&lt;/A&gt; (policy can be found and retrieved using WS-MEX, unfortunate dependence on &lt;A HREF="http://devresource.hp.com/drc/specifications/wsrt/WS-ResourceTransfer-v1.pdf"&gt;WS-RT&lt;/A&gt; which is getting a &lt;A HREF="http://lists.w3.org/Archives/Public/www-tag/2006Oct/0061.html"&gt;rough reception&lt;/A&gt; at the W3C).&lt;br 
/&gt;&lt;br/&gt;&lt;br /&gt;Of course WS-SecurityPolicy had no support for declaring policy with&lt;br /&gt;respect to &lt;A HREF="http://www.ietf.org/rfc/rfc3820.txt"&gt;Proxy certificates&lt;/A&gt;, more work for the &lt;A HREF="http://www.ogf.org/"&gt;OGF&lt;/A&gt; then. Other specs that may have an impact, but which I haven't looked at are&lt;br /&gt;&lt;A HREF="http://www.oasis-open.org/committees/security/"&gt;SAML&lt;/A&gt;, &lt;A HREF="http://www.verisign.com/wss/WS-Trust.pdf"&gt;WS-Trust&lt;/A&gt;, and I am sure there are others.&lt;br /&gt;&lt;br/&gt;&lt;br /&gt;If all this stuff worked and the tools consumed it with eash then it would be pretty powerful stuff, big if though.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116317260857193791?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116317260857193791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116317260857193791' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116317260857193791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116317260857193791'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/taverna-and-security.html' title='Taverna and Security'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-37470176.post-116316810309987134</id><published>2006-11-10T06:10:00.000-08:00</published><updated>2006-11-10T06:38:38.576-08:00</updated><title type='text'>Globus GT 4.0.3 Not WSRF compliant!</title><content type='html'>Downloaded &lt;a href="http://www.globus.org/toolkit/"&gt;Globus GT4.0.3&lt;/a&gt; to evaluate it for the &lt;a href="http://labserv.nesc.gla.ac.uk/projects/nanoCMOS/index.html"&gt;NanoCMOS&lt;/a&gt; project. Unfortunately, it uses really old versions of the &lt;a href="http://www.oasis-open.org/committees/wsrf/"&gt;WSRF&lt;/a&gt; and &lt;a href="http://www.w3.org/2002/ws/addr/"&gt;WS-Addressing&lt;/a&gt; specifications. This means that GT 4.0.3 is not WSRF compliant. It also means that services developed with GT 4.0.3 are not WSRF compliant either!&lt;br/&gt;&lt;br/&gt;I &lt;a href="http://www.globus.org/mail_archive/gt-user/2006/11/msg00054.html"&gt;asked&lt;/a&gt; when it would become compliant, but have had no response yet.&lt;br/&gt;&lt;br/&gt;I cannot recommend that we use GT4 until we know what its timeline for compliance is. Once it does become compliant, there is a strong chance older versions will not interoperate, making existing code and services obsolete. 
Also many other WS-* tools are using the correct version of WS-Addressing, so they won't play nicely with GT4.&lt;br/&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/37470176-116316810309987134?l=betathoughts.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://betathoughts.blogspot.com/feeds/116316810309987134/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=37470176&amp;postID=116316810309987134' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116316810309987134'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/37470176/posts/default/116316810309987134'/><link rel='alternate' type='text/html' href='http://betathoughts.blogspot.com/2006/11/globus-gt-403-not-wsrf-compliant.html' title='Globus GT 4.0.3 Not WSRF compliant!'/><author><name>Mark Mc Keown</name><uri>http://www.blogger.com/profile/08219476216804177189</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>3</thr:total></entry></feed>
