Wednesday, February 24, 2010

ExtraHop

Some guys I worked with at F5 Networks created a startup called ExtraHop. It was clear that whatever these guys did was going to be impressive and run very fast - they did not disappoint. They built an appliance to monitor your network: hang it off your network, spend 15 minutes configuring it and there you go, it will learn all about your network and tell you when things are not running smoothly and why. It works at layer 7, so it knows when a database is not running properly, or when a CIFS server is misbehaving, or when an HTTP server's TPS has dropped. The really amazing thing - it runs at 10GbE! Years of talking to BIG-IP customers were not wasted. They also have a cool service where you can upload your pcaps and their magic software will analyze them for you!!

There are also some videos of their stuff in action.

Tuesday, February 23, 2010

A History of malloc

I was reading up on malloc. I had naively assumed malloc was a system call; it's not. Under the covers malloc uses brk or sbrk to request memory from the kernel. However, malloc does not always have to call brk/sbrk: if it has memory in its "free list" that can fulfill the request, no system call is needed. So you do not always pay the price of a system call when you use malloc.
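To make the free list point concrete, here is a minimal sketch in Python (not real allocator code). ToyHeap, its _sbrk counter and the first-fit policy are all my own inventions for illustration; a real malloc is in C and far more careful about splitting and merging blocks.

```python
class ToyHeap:
    """Toy model of a brk-style heap with a first-fit free list."""

    def __init__(self):
        self.brk = 0            # current top of the simulated heap
        self.syscalls = 0       # how many times we "asked the kernel"
        self.free_list = []     # (address, size) blocks available for reuse
        self.sizes = {}         # address -> size of each live allocation

    def _sbrk(self, size):
        """Stand-in for the real sbrk: grow the heap and count the call."""
        self.syscalls += 1
        addr = self.brk
        self.brk += size
        return addr

    def malloc(self, size):
        # First fit: reuse a free block if one is big enough.
        for i, (addr, block_size) in enumerate(self.free_list):
            if block_size >= size:
                del self.free_list[i]
                self.sizes[addr] = block_size
                return addr
        # Nothing suitable on the free list, so pay for a "system call".
        addr = self._sbrk(size)
        self.sizes[addr] = size
        return addr

    def free(self, addr):
        # Freed memory goes back on the free list, not back to the kernel.
        self.free_list.append((addr, self.sizes.pop(addr)))


heap = ToyHeap()
a = heap.malloc(64)      # free list is empty: one simulated sbrk
heap.free(a)
b = heap.malloc(32)      # served from the free list: no new sbrk
print(heap.syscalls)     # 1
```

The second request is satisfied by the block that was just freed, so the simulated system call count stays at one.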

Another interesting thing about malloc is that it cannot return memory to the kernel unless the memory that is freed is at the top of the heap. If you malloc some memory at the start of your program, malloc some more later and then free the original memory, malloc/free cannot return the original chunk of memory to the kernel until the second piece of memory is freed.
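A toy model of the top-of-heap restriction, again in Python and again only a sketch: BrkHeap and its _trim method are hypothetical names, and the "break" here is just an integer rather than anything the kernel knows about.

```python
class BrkHeap:
    """Toy brk-style heap: memory can only be handed back from the top."""

    def __init__(self):
        self.brk = 0          # top of the simulated heap
        self.live = {}        # address -> size of blocks still in use
        self.freed = {}       # address -> size of freed blocks not yet returned

    def malloc(self, size):
        # Always grow the heap; reuse of freed blocks is left out for brevity.
        addr = self.brk
        self.brk += size
        self.live[addr] = size
        return addr

    def free(self, addr):
        self.freed[addr] = self.live.pop(addr)
        self._trim()

    def _trim(self):
        # Lower the break for as long as the block just below it is free.
        shrunk = True
        while shrunk:
            shrunk = False
            for addr, size in list(self.freed.items()):
                if addr + size == self.brk:
                    del self.freed[addr]
                    self.brk = addr
                    shrunk = True
                    break


heap = BrkHeap()
first = heap.malloc(100)
second = heap.malloc(50)
heap.free(first)
brk_after_first_free = heap.brk    # still 150: 'second' pins the top of the heap
heap.free(second)
brk_after_second_free = heap.brk   # 0: both blocks can now be returned
print(brk_after_first_free, brk_after_second_free)
```

Freeing the first block changes nothing, because the second block still sits above it. Only when the second block goes does the break drop all the way down.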

The first malloc was written in "The Old Testament" - K&R - and it's about 200 lines of code. They managed the free list by storing each free block's header (declared as a union) inside the free memory itself - this saved space, an important requirement when the amount of memory available was very limited.

Poul-Henning Kamp rewrote malloc for FreeBSD 2.2 and documented it in Malloc(3) Revisited; this malloc is known as phkmalloc. By this time systems were using virtual memory, which meant that a chunk of memory on the free list could be paged out to disk. In the K&R approach the free list was embedded in these chunks of memory, so when malloc came to look for memory on the free list it would have to page all this memory back in, killing performance!! Kamp's version of malloc was 1136 lines of code long and had a good reputation for performance.
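A back-of-the-envelope way to see the paging cost, as a Python sketch. The page size, record size and chunk layout below are assumptions picked for illustration; this is not how either malloc is actually laid out, it just counts how many distinct pages a walk of the free list has to touch.

```python
PAGE_SIZE = 4096      # assumed page size in bytes
ENTRY_SIZE = 16       # assumed size of one free-list record in bytes


def pages_touched(addresses):
    """Number of distinct pages covered by a collection of byte addresses."""
    return len({addr // PAGE_SIZE for addr in addresses})


# 1000 free chunks of 8 KiB each, laid out back to back.
chunk_addrs = [i * 8192 for i in range(1000)]

# Embedded free list: each record lives at the start of its free chunk,
# so walking the list touches a different page for every chunk.
embedded = pages_touched(chunk_addrs)

# Separate free list: the same records packed into one contiguous table.
table_addrs = [i * ENTRY_SIZE for i in range(len(chunk_addrs))]
separate = pages_touched(table_addrs)

print(embedded, separate)   # 1000 4
```

With these made-up numbers, walking an embedded list can fault in 1000 pages, while a packed table of the same records fits in 4.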

Then came fast multi-processor machines with large memory, and another rewrite of malloc. Jason Evans rewrote malloc for FreeBSD and his version is known as jemalloc; he wrote about it in A Scalable Concurrent malloc(3) Implementation for FreeBSD. Now the issues are less about paging to disk and more about fast locks and NUMA (trying to allocate memory close to the CPU that you think will be using it). Firefox is attempting to use jemalloc internally for its memory management.
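One common way to cut lock contention is to split the heap into several independently locked arenas and spread threads across them. The Python sketch below shows that general idea only; ArenaPool, the round-robin assignment and the counters are my own illustration and are not taken from the jemalloc source.

```python
import itertools
import threading


class ArenaPool:
    """Toy allocator front end: each thread sticks to one locked arena."""

    def __init__(self, n_arenas):
        self.arenas = [
            {"lock": threading.Lock(), "allocations": 0, "bytes": 0}
            for _ in range(n_arenas)
        ]
        self._next = itertools.cycle(range(n_arenas))
        self._assign_lock = threading.Lock()
        self._local = threading.local()

    def _arena_for_current_thread(self):
        # Assign an arena the first time a thread allocates, round robin.
        if not hasattr(self._local, "index"):
            with self._assign_lock:
                self._local.index = next(self._next)
        return self.arenas[self._local.index]

    def malloc(self, size):
        arena = self._arena_for_current_thread()
        # Only threads sharing this arena contend for this lock.
        with arena["lock"]:
            arena["allocations"] += 1
            arena["bytes"] += size


def worker(pool):
    for _ in range(10):
        pool.malloc(64)


pool = ArenaPool(n_arenas=2)
threads = [threading.Thread(target=worker, args=(pool,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

counts = [arena["allocations"] for arena in pool.arenas]
print(counts)   # [20, 20]
```

Four threads over two arenas means each lock is shared by two threads instead of four; with one arena per thread there would be no contention at all.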

There are quite a few malloc implementations out there; Google has one called tcmalloc in its perftools bundle. It's very easy to swap the malloc your code uses: all you have to do is link against the library with the new malloc. Though this can sometimes lead to trouble :-)

Friday, February 19, 2010

Ubuntu + D-Link + LinkedIn == Trouble

Hit this bug yesterday when trying to access LinkedIn. Weirdly, nearly all the other web sites I visit don't exhibit this problem! Went with the "set MTU to 1360" hack. If I have time I will look into this some more - network bugs can be very weird...

Thursday, February 18, 2010

Post Modern Programming

Varnish is a reverse Web proxy cache. What makes it interesting is how it was designed. The argument is that people are programming like it's 1975, treating RAM and disk as two separate memory pools. Varnish instead views them as a single memory pool, with the RAM acting just like a cache. It does this by mmap'ing a large file; the threads just read and write to this memory, happily unaware that it is being backed by disk. This has the advantages of reduced complexity (no requirement to manage both a RAM cache and a disk cache) and far fewer system calls (no read/write to disk). It also means that they have to use lots of threads, because any memory operation could cause a thread to block on a page fault - so they have a thread-per-connection model. This is contrary to how a lot of people develop these kinds of applications: they have an event-driven system with only a couple of threads, or just one thread, and they focus on making sure that thread never blocks.
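The mmap trick is easy to try on a small scale. The Python sketch below is only an illustration of the idea, not anything from Varnish: the temporary file and the 4096-byte size are assumptions, and Varnish itself is written in C and maps a much larger file.

```python
import mmap
import os
import tempfile

SIZE = 4096  # assumed size of the backing file, one typical page

# Create a fixed-size backing file (a stand-in for a real cache file).
fd, path = tempfile.mkstemp()
os.ftruncate(fd, SIZE)

# Map the file into memory; reads and writes now look like plain memory access.
mm = mmap.mmap(fd, SIZE)
mm[0:5] = b"hello"          # no explicit write() call
in_memory = bytes(mm[0:5])  # no explicit read() call

mm.flush()                  # ask the OS to write dirty pages back to the file
mm.close()
os.close(fd)

# The data reached the file even though we never called write() on it.
with open(path, "rb") as f:
    on_disk = f.read(5)
os.remove(path)

print(in_memory, on_disk)
```

The program only ever touches memory; the operating system decides when the pages actually go to disk, which is exactly the job Varnish hands over to the kernel.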

Wednesday, February 17, 2010

Am I a bad Web Citizen?

I noticed in my previous post I did not include any links. There were many things in the post I could have referenced, but I did not, because I was lazy. I am relying on the extra level of indirection that Google provides - if someone is interested in something then they can Google it. This is wrong for two reasons.

First, I can include links that back up my case. Google may turn up links that weaken my case and reduce my credibility.

Secondly, the Web depends on links and they are of fundamental importance to the Web.

Mea Culpa

What does it mean?

I was reading Warren Buffett's entry on Wikipedia the other day and came across this phrase: "Price is what you pay, value is what you get".

I don't understand the phrase, or rather I do understand the phrase, but don't get the message it is trying to convey. I haven't Google'd the phrase but left it to stew in my brain to see if I could come up with the significance of the phrase, unfortunately it is still stewing.

The phrase is a tautology: it states a definition of two words, price and value, that is in no way controversial. So why does it make Mr Buffett's Wikipedia entry?

I really like phrases like this. They can convey so much information with so few words. I would not be surprised to find a book with the title "Price is what you pay, value is what you get", or that it's an essay question on some Economics degree course. You can use phrases like these as names for ideas, names that are self-explanatory.

An equivalent phrase from the software development world is "Premature optimisation is the root of all evil" - you just have to quote this and everyone understands and rolls their eyes. (As an aside, people often attribute this to Knuth, but it originally came from Tony Hoare and Knuth quoted him in "Structured Programming with go to Statements", Knuth's riposte to Dijkstra's "Go To Statement Considered Harmful".)

An interesting difference between the two phrases is that one is (nearly) a tautology and the other is obviously untrue - premature optimisation did not throw up Hitler. The fact that it is a tautology I think adds value, it almost says the message I carry is also a tautology - which it probably isn't.

I look forward to the day in some meeting I can say "Price is what you pay, value is what you get", hopefully no one will say WTF do you mean?

A personal favorite phrase of mine is "The talent is in the choices". I have used it in a couple of talks I have given and I am pretty sure that is all people remember from the talks; that they remember anything from one of my talks I see as a success. I have never been able to find out who first said this, but I know Robert De Niro used it. The choices an actor makes reflect his talent, talented actors make good choices - apparently he agonized for three months as to whether or not to have a moustache in The Godfather Part II. A software engineer is often faced with choices about how to design or implement something; good engineers make good choices, and clearly experience plays a part in this. And sometimes an obvious choice is not the correct one, and good engineers will recognize this. This is also why Agile development is so successful: you make a bunch of choices, they turn out to be bad, you refactor.

Phrases like these can also be used in a kind of harmful way. I remember at one meeting when we were talking about using a new technology that was rapidly evolving at the time (Web Services :-() and someone said we are building on sand. This was quickly and cleverly countered with "Yes, but it's a better quality sand". By the time people had processed this statement and realized it was complete crap, the debate had moved on.