I was recently interviewed by Jon Brodkin for an article on Ars Technica called “Amazon’s HPC cloud: supercomputing for the 99%”; I suppose I was the dissenting voice. :-) It’s actually a pretty good article on the current state of Amazon’s cloud for high-performance computing, and it devotes a decent amount of space to the shortcomings of the interconnect.
At one point the article summarizes the results of some benchmarks I did to compare 10-gigabit Ethernet on EC2’s cluster compute instances to an R Systems cluster with QDR InfiniBand.
Tests run by R Systems using an MPI benchmark from Ohio State showed latencies in the passing of small messages (up to 4KB) to be between 1.4 and 6 microseconds in an InfiniBand-based cluster. Comparatively, Amazon’s 10 Gigabit Ethernet connections produced latencies of 100 to 111 microseconds. For passing of larger messages (4MB), bandwidth hit 3,031 megabytes per second with InfiniBand, and only 484 megabytes per second on Amazon.
I used the OSU micro-benchmarks from the MVAPICH team. In case it’s interesting to anyone, the full results can be found below.
Note that no real optimizations were done with either network, so these are pretty “naive” numbers.
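For anyone who wants to reproduce this kind of comparison, a typical way to build and run the OSU point-to-point tests looks something like the following. This is a sketch, not my exact procedure: the version number, download URL, directory layout, and hostfile are assumptions, and mpirun flags vary by MPI implementation.

```shell
# Sketch: build and run the OSU micro-benchmarks (version/paths assumed).
wget http://mvapich.cse.ohio-state.edu/download/mvapich/osu-micro-benchmarks-5.9.tar.gz
tar xzf osu-micro-benchmarks-5.9.tar.gz
cd osu-micro-benchmarks-5.9
./configure CC=mpicc CXX=mpicxx
make

# "hosts" lists one hostname per node; run one process on each of two nodes.
# osu_latency reports small-message round-trip latency / 2, in microseconds.
mpirun -np 2 -hostfile hosts ./mpi/pt2pt/osu_latency

# osu_bw reports unidirectional bandwidth, in MB/s, up to 4 MB messages.
mpirun -np 2 -hostfile hosts ./mpi/pt2pt/osu_bw
```

The small-message latency and large-message bandwidth numbers quoted above come from tests like these (osu_latency and osu_bw), run with two processes on two different nodes so the traffic actually crosses the network.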