Network caps in cloud environments

Bastian Blank - 12 August 2017

Providing a working network is not easy. All the cloud providers seem to know how to do that most of the time. Providing enough throughput is not easy either. Here it gets interesting, as the cloud providers tackle that problem with completely different results.

There are essentially three large cloud providers. The oldest and best-known is Amazon Web Services (AWS). Behind it follow Microsoft with Azure and the Google Cloud Platform (GCP). Some public instances of OpenStack exist, but they simply don't count here. So we are left with three, and they tackle this problem with widely different results.

Now, what network throughput is necessary for real-world systems anyway? An old friend gives this advice: 1Gbps per core of uncongested throughput within the complete infrastructure is the minimum. A generalization of this rule estimates around 1bps per clock cycle per core, so a 2GHz core would need 2Gbps. Do you even get a high enough network cap at your selected cloud provider to meet either of these estimates?
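
Written out, the rule of thumb is just a multiplication; a minimal Python sketch (the function name is mine, the rule is as stated above):

    # Rule of thumb: roughly 1 bit per second of network throughput
    # per clock cycle, per core.
    def estimated_throughput_gbps(cores, clock_ghz):
        return cores * clock_ghz

    # A machine with 4 cores at 2GHz would need roughly 8Gbps.
    print(estimated_throughput_gbps(4, 2.0))  # 8.0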

Our first provider, AWS, publishes a nice list of network caps for some of their instance types. The common theme in this list: for two cores (all the *.large types) you get 500Mbps, for four cores (*.xlarge) you get 750Mbps, and for eight cores (*.2xlarge) you get 1000Mbps. This is way below the estimate above and does not even rise linearly with the number of cores. But all of this does not really matter much anyway, as the network performance of AWS is the worst of the three providers.
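
To make the sub-linear scaling visible, here is a quick sketch that divides the quoted caps by the core count (numbers taken only from the list above):

    # Published AWS caps quoted above, per instance size.
    aws_caps_mbps = {2: 500, 4: 750, 8: 1000}  # cores -> cap in Mbps

    for cores, cap in aws_caps_mbps.items():
        print(cores, "cores:", cap / cores, "Mbps per core")
    # 2 cores: 250.0 Mbps per core
    # 4 cores: 187.5 Mbps per core
    # 8 cores: 125.0 Mbps per core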

Our second provider, Azure, does not seem to publish any real information about network caps at all. From my own knowledge it is 50MBps (500Mbps) per core, at least for the smaller instances. At least it scales linearly with instance size, but it is still way below our estimate.

Our third provider, GCP, documents a simple rule for network caps: 2Gbps per core. This matches what we estimated.
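
Putting the three per-core figures next to the 2Gbps estimate for a 2GHz core gives a rough picture. A small sketch, using the AWS *.large figure, my own Azure observation and the documented GCP rule from above:

    # Per-core caps as discussed above, compared to the estimate of
    # 2000Mbps for a 2GHz core (1bps per clock cycle).
    estimate_mbps = 2000
    per_core_cap_mbps = {"AWS": 250, "Azure": 500, "GCP": 2000}

    for provider, cap in per_core_cap_mbps.items():
        print(provider, cap, "Mbps per core,", 100 * cap // estimate_mbps, "% of the estimate")
    # AWS 250 Mbps per core, 12 % of the estimate
    # Azure 500 Mbps per core, 25 % of the estimate
    # GCP 2000 Mbps per core, 100 % of the estimate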

Now the most important question: does this estimate really work, and can we actually fill such a pipe? The answer is not easy. A slightly synthetic test of an HTTP server with cached static content showed that it can easily reach 7Gbps on a 2GHz Intel Skylake core. So yes, the rule gives a good estimate of what network throughput is needed for real-world applications. However, we could still easily fill a pipe that is larger by a factor of three.
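
For the record, the arithmetic behind that factor, using only the numbers just quoted:

    # 7Gbps served from a single 2GHz core works out to 3.5 bits per
    # clock cycle, roughly a factor of three above the 1bps-per-cycle
    # rule of thumb and the 2Gbps per-core GCP cap.
    measured_gbps = 7.0
    clock_ghz = 2.0
    print(measured_gbps / clock_ghz)  # 3.5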