A lot of the customers I talk to who are considering virtualizing their network ask what the “overhead” or “performance penalty” is for running NSX. The general answer is “so low that you probably won’t be able to measure it”. For some people, that’s not enough. They want to know an exact figure for, say, the latency added for east-west VM communication when the NSX Distributed Firewall is enabled. Or how much latency is added by using VXLANs instead of native VLANs with the software VTEPs built into NSX? Or what is the latency introduced when I go from a VM on a VXLAN through an Edge Security Gateway to a physical server on a native VLAN?
These are difficult questions to answer because the answer is highly dependent on the hardware involved. This includes the CPU/RAM/bus of the hypervisors themselves, as well as the NICs they use. In the interest of science (and my customers!), I ran some empirical tests in my lab so that I could give definitive numbers – at least for the hardware setup I tested.
The testing harness used is very simple. Two Dell R420s running vSphere 6.0U1 – nothing special, they were originally purchased for ~$1500 each in Q3 2012, so I feel they do a good job representing the low end of what sort of equipment you'll find in a production datacenter. The physical setup couldn't be simpler:
ESXi host specifications
Purchased in Q3 2012, these are 3 year old servers that might cost you $1200 today.
Dual E5-2420 6C CPUs
64GB DDR3 1600MT/s RAM
Intel C600 Chipset
Emulex OCe11102-N dual port 10Gb NIC
Brand new these go for $2400, but I got mine on eBay for less than $1000.
Dell Powerconnect 8024f 24 port 10GbE switch
4 VMs per host used in testing, all configured as follows:
1 VMXNET3 vNIC
Windows 2008 R2
Items of note
- I disabled all the power saving stuff in the BIOS. This is a personal thing really, because I’ve had tests like this get polluted by weird power-saving behavior.
- I used the async driver for the NICs – I’ve seen the in-box drivers that ship with vSphere do weird stuff like disable RSS even when the NIC supports it in hardware.
The main goal of the testing was to figure out the latency various NSX features would add in a real production scenario. I tested this using netperf against the raw interfaces of multiple VMs, as well as using a script that basically did a bunch of wgets against an ASP.NET page on the source VM, which in turn did a simple T-SQL query against a small database running on the target VM.
On the target VMs: netserver
On the source VMs: netperf -H <target server ip> -t TCP_RR -l 120 -- -r 1024,1024
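If you want to script runs like this, the transactions-per-second figure can be pulled out of netperf’s output programmatically. This is a minimal sketch; the sample output below is only illustrative of netperf’s usual TCP_RR column layout (it assumes the final column of the last data line is the transaction rate), so verify it against your netperf version before relying on it.

```python
# Illustrative example of netperf TCP_RR output layout - check against
# the actual output of your netperf build before trusting the parser.
SAMPLE_OUTPUT = """\
TCP REQUEST/RESPONSE TEST to 10.0.0.2
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1024     1024    120.00   9500.25
"""

def transactions_per_sec(netperf_output: str) -> float:
    """Return the last column of the final non-empty line (Trans. Rate)."""
    last_line = [l for l in netperf_output.splitlines() if l.strip()][-1]
    return float(last_line.split()[-1])

print(transactions_per_sec(SAMPLE_OUTPUT))  # 9500.25
```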
This test is a rather simple measurement of the round trip time (RTT) between two VMs. The client opens a TCP connection to the server, and each transaction consists of 1024 bytes transferred from source to target, then 1024 bytes back from target to source; when one transaction completes, the next one begins. (Note that TCP_RR reuses a single connection for all transactions – if you want per-connection setup/teardown cost included, that’s the TCP_CRR test.)
RTT in milliseconds [aka ms] = (1 / # of transactions per second reported by netperf) * 1000
RTT in microseconds [aka usec] = (RTT in ms) * 1000
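The two conversion formulas above are trivial, but for completeness here they are as code, with a worked example: 5000 transactions per second works out to 0.2 ms, or 200 usec, per round trip.

```python
# Convert a netperf TCP_RR transactions/sec figure into round trip time,
# following the formulas above.

def rtt_ms(transactions_per_sec: float) -> float:
    """RTT in milliseconds: one transaction = one full round trip."""
    return (1.0 / transactions_per_sec) * 1000.0

def rtt_usec(transactions_per_sec: float) -> float:
    """RTT in microseconds."""
    return rtt_ms(transactions_per_sec) * 1000.0

print(rtt_ms(5000.0))    # 0.2
print(rtt_usec(5000.0))  # 200.0
```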
How much latency does the Distributed Firewall by itself introduce?
In this case, we’re referring to deployments where only the NSX Distributed Firewall is used – no VXLANs, no ESGs, no DLRs, just normal VLANs with the DFW module doing stateful firewalling. Before we do any testing, it probably makes sense to explain where the NSX Distributed Firewall (aka DFW) inserts itself into the communication path:
The following diagram shows a two host ESXi cluster with a total of 4 VMs. Each host has a physical NIC connected to a physical switch, but that only comes into play if VMs on different hosts want to talk. If two VMs on the same host want to communicate, no physical network IO occurs – it is all simulated in the Virtual Switch. This kind of setup performs at line rate in most cases – the kinks in virtual switches were worked out years ago. This arrangement is pretty standard in most datacenters these days.
The next diagram shows where the NSX Distributed Firewall inserts itself – into the communication paths that already exist between VMs and the Virtual Switch, as well as between physical NICs and the virtual switch.
The virtual switch already has to do all of the packet forwarding for VMs, just like a physical switch would do for physical servers. Since the ESXi kernel is already looking at the packets, it’s no big deal to add some code that says “hey, while you’re in there, do some L4 stateful firewalling”. This is the fundamental reason that the NSX Distributed Firewall performs so well.
4 basic scenarios were tested:
Bottom line is this: enabling the NSX Distributed Firewall with 100 rules per VM only adds 13.3-15.7 microseconds of latency to the RTT.
Some folks at this point express interest in understanding specifically where the non-DFW portions of the latency are coming from. The short answer is that it’s mostly coming from the OS inside the VMs.
Here is a breakdown of the latency for each step:
This is what it looks like relative to the other sources of latency:
In summary, DFW only adds a small percentage to the latency that is already happening in your virtual networking environment right now – which itself is negligible.
That’s it for Part 1. Upcoming posts in this series will deal with VXLAN, ESGs, DLRs, and L2 Bridging.