I've set up a test system with two Dell R730 servers, each with a ConnectX-3 Pro NIC, connected by a 40GE cable. I followed the BIOS tuning guide for the R730 (https://community.mellanox.com/docs/DOC-2631) and the VMA Performance Tuning Guide (https://community.mellanox.com/docs/DOC-2797) very carefully and made sure I understood every step. I then ran the VMA latency test with:
sudo LD_PRELOAD=libvma.so VMA_SPEC=latency numactl --cpunodebind=1 taskset -c 33 sockperf sr -i 192.168.48.2
and
sudo LD_PRELOAD=libvma.so VMA_SPEC=latency numactl --cpunodebind=1 taskset -c 33 sockperf pp -i 192.168.48.2 -t 10
I've checked that the NICs are in the correct slots with a PCIe x16 link width and are attached to NUMA node 1. However, the test shows a surprisingly high maximum latency of about 162 us, while the median is only about 1.2 us:
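For anyone who wants to repeat the NUMA check, here is a minimal sketch of how it can be done from sysfs. It assumes the standard `numa_node` and `local_cpulist` attributes under the NIC's PCI device directory. The function names and the `SYSFS_ROOT` override are hypothetical helpers introduced only for illustration, and the interface name `eth2` in the usage comment is a placeholder, not the name on my systems. The range parser handles only plain `a-b` and single-CPU entries.

```shell
# Print the NUMA node the NIC's PCI device is attached to (-1 if unknown).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
nic_numa_node() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/device/numa_node"
}

# Print the CPUs local to the NIC as a kernel cpulist, e.g. "14-27,42-55".
nic_local_cpus() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/device/local_cpulist"
}

# Return 0 if CPU number $1 is contained in the cpulist string $2.
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    for range in $2; do
        lo=${range%-*}   # "14-27" -> 14, "3" -> 3
        hi=${range#*-}   # "14-27" -> 27, "3" -> 3
        if [ "$cpu" -ge "$lo" ] && [ "$cpu" -le "$hi" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage (eth2 is a placeholder interface name):
#   nic_numa_node eth2
#   cpu_in_list 33 "$(nic_local_cpus eth2)" && echo "core 33 is NIC-local"
```

The PCIe link width is not covered here. It can be read from the `LnkSta` line of `sudo lspci -vv` for the NIC's PCI address.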
Test result of UDP ping-pong with VMA (sockperf output, latencies in usec):

| Statistic | Latency (usec) |
|---|---|
| MAX observation | 162.336 |
| percentile 99.999 | 6.488 |
| percentile 99.990 | 4.949 |
| percentile 99.900 | 2.099 |
| percentile 99.000 | 1.705 |
| percentile 90.000 | 1.409 |
| percentile 75.000 | 1.356 |
| percentile 50.000 | 1.179 |
| percentile 25.000 | 1.135 |
| MIN observation | 1.075 |
So I'm wondering what could cause this very high worst-case latency and what can be done to reduce it. I also ran the same test without VMA. It gives a higher typical latency of about 6 us, but the worst-case latency is much lower, at about 21 us:
Test result of UDP ping-pong without VMA (sockperf output, latencies in usec):

| Statistic | Latency (usec) |
|---|---|
| MAX observation | 21.201 |
| percentile 99.999 | 9.604 |
| percentile 99.990 | 8.219 |
| percentile 99.900 | 7.626 |
| percentile 99.000 | 6.796 |
| percentile 90.000 | 6.318 |
| percentile 75.000 | 6.147 |
| percentile 50.000 | 5.937 |
| percentile 25.000 | 5.848 |
| MIN observation | 5.561 |
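To compare the two runs, one rough figure is the ratio of the maximum to the median. Here is a small sketch that reads raw sockperf output on stdin and prints that ratio. The function name `tail_ratio` is a hypothetical helper of mine, and it assumes the `<MAX> observation` and `percentile 50.000` lines appear in the form sockperf printed for these runs.

```shell
# Read sockperf percentile lines on stdin; print max, median and max/median.
tail_ratio() {
    awk '
        /<MAX> observation/  { max = $NF }   # last field is the latency in usec
        /percentile 50\.000/ { med = $NF }
        END {
            if (med > 0)
                printf "max=%.3f median=%.3f ratio=%.1f\n", max, med, max / med
        }
    '
}

# Example with the two relevant lines from the VMA run:
printf '%s\n' \
    'sockperf: ---> <MAX> observation = 162.336' \
    'sockperf: ---> percentile 50.000 = 1.179' | tail_ratio
# prints: max=162.336 median=1.179 ratio=137.7
```

With the numbers above, the VMA run has a max/median ratio of roughly 138, against roughly 3.6 without VMA (21.201 / 5.937).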
So is this caused by VMA itself, or is there something else I should suspect? The OS is Ubuntu 14.04 with the low-latency 3.17 kernel.
Any advice is appreciated!
Regards,
Hongyuan