Hi,
I was hoping that someone would be able to help me.
I have 15 HP DL380 Gen9 servers running Windows Server 2012 R2, each with a dual-port HP ConnectX-3 VPI card connected to a Mellanox SX6036 switch.
The line speed is correctly displayed as 32Gbps (QDR), but we are not getting anywhere near that performance. Real-world RDMA speeds seem to max out at 25Gbps (which is OK but could be better), but the maximum speed we seem to get with IPoIB is 1Gbps. Below is an ntttcp test:
c:\temp\NTttcp-v5.31\x64>ntttcp.exe -r -m 8,*,10.167.255.111 -rb 2M -a 16 -t 30
Copyright Version 5.31
Network activity progressing...
Thread Time(s) Throughput(KB/s) Avg B / Compl
====== ======= ================ =============
0 30.062 1896.880 65536.000
1 30.046 1414.365 63434.262
2 30.124 1459.567 64410.918
3 30.093 2473.399 65536.000
4 30.062 1847.914 65536.000
5 30.062 2452.531 65365.777
6 30.047 2726.406 63310.495
7 30.047 2890.476 61291.891
##### Totals: #####
Bytes(MEG) realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
503.877392 30.069 3966.649 16.757
Throughput(Buffers/s) Cycles/Byte Buffers
===================== =========== =============
268.118 395.755 8062.038
DPCs(count/s) Pkts(num/DPC) Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
81.845 54.124 5762.646 0.769
Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
7198 133199 2 6 15.137
We have a second, similar environment: 9 HP DL380 servers with ConnectX-3 cards connected to an IS5022 switch. This environment performs as expected:
c:\temp\NTttcp-v5.31\x64>ntttcp.exe -s -m 8,*,192.168.84.10 -l 128k -a 2 -t 30
Copyright Version 5.31
Network activity progressing...
Thread Time(s) Throughput(KB/s) Avg B / Compl
====== ======= ================ =============
0 30.000 469060.267 131072.000
1 30.000 359970.133 131072.000
2 30.000 446084.267 131072.000
3 30.000 437909.333 131072.000
4 30.000 348608.000 131072.000
5 30.000 387993.600 131072.000
6 30.000 348654.933 131072.000
7 30.000 357444.267 131072.000
##### Totals: #####
Bytes(MEG) realtime(s) Avg Frame Size Throughput(MB/s)
================ =========== ============== ================
92452.875000 30.000 4037.074 3081.762
Throughput(Buffers/s) Cycles/Byte Buffers
===================== =========== =============
24654.100 0.614 739623.000
DPCs(count/s) Pkts(num/DPC) Intr(count/s) Pkts(num/intr)
============= ============= =============== ==============
51664.567 1.719 72494.233 1.225
Packets Sent Packets Received Retransmits Errors Avg. CPU %
============ ================ =========== ====== ==========
24013397 2663789 15 6 6.892
The only major difference between the two environments is the switch. I'm pretty sure that the SX6036 is configured correctly, but something must be wrong if we are getting a throughput of about 16 MB/s compared with about 3081 MB/s!
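For comparison with the 32Gbps line speed, the two ntttcp totals can be converted from MB/s to Gbit/s. Below is a minimal Python sketch of that arithmetic. It assumes ntttcp's "MB" means 2^20 bytes, which is consistent with Buffers x buffer size equalling Bytes(MEG) in both outputs above; the function and variable names are only for illustration.

```python
# Convert ntttcp "Throughput(MB/s)" totals to Gbit/s.
# Assumption: ntttcp's MB is 2**20 bytes (consistent with
# Buffers x buffer size == Bytes(MEG) in both runs above).

BYTES_PER_MB = 2 ** 20


def mb_per_s_to_gbps(mb_per_s: float) -> float:
    """Return the rate in Gbit/s (10**9 bits per second)."""
    return mb_per_s * BYTES_PER_MB * 8 / 1e9


# Totals taken from the two ntttcp runs in this post.
sx6036 = mb_per_s_to_gbps(16.757)     # SX6036 environment
is5022 = mb_per_s_to_gbps(3081.762)   # IS5022 environment

print(f"SX6036 environment: {sx6036:.2f} Gbit/s")
print(f"IS5022 environment: {is5022:.2f} Gbit/s")
print(f"Ratio: {is5022 / sx6036:.0f}x")
```

Under that assumption, the SX6036 run works out to roughly 0.14 Gbit/s and the IS5022 run to roughly 25.85 Gbit/s, a difference of about 184x.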
Any help on this issue would be much appreciated. I can provide switch config and more details if required.
Thanks,
Zak