hi
I agree with Justin. Check that your PCIe slot is PCIe 3.0 x16, as those new cards are PCIe 3.0 based. If they sit in a PCIe 2.0 slot they will negotiate 5 GT/s per lane instead of 8 GT/s, and thus run at roughly half the bandwidth.
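If the host is a Linux box, one quick way to confirm what the card actually negotiated is to compare LnkCap against LnkSta in lspci. A minimal sketch, assuming lspci is installed and run as root (the link fields can be hidden otherwise), and noting that the exact lspci -vv output format varies a little between versions:

```python
# Sketch: compare a Mellanox adapter's PCIe capability with the link it
# actually negotiated, by parsing `lspci -vv` (run as root on Linux).
import re
import subprocess

out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True).stdout

for block in out.split("\n\n"):          # lspci separates devices with blank lines
    if "Mellanox" not in block:
        continue
    cap = re.search(r"LnkCap:.*?Speed ([\d.]+GT/s)[^,]*,\s*Width x(\d+)", block)
    sta = re.search(r"LnkSta:.*?Speed ([\d.]+GT/s)[^,]*,\s*Width x(\d+)", block)
    if cap and sta:
        print("Card capability :", cap.group(1), "x" + cap.group(2))
        print("Negotiated link :", sta.group(1), "x" + sta.group(2))
        # 8GT/s x16 = full PCIe 3.0; 5GT/s means the slot is running at gen-2 rates.
```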
Also check that your VMware build is at 799733 with the 1.8.1.0 Mellanox driver.
Next, ensure the card's firmware is up to date: Firmware for ConnectX®-3 EN - Mellanox Technologies
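On a Linux host with OFED installed, you can read the running firmware version with ibv_devinfo and compare it against the download page above. A small sketch (assumes ibv_devinfo is on the PATH, as on a stock OFED install):

```python
# Sketch: print the firmware version of each Mellanox HCA via ibv_devinfo
# (part of OFED), so it can be checked against the latest ConnectX-3 release.
import re
import subprocess

out = subprocess.run(["ibv_devinfo"], capture_output=True, text=True).stdout

device = None
for line in out.splitlines():
    if line.strip().startswith("hca_id:"):
        device = line.split(":", 1)[1].strip()
    m = re.search(r"fw_ver:\s*(\S+)", line)
    if m and device:
        print(f"{device}: firmware {m.group(1)}")
```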
I strongly advise using an FDR switch. Especially for IPoIB, the default opensm configuration is not optimized out of the box, so look into tuning opensm - e.g. the QoS policy, which is best handled on a managed switch rather than in a software opensm. You should read the OFED manual, which has the details on the opensm settings.
If you're on OFED 2, see http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v2.0-2.0.5.pdf - check out Chapter 8 (Performance) and Chapter 9 (OpenSM).
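If you do end up on the software opensm, it helps to see which QoS options it currently has before tuning them per Chapter 9. A minimal sketch, assuming the config lives at /etc/opensm/opensm.conf (the usual location on an OFED install; adjust the path if yours differs):

```python
# Sketch: list the QoS-related options in opensm's config (including
# commented-out defaults) so the current state is visible before tuning.
# Path assumed: /etc/opensm/opensm.conf on a typical OFED install.
from pathlib import Path

conf = Path("/etc/opensm/opensm.conf")

for line in conf.read_text().splitlines():
    option = line.strip().lstrip("#").strip()
    # opensm's QoS knobs all start with "qos" (qos, qos_max_vls, ...).
    if option.startswith("qos"):
        print(option)
```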
OFED 2 has improved IPoIB performance. Datagram mode (which runs over the unreliable datagram transport) uses a 4096 MTU; Connected mode (reliable connected transport) supports up to a 64K MTU. BE VERY AWARE THAT THE VMWARE DRIVER ONLY WORKS IN DATAGRAM MODE!!
IPoIB can run in two modes of operation: Connected mode and Datagram mode. By default, IPoIB is set to work in Datagram mode, except for the Connect-IB™ adapter card, which defaults to Connected mode. OFED 2 therefore works in Datagram mode by default.
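On a Linux peer (the VMware end is datagram-only anyway) you can check which mode and MTU each IPoIB interface is actually using via sysfs. A quick sketch, assuming the interfaces are named ib* as usual:

```python
# Sketch: show the IPoIB mode (datagram/connected) and MTU of every ib*
# interface on a Linux host. Since the VMware driver is datagram-only, the
# Linux side should normally be datagram with a 4096 MTU to match.
from pathlib import Path

for iface in sorted(Path("/sys/class/net").glob("ib*")):
    mode = (iface / "mode").read_text().strip()
    mtu = (iface / "mtu").read_text().strip()
    print(f"{iface.name}: mode={mode} mtu={mtu}")
```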
As a rule of thumb, we have ConnectX-1 & -2 (DDR and QDR) Mellanox/HP cards in HP blades on 20Gb/s and 40Gb/s infrastructure. On 20Gb/s DDR we can get 1900MB/s no problem, and on 40Gb/s QDR 3800MB/s. That's a raw I/O benchmark from IOMeter. Each C7000 chassis has dual DDR or QDR switches, connected to the core switches via a minimum of two paths.
However, in real-world terms on VMware ESXi 5 using iSCSI over IPoIB, we get 700-1100MB/s on DDR and 1200-1900MB/s on QDR. BUT CPU LOAD IS VERY HIGH on the VM running the tests, so a dual-core or quad-core VM is required for high I/O loads. Latency is also poorer on IPoIB. Thus we prefer SRP (an RDMA transport), with which we get near full bus speed on DDR or QDR, low latency, and low CPU overhead.
We run DRBD SANs with ALUA over SRP using SCST, which is very, very fast: DDR = 1500-1900MB/s and QDR = 2900-3900MB/s (the backend is an LSI RAID5 of 8 x SATA3 SSD drives at 80,000 IOPS each, with LESSFS/bcache mounted, no BTRFS LVMs).
We happily get 500,000 IOPS, and with some modifications to our LSI SSD RAID we expect to hit 1,000,000 IOPS with the new LSI SSD cards, which are also PCIe 3.0 x16.
Have fun,
Bruce M