Hello everyone,
I'm chasing a bit of assistance in troubleshooting an SRP over InfiniBand setup which I have at home. Essentially, I'm not seeing the disk I/O performance I was expecting between my SRP initiator and target, and I want to work out where the problem lies. I wanted to start at the InfiniBand infrastructure and work up from there: if I can verify that the InfiniBand fabric is set up correctly and performing as it should, I can then move on to troubleshooting the additional technologies and protocols layered on top of it.
Some basic information first:
SRP Target: Oracle Solaris v11.1 server with ZFS volumes presented as LUs (Logical Units).
SRP Initiator: VMware ESXi v5.5.
Mellanox MHGH28-XTC (MT25418) cards are used in both hosts above, directly connected to each other with a CX4 cable (no switch).
Now, to the best of my knowledge, the drivers, VIBs and configuration have all been set up correctly, and I'm at the point where my ESXi v5.5 host can actually see the LU, mount it and store data on it. At this stage it appears to be purely a performance issue that I'm trying to resolve.
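Before digging into SRP itself, my plan is to baseline raw InfiniBand throughput with the perftest tools (ib_send_bw ships with Mellanox OFED / the perftest package). I'm assuming here that a build exists for both platforms; I'm not certain there's an ib_send_bw binary for ESXi, so the initiator side may need a Linux live environment on the same hardware:

```shell
# On the SRP target (server side) - wait for a connection:
ib_send_bw

# On the SRP initiator (client side) - the IP address is only used to
# coordinate the test; the bandwidth run itself goes over the IB verbs layer.
# <target_ip> is a placeholder for the target's IPoIB (or management) address:
ib_send_bw <target_ip>
```

On a healthy DDR 4x (20 Gb/s) link I'd expect the reported bandwidth to be well above 1 GB/s; anything far below that would point at the fabric rather than the storage stack.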
Some CLI outputs below:
STORAGE-SERVER
STORAGE-SERVER:/# ibstat
CA 'mlx4_0'
CA type: 0
Number of ports: 2
Firmware version: 2.9.1000
Hardware version: 160
Node GUID: 0x001a4bffff0c6214
System image GUID: 0x001a4bffff0c6217
Port 1:
State: Active
Physical state: LinkUp
Rate: 20
Base lid: 2
LMC: 0
SM lid: 1
Capability mask: 0x00000038
Port GUID: 0x001a4bffff0c6215
Link layer: IB
Port 2:
State: Down
Physical state: Polling
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x00000038
Port GUID: 0x001a4bffff0c6216
Link layer: IB
VM-HYPER:
/opt/opensm/bin # ./ibstat
CA 'mlx4_0'
CA type: MT25418
Number of ports: 2
Firmware version: 2.7.0
Hardware version: a0
Node GUID: 0x001a4bffff0cb178
System image GUID: 0x001a4bffff0cb17b
Port 1:
State: Active
Physical state: LinkUp
Rate: 20
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x0251086a
Port GUID: 0x001a4bffff0cb179
Link layer: InfiniBand
Port 2:
State: Down
Physical state: Polling
Rate: 8
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x0251086a
Port GUID: 0x001a4bffff0cb17a
Link layer: InfiniBand
The "LIDs" in the above outputs indicate that the SM (Subnet Manager) is working as far as I'm aware.
From the SRP target, I can see the other Infiniband host:
STORAGE-SERVER:/# ibhosts
Ca : 0x001a4bffff0cb178 ports 2 "****************** HCA-1"
Ca : 0x001a4bffff0c6214 ports 2 "MT25408 ConnectX Mellanox Technologies"
I thought I'd start with using the "ibping" utility to verify Infiniband connectivity. This is where I got some really strange results:
Firstly, I could not get the ibping daemon running on the SRP initiator (ESXi) at all. The command would execute, but then immediately return to the shell:
/opt/opensm/bin # ./ibping -S
/opt/opensm/bin #
So I switched to running the ibping daemon on the SRP target (Oracle Solaris) instead, which seemed to work as it should: it sat there waiting for pings to come through. Great! Back on the SRP initiator, I ran the ibping utility against the LID of the SRP target, but it was unsuccessful:
/opt/opensm/bin # ./ibping -L 2
ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable
ibwarn: [3502756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable
ibwarn: [3502756] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [3502756] _do_madrpc: recv failed: Resource temporarily unavailable
...
..
.
--- (Lid 2) ibping statistics ---
10 packets transmitted, 0 received, 100% packet loss, time 9360 ms
rtt min/avg/max = 0.000/0.000/0.000 ms
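"Resource temporarily unavailable" is EAGAIN from the receive on the umad device, i.e. no reply arrived within the MAD timeout (1000 ms by default). Given the 7-15 ms RTTs I see later in the -dd run, one thing I intend to try is raising the timeout via the common infiniband-diags -t flag (assuming this build of ibping supports it):

```shell
# Ping LID 2 with a 5000 ms MAD reply timeout instead of the default 1000 ms:
./ibping -t 5000 -L 2
```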
OK, let's try the Port GUID of the SRP target instead of the LID:
/opt/opensm/bin # ./ibping -G 0x001a4bffff0c6215
ibwarn: [3504924] _do_madrpc: recv failed: Resource temporarily unavailable
ibwarn: [3504924] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 1)
ibwarn: [3504924] ib_path_query_via: sa call path_query failed
./ibping: iberror: failed: can't resolve destination port 0x001a4bffff0c6215
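This -G failure looks like the SA path-record query to the SM (LID 1, i.e. the opensm instance on the ESXi host itself) is what's failing, not the data path to LID 2. SMPs are handled on QP0 and don't involve the SA at all, so as a cross-check I could query the target directly by LID with smpquery (also from infiniband-diags):

```shell
# Query the NodeInfo of the node at LID 2 - bypasses the SA entirely:
smpquery nodeinfo 2

# And the PortInfo of port 1 at LID 2 (state, active width/speed):
smpquery portinfo 2 1
```

If these succeed while the SA query fails, that would narrow the problem down to the SA/SM service rather than the link itself.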
I restarted the ibping daemon on the SRP target with one level of debugging and re-ran the pings from the client (SRP initiator). I can see that the pings are actually reaching the SRP target and that replies are being sent:
STORAGE-SERVER:/# ibping -S -d
ibdebug: [11188] ibping_serv: starting to serve...
ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER
ibwarn: [11188] mad_respond_via: dest Lid 1
ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000
ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER
ibwarn: [11188] mad_respond_via: dest Lid 1
ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000
ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER
ibwarn: [11188] mad_respond_via: dest Lid 1
ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000
ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER
ibwarn: [11188] mad_respond_via: dest Lid 1
ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000
ibdebug: [11188] ibping_serv: Pong: STORAGE-SERVER
ibwarn: [11188] mad_respond_via: dest Lid 1
ibwarn: [11188] mad_respond_via: qp 0x1 class 0x32 method 129 attr 0x0 mod 0x0 datasz 0 off 0 qkey 80010000
The strangest observation, however, is yet to come. If I run ibping on the client with two levels of debugging, a few replies actually make it into the final statistics output when ibping is terminated (this does not happen with a single level of debugging, in my experience):
/opt/opensm/bin # ./ibping -L -dd 2
...
..
.
ibdebug: [3508744] ibping: Ping..
ibwarn: [3508744] ib_vendor_call_via: route Lid 2 data 0x3ffcebc7aa0
ibwarn: [3508744] ib_vendor_call_via: class 0x132 method 0x1 attr 0x0 mod 0x0 datasz 216 off 40 res_ex 1
ibwarn: [3508744] mad_rpc_rmpp: rmpp (nil) data 0x3ffcebc7aa0
ibwarn: [3508744] umad_set_addr: umad 0x3ffcebc7570 dlid 2 dqp 1 sl 0, qkey 80010000
ibwarn: [3508744] _do_madrpc: >>> sending: len 256 pktsz 320
send buf
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0001 8001 0000 0002 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0132 0101 0000 0000 0000 0000 4343 c235
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 1405 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
ibwarn: [3508744] umad_send: fd 3 agentid 1 umad 0x3ffcebc7570 timeout 1000
ibwarn: [3508744] umad_recv: fd 3 umad 0x3ffcebc7170 timeout 1000
ibwarn: [3508744] umad_recv: mad received by agent 1 length 320
ibwarn: [3508744] _do_madrpc: rcv buf:
rcv buf
0132 0181 0000 0000 0000 00ac 4343 c234
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 1405 6763 2d73 746f 7261
6765 312e 6461 726b 7265 616c 6d2e 696e
7465 726e 616c 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
ibwarn: [3508744] umad_recv: fd 3 umad 0x3ffcebc7170 timeout 1000
ibwarn: [3508744] umad_recv: mad received by agent 1 length 320
ibwarn: [3508744] _do_madrpc: rcv buf:
rcv buf
0132 0181 0000 0000 0000 00ac 4343 c235
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 1405 6763 2d73 746f 7261
6765 312e 6461 726b 7265 616c 6d2e 696e
7465 726e 616c 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
ibwarn: [3508744] mad_rpc_rmpp: data offs 40 sz 216
rmpp mad data
6763 2d73 746f 7261 6765 312e 6461 726b
7265 616c 6d2e 696e 7465 726e 616c 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000
Pong from STORAGE-SERVER (Lid 2): time 7.394 ms
ibdebug: [3508744] report: out due signal 2
--- STORAGE-SERVER (Lid 2) ibping statistics ---
10 packets transmitted, 3 received, 70% packet loss, time 9556 ms
rtt min/avg/max = 7.394/12.335/15.344 ms
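One theory: the extra debug printing slows ibping's send/receive loop enough that late replies still land inside the timeout window, which would point at either severe latency or a lossy link. Since this is a direct CX4 cable, my next step is to check the port error counters on both ends with perfquery (from infiniband-diags); symbol errors or receive errors climbing during a test run would implicate the cable or a port:

```shell
# Read the PortCounters of port 1 on the HCA at LID 1:
perfquery 1 1

# Reset the counters, run a test, then re-read to see the deltas:
perfquery -R 1 1
perfquery 1 1
```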
I'm stumped. Does anyone have any ideas on what is going on, or on how to troubleshoot this further?