Channel: Mellanox Interconnect Community: Message List

Trouble with ConnectX-3 VPI VFs with SR-IOV


Hi,

 

I am trying to get VFs working on the IB card so that I can pass them through to KVM guests. Following the steps in "HowTo Configure SR-IOV for ConnectX-3 with KVM (InfiniBand)", I run into trouble after restarting openibd in the step "Enable SR-IOV on the MLNX_OFED Driver". The relevant snippets from the dmesg output are below (see the attachment for further detail):

 

[  37.547412] mlx4_core: device is working in RoCE mode: Roce V1

[   37.572033] mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead

[   37.623776] mlx4_core: UD QP Gid type is: V1

[   39.430768] mlx4_core 0000:41:00.0: Enabling SR-IOV with 4 VFs

[   39.562398] pci 0000:41:00.1: [15b3:1004] type 00 class 0x028000

[   39.569757] mlx4_core: Initializing 0000:41:00.1

[   39.597827] mlx4_core 0000:41:00.1: enabling device (0000 -> 0002)

[   39.627547] mlx4_core 0000:41:00.1: Detected virtual function - running in slave mode

[   39.684547] mlx4_core 0000:41:00.1: PF is not ready - Deferring probe

[   39.714917] pci 0000:41:00.1: Driver mlx4_core requests probe deferral

[   39.744881] pci 0000:41:00.2: [15b3:1004] type 00 class 0x028000

[   39.752156] mlx4_core: Initializing 0000:41:00.2

[   39.782028] mlx4_core 0000:41:00.2: enabling device (0000 -> 0002)

[   39.813140] mlx4_core 0000:41:00.2: Skipping virtual function:2

[   39.843525] pci 0000:41:00.3: [15b3:1004] type 00 class 0x028000

[   39.850805] mlx4_core: Initializing 0000:41:00.3

[   39.879927] mlx4_core 0000:41:00.3: enabling device (0000 -> 0002)

[   39.909787] mlx4_core 0000:41:00.3: Skipping virtual function:3

[   39.939078] pci 0000:41:00.4: [15b3:1004] type 00 class 0x028000

[   39.946361] mlx4_core: Initializing 0000:41:00.4

[   39.974914] mlx4_core 0000:41:00.4: enabling device (0000 -> 0002)

[   40.004714] mlx4_core 0000:41:00.4: Skipping virtual function:4

[   40.033411] mlx4_core 0000:41:00.0: Running in master mode

 

--- Stacks of MSI/MSI-X messages later ---

 

[   40.582243] mlx4_core: Initializing 0000:41:00.1

[   40.610237] mlx4_core 0000:41:00.1: enabling device (0000 -> 0002)

[   40.639442] mlx4_core 0000:41:00.1: Detected virtual function - running in slave mode

[   40.694489] mlx4_core 0000:41:00.1: Sending reset

[   40.722845] mlx4_core 0000:41:00.0: Received reset from slave:1

[   40.750438] mlx4_core 0000:41:00.1: Sending vhcr0

[   40.777898] AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.1 domain=0x0000 address=0x00000037f7bde000 flags=0x0050]

[   40.833233] AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.1 domain=0x0000 address=0x00000037f7bde040 flags=0x0050]

[   40.890985] AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.1 domain=0x0000 address=0x00000037f7bde080 flags=0x0050]

[   40.949797] AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.1 domain=0x0000 address=0x00000037f7bde0c0 flags=0x0050]

[   46.047238] mlx4_core 0000:41:00.0: command 0x2e failed: fw status = 0x1

[   46.077884] mlx4_core 0000:41:00.0: mlx4_master_process_vhcr: Failed reading vhcr ret: 0xfffffffb

[   46.139267] mlx4_core 0000:41:00.0: Failed processing vhcr for slave:1, resetting slave

[   46.203088] mlx4_core 0000:41:00.0: Turn on internal error to force reset, slave=1, cmd=0x5

[   46.268572] mlx4_core 0000:41:00.0: slave:1 is out of sync, cmd=0x5, last command=0x0, reset is needed

[   46.336826] mlx4_core 0000:41:00.0: Turn on internal error to force reset, slave=1, cmd=0x5

[   46.406515] mlx4_core 0000:41:00.0: slave:1 is out of sync, cmd=0x5, last command=0x0, reset is needed

[   46.476511] mlx4_core 0000:41:00.0: Turn on internal error to force reset, slave=1, cmd=0x5

[   46.546482] mlx4_core 0000:41:00.1: HCA minimum page size:1

[   46.582122] mlx4_core 0000:41:00.0: slave:1 is out of sync, cmd=0x5, last command=0x0, reset is needed

[   46.653173] mlx4_core 0000:41:00.0: Turn on internal error to force reset, slave=1, cmd=0x5

[   46.725318] mlx4_core 0000:41:00.1: The host supports neither eth nor rdma interfaces

[   46.799557] mlx4_core 0000:41:00.1: QUERY_FUNC_CAP general command failed, aborting (-93)

[   46.873709] mlx4_core 0000:41:00.1: Failed to obtain slave caps

[   46.911030] mlx4_core 0000:41:00.0: Received reset from slave:1

[   46.948493] mlx4_core: probe of 0000:41:00.1 failed with error -93
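
For anyone who wants to pull the AMD-Vi faults out of a longer log, here is a minimal Python sketch. It assumes the fault lines have exactly the format shown in the dmesg excerpt above; the function names `parse_io_page_faults` and `group_by_device` are made up for illustration and are not part of any Mellanox or kernel tooling.

```python
import re

# Matches lines in the format seen above, e.g.
# [   40.777898] AMD-Vi: Event logged [IO_PAGE_FAULT device=41:00.1
#   domain=0x0000 address=0x00000037f7bde000 flags=0x0050]
FAULT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+AMD-Vi: Event logged \[IO_PAGE_FAULT "
    r"device=(?P<device>[0-9a-fA-F:.]+) "
    r"domain=(?P<domain>0x[0-9a-fA-F]+) "
    r"address=(?P<address>0x[0-9a-fA-F]+) "
    r"flags=(?P<flags>0x[0-9a-fA-F]+)\]"
)


def parse_io_page_faults(dmesg_text):
    """Return one dict per IO_PAGE_FAULT line found in the given text."""
    faults = []
    for line in dmesg_text.splitlines():
        match = FAULT_RE.search(line)
        if match is None:
            continue  # not a fault line, skip it
        faults.append({
            "timestamp": float(match.group("ts")),
            "device": match.group("device"),
            "domain": int(match.group("domain"), 16),
            "address": int(match.group("address"), 16),
            "flags": int(match.group("flags"), 16),
        })
    return faults


def group_by_device(faults):
    """Map each PCI device (bus:dev.fn) to the list of faulting addresses."""
    by_device = {}
    for fault in faults:
        by_device.setdefault(fault["device"], []).append(fault["address"])
    return by_device


if __name__ == "__main__":
    import sys
    # Usage (assumed): dmesg | python3 this_script.py
    for device, addresses in group_by_device(
            parse_io_page_faults(sys.stdin.read())).items():
        print(device, [hex(a) for a in addresses])
```

In the excerpt above, all four faults come from 41:00.1 (the first VF) and the addresses step by 0x40 each time, starting right after the VF logs "Sending vhcr0".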

 

I am concerned about the AMD-Vi messages; googling them doesn't turn up many relevant answers. I am running Ubuntu Trusty 14.04 (3.16 kernel, also tried 4.2) with the latest OFED 3.3 (also tried 3.2).

 

The card is a dual port CX3 VPI with port 1 connected at FDR:

PSID:                MT_1090120019

 

The hypervisor is a Dell C6145 sled with the latest firmware. SR-IOV is enabled in the BIOS, and the IOMMU is enabled via the kernel command line in grub. I'm coming from Intel land and am not too familiar with AMD. Does the following look right, or do I need something additional for IOMMU/HW virtualization/SR-IOV?

 

[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.16.0-71-generic root=UUID=bc67403d-a8e1-4e30-bf48-36ffeecd04e0 ro iommu=pt

[    4.167159] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40

[    4.167163] AMD-Vi: Found IOMMU at 0000:40:00.2 cap 0x40

[    4.167166] AMD-Vi: Interrupt remapping enabled

[    4.167664] AMD-Vi: Initialized for Passthrough Mode

 

The VFs do show up in lspci, but they seem non-functional:

 

41:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

41:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

41:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

41:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

41:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
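
Since lspci only shows that the VFs were enumerated, a quick way to see whether a driver actually bound to each one is to look at sysfs. Below is a minimal Python sketch, assuming the standard Linux sysfs layout in which the PF directory contains `virtfnN` symlinks and a bound device has a `driver` symlink. The function name `list_vfs` and the `sysfs_root` parameter are illustrative only.

```python
import os


def list_vfs(pf_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return (vf_index, vf_pci_address, driver_or_None) for each VF of a PF.

    pf_addr is the full PCI address of the physical function,
    e.g. "0000:41:00.0". sysfs_root is a parameter only so the
    function can be pointed at a different directory for testing.
    """
    pf_dir = os.path.join(sysfs_root, pf_addr)
    vfs = []
    for entry in os.listdir(pf_dir):
        if not entry.startswith("virtfn"):
            continue
        index = int(entry[len("virtfn"):])
        link = os.path.join(pf_dir, entry)
        # The virtfnN symlink points at the VF's own device directory.
        vf_addr = os.path.basename(os.path.realpath(link))
        # A "driver" symlink exists only if a driver is bound to the VF.
        driver_link = os.path.join(link, "driver")
        driver = None
        if os.path.exists(driver_link):
            driver = os.path.basename(os.path.realpath(driver_link))
        vfs.append((index, vf_addr, driver))
    return sorted(vfs)


if __name__ == "__main__":
    for index, addr, driver in list_vfs("0000:41:00.0"):
        print(f"VF {index}: {addr} driver={driver or 'none bound'}")
```

With the failed probe shown in the dmesg output, I would expect the first VF to report no bound driver, but that is an expectation rather than something verified here.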

 

modprobe options for mlx4_core:

 

options mlx4_core num_vfs=4 port_type_array=1,1 probe_vf=1

(Changing to probe_vf=0 doesn't help, and with probe_vf=1 no interfaces appear.)
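
To compare settings between attempts, the options line can be split into its parameters. The following is a small Python sketch that only parses the text of an `options` line like the one above; `parse_modprobe_options` is a hypothetical helper, and it does not validate the values against what mlx4_core accepts.

```python
def parse_modprobe_options(line):
    """Parse 'options <module> key=value ...' into (module, params).

    Comma-separated values (such as port_type_array=1,1) are returned
    as lists of strings; everything else stays a plain string.
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] != "options":
        raise ValueError("expected a line starting with 'options <module>'")
    module = parts[1]
    params = {}
    for token in parts[2:]:
        key, _, value = token.partition("=")
        params[key] = value.split(",") if "," in value else value
    return module, params


if __name__ == "__main__":
    module, params = parse_modprobe_options(
        "options mlx4_core num_vfs=4 port_type_array=1,1 probe_vf=1"
    )
    print(module, params)
```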

 

Thanks for any suggestions!

 

Cheers

