Dear Mellanox Gurus,
Would you please advise on the pros/cons of Omni-Path vs. Mellanox Ethernet vs. InfiniBand?
Thanks in advance + Happy Monday
Henry
Hi Jae-Hoon,
Thank you for your reply. I contacted support and got the switches upgraded to the latest firmware. For others looking for the latest (and, since the product is EOL, probably final) firmware version, it is EFM_PPC_M405EX EFM_1.1.3004.
Hello, everyone,
I want to understand the LED status on the Mellanox SN2100. Today I used 100G cables in the switch, connecting port 1 to port 3; the link comes up, but the LED flaps (blinks) about once per second. When I use a 100G SR4 optical transceiver instead, it blinks about once every 30 seconds. This is the first time I have used the SN2100, so is this normal? What should the LED status be? Thanks for your help in advance.
Good afternoon Colleagues!
I ask for your help connecting Cisco switches to Mellanox NEO.
After provisioning starts, it waits and then fails with a timeout error. Here is what's in the logs:
2017-11-15 20:54:06.273 job INFO performAction created a new job (20) for Provisioning
2017-11-15 20:54:06.300 job INFO performAction created sub-job (20.1) for device: 10.10.0.4
2017-11-15 20:54:06.301 job INFO Preparing job notification for job (20 - Provisioning), status:(New), progress: (0)
2017-11-15 20:54:06.301 job INFO job: (20.1), status: (New), progress: (0), device: (10.10.0.4)
2017-11-15 20:54:06.301 zmq INFO Send Message Topic:notification, category:notifications/jobs
2017-11-15 20:54:06.355 netservice INFO Performing action run_cli on devices
2017-11-15 20:54:06.355 netservice INFO commandline: [u'show running-config']
2017-11-15 20:54:06.355 netservice INFO arguments: {"globals": {}, "devices": {}}
2017-11-15 20:54:07.103 cli-facility INFO running : /opt/neo/providers/common/bin/providers/common/tools/clifacility/cli_facility.pyo --hosts /tmp/tmpMADHzY/devices.csv --listen-port 60374 --file /tmp/tmpMADHzY/commands.txt --pool-size 30 --operation-timeout 120
2017-11-15 20:54:07.115 cli-facility INFO Starting thread pool: size=30
2017-11-15 20:54:07.119 cli-facility INFO Handling Host: 10.10.0.4
2017-11-15 20:56:07.121 cli-facility WARNING Killing thread for context: 10.10.0.4
2017-11-15 20:56:07.121 cli-facility WARNING Timeout operation for Host(s): 10.10.0.4
2017-11-15 20:56:07.121 cli-facility INFO Result is ready...
2017-11-15 20:56:07.132 cli-facility INFO Sending result to client...
2017-11-15 20:56:07.132 job WARNING job 20.1 failed: Timeout while communicating with: 10.10.0.4
2017-11-15 20:56:07.133 job INFO updateJobStatus updated job (20.1) with status 32772
2017-11-15 20:56:07.133 job INFO Preparing job notification for job (20 - Provisioning), status:(Completed With Errors), progress: (100)
2017-11-15 20:56:07.133 job INFO job: (20.1), status: (Completed With Errors - Timeout while communicating with: 10.10.0.4), progress: (100), device: (10.10.0.4)
2017-11-15 20:56:07.135 cli-facility INFO Stopping thread pool
2017-11-15 20:56:07.134 zmq INFO Send Message Topic:notification, category:notifications/jobs
Thank you in advance!
Hi Ben,
When you say flap, do you mean the LED blinking?
Blinking once per second is probably STP BPDUs.
Blinking once every 30 seconds might be LLDP.
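A quick way to confirm either theory from the switch side would be something along these lines (a sketch; I am assuming the MLNX-OS CLI here, and the exact syntax may differ between releases, so treat the commands as an assumption):
# check whether spanning tree is running on those ports
show spanning-tree
# check whether a peer is being seen via LLDP on the port in question
show lldp interfaces ethernet 1/1 remote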
Hi,
I am noticing that when my ConnectX-4 25GbE card is connected to the 10GbE ports of an Arista 7150 switch, the link is not detected.
Based on the documents I have read, my understanding is that auto-negotiation should successfully bring the link up at 10Gbps.
I also tried disabling auto-negotiation and setting the port speed to 10Gbps, but it still does not work.
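For anyone reproducing this, forcing the speed on the host side was roughly the following (a sketch; eth1 is the interface name from my output below, and the exact values are an example rather than a verified recipe):
# disable auto-negotiation and force 10G full duplex on the ConnectX-4 port
ethtool -s eth1 autoneg off speed 10000 duplex full
# then re-check the link state
ethtool eth1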
The details:
Arista switch:
localhost>show version
Software image version: 4.12.7.1
Architecture: i386
Host:
linux-6cof:~ # cat /etc/SuSE-release
SUSE Linux Enterprise Server 12 (x86_64)
VERSION = 12
PATCHLEVEL = 3
# This file is deprecated and will be removed in a future service pack or release.
# Please check /etc/os-release for details about this release.
linux-6cof:~ # uname -r
4.4.73-5-default
linux-6cof:~ # ethtool eth1
Settings for eth1:
Supported ports: [ FIBRE Backplane ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Advertised link modes: 10000baseKR/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Speed: Unknown!
Duplex: Unknown! (255)
Port: FIBRE
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: d
Wake-on: d
Link detected: no
linux-6cof:~ # modinfo mlx5_core
filename: /lib/modules/4.4.73-5-default/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko
version: 3.0-1
license: Dual BSD/GPL
description: Mellanox Connect-IB, ConnectX-4 core driver
author: Eli Cohen <eli@mellanox.com>
Hi Eddie,
Yes, by flap I mean the LED blinking, and it is very regular. I will ping between the two switches and watch for the STP BPDUs; that may help us analyse the LED status. I will let you know the result when I finish. Thanks so much.
Hi,
I'm trying to get a better understanding of how to achieve near line-rate 40Gbps on the following adapter card:
[root@compute8 scripts]# lspci -vv -s 07:00.0 | grep "Part number" -A 3
[PN] Part number: MCX354A-FCBT
[EC] Engineering changes: A4
[SN] Serial number: MT1334U01416
[V0] Vendor specific: PCIe Gen3 x8.
Some system info
[root@compute8 scripts]# cat /etc/centos-release
CentOS Linux release 7.3.1611 (Core)
[root@compute8 scripts]# ofed_info -s
MLNX_OFED_LINUX-4.1-1.0.2.0:
[root@compute8 scripts]# uname -r
3.10.0-514.26.2.el7.x86_64
2 identical HP ProLiant DL360p Gen8 servers, each equipped with two quad-core Intel(R) Xeon(R) E5-2609 0 @ 2.40GHz CPUs and 32GB RAM. The performance profile on both servers is network-throughput (set via tuned).
The ConnectX-3 cards are connected back to back (no switch) with a 1m Mellanox FDR copper cable. They have been put into Ethernet mode, and I've followed the recommended optimization guide, Performance Tuning for Mellanox Adapters.
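(For completeness, switching the VPI ports to Ethernet mode can be done with mstconfig along these lines; this is a sketch using the PCI address from the lspci output above, where LINK_TYPE 2 means Ethernet, and a reboot or driver restart is needed afterwards:)
# force both ports of the ConnectX-3 into Ethernet mode
mstconfig -d 07:00.0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
# confirm the stored configuration
mstconfig -d 07:00.0 q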
The problem is achieving anything near the line speed of 40Gbps. I've tested with iperf2, since the "iperf, iperf2, iperf3" article recommends it (and advises against using iperf3):
Server side:
[root@compute7 ~]# iperf -v
..
..
Client side:
[root@compute8 scripts]# iperf -c 192.168.100.1 -P2
------------------------------------------------------------
Client connecting to 192.168.100.1, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.100.2 port 54430 connected with 192.168.100.1 port 5001
[ 4] local 192.168.100.2 port 54432 connected with 192.168.100.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 17.1 GBytes 14.7 Gbits/sec
[ 4] 0.0-10.0 sec 17.1 GBytes 14.7 Gbits/sec
[SUM] 0.0-10.0 sec 34.1 GBytes 29.3 Gbits/sec
[root@compute8 scripts]# iperf -c 192.168.100.1 -P3
------------------------------------------------------------
Client connecting to 192.168.100.1, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 192.168.100.2 port 54438 connected with 192.168.100.1 port 5001
[ 4] local 192.168.100.2 port 54434 connected with 192.168.100.1 port 5001
[ 3] local 192.168.100.2 port 54436 connected with 192.168.100.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 14.4 GBytes 12.4 Gbits/sec
[ 4] 0.0-10.0 sec 15.2 GBytes 13.1 Gbits/sec
[ 3] 0.0-10.0 sec 15.3 GBytes 13.1 Gbits/sec
[SUM] 0.0-10.0 sec 44.9 GBytes 38.6 Gbits/sec
[root@compute8 scripts]# iperf -c 192.168.100.1 -P4
------------------------------------------------------------
Client connecting to 192.168.100.1, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.100.2 port 54446 connected with 192.168.100.1 port 5001
[ 4] local 192.168.100.2 port 54440 connected with 192.168.100.1 port 5001
[ 5] local 192.168.100.2 port 54444 connected with 192.168.100.1 port 5001
[ 3] local 192.168.100.2 port 54442 connected with 192.168.100.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 11.0 GBytes 9.47 Gbits/sec
[ 4] 0.0-10.0 sec 12.4 GBytes 10.6 Gbits/sec
[ 5] 0.0-10.0 sec 13.0 GBytes 11.2 Gbits/sec
[ 3] 0.0-10.0 sec 8.09 GBytes 6.95 Gbits/sec
[SUM] 0.0-10.0 sec 44.5 GBytes 38.2 Gbits/sec
So it seems a minimum of 3 threads is needed to get close to line speed; increasing the thread count above 3 doesn't improve anything. While running the above tests, I observed a lot of activity in /proc/interrupts for ens2 (port 1 of the ConnectX-3), which means interrupts are being generated to request CPU time. This should not happen when RDMA is in use, and I've confirmed RDMA is working using some of the tools (ib_send_bw, rping, udaddy, rdma_server, etc.) mentioned in HowTo Enable, Verify and Troubleshoot RDMA.
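(For reference, watching the interrupt activity during a run is as simple as something like the following; ens2 is port 1 of the ConnectX-3 as noted above:)
# refresh the per-CPU interrupt counters for the ConnectX-3 port every second
watch -n 1 'grep ens2 /proc/interrupts'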
Why do these Mellanox utilities perform as intended? Is the answer their built-in RDMA support?
Further, using perf gives me some details:
[root@compute8 scripts]# perf stat -e cpu-migrations,context-switches,task-clock,cycles,instructions,cache-references,cache-misses iperf -c 192.168.100.1 -P4
------------------------------------------------------------
Client connecting to 192.168.100.1, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 6] local 192.168.100.2 port 54470 connected with 192.168.100.1 port 5001
[ 4] local 192.168.100.2 port 54464 connected with 192.168.100.1 port 5001
[ 3] local 192.168.100.2 port 54466 connected with 192.168.100.1 port 5001
[ 5] local 192.168.100.2 port 54468 connected with 192.168.100.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 10.6 GBytes 9.08 Gbits/sec
[ 4] 0.0-10.0 sec 11.5 GBytes 9.85 Gbits/sec
[ 3] 0.0-10.0 sec 12.4 GBytes 10.7 Gbits/sec
[ 5] 0.0-10.0 sec 10.1 GBytes 8.69 Gbits/sec
[SUM] 0.0-10.0 sec 44.6 GBytes 38.3 Gbits/sec
Performance counter stats for 'iperf -c 192.168.100.1 -P4':
126 cpu-migrations # 0.005 K/sec
11,934 context-switches # 0.446 K/sec
26730.400620 task-clock (msec) # 2.666 CPUs utilized
63,926,425,845 cycles # 2.392 GHz
25,417,772,891 instructions # 0.40 insn per cycle
1,786,983,037 cache-references # 66.852 M/sec
446,840,327 cache-misses # 25.005 % of all cache refs
10.025755759 seconds time elapsed
For instance, I observe a high number of CPU context switches, which are very costly.
But after some research I discovered http://ftp100.cewit.stonybrook.edu/rperf. Using the rperf server and client, I was able to achieve near line speed without any further effort or additional threads:
Server side:
root@compute7 ~]# rperf -s -p 5001 -l 500M -H
...
Client side:
[root@compute8 scripts]# perf stat -e cpu-migrations,context-switches,task-clock,cycles,instructions,cache-references,cache-misses rperf -c $IP -p 5001 -H -G pw -l 500M -i 2
------------------------------------------------------------
RDMA Client connecting to 192.168.100.1, TCP port 5001
TCP window size: -1.00 Byte (default)
------------------------------------------------------------
[ 4] local 192.168.100.2 port 40580 connected with 192.168.100.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 2.0 sec 8.79 GBytes 37.7 Gbits/sec
[ 4] 2.0- 4.0 sec 9.28 GBytes 39.8 Gbits/sec
[ 4] 4.0- 6.0 sec 8.79 GBytes 37.7 Gbits/sec
[ 4] 6.0- 8.0 sec 9.28 GBytes 39.8 Gbits/sec
[ 4] 8.0-10.0 sec 9.28 GBytes 39.8 Gbits/sec
[ 4] 0.0-10.1 sec 45.9 GBytes 39.1 Gbits/sec
Performance counter stats for 'rperf -c 192.168.100.1 -p 5001 -H -G pw -l 500M -i 2':
13 cpu-migrations # 0.007 K/sec
1,348 context-switches # 0.734 K/sec
1836.487997 task-clock (msec) # 0.152 CPUs utilized
4,393,238,230 cycles # 2.392 GHz
9,201,275,892 instructions # 2.09 insn per cycle
26,855,320 cache-references # 14.623 M/sec
23,419,862 cache-misses # 87.208 % of all cache refs
12.084867922 seconds time elapsed
Note that the number of CPU context switches is very low compared to the iperf run, as is the CPU utilization (task-clock in msec). Monitoring /proc/loadavg also showed low CPU utilization.
Another important observation I made is that only a few interrupts are generated, as seen in /proc/interrupts, and, more importantly, the transmitted and received packet counters in /proc/net/dev on the client side do not change while rperf is running. This clearly indicates that RDMA is being used here to move data from the application directly into the server, bypassing the kernel.
Also, I've found that tuning the MTU to 9000 is vital; with default settings even rperf performs pretty badly!
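Setting the MTU was just the usual ip command on both hosts (a sketch; the interface name is mine, and you still need to make it persistent in your distro's network configuration):
# enable jumbo frames on both ends of the back-to-back link
ip link set dev ens2 mtu 9000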
According to Vangelis' post Cannot get 40Gbps on Ethernet mode with ConnectX-3 VPI, he was able to achieve near line speed using iperf2 without needing multiple threads. Why am I no longer able to? Has RDMA support been removed from iperf2 (and was it there at some point earlier)?
What matters in the end is not the benchmarks but the actual applications the systems will run on Linux with the setup described above. How, and will, they be able to take advantage of RDMA and achieve anything close to line speed? Does each running application have to explicitly support RDMA in order to reach these high speeds?
Hello,
I realize I may be better off posting on the HP forum, but frankly I have always found this forum the most knowledgeable when it comes to anything InfiniBand.
I have a number of HPE BL460c blades in which I have installed HP FDR IB 545M adapters (702213-B21). I have installed the HPE IB drivers v5.35. Strangely, I could not find an installation guide or a manual for this card.
The adapters show up as Unknown Devices in Windows; I tried Windows Server 2012 R2 and 2016 with the same results.
I have updated the firmware on the adapters to the current 10.16.1058.
I have been working on this for a couple days and cannot get the cards to work.
I thought someone here might be able to shed some insight on what I might try.
Thanks,
Todd
With Windows Server 2016, documentation is available on SET with Mellanox cards in 40GbE Ethernet mode, with multiple VLANs and RDMA support. Is the same possible with a Mellanox card in IB mode running IPoIB with multiple PKeys, if the infrastructure is IB-based and an IB-to-IP gateway is available? part_man.exe creates a new NIC for each PKey specified. Do you create multiple SETs, one per PKey, based on those "virtual" NICs created by part_man.exe? Will the SET still have RDMA capability?
Hello,
I recently got a ConnectX-4, and I found that I cannot set the RX/TX channels using ethtool as I could with the ConnectX-3 Pro. If I check, I see:
# ethtool -l ens5f0
Channel parameters for ens5f0:
Pre-set maximums:
RX: 0
TX: 0
Other: 512
Combined: 36
# ethtool -L ens5f0 tx 3
Cannot set device channel parameters: Invalid argument
which seems a bit weird. I had a look at the documentation and found nothing in this respect. Is this expected? I am using 4.1.1.0.2, and:
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.20.1030
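One more data point, in case it is relevant: my guess is that mlx5 only exposes channels as "combined" rather than as separate RX/TX queues, so something of the following form may be the intended way to change the queue count (an assumption on my part, not something I have verified against the documentation):
# adjust the number of combined channels instead of rx/tx separately
ethtool -L ens5f0 combined 16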
Thanks a lot for any help!
L3D
Sorry if it's a noob question: can I connect an Intel Omni-Path host interface card (PCIe) to a Mellanox SX6012 switch?
I think I configured everything on the workstation, but when I plug the cable into the Mellanox switch, I never see the lights come on.
The status of the OPA interface card shows it is either "polling" or "disconnected".
Thanks!
You can change to ETH + ETH mode via firmware tools from Mellanox...:)
BR,
Jae-Hoon Choi
IHAC with 2 questions:
1. The SX36xx series management modules include different Mellanox OS versions. Can the higher OS version be copied to the module running the lower version?
2. I need the release notes for Mellanox OS 3.5.1006-000, plus the supporting documents for CRA; please send me a link for CRA.
Thanks,
Regards,
tetsu
Hello Mellanox people
We have built an OpenStack implementation on a Cisco ACI network: Cisco 9Ks in a spine/leaf topology. All the cloud nodes are configured with dual-port ConnectX-3 NICs at 40Gb. The ports are LACP-bonded to separate leaf switches and cabled with twinax. We plan on growing this particular cloud's compute base by >400%, meaning we will add another 100+ compute nodes, more 9Ks, and more Mellanox NICs. We are concerned about this because we have so many problems with what we have right now.
We are having huge problems, to the point where many applications have stopped; they either crash or time out waiting on I/O. Storage for our cloud is over the network, in the form of a Ceph cluster, so network latency is important.
Host configuration:
NIC: Mellanox Ethernet Controller MT27500 - ConnectX-3 Dual Port 40Gbe QSFP+ - Device 0079
Red Hat 7.2 kernel 3.10.0-514 (which is in the 7.3 tree)
Mellanox driver: Stock driver 2.2-1 from red hat kernel package
/etc/modprobe.d/mlx4.conf left as is and tried:
options mlx4_en pfctx=3 pfcrx=3
Symptoms:
Network latency, and possibly even packet loss. I can't prove this yet, but I believe packets are disappearing. This causes outages and outright failures of client applications and services.
Ceph (our storage cluster) has huge problems: random 5-10 second outages. I believe it's packet loss. Red Hat says our Ceph problems are caused by network problems and won't support it until that is fixed, so they agree with me!
Cloud hosts drop millions of packets; the packet drop rate is directly proportional to data rates.
Cloud hosts send a lot of pause frames, even at low data rates (50-100/s). Is this normal?
Cloud hosts receive no pause frames.
Our network support people say they are seeing a ton of pause frames and a large number of buffer drops on the switch uplinks.
Question:
Does anyone have any experience with Mellanox cards in a Cisco ACI environment?
I tried to enable priority-based flow control via the driver configuration, but I couldn't tell whether it was enabled or not. The cards send out pause frames, but I can't tell what type. Below are the settings:
cat /sys/module/mlx4_en/parameters/pfctx
0
cat /sys/module/mlx4_en/parameters/pfcrx
0
My theory is that we have a flow control problem and both the switch and the cards are confused. My only concern is that even at low data rates the cards are sending out 50-100 pause frames a second, and hundreds per second at higher rates. Is this normal?
When under load, say 5-7 Gb/s bursts of traffic, we get 100-200 dropped packets a second. One system can have 100 million dropped packets on the bond and/or the physical NICs.
If it's not a flow control problem, what could be another root cause, so to speak? Sorry, this is a big question with a lot of factors.
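A way to watch the pause and drop counters on the hosts is along these lines (a sketch; the interface names are placeholders, and the exact counter names exposed by mlx4_en are an assumption, so grep broadly):
# per-NIC pause-frame and drop counters from the driver statistics
ethtool -S ens2f0 | grep -Ei 'pause|drop'
# bond-level and interface-level drops as seen by the kernel
ip -s link show bond0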
Any help is a huge help. We will be growing this environment, but we don't want to commit to these specific switches/NICs until we can get this working.
Cheers
Rocke
Hello
Release 2.2-1 of the Red Hat driver shows support for 802.1Qbb priority-based flow control. Is the default LLFC or port-based flow control? Which flow control is enabled by default? I have tried making changes to /etc/modprobe.d/mlx4.conf to set PFC.
My entry:
options mlx4_en pfctx=3 pfcrx=3
Will this work?
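In case it matters, the way I intend to apply and verify the setting is roughly this (a sketch; it assumes the interface is not in use while the module is reloaded, otherwise a reboot is needed):
# reload mlx4_en so the new module options take effect
modprobe -r mlx4_en && modprobe mlx4_en
# confirm the values the driver actually picked up
cat /sys/module/mlx4_en/parameters/pfctx /sys/module/mlx4_en/parameters/pfcrx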
Is there any progress with SR-IOV? We already have ESXi 6.5 U1 and it still does not work :/. The native driver still has no support for max_vfs (esxcli system module parameters list -m nmlx4_core); tested on ConnectX-3 EN:
enable_64b_cqe_eqe | int | Enable 64 byte CQEs/EQEs when the the FW supports this |
Values : 1 - enabled, 0 - disabled
Default: 0
enable_dmfs | int | Enable Device Managed Flow Steering |
Values : 1 - enabled, 0 - disabled
Default: 1
enable_qos | int | Enable Quality of Service support in the HCA |
Values : 1 - enabled, 0 - disabled
Default: 0
enable_rocev2 | int | Enable RoCEv2 mode for all devices |
Values : 1 - enabled, 0 - disabled
Default: 0
enable_vxlan_offloads int | Enable VXLAN offloads when supported by NIC |
Values : 1 - enabled, 0 - disabled
Default: 1
log_mtts_per_seg | int | Log2 number of MTT entries per segment |
Values : 1-7
Default: 3
log_num_mgm_entry_size int | Log2 MGM entry size, that defines the number of QPs per MCG, for example: value 10 results in 248 QP per MGM entry |
Values : 9-12
Default: 12
msi_x | int | Enable MSI-X |
Values : 1 - enabled, 0 - disabled
Default: 1
mst_recovery | int | Enable recovery mode(only NMST module is loaded) |
Values : 1 - enabled, 0 - disabled
Default: 0
rocev2_udp_port | int | Destination port for RoCEv2 |
Values : 1-65535 for RoCEv2
Default: 4791
I have a Ubuntu 16 server with a single-port ConnectX-3 HCA. I have enabled SR-IOV with 8 VFs on the HCA and configured the kernel with 'intel_iommu=on'. /etc/modprobe.d/mlx4.conf is configured to load and probe all eight VFs. The VFs are listed by lspci, but the system does not probe them and create the virtual interfaces; dmesg indicates "Skipping virtual function" for all the VFs. All the instructions for configuring VFs that I have read indicate my configuration should probe the VFs. Can anyone point me towards what else needs to be configured to enable them?
Below are the details of the system and HCA configuration:
# uname -a
Linux cfmm-h2 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# lspci | grep Mellanox
05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
05:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:00.5 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:00.6 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:00.7 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
05:01.0 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
# cat /etc/modprobe.d/mlx4.conf
options mlx4_core num_vfs=8 probe_vf=8 port_type_array=1
# mlx4_core gets automatically loaded, load mlx4_en also (LP: #1115710)
softdep mlx4_core post: mlx4_en
# mstflint -d 05:00.0 q
Image type: FS2
FW Version: 2.36.5000
Product Version: 02.36.50.00
Rom Info: type=PXE version=3.4.718 devid=4099
Device ID: 4099
Description: Node Port1 Port2 Sys image
GUIDs: 248a070300ba8e20 248a070300ba8e21 248a070300ba8e22 248a070300ba8e23
MACs: 248a07ba8e21 248a07ba8e22
VSD:
PSID: DEL1100001019
# mstconfig -d 05:00.0 q
Device #1:
----------
Device type: ConnectX3
PCI device: 05:00.0
Configurations: Current
SRIOV_EN 1
NUM_OF_VFS 8
LINK_TYPE_P1 3
LINK_TYPE_P2 3
LOG_BAR_SIZE 3
BOOT_PKEY_P1 0
BOOT_PKEY_P2 0
BOOT_OPTION_ROM_EN_P1 0
BOOT_VLAN_EN_P1 0
BOOT_RETRY_CNT_P1 0
LEGACY_BOOT_PROTOCOL_P1 0
BOOT_VLAN_P1 1
BOOT_OPTION_ROM_EN_P2 0
BOOT_VLAN_EN_P2 0
BOOT_RETRY_CNT_P2 0
LEGACY_BOOT_PROTOCOL_P2 0
BOOT_VLAN_P2 1
# lspci -vv -s 05:00.0
05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 83
Region 0: Memory at 92400000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at 38000800000 (64-bit, prefetchable) [size=8M]
Expansion ROM at <ignored> [disabled]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: CX353A - ConnectX-3 QSFP
Read-only fields:
[PN] Part number: 079DJ3
[EC] Engineering changes: A03
[SN] Serial number: IL079DJ37403172G0033
[V0] Vendor specific: PCIe Gen3 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 104 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 252 byte(s) free
End
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number 24-8a-07-03-00-ba-8e-20
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
Capabilities: [18c v1] #19
Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 8, Total VFs: 8, Number of VFs: 8, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: 1004
Supported Page Size: 000007ff, System Page Size: 00000001
Region 2: Memory at 0000038001000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
# dmesg (edited for mlx4 related lines)
[ 0.000000] Command line: BOOT_IMAGE=/ROOT/ubuntu@/boot/vmlinuz-4.4.0-98-generic root=ZFS=rpool/ROOT/ubuntu ro swapaccount=1 intel_iommu=on
[ 3.540033] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[ 3.546709] mlx4_core: Initializing 0000:05:00.0
[ 9.511353] mlx4_core 0000:05:00.0: Enabling SR-IOV with 8 VFs
[ 9.620597] pci 0000:05:00.1: [15b3:1004] type 00 class 0x028000
[ 9.627586] pci 0000:05:00.1: Max Payload Size set to 256 (was 128, max 512)
[ 9.637048] iommu: Adding device 0000:05:00.1 to group 49
[ 9.643872] mlx4_core: Initializing 0000:05:00.1
[ 9.650716] mlx4_core 0000:05:00.1: enabling device (0000 -> 0002)
[ 9.658389] mlx4_core 0000:05:00.1: Skipping virtual function:1
[ 9.665756] pci 0000:05:00.2: [15b3:1004] type 00 class 0x028000
[ 9.672748] pci 0000:05:00.2: Max Payload Size set to 256 (was 128, max 512)
[ 9.682097] iommu: Adding device 0000:05:00.2 to group 50
[ 9.688952] mlx4_core: Initializing 0000:05:00.2
[ 9.695814] mlx4_core 0000:05:00.2: enabling device (0000 -> 0002)
[ 9.703545] mlx4_core 0000:05:00.2: Skipping virtual function:2
[ 9.710846] pci 0000:05:00.3: [15b3:1004] type 00 class 0x028000
[ 9.717871] pci 0000:05:00.3: Max Payload Size set to 256 (was 128, max 512)
[ 9.727364] iommu: Adding device 0000:05:00.3 to group 51
[ 9.734288] mlx4_core: Initializing 0000:05:00.3
[ 9.741154] mlx4_core 0000:05:00.3: enabling device (0000 -> 0002)
[ 9.748832] mlx4_core 0000:05:00.3: Skipping virtual function:3
[ 9.756277] pci 0000:05:00.4: [15b3:1004] type 00 class 0x028000
[ 9.763268] pci 0000:05:00.4: Max Payload Size set to 256 (was 128, max 512)
[ 9.772810] iommu: Adding device 0000:05:00.4 to group 52
[ 9.779724] mlx4_core: Initializing 0000:05:00.4
[ 9.786588] mlx4_core 0000:05:00.4: enabling device (0000 -> 0002)
[ 9.794256] mlx4_core 0000:05:00.4: Skipping virtual function:4
[ 9.801504] pci 0000:05:00.5: [15b3:1004] type 00 class 0x028000
[ 9.808499] pci 0000:05:00.5: Max Payload Size set to 256 (was 128, max 512)
[ 9.817815] iommu: Adding device 0000:05:00.5 to group 53
[ 9.824644] mlx4_core: Initializing 0000:05:00.5
[ 9.831359] mlx4_core 0000:05:00.5: enabling device (0000 -> 0002)
[ 9.838894] mlx4_core 0000:05:00.5: Skipping virtual function:5
[ 9.846081] pci 0000:05:00.6: [15b3:1004] type 00 class 0x028000
[ 9.853086] pci 0000:05:00.6: Max Payload Size set to 256 (was 128, max 512)
[ 9.862328] iommu: Adding device 0000:05:00.6 to group 54
[ 9.868962] mlx4_core: Initializing 0000:05:00.6
[ 9.875601] mlx4_core 0000:05:00.6: enabling device (0000 -> 0002)
[ 9.883002] mlx4_core 0000:05:00.6: Skipping virtual function:6
[ 9.890082] pci 0000:05:00.7: [15b3:1004] type 00 class 0x028000
[ 9.897070] pci 0000:05:00.7: Max Payload Size set to 256 (was 128, max 512)
[ 9.906218] iommu: Adding device 0000:05:00.7 to group 55
[ 9.912806] mlx4_core: Initializing 0000:05:00.7
[ 9.919380] mlx4_core 0000:05:00.7: enabling device (0000 -> 0002)
[ 9.926935] mlx4_core 0000:05:00.7: Skipping virtual function:7
[ 9.934160] pci 0000:05:01.0: [15b3:1004] type 00 class 0x028000
[ 9.941145] pci 0000:05:01.0: Max Payload Size set to 256 (was 128, max 512)
[ 9.950497] iommu: Adding device 0000:05:01.0 to group 56
[ 9.957354] mlx4_core: Initializing 0000:05:01.0
[ 9.964295] mlx4_core 0000:05:01.0: enabling device (0000 -> 0002)
[ 9.972187] mlx4_core 0000:05:01.0: Skipping virtual function:8
[ 9.979753] mlx4_core 0000:05:00.0: Running in master mode
[ 9.986885] mlx4_core 0000:05:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
[ 9.994178] mlx4_core 0000:05:00.0: PCIe link width is x8, device supports x8
[ 10.178999] mlx4_core: Initializing 0000:05:00.1
[ 10.186463] mlx4_core 0000:05:00.1: enabling device (0000 -> 0002)
[ 10.194814] mlx4_core 0000:05:00.1: Skipping virtual function:1
[ 10.202620] mlx4_core: Initializing 0000:05:00.2
[ 10.210054] mlx4_core 0000:05:00.2: enabling device (0000 -> 0002)
[ 10.218256] mlx4_core 0000:05:00.2: Skipping virtual function:2
[ 10.225917] mlx4_core: Initializing 0000:05:00.3
[ 10.233189] mlx4_core 0000:05:00.3: enabling device (0000 -> 0002)
[ 10.241256] mlx4_core 0000:05:00.3: Skipping virtual function:3
[ 10.248961] mlx4_core: Initializing 0000:05:00.4
[ 10.256108] mlx4_core 0000:05:00.4: enabling device (0000 -> 0002)
[ 10.264085] mlx4_core 0000:05:00.4: Skipping virtual function:4
[ 10.271598] mlx4_core: Initializing 0000:05:00.5
[ 10.278675] mlx4_core 0000:05:00.5: enabling device (0000 -> 0002)
[ 10.286606] mlx4_core 0000:05:00.5: Skipping virtual function:5
[ 10.294012] mlx4_core: Initializing 0000:05:00.6
[ 10.301002] mlx4_core 0000:05:00.6: enabling device (0000 -> 0002)
[ 10.308782] mlx4_core 0000:05:00.6: Skipping virtual function:6
[ 10.316133] mlx4_core: Initializing 0000:05:00.7
[ 10.323060] mlx4_core 0000:05:00.7: enabling device (0000 -> 0002)
[ 10.330772] mlx4_core 0000:05:00.7: Skipping virtual function:7
[ 10.338002] mlx4_core: Initializing 0000:05:01.0
[ 10.344802] mlx4_core 0000:05:01.0: enabling device (0000 -> 0002)
[ 10.352404] mlx4_core 0000:05:01.0: Skipping virtual function:8
[ 10.366153] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)
[ 23.574588] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
[ 23.585484] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0
[ 23.664067] mlx4_core 0000:05:00.0: mlx4_ib: multi-function enabled
[ 23.679324] mlx4_core 0000:05:00.0: mlx4_ib: initializing demux service for 128 qp1 clients
The issue appears to be that the module parameters need to be in /etc/modprobe.d/mlx4_core.conf and not /etc/modprobe.d/mlx4.conf. After moving the parameters into the correct file, the VFs are probed as expected on boot.
# cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core num_vfs=8 probe_vf=8 port_type_array=1
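For anyone else hitting this, a quick way to confirm the VFs were actually probed after the change (a sketch):
# the "Skipping virtual function" messages should be gone after reboot
dmesg | grep mlx4
# and each probed VF should now appear as its own network interface
ip link show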
Hi. OmniPath (OPA) is an Intel proprietary protocol and is only implemented by OPA switches. You didn't mention what type of cable you're using, but OPA cables are also proprietary, and not compatible with Ethernet or InfiniBand.