
Omni-Path vs. Mellanox


Re: Mellanox IS 5030 Licenses

Hi Jae-Hoon,

 

Thank you for your reply.  I contacted support and got the switches upgraded to the latest firmware.  For others seeking the latest, and probably final, version (the product is EOL, so no more upgrades): EFM_PPC_M405EX EFM_1.1.3004.

The status of Mellanox SN2100'S LED

Hello, everyone,

I want to know about the status of the Mellanox SN2100's LEDs. Today I used 100G cables in the switch and connected port 1 to port 3; the link came up, but the LED keeps flapping, about once per second. When I use a 100G SR4 optical transceiver instead, it flaps once every 30 seconds. This is the first time I have used the SN2100, so is this normal? What should the status be? Thanks for your help in advance.

Add Cisco switch to Mellanox NEO trouble.

Good afternoon Colleagues!

I would appreciate your help connecting Cisco switches to Mellanox NEO.

After starting provisioning and waiting, the job fails with a timeout error. Here is what's in the logs:

 

2017-11-15 20:54:06.273 job INFO performAction created a new job (20) for Provisioning

2017-11-15 20:54:06.300 job INFO performAction created sub-job (20.1) for device: 10.10.0.4

2017-11-15 20:54:06.301 job INFO Preparing job notification for job (20 - Provisioning), status:(New), progress: (0)

2017-11-15 20:54:06.301 job INFO job: (20.1), status: (New), progress: (0), device: (10.10.0.4)

2017-11-15 20:54:06.301 zmq INFO Send Message Topic:notification, category:notifications/jobs

2017-11-15 20:54:06.355 netservice INFO Performing action run_cli on devices

2017-11-15 20:54:06.355 netservice INFO commandline: [u'show running-config']

2017-11-15 20:54:06.355 netservice INFO arguments: {"globals": {}, "devices": {}}

2017-11-15 20:54:07.103 cli-facility INFO running : /opt/neo/providers/common/bin/providers/common/tools/clifacility/cli_facility.pyo --hosts /tmp/tmpMADHzY/devices.csv --listen-port 60374 --file /tmp/tmpMADHzY/commands.txt --pool-size 30 --operation-timeout 120

2017-11-15 20:54:07.115 cli-facility INFO Starting thread pool: size=30

2017-11-15 20:54:07.119 cli-facility INFO Handling Host: 10.10.0.4

2017-11-15 20:56:07.121 cli-facility WARNING Killing thread for context: 10.10.0.4

2017-11-15 20:56:07.121 cli-facility WARNING Timeout operation for Host(s): 10.10.0.4

2017-11-15 20:56:07.121 cli-facility INFO Result is ready...

2017-11-15 20:56:07.132 cli-facility INFO Sending result to client...

2017-11-15 20:56:07.132 job WARNING job 20.1 failed: Timeout while communicating with: 10.10.0.4

2017-11-15 20:56:07.133 job INFO updateJobStatus updated job (20.1) with status 32772

2017-11-15 20:56:07.133 job INFO Preparing job notification for job (20 - Provisioning), status:(Completed With Errors), progress: (100)

2017-11-15 20:56:07.133 job INFO job: (20.1), status: (Completed With Errors - Timeout while communicating with: 10.10.0.4), progress: (100), device: (10.10.0.4)

2017-11-15 20:56:07.135 cli-facility INFO Stopping thread pool

2017-11-15 20:56:07.134 zmq INFO Send Message Topic:notification, category:notifications/jobs
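
For reference, one basic check I can run from the NEO server (a sketch; the login user is a placeholder, and whether the Cisco image accepts a command directly on the SSH command line may vary) is to confirm the switch answers the same CLI well within the 120-second operation timeout shown above:

# ping -c 3 10.10.0.4

# ssh <user>@10.10.0.4 "show running-config"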

 

 

Thank you in advance!

Re: The status of Mellanox SN2100'S LED

Hi Ben,

 

When you say flap, do you mean the LED blinking?

Blinking every second is probably STP BPDUs.

Blinking every 30 seconds might be LLDP.


ConnectX-4 25Gbps link not detected with Arista 7150 switch

Hi,

 

I am noticing that when my ConnectX-4 25Gbps card is connected to the Arista 7150 switch's 10Gbps ports, the link is not detected.

 

Based on the documents I read, my understanding is that auto-negotiation should successfully negotiate the link down to 10Gbps.

 

I also tried disabling auto-negotiation and setting the port speed to 10Gbps, but it still does not work.
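
For reference, disabling auto-negotiation and forcing 10Gbps on the host side looks roughly like this (a sketch; eth1 is the interface from the details below):

# ethtool -s eth1 autoneg off speed 10000 duplex full

# ethtool eth1 | grep -E "Speed|Link detected"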

 

The details:

 

Arista switch:

 

localhost>show version

Software image version: 4.12.7.1

Architecture:           i386

 

Host:

 

linux-6cof:~ # cat /etc/SuSE-release

SUSE Linux Enterprise Server 12 (x86_64)

VERSION = 12

PATCHLEVEL = 3

# This file is deprecated and will be removed in a future service pack or release.

# Please check /etc/os-release for details about this release.

linux-6cof:~ # uname -r

4.4.73-5-default

 

 

linux-6cof:~ # ethtool eth1

Settings for eth1:

        Supported ports: [ FIBRE Backplane ]

        Supported link modes:   1000baseKX/Full

                                10000baseKR/Full

                                25000baseCR/Full

                                25000baseKR/Full

                                25000baseSR/Full

        Supported pause frame use: Symmetric

        Supports auto-negotiation: Yes

        Advertised link modes:  10000baseKR/Full

        Advertised pause frame use: Symmetric

        Advertised auto-negotiation: No

        Speed: Unknown!

        Duplex: Unknown! (255)

        Port: FIBRE

        PHYAD: 0

        Transceiver: internal

        Auto-negotiation: off

        Supports Wake-on: d

        Wake-on: d

        Link detected: no

 

linux-6cof:~ # modinfo mlx5_core

filename:       /lib/modules/4.4.73-5-default/kernel/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.ko

version:        3.0-1

license:        Dual BSD/GPL

description:    Mellanox Connect-IB, ConnectX-4 core driver

author:         Eli Cohen <eli@mellanox.com>


Re: The status of Mellanox SN2100'S LED

Hi Eddie,

  Yes, the flap means the LED blinking, and it is very regular. I will ping between the two switches to observe the STP BPDUs; that should help us analyze the LED status. I will tell you the result when I finish. Thanks so much.
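
Another way to confirm what is actually crossing the link, rather than watching the LED (a sketch, assuming the MLNX-OS/Onyx CLI; the port number is just an example), is to check whether the port counters increase roughly once per second or once every 30 seconds:

show interfaces ethernet 1/1 counters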


Achieving 40Gbps with Ethernet mode on ConnectX-3 VPI

Hi,

I'm trying to get a better understanding of how to achieve near the line speed of 40Gbps on the following adapter card:

[root@compute8 scripts]# lspci -vv -s 07:00.0  | grep "Part number" -A 3

            [PN] Part number: MCX354A-FCBT      

            [EC] Engineering changes: A4

            [SN] Serial number: MT1334U01416         

            [V0] Vendor specific: PCIe Gen3 x8.

Some system info

[root@compute8 scripts]# cat /etc/centos-release

CentOS Linux release 7.3.1611 (Core)

[root@compute8 scripts]# ofed_info -s

MLNX_OFED_LINUX-4.1-1.0.2.0:

[root@compute8 scripts]# uname -r

3.10.0-514.26.2.el7.x86_64

 

2 identical HP ProLiant DL360p Gen8 servers equipped with 2 Quad Core Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz CPUs and 32GB RAM. Performance profile is  network-throughput (using tuned) on both servers.

The ConnectX-3 cards are connected back to back (no switch) with a Mellanox FDR copper cable (1m long). They have been put into Ethernet mode, and I've followed the recommended optimization guide, Performance Tuning for Mellanox Adapters.

The problem is achieving anything near the line speed of 40Gbps. I've tested with iperf2, since the article "iperf, iperf2, iperf3" recommends it (and recommends against iperf3):

Server side:

[root@compute7 ~]# iperf -v

..

..

 

Client side:

[root@compute8 scripts]# iperf   -c 192.168.100.1  -P2

------------------------------------------------------------

Client connecting to 192.168.100.1, TCP port 5001

TCP window size:  325 KByte (default)

------------------------------------------------------------

[  3] local 192.168.100.2 port 54430 connected with 192.168.100.1 port 5001

[  4] local 192.168.100.2 port 54432 connected with 192.168.100.1 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  17.1 GBytes  14.7 Gbits/sec

[  4]  0.0-10.0 sec  17.1 GBytes  14.7 Gbits/sec

[SUM]  0.0-10.0 sec  34.1 GBytes  29.3 Gbits/sec

[root@compute8 scripts]# iperf   -c 192.168.100.1  -P3

------------------------------------------------------------

Client connecting to 192.168.100.1, TCP port 5001

TCP window size:  325 KByte (default)

------------------------------------------------------------

[  5] local 192.168.100.2 port 54438 connected with 192.168.100.1 port 5001

[  4] local 192.168.100.2 port 54434 connected with 192.168.100.1 port 5001

[  3] local 192.168.100.2 port 54436 connected with 192.168.100.1 port 5001

[ ID] Interval       Transfer     Bandwidth

[  5]  0.0-10.0 sec  14.4 GBytes  12.4 Gbits/sec

[  4]  0.0-10.0 sec  15.2 GBytes  13.1 Gbits/sec

[  3]  0.0-10.0 sec  15.3 GBytes  13.1 Gbits/sec

[SUM]  0.0-10.0 sec  44.9 GBytes  38.6 Gbits/sec

[root@compute8 scripts]# iperf   -c 192.168.100.1  -P4

------------------------------------------------------------

Client connecting to 192.168.100.1, TCP port 5001

TCP window size:  325 KByte (default)

------------------------------------------------------------

[  6] local 192.168.100.2 port 54446 connected with 192.168.100.1 port 5001

[  4] local 192.168.100.2 port 54440 connected with 192.168.100.1 port 5001

[  5] local 192.168.100.2 port 54444 connected with 192.168.100.1 port 5001

[  3] local 192.168.100.2 port 54442 connected with 192.168.100.1 port 5001

[ ID] Interval       Transfer     Bandwidth

[  6]  0.0-10.0 sec  11.0 GBytes  9.47 Gbits/sec

[  4]  0.0-10.0 sec  12.4 GBytes  10.6 Gbits/sec

[  5]  0.0-10.0 sec  13.0 GBytes  11.2 Gbits/sec

[  3]  0.0-10.0 sec  8.09 GBytes  6.95 Gbits/sec

[SUM]  0.0-10.0 sec  44.5 GBytes  38.2 Gbits/sec

 

So it seems a minimum of 3 threads is needed to get close to line speed; increasing the thread count above 3 doesn't improve anything. While running the above tests, I observed a lot of changes in /proc/interrupts for ens2 (port 1 of the ConnectX-3), which means interrupts are being generated to request CPU time. This should not happen when RDMA is in use, and I've confirmed RDMA is working using some of the tools (ib_send_bw, rping, udaddy, rdma_server, etc.) mentioned in HowTo Enable, Verify and Troubleshoot RDMA.

Why do these Mellanox utilities perform as intended? Is the answer their built-in RDMA support?
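
For reference, the RDMA bandwidth check is roughly this kind of invocation (a minimal sketch; the device name mlx4_0 is an assumption, and flags may vary between perftest versions):

Server: ib_send_bw -d mlx4_0 --report_gbits

Client: ib_send_bw -d mlx4_0 --report_gbits 192.168.100.1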

 

Further, using perf gives me some details:

[root@compute8 scripts]# perf stat -e  cpu-migrations,context-switches,task-clock,cycles,instructions,cache-references,cache-misses iperf -c 192.168.100.1  -P4

------------------------------------------------------------

Client connecting to 192.168.100.1, TCP port 5001

TCP window size:  325 KByte (default)

------------------------------------------------------------

[  6] local 192.168.100.2 port 54470 connected with 192.168.100.1 port 5001

[  4] local 192.168.100.2 port 54464 connected with 192.168.100.1 port 5001

[  3] local 192.168.100.2 port 54466 connected with 192.168.100.1 port 5001

[  5] local 192.168.100.2 port 54468 connected with 192.168.100.1 port 5001

[ ID] Interval       Transfer     Bandwidth

[  6]  0.0-10.0 sec  10.6 GBytes  9.08 Gbits/sec

[  4]  0.0-10.0 sec  11.5 GBytes  9.85 Gbits/sec

[  3]  0.0-10.0 sec  12.4 GBytes  10.7 Gbits/sec

[  5]  0.0-10.0 sec  10.1 GBytes  8.69 Gbits/sec

[SUM]  0.0-10.0 sec  44.6 GBytes  38.3 Gbits/sec

 

Performance counter stats for 'iperf -c 192.168.100.1 -P4':

 

               126      cpu-migrations            #    0.005 K/sec               

            11,934      context-switches          #    0.446 K/sec               

      26730.400620      task-clock (msec)         #    2.666 CPUs utilized       

    63,926,425,845      cycles                    #    2.392 GHz                 

    25,417,772,891      instructions              #    0.40  insn per cycle                                         

     1,786,983,037      cache-references          #   66.852 M/sec               

       446,840,327      cache-misses              #   25.005 % of all cache refs 

 

      10.025755759 seconds time elapsed

 

For instance, I observe a high number of CPU context switches, which are very costly.

 

After some research I discovered http://ftp100.cewit.stonybrook.edu/rperf. Using the rperf server and client, I was able to achieve near line speed without any further effort or additional threads:

Server side:

[root@compute7 ~]# rperf -s -p 5001 -l 500M -H

...

Client side:

[root@compute8 scripts]# perf stat -e  cpu-migrations,context-switches,task-clock,cycles,instructions,cache-references,cache-misses rperf -c $IP -p 5001 -H -G pw -l 500M -i 2

------------------------------------------------------------

RDMA Client connecting to 192.168.100.1, TCP port 5001

TCP window size: -1.00 Byte (default)

------------------------------------------------------------

[  4] local 192.168.100.2 port 40580 connected with 192.168.100.1 port 5001

[ ID] Interval       Transfer     Bandwidth

[  4]  0.0- 2.0 sec  8.79 GBytes  37.7 Gbits/sec

[  4]  2.0- 4.0 sec  9.28 GBytes  39.8 Gbits/sec

[  4]  4.0- 6.0 sec  8.79 GBytes  37.7 Gbits/sec

[  4]  6.0- 8.0 sec  9.28 GBytes  39.8 Gbits/sec

[  4]  8.0-10.0 sec  9.28 GBytes  39.8 Gbits/sec

[  4]  0.0-10.1 sec  45.9 GBytes  39.1 Gbits/sec

 

Performance counter stats for 'rperf -c 192.168.100.1 -p 5001 -H -G pw -l 500M -i 2':

 

                13      cpu-migrations            #    0.007 K/sec               

             1,348      context-switches          #    0.734 K/sec               

       1836.487997      task-clock (msec)         #    0.152 CPUs utilized       

     4,393,238,230      cycles                    #    2.392 GHz                 

     9,201,275,892      instructions              #    2.09  insn per cycle                                         

        26,855,320      cache-references          #   14.623 M/sec               

        23,419,862      cache-misses              #   87.208 % of all cache refs 

 

      12.084867922 seconds time elapsed

 

Note that the number of CPU context switches is very low compared to the iperf run, as is the CPU utilization (task-clock, msec). Monitoring /proc/loadavg also showed low CPU utilization.

Another important observation is that only a few interrupts are generated, as seen in /proc/interrupts, and more importantly the transmitted and received packet counts in /proc/net/dev on the client side remain unchanged while rperf is running. This clearly indicates that RDMA is being used to move data from the application directly to the server, bypassing the kernel.

Also, I've found that setting the MTU to 9000 is vital; with default settings, even rperf performs pretty badly.
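
For reference, this is roughly how the MTU is set (a sketch; ens2 is port 1 of the ConnectX-3 as noted above, and the change does not persist across reboots unless added to the interface configuration):

# ip link set dev ens2 mtu 9000

# ip link show dev ens2 | grep mtu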

 

According to Vangelis' post Cannot get 40Gbps on Ethernet mode with ConnectX-3 VPI, he was able to achieve near line speed using iperf2 without needing multiple threads. Why am I no longer able to? Has RDMA support been removed from iperf2 (and was it there at some point earlier)?

 

What's important in the end is not the benchmarks but the actual applications the systems will run on Linux with the setup described above. How, and will, they be able to take advantage of RDMA and achieve anything close to line speed? Does each running application need to explicitly support RDMA in order to reach these high speeds?


HPE IB Adapter not recognized by Windows

Hello,

 

I realize I may be better posting on the HP forum, but frankly I have always found this forum the most knowledgeable when it comes to anything Infiniband.

 

I have a number of HPE BL460c blades in which I have installed the HP FDR IB 545M adapters (702213-B21).  I have installed the HPE IB drivers v5.35.  Strangely, I could not find an installation guide or a manual for this card.

 

The adapters show up as Unknown Devices in Windows.  I tried Windows Server 2012 R2 and 2016 with the same results.

 

I have updated the firmware on the adapters to the current 10.16.1058.

 

I have been working on this for a couple days and cannot get the cards to work.

 

I thought someone here might be able to shed some insight on what I might try.

 

Thanks,

 

Todd


Hyper-V/Win2016 Switch Embedded Teaming on VPI IPoIB Interface with Multiple PKey

With Windows Server 2016, documentation is available on SET with Mellanox cards in 40GbE mode with multiple VLANs and RDMA support.  Is this possible with a Mellanox card in IB mode running IPoIB with multiple PKeys, if the infrastructure is IB-based and an IB-IP gateway is available?  part_man.exe creates a new NIC for each PKey specified.  Do you create multiple SETs, one per PKey, based on those "virtual" NICs created by part_man.exe?  Will the SET still have RDMA capability?


Cannot set RX/TX channels on ConnectX4 MT27700

Hello,

 

I recently got a ConnectX-4, and I found that I cannot set the RX/TX channels using ethtool as I was able to do with the ConnectX-3 Pro. If I check, I see:

# ethtool -l ens5f0

Channel parameters for ens5f0:

Pre-set maximums:

RX:             0

TX:             0

Other:          512

Combined:       36

# ethtool -L ens5f0 tx 3
Cannot set device channel parameters: Invalid argument

which seems a bit weird. I had a look at the documentation and found nothing on this. Is this expected? I am using 4.1.1.0.2, and:

 

 

CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.20.1030
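
Incidentally, the pre-set maximums above show RX: 0 and TX: 0 but Combined: 36, so it looks as if this driver only exposes combined channels; a sketch of adjusting them under that assumption:

# ethtool -L ens5f0 combined 8

# ethtool -l ens5f0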

Thanks a lot for any help!

 

L3D


Is SX6012 switch compatible with Intel Omnipath Interface card?

Sorry if it's a noob question. Can I connect Intel Omnipath host interface card (PCIe) to the Mellanox SX6012 switch?

I think I configured everything on the workstation, but when I plug the cable into the Mellanox switch, I never see the lights come on.

Status of the OPA interface card shows it's either "polling" or "disconnected".

 

Thanks!


Re: Mellanox ConnectX-3 Disable RDMA

You can change to ETH + ETH mode via firmware tools from Mellanox...:)
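
For reference, changing both port types with the firmware tools looks roughly like this (a sketch; the PCI address is an example, 2 selects Ethernet on ConnectX-3, and a reboot is needed for the change to take effect):

# mstconfig -d 05:00.0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2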

 

BR,

Jae-Hoon Choi


including different OS

IHAC,

 

Two questions:

1. The SX36xx series management modules include different Mellanox OS versions; will the higher OS version be copied over to the lower one?

2. I need the release notes for Mellanox OS 3.5.1006-000 and to know which CRA documents it supports; please send me a link for the CRA.

 

thanks,

Regards,

 

tetsu

ConnectX-3 with Cisco ACI

Hello Mellanox people


We have built an OpenStack implementation on a Cisco ACI network: Cisco 9Ks in a spine/leaf topology.  All the cloud nodes are configured with dual-port ConnectX-3 NICs at 40Gb. The ports are LACP-bonded to separate leaf switches and cabled with twinax. We plan on growing this particular cloud's compute base by more than 400%, meaning we will add another 100+ compute nodes, more 9Ks, and more Mellanox NICs.  We are concerned about this because we have so many problems with what we have right now.

 

We are having huge problems, to the point where many applications have stopped. They either crash or time out waiting on I/O.  Storage for our cloud is over the network, in the form of a Ceph cluster, so network latency is important.

 

Host configuration:

NIC: Mellanox Ethernet Controller MT27500 - ConnectX-3 Dual Port 40Gbe QSFP+ -  Device 0079

Red Hat 7.2 kernel 3.10.0-514  (which is in the 7.3 tree)

Mellanox driver:  Stock driver 2.2-1 from red hat kernel package

/etc/modprobe.d/mlx4.conf  left as is and tried:

options mlx4_en pfctx=3 pfcrx=3

 

Symptoms:

Network latency, and possibly even packet loss.  I can't prove this yet, but I believe packets are disappearing.  This causes outages and outright failures of client applications and services.

Ceph (our storage cluster) has huge problems: random 5-10 second outages. I believe it's packet loss. Red Hat says our Ceph problems are caused by network problems and won't support it until that's fixed. So they agree with me!

Cloud hosts drop millions of packets.  Packet drop rate is directly proportional to data rates.

Cloud hosts send a lot of pause frames (50-100/s) even at low data rates.  Is this normal?

Cloud hosts receive no pause frames.

Our network support people say they are seeing a ton of pause frames and a large number of buffer drops on switch uplinks.

 

Question:

Does anyone have any experience with Mellanox cards in a Cisco ACI environment?

  1. Flow control. My understanding is that the default flow-control configuration on the ConnectX-3 card is LLFC, i.e. 802.3x port-based; please correct me if I'm wrong.  Cisco ACI only supports 802.1Qbb priority-based flow control.   Would incompatible flow control, i.e. mismatched layer-2 congestion management, manifest itself as what we are seeing?

I tried to enable priority-based flow control via the driver configuration, but I couldn't tell whether it was enabled. The cards send out pause frames, but I can't tell what type. The current settings are shown below:

 

cat /sys/module/mlx4_en/parameters/pfctx

0

cat /sys/module/mlx4_en/parameters/pfcrx

0
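
One way to see which kind of pause the NIC is actually sending (a sketch; the interface name is a placeholder and the exact counter names vary by driver build) is to compare the global pause setting with the pause counters:

# ethtool -a <interface>

# ethtool -S <interface> | grep -i pause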

 

My theory is that we have a flow-control problem: the switch is confused, and the cards are as well.  My concern is that even at low data rates the cards are sending out 50-100 pause frames a second, and at higher rates hundreds a second.  Is this normal?

Under load, say 5-7Gb/s bursts of traffic, we get 100-200 dropped packets a second. One system can have 100 million dropped packets on the bond and/or physical NICs.

 

If it's not a flow-control problem, what could be another root cause?  Sorry, this is a big question with a lot of factors.

 

Any help is greatly appreciated.  We will be growing this environment, but we don't want to commit to these specific switches/NICs until we can get this working.

 

Cheers

Rocke


mlx4_en flow control

Hello

 

Release 2.2-1 of the Red Hat driver shows support for 802.1Qbb priority-based flow control.  Is the default LLFC (port-based)?  Which flow control is enabled by default?  I have tried making changes to /etc/modprobe.d/mlx4.conf to set PFC flow control.

My entry:

 

options mlx4_en pfctx=3 pfcrx=3

 

Will this work?
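
A minimal way to check whether the options actually took effect (a sketch, assuming the module can be unloaded on this host) is to reload the driver and read back the parameters under /sys/module/mlx4_en/parameters/:

# modprobe -r mlx4_en

# modprobe mlx4_en

# cat /sys/module/mlx4_en/parameters/pfctx /sys/module/mlx4_en/parameters/pfcrx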


Re: SR-IOV in ESXi and vSphere 6.5

Is there any progress with SR-IOV? We already have ESXi 6.5 U1 and it still does not work :/. The native driver still has no support for max_vfs (esxcli system module parameters list -m nmlx4_core); tested on a ConnectX-3 EN:

enable_64b_cqe_eqe int     Enable 64 byte CQEs/EQEs when the the FW supports this

   Values : 1 - enabled, 0 - disabled

   Default: 0

 

enable_dmfs        int     Enable Device Managed Flow Steering

   Values : 1 - enabled, 0 - disabled

   Default: 1

 

enable_qos         int     Enable Quality of Service support in the HCA

   Values : 1 - enabled, 0 - disabled

   Default: 0

 

enable_rocev2      int     Enable RoCEv2 mode for all devices

   Values : 1 - enabled, 0 - disabled

   Default: 0

 

enable_vxlan_offloads   int     Enable VXLAN offloads when supported by NIC

   Values : 1 - enabled, 0 - disabled

   Default: 1

 

log_mtts_per_seg   int     Log2 number of MTT entries per segment

   Values : 1-7

   Default: 3

 

log_num_mgm_entry_size  int     Log2 MGM entry size, that defines the number of QPs per MCG, for example: value 10 results in 248 QP per MGM entry

   Values : 9-12

   Default: 12

 

msi_x              int     Enable MSI-X

   Values : 1 - enabled, 0 - disabled

   Default: 1

 

mst_recovery       int     Enable recovery mode(only NMST module is loaded)

   Values : 1 - enabled, 0 - disabled

   Default: 0

 

rocev2_udp_port    int     Destination port for RoCEv2

   Values : 1-65535 for RoCEv2

   Default: 4791

Ubuntu does not probe VFs on ConnectX-3 Infiniband HCA

I have an Ubuntu 16 server with a single-port ConnectX-3 HCA. I have enabled SR-IOV with 8 VFs on the HCA and configured the kernel with 'intel_iommu=on'. /etc/modprobe.d/mlx4.conf is configured to load and probe all eight VFs. The VFs are listed by lspci, but the system does not probe them and create the virtual interfaces. dmesg indicates "Skipping virtual function" for all the VFs.  All the instructions for configuring VFs I have read indicate my configuration should probe the VFs. Can anyone point me towards what else needs to be configured to enable the VFs?

 

Below are the details of the system and HCA configuration:

 

# uname -a

Linux cfmm-h2 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

 

# lspci | grep Mellanox

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

05:00.1 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.2 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.3 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.4 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.5 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.6 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:00.7 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

05:01.0 Network controller: Mellanox Technologies MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]

 

# cat /etc/modprobe.d/mlx4.conf

options mlx4_core num_vfs=8 probe_vf=8 port_type_array=1

# mlx4_core gets automatically loaded, load mlx4_en also (LP: #1115710)

softdep mlx4_core post: mlx4_en

 

# mstflint -d 05:00.0 q

Image type:      FS2

FW Version:      2.36.5000

Product Version: 02.36.50.00

Rom Info:        type=PXE version=3.4.718 devid=4099

Device ID:       4099

Description:     Node             Port1            Port2            Sys image

GUIDs:           248a070300ba8e20 248a070300ba8e21 248a070300ba8e22 248a070300ba8e23

MACs:                                 248a07ba8e21     248a07ba8e22

VSD:

PSID:            DEL1100001019

 

# mstconfig -d 05:00.0 q

 

Device #1:

----------

 

Device type:    ConnectX3

PCI device:     05:00.0

 

Configurations:                              Current

         SRIOV_EN                            1

         NUM_OF_VFS                          8

         LINK_TYPE_P1                        3

         LINK_TYPE_P2                        3

         LOG_BAR_SIZE                        3

         BOOT_PKEY_P1                        0

         BOOT_PKEY_P2                        0

         BOOT_OPTION_ROM_EN_P1               0

         BOOT_VLAN_EN_P1                     0

         BOOT_RETRY_CNT_P1                   0

         LEGACY_BOOT_PROTOCOL_P1             0

         BOOT_VLAN_P1                        1

         BOOT_OPTION_ROM_EN_P2               0

         BOOT_VLAN_EN_P2                     0

         BOOT_RETRY_CNT_P2                   0

         LEGACY_BOOT_PROTOCOL_P2             0

         BOOT_VLAN_P2                        1

 

# lspci -vv -s 05:00.0

05:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

  Subsystem: Mellanox Technologies MT27500 Family [ConnectX-3]

  Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

  Latency: 0, Cache Line Size: 32 bytes

  Interrupt: pin A routed to IRQ 83

  Region 0: Memory at 92400000 (64-bit, non-prefetchable) [size=1M]

  Region 2: Memory at 38000800000 (64-bit, prefetchable) [size=8M]

  Expansion ROM at <ignored> [disabled]

  Capabilities: [40] Power Management version 3

  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)

  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

  Capabilities: [48] Vital Product Data

  Product Name: CX353A - ConnectX-3 QSFP

  Read-only fields:

  [PN] Part number: 079DJ3

  [EC] Engineering changes: A03

  [SN] Serial number: IL079DJ37403172G0033

  [V0] Vendor specific: PCIe Gen3 x8

  [RV] Reserved: checksum good, 0 byte(s) reserved

  Read/write fields:

  [V1] Vendor specific: N/A

  [YA] Asset tag: N/A

  [RW] Read-write area: 104 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 253 byte(s) free

  [RW] Read-write area: 252 byte(s) free

  End

  Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-

  Vector table: BAR=0 offset=0007c000

  PBA: BAR=0 offset=0007d000

  Capabilities: [60] Express (v2) Endpoint, MSI 00

  DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited

  ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

  DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+

  RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

  MaxPayload 256 bytes, MaxReadReq 4096 bytes

  DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

  LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited

  ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+

  LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

  LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

  DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

  DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled

  LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-

  Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

  Compliance De-emphasis: -6dB

  LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

  EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

  Capabilities: [c0] Vendor Specific Information: Len=18 <?>

  Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)

  ARICap: MFVC- ACS-, Next Function: 0

  ARICtl: MFVC- ACS-, Function Group: 0

  Capabilities: [148 v1] Device Serial Number 24-8a-07-03-00-ba-8e-20

  Capabilities: [154 v2] Advanced Error Reporting

  UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

  UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

  UESvrt: DLP+ SDES- TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-

  CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

  CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+

  AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+

  Capabilities: [18c v1] #19

  Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)

  IOVCap: Migration-, Interrupt Message Number: 000

  IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+

  IOVSta: Migration-

  Initial VFs: 8, Total VFs: 8, Number of VFs: 8, Function Dependency Link: 00

  VF offset: 1, stride: 1, Device ID: 1004

  Supported Page Size: 000007ff, System Page Size: 00000001

  Region 2: Memory at 0000038001000000 (64-bit, prefetchable)

  VF Migration: offset: 00000000, BIR: 0

  Kernel driver in use: mlx4_core

  Kernel modules: mlx4_core

 

# dmesg (edited for mlx4 related lines)

[    0.000000] Command line: BOOT_IMAGE=/ROOT/ubuntu@/boot/vmlinuz-4.4.0-98-generic root=ZFS=rpool/ROOT/ubuntu ro swapaccount=1 intel_iommu=on

[    3.540033] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)

[    3.546709] mlx4_core: Initializing 0000:05:00.0

[    9.511353] mlx4_core 0000:05:00.0: Enabling SR-IOV with 8 VFs

[    9.620597] pci 0000:05:00.1: [15b3:1004] type 00 class 0x028000

[    9.627586] pci 0000:05:00.1: Max Payload Size set to 256 (was 128, max 512)

[    9.637048] iommu: Adding device 0000:05:00.1 to group 49

[    9.643872] mlx4_core: Initializing 0000:05:00.1

[    9.650716] mlx4_core 0000:05:00.1: enabling device (0000 -> 0002)

[    9.658389] mlx4_core 0000:05:00.1: Skipping virtual function:1

[    9.665756] pci 0000:05:00.2: [15b3:1004] type 00 class 0x028000

[    9.672748] pci 0000:05:00.2: Max Payload Size set to 256 (was 128, max 512)

[    9.682097] iommu: Adding device 0000:05:00.2 to group 50

[    9.688952] mlx4_core: Initializing 0000:05:00.2

[    9.695814] mlx4_core 0000:05:00.2: enabling device (0000 -> 0002)

[    9.703545] mlx4_core 0000:05:00.2: Skipping virtual function:2

[    9.710846] pci 0000:05:00.3: [15b3:1004] type 00 class 0x028000

[    9.717871] pci 0000:05:00.3: Max Payload Size set to 256 (was 128, max 512)

[    9.727364] iommu: Adding device 0000:05:00.3 to group 51

[    9.734288] mlx4_core: Initializing 0000:05:00.3

[    9.741154] mlx4_core 0000:05:00.3: enabling device (0000 -> 0002)

[    9.748832] mlx4_core 0000:05:00.3: Skipping virtual function:3

[    9.756277] pci 0000:05:00.4: [15b3:1004] type 00 class 0x028000

[    9.763268] pci 0000:05:00.4: Max Payload Size set to 256 (was 128, max 512)

[    9.772810] iommu: Adding device 0000:05:00.4 to group 52

[    9.779724] mlx4_core: Initializing 0000:05:00.4

[    9.786588] mlx4_core 0000:05:00.4: enabling device (0000 -> 0002)

[    9.794256] mlx4_core 0000:05:00.4: Skipping virtual function:4

[    9.801504] pci 0000:05:00.5: [15b3:1004] type 00 class 0x028000

[    9.808499] pci 0000:05:00.5: Max Payload Size set to 256 (was 128, max 512)

[    9.817815] iommu: Adding device 0000:05:00.5 to group 53

[    9.824644] mlx4_core: Initializing 0000:05:00.5

[    9.831359] mlx4_core 0000:05:00.5: enabling device (0000 -> 0002)

[    9.838894] mlx4_core 0000:05:00.5: Skipping virtual function:5

[    9.846081] pci 0000:05:00.6: [15b3:1004] type 00 class 0x028000

[    9.853086] pci 0000:05:00.6: Max Payload Size set to 256 (was 128, max 512)

[    9.862328] iommu: Adding device 0000:05:00.6 to group 54

[    9.868962] mlx4_core: Initializing 0000:05:00.6

[    9.875601] mlx4_core 0000:05:00.6: enabling device (0000 -> 0002)

[    9.883002] mlx4_core 0000:05:00.6: Skipping virtual function:6

[    9.890082] pci 0000:05:00.7: [15b3:1004] type 00 class 0x028000

[    9.897070] pci 0000:05:00.7: Max Payload Size set to 256 (was 128, max 512)

[    9.906218] iommu: Adding device 0000:05:00.7 to group 55

[    9.912806] mlx4_core: Initializing 0000:05:00.7

[    9.919380] mlx4_core 0000:05:00.7: enabling device (0000 -> 0002)

[    9.926935] mlx4_core 0000:05:00.7: Skipping virtual function:7

[    9.934160] pci 0000:05:01.0: [15b3:1004] type 00 class 0x028000

[    9.941145] pci 0000:05:01.0: Max Payload Size set to 256 (was 128, max 512)

[    9.950497] iommu: Adding device 0000:05:01.0 to group 56

[    9.957354] mlx4_core: Initializing 0000:05:01.0

[    9.964295] mlx4_core 0000:05:01.0: enabling device (0000 -> 0002)

[    9.972187] mlx4_core 0000:05:01.0: Skipping virtual function:8

[    9.979753] mlx4_core 0000:05:00.0: Running in master mode

[    9.986885] mlx4_core 0000:05:00.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s

[    9.994178] mlx4_core 0000:05:00.0: PCIe link width is x8, device supports x8

[   10.178999] mlx4_core: Initializing 0000:05:00.1

[   10.186463] mlx4_core 0000:05:00.1: enabling device (0000 -> 0002)

[   10.194814] mlx4_core 0000:05:00.1: Skipping virtual function:1

[   10.202620] mlx4_core: Initializing 0000:05:00.2

[   10.210054] mlx4_core 0000:05:00.2: enabling device (0000 -> 0002)

[   10.218256] mlx4_core 0000:05:00.2: Skipping virtual function:2

[   10.225917] mlx4_core: Initializing 0000:05:00.3

[   10.233189] mlx4_core 0000:05:00.3: enabling device (0000 -> 0002)

[   10.241256] mlx4_core 0000:05:00.3: Skipping virtual function:3

[   10.248961] mlx4_core: Initializing 0000:05:00.4

[   10.256108] mlx4_core 0000:05:00.4: enabling device (0000 -> 0002)

[   10.264085] mlx4_core 0000:05:00.4: Skipping virtual function:4

[   10.271598] mlx4_core: Initializing 0000:05:00.5

[   10.278675] mlx4_core 0000:05:00.5: enabling device (0000 -> 0002)

[   10.286606] mlx4_core 0000:05:00.5: Skipping virtual function:5

[   10.294012] mlx4_core: Initializing 0000:05:00.6

[   10.301002] mlx4_core 0000:05:00.6: enabling device (0000 -> 0002)

[   10.308782] mlx4_core 0000:05:00.6: Skipping virtual function:6

[   10.316133] mlx4_core: Initializing 0000:05:00.7

[   10.323060] mlx4_core 0000:05:00.7: enabling device (0000 -> 0002)

[   10.330772] mlx4_core 0000:05:00.7: Skipping virtual function:7

[   10.338002] mlx4_core: Initializing 0000:05:01.0

[   10.344802] mlx4_core 0000:05:01.0: enabling device (0000 -> 0002)

[   10.352404] mlx4_core 0000:05:01.0: Skipping virtual function:8

[   10.366153] mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.2-1 (Feb 2014)

[   23.574588] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)

[   23.585484] <mlx4_ib> mlx4_ib_add: counter index 0 for port 1 allocated 0

[   23.664067] mlx4_core 0000:05:00.0: mlx4_ib: multi-function enabled

[   23.679324] mlx4_core 0000:05:00.0: mlx4_ib: initializing demux service for 128 qp1 clients

Re: Ubuntu does not probe VFs on ConnectX-3 Infiniband HCA

The issue appears to be that the module parameters need to be in /etc/modprobe.d/mlx4_core.conf and not /etc/modprobe.d/mlx4.conf. After moving the parameters into the correct file, the VFs are probed as expected on boot.

 

# cat /etc/modprobe.d/mlx4_core.conf

options mlx4_core num_vfs=8 probe_vf=8 port_type_array=1
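
For completeness, a sketch of applying the new options without rebooting, and making sure they are also picked up at early boot (this assumes the mlx4 modules can be unloaded on this host, and update-initramfs is only relevant if mlx4_core is loaded from the initramfs):

# update-initramfs -u

# modprobe -r mlx4_ib mlx4_en mlx4_core

# modprobe mlx4_core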

Re: Is SX6012 switch compatible with Intel Omnipath Interface card?

Hi.  OmniPath (OPA) is an Intel proprietary protocol and is only implemented by OPA switches.  You didn't mention what type of cable you're using, but OPA cables are also proprietary, and not compatible with Ethernet or InfiniBand.
