Channel: Mellanox Interconnect Community: Message List
Viewing all 6226 articles
Browse latest View live

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


PHB: Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)

 

Based on the topology you posted, mlx5_0 and mlx5_1 are connected to GPU0 through a PCIe Host Bridge.

This means that, even with GDR, traffic flows from GPU0 to the host on the local node and then to the NIC (mlx5_1) on the local node.

On the remote node, traffic flows from the NIC (mlx5_1) to the host and then to GPU0.

Without GDR, host memory (DDR) simply takes the place of the GPU, and the traffic still passes through the host. That may be why the two results look the same.

What latency did you measure in your test?
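To make the topology reading above easier to repeat, here is a minimal Python sketch that classifies the link-type labels from the `nvidia-smi topo -m` legend by whether the path crosses a PCIe host bridge. The function name and the table are illustrative, not part of any NVIDIA or Mellanox tool, and the sketch says nothing about the actual latency you will measure.

```python
# Link-type labels as described in the `nvidia-smi topo -m` legend.
# True means the path between the two devices crosses a PCIe host bridge.
CROSSES_HOST_BRIDGE = {
    "SYS": True,    # PCIe plus the SMP interconnect between NUMA nodes
    "NODE": True,   # PCIe plus the interconnect between host bridges in one NUMA node
    "PHB": True,    # PCIe plus a PCIe host bridge (typically the CPU)
    "PXB": False,   # multiple PCIe bridges, without traversing the host bridge
    "PIX": False,   # at most a single PCIe bridge
}


def crosses_host_bridge(link_type: str) -> bool:
    """Return True if the given topology label implies a host-bridge hop."""
    return CROSSES_HOST_BRIDGE[link_type.strip().upper()]


if __name__ == "__main__":
    # Hypothetical example: the GPU0 <-> mlx5_1 entry reported as PHB.
    print(crosses_host_bridge("PHB"))
```

A PIX or PXB entry between the GPU and the NIC would indicate a path that stays below the host bridge.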

 


Re: DPDK with MLX4 VF on Hyper-v VM


Hello Hui,

 

I have tested this internally and found that we currently do not officially support WinOF driver versions with DPDK (except the WinOF driver for Azure).

You should use Azure VMs and contact Azure to obtain the supported WinOF driver.

 

Thank you,

Karen.

InfiniBand amber port led flashing


We recently replaced our IB switches (both int-a and int-b) with a newer model, the SX6790 36-port IB switch.

After the replacement, we noticed that some port LEDs show abnormal, undocumented behavior.

For example, the port 8 LED on the int-b switch was flashing amber continuously (port 8 was connected to Node8/int-b).

After we replaced the IB cable, the port 8 LED returned to normal (solid green).

Please tell us what a flashing amber LED means, its possible impact and implications, and any fix or workaround.

Connext-x3 roce mode


Hello,

 

Can someone confirm which RoCE mode I should use for my Windows 2016 S2D deployment? I am using ConnectX-3 (not Pro).

 

Thanks!

Re: Connext-x3 roce mode

Re: sr-iov and vxlan used

mst start fails with ConnectX-4 on ppc64le


Hi,

 

I'm trying to set up VFs using SR-IOV on a ppc64le machine.

 

$ lsb_release -a

No LSB modules are available.

Distributor ID: Ubuntu

Description:    Ubuntu 16.04.4 LTS

Release:        16.04

Codename:       xenial


$ uname -a

 

Linux p006n03 4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 08:59:44 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

 

$ lspci | grep Mellanox

0000:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

0040:01:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

 

First I installed the MLNX_OFED driver following the steps at https://community.mellanox.com/docs/DOC-2688

Then I installed the latest MFT (4.10.0) for ppc64le from http://www.mellanox.com/page/management_tools

 

However, running "mst start" afterwards fails:

 

$ sudo mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI module - Success

Loading MST PCI configuration module - Success

Create devices

/usr/bin/mst: line 382: 13070 Segmentation fault      (core dumped) ${mbindir}/minit $fullname ${busdevfn} 88 92

cat: /dev/mst/mt4115_pci_cr0: No such file or directory

/usr/bin/mst: line 382: 13132 Segmentation fault      (core dumped) ${mbindir}/minit $fullname ${busdevfn} 88 92

cat: /dev/mst/mt4115_pci_cr1: No such file or directory

Unloading MST PCI module (unused) - Success

 

Unloading MST PCI configuration module (unused) - Success

 

What could be the reason for this error?

 

I ultimately want to enable VFs on the CX4 following the steps at https://community.mellanox.com/docs/DOC-2386, but I cannot proceed because of this error.
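When attaching this kind of output to a support request, it can help to pull out the facts that matter. The following is a minimal Python sketch, assuming the output format shown above, that extracts the PIDs of the crashed processes and the device nodes that were never created. The function name and the regular expressions are illustrative and are not part of MFT.

```python
import re

# Patterns written against the "mst start" output quoted above (assumed format).
SEGFAULT_RE = re.compile(r"line \d+:\s+(\d+) Segmentation fault")
MISSING_RE = re.compile(r"cat: (\S+): No such file or directory")


def summarize_mst_output(text: str) -> dict:
    """Collect segfaulting PIDs and missing device nodes from mst output."""
    return {
        "segfault_pids": [int(pid) for pid in SEGFAULT_RE.findall(text)],
        "missing_devices": MISSING_RE.findall(text),
    }


# Two lines copied from the output above, used as sample input.
SAMPLE = """/usr/bin/mst: line 382: 13070 Segmentation fault      (core dumped) ${mbindir}/minit $fullname ${busdevfn} 88 92
cat: /dev/mst/mt4115_pci_cr0: No such file or directory
"""

if __name__ == "__main__":
    print(summarize_mst_output(SAMPLE))
```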

Re: mst start fails with ConnectX-4 on ppc64le


Re: Yocto embedded build of rdma-core


The solution to this problem was to use the recipes included in the updated OpenEmbedded build. About a month ago, rdma-core was added to the mainline tree. We had been trying to get this working ourselves by writing our own recipes. Now that the code is integrated, it just builds.

rxe driver does not support kernel ABI


I'm getting a small error when I try to run an rping test. I'm building rxe into kernel 4.16 and rdma-core using Yocto on an Arria10 socfpga containing a dual core A53 ARM processor. The kernel modules and userland load:

 

root@arria10:~# lsmod | grep rxe
rdma_rxe 102400 0
ib_core 192512 6 rdma_rxe,ib_cm,rdma_cm,ib_uverbs,iw_cm,rdma_ucm

 

I can configure the rxe0 device, but rxe_cfg gives a strange error:

 

root@arria10:~# rxe_cfg
libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0
IB device 'rxe0' wasn't found
Name Link Driver Speed NMTU IPv4_addr RDEV RMTU
eth0 yes  st_gmac      1500 10.0.1.24 rxe0 (?)

 

Any hints on what the kernel ABI error means would be appreciated!

 

Thanks,

FM

Re: rxe driver does not support kernel ABI


After setting up the Yocto build to include the various rdma-core modules according to Yocto practices, this error went away.

Re: rxe driver does not support kernel ABI


It's back. For some reason I keep getting this warning:

libibverbs: Warning: Driver rxe does not support the kernel ABI of 1 (supports 2 to 2) for device /sys/class/infiniband/rxe0

Re: Connext-x3 roce mode


Karen,

 

Thanks for replying and for the reference doc.

Re: sr-iov and vxlan used

Re: mst start fails with ConnectX-4 on ppc64le


Hi Karen,

 

Thanks for your response. I do have the Advanced Toolchain Runtime installed.

 

$ sudo apt list --installed | grep advance-toolchain

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

advance-toolchain-at10.0-devel/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-mcore-libs/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-perf/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-runtime/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at7.1-devel/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-mcore-libs/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-perf/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-runtime/trusty,now 7.1-5 ppc64el [installed]

 

I did the export as mentioned (libc.so.6 exists on my system):

 

$ echo $LD_PRELOAD

/lib/powerpc64le-linux-gnu/libc.so.6

 

I still see the error however.

 

${mbindir}/minit, invoked from /usr/bin/mst, segfaults for some reason (as seen in the logs in my previous message). I'm not sure why that happens.


Re: mst start fails with ConnectX-4 on ppc64le


Thank you Sood,

Please open a support ticket with these details so we can investigate further.

You can open a ticket by emailing support@mellanox.com

 

Regards,

Karen.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

Can you give more details on what you tried and what you used?

 

Thanks

Marc

Web interface error on SX6036


I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI, but it immediately gives the following error:

 

Internal Error

An internal error has occurred.

Your options from this point are:

See the logs for more details.

Return to the home page.

Retry the bad page which gave the error.

 

 

When I enable the logging monitor and try to log in, I see the following on the terminal:

 

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed

Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client ipremvd] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs

 

 

Any idea how to check what may have failed and how to fix it?

 

regards

Andrew
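For anyone triaging similar output, here is a minimal Python sketch, assuming the log line format shown above, that extracts the failing template (with the line number in the generated code) and the error codes. The function name and patterns are illustrative and do not come from the switch software.

```python
import re

# Patterns written against the [web.ERR] lines quoted above (assumed format).
TEMPLATE_RE = re.compile(r'Error in template "([^"]+)" at line (\d+)')
CODE_RE = re.compile(r"[Ee]rror code (\d+)")


def summarize_web_errors(log_text: str) -> dict:
    """Return the failing templates and the distinct error codes in the log."""
    templates = sorted({(name, int(line)) for name, line in TEMPLATE_RE.findall(log_text)})
    codes = sorted({int(code) for code in CODE_RE.findall(log_text)})
    return {"templates": templates, "error_codes": codes}


# Two lines copied from the log above, used as sample input.
SAMPLE = '''Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code
Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed
'''

if __name__ == "__main__":
    print(summarize_web_errors(SAMPLE))
```

On the log above, this points at the "status-logs" template as the page that fails to render.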

Re: rxe driver does not support kernel ABI


I traced this to the function match_device() in libibverbs/init.c

 

There is a check for ABI versions:

 

if (sysfs_dev->abi_ver < ops->match_min_abi_version ||
    sysfs_dev->abi_ver > ops->match_max_abi_version) {
        fprintf(stderr, PFX
                "Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",

 

The variable sysfs_dev is passed into this call by try_driver(), which is called by try_drivers(), which is called by try_all_drivers(), which appears to be called by ibverbs_get_device_list().

 

Does this help?

Re: rxe driver does not support kernel ABI


It appears that the abi version is stored here:

root@arria10:/sys/class/infiniband# cat rxe0/device/infiniband_verbs/uverbs0/abi_version

1

And this needs to be 2 according to the code...
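The comparison can be reproduced outside libibverbs. Below is a minimal Python sketch that reads an abi_version file and applies the same range test as the match_device() snippet quoted earlier in this thread. The helper names are hypothetical, the sysfs path is the one from the `cat` command above, and the supported range of 2 to 2 is taken from the warning text rather than from the driver source.

```python
from pathlib import Path


def read_abi_version(path) -> int:
    """Read the integer ABI version from a sysfs abi_version file."""
    return int(Path(path).read_text().strip())


def abi_supported(abi_ver: int, min_abi: int, max_abi: int) -> bool:
    """Same range test as the quoted check: reject values below min or above max."""
    return min_abi <= abi_ver <= max_abi


def abi_warning(driver: str, abi_ver: int, min_abi: int, max_abi: int, device: str) -> str:
    """Format a message shaped like the libibverbs warning seen in this thread."""
    return (f"Warning: Driver {driver} does not support the kernel ABI of "
            f"{abi_ver} (supports {min_abi} to {max_abi}) for device {device}")


if __name__ == "__main__":
    # Path as used in the post above; adjust for your device.
    abi_file = "/sys/class/infiniband/rxe0/device/infiniband_verbs/uverbs0/abi_version"
    if Path(abi_file).exists():
        ver = read_abi_version(abi_file)
        if not abi_supported(ver, 2, 2):
            print(abi_warning("rxe", ver, 2, 2, "/sys/class/infiniband/rxe0"))
```

With the value 1 reported above and a supported range of 2 to 2, the check fails, which matches the warning.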
