Mellanox Interconnect Community: Message List

What are some good budget options for NICs and Switch?


I have a limited budget, and I want to buy high-speed network interface cards and a switch to connect multiple PCs (just a couple of meters apart) into a local network for HPC research.

Some of the key points for me are:

  • It should be compatible with Windows
  • Speed preferably 56 Gb/s
  • Needs to be able to sustain 100% load at all times
  • Unmanaged switch
  • No SFP(+) uplinks

 

Taking all of that into account, what models should I look to buy? Perhaps some older models from a couple of years ago?

What caveats should I consider when building a system like that?

What should I expect in terms of latency?
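(Once the hardware is here I plan to measure latency myself with something like the Linux perftest tools below; the device name and IP are placeholders, and I understand Windows uses the WinOF nd_* equivalents with different option names.)

# on the first PC (acts as server)
ib_send_lat -d mlx4_0 -i 1 -F

# on the second PC, pointing at the first PC's IP
ib_send_lat -d mlx4_0 -i 1 -F 192.168.0.1

The -F option just suppresses the CPU-frequency-scaling warning.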


SN2100B v3.6.8004


Hi All,

 

I don't know if someone has already posted about this.

I just want to share that when we upgraded our SN2100 to v3.6.8004, the GUI stopped loading. Luckily the switch was still accessible via SSH, so I changed the next-boot partition and it rebooted fine. I then reverted that partition to a working version, and now we can manage the switch via both the GUI and the CLI.

 

Hope this helps someone who wants to upgrade to this version.
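For anyone hitting the same thing, a rough sketch of the CLI sequence I mean, from memory; exact command names can vary between MLNX-OS/Onyx releases, so double-check with the "?" help before running it:

# over SSH, from enable mode
show images        # lists installed images and which partition boots next
image boot next    # switch the next-boot partition to the other (working) image
reload             # reboot into the working partition

After that you can fetch and install a known-good image into the broken partition at your leisure.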

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


I have encountered this issue, too.

It was because UCX was not compiled with CUDA support (MLNX_OFED installs the default UCX).

When I recompiled UCX with CUDA and reinstalled it, it worked.
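For reference, a minimal sketch of what "recompile UCX with CUDA" looks like; the install prefix and CUDA path are examples, adjust them to your system:

# build UCX with CUDA support instead of the default build that MLNX_OFED ships
./autogen.sh
./contrib/configure-release --prefix=/opt/ucx-cuda --with-cuda=/usr/local/cuda
make -j
make install

Then rebuild OpenMPI with --with-ucx=/opt/ucx-cuda so MPI actually picks up the CUDA-enabled UCX.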

Header Data Split


I've made a feeble attempt to use the Header Data Split (HDS) offload on ConnectX-5 adapters by creating the striding WQ context with a non-zero log2_hds_buf_size value. However, the hardware won't have it and reports back a bad_param error with syndrome 0x6aaebb.

 

According to an online error syndrome list, this translates to human readable as:

create_rq/rmp: log2_hds_buf_size not supported

 

Since the Public PRM does not describe the HDS offload, I'm curious whether certain preconditions need to be met for this offload to work, or whether this is a known restriction in current firmware. I'd also like to know if it's possible to configure the HDS "level", that is, where the split happens (between L3/L4, L4/L5, ...).

 

The way I'd envision this feature working is that the end of the headers is zero-padded up to log2_hds_buf_size, placing the upper-layer payload at a fixed offset regardless of the header length.

Re: Can't get full FDR bandwidth with Connect-IB card in PCI 2.0 x16 slot


Hello Eric -

   I hope all is well...

You won't achieve the 56 Gb/s line rate, because the NIC is an MCB193A-FCAT (MT_1220110019) and your PCIe slot is 2.0.

The release notes for your FW describe the card as:

Connect-IB® Host Channel Adapter, single-port QSFP, FDR 56Gb/s, PCIe 3.0 x16, tall bracket, RoHS R6

So getting ~45-48 Gb/s is good.
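For the curious, the back-of-the-envelope arithmetic (approximate, since PCIe packet overhead varies with payload size):

PCIe 2.0 x16: 16 lanes x 5 GT/s x 8b/10b encoding = 64 Gb/s raw
minus TLP/DLLP and flow-control overhead          ~ 50 Gb/s usable
FDR link: 56 Gb/s signalling, ~54 Gb/s data after 64b/66b encoding

So the PCIe 2.0 slot, not the Connect-IB card, is the bottleneck, and ~45-48 Gb/s is close to what the slot can deliver.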

 

Have a great day!

Steve

RoCE v2 configuration with Linux drivers and packages


Is it possible to configure RoCE v2 on a ConnectX-4 card without MLNX_OFED? Can someone please share any guide/doc available for configuring it with the inbox Linux drivers and packages?

I tried with the inbox drivers and packages but was not able to succeed. When I used MLNX_OFED, RoCE was configured successfully.
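For reference, the kind of inbox-only configuration I am after looks roughly like this (an assumption-laden sketch: upstream mlx5 driver plus the distro's rdma-core and perftest packages; device names and port numbers are placeholders, and the rdma_cm configfs piece needs kernel support):

# check which RoCE version each GID entry advertises
cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/*

# set the default RoCE version used by the RDMA CM (rdma_cm loaded, configfs mounted)
mkdir -p /sys/kernel/config/rdma_cm/mlx5_0
echo "RoCE v2" > /sys/kernel/config/rdma_cm/mlx5_0/ports/1/default_roce_mode

# sanity-check between the two hosts with perftest over the RDMA CM
ib_send_bw -R              # server side
ib_send_bw -R <peer_ip>    # client side

With MLNX_OFED, the cma_roce_mode script handles the default-version part, which is probably why it "just works" there.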

send_bw test between QSFP ports on Dual Port Adapter


Hello Mellanox,

 

I have a dual-port ConnectX-3 QSFP adapter (CX354A) in a PC running Windows 7 x64 Pro, and I would like to run a throughput test between the QSFP ports. I connected a 1 m fiber cable between the ports and set the CX354A to Ethernet mode in Device Manager. I also manually set different IP addresses, with the same subnet mask, on the two Ethernet adapters:

IP1: 192.168.0.1

mask: 255.255.255.0

 

IP2: 192.168.0.2

mask: 255.255.255.0

 

I tried a TCP iperf3 test, but I got only 12 Gbit/s instead of 40 Gbit/s. Your documentation recommends the ib_send_bw utility for performance testing.

 

How do I run a test with ib_send_bw between two Ethernet interfaces on one PC?

 

For example, in iperf3 I can select the interface with the -B option: iperf3 -B 192.168.0.1.
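For comparison, this is roughly how such a test is invoked with the Linux perftest syntax; the Windows WinOF package ships Network Direct equivalents (nd_send_bw and friends) whose option names differ, so the flags below are illustrative only:

# server instance, bound to the first port
ib_send_bw -d mlx4_0 -i 1 --report_gbits

# client instance in a second console, bound to the second port, connecting to the server's IP
ib_send_bw -d mlx4_0 -i 2 --report_gbits 192.168.0.1

Note that a port-to-port test on a single host pushes every byte across the PCIe bus twice, so it tends to measure the slot and CPU as much as the 40 GbE link itself.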

 

Thank you.

 

--

With Best Wishes,

Dmitrii

RoCEv2 GID disappeared ?


Hi everybody !

 

How are ConnectX-5 GIDs created/initialized? They disappeared after an OFED upgrade...

 

Neither ibv_devinfo -v nor show_gids displays any GID...

 

Any ideas?

regards, raph

 

after upgrade to MLNX_OFED_LINUX-4.4-1.0.0.0-debian9.1-x86_64

 

scisoft13:~ % sudo show_gids

DEV     PORT INDEX   GID                        IPv4 VER DEV

---     ---- -----   ---                        ------------ --- ---

n_gids_found=0

 

 

 

before upgrade

scisoft13:~ % sudo show_gids

DEV     PORT INDEX   GID                        IPv4 VER DEV

---     ---- -----   ---                        ------------ --- ---

mlx4_0  1 0       fe80:0000:0000:0000:526b:4bff:fe4f:be21                 v1 enp131s0

mlx4_0  2 0       fe80:0000:0000:0000:526b:4bff:fe4f:be22                 v1 enp131s0d1

mlx5_0  1 0       fe80:0000:0000:0000:526b:4bff:fed3:d164                 v1 enp130s0f0

mlx5_0  1 1       fe80:0000:0000:0000:526b:4bff:fed3:d164                 v2 enp130s0f0

mlx5_0  1 2       0000:0000:0000:0000:0000:ffff:c0a8:030d 192.168.3.13    v1 enp130s0f0

mlx5_0  1 3       0000:0000:0000:0000:0000:ffff:c0a8:030d 192.168.3.13    v2 enp130s0f0

mlx5_1  1 0       fe80:0000:0000:0000:526b:4bff:fed3:d165                 v1 enp130s0f1

mlx5_1  1 1       fe80:0000:0000:0000:526b:4bff:fed3:d165                 v2 enp130s0f1

n_gids_found=8

 

hca_id: mlx5_1

 

        transport:                      InfiniBand (0)

        fw_ver:                         16.23.1000

        node_guid:                      506b:4b03:00d3:d185

        sys_image_guid:                 506b:4b03:00d3:d184

        vendor_id:                      0x02c9

        vendor_part_id:                 4119

        hw_ver:                         0x0

        board_id:                       MT_0000000012

        phys_port_cnt:                  1

        max_mr_size:                    0xffffffffffffffff

        page_size_cap:                  0xfffffffffffff000

        max_qp:                         262144

        max_qp_wr:                      32768

        device_cap_flags:               0xe5721c36

                                        BAD_PKEY_CNTR

                                        BAD_QKEY_CNTR

                                        AUTO_PATH_MIG

                                        CHANGE_PHY_PORT

                                        PORT_ACTIVE_EVENT

                                        SYS_IMAGE_GUID

                                        RC_RNR_NAK_GEN

                                        XRC

                                        Unknown flags: 0xe5620000

        device_cap_exp_flags:           0x520DF8F100000000

                                        EXP_DC_TRANSPORT

                                        EXP_CROSS_CHANNEL

                                        EXP_MR_ALLOCATE

                                        EXT_ATOMICS

                                        EXT_SEND NOP

                                        EXP_UMR

                                        EXP_ODP

                                        EXP_RX_CSUM_TCP_UDP_PKT

                                        EXP_RX_CSUM_IP_PKT

                                        EXP_MASKED_ATOMICS

                                        EXP_RX_TCP_UDP_PKT_TYPE

                                        EXP_SCATTER_FCS

                                        EXP_WQ_DELAY_DROP

                                        EXP_PHYSICAL_RANGE_MR

                                        EXP_UMR_FIXED_SIZE

                                        Unknown flags: 0x200000000000

        max_sge:                        30

        max_sge_rd:                     30

        max_cq:                         16777216

        max_cqe:                        4194303

        max_mr:                         16777216

        max_pd:                         16777216

        max_qp_rd_atom:                 16

        max_ee_rd_atom:                 0

        max_res_rd_atom:                4194304

        max_qp_init_rd_atom:            16

        max_ee_init_rd_atom:            0

        atomic_cap:                     ATOMIC_HCA (1)

        log atomic arg sizes (mask)             0x8

        masked_log_atomic_arg_sizes (mask)      0x3c

        masked_log_atomic_arg_sizes_network_endianness (mask)   0x34

        max fetch and add bit boundary  64

        log max atomic inline           5

        max_ee:                         0

        max_rdd:                        0

        max_mw:                         16777216

        max_raw_ipv6_qp:                0

        max_raw_ethy_qp:                0

        max_mcast_grp:                  2097152

        max_mcast_qp_attach:            240

        max_total_mcast_qp_attach:      503316480

        max_ah:                         2147483647

        max_fmr:                        0

        max_srq:                        8388608

        max_srq_wr:                     32767

        max_srq_sge:                    31

        max_pkeys:                      128

        local_ca_ack_delay:             16

        hca_core_clock:                 78125

        max_klm_list_size:              65536

        max_send_wqe_inline_klms:       20

        max_umr_recursion_depth:        4

        max_umr_stride_dimension:       1

        general_odp_caps:

                                        ODP_SUPPORT

                                        ODP_SUPPORT_IMPLICIT

        max_size:                       0xFFFFFFFFFFFFFFFF

        rc_odp_caps:

                                        SUPPORT_SEND

                                        SUPPORT_RECV

                                        SUPPORT_WRITE

                                        SUPPORT_READ

        uc_odp_caps:

                                        NO SUPPORT

        ud_odp_caps:

                                        SUPPORT_SEND

        dc_odp_caps:

                                        SUPPORT_SEND

                                        SUPPORT_WRITE

                                        SUPPORT_READ

        xrc_odp_caps:

                                        NO SUPPORT

        raw_eth_odp_caps:

                                        NO SUPPORT

        max_dct:                        262144

        max_device_ctx:                 1020

        Multi-Packet RQ supported

                Supported for objects type:

                        IBV_EXP_MP_RQ_SUP_TYPE_SRQ_TM

                        IBV_EXP_MP_RQ_SUP_TYPE_WQ_RQ

                Supported payload shifts:

                        2 bytes

                Log number of strides for single WQE: 3 - 16

                Log number of bytes in single stride: 6 - 13

 

        VLAN offloads caps:

                                        C-VLAN stripping offload

                                        C-VLAN insertion offload

        rx_pad_end_addr_align:  64

        tso_caps:

        max_tso:                        262144

        supported_qp:

                                        SUPPORT_RAW_PACKET

        packet_pacing_caps:

        qp_rate_limit_min:              0kbps

        qp_rate_limit_max:              0kbps

        ooo_caps:

        ooo_rc_caps  = 0x1

        ooo_xrc_caps = 0x1

        ooo_dc_caps  = 0x1

        ooo_ud_caps  = 0x0

                                        SUPPORT_RC_RW_DATA_PLACEMENT

                                        SUPPORT_XRC_RW_DATA_PLACEMENT

                                        SUPPORT_DC_RW_DATA_PLACEMENT

        sw_parsing_caps:

                                        SW_PARSING

                                        SW_PARSING_CSUM

                                        SW_PARSING_LSO

        supported_qp:

                                        SUPPORT_RAW_PACKET

        tag matching not supported

        tunnel_offloads_caps:

                                        TUNNEL_OFFLOADS_VXLAN

                                        TUNNEL_OFFLOADS_GRE

                                        TUNNEL_OFFLOADS_GENEVE

        UMR fixed size:

                max entity size:        2147483648

        Device ports:

                port:   1

                        state:                  PORT_ACTIVE (4)

                        max_mtu:                4096 (5)

                        active_mtu:             1024 (3)

                        sm_lid:                 0

                        port_lid:               0

                        port_lmc:               0x00

                        link_layer:             Ethernet

                        max_msg_sz:             0x40000000

                        port_cap_flags:         0x04010000

                        max_vl_num:             invalid value (0)

                        bad_pkey_cntr:          0x0

                        qkey_viol_cntr:         0x0

                        sm_sl:                  0

                        pkey_tbl_len:           1

                        gid_tbl_len:            256

                        subnet_timeout:         0

                        init_type_reply:        0

                        active_width:           4X (2)

                        active_speed:           25.0 Gbps (32)

                        phys_state:             LINK_UP (5)


Re: RoCEv2 GID disappeared ?


Hi Raphael.

 

What is the output for "show_gids" after the driver installation?

 

Thank you,

Karen.

Re: DPDK with MLX4 VF on Hyper-v VM


Hi Hui,

 

What WinOF version is installed on your Windows Server 2016?

From the latest driver release notes, it looks as though an Ubuntu 18.02 VM is not supported when working with SR-IOV.

 

Thank you,

Karen.

Re: DPDK with MLX4 VF on Hyper-v VM


Karen,

 

5.50, I believe. Here is the driver information for my ConnectX-3 (mlx4) card and the OS version.

Do you know how to check if SR-IOV works properly? As far as I can tell, it seems to be working:

 

 

Anyway, I also have an Ubuntu 16.04 VM, and I ran into the same error there. I read somewhere that running DPDK with the ConnectX-3 (mlx4) PMD needs a kernel version higher than 4.14.

So I thought the error might be caused by the older kernel on my Ubuntu 16.04 VM. I therefore tried the newer 18.02 VM, but got the same error, so the kernel doesn't seem to be the problem here.
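These are the generic checks I can run from inside the guest if that helps narrow it down; nothing Mellanox-specific, just a sketch, and the interface name below is a placeholder:

# is the VF exposed to the guest as a PCI device at all?
lspci -nn | grep -i mellanox

# which driver bound the netdev, and what firmware does it report?
ethtool -i eth1

# kernel version, for the "needs > 4.14" question
uname -r

If lspci shows the ConnectX-3 VF and ethtool reports the mlx4 driver, SR-IOV pass-through itself seems fine and the problem is more likely on the DPDK/PMD side.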

 

Any thoughts?

 

Thanks!


Re: DPDK with MLX4 VF on Hyper-v VM


Hi Karen,

 

I was trying DPDK 18.02.

 

The MLNX_DPDK user guide for KVM is nice, but I need to run DPDK under Hyper-V.

 

Does Mellanox have a similar guide for Hyper-V? I don't have to use a specific version of DPDK; any version is fine, as long as I can make it work.

 

Hui

Re: RoCEv2 GID disappeared ?


Thank you for helping! It is strange, because everything was OK before the upgrade.

 

- ping is OK

- /sys/class/infiniband/ etc. exists and is populated, but the GIDs are missing...

 

after upgrade to MLNX_OFED_LINUX-4.4-1.0.0.0-debian9.1-x86_64

 

scisoft13:~ % sudo show_gids

 

DEV     PORT INDEX   GID                        IPv4 VER DEV

 

---     -


 

Re: RoCEv2 GID disappeared ?


scisoft13:~ % cat /sys/class/infiniband/mlx5_0/ports/1/gids/0

0000:0000:0000:0000:0000:0000:0000:0000

scisoft13:~ % cat /sys/class/infiniband/mlx5_0/ports/1/gids/1

0000:0000:0000:0000:0000:0000:0000:0000

 

scisoft13:~ % cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/0

cat: /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/0: Invalid argument

 

Same issue on 2 servers, with a ConnectX-5 EN 100 Gb/s optical link and a ConnectX-3 40 Gb/s copper link.

OFED installed without issue.


Re: RoCEv2 GID disappeared ?


Hi Raphael,

 

Thank you for the information. It looks like unexpected behaviour related to the driver and this specific operating system.

For us to continue investigating, please send an email to support@mellanox.com and open a support ticket with all the details.

 

Thank you,

Karen.

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


Thanks a lot for the reply. It solved the above issue, but after running mpirun I do not see any latency difference with and without GDR.

 

My questions:

  1. Why do I not see any latency difference with and without GDR?
  2. Is the sequence of steps below correct? Does it matter for my question 1?

 

Note: I have a single GPU on both the host and the peer. The IOMMU is disabled.

## nvidia-smi topo -m

           GPU0    mlx5_0  mlx5_1  CPU Affinity

GPU0     X      PHB     PHB     18-35

mlx5_0  PHB      X      PIX

mlx5_1  PHB     PIX      X

 

Steps followed are:

1. Install CUDA 9.2 and add the library and bin paths to .bashrc

2. Install the latest MLNX_OFED

3. Compile and install the nv_peer_mem driver

4. Get UCX from git, configure UCX with CUDA, and install it

5. Configure OpenMPI 3.1.1 and install it:

./configure --prefix=/usr/local --with-wrapper-ldflags=-Wl,-rpath,/lib --enable-orterun-prefix-by-default --disable-io-romio --enable-picky --with-cuda=/usr/local/cuda-9.2

6. Configure OSU Micro-Benchmarks 5.4.2 with CUDA and install it:

./configure --prefix=/root/osu_benchmarks CC=mpicc --enable-cuda --with-cuda=/usr/local/cuda-9.2

 

I then run mpirun, and I do not see any latency difference with and without GDR.
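One way I can think of to make the with/without comparison explicit, assuming OpenMPI really selects the UCX PML that was built against CUDA (UCX_IB_GPU_DIRECT_RDMA is a UCX tunable; hostnames and the benchmark path are placeholders):

# with GPUDirect RDMA
mpirun -np 2 -host LOCALNODE,REMOTENODE -mca pml ucx -x UCX_IB_GPU_DIRECT_RDMA=yes \
    /root/osu_benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency -d cuda D D

# without GPUDirect RDMA (staging through host memory)
mpirun -np 2 -host LOCALNODE,REMOTENODE -mca pml ucx -x UCX_IB_GPU_DIRECT_RDMA=no \
    /root/osu_benchmarks/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency -d cuda D D

If the two runs come out identical, it is worth confirming (e.g. with ompi_info | grep ucx) that the UCX PML is available and selected at all; otherwise the same non-GDR path may be used in both runs.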

 

Thanks for your Help.

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


I'm not sure whether you resolved the signal 11 (segfault) problem the way I described.

For what it's worth, I compiled OpenMPI against my CUDA-enabled UCX:

./configure --prefix=/usr/local/openmpi-3.1.1 --with-wrapper-ldflags=-Wl,-rpath,/lib --disable-vt --enable-orterun-prefix-by-default -disable-io-romio --enable-picky --with-cuda=/usr/local/cuda  --with-ucx=/opt/ucx-cuda --enable-mem-debug --enable-debug --enable-timing

With GDR the latency should actually be lower. What kind of NIC have you been using, ConnectX-4 or ConnectX-3?

It would be great if you could share some test data and your test environment configuration.

Web interface error on SX6036


I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI; however, it immediately gives the following error:

 

Internal Error

An internal error has occurred.

Your options from this point are:

See the logs for more details.

Return to the home page.

Retry the bad page which gave the error.

 

 

When I enable the logging monitor and try to log in, I see the following on the terminal:

 

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed

Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client 137.158.30.196] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs

 

 

Any idea how to check what may have failed, and how to fix it?
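Is something like this the right way to pull the logs that the error page points at? (Command names from memory, so they may not match this MLNX-OS version exactly.)

enable
show log               # recent syslog entries, which should include the web.ERR lines above
debug generate dump    # builds a sysdump archive that can be pulled off the switch for support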

 

regards

Andrew

Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA


Yes, using your approach the segmentation fault got resolved.

I am using a Mellanox ConnectX-5 adapter.

OS: CentOS 7.4

Does the topology below look good to you?

## nvidia-smi topo -m

           GPU0    mlx5_0  mlx5_1  CPU Affinity

GPU0     X      PHB     PHB     18-35

mlx5_0  PHB      X      PIX

mlx5_1  PHB     PIX      X

 

I am running the command below to check the latency:

mpirun --allow-run-as-root -host LOCALNODE,REMOTENODE -mca btl_openib_want_cuda_gdr 1 -np 2 -mca btl_openib_if_include mlx5_1 -mca -bind-to core -cpu-set 23 -x CUDA_VISIBLE_DEVICES=0 /usr/local/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency -d cuda D D
