Setup Mellanox MSX1012B in HA environment.
Hello Community,
I am new to Mellanox switches. I am trying to configure two MSX1012B units in an HA environment. These switches will sit behind two Juniper firewalls serving the server farm.
I followed the configuration guide, but I am a little confused about whether I need to configure an IPL and MLAG to meet my requirement. Below is the diagram of what I want to achieve; Switch-A and Switch-B are the MSX1012B units.
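For context, my rough understanding from the MLAG guide is that I would need both an IPL between the two switches and MLAG port-channels towards the firewalls; a sketch of what I think the configuration would look like (interface numbers, VLAN 4000 and the IP addresses are placeholders, and I may have the exact syntax wrong):
# on Switch-A (Switch-B mirrors this with its own IPL address)
protocol mlag
lacp
# IPL: a dedicated port-channel and VLAN between the two switches
interface port-channel 1
interface ethernet 1/11 channel-group 1 mode active
interface ethernet 1/12 channel-group 1 mode active
vlan 4000
interface vlan 4000 ip address 10.10.10.1 255.255.255.252
interface port-channel 1 ipl 1
interface vlan 4000 ipl 1 peer-address 10.10.10.2
# MLAG domain virtual IP
mlag-vip MY-MLAG ip 192.168.100.10 /24 force
no mlag shutdown
# MLAG port-channel facing one of the Juniper firewalls
interface mlag-port-channel 10
interface ethernet 1/1 mlag-channel-group 10 mode active
interface mlag-port-channel 10 no shutdown
Is this roughly the right direction?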
Thank you!
Re: Neo, error 'Device Management Discovery'
For ETH discovery to work properly, you must configure LLDP on all managed devices, such as the MSN2700B and MSN2410.
1) Configure LLDP on the switches (a minimal sketch follows this list).
2) Turn on LLDP Discovery.
3) To restart the NEO service, please run the following command:
/opt/neo/neoservice restart
4) Monitor whether the same issue occurs again.
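For step 1, enabling LLDP on the switch side is a single global command in MLNX-OS; a minimal sketch (the exact verification command may vary by release):
# enable LLDP globally and save the configuration
lldp
configuration write
# verify that neighbors are being learned (command name may differ per MLNX-OS version)
show lldp remote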
Re: Neo, error 'Device Management Discovery'
Done, but the same issue has occurred again.
LLDP was already configured on all the switches:
##
## LLDP configuration
##
lldp
Re: Neo, error 'Device Management Discovery'
Hi,
I saw that you opened support case #474466 through the IBS account.
We will continue debugging through the support case.
Thanks,
Samer
Re: Firmware for MHJH29 ?
Hello Romain -
Good day to you...
Could you get the board_id with "ibv_devinfo"
And the part number with:
> lspci | grep Mell NOTE: the bus:dev.func of the device
> lspci -s bus:dev.func -xxxvvv
See:
Read-only fields:
[PN] Part number:
If you could update this thread with this information it would be very helpful.
thanks - steve
Re: Problem with symbol error counter
Usually, symbol errors are caused by a physical-layer issue, and in many cases they are fixed by a) reseating BOTH ends of the cable or b) replacing the cable. If you are using an OEM solution, after trying to reseat the cables you might contact the hardware vendor, check whether your equipment is still under warranty, and open a case with them.
To reset the fabric counters, use the 'ibdiagnet -pc' command; the same tool should also be used to collect information about the fabric. ibqueryerrors, although it ships with Mellanox OFED, should not be used, as it is no longer under development. ibdiagnet is the Swiss-army knife here.
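For reference, a minimal sequence might look like this (the report path assumes the default ibdiagnet2 output directory):
# clear all fabric port counters
ibdiagnet -pc
# ...let traffic run for a while, then collect a fresh report
ibdiagnet
# review the collected results
less /var/tmp/ibdiagnet2/ibdiagnet2.log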
missing ifup-ib in latest release?
Hi. I have some old cluster nodes that were working fine under previous versions of CentOS 7 (I think it was CentOS 7.3 before the update), but after a recent update to CentOS 7.5 I can't seem to get the interface to come up. I reinstalled the latest MLNX_OFED drivers (MLNX_OFED_LINUX-4.3-3.0.2.1-rhel7.5-x86_64), which installed properly. I see the card in lspci and the kernel modules appear to be loaded as well. However, I can't bring up the interface. Doing an ifup I get this:
ifup ib0
ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device ib0 does not seem to be present, delaying initialization.
It seemed odd to me that it was using the ifup-eth code instead of the ifup-ib code to bring up the interface. When I looked for that file, I didn't find it on the system with the MLNX_OFED software installed. If I don't install MLNX_OFED and just leave the CentOS drivers in place, the card comes up fine. I also noticed that the file comes from the rdma-core package in CentOS:
# rpm -qf /etc/sysconfig/network-scripts/ifup-ib
rdma-core-15-7.el7_5.x86_64
When I look at the machine with MLNX_OFED installed, I don't see an rdma-core package...
# rpm -qa | grep rdma
librdmacm-41mlnx1-OFED.4.2.0.1.3.43302.x86_64
librdmacm-utils-41mlnx1-OFED.4.2.0.1.3.43302.x86_64
librdmacm-devel-41mlnx1-OFED.4.2.0.1.3.43302.x86_64
So I'm wondering whether I'm missing something here. With previous versions I didn't have any issues installing and using it. Does anyone have advice on what I should look at next to figure this out? Thanks,
Re: ConnectX-5 EN vRouter Offload
Hi Marc,
Do you mean that the product brief contains an over-promising claim? Contrail cannot use OVS.
Best regards,
Re: Ceph with OVS Offload
Are there any experts on this?
Re: igmp mlag vlan config right?
Thank you for the answers to my questions!
One correction:
Wrong command:
ip igmp snooping static-group 232.43.211.234 interface mlag-port-channel 1 source 192.168.1.1 192.168.1.2 192.168.1.3 192.168.1.4
Right command:
vlan 1 ip igmp snooping static-group 232.43.211.234 interface mlag-port-channel 1
Unfortunately, this does not work, because the IPL interface Po11 cannot be added to a static group.
mrouter problem:
When I set the MLAG ports as mrouter ports, I do not get a multicast group, either dynamically or statically.
Here is the working configuration in a mlag:
#enable global
ip igmp snooping
#enable snooping via vlan
vlan 1 ip igmp snooping
#enable querier per vlan
vlan 1 ip igmp snooping querier
#set the igmp snooping querier address to a free, unused ip (1.1.1.1)
vlan 1 ip igmp snooping querier address 1.1.1.1
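To double-check the result on both switches, I verify the snooping state and the learned groups; a short sketch (command names as I know them from MLNX-OS, they may vary slightly by release):
# global and per-VLAN snooping status
show ip igmp snooping
# groups joined dynamically or added statically
show ip igmp snooping groups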
Re: missing ifup-ib in latest release?
Hi,
Could you please check whether the ib0 interface appears under "ifconfig -a"?
If not, I suggest the following:
1) Run "mst start" -> "mst status" -> "ifconfig" and check again.
2) Try to restart the interfaces:
- /etc/init.d/openibd restart
- start opensm (or start the SM on the switch)
3) If the above still does not work, create the interface manually and bring it up (see the sketch after the file below):
vi /etc/sysconfig/network-scripts/ifcfg-ib0
NAME="ib0"
DEVICE="ib0"
ONBOOT=yes
BOOTPROTO=static
TYPE=Infiniband
IPADDR=<ip from the same subnet>
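Then bring the interface up with the standard CentOS tooling, for example:
ifup ib0
# or restart networking entirely and check the address
systemctl restart network
ip addr show ib0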
Thanks,
Samer
How to enable VF multi-queue for SR-IOV on KVM?
I have successfully enabled SR-IOV on KVM for ConnectX-3 (InfiniBand). Speeds reach up to 28.6 Gb/s between the hosts using the iperf tool, but only up to 14 Gb/s between virtual machines. I found that although the virtual machine shows multiple queues in /proc/interrupts, only one queue is actually used. I have configured smp_affinity and disabled the irqbalance service. How can I enable VF multi-queue for SR-IOV on KVM?
Thanks !
vm host:
[root@host-09 ~]# cat /proc/interrupts | grep mlx4
45: 106 52 58 59 59 59 54 55 PCI-MSI-edge mlx4-async@pci:0000:00:07.0
46: 2435659 2939965 41253 26523 49013 59796 56406 70341 PCI-MSI-edge mlx4-1@0000:00:07.0
47: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-2@0000:00:07.0
48: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-3@0000:00:07.0
49: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-4@0000:00:07.0
50: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-5@0000:00:07.0
51: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-6@0000:00:07.0
52: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-7@0000:00:07.0
53: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-8@0000:00:07.0
54: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-9@0000:00:07.0
55: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-10@0000:00:07.0
56: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-11@0000:00:07.0
57: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-12@0000:00:07.0
58: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-13@0000:00:07.0
59: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-14@0000:00:07.0
60: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-15@0000:00:07.0
61: 0 0 0 0 0 0 0 0 PCI-MSI-edge mlx4-16@0000:00:07.0
[root@host-09 ~]# cat /proc/irq/46/smp_affinity
02
[root@host-09 ~]# cat /proc/irq/47/smp_affinity
04
[root@host-09 ~]# cat /proc/irq/48/smp_affinity
08
[root@host-09 ~]# cat /proc/irq/49/smp_affinity
10
[root@host-09 ~]# cat /proc/irq/50/smp_affinity
20
[root@host-09 ~]# cat /proc/irq/51/smp_affinity
40
[root@host-09 ~]# cat /proc/irq/52/smp_affinity
80
[root@host-09 ~]# cat /proc/irq/53/smp_affinity
01
[root@host-09 ~]# cat /proc/irq/54/smp_affinity
02
[root@host-09 ~]# cat /proc/irq/55/smp_affinity
04
[root@host-09 ~]# cat /proc/irq/56/smp_affinity
08
[root@host-09 ~]# cat /proc/irq/57/smp_affinity
10
[root@host-09 ~]# cat /proc/irq/58/smp_affinity
20
[root@host-09 ~]# cat /proc/irq/59/smp_affinity
40
[root@host-09 ~]# cat /proc/irq/60/smp_affinity
80
[root@host-09 ~]# cat /proc/irq/61/smp_affinity
01
[root@host-09 ~]# ls -la /sys/class/net/ib0/queues/
total 0
drwxr-xr-x 4 root root 0 Jun 26 12:11 .
drwxr-xr-x 5 root root 0 Jun 26 12:11 ..
drwxr-xr-x 2 root root 0 Jun 26 12:11 rx-0
drwxr-xr-x 3 root root 0 Jun 26 12:11 tx-0
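For completeness, one way to check whether the VF driver exposes more than one channel on the IPoIB interface is ethtool's channel query; a sketch, assuming the interface is ib0 (the mlx4 VF driver may not support changing the count, in which case ethtool reports the operation as not supported):
# show current / maximum number of RX and TX channels on the VF interface
ethtool -l ib0
# try to request more channels (may be rejected by the driver)
ethtool -L ib0 rx 8 tx 8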
Re: mlnx_qos cannot assign priority values to TCs after 8 SR-IOV devices
Hi Steve,
I'm using the latest FW and MOFED as well.
I've issued the command you advised; it returned the correct output, but all the priority levels are stacked under TC0, whereas by default they are spread over all the TCs.
The output:
mlnx_qos -i enp6s0f1 --pfc 0,0,0,1,0,0,0,0
DCBX mode: OS controlled
Priority trust state: pcp
Cable len: 7
PFC configuration:
priority 0 1 2 3 4 5 6 7
enabled 0 0 0 1 0 0 0 0
tc: 0 ratelimit: unlimited, tsa: vendor
priority: 0
priority: 1
priority: 2
priority: 3
priority: 4
priority: 5
priority: 6
priority: 7
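For reference, the command I would normally use to spread the priorities back over the TCs is the -p (prio_tc) mapping; a sketch assuming the default 1:1 mapping is what should come back (it is exactly this assignment that does not seem to take effect here):
# map priority i back to traffic class i (0..7)
mlnx_qos -i enp6s0f1 -p 0,1,2,3,4,5,6,7
# re-read the resulting configuration
mlnx_qos -i enp6s0f1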
Thanks,
David
Re: Firmware for MHJH29 ?
Stephen Yannalfo wrote:
> Could you get the board_id with "ibv_devinfo"
Sure; took a little time as I had to get a system running to put the HCA in ;-)
When I put some IB between systems at home years ago, the PCIe would limit me to about DDR anyway (one 8x 1.0 and one 4x 2.0...), so I used DDR boards with available firmware. I have since upgraded some bits of hardware, and I am wondering if I couldn't get QDR on 8x 2.0. But it seems QDR was mostly deployed with QSFP connectors, not CX4, so the MHJH29 seems a bit of a black sheep from that era...
Thanks for your help !
Host is a Supermicro X9SRi, primary PCIe slot (16x 2.0), running CentOS 7.3.
[root@localhost ~]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.6.900
node_guid: 0002:c903:0002:173a
sys_image_guid: 0002:c903:0002:173d
vendor_id: 0x02c9
vendor_part_id: 26428
hw_ver: 0xA0
board_id: MT_04E0120005
phys_port_cnt: 2
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: InfiniBand
(second port same as first, nothing is plugged in).
2.6.9 feels old; the latest seems to be 2.9.1[000].
> And the part number with:
> > lspci | grep Mell NOTE: the bus:dev.func of the device
> > lspci -s bus:dev.func -xxxvvv
That's one verbose lspci :-)
02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0)
Subsystem: Mellanox Technologies Device 0005
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 61
NUMA node: 0
Region 0: Memory at fba00000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at 38ffff000000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: Eagle QDR
Read-only fields:
[PN] Part number: MHJH29-XTC
[EC] Engineering changes: X5
[SN] Serial number: MT0821X00122
[V0] Vendor specific: PCIe Gen2 x8
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 111 byte(s) free
End
Capabilities: [9c] MSI-X: Enable+ Count=256 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #8, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 1
ARICtl: MFVC- ACS-, Function Group: 0
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
00: b3 15 3c 67 06 04 10 00 a0 00 06 0c 10 00 00 00
10: 04 00 a0 fb 00 00 00 00 0c 00 00 ff ff 38 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b3 15 05 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 0b 01 00 00
40: 01 48 03 00 00 00 00 00 03 9c ff 7f 00 00 00 78
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 10 00 02 00 01 8e 2c 01 20 20 00 00 82 f4 03 08
70: 00 00 82 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 1f 00 00 00 00 00 00 00 00 00 00 00
90: 02 00 00 00 00 00 00 00 00 00 00 00 11 60 ff 80
a0: 00 c0 07 00 00 d0 07 00 05 00 8a 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
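In the meantime, here is roughly how I plan to query and, once I locate an image matching PSID MT_04E0120005, burn the firmware with the MFT tools (the mst device name and the image filename below are placeholders):
mst start
mst status
# show the currently installed firmware version and PSID
flint -d /dev/mst/mt26428_pci_cr0 query
# burn the new image once found
flint -d /dev/mst/mt26428_pci_cr0 -i fw-image-for-MHJH29.bin burn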
Re: ConnectX-5 EN vRouter Offload
Hi Marc,
What is the best way to run Contrail vRouter in an HCI environment where the kernel will use the same bonded ConnectX-5 EN ports as the vRouter? I want to achieve the maximum possible offload on the ConnectX-5. SR-IOV, DPDK, SR-IOV + DPDK, or something else? And if SR-IOV is involved, should I use it with a PF or a VF?
Best regards,
get a dump cqe when trying to invalid mr in cx4
Hi,
I have a problem with local-invalidate / send-with-invalidate operations on ConnectX-4 NICs.
[863318.002031] mlx5_0:dump_cqe:275:(pid 31419): dump error cqe
[863318.002032] 00000000 00000000 00000000 00000000
[863318.002033] 00000000 00000000 00000000 00000000
[863318.002034] 00000000 00000000 00000000 00000000
[863318.002035] 00000000 09007806 25000178 000006d2
ofed version:
MLNX_OFED_LINUX-4.1-1.0.2.0 (OFED-4.1-1.0.2)
firmware version:
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX4LX
Part Number: MCX4121A-ACA_Ax
Description: ConnectX-4 Lx EN network interface card; 25GbE dual-port SFP28; PCIe3.0 x8; ROHS R6
PSID: MT_2420110034
PCI Device Name: 0000:81:00.0
Base MAC: 0000248a07b37aa2
Versions: Current Available
FW 14.20.1010 N/A
PXE 3.5.0210 N/A
Status: No matching image found
I am using a tool that I developed myself; it works on other devices such as ConnectX-3 and QLogic. I've checked the input parameters and made sure they are correct. Please share any suggestions on how to fix this.
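For what it is worth, the firmware inventory above was collected with mlxfwmanager; one thing I may also try is re-querying the device and updating the firmware if a newer image becomes available; a sketch, assuming the PCI address 81:00.0 and that an online update is possible:
# re-query the adapter firmware
mlxfwmanager --query -d 0000:81:00.0
# attempt an update from the online repository (requires internet access)
mlxfwmanager --online -u -d 0000:81:00.0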
Thanks
Re: XenServer 7.2 64bit
There is another script -
./mlnxofedinstall
that should be run in order to install the drivers, not ./install.sh (a minimal sequence is sketched below).
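A minimal install sequence would look roughly like this (the bundle name is a placeholder for whatever MLNX_OFED package matches your XenServer release):
tar xzf MLNX_OFED_LINUX-<version>-<distro>-x86_64.tgz
cd MLNX_OFED_LINUX-<version>-<distro>-x86_64
./mlnxofedinstall
# reload the driver stack afterwards
/etc/init.d/openibd restart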
Also there are some other issues (potentially removing mellanox-mlnxen.x86_64 and fixing pif-scan or pif-introduce not working); you should probably check here for a solution:
https://discussions.citrix.com/topic/383014-xenserver-70-mellanox-connectx-3-nic-infiniband/
Greetings,
Emil
Re: Connection between two infiniband ports
Can your custom essay writing service really help with Linux commands or Mellanox utilities?
Re: mlnx_qos cannot assign priority values to TCs after 8 SR-IOV devices
Hello David -
I hope all is well...
Could you open a case with Mellanox Support so we can take a deeper look at this issue?
Thank you -
Steve