Channel: Mellanox Interconnect Community: Message List

mlx5_core - device's health compromised

Dear all,

 

I have a Connect-IB adapter, firmware version 10.10.5020, connected via a PCIe switch to a host running CentOS 7 with the Mellanox drivers MLNX_OFED_LINUX-2.4-1.0.4-rhel7.0-x86_64.

Just after boot, I do see the following messages:

 

localhost kernel: mlx5_core 0000:03:00.0: device's health compromised

localhost kernel: mlx5_core 0000:03:00.0: assert_var[0] 0x0000007a

localhost kernel: mlx5_core 0000:03:00.0: assert_var[1] 0x0000006e

localhost kernel: mlx5_core 0000:03:00.0: assert_var[2] 0x00000000

localhost kernel: mlx5_core 0000:03:00.0: assert_var[3] 0x00000000

localhost kernel: mlx5_core 0000:03:00.0: assert_var[4] 0x00000000

localhost kernel: mlx5_core 0000:03:00.0: assert_exit_ptr 0x006a013c

localhost kernel: mlx5_core 0000:03:00.0: assert_callra 0x006a0c9c

localhost kernel: mlx5_core 0000:03:00.0: fw_ver 0xa00a139c

localhost kernel: mlx5_core 0000:03:00.0: hw_id 0x000001ff

localhost kernel: mlx5_core 0000:03:00.0: irisc_index 0

localhost kernel: mlx5_core 0000:03:00.0: synd 0x10: High temprature

localhost kernel: mlx5_core 0000:03:00.0: ext_synd 0x0000

localhost kernel: mlx5_core 0000:03:00.0: handling bad device here

 

PCIe device 03:00.0 is the Connect-IB card.

The system has run safely so far, and the ports link up with both QDR and FDR cables.

However, I am worried about the health of the system.

Does anybody know specifically what the messages reported above mean?

 

Many thanks.
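
One detail in the log above can be cross-checked by hand: the raw `fw_ver 0xa00a139c` word appears to encode the firmware version 10.10.5020 quoted at the top of the post. The sketch below assumes a 4/12/16-bit split (major/minor/subminor); that layout is inferred from this one log line, not taken from driver documentation, and `decode_fw_ver` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Split a 32-bit fw_ver word into major.minor.subminor.
# ASSUMPTION: major = bits 31..28, minor = bits 27..16, subminor = bits 15..0.
# The split is inferred from the posted log, not from a documented register layout.
decode_fw_ver() {
    local raw=$(( $1 ))                     # accepts hex input such as 0xa00a139c
    local major=$(( (raw >> 28) & 0xF ))
    local minor=$(( (raw >> 16) & 0xFFF ))
    local sub=$(( raw & 0xFFFF ))
    printf '%d.%d.%d\n' "$major" "$minor" "$sub"
}

decode_fw_ver 0xa00a139c   # prints 10.10.5020
```

If the assumed split is right, the health report is at least coming from the firmware image the poster believes is installed.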


Re: LEDs of switch and adapters not lighting up.

Issue resolved. My physical port of the adapter was damaged.

Re: mlx5_core - device's health compromised

Hello,

 

Can you pass on the full dmesg output from this event? Also, can you run the mget_temp tool and let us know what temperature you're seeing?

 

The syntax should be something like:

# mst start

# mget_temp -d /dev/mst/mt4113_pciconf0

Re: mlx5_core - device's health compromised

Dear Erez,

 

thank you for your reply.

Attached you can find the full dmesg output of the event.

 

The temperature read by your suggested command is 51°C:

 

# mget_temp -d /dev/mst/mt4113_pciconf0

>51

 

Where is the temperature sensor actually placed in the Connect-IB?

Does it measure the air temperature or the temperature of the processor?

The manual says the safe range is 0-55°C, although I think this refers to room conditions.

 

Regards.

Simone

Re: Reserved op code 0x80 during UC transfers

I think the packets we are detecting may be Backward Explicit Congestion Notification (BECN) packets used for congestion control for UC transfers. These Congestion Notification Packets (CNP) are described in Annex A10 (Congestion Control) of the IB specification. According to Annex A10.3.2, the CNP has op code 0x80. Newer versions of OFED, starting around version 2.0, support congestion control. Can anyone confirm this information? Is anyone enabling Congestion Control in their network?
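
To see why 0x80 falls outside the UC op codes, the op code byte can be split into a 3-bit class and a 5-bit operation. The sketch below is an illustration only: `decode_opcode` is a hypothetical helper, and the class names reflect my reading of the IB base transport header op code table, so treat the mapping as an assumption to be checked against the specification.

```shell
#!/usr/bin/env bash
# Split a BTH op code byte into its class (top 3 bits) and operation (low 5 bits).
# ASSUMPTION: class values 0..5 map to RC, UC, RD, UD, CNP, XRC; higher values
# are reported as reserved/vendor.
decode_opcode() {
    local op=$(( $1 ))
    local class=$(( (op >> 5) & 0x7 ))
    local oper=$(( op & 0x1F ))
    local names=(RC UC RD UD CNP XRC)
    printf 'class=%s operation=0x%02x\n' "${names[class]:-reserved/vendor}" "$oper"
}

decode_opcode 0x80   # class=CNP operation=0x00
decode_opcode 0x20   # class=UC operation=0x00
```

Under that mapping, a capture tool without congestion-control support would see 0x80 as a class it does not know, which is consistent with it being reported as reserved.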

New Cisco ASA 5506-X will replace ASA 5505

The Cisco ASA 5506-X (ASA5506-K9) firewall is expected to include the FirePOWER engine and to be equipped with Gigabit ports. The FirePOWER module (most probably a software module on the low-end models) offers next-generation firewall services, including Next-Generation Intrusion Prevention System (NGIPS), Application Visibility and Control (AVC), URL filtering, and Advanced Malware Protection (AMP).

Re: mlx5_core - device's health compromised

The temperature sensors (diodes) in the Connect-IB card are spread inside the adapter silicon, and I guess you're in the safe zone, as the threshold should be 100 degrees centigrade.

The assertion failure seems to be triggered by the Connect-IB health-check mechanism. It would be good practice to move this adapter to another server and see whether you still see this behavior.

It would also be helpful if you could share your server spec.

 

Cheers.
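
The comparison made in this reply (a 51 degree reading against a threshold of about 100) can be scripted for repeated checks. This is a minimal sketch: `temp_status` is a hypothetical helper, the default of 100 is simply the figure quoted in the reply, and the commented live invocation assumes `mget_temp` prints a bare number as it did earlier in the thread.

```shell
#!/usr/bin/env bash
# Classify a die temperature reading against a threshold.
# temp_status is a hypothetical helper; the default threshold of 100 is the
# figure quoted in the reply, not a value read from the device.
temp_status() {
    local reading=$1 threshold=${2:-100}
    if (( reading >= threshold )); then
        echo "over threshold (${reading} >= ${threshold})"
        return 1
    fi
    echo "ok (${reading} < ${threshold}, margin $(( threshold - reading )))"
}

# With the reading reported earlier in the thread:
temp_status 51
# On a live host (requires the MFT tools, run as root), something like:
#   mst start
#   temp_status "$(mget_temp -d /dev/mst/mt4113_pciconf0)"
```

A reading taken just after boot may not reflect the temperature at the moment the health event fired, so logging the value periodically would be more informative than a single sample.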

Re: Reserved op code 0x80 during UC transfers

I can say that I have never seen a medium or large InfiniBand cluster using CC as of today.


2.0.0-9 ibdump does not work with firmware version 2.7.000

Hi,

 

I am new to this forum. :-)

 

We tried to run ibdump to look for multicast traffic.

The 2.0.0-9 ibdump barfs on the device FW version:

  -E- Device firmware version (2.7.000) does not support ethernet sniffing. Required 2.11.1140 or higher

but ibdump 1.0.6 seemed to work.

 

Is this a regression? I got 2.0.0-9 from the MLNX OFED 2.4.1 release. How do I get the mentioned version 2.11.1140 or higher?

 

Thanks,

Jay

Re: 2.0.0-9 ibdump does not work with firmware version 2.7.000

Ah, I interpreted the error message incorrectly.

 

The "2.11.1140" actually refers to the firmware version, not the ibdump version.

 

That makes sense now. Sorry for the noise. Not a good way to join a forum. :-(

 

Jay
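
Dotted firmware versions such as 2.7.000 and 2.11.1140 are easy to misorder by eye or with a plain string comparison, since "7" sorts after "11" as text. A minimal sketch of a field-by-field numeric comparison follows; `version_ge` is a hypothetical helper name, and the two version strings are the ones from the error message.

```shell
#!/usr/bin/env bash
# Return 0 if dotted version $1 is >= dotted version $2.
# Fields are compared left to right as base-10 integers, so "000" equals 0
# and 7 is correctly treated as smaller than 11.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)
    local i
    for (( i = 0; i < ${#a[@]} || i < ${#b[@]}; i++ )); do
        local x=$(( 10#${a[i]:-0} )) y=$(( 10#${b[i]:-0} ))
        (( x > y )) && return 0
        (( x < y )) && return 1
    done
    return 0
}

if version_ge 2.7.000 2.11.1140; then
    echo "firmware is new enough"
else
    echo "firmware 2.7.000 is older than the required 2.11.1140"
fi
```

The `10#` prefix forces base 10, which keeps fields with leading zeros from being parsed as octal.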

What software to download?

I have some old Mellanox hardware that I am trying to move from Red Hat 5 to RHEL 6. I downloaded the most current software and tried to install it, but the install froze. How do I determine what software to download? As I said, the hardware is pretty old, but I have the output of lspci if that helps.

Re: What software to download?

To give everyone a little more background on my problem, we are running ROCKS 5.4 on RHEL 5. The system can be patched to pass security, but it is one of the last remaining RHEL 5 machines. I am trying to bring up a new cluster using RHEL 6.6 and ROCKS 6.1.1. Everything is pretty much in place, but I need to get the Infiniband working. I have no experience in this area. The system was here, long before me.

 

The new cluster sees the InfiniBand board using lspci, but that is about it. /etc/init.d has no startup scripts, and I don't think a module is present in the kernel. I am basically looking for the correct version of the software to download. Thanks.

Re: 2.0.0-9 ibdump does not work with firmware version 2.7.000

Which HCA are we talking about? You can query it with:

 

mst start

flint -d /dev/mst/mt***pciconf0 q

Re: What software to download?

Here is the relevant portion of the lspci output. Since this is running on an early version of Red Hat Enterprise Linux 5, I may have to go back and get an archived version of the software, but I don't know which version to get. Thanks.

 

 

[ramos@compute-0-1 tmp]$ more infiniband

06:00.0 InfiniBand: Mellanox Technologies MT25208 [InfiniHost III Ex] (rev 20)

        Subsystem: Mellanox Technologies MT25208 [InfiniHost III Ex]

        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-

        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

        Latency: 0, Cache Line Size: 64 bytes

        Interrupt: pin A routed to IRQ 5

        Region 0: Memory at eff00000 (64-bit, non-prefetchable) [size=1M]

        Region 2: Memory at e8000000 (64-bit, prefetchable) [size=8M]

        Capabilities: [40] Power Management version 2

                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)

                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

        Capabilities: [48] Vital Product Data

                Product Name: Lion mini DDR

                Read-only fields:

                        [PN] Part number: MHGA28-XTC

                        [EC] Engineering changes: A3

                        [SN] Serial number: MT0722X00031

                        [V0] Vendor specific: PCIe x8

                        [RV] Reserved: checksum good, 0 byte(s) reserved

                Read/write fields:

                        [V1] Vendor specific: N/A

                        [YA] Asset tag: N/A

                        [RW] Read-write area: 107 byte(s) free

                End

        Capabilities: [90] MSI: Enable- Count=1/32 Maskable- 64bit+

                Address: 0000000000000000  Data: 0000

        Capabilities: [84] MSI-X: Enable- Count=32 Masked-

                Vector table: BAR=0 offset=00082000

                PBA: BAR=0 offset=00082200

        Capabilities: [60] Express (v1) Endpoint, MSI 00

                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited

                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE- FLReset-

                DevCtl: Report errors: Correctable- Non-Fatal- Fatal+ Unsupported-

                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-

                        MaxPayload 128 bytes, MaxReadReq 512 bytes

                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-

                LnkCap: Port #8, Speed 2.5GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited

                        ClockPM- Surprise- LLActRep- BwNot-

                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-

                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

        Kernel modules: ib_mthca

 

 

[ramos@compute-0-1 tmp]$


Re: What software to download?

I downloaded the latest software, untarred it, and started the install. The install proceeded until it got to the firmware upgrade, where it failed. Then I tried to run /etc/init.d/openibd. It started, but it couldn't load the drivers and quit.

 

Now I am wondering whether downloading the firmware will do the trick. I went to the firmware page, but I only have the MT25208 number, and there are a number of firmware upgrades with that number but with various descriptions. I don't know the exact card I have.
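
The lspci output posted earlier in this thread does narrow the card down: its Vital Product Data section carries a part number (MHGA28-XTC) alongside the generic MT25208 chip name. A minimal sketch for pulling that field out of `lspci -vv` output follows; `vpd_part_number` is a hypothetical helper, and whether the firmware page indexes images by this part number is an assumption.

```shell
#!/usr/bin/env bash
# Extract the VPD part number from `lspci -vv` output read on stdin.
# Only the first "[PN] Part number:" line is reported.
vpd_part_number() {
    sed -n 's/.*\[PN\] Part number: *//p' | head -n 1
}

# Sample line copied from the lspci output posted earlier in the thread:
printf '%s\n' '                        [PN] Part number: MHGA28-XTC' | vpd_part_number
# On a live host, something like (VPD fields generally need root):
#   lspci -vv -s 06:00.0 | vpd_part_number
```

The product name ("Lion mini DDR") and engineering-change level (A3) sit in the same VPD block and may help when several firmware descriptions look similar.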

Re: nbdx - data mismatch

Hi Ted, we are looking into this.

 

I will update you,

Ophir.

Is it possible to use the MNPA19-XTR card to connect two PCs running Windows 8.1 x64 directly and get a 10 Gbit/s link?

Hi,

 

Sorry about the basic question: is it possible to use the MNPA19-XTR card to connect two PCs running Windows 8.1 x64 directly and get a 10 Gbit/s link? I only want to share my SSD disks over a fast link. What cables and devices are necessary to do this? I need 2 cards and a 10-meter cable.

 

Thank you!

How do I load the MLNX_OFED_LINUX driver into Debian 6.0.10?


I have a version of the drivers built for Debian 6.0.9, but have not been able to find a version for Debian 6.0.10.

Re: 2.0.0-9 ibdump does not work with firmware version 2.7.000

Erez,

 

Those were ConnectX cards in old systems that were retired to a test lab.

Since we can still use ibdump 1.0.6, we are OK.

 

I posted the question because I mistakenly thought there was a newer version of ibdump.

 

Thanks for your response~
