Channel: Mellanox Interconnect Community: Message List

Re: InfiniHost III Ex - Suspend/Resume not working on Debian Linux


Here's a more complete log output:

 

Nov 18 14:28:29 alin kernel: [    9.977168] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)

Nov 18 14:28:29 alin kernel: [    9.977170] ib_mthca: Initializing 0000:02:00.0

Nov 18 14:28:29 alin kernel: [   11.374057] ib_mthca 0000:02:00.0: HCA FW version 5.1.000 is old (5.3.000 is current).

Nov 18 14:28:29 alin kernel: [   11.374059] ib_mthca 0000:02:00.0: If you have problems, try updating your HCA FW.

Nov 18 14:29:10 alin kernel: [   59.296536] ib1: ib_dealloc_pd failed

Nov 18 14:31:22 alin kernel: [  167.880313] ib_mthca 0000:02:00.0: SW2HW_MPT failed (-16)

Nov 18 14:33:16 alin kernel: [  281.265414] ib_mthca 0000:02:00.0: HW2SW_MPT failed (-16)

Nov 18 14:33:22 alin kernel: [  287.885556] ib_mthca 0000:02:00.0: SW2HW_MPT failed (-16)

Nov 18 14:34:16 alin kernel: [  341.266202] ib_mthca 0000:02:00.0: HW2SW_MPT failed (-16)

Nov 18 14:34:22 alin kernel: [  347.886276] mthca0: ib_query_port 1 failed

 

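The driver recommends a firmware update (the HCA runs 5.1.000; 5.3.000 is current), and more errors follow: SW2HW_MPT and HW2SW_MPT fail repeatedly with -16 (EBUSY), and finally ib_query_port fails.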
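To make the repeated command failures easier to spot in a longer log, they can be counted per command and error code. This is only a minimal sketch: `summarize_mthca_failures` is a hypothetical helper name, and it assumes the log lines have the same `ib_mthca <pci-address>: <COMMAND> failed (<code>)` shape as the ones above.

```shell
#!/bin/sh
# Hypothetical helper: summarise ib_mthca command failures read from stdin.
# Assumes lines shaped like:
#   ... ib_mthca 0000:02:00.0: SW2HW_MPT failed (-16)
summarize_mthca_failures() {
    # Keep only "<COMMAND> <code>" from matching lines, then count duplicates.
    sed -n 's/.*ib_mthca [^ ]*: \([A-Z0-9_]*\) failed (\(-[0-9]*\)).*/\1 \2/p' |
        sort | uniq -c |
        # Reorder "count command code" into "command code count".
        awk '{ print $2, $3, $1 }'
}
```

It would be used as `dmesg | summarize_mthca_failures`. For the log above it prints `HW2SW_MPT -16 2` and `SW2HW_MPT -16 2`.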

 

I don't have the 'mst' command. I installed the Debian package mstflint instead:

 

mstflint - Mellanox firmware burning application

 

It comes with: mstconfig, mstflint, mstmcra, mstmread, mstmtserver, mstmwrite, mstregdump, mstvpd

 

Rebooting does solve the problem.

 

I should mention that if I neither assign an IP address to the card nor connect it to the network, I can unload the modules in this order (unlike in my example above):

 

modprobe -r ib_ipoib

modprobe -r ib_umad

modprobe -r mlx4_ib
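Before and after an unload like this, it can help to see which InfiniBand-related modules are still loaded and whether anything still holds a reference to them, since a non-zero reference count will block `modprobe -r`. A minimal sketch, assuming the standard `/proc/modules` layout (name, size, reference count, users); `show_ib_modules` and the name prefixes it filters on are my own choices, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: list loaded InfiniBand-related modules with their
# reference counts. The optional argument overrides /proc/modules.
show_ib_modules() {
    modules_file=${1:-/proc/modules}
    # Fields in /proc/modules: name size refcount users state address.
    # The prefix list (ib_, mlx4_, rdma_) is an assumption; adjust as needed.
    awk '$1 ~ /^(ib_|mlx4_|rdma_)/ {
        print $1, "refcount=" $3, "used_by=" $4
    }' "$modules_file"
}
```

Calling `show_ib_modules` with no argument reads the live module list.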

 

Nevertheless, if I load the modules again in the correct order, I don't get an ib0 or ib1 interface, and ibstatus shows:

 

Fatal error:  device '*': sys files not found (/sys/class/infiniband/*/ports)

/usr/sbin/ibstatus: 21: exit: Illegal number: -1
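The ibstatus error refers to the `/sys/class/infiniband/*/ports` entries, so the same check can be done by hand to confirm whether the driver registered any device after the reload. A rough sketch, assuming the usual sysfs layout with one `state` file per port; `list_ib_ports` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the state of every InfiniBand port found in sysfs.
# The optional argument overrides the sysfs root (handy for testing).
list_ib_ports() {
    root=${1:-/sys/class/infiniband}
    found=0
    for state in "$root"/*/ports/*/state; do
        # If the glob matched nothing, the literal pattern is not readable.
        [ -r "$state" ] || continue
        port_dir=${state%/state}        # .../<device>/ports/<port>
        port=${port_dir##*/}            # <port>
        dev_dir=${port_dir%/ports/*}    # .../<device>
        dev=${dev_dir##*/}              # <device>
        printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$state")"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no InfiniBand ports under $root" >&2
        return 1
    fi
}
```

If this returns non-zero right after reloading the modules, the device was never re-registered, which would match the ibstatus failure.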

 

Note: none of this involves suspend/resume. Basically, I can load the modules only once and have connectivity; any subsequent reload leaves the card unresponsive, and nothing shows up in the log files or dmesg. If I can solve that problem, I can probably get suspend/resume to work.

