Channel: Mellanox Interconnect Community: Message List

Re: “Invalid module format” error while loading nv_peer_mem in CentOS 6.6


But from earlier comments, it looks like nv_peer_mem is loaded on vega2.

Actually, for some time it did not even load on vega2, as you can see in post #1 of this thread. At some point it did load and showed up in the lsmod output (post #3), but the nv_peer_mem service still always fails to start. The fact that insmod sometimes works and sometimes does not is quite strange in itself, isn't it?

 

Also check the contents of the /etc/modprobe.d folder on both systems.

It is exactly the same on both systems:

[root@vega2 boot]# ll /etc/modprobe.d

total 44

-rw-r--r--. 1 root root   52 Dec  2  2015 anaconda.conf

-rw-r--r--. 1 root root  884 Oct 16  2014 blacklist.conf

-rw-r--r--. 1 root root  382 Oct 15  2014 dist-alsa.conf

-rw-r--r--. 1 root root 5596 Oct 15  2014 dist.conf

-rw-r--r--. 1 root root  473 Oct 15  2014 dist-oss.conf

-rw-r--r--. 1 root root   26 Dec  2  2015 ib_ipoib.conf

-rw-r--r--. 1 root root   46 Jul 13  2015 ib_sdp.conf

-rw-r--r--. 1 root root   49 Jul 13  2015 mlnx.conf

-rw-r--r--. 1 root root   76 Dec  3  2015 nvidia-installer-disable-nouveau.conf

-rw-r--r--. 1 root root   30 Oct 10  2009 openfwwf.conf
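A listing like the one above compares only names, sizes and dates. To compare the file contents as well, something along these lines could be run on each host and the two outputs diffed. This is a minimal sketch: `dir_checksums` is a hypothetical helper name, and it assumes the directory holds only regular files, as in the listing.

```shell
#!/bin/sh
# dir_checksums (hypothetical helper): print "md5  filename" for every file
# in the given directory, sorted by filename so that two hosts can be diffed.
dir_checksums() {
    # Run in a subshell so the caller's working directory is left untouched.
    ( cd "$1" && md5sum -- * | sort -k 2 )
}

# Possible use on each host (the output file name is only an example):
#   dir_checksums /etc/modprobe.d > /tmp/modprobe.d.$(hostname).md5
```

Diffing the two resulting files then shows any file that differs in content even when size and date look alike.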

 

Run bash -x /etc/init.d/nv_peer_mem start and see where it fails and what is in dmesg.

I already checked it: the modprobe command fails, and dmesg shows several lines about a duplicate symbol:

 

[root@vega2 boot]# bash -x /etc/init.d/nv_peer_mem start

+ CONFIG=/etc/infiniband/nv_peer_mem.conf

+ modname=nv_peer_mem

+ reqmods='ib_core nvidia'

+ '[' '!' -f /etc/infiniband/nv_peer_mem.conf ']'

+ . /etc/infiniband/nv_peer_mem.conf

++ ONBOOT=yes

++ pwd

+ CWD=/boot

+ cd /etc/infiniband

++ pwd

+ WD=/etc/infiniband

+ modprobe=/sbin/modprobe

+ /sbin/modprobe -c

+ grep -q '^allow_unsupported_modules  *0'

+ ACTION=start

+ shift

+ '[' Xyes '!=' Xyes ']'

+ RC=0

+ case $ACTION in

+ start

+ local RC=0

+ echo -n 'starting... '

starting... + for mod in '$reqmods'

+ is_module ib_core

+ local RC

+ /sbin/lsmod

+ grep -w ib_core

+ RC=0

+ return 0

+ continue

+ for mod in '$reqmods'

+ is_module nvidia

+ local RC

+ grep -w nvidia

+ /sbin/lsmod

+ RC=0

+ return 0

+ continue

+ load_module nv_peer_mem

+ local module=nv_peer_mem

++ modinfo nv_peer_mem

++ grep filename

++ awk '{print $NF}'

+ filename=/lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko

+ '[' '!' -n /lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko ']'

+ /sbin/modprobe nv_peer_mem

FATAL: Error inserting nv_peer_mem (/lib/modules/2.6.32-504.el6.x86_64/extra/nv_peer_mem.ko): Invalid module format

+ RC=1

+ '[' 1 -eq 0 ']'

+ echo 'Failed to load nv_peer_mem'

Failed to load nv_peer_mem

+ log_msg 'Failed to load nv_peer_mem'

+ logger -i 'nv_peer_mem: Failed to load nv_peer_mem'

+ return 1

+ RC=1

+ exit 1
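A vermagic mismatch is one commonly reported cause of "Invalid module format", so it may be worth ruling out. The sketch below compares only the kernel release part of the vermagic string with a given release. `vermagic_matches` is a hypothetical helper, and the flags after the release in the example string are assumed for illustration, not taken from this system.

```shell
#!/bin/sh
# vermagic_matches (hypothetical helper): succeed if the first word of a
# vermagic string equals the given kernel release.
vermagic_matches() {
    # $1: vermagic string, as printed by `modinfo -F vermagic <module>`
    # $2: kernel release, as printed by `uname -r`
    [ "${1%% *}" = "$2" ]
}

# Example using the release seen in the trace; the trailing flags are assumed.
if vermagic_matches "2.6.32-504.el6.x86_64 SMP mod_unload modversions" "2.6.32-504.el6.x86_64"; then
    echo "vermagic names the running kernel"
else
    echo "vermagic mismatch"
fi
```

On a live system the call would look like `vermagic_matches "$(modinfo -F vermagic nv_peer_mem)" "$(uname -r)"`. A match here only rules out one cause; it says nothing about symbol conflicts or symbol version differences.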

 

[root@vega2 boot]# dmesg

...

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nvidia_uvm: Unregistered the UVM driver

nvidia 0000:03:00.0: PCI INT A disabled

nvidia 0000:03:00.0: PCI INT A -> GSI 32 (level, low) -> IRQ 32

nvidia 0000:03:00.0: setting latency timer to 64

NVRM: loading NVIDIA UNIX x86_64 Kernel Module  352.39  Fri Aug 14 18:09:10 PDT 2015

nvidia_uvm: Loaded the UVM driver, major device number 245

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nvidia 0000:03:00.0: irq 164 for MSI/MSI-X

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

ip_tables: (C) 2000-2006 Netfilter Core Team

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)

nv_p2p_dummy: exports duplicate symbol nvidia_p2p_free_page_table (owned by nvidia)
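To see at a glance which module and symbol are involved and how often the message repeats, the log can be condensed. This is a rough sketch: `summarize_duplicate_symbols` is a hypothetical helper that relies on the message layout shown above. Note that the log names nv_p2p_dummy, not nv_peer_mem, as the module exporting the duplicate symbol.

```shell
#!/bin/sh
# summarize_duplicate_symbols (hypothetical helper): read a kernel log on
# stdin and print one line per (module, symbol, owner) with a repeat count.
summarize_duplicate_symbols() {
    awk '/exports duplicate symbol/ {
        # Locate the word "exports"; the module name is the field before it,
        # which also works when the line starts with a timestamp.
        for (i = 1; i <= NF; i++) if ($i == "exports") break
        mod = $(i - 1); sub(/:$/, "", mod)
        sym = $(i + 3)
        owner = $NF; sub(/\)$/, "", owner)
        count[mod " exports " sym " owned by " owner]++
    }
    END { for (k in count) print count[k] "x " k }' | sort
}

# Possible use on a live system:
#   dmesg | summarize_duplicate_symbols
```

The kernel may refuse to load a module that exports a symbol already owned by another loaded module, so these messages are worth reading together with the modprobe failure.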

 

Be sure you are using the exact same kernel version; you can run md5sum on the vmlinuz and initramfs images.

This is a good idea, I hadn't thought of it! In fact, the initramfs file differs, while the other files are identical. What could be the cause? Might this be the issue? Here is the output:

 

[root@vega1 boot]# md5sum vmlinuz-2.6.32-504.el6.x86_64

0805f85b126ebc6adf84b6ead56a080b  vmlinuz-2.6.32-504.el6.x86_64

[root@vega1 boot]# md5sum initramfs-2.6.32-504.el6.x86_64.img

744d9c3ae08cb795e1b2142250d51c74  initramfs-2.6.32-504.el6.x86_64.img

[root@vega1 boot]# md5sum System.map-2.6.32-504.el6.x86_64

f9fda70c10eb7a2e3bedac7c73606519  System.map-2.6.32-504.el6.x86_64

 

 

[root@vega2 boot]# md5sum vmlinuz-2.6.32-504.el6.x86_64

0805f85b126ebc6adf84b6ead56a080b  vmlinuz-2.6.32-504.el6.x86_64

[root@vega2 boot]# md5sum initramfs-2.6.32-504.el6.x86_64.img

df9afba8ad789256ccec2e715f514d02  initramfs-2.6.32-504.el6.x86_64.img

[root@vega2 boot]# md5sum System.map-2.6.32-504.el6.x86_64

f9fda70c10eb7a2e3bedac7c73606519  System.map-2.6.32-504.el6.x86_64
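With more files, comparing the hashes by eye gets tedious. If the md5sum output of each host is saved to a file, a small helper can print only the names whose checksums differ. This is a sketch: `changed_files` and the listing file names are hypothetical.

```shell
#!/bin/sh
# changed_files (hypothetical helper): given two saved md5sum listings,
# print the names of files present in both whose checksums differ.
changed_files() {
    awk 'NR == FNR { sum[$2] = $1; next }          # first listing: remember hash per name
         ($2 in sum) && sum[$2] != $1 { print $2 } # second listing: report mismatches
    ' "$1" "$2"
}

# Possible use, after running this in /boot on each host:
#   md5sum vmlinuz-* initramfs-* System.map-* > /tmp/$(hostname).md5
# and copying both files to one machine:
#   changed_files vega1.md5 vega2.md5
```

Applied to the checksums above, this would report only the initramfs image. Since an initramfs is generated on each host, a differing checksum there is not necessarily a sign of a problem.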

 

 

 

Thanks and bye,

   Stefano

