Channel: Mellanox Interconnect Community: Message List

Kernel panic while booting Linux


The kernel panics while booting Linux if the Mellanox card is connected to the network. It boots fine if I disconnect the card.

(After it successfully boots I can connect it to the network, though it sometimes, but not always, causes the host to hang when I run ping over the network; I don't have much detail to post about that.)

 

Here are the details of the system:

# uname -a

Linux <hostname> 4.2.0-35-generic #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

 

# mlxup

Querying Mellanox devices firmware ...

Device #1:

----------

  Device Type:      ConnectX3Pro

  Part Number:      MCX312B-XCC_Ax

  Description:      ConnectX-3 Pro EN network interface card; 10GigE; dual-port SFP+; PCIe3.0 x8 8GT/s; RoHS R6

  PSID:             MT_1200111023

  PCI Device Name:  0000:02:00.0

  Port1 MAC:        e41d2db25040

  Port2 MAC:        e41d2db25041

  Versions:         Current        Available   

     FW             2.36.5000      2.36.5000   

     PXE            3.4.0718       3.4.0718    

  Status:           Up to date

 

Stack dump from crash (the dmesg file is attached):

 

      KERNEL: /usr/lib/debug/boot/vmlinux-4.2.0-35-generic
    DUMPFILE: ../201607301001/dump.201607301001  [PARTIAL DUMP]
        CPUS: 8
        DATE: Sat Jul 30 10:01:52 2016
      UPTIME: 00:00:14
LOAD AVERAGE: 1.19, 0.25, 0.08
       TASKS: 584
    NODENAME: <hostname>
     RELEASE: 4.2.0-35-generic
     VERSION: #40-Ubuntu SMP Tue Mar 15 22:15:45 UTC 2016
     MACHINE: x86_64  (3409 Mhz)
      MEMORY: 16 GB
       PANIC: "BUG: unable to handle kernel paging request at 0000001100000002"
         PID: 1625
     COMMAND: "docker"
        TASK: ffff8803e1f5a940  [THREAD_INFO: ffff8803de0e8000]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 1625   TASK: ffff8803e1f5a940  CPU: 4   COMMAND: "docker"
#0 [ffff88041ed033f0] machine_kexec at ffffffff8105913b
#1 [ffff88041ed03460] crash_kexec at ffffffff81109bf2
#2 [ffff88041ed03530] oops_end at ffffffff81018ead
#3 [ffff88041ed03560] no_context at ffffffff810682a5
#4 [ffff88041ed035d0] __bad_area_nosemaphore at ffffffff81068570
#5 [ffff88041ed03620] bad_area_nosemaphore at ffffffff810686f3
#6 [ffff88041ed03630] __do_page_fault at ffffffff810689d7
#7 [ffff88041ed03690] do_page_fault at ffffffff81068d42
#8 [ffff88041ed036b0] page_fault at ffffffff817fabc8
    [exception RIP: __netdev_pick_tx+102]
    RIP: ffffffff816e64e6  RSP: ffff88041ed03768  RFLAGS: 00010202
    RAX: ffff88040c2d97f0  RBX: 0000000000000000  RCX: ffffffff816e6480
    RDX: 000000000000000c  RSI: ffff8803d4359b00  RDI: ffff8803fb440000
    RBP: ffff88041ed037a8   R8: ffff88041ed19b00   R9: ffff8803d4359b00
    R10: 0000000000000000  R11: 0000000000000150  R12: ffff8803fb440000
    R13: 0000000000000000  R14: 00000000ffffffff  R15: 0000001100000002
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
#9 [ffff88041ed037b0] mlx4_en_select_queue at ffffffffc0187a7f [mlx4_en]
#10 [ffff88041ed037d0] netdev_pick_tx at ffffffff816edac1
#11 [ffff88041ed03800] __dev_queue_xmit at ffffffff816edc07
#12 [ffff88041ed03860] dev_queue_xmit_sk at ffffffff816ee0e3
#13 [ffff88041ed03870] netdev_send at ffffffffc04de305 [openvswitch]
#14 [ffff88041ed038b0] ovs_vport_send at ffffffffc04ddc28 [openvswitch]
#15 [ffff88041ed038d0] do_output at ffffffffc04d0289 [openvswitch]
#16 [ffff88041ed038f0] do_execute_actions at ffffffffc04d0874 [openvswitch]
#17 [ffff88041ed039a0] ovs_execute_actions at ffffffffc04d177f [openvswitch]
#18 [ffff88041ed039d0] ovs_dp_process_packet at ffffffffc04d4f04 [openvswitch]
#19 [ffff88041ed03a60] ovs_vport_receive at ffffffffc04dd38b [openvswitch]
#20 [ffff88041ed03c10] netdev_frame_hook at ffffffffc04de5d0 [openvswitch]
#21 [ffff88041ed03c40] __netif_receive_skb_core at ffffffff816eb2d4
#22 [ffff88041ed03ce0] __netif_receive_skb at ffffffff816eb988
#23 [ffff88041ed03d00] netif_receive_skb_internal at ffffffff816eba02
#24 [ffff88041ed03d40] napi_gro_frags at ffffffff816ec4a7
#25 [ffff88041ed03d70] mlx4_en_process_rx_cq at ffffffffc0189870 [mlx4_en]
#26 [ffff88041ed03e10] mlx4_en_poll_rx_cq at ffffffffc0189db6 [mlx4_en]
#27 [ffff88041ed03e60] net_rx_action at ffffffff816ebf09
#28 [ffff88041ed03ef0] __do_softirq at ffffffff81081131
#29 [ffff88041ed03f60] irq_exit at ffffffff81081433
#30 [ffff88041ed03f70] do_IRQ at ffffffff817fb878
--- <IRQ stack> ---
#31 [ffff8803de0ebf58] ret_from_intr at ffffffff817f97eb
    RIP: 000000000088d618  RSP: 000000c82024d118  RFLAGS: 00000202
    RAX: 0000000073f84770  RBX: 0000000000000400  RCX: 0000000054423aca
    RDX: 0000000089ecd45f  RSI: 000000c820542940  RDI: 000000c820544000
    RBP: 00000000e4458357   R8: 000000008847594a   R9: 0000000039eb6dc2
    R10: 00000000d57b5eff  R11: 00000000fa36c492  R12: 0000000000000004
    R13: 0000000000dd5c19  R14: 0000000000000002  R15: 0000000000000008
    ORIG_RAX: ffffffffffffff3d  CS: 0033  SS: 002b
crash>
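For reference, the backtrace above came from the crash utility. A minimal sketch of the invocation, with the paths taken from the KERNEL and DUMPFILE lines in the dump header (they will differ on other systems):

```shell
# Sketch only: paths come from the KERNEL/DUMPFILE lines above.
# The debug vmlinux must match the running kernel exactly
# (4.2.0-35-generic here), or crash cannot resolve symbols.
VMLINUX=/usr/lib/debug/boot/vmlinux-4.2.0-35-generic
DUMP=../201607301001/dump.201607301001

# Print the command used to open the dump; at the resulting
# "crash>" prompt, "bt" reproduces the backtrace of the
# panicking task shown above.
echo "crash $VMLINUX $DUMP"
```

On Ubuntu the debug vmlinux comes from the matching `linux-image-*-dbgsym` package.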

 

Has anyone seen a similar issue?

