Channel: Mellanox Interconnect Community: Message List

Storage Spaces Direct Windows Server 2016 (1607) BSOD - Mellanox ConnectX-3 Pro (Dell)


Good afternoon,

 

There is very little documentation specific to Windows Server 2016; much of the RDMA/RoCE documentation refers to Windows Server 2012 (R2) Storage Spaces. So I figured I'd start a conversation here to help others who are also looking at Microsoft Storage Spaces Direct (S2D) on Windows Server 2016.

 

I currently have an open case with Dell ProSupport regarding a BSOD that my 2-node cluster encounters. Either node will just halt and restart after 60 seconds when stress-testing the environment. Each server is configured as follows...

  • Dell 13th Gen R730XD
  • 2x 120GB Intel SSDs SSDSC2BB120G6R (OS Mirror)
  • 6x 1.6TB SSDs SSDSC2BX016T4R
  • 6x 8TB HDDs ST8000NM0055-1RM112
  • 2x Intel DC P3700 800GB (Journal / Cache)
  • 256GB 2400MHz Memory
  • HBA330 Mini Controller
  • 1x Mellanox ConnectX-3 Pro (MT04103) Dual Port SFP+ 10GbE (Firmware Version: 2.26.50.80 / Driver Version: 2.25.12665.0)
  • Running Windows Server 2016 DataCenter 1607 Build 14393.693

 

Each server has two links to a Dell N4032F Switch.

To rule out a possible fault with my switch config, Dell advised I directly connect the two nodes together. RDMA is engaged; I can see the traffic in Performance Monitor.

 

Here's the order in which I've setup my environment...

  1. Install the OS and fully update/patch
  2. Set Windows Power Mode to Performance
  3. Install Windows Features - Hyper-V / File-Services / Failover-Clustering / Data-Center-Bridging
  4. Install Dell drivers for all hardware, including the Mellanox NICs. (I've tried both the Mellanox drivers and Dell's; they appear to be the same. MLNX_VPI_WinOF-5_25_All_Win2016_x64 / Driver Version: 2.25.12665.0)
  5. Perform the network configuration: essentially, create a Hyper-V SET switch joined to both ports of the Mellanox NIC, then create two vNICs connected to the new switch with a VLAN tag. (See attached file and the sketch after this list.)
  6. Create the Failover Cluster and enable Storage Spaces Direct. (See attached file.)
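
For context, the network and cluster work in steps 5 and 6 looks roughly like the following in PowerShell. This is a minimal sketch, not my exact script: the switch, vNIC, VLAN, cluster and node names are placeholders, and the DCB/PFC configuration needed for RoCE is omitted.

# Hyper-V SET switch across both Mellanox ports (adapter names are placeholders)
New-VMSwitch -Name "S2DSwitch" -NetAdapterName "SLOT 3 Port 1","SLOT 3 Port 2" -EnableEmbeddedTeaming $true -AllowManagementOS $false

# Two host vNICs for SMB/storage traffic, tagged with a (placeholder) VLAN
Add-VMNetworkAdapter -ManagementOS -Name "SMB1" -SwitchName "S2DSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "SMB2" -SwitchName "S2DSwitch"
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "SMB1" -Access -VlanId 20
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "SMB2" -Access -VlanId 20

# Enable and verify RDMA on the vNICs
Enable-NetAdapterRdma "vEthernet (SMB1)","vEthernet (SMB2)"
Get-NetAdapterRdma

# Cluster creation and S2D
New-Cluster -Name "S2D-CLU" -Node "NODE1","NODE2" -NoStorage
Enable-ClusterStorageSpacesDirect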

 

Everything appears to be okay, then it randomly crashes. Below is a memory dump; this is what I receive on either host. I want to upgrade the firmware, but it's a Dell product code so I'm stuck. It's been three weeks and we still don't have a working environment. There is another debug output further below...

*******************************************************************************

*                                                                             *

*                        Bugcheck Analysis                                    *

*                                                                             *

*******************************************************************************

 

DRIVER_POWER_STATE_FAILURE (9f)

A driver has failed to complete a power IRP within a specific time.

Arguments:

Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time

Arg2: ffffa48778febe20, Physical Device Object of the stack

Arg3: ffffc080258f4960, nt!TRIAGE_9F_POWER on Win7 and higher, otherwise the Functional Device Object of the stack

Arg4: ffff9c8fe2328010, The blocked IRP

 

Debugging Details:

------------------

 

Implicit thread is now ffff9c8f`e23a8080

 

DUMP_CLASS: 1

 

DUMP_QUALIFIER: 401

 

BUILD_VERSION_STRING:  14393.693.amd64fre.rs1_release.161220-1747

 

SYSTEM_MANUFACTURER:  Dell Inc.

 

SYSTEM_PRODUCT_NAME:  PowerEdge R730xd

 

SYSTEM_SKU:  SKU=NotProvided;ModelName=PowerEdge R730xd

 

BIOS_VENDOR:  Dell Inc.

 

BIOS_VERSION:  2.3.4

 

BIOS_DATE:  11/08/2016

 

BASEBOARD_MANUFACTURER:  Dell Inc.

 

BASEBOARD_PRODUCT:  0WCJNT

 

BASEBOARD_VERSION:  A04

 

DUMP_TYPE:  1

 

BUGCHECK_P1: 3

 

BUGCHECK_P2: ffffa48778febe20

 

BUGCHECK_P3: ffffc080258f4960

 

BUGCHECK_P4: ffff9c8fe2328010

 

DRVPOWERSTATE_SUBCODE:  3

 

FAULTING_THREAD:  e23a8080

 

CPU_COUNT: 38

 

CPU_MHZ: 960

 

CPU_VENDOR:  GenuineIntel

 

CPU_FAMILY: 6

 

CPU_MODEL: 4f

 

CPU_STEPPING: 1

 

CPU_MICROCODE: 6,4f,1,0 (F,M,S,R)  SIG: B00001E'00000000 (cache) B00001E'00000000 (init)

 

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

 

BUGCHECK_STR:  0x9F

 

PROCESS_NAME:  System

 

CURRENT_IRQL:  2

 

ANALYSIS_SESSION_HOST:  PHALFORDPC

 

ANALYSIS_SESSION_TIME:  01-26-2017 10:07:27.0372

 

ANALYSIS_VERSION: 10.0.14321.1024 amd64fre

 

LAST_CONTROL_TRANSFER:  from fffff800d1ce5f5c to fffff800d1dcf506

 

STACK_TEXT: 

ffffc080`2afcd6a0 fffff800`d1ce5f5c : 00000000`00000000 00000000`00000001 ffffa487`79d23801 fffff800`d1d47359 : nt!KiSwapContext+0x76

ffffc080`2afcd7e0 fffff800`d1ce59ff : ffffa487`70040100 00000000`00000000 00000000`00000000 fffff800`00000000 : nt!KiSwapThread+0x17c

ffffc080`2afcd890 fffff800`d1ce77c7 : ffffc080`00000000 fffff80d`41a33a01 ffffa487`70040130 00000000`00000000 : nt!KiCommitThreadWait+0x14f

ffffc080`2afcd930 fffff80d`41a0aaba : ffffa487`790a6c90 ffffa487`00000000 fffff80d`41a44000 ffffa487`00000000 : nt!KeWaitForSingleObject+0x377

ffffc080`2afcd9e0 fffff80d`3b05debf : 00000000`00000000 00000000`00000006 ffffa487`78fd3980 fffff80d`3b428bf9 : mlx4eth63+0x4aaba

ffffc080`2afcda30 fffff80d`3b0f6f80 : ffffa487`71c971a0 00000000`00000000 ffff9c8f`e2328010 00000000`00000000 : NDIS!ndisMInvokeShutdown+0x53

ffffc080`2afcda60 fffff80d`3b0b910a : ffffa487`71c971a0 00000000`00000000 0000007f`fffffff8 ffff9c8e`c5249bb0 : NDIS!ndisMShutdownMiniport+0xb4

ffffc080`2afcda90 fffff80d`3b09d342 : 00000000`00000000 00000000`00000000 ffff9c8f`e2328010 ffffa487`71c971a0 : NDIS!ndisSetSystemPower+0x1bdc6

ffffc080`2afcdb10 fffff80d`3b01fc28 : ffff9c8f`e2328010 ffffa487`78febe20 ffff9c8f`e2328200 ffffa487`71c97050 : NDIS!ndisSetPower+0x96

ffffc080`2afcdb40 fffff800`d1d9a1c2 : ffff9c8f`e23a8080 ffffc080`2afcdbf0 fffff800`d1f80600 ffffa487`71c97050 : NDIS!ndisPowerDispatch+0xa8

ffffc080`2afcdb70 fffff800`d1c82729 : ffffffff`fa0a1f00 fffff800`d1d99fe4 ffff9c8e`c9cb8120 00000000`000001d1 : nt!PopIrpWorker+0x1de

ffffc080`2afcdc10 fffff800`d1dcfbb6 : ffffc080`25955180 ffff9c8f`e23a8080 fffff800`d1c826e8 00000000`00000000 : nt!PspSystemThreadStartup+0x41

ffffc080`2afcdc60 00000000`00000000 : ffffc080`2afce000 ffffc080`2afc8000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

 

 

STACK_COMMAND:  .thread 0xffff9c8fe23a8080 ; kb

 

THREAD_SHA1_HASH_MOD_FUNC:  b7cf6cc0234897f6fd93ad4ead1f75c9e7fd9df1

 

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  263f1d39481efd9f34c4df5786cc37534825cc6e

 

THREAD_SHA1_HASH_MOD:  1de60aba82b9f9b6af56a445a099815cd801e5d9

 

FOLLOWUP_IP:

mlx4eth63+4aaba

fffff80d`41a0aaba 488d152f050300  lea     rdx,[mlx4eth63+0x7aff0 (fffff80d`41a3aff0)]

 

FAULT_INSTR_CODE:  2f158d48

 

SYMBOL_STACK_INDEX:  4

 

SYMBOL_NAME:  mlx4eth63+4aaba

 

FOLLOWUP_NAME:  MachineOwner

 

MODULE_NAME: mlx4eth63

 

IMAGE_NAME:  mlx4eth63.sys

 

DEBUG_FLR_IMAGE_TIMESTAMP:  57c2dc3b

 

BUCKET_ID_FUNC_OFFSET:  4aaba

 

FAILURE_BUCKET_ID:  0x9F_3_POWER_DOWN_mlx4eth63!unknown_function

 

BUCKET_ID:  0x9F_3_POWER_DOWN_mlx4eth63!unknown_function

 

PRIMARY_PROBLEM_CLASS:  0x9F_3_POWER_DOWN_mlx4eth63!unknown_function

 

TARGET_TIME:  2017-01-26T09:54:25.000Z

 

OSBUILD:  14393

 

OSSERVICEPACK:  0

 

SERVICEPACK_NUMBER: 0

 

OS_REVISION: 0

 

SUITE_MASK:  400

 

PRODUCT_TYPE:  3

 

OSPLATFORM_TYPE:  x64

 

OSNAME:  Windows 10

 

OSEDITION:  Windows 10 Server TerminalServer DataCenter SingleUserTS

 

OS_LOCALE: 

 

USER_LCID:  0

 

OSBUILD_TIMESTAMP:  2016-12-21 06:50:57

 

BUILDDATESTAMP_STR:  161220-1747

 

BUILDLAB_STR:  rs1_release

 

BUILDOSVER_STR:  10.0.14393.693.amd64fre.rs1_release.161220-1747

 

ANALYSIS_SESSION_ELAPSED_TIME: 6ba

 

ANALYSIS_SOURCE:  KM

 

FAILURE_ID_HASH_STRING:  km:0x9f_3_power_down_mlx4eth63!unknown_function

 

FAILURE_ID_HASH:  {476104f0-13a3-bd96-8e08-ff1f10ccd888}

 

Followup:     MachineOwner

This is another one...

 

 

Microsoft (R) Windows Debugger Version 10.0.14321.1024 AMD64

Copyright (c) Microsoft Corporation. All rights reserved.

 

 

 

 

Loading Dump File [D:\MEMORY.DMP]

Kernel Bitmap Dump File: Kernel address space is available, User address space may not be available.

 

 

Symbol search path is: srv*

Executable search path is:

Windows 10 Kernel Version 14393 MP (56 procs) Free x64

Product: Server, suite: TerminalServer DataCenter SingleUserTS

Built by: 14393.693.amd64fre.rs1_release.161220-1747

Machine Name:

Kernel base = 0xfffff801`96a11000 PsLoadedModuleList = 0xfffff801`96d16060

Debug session time: Fri Jan 20 16:16:45.177 2017 (UTC + 0:00)

System Uptime: 0 days 1:40:08.946

Loading Kernel Symbols

...............................................................

................................................................

..............................................

Loading User Symbols

 

 

Loading unloaded module list

..............................................

*******************************************************************************

*                                                                             *

*                        Bugcheck Analysis                                    *

*                                                                             *

*******************************************************************************

 

 

Use !analyze -v to get detailed debugging information.

 

 

BugCheck 133, {1, 1e00, 0, 0}

 

 

Page 4200 not present in the dump file. Type ".hh dbgerr004" for details

Page 4200 not present in the dump file. Type ".hh dbgerr004" for details

Page 4200 not present in the dump file. Type ".hh dbgerr004" for details

Probably caused by : mrxsmb.sys ( mrxsmb!SmbWskSend+1f2 )

 

 

Followup:     MachineOwner

---------

 

 

53: kd> !analyze -v

*******************************************************************************

*                                                                             *

*                        Bugcheck Analysis                                    *

*                                                                             *

*******************************************************************************

 

 

DPC_WATCHDOG_VIOLATION (133)

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL

or above.

Arguments:

Arg1: 0000000000000001, The system cumulatively spent an extended period of time at

  DISPATCH_LEVEL or above. The offending component can usually be

  identified with a stack trace.

Arg2: 0000000000001e00, The watchdog period.

Arg3: 0000000000000000

Arg4: 0000000000000000

 

 

Debugging Details:

------------------

 

 

Page 4200 not present in the dump file. Type ".hh dbgerr004" for details

Page 4200 not present in the dump file. Type ".hh dbgerr004" for details

Page 4200 not present in the dump file. Type ".hh dbgerr004" for details

 

 

DUMP_CLASS: 1

 

 

DUMP_QUALIFIER: 401

 

 

BUILD_VERSION_STRING:  14393.693.amd64fre.rs1_release.161220-1747

 

 

SYSTEM_MANUFACTURER:  Dell Inc.

 

 

SYSTEM_PRODUCT_NAME:  PowerEdge R730xd

 

 

SYSTEM_SKU:  SKU=NotProvided;ModelName=PowerEdge R730xd

 

 

BIOS_VENDOR:  Dell Inc.

 

 

BIOS_VERSION:  2.3.4

 

 

BIOS_DATE:  11/08/2016

 

 

BASEBOARD_MANUFACTURER:  Dell Inc.

 

 

BASEBOARD_PRODUCT:  0WCJNT

 

 

BASEBOARD_VERSION:  A04

 

 

DUMP_TYPE:  1

 

 

BUGCHECK_P1: 1

 

 

BUGCHECK_P2: 1e00

 

 

BUGCHECK_P3: 0

 

 

BUGCHECK_P4: 0

 

 

DPC_TIMEOUT_TYPE:  DPC_QUEUE_EXECUTION_TIMEOUT_EXCEEDED

 

 

CPU_COUNT: 38

 

 

CPU_MHZ: 960

 

 

CPU_VENDOR:  GenuineIntel

 

 

CPU_FAMILY: 6

 

 

CPU_MODEL: 4f

 

 

CPU_STEPPING: 1

 

 

CPU_MICROCODE: 6,4f,1,0 (F,M,S,R)  SIG: B00001E'00000000 (cache) B00001E'00000000 (init)

 

 

DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

 

 

BUGCHECK_STR:  0x133

 

 

PROCESS_NAME:  System

 

 

CURRENT_IRQL:  d

 

 

ANALYSIS_SESSION_HOST:  PHALFORDPC

 

 

ANALYSIS_SESSION_TIME:  01-22-2017 02:23:17.0663

 

 

ANALYSIS_VERSION: 10.0.14321.1024 amd64fre

 

 

LAST_CONTROL_TRANSFER:  from fffff80196bb1000 to fffff80196b5b6f0

 

 

STACK_TEXT: 

ffffdb80`5a305d88 fffff801`96bb1000 : 00000000`00000133 00000000`00000001 00000000`00001e00 00000000`00000000 : nt!KeBugCheckEx

ffffdb80`5a305d90 fffff801`96adc7e8 : 00001b8b`037a81b4 00001b8b`037a7f29 fffff780`00000320 fffff801`96b57cc0 : nt! ?? ::FNODOBFM::`string'+0x46470

ffffdb80`5a305df0 fffff801`972344e5 : ffffcb86`adf28900 ffffcb86`adf28900 00000000`00000001 ffffcb86`adf28900 : nt!KeClockInterruptNotify+0xb8

ffffdb80`5a305f40 fffff801`96a685d6 : ffffdb80`58adfd00 00000000`00000000 00000000`00000000 00000000`00000000 : hal!HalpTimerClockIpiRoutine+0x15

ffffdb80`5a305f70 fffff801`96b5cd6a : ffffdb80`631d61d0 ffffd382`b0228cf0 00000000`000000b8 00000000`00000008 : nt!KiCallInterruptServiceRoutine+0x106

ffffdb80`5a305fb0 fffff801`96b5d1b7 : 00000000`00000017 ffffdb80`631d6278 ffffdb80`631d65c0 ffffcb86`b65aaa40 : nt!KiInterruptSubDispatchNoLockNoEtw+0xea

ffffdb80`631d6150 fffff801`96b61271 : ffffcb86`bd353c4e fffff801`96a7d923 00000000`00000200 00000000`00001000 : nt!KiInterruptDispatchNoLockNoEtw+0x37

ffffdb80`631d62e0 fffff801`96a7d923 : 00000000`00000200 00000000`00001000 ffffcb86`bd353776 fffffeff`00000000 : nt!ExpInterlockedPopEntrySListEnd+0x11

ffffdb80`631d62f0 fffff800`fc9aaf13 : ffffd383`d366120c ffffcb86`bbb68b40 ffffdb80`631d6540 ffffcb86`b09ada78 : nt!IoAllocateMdl+0x73

ffffdb80`631d6340 fffff800`fc9a9c61 : 00000000`3337ddbe ffffd383`d378d260 00000000`00000000 ffffd383`d3661228 : tcpip!TcpSegmentTcbSend+0x223

ffffdb80`631d6420 fffff800`fc9abc8d : 00000000`00000010 fffffff6`00000007 fffff800`fcb54210 00000000`0028ed91 : tcpip!TcpBeginTcbSend+0x481

ffffdb80`631d6710 fffff800`fc9a95d5 : 00000000`00000000 ffffcb86`bbb68b40 00000000`00001000 ffffdb80`631d6b62 : tcpip!TcpTcbSend+0x25d

ffffdb80`631d6ad0 fffff800`fc9a929a : 00000000`0059dc59 ffffdb80`631d6d60 ffffdb80`631d6d01 00000000`00000000 : tcpip!TcpEnqueueTcbSendOlmNotifySendComplete+0xa5

ffffdb80`631d6b00 fffff800`fc9a8ddb : ffffdb80`00001300 ffffeb7f`760fa0e8 ffffdb80`631d6d01 fffff801`96a9f581 : tcpip!TcpEnqueueTcbSend+0x30a

ffffdb80`631d6c00 fffff801`96a9f505 : ffffdb80`631d6d01 ffffdb80`631d6d00 ffffcb86`b433f010 fffff800`fc9a8db0 : tcpip!TcpTlConnectionSendCalloutRoutine+0x2b

ffffdb80`631d6c80 fffff800`fc9f1aa6 : ffffd383`d1225010 00000000`00000000 00000000`00000000 ffffcb86`b08e7530 : nt!KeExpandKernelStackAndCalloutInternal+0x85

ffffdb80`631d6cd0 fffff800`fc4e1d47 : ffffd383`d1225010 ffffdb80`631d6df0 00000000`00000000 00000000`00000000 : tcpip!TcpTlConnectionSend+0x76

ffffdb80`631d6d40 fffff800`fdb59b02 : ffffd383`d1225010 fffff800`fc4fd090 ffffdb80`631d6df0 ffffcb86`b433f010 : afd!AfdWskDispatchInternalDeviceControl+0xf7

ffffdb80`631d6db0 fffff800`fdb9c3d1 : ffffd383`d0bcb720 ffffcb86`b433f010 00000000`c000020c fffff800`fdbfad4e : mrxsmb!SmbWskSend+0x1f2

ffffdb80`631d6ea0 fffff800`fdb9c2b8 : ffffd383`d4293eb8 fffff800`fdb5a53b fffff800`fdb8f000 00000000`00000000 : mrxsmb!RxCeSend+0xe1

ffffdb80`631d6ff0 fffff800`fdb593dd : 00000000`00040070 ffffd383`d4293f28 ffffd383`d0bcb720 ffffd383`d4293eb8 : mrxsmb!VctSend+0x68

ffffdb80`631d7040 fffff800`fdbfbdc1 : ffffd383`d4293d01 ffffd383`d40e07f0 ffffd383`d4293d28 00000000`00000000 : mrxsmb!SmbCseSubmitBufferContext+0x33d

ffffdb80`631d7110 fffff800`fdb59f46 : ffffd383`d4293d00 ffffdb80`631d7200 ffffcb86`00800000 00000000`00000000 : mrxsmb20!Smb2Write_Start+0x1d1

ffffdb80`631d71e0 fffff800`fdc24126 : ffffdb80`631d75a0 ffffd383`ccdbd810 ffffcb86`b436f7a0 00000000`00000004 : mrxsmb!SmbCeInitiateExchange+0x376

ffffdb80`631d7540 fffff800`fc71755c : ffffd383`d4293d28 00000000`00000001 ffffd383`ccdbd810 fffff801`96a2e934 : mrxsmb20!MRxSmb2Write+0x126

ffffdb80`631d75a0 fffff800`fc72a37d : fffff800`fc708000 ffffd383`ccdbd810 ffffcb86`bb2348b0 fffff800`fc708000 : rdbss!RxLowIoSubmit+0x17c

ffffdb80`631d7610 fffff800`fc6e7a0c : 00000000`00000003 00000000`00000001 ffffcb86`bb2348b0 ffffcb86`bb2348b0 : rdbss!RxLowIoWriteShell+0x9d

ffffdb80`631d7640 fffff800`fc72a289 : 00000000`00000000 ffffd383`d44b8800 ffffcb86`b0b1da40 00000000`00000001 : rdbss!RxCommonFileWrite+0x74c

ffffdb80`631d7830 fffff800`fc6e299b : ffffd383`ccdbd810 ffffcb86`b48ed080 ffffcb86`bb2348b0 00000000`00000000 : rdbss!RxCommonWrite+0x59

ffffdb80`631d7860 fffff800`fc71e6e6 : ffffd383`d44b8900 00000000`000371fd 00000000`00000000 00000000`00000002 : rdbss!RxFsdCommonDispatch+0x55b

ffffdb80`631d79e0 fffff800`fdb990eb : 00000000`00000000 fffff801`96aa55bc 00000000`00000000 ffffcb86`ade77350 : rdbss!RxFsdDispatch+0x86

ffffdb80`631d7a30 fffff800`fb8f72e7 : ffffd383`d2921600 00000000`00000001 00000000`00000102 ffffcb86`bb2348b0 : mrxsmb!MRxSmbFsdDispatch+0xeb

ffffdb80`631d7a70 fffff800`fb8f65c8 : ffffd383`d2d9d040 ffffdb80`631d7ba0 00000000`00040000 ffffd383`d2921600 : clusport!ClusPortSendPassthruReadWriteRemote+0x227

ffffdb80`631d7ac0 fffff800`fb8f4f21 : ffffd383`d2921600 ffffd383`d2921600 ffffd383`d2f0cbb0 ffffd383`d2921701 : clusport!ClusPortExecuteIrp+0x118

ffffdb80`631d7b70 fffff800`fb8f4bfa : 00000000`00000001 fffff800`fb913a80 00000000`00000000 ffffd383`d2921760 : clusport!ClusPortIrpWorker+0x51

ffffdb80`631d7ba0 fffff801`96a13729 : 00000000`00000000 ffffd383`d44b8800 00000000`00000080 fffff800`fb8f4ae0 : clusport!CsvFsThreadPoolWorkerRoutine+0x11a

ffffdb80`631d7c10 fffff801`96b60bb6 : ffffdb80`59fc0180 ffffd383`d44b8800 fffff801`96a136e8 00000000`00000000 : nt!PspSystemThreadStartup+0x41

ffffdb80`631d7c60 00000000`00000000 : ffffdb80`631d8000 ffffdb80`631d2000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

 

 

 

 

STACK_COMMAND:  kb

 

 

THREAD_SHA1_HASH_MOD_FUNC:  867e5a968da76728f7672cda902ce03b0094126c

 

 

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  0ca44a1f529d8537ce142132cfbf564122925c1a

 

 

THREAD_SHA1_HASH_MOD:  5f7fb32acfabea61ff02c84c0c3baa1fbe4b0b8d

 

 

FOLLOWUP_IP:

mrxsmb!SmbWskSend+1f2

fffff800`fdb59b02 8bd8            mov     ebx,eax

 

 

FAULT_INSTR_CODE:  8b49d88b

 

 

SYMBOL_STACK_INDEX:  12

 

 

SYMBOL_NAME:  mrxsmb!SmbWskSend+1f2

 

 

FOLLOWUP_NAME:  MachineOwner

 

 

MODULE_NAME: mrxsmb

 

 

IMAGE_NAME:  mrxsmb.sys

 

 

DEBUG_FLR_IMAGE_TIMESTAMP:  57cf9c38

 

 

BUCKET_ID_FUNC_OFFSET:  1f2

 

 

FAILURE_BUCKET_ID:  0x133_ISR_mrxsmb!SmbWskSend

 

 

BUCKET_ID:  0x133_ISR_mrxsmb!SmbWskSend

 

 

PRIMARY_PROBLEM_CLASS:  0x133_ISR_mrxsmb!SmbWskSend

 

 

TARGET_TIME:  2017-01-20T16:16:45.000Z

 

 

OSBUILD:  14393

 

 

OSSERVICEPACK:  0

 

 

SERVICEPACK_NUMBER: 0

 

 

OS_REVISION: 0

 

 

SUITE_MASK:  400

 

 

PRODUCT_TYPE:  3

 

 

OSPLATFORM_TYPE:  x64

 

 

OSNAME:  Windows 10

 

 

OSEDITION:  Windows 10 Server TerminalServer DataCenter SingleUserTS

 

 

OS_LOCALE: 

 

 

USER_LCID:  0

 

 

OSBUILD_TIMESTAMP:  2016-12-21 06:50:57

 

 

BUILDDATESTAMP_STR:  161220-1747

 

 

BUILDLAB_STR:  rs1_release

 

 

BUILDOSVER_STR:  10.0.14393.693.amd64fre.rs1_release.161220-1747

 

 

ANALYSIS_SESSION_ELAPSED_TIME: e45

 

 

ANALYSIS_SOURCE:  KM

 

 

FAILURE_ID_HASH_STRING:  km:0x133_isr_mrxsmb!smbwsksend

 

 

FAILURE_ID_HASH:  {f4239d18-f80c-7c1f-6289-34a57aa17a7d}

 

 

Followup:     MachineOwner

---------

 

Can somebody at Mellanox please help? I'm tempted to buy two off-the-shelf Mellanox cards just to rule Dell's firmware out of the equation.


Re: KNEM errors when running OMPI 2.0.1


FATAL: Error inserting knem (/lib/modules/3.13.0-37-generic/updates/dkms/knem.ko): Invalid module format indicates that the module is not built to match your kernel, even though it is in the correct DKMS directory. Also, I would guess that your MPI is built to use knem, since it doesn't complain about knem being missing, and nothing tries to load it if it is not there.

So I would download the latest version from the knem site, build it and install it. Note: read the section of the instructions about modifying the udev file; this will be necessary unless everyone is in the rdma group.

Knem does make a difference. It allows zero-copy transfers within the system and doesn't have the security set-up problems of the other zero-copy options.

Here are instructions:

KNEM: Fast Intra-Node MPI Communication
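
The build/install flow is roughly the following. This is a hedged sketch: the tarball version, install layout and udev rule file name are assumptions, so check the KNEM page above for the exact details.

$ tar xzf knem-1.1.4.tar.gz && cd knem-1.1.4      # version is a placeholder
$ ./configure && make && sudo make install
$ sudo modprobe knem                              # or insmod the freshly built knem.ko if it is outside the module path
$ ls -l /dev/knem                                 # device node should exist once the module is loaded
$ # allow non-root MPI ranks to use /dev/knem (rule file name and group are assumptions)
$ echo 'KERNEL=="knem", GROUP="rdma", MODE="0660"' | sudo tee /etc/udev/rules.d/10-knem.rules
$ sudo udevadm control --reload && sudo udevadm trigger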


Eswitchd error - using mellanox mitaka neutron ml2 plugin


Hi,

 

Eswitchd keeps failing with the error below (as seen in /var/log/neutron/eswitchd.log):

 

PF ib0 must have Mellanox Vendor ID,SR-IOV and driver module enabled. Terminating!

The ib0 interface is on a Mellanox ConnectX-3.

 

How do I check for the three conditions?

 

1. Vendor ID

2. SR-IOV is enabled

3. driver module is enabled

 

SR-IOV is already configured as described in HowTo Configure SR-IOV for ConnectX-3 with KVM (InfiniBand).
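
A minimal sketch of checking those three things from the shell, assuming the mlx4 driver; the 07:00.0 PCI address and the ib0 name are placeholders taken from lspci / ip link:

$ lspci -nn -s 07:00.0                                   # vendor ID should show [15b3:....] (Mellanox)
$ sudo lspci -s 07:00.0 -vvv | grep -i "Single Root"     # SR-IOV capability present?
$ cat /sys/class/net/ib0/device/sriov_totalvfs           # total VFs the device supports
$ cat /sys/class/net/ib0/device/sriov_numvfs             # VFs currently enabled (should be > 0)
$ lsmod | grep -E "mlx4_core|mlx4_ib|mlx4_en"            # driver modules loaded?
$ modinfo mlx4_core | grep -i num_vfs                    # module parameter used to enable SR-IOV on mlx4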

 

Any response is greatly appreciated!

 

Thanks,

Sunny

Re: Windows 2008 R2 BSOD connect x3 WinOF-5.10


I have installed the 5.22 version of the driver. It turns out it needed a hotfix that it couldn't install itself; I installed that manually, and now the 5.22 install completes fine. I then shut the server down and installed the card. After booting, I was able to see the card in network devices and set an IP address once the drivers were installed. It then asked me to reboot. After rebooting, the system BSODs within 5 minutes of the Windows login screen appearing. I shut down, tried another card, and the same problem exists. I had to remove the card to get the system to stay up.

 

I noticed that when it detected the card, it looked on Windows Update for the Mellanox IPoIB driver. I thought that was strange, but I'm not sure if that's normal.

 

WP_20170126_14_59_47_Pro.jpg

 

Please advise next steps.

Re: Windows 2008 R2 BSOD connect x3 WinOF-5.10


Hi Jim Kilborn,

Do you still have the issue after installing the latest version of WinOF?

Thanks and regards,
Martijn van Breugel
Mellanox Technical Support

Re: KNEM errors when running OMPI 2.0.1


Hi David,

 

OMPI can use the knem module, but it doesn't care about how knem was compiled; knem is part of the kernel, not part of OMPI. If any kernel module, knem included, cannot be loaded because of wrong symbols, the issue should be taken up with the kernel module developers.

At the same time, you might try to recompile the modules for your kernel and see if that helps. This link might be a good starting point: Command to rebuild all DKMS modules for all installed kernels? - Ask Ubuntu
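
Along those lines, a minimal sketch of rebuilding DKMS modules against the running kernel (the knem version string is a placeholder; take the real one from dkms status):

$ dkms status                                         # see which modules/versions DKMS manages
$ sudo dkms autoinstall -k "$(uname -r)"              # rebuild and install everything for this kernel
$ # or just knem:
$ sudo dkms build   -m knem -v 1.1.2 -k "$(uname -r)"
$ sudo dkms install -m knem -v 1.1.2 -k "$(uname -r)"
$ sudo modprobe knem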

Re: Mgmt_class for MAD packet in MLX4 driver


Hello Rama,

 

What driver version are you currently running?

How are you observing the Mgmt_class with rping?

What do you mean by "data offset as per Subn class"?

Is there a current issue?

What are you trying to accomplish?

 

Sophie.


Re: Windows 2008 R2 BSOD connect x3 WinOF-5.10


Hi Jim,

Can you provide us a screenshot from the BSOD?

Thanks and regards,
~Martijn

Mellanox MT27500 Link to Port 1 is Down


Hello,

 

I have a Mellanox MT27500 dual-port card installed in an HP SL250s. The server runs Ubuntu 14.04.5 with the 3.13.0-24-generic kernel. Both port 1 and port 2 are defined as eth. I have connected port 1 and port 2 to the switch with the same type of cable, but only port 2 comes up as linked; the LED for port 1 is off.

 

Do you have any idea how I can get both ports working?

 

Thanks,

Serhat

 

$ sudo lspci | grep Mellanox

07:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

 

$ cat /sys/bus/pci/devices/0000\:07\:00.0/mlx4_port1

eth

$ cat /sys/bus/pci/devices/0000\:07\:00.0/mlx4_port2

eth

 

$ sudo ethtool eth0

Settings for eth0:

  Supported ports: [ TP ]

  Supported link modes:   10000baseT/Full

  Supported pause frame use: No

  Supports auto-negotiation: No

  Advertised link modes:  10000baseT/Full

  Advertised pause frame use: No

  Advertised auto-negotiation: No

  Speed: Unknown!

  Duplex: Unknown! (255)

  Port: Twisted Pair

  PHYAD: 0

  Transceiver: internal

  Auto-negotiation: off

  MDI-X: Unknown

  Supports Wake-on: d

  Wake-on: d

  Current message level: 0x00000014 (20)

        link ifdown

  Link detected: no

 

$ sudo ethtool eth1

Settings for eth1:

  Supported ports: [ TP ]

  Supported link modes:   10000baseT/Full

  Supported pause frame use: No

  Supports auto-negotiation: No

  Advertised link modes:  10000baseT/Full

  Advertised pause frame use: No

  Advertised auto-negotiation: No

  Speed: Unknown!

  Duplex: Unknown! (255)

  Port: Twisted Pair

  PHYAD: 0

  Transceiver: internal

  Auto-negotiation: off

  MDI-X: Unknown

  Supports Wake-on: g

  Wake-on: g

  Current message level: 0x00000014 (20)

        link ifdown

  Link detected: no

 

$ sudo ip link set dev eth0 up

$ sudo ip link set dev eth1 up

 

$ ip link show eth0

12: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000

    link/ether 00:02:c9:fc:25:a0 brd ff:ff:ff:ff:ff:ff

$ ip link show eth1

13: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000

    link/ether 00:02:c9:fc:25:a1 brd ff:ff:ff:ff:ff:ff
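
A few further checks that may show whether this is a cable/module problem or a configuration problem (assuming eth0 is port 1, as in the output above):

$ sudo ethtool -i eth0          # driver and firmware version
$ sudo ethtool -m eth0          # SFP+/QSFP module EEPROM; errors out if no module/cable is detected
$ dmesg | grep -i mlx4          # driver messages about link/cable state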

Peerdirect Raw Ethernet programming


Hello,

 

I read through the excellent "Raw Ethernet Programming" documents, where Ethernet packets are sent and received while bypassing the kernel.

 

My question: is it possible to use GPUDirect (4.0) Sync (PeerDirect) with raw Ethernet programming?

 

I would like to duplicate the raw Ethernet programming tutorial with packets sent and received directly by the GPU, without touching host RAM or the CPU.

 

Is it possible to have the GPU drive the complete network interface card (ConnectX-4 or ConnectX-5), with the NIC not even visible to the host Linux?

 

Best Regards and many thanks!

Re: Which ESXi driver to use for SRP/iSER over IB (not Eth!)?


I think we have every right to ask for clarifications from Mellanox at this stage. ophirmaor?

Re: Which ESXi driver to use for SRP/iSER over IB (not Eth!)?


Some corrections to your list:

* 1.8.2.4 is for 5.x, 1.8.2.5 is for 6.0

* 1.8.3 beta supports both SRP and iSER and can be forcibly installed on 5.x and 6.0

* All 1.8.x.x support X2, X3 and X3 Pro. Also, they're the last to support X2. Not sure about the EN mode, haven't tried it.

* 1.9.x, 2.x and 3.x support only X3 and X3 Pro, none support X2 or older.

* 1.9.x and 3.x support only EN. 1.9.x is the only one supporting iSER.

* 2.x is the latest supporting the IB mode (only for X3 and X3 Pro). It may also support the EN mode, but I haven't tested it.

* Connect-IB, X4 in the IB mode and X5 aren't supported at all - I think this one is particularly insulting, because it means even relatively new cards are left without ESXi support.

 

I think you're absolutely right with your conclusion to stick with 1.8.2.5 on SRP under ESXi 6.0; this is exactly what I intend to do, even though I have X3 (not Pro) across the board and a managed IB/EN switch. Theoretically, I could use the 1.9.x in the EN mode (still on ESXi 6.0) over iSER, but performance wouldn't be on the same level as SRP and it wouldn't allow me to move to ESXi 6.5 anyway. I don't need any Windows support; my only storage clients are ESXi hosts.

 

As for using X2 as 10Gb NICs, I think this is how they're recognised by the inbox ESXi 6.0 drivers (although not 100% sure). You can give it a shot.

Re: Which ESXi driver to use for SRP/iSER over IB (not Eth!)?


So just summarizing the options:

 

Driver Version | Storage Protocol | Adapter Mode | Adapter Family | VMware Ver Supported
1.8.2.5        | IPoIB+SRP        | VPI Only     | CX-2           | ESXi5.5, ESXi6.0
1.8.3beta      | IPoIB+iSer       | VPI Only     | CX-2, CX-3 Pro | ESXi5.5, ESXi6.0
1.9            | iSer             | EN Only      | CX-3 Pro       | ESXi5.5, ESXi6.0
2              | IPoIB iSCSI      | VPI Only     | CX-2, CX-3 Pro | ESXi5.5, ESXi6.0
3              | native iSCSI     | EN Only      | CX-3 Pro       | ESXi5.5, ESXi6.0, ESXi6.5
4              | native iSCSI     | EN Only      | CX-4           | ESXi5.5, ESXi6.0, ESXi6.5

(see later post with corrected chart)

 

IPoIB is TCP/IP in software only (no RDMA) on VPI only
No support for any CX-2 solutions

 

So, since I don't have a full complement of CX-3 Pro everywhere and only an unmanaged IB switch, I would be best to stick with 1.8.2.5 on SRP under ESXi 6.0.

Also, since I need Windows storage support, I would be best to stick with SRP on 2008 R2, since there is no iSER support on Windows Server (and no SRP support after 2008 R2).

 

I will stick with ESXi 6.0 for now (I probably wouldn't be moving to 6.5 yet anyway), but when I do, it looks like I will need to replace the 40G IB switch with an EN switch and fill out my storage network with CX-3 Pro to get iSER support. I would also hopefully need to find an iSER driver for Server 2012 R2 (and/or 2016).

 

Is there any way with ESXi to use the CX-2 as a 10G Ethernet adapter (with an appropriate QSFP-to-SFP adapter), or is there no support at all on ESXi 6.0 (or 6.5)?

Re: Which ESXi driver to use for SRP/iSER over IB (not Eth!)?


OK, I think I got all those corrections (this might help someone looking this up later).

 

Driver Version | Storage Protocol             | Adapter Mode  | Adapter Family       | VMware Ver Supported      | Notes
1.8.2.4        | IPoIB+iSCSI, SRP             | VPI Only, ?EN | CX-2, CX-3, CX-3 Pro | ESXi5.x                   |
1.8.2.5        | IPoIB+iSCSI, SRP             | VPI Only, ?EN | CX-2, CX-3, CX-3 Pro | ESXi6.0                   |
1.8.3beta      | IPoIB+iSCSI, SRP, IPoIB+iSer | VPI Only      | CX-2, CX-3, CX-3 Pro | ESXi5.1                   | (ESXi5.5 and 6.0 forced)
1.9            | iSCSI, iSer                  | EN Only       | CX-3, CX-3 Pro       | ESXi5.5, ESXi6.0          |
2              | IPoIB+iSCSI                  | VPI, ?EN      | CX-3, CX-3 Pro       | ESXi5.5, ESXi6.0          |
3              | iSCSI                        | EN Only       | CX-3, CX-3 Pro       | ESXi5.5, ESXi6.0, ESXi6.5 |
4              | iSCSI                        | EN Only       | CX-4                 | ESXi5.5, ESXi6.0, ESXi6.5 |

Re: Peerdirect Raw Ethernet programming


Hi,

I would suggest you take a look at the perftest suite - git://flatbed.openfabrics.org/~grockah/perftest.git. It has support for CUDA and raw Ethernet.
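
A rough sketch of building it with CUDA support and running the raw-Ethernet bandwidth test against GPU memory. The configure variable, the flags and the device name here are assumptions; check ./configure --help and raw_ethernet_bw --help for the exact options in your version.

$ git clone git://flatbed.openfabrics.org/~grockah/perftest.git && cd perftest
$ ./autogen.sh && ./configure CUDA_H_PATH=/usr/local/cuda/include/cuda.h && make
$ # server and client sides, respectively (MACs and the other raw-Ethernet flags are omitted here):
$ ./raw_ethernet_bw -d mlx5_0 --server --use_cuda
$ ./raw_ethernet_bw -d mlx5_0 --client --use_cuda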

nfs over rdma


CentOS 7.3, MLNX_OFED_LINUX-3.4-1.0.0.0.

I get this NFS-over-RDMA error: mount.nfs: mount(2): Input/output error, when running this command on the NFS-RDMA client:

mount -o rdma,port=20049 ip-Server:/mnt/  /mnt/Client

 

Can anyone help me with this?
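
A minimal sketch of the pieces that usually need to be in place for this mount to work (the export path and names are taken from the command above; everything else is generic NFS/RDMA setup):

$ # on the server: load the RDMA transport and tell nfsd to listen for RDMA on port 20049
$ sudo modprobe svcrdma
$ echo "rdma 20049" | sudo tee -a /proc/fs/nfsd/portlist
$ sudo exportfs -v                        # confirm /mnt is really exported to this client
$ # on the client: load the client-side RDMA transport before mounting
$ sudo modprobe xprtrdma
$ sudo mount -o rdma,port=20049 ip-Server:/mnt/ /mnt/Client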

Re: Windows 2008 R2 BSOD connect x3 WinOF-5.10


Please see the attached screenshot.

 

 

Regards,

Jim

Re: Windows 2008 R2 BSOD connect x3 WinOF-5.10


Hi Jim,

Thank you for providing the screenshot.
Can we set up a phone call regarding this issue? Let me know which time suits you. We can also discuss moving the case to MyMellanox instead of using the community.

Thanks and regards,
~Martijn
 

Re: Which ESXi driver to use for SRP/iSER over IB (not Eth!)?


If you use vSphere OFED 1.8.2.4, 1.8.2.5 or 1.8.3.0 on ESXi 6.0 with a Solaris COMSTAR SRP or iSER target, then you will hit an ESXi PSOD.

ESXi 6.0 supports Linux targets only.

 

Jahoon Choi

