Re: Omnios + RSF-1 + Inifiniband

March 26, 2015, 3:09 am

Hi,

Thanks for replying. It's not, from what I understand, and even if it was, I would think it would still function, since the communication for RSF-1 is over Ethernet. The storage network uses InfiniBand: IPoIB for iSCSI on Windows, as SRP support was removed in 2012, and NFS for ESXi, because the VMs are exposed in the snapshots, which makes VM recovery from snapshots easier. SRP was also tested, but it didn't provide much greater performance overall.

The storage nodes have a management network over Ethernet, which is also used for heartbeats; they also have a serial link for heartbeat and a set of quorum disks. Failover in these tests was manually initiated, which unmounts the pool, removes the network configuration, remounts the pool on the other node and reconfigures the network.

The InfiniBand network config and subnet manager could be the problem.

Each system has 2 ports, and each port is connected to an independent switch (no link between the switches), so port 1 on all systems goes to switch 1 and port 2 to switch 2.

Port 1 is on subnet 10.200.46.0/25 and port 2 is on 10.200.46.128/25 (IPoIB).
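As a quick sanity check of the addressing above, the following Python sketch uses the standard ipaddress module to confirm that the two /25 networks do not overlap and to show which port a given address belongs to. The sample addresses and the helper name port_for are made up for illustration.

```python
import ipaddress

# The two IPoIB subnets from the post, keyed by HCA port number.
SUBNETS = {
    1: ipaddress.ip_network("10.200.46.0/25"),
    2: ipaddress.ip_network("10.200.46.128/25"),
}


def port_for(address):
    """Return the port number whose subnet contains address, or None."""
    ip = ipaddress.ip_address(address)
    for port, net in SUBNETS.items():
        if ip in net:
            return port
    return None


if __name__ == "__main__":
    # Hypothetical host addresses, one in each half of 10.200.46.0/24.
    for sample in ("10.200.46.10", "10.200.46.130"):
        print(sample, "-> port", port_for(sample))
    print("subnets overlap:", SUBNETS[1].overlaps(SUBNETS[2]))
```

Since the two /25 networks split one /24 in half, an address that looks "close" to the other subnet can still land on the other port, which is worth keeping in mind when reading ARP tables.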

However, both subnets have the same pkey. The IPoIB release notes say that different subnets need different pkeys if they are on the same switch, otherwise ARP updates may produce an incorrect route. These are not on the same switch, but I thought it could be the problem. That said, the ARP updates on ESXi appear to show the IP addresses moving over to the correct MAC and using the correct interface.

There are 2 ESXi NFS datastores running over the different subnets: datastore 1 over subnet 1 and datastore 2 over subnet 2.

The subnet manager could also be a problem. The different configs I have read appear to use different notations, so I'm not sure which is correct. The current partition config is:

Default=0xffff,ipoib,rate=7,x mtu=5,defmember=full:ALL;

The subnet manager logs also show some errors, repeated multiple times:

583281 [23991700] 0x01 -> __osm_mcmr_rcv_join_mgrp: ERR 1B10: Provided Join State != FullMember - required for create, MGID: ff12:401b:ffff::2 from port 0x0002c903002af1cf (MT25408 ConnectX Mellanox Technologies)

558044 [23190700] 0x02 -> osm_report_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:9 GID:fe80::2:c903:2a:f16f
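The MGID in the first log line can be unpacked to see which partition and multicast group the join was for. The sketch below follows the IPoIB multicast GID layout as I understand it from RFC 4391 (0xff, then flags and scope, a 16-bit signature that is 0x401b for IPv4, the 16-bit P_Key, and finally the low 28 bits of the IPv4 group address). The function name decode_ipoib_mgid is hypothetical and only the IPv4 case is handled.

```python
import ipaddress


def decode_ipoib_mgid(mgid):
    """Split an IPoIB IPv4 multicast GID into its fields (assumed RFC 4391 layout)."""
    raw = ipaddress.IPv6Address(mgid).packed  # 16 bytes, network order
    if raw[0] != 0xFF:
        raise ValueError("not a multicast GID")
    signature = int.from_bytes(raw[2:4], "big")
    if signature != 0x401B:
        raise ValueError("not an IPoIB IPv4 multicast GID")
    pkey = int.from_bytes(raw[4:6], "big")
    # Only the low 28 bits of the IPv4 multicast address are carried.
    low28 = int.from_bytes(raw[12:16], "big") & 0x0FFFFFFF
    group = ipaddress.IPv4Address(0xE0000000 | low28)
    return {
        "flags": raw[1] >> 4,
        "scope": raw[1] & 0x0F,
        "pkey": pkey,
        "group": str(group),
    }


if __name__ == "__main__":
    fields = decode_ipoib_mgid("ff12:401b:ffff::2")
    print(hex(fields["pkey"]), fields["group"])
```

Under that layout, the MGID from the log decodes to P_Key 0xffff and IPv4 group 224.0.0.2, so the rejected join was for a group on the default partition.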

I have set all the systems to use an MTU of 4K. Debian (the subnet manager) didn't want to go above 2K until it was set to connected mode. I was going to set everything back to 2K, as it's the default, to be sure that's not causing a problem.

I have 2 setups that I was going to try. The first is to keep the layout the same as now, but with 2 pkeys, 1 for each subnet, and change the MTU back to default to see if that works. The next is to use only 1 subnet and pkey and link the 2 switches together, again with 2K MTU. The latter is not the recommended config for an iSCSI network with multiple paths, so it isn't really wanted.
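For reference, the mtu= values in a partition config are InfiniBand MTU codes rather than byte counts: code 4 is 2048 bytes and code 5 is 4096 bytes. The sketch below converts between the two. The function names are made up, and the 4-byte figure for the IPoIB encapsulation header (which is why datagram-mode IPoIB interfaces typically report 2044 or 4092) is my understanding of RFC 4391 rather than something stated in this thread.

```python
IPOIB_HEADER_BYTES = 4  # assumption: size of the IPoIB encapsulation header


def ib_mtu_bytes(code):
    """Convert an InfiniBand MTU code (1..5) to bytes: 256, 512, 1024, 2048, 4096."""
    if code not in range(1, 6):
        raise ValueError("IB MTU code must be between 1 and 5")
    return 128 << code


def ipoib_datagram_mtu(code):
    """IP-level MTU in datagram mode for a given IB MTU code (assumed 4-byte header)."""
    return ib_mtu_bytes(code) - IPOIB_HEADER_BYTES


if __name__ == "__main__":
    for code in (4, 5):
        print("mtu=%d -> IB MTU %d bytes, IPoIB MTU %d bytes"
              % (code, ib_mtu_bytes(code), ipoib_datagram_mtu(code)))
```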

Any additional help with this would be appreciated, and any additional info you may need I will try to provide.

Thanks

↧

Re: Omnios + RSF-1 + Inifiniband

March 26, 2015, 3:40 am

I remember coming across this exact problem a long time ago setting up a similar config.

I went back through some old correspondence and indeed found that it was ARP requests not being honoured by the infiniband HCAs that was the root cause of the problem.

Apparently whilst IB cards do have ARP functionality, they do not honour the command to update their ARP for any given IP address.

This contradicts your experience, as you suggest that you can see the ARP updates on the interface, so I am a little confused.

I would not think that the subnet manager would become involved at all in this scenario as the LID assignment would only ever change on a reboot.

Have you tried using connected mode? The packet headers are slightly different compared to datagram mode, so it is worth a shot.

RFC 4391 - Transmission of IP over InfiniBand (IPoIB)

↧

Re: I have two MCQH29-XDR mezzannie cards in different servers, that suddenly do not appear in device manager, or work at all.

March 26, 2015, 5:02 am

Hi Will,

What OS are you using (Windows or Linux, and which version)? Which driver and version are you using?

Have you done any OS upgrades recently?

↧

Re: MLNX_OFED_LINUX-2.4-1.0.4 - test problems

March 26, 2015, 5:15 am

In this particular use case, when testing with the perftest tools across message sizes from lowest to highest, users should put the -a flag on both the sender and the receiver.

↧

Re: I have two MCQH29-XDR mezzannie cards in different servers, that suddenly do not appear in device manager, or work at all.

March 26, 2015, 12:20 pm

Great Questions!

OS is Windows 2012 R2

Drivers are WinOF 4.9

Firmware was a custom burn version above 2_9_1000, that was working initially. 2.10.720 I believe.

I have two MHQH19B-XTRs at the same firmware and drivers, in two identical machines running Win 2012 R2, without issue.

The primary difference is that the XTRs are not mezzanine PCI cards, and the problem cards are.

Both servers started at a clean install of 2012 R2. One of the servers has since had a clean install, with the non-working card installed from the beginning.

Usually, Windows has no problem detecting and loading generic drivers on a clean install.

However, my cards are not seen at all anywhere on the system.

So my questions are:

Is my mezzanine slot damaged or did both cards die?

If the firmware is goofed up, and I short the jumper to ignore the flash, should I be able to see the card via the mst utility? Because it doesn't show up..

mst does not show the adapter at all, so I don't believe this is a Windows driver issue. PCI-Z also does not show any cards on the PCI bus, but all the other PCI equipment shows up as before.

The problem began after some physical maintenance to the server that included de-racking it and doing some hard drive work internally. While I don't believe damage to the slots occurred (I can't imagine what might have done that), it is suspicious to me that the cards failed to work from that point forward.

I don't know if bad firmware coupled with a cold-start can cause the cards to lose their mind?

And I don't know if the cards should show-up in the mst utility if they are put into "ignore the flash" mode...

I have two machines that I can move the cards to, to isolate whether it's a slot problem or a card problem, but those machines need a little bit of work to fit the card in, so I wanted to make sure there was nothing else I could do to see if the cards are still alive before I do that.

I don't have an i2c setup to connect to that port at this point, but I'm open to it if that's the only way to figure this thing out.. since these cards are not cheap!

Thanks for any information!

↧

Mellanox ConnectX-3 Ethernet Card.

March 27, 2015, 3:19 am

Hi,

I am currently working on a Mellanox ConnectX-3 Ethernet card. I found some tools like mlnx_qos, mlnx_qcn, and mlnx_perf, but I am not aware of how to test them.

I would like to know more about features like QoS and QCN, and how they are tested or verified.

Kindly share your thoughts on this, which will help me progress further.

Thanks in advance.

-Thanks and Regards,

Sumit

↧

Re: Mellanox ConnectX-3 Ethernet Card.

March 27, 2015, 9:06 am

Go to the Mellanox documentation server.

https://mymellanox.force.com

and there to /support/VF_SerialSearch

Enter a valid serial number of one of your cards, then search for MLNX_QOS to find documentation that you may download.

SPEEDY

↧

Re: Omnios + RSF-1 + Inifiniband

March 27, 2015, 9:26 am

David

I assume you have dual port IB cards.

You should:

- connect the IB card with both switches

- define a pkey per subnet (we have the default and 4 pkeys, from 60 to 63)

Example for partition.conf:

# Default partition

Default=0x7fff, ipoib, mtu=4 : ALL=full, SELF=full ;

# Naming convention: key<VLAN tag>, e.g. key60=0x803C, 60 (decimal) = 3C (hexadecimal)

key60=0x803C, ipoib, mtu=4 : ALL=full;

key61=0x803D, ipoib, mtu=4 : ALL=full;

key62=0x803E, ipoib, mtu=4 : ALL=full;

key63=0x803F, ipoib, mtu=4 : ALL=full;

- define a virtual switch in ESX, standard or distributed does not matter

- define in ESX the first port as active and the second port as standby (this is important)

- define per subnet a portgroup and use the pkey as VLAN id
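The naming convention in the partition.conf comment above can be written out as a small calculation: the VLAN id is used as the partition number and the top bit (0x8000) is set. That top bit is the full-membership bit in InfiniBand P_Keys. The helper names pkey_for_vlan and partition_line are made up for this sketch, and the 1..4094 range check is simply the usual 802.1Q VLAN id range, not something taken from the post.

```python
FULL_MEMBER_BIT = 0x8000  # top bit of a 16-bit P_Key marks full membership


def pkey_for_vlan(vlan_id):
    """Return the P_Key used for a VLAN id under the key<VLAN tag> convention."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be between 1 and 4094")
    return FULL_MEMBER_BIT | vlan_id


def partition_line(vlan_id, mtu_code=4):
    """Format one partition.conf entry in the same style as the example above."""
    return "key{0}=0x{1:04X}, ipoib, mtu={2} : ALL=full;".format(
        vlan_id, pkey_for_vlan(vlan_id), mtu_code)


if __name__ == "__main__":
    for vlan in range(60, 64):
        print(partition_line(vlan))
```

Run as a script, this prints the four key60 to key63 lines in the same form as the example; the value to enter as the VLAN id on the ESX portgroup is the plain decimal number (60 to 63).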

This works for us in our clustered storage setup with Solaris 11.2 and corosync/pacemaker.

We use 3 subnets: storage, vmotion, backup.

Each of the 4 IB switches is connected to 2 other switches, forming a mesh of 4 switches.

If you use just one port per subnet you disable failover between ports.

And I guess you would like to use redundancy and automatic failover if you use 2 switches.

You may try to change the order of active/standby per portgroup=subnet=VLAN if you want to have traffic over all ports.

And we use NFS over IPoIB, not iSCSI or SRP, with datastore sizes between 5 and 170 TB.

NFS is simpler and fast enough with IPOIB.

Andreas

↧

Re: Mellanox ConnectX-3 Ethernet Card.

March 27, 2015, 9:34 am

Hi,

1. I suggest you start with the MLNX_OFED or MLNX_EN User Manual, located on the Mellanox website.

Mellanox Products: Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED)

2. I suggest you also browse here: Solutions

You may find many things related to mlnx_qos, for example:

Network Considerations for Global Pause, PFC and QoS with Mellanox Switches and Adapters

End-to-End QoS Configuration for Mellanox Switches and Adapters

HowTo Configure QoS over SR-IOV

There are more discussions with answers that may help you.

Ophir.

↧

Re: I have two MCQH29-XDR mezzannie cards in different servers, that suddenly do not appear in device manager, or work at all.

March 29, 2015, 5:30 am

Hi Will,

It does sound like a bad card or a bad slot to me. I believe that in livefish mode the card should still be identified in Device Manager, and also with MFT, only as "SDR" or "Recovery Mode".

Here's a picture of what a jumper on an HCA should look like:

So I think you might need to try eliminating factors by putting this on a server / slot which is known to work.

Let us know how you progress.

Cheers!

↧

Re: Designing for high availability

March 29, 2015, 9:54 pm

Hi,

If you're planning to use GPFS over RDMA, you'll most likely have to use IPoIB as the upper-layer interface. IPoIB can use the standard Linux bonding driver with an active/passive configuration. As of now, I'm not familiar with IPoIB working as active/active.

This is a very common configuration with GPFS.

↧

Re: Designing for high availability

March 29, 2015, 10:42 pm

I'm familiar with InfiniBand, GPFS and RDMA in a single-switch environment. This is straightforward, and you can define your IPoIB interfaces as a bond to fail over when individual links fail. I have also seen configurations where two IB switches are used with bonding to fail over between the two, but in an active/passive arrangement. In both cases the resulting configuration is similar to an analogous Ethernet configuration.

I'm looking for a configuration similar to two Ethernet switches, stacked together, with the hosts using LACP to manage the links. This allows the full bandwidth of the links to be used (link aggregation) and provides failover in case of a switch failure.

↧

Re: Infiniband for Mellanox Technologies MT25204

March 30, 2015, 2:29 am

Which firmware update version did you use?

Thanks.

↧

Re: Omnios + RSF-1 + Inifiniband

March 30, 2015, 5:00 pm

Hi,

Thanks for the help. I will give it a go when I'm back at work next week; I'm off for Easter now.

You are correct, I'm using a dual port card.

Thanks again for the information, and for letting me know that you are using something very similar to my setup that is working. Hopefully I will be able to get it working next week and report back. If not, hopefully you will be willing to provide a little more help.

I decided on NFS for ESXi for the same reasons: it is simpler, with easier access to the VMs in the snapshots.

Thanks again.

↧

Mellanox Messaging Library RPM

March 31, 2015, 10:13 am

Is there an archive of the mxm RPMs available? We are trying to find a copy of the mxm RPM that is compatible with CentOS 4.4. We need to compile OpenMPI with MXM on a system running CentOS 4.4.

Thank you.

↧