Hello guys,
We are building a cluster with 910 compute nodes and two storage systems, so we are going to order two SX6536 switches, each with 34 18-port FDR leaf modules. Together the two switches will have 2 × 612 = 1224 ports. My first question: if we use a fat-tree topology for the cluster, can we get a network that is free of congestion and runs at full line speed? And which way of connecting the fabric gives the best chance of a congestion-free network?
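To make the port counting concrete, here is a rough sketch of the arithmetic in Python. It only counts ports and links; it says nothing about routing or traffic patterns, which also affect congestion. The 36-port edge switch with an 18 down / 18 up split is an assumption for illustration, and the storage ports are left out because their number is not fixed yet.

```python
import math

# Figures from the planned order.
PORTS_PER_LEAF = 18
LEAVES_PER_DIRECTOR = 34
DIRECTORS = 2
COMPUTE_NODES = 910  # storage ports would come on top of this


def director_ports(leaves=LEAVES_PER_DIRECTOR, ports_per_leaf=PORTS_PER_LEAF):
    """External ports available on one director switch."""
    return leaves * ports_per_leaf


def ports_needed_direct_trunk(endpoints):
    """Ports needed per director if nodes plug straight into the two
    directors (even split) and the directors are trunked together.
    For full bisection bandwidth, each director needs one trunk link
    per local endpoint."""
    local = math.ceil(endpoints / 2)
    trunk = local
    return local + trunk


def edge_layer_plan(endpoints, edge_radix=36, directors=DIRECTORS):
    """Two-level fat tree: hypothetical edge switches (assumed radix 36)
    use half their ports for nodes and half for uplinks, with uplinks
    spread evenly over the directors acting as the spine.
    Returns (edge_switches, total_uplinks, uplinks_per_director)."""
    down = edge_radix // 2
    up = edge_radix - down
    edges = math.ceil(endpoints / down)
    uplinks = edges * up
    return edges, uplinks, math.ceil(uplinks / directors)


if __name__ == "__main__":
    per_director = director_ports()
    print("ports per director:", per_director)
    print("direct trunk needs per director:", ports_needed_direct_trunk(COMPUTE_NODES))
    print("edge layer plan:", edge_layer_plan(COMPUTE_NODES))
```

Under these assumptions, plugging the nodes directly into two trunked directors would need more ports per director than the 612 available, while a layer of edge switches keeps the uplinks per director below 612. That seems to match the advice about additional edge switches, but it is only a counting argument.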
I asked the Mellanox technicians in China, but they could not give a clear answer. Some people said that, according to some modelling results, we would have to add edge switches to get a congestion-free network. I'm very confused about this!
As for the modelling, I have read the paper “Infiniband Congestion Control: Modelling and validation”, and I am going to model the congestion problem myself. I downloaded the OMNet++ Infiniband Flit Level Simulation Model from http://www.mellanox.com/page/omnet; however, I could not find the “ccmgr” module in that model. The Mellanox technicians in China do not know about the module either. So my second question: what modelling software is normally used to study Infiniband network congestion? And if the OMNet++ Infiniband Flit Level Simulation Model is the right tool, where can I find the “ccmgr” module (or IB CC extension)?
Any help would be greatly appreciated!