Thanks, Blair. Yes, I did check the NUMA topology: I have two NUMA nodes with 8 cores each. I measured MPI bandwidth while pinning the VM to different cores and NUMA nodes. For message sizes from 1 B to 32 KB, performance is very poor (less than 17% of the host result), even though the maximum bandwidth is fine for messages of 64 KB and larger.