Hi Lin Chen,
To see how to implement your code, you can read the source code of the OSU Micro-Benchmarks, which include CUDA support. They are available from the OSU website: http://mvapich.cse.ohio-state.edu/benchmarks/
MVAPICH2-GDR is free but not open source. If you need an MPI implementation that is both free and open source, you can use Open MPI instead. Both MVAPICH2-GDR and Open MPI are CUDA-aware and support GPUDirect RDMA.
Thank you,
Sophie.