Hi Lin Chen,
Also, here are some other places that might interest you:
The GDRCopy code is a good example of how to use the GPUDirect RDMA API: https://github.com/NVIDIA/gdrcopy
If you are looking for an example at the CUDA + IB verbs level, the ib_send_bw and ib_write_bw tests in perftest could serve as one. A copy of perftest can be found here: git://git.openfabrics.org/~grockah/perftest.git
Thank you,
Sophie.