Hi,
I saw you applied a good set of performance tuning parameters, but still go through the recommended ones and make sure you didn't leave anything important behind.
Also, try running your iperf client with -P (the number of parallel client threads) set to the number of processors in the machine.
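
In case it helps, here is a minimal sketch of how that client command line could be put together with -P matched to the processor count. The helper name build_iperf_command and the address 192.0.2.10 are placeholders I made up for illustration; substitute your own server. It only builds and prints the command, it does not run iperf.

```python
import os
import shlex


def build_iperf_command(server, streams=None):
    """Return an iperf client argv; -P defaults to the local CPU count."""
    if streams is None:
        # os.cpu_count() can return None on some platforms, so fall back to 1.
        streams = os.cpu_count() or 1
    # -c selects client mode and names the server; -P sets the parallel count.
    return ["iperf", "-c", server, "-P", str(streams)]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder documentation address, not a real server.
    print(shlex.join(build_iperf_command("192.0.2.10")))
```

Note that os.cpu_count() reports logical processors, so on a machine with hyper-threading you may want to pass the physical core count explicitly and compare the two results.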