Hello,
I'm in a very similar position to yours, and I found the same MPIO issue in our environment.
I went ahead and tested it on a LUN that had only semi-critical VMs on it and didn't see any impact.
So I changed it for all LUNs and now all LUNs are using 4 paths instead of 1.
Regarding your question about IOPS and command queue depth: I have read similar things, and I'm looking forward to seeing whether anyone can say if changing them increases performance and how it's done.
Regards,
Benjamin