How to run multiprocessing with CUDA streams

Cross-posted from here, with more information about the compute resources involved in addition to the memory needed.