For hyperparameter search I start multiple training processes on the same GPU.
Unfortunately, each process creates its own CUDA context, consuming an excessive 700 MB of memory.
Is there anything I can do to share a single CUDA context among multiple processes?
I’d love to use threads instead of processes, but according to the API docs that brings even more issues.
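For reference, here is roughly how I launch the processes (a minimal sketch; the `train` function and hyperparameter values are placeholders for my real training loop):

```python
import multiprocessing as mp

def train(lr):
    # Placeholder for the real training loop. In practice each process
    # initializes CUDA here (e.g. moving the model to "cuda:0"), which is
    # the point where the per-process CUDA context gets allocated.
    return lr * 2  # stand-in result so the sketch is runnable

if __name__ == "__main__":
    learning_rates = [1e-2, 1e-3, 1e-4]
    # "spawn" is the start method required when child processes use CUDA
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=len(learning_rates)) as pool:
        results = pool.map(train, learning_rates)
    print(results)
```

Each worker in the pool ends up with its own context, which is exactly the memory overhead I'd like to avoid.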