I am trying to find a simple way to run a forward pass on the same batch with two models on two GPUs at the same time. That is, I do not want to split one model's batch across devices; I want to put two different models on two different devices. I thought simple Python multiprocessing would work, but I am running into issues with pickling the models and with memory.
So I want to do:
```python
model1 = model1.to("cuda:0")
model2 = model2.to("cuda:1")

# run simultaneously
out1 = model1(batch)
out2 = model2(batch)
```
Any idea how to do this elegantly? Ideally it would work in a notebook environment.
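To make the concurrency pattern I'm after concrete, here is a minimal, CPU-only sketch using threads, with plain functions standing in for the two models (in the real version each forward would run on its own GPU, and the CUDA work would not hold the GIL):

```python
from concurrent.futures import ThreadPoolExecutor

def forward1(batch):
    # stand-in for model1(batch) running on cuda:0
    return [x * 2 for x in batch]

def forward2(batch):
    # stand-in for model2(batch) running on cuda:1
    return [x + 1 for x in batch]

batch = [1, 2, 3]

# submit both forwards so they run concurrently, then collect results
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(forward1, batch)
    f2 = pool.submit(forward2, batch)
    out1, out2 = f1.result(), f2.result()
```

This is just a sketch of the shape I want, not a claim that threads are the right mechanism; the pickling problems I mentioned only show up with multiprocessing.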