Hi!
I’m trying to run many models on a single GPU by switching them in and out as needed (they don’t all fit in GPU memory together), but I’m finding that loading each model:
model = model.to('cuda')
takes 20-80ms (e.g., VGG16: ~80ms). If I want to load two different VGG16 models at a time, is there a way to parallelize the loading such that the total load time is < 160ms?
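For context, here's a minimal sketch of what I have in mind: issuing the two host-to-device copies on separate CUDA streams, with parameters in pinned host memory so the copies can be asynchronous. (The small `nn.Sequential` models here are just hypothetical stand-ins for the VGG16s; I don't know yet whether this actually overlaps on the PCIe bus.)

```python
import torch
import torch.nn as nn

def make_model():
    # Tiny placeholder model standing in for VGG16.
    return nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

def load_on_stream(model, device, stream):
    # Issue the host-to-device copies on a dedicated CUDA stream so the
    # two transfers don't serialize on the default stream.
    with torch.cuda.stream(stream):
        model.to(device, non_blocking=True)

if torch.cuda.is_available():
    m1, m2 = make_model(), make_model()
    # Pinned (page-locked) host memory is required for truly async copies.
    for m in (m1, m2):
        for p in m.parameters():
            p.data = p.data.pin_memory()
    s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
    load_on_stream(m1, torch.device('cuda'), s1)
    load_on_stream(m2, torch.device('cuda'), s2)
    torch.cuda.synchronize()  # wait until both transfers have finished
```

Even if this is the right mechanism, I'm unsure whether two concurrent copies can beat 160ms in total, since both share the same PCIe bandwidth.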
Thanks!