I need to run inference with image recognition models. I have 8 GPUs, and I would like to process one image per GPU in each iteration, because I do not want to crop the images.
I was hoping to use nn.DataParallel, because then everything stays in a single process rather than multiple processes (I need to gather the results and save them to disk).
The problem is that my images have different sizes, and nn.DataParallel does not seem to work with this, because the DataLoader's default collate function cannot batch images of different sizes.
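To illustrate the collate issue, here is a minimal sketch (the dataset, shapes, and collate_fn are illustrative stand-ins for my real setup): the default collate fails on mismatched sizes, and a list-returning collate_fn works for the DataLoader but gives me a Python list, which nn.DataParallel cannot scatter one-image-per-GPU the way I need.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class VariableSizeImages(Dataset):
    """Toy dataset returning images with different spatial sizes."""
    def __init__(self):
        self.sizes = [(3, 224, 224), (3, 256, 192), (3, 300, 400)]
    def __len__(self):
        return len(self.sizes)
    def __getitem__(self, idx):
        return torch.randn(self.sizes[idx])

# The default collate tries torch.stack and fails on mismatched shapes:
try:
    next(iter(DataLoader(VariableSizeImages(), batch_size=3)))
except RuntimeError as e:
    print("default collate failed:", e)

# A collate_fn that just returns the list makes the DataLoader work,
# but the batch is now a Python list of tensors, not a single tensor,
# so nn.DataParallel has nothing it can scatter across devices.
loader = DataLoader(VariableSizeImages(), batch_size=3,
                    collate_fn=lambda batch: batch)
batch = next(iter(loader))
print(type(batch), [tuple(t.shape) for t in batch])
```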
I do not want to use DistributedDataParallel, because my results are embedding vectors, and I have so many images that keeping all the embeddings on the GPU would run out of memory. With a single process, I can copy the results to host memory at each iteration. But with multi-processing, I cannot gather results across processes once they live in host memory.
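This is roughly the single-process pattern I have in mind (the loop bounds, embedding size, and the random tensor standing in for a model forward pass are all placeholders): offload each batch of embeddings to CPU immediately so GPU memory stays bounded, then concatenate and save at the end.

```python
import torch

# Hypothetical single-process inference loop. In the real code, `emb`
# would be model(images) living on a GPU; here a random CPU tensor
# stands in (.cpu() is a no-op on CPU, the intent is the offload).
all_embeddings = []
for step in range(4):                 # stand-in for iterating the dataloader
    emb = torch.randn(8, 512)         # stand-in for a batch of GPU embeddings
    all_embeddings.append(emb.cpu())  # copy to host memory right away

result = torch.cat(all_embeddings)    # later written out, e.g. with torch.save
print(result.shape)                   # torch.Size([32, 512])
```

With multiple processes, each worker would hold only its own shard of this list in host memory, and I do not see a supported way to gather those CPU-resident shards into one process.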
How can I solve this problem?