I built a convolutional autoencoder that consumes a dataset of variable-size images. For the DataLoader I wrote a custom collate_fn, but now I’m running into the problem that I can only transfer one image to the GPU at a time. Since the transfer bandwidth is really slow, I’m wondering whether there is a way to load a list of variable-size PyTorch tensors onto the GPU in one go. I know the maximum image size, so I have a fixed bound on how much memory a batch can use; running out of GPU memory will never be a problem.
To the best of my knowledge, PyTorch doesn’t have a list data type, so it can’t transfer a list of tensors as a list. The default collate_fn tries to convert a list of tensors into a single batched tensor (effectively via torch.stack), which requires every tensor in the list to have the same size.
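A quick way to see this limitation, assuming torch is installed: stacking two tensors with mismatched spatial sizes (which is essentially what the default collate_fn does) raises a RuntimeError.

```python
import torch

# Two "images" with different spatial sizes
a = torch.zeros(3, 137, 200)
b = torch.zeros(3, 200, 200)

try:
    # What the default collate_fn effectively does for a batch of two
    torch.stack([a, b])
except RuntimeError as e:
    print("stack failed:", e)
```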
What you could do instead is zero-pad your images, either to the maximum image size across the dataset or to the maximum within each batch. This is a trade-off between being able to process larger batches and performing unnecessary computation on the padding.
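A collate_fn that pads to the per-batch maximum might look like the following sketch (the name `pad_collate` is mine; it assumes each dataset item is a single `(C, H, W)` tensor):

```python
import torch

def pad_collate(batch):
    """Zero-pad each image to the batch's max H and W, then stack
    into a single (B, C, H_max, W_max) tensor."""
    max_h = max(img.shape[-2] for img in batch)
    max_w = max(img.shape[-1] for img in batch)
    padded = []
    for img in batch:
        pad_h = max_h - img.shape[-2]
        pad_w = max_w - img.shape[-1]
        # F.pad takes (left, right, top, bottom) for the last two dims
        padded.append(torch.nn.functional.pad(img, (0, pad_w, 0, pad_h)))
    return torch.stack(padded)
```

You would then pass `collate_fn=pad_collate` to the DataLoader, and a whole batch moves to the GPU with one `.to(device)` call.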
What do you mean by “bandwidth is really slow”? Do you mean the speed at which data is transferred from the CPU to the GPU?
I am sorry, but I am unable to relate the premise to the question. If the bandwidth is slow, it is going to affect every transfer (fixed or variable size) regardless. Correct me if I am wrong.
What I mean is that transferring data from system memory to GPU memory is slow, and I would rather amortize that cost over larger batches instead of using a batch size of one.
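Aside from batching, host-to-GPU copies themselves can be sped up with pinned (page-locked) memory and non-blocking transfers, both standard PyTorch features. A minimal sketch (the toy `TensorDataset` here is just a stand-in for your own dataset):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy fixed-size data, standing in for your dataset
dataset = TensorDataset(torch.randn(8, 3, 32, 32))

# pin_memory=True keeps batches in page-locked host memory,
# which makes host-to-GPU copies faster
loader = DataLoader(dataset, batch_size=4,
                    pin_memory=torch.cuda.is_available())

for (batch,) in loader:
    # non_blocking=True lets the copy overlap with GPU compute;
    # it only has an effect when the source memory is pinned
    batch = batch.to(device, non_blocking=True)
```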
I was thinking about trying this strategy, but I would have to figure out how to tell the CUDA kernels what the actual image size is. E.g. if I have a 137x200 image that I embed into a 200x200 square (and then insert as image k in the batch tensor), I want the kernel to know that the image is only 137x200, to keep the net from learning this buffer of zeros. It turns out that if you build an autoencoder this way, it implicitly learns the shape and places images of a similar pixel size next to each other, which is not really what you want.
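One common way to keep the padding from influencing training, rather than informing the kernels directly, is to have the collate_fn also return a per-image validity mask and use it to restrict the reconstruction loss to real pixels. A sketch under that assumption (the function names are mine, and the loss is a plain masked MSE):

```python
import torch

def pad_collate_with_mask(batch):
    """Pad images to the batch's max size and return a mask that is 1
    over real pixels and 0 over the zero padding."""
    max_h = max(img.shape[-2] for img in batch)
    max_w = max(img.shape[-1] for img in batch)
    images, masks = [], []
    for img in batch:
        h, w = img.shape[-2], img.shape[-1]
        images.append(torch.nn.functional.pad(img, (0, max_w - w, 0, max_h - h)))
        mask = torch.zeros(1, max_h, max_w)
        mask[:, :h, :w] = 1.0
        masks.append(mask)
    return torch.stack(images), torch.stack(masks)

def masked_mse(recon, target, mask):
    """Mean squared error computed over real (unmasked) pixels only."""
    diff = (recon - target) ** 2 * mask          # zero out padded pixels
    denom = mask.expand_as(diff).sum().clamp(min=1)
    return diff.sum() / denom
```

With this, gradients never flow from the padded region, so the autoencoder has less incentive to encode the image shape itself.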