Multiple transform outputs

Hello! I am having (conceptual) issues with writing my own custom transform. Imagine I have an image classification task and I want to apply, e.g., 10 random crops to the same image, so I write a custom transform just for that. I want to do this because the images are extremely large and I would like to avoid loading them many times (fewer epochs and larger batches). The problem is what the output of this transform should be, since there are now multiple crops. The first thing that comes to mind is a list of (image, class) tuples, but then how would I feed this into my network without changing the architecture (I still want to feed one image and get class scores)?

Thanks in advance!


You could use an approach similar to the one shown for TenCrop, where the crops are first stacked into a 5D tensor and later flattened so that the batch size increases.
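A minimal sketch of such a transform, assuming a recent torchvision where RandomCrop accepts tensors; the class name NRandomCrops, the crop size, and the crop count are placeholders, not part of torchvision:

```python
import torch
from torchvision import transforms

class NRandomCrops:
    """Apply `n` independent random crops to one image and stack them."""
    def __init__(self, size, n=10):
        self.n = n
        self.to_tensor = transforms.ToTensor()
        self.crop = transforms.RandomCrop(size)

    def __call__(self, img):
        img = self.to_tensor(img)                      # [channels, height, width]
        crops = [self.crop(img) for _ in range(self.n)]
        return torch.stack(crops)                      # [crops, channels, height, width]

transform = NRandomCrops(size=224, n=10)
```

The Dataset would then return (crops_tensor, label) per sample as usual.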


Thank you for your reply. I have tried implementing this, but the problem is that my samples do not acquire a batch dimension until they pass through the DataLoader. Is there any way to flatten after the DataLoader and before passing the data to my network? Thanks!

This would be what the code example is doing, wouldn’t it?
The Dataset would return a tensor in the shape [crops, channels, height, width], while the DataLoader would add the batch dimension, yielding [batch_size, crops, channels, height, width].
Inside the DataLoader loop you would then flatten it to [batch_size*crops, channels, height, width] and pass it to the model.
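A minimal sketch of that loop, assuming the Dataset returns a ([crops, channels, height, width], label) pair per sample; dataset, model, and criterion are placeholders:

```python
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=4, shuffle=True)

for images, targets in loader:
    # images: [batch_size, crops, channels, height, width]
    b, n, c, h, w = images.shape
    images = images.view(b * n, c, h, w)       # [batch_size*crops, channels, height, width]
    targets = targets.repeat_interleave(n)     # repeat each label once per crop
    outputs = model(images)                    # the single-image model stays unchanged
    loss = criterion(outputs, targets)
```

Since each crop gets its own entry along the batch dimension, the model itself still only ever sees single images.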


Yeah, you’re right. For some reason I thought I would need to change how the DataLoader loads files, instead of simply flattening the data inside the loop. Anyway, thank you for your answer!