Create DataLoader with custom size labels

afshin67 · June 26, 2020, 4:13pm

I have a dataset whose labels have size 64 and whose inputs are 3-channel images of size 15*17. So the features (images) and labels have shapes n*3*15*17 and n*64. I want to create a PyTorch DataLoader to get mini-batches of the data. I could do this by expanding the labels to n*3*15*64, concatenating them with the features (images) to get batches, and then separating them again before passing them to the model. But I don't think that is a good idea: when n is large, it is expensive in terms of memory and also involves unnecessary computation.

I would appreciate any suggestions.

And, here is an example:

import numpy as np
import torch

n = 10
a = np.ones((n, 3, 15, 17))  # images
b = np.ones((n, 64))         # labels

data = torch.utils.data.DataLoader(train_data, 5, shuffle=True)

Here I am not sure how to construct train_data from a and b.

Antonio_Ossa · June 26, 2020, 5:33pm

Hi @afshin67,

If I understand your question correctly, one possible solution is to create a CustomDataset that returns either a tuple with the data and the label or a dictionary containing both. This is a common approach and is shown in this tutorial. Here is a small example I put together in case you find it useful:

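import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):

    def __init__(self, a, b):
        self.a = a  # images, shape (n, 3, 15, 17)
        self.b = b  # labels, shape (n, 64)

    def __len__(self):
        return len(self.a)

    def __getitem__(self, idx):
        # One sample: the image and the label stored at the same index
        return self.a[idx], self.b[idx]
        # return {"a": self.a[idx], "b": self.b[idx]}

train_data = CustomDataset(a, b)
data = torch.utils.data.DataLoader(train_data, 5, shuffle=True)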
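
To check that the batches come out as expected, here is a minimal self-contained sketch that uses the arrays from the question and iterates over the loader once. The float32 conversion, the length check, and the names loader and batch_shapes are additions for illustration only and are not part of the example above.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class CustomDataset(Dataset):
    """Pairs each image in `a` with the label at the same index in `b`."""

    def __init__(self, a, b):
        # Illustrative safety check (an addition, not in the original example).
        assert len(a) == len(b), "images and labels must have the same length"
        # Convert once up front so every sample is already a float32 tensor.
        self.a = torch.as_tensor(a, dtype=torch.float32)
        self.b = torch.as_tensor(b, dtype=torch.float32)

    def __len__(self):
        return len(self.a)

    def __getitem__(self, idx):
        return self.a[idx], self.b[idx]


n = 10
a = np.ones((n, 3, 15, 17))  # images
b = np.ones((n, 64))         # labels

loader = DataLoader(CustomDataset(a, b), batch_size=5, shuffle=True)

# Each iteration yields one (images, labels) pair, already stacked per batch.
batch_shapes = []
for images, labels in loader:
    batch_shapes.append((tuple(images.shape), tuple(labels.shape)))
    print(images.shape, labels.shape)
```

With n = 10 and a batch size of 5, the loop runs twice and each batch holds images of shape (5, 3, 15, 17) and labels of shape (5, 64). If a and b are already tensors, torch.utils.data.TensorDataset(a, b) pairs them by index in the same way without a custom class.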

Hope it helps!

afshin67 · June 26, 2020, 6:05pm

That is exactly what I was looking for.
Thanks for the response.