Creating a dataloader without target values

Hi,

I am trying to create a dataloader that returns batches of input data without any corresponding target data.
Here’s what I am doing:

torch_input = torch.from_numpy(x_train)
torch_target = torch.from_numpy(y_train)

ds_x = torch.utils.data.TensorDataset(torch_input)
ds_y = torch.utils.data.TensorDataset(torch_target)

train_loader = torch.utils.data.DataLoader(ds_x, batch_size=128, shuffle=False)
target_loader = torch.utils.data.DataLoader(ds_y, batch_size=128, shuffle=False)

I do this because my case study assumes that the training data (x_train) and the corresponding labels (y_train) do not live on the same device.

However, the above loaders both return lists, not Tensors, and therefore the training always fails.

For illustration purposes: grabbing a single batch and printing its type returns a list:

x = next(iter(train_loader))
print(type(x))  # <class 'list'>

I found out that I can index x as x[0] to access the batch as a Tensor. However, I would like to know how to do this correctly (i.e., create dataloaders from the x variables alone, without their targets).
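
For reference, here is the workaround I am using at the moment (just a sketch; it relies on each batch being a one-element list, which is how the DataLoader seems to wrap a single-tensor TensorDataset):

# current workaround: unpack the one-element list the loader yields
for batch in train_loader:
    x = batch[0]       # batch is [tensor_of_inputs]
    print(type(x))     # <class 'torch.Tensor'>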

Hi,

It would be simple enough if you create a custom Dataset class:

import torch
from torch.utils.data import Dataset, DataLoader

class Dset(Dataset):

    def __init__(self, x_train, transform=True):
        self.transform = transform
        self.x = x_train

    def __getitem__(self, index):
        # return a single sample, converting it to a tensor on the fly
        sample = self.x[index]
        if self.transform:
            sample = torch.from_numpy(sample)
        return sample

    def __len__(self):
        return len(self.x)

And then create the dataloaders like this:

train = Dset(x_train)
target = Dset(y_train)

batch_size = 128
train_loader = DataLoader(train, batch_size=batch_size, shuffle=False)
target_loader = DataLoader(target, batch_size=batch_size, shuffle=False)

When iterating over them during training, try:

for epoch in range(epochs):
    for x_ in train_loader:
        print(type(x_))  # should now print <class 'torch.Tensor'>
        ...
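
If you also need the matching labels inside the same training loop, one option (assuming both loaders keep shuffle=False, as in your snippet, so the batches stay aligned) is to zip the two loaders:

# sketch: iterate both loaders in lockstep; this only stays aligned
# because shuffle=False preserves the original sample order
for epoch in range(epochs):
    for x_batch, y_batch in zip(train_loader, target_loader):
        # placeholders: move each batch to wherever it has to live, e.g.
        # x_batch = x_batch.to(device_x)
        # y_batch = y_batch.to(device_y)
        ...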

Thank you! This seems to have solved my main issue, but using the above code made the loader very slow. For example, if I use it inside a loop:

for epoch in range(epochs):
    for x_ in train_loader:
        print(type(x_))

It takes about five seconds before the first print appears, and then the process crashes with an error saying it has used all available RAM!

Try setting num_workers=0 in the DataLoader, and if you are running into GPU memory issues, also try reducing the batch size. I see you are using torch.from_numpy as the transform. If that is the slow part, have a look at this thread for ways to speed it up.
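
For example, something like this (just a sketch, reusing the Dset class from above; the batch size of 64 is an arbitrary smaller value, and converting the numpy array once up front avoids the per-item from_numpy call):

# convert the numpy array to a tensor once, so no per-item transform is needed,
# and keep loading in the main process with num_workers=0
train = Dset(torch.from_numpy(x_train), transform=False)
train_loader = DataLoader(
    train,
    batch_size=64,    # smaller batch if memory is tight
    shuffle=False,
    num_workers=0,    # no worker processes
)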