Creating a dataloader without target values


I am trying to create a dataloader that returns batches of input data without any corresponding target data.
Here’s what I am doing:

import torch
from torch.utils.data import TensorDataset, DataLoader

torch_input = torch.from_numpy(x_train)
torch_target = torch.from_numpy(y_train)

ds_x = TensorDataset(torch_input)
ds_y = TensorDataset(torch_target)

train_loader = DataLoader(ds_x, batch_size=128, shuffle=False)
target_loader = DataLoader(ds_y, batch_size=128, shuffle=False)

I do this because my case study assumes that the training data (x_train) and the corresponding labels (y_train) do not reside on the same device.

However, both of the above loaders return lists rather than Tensors, so training always fails.

For illustration, grabbing a single batch and printing its type returns a list:

x = next(iter(train_loader))
print(type(x))

I found out that I can index x as x[0] to access the batch as a Tensor. However, I would like to know how to do this correctly (i.e., create a dataloader from the x variables alone, without their targets).
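For what it's worth, here is a minimal sketch of why the list appears, assuming the loaders were built by wrapping each tensor in a TensorDataset: every sample comes back as a one-element tuple, which the default collate function turns into a one-element list. Since a tensor already implements __getitem__ and __len__, one workaround is to pass the tensor to DataLoader directly (the data here is a random stand-in):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x_train = torch.randn(10, 3)  # stand-in for the real training data

# Wrapped in a TensorDataset, each sample is a one-element tuple,
# so the default collate function returns each batch as a list.
wrapped_loader = DataLoader(TensorDataset(x_train), batch_size=4, shuffle=False)
batch_list = next(iter(wrapped_loader))
print(type(batch_list))    # <class 'list'>

# A tensor implements __getitem__ and __len__ itself, so it can be
# passed to DataLoader directly; batches then come back as tensors.
direct_loader = DataLoader(x_train, batch_size=4, shuffle=False)
batch_tensor = next(iter(direct_loader))
print(type(batch_tensor))  # <class 'torch.Tensor'>
```

The second loader avoids the x[0] indexing entirely, because no tuple wrapping happens along the way.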


It would be simple enough if you created a custom Dataset class:

from torch.utils.data import Dataset

class Dset(Dataset):

    def __init__(self, x_train, transform=None):
        self.x = x_train
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return len(self.x)

And then use dataloaders like:

train = Dset(x_train)
target = Dset(y_train)
train_loader = DataLoader(train, batch_size=128, shuffle=False)
target_loader = DataLoader(target, batch_size=128, shuffle=False)

When iterating over them for training, try:

for epoch in range(epochs):
    for x_ in train_loader:
        ...  # your training step on the batch x_
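Since the inputs and labels live in separate loaders, one way to keep them paired during training is zip. A minimal sketch, assuming both loaders use shuffle=False and the same batch size so batches stay aligned (the arrays here are stand-ins):

```python
import torch
from torch.utils.data import DataLoader

# Stand-ins for the real arrays; shapes are made up for illustration.
x_train = torch.randn(8, 3)
y_train = torch.randint(0, 2, (8,))

train_loader = DataLoader(x_train, batch_size=4, shuffle=False)
target_loader = DataLoader(y_train, batch_size=4, shuffle=False)

# With shuffle=False and the same batch size on both loaders,
# zip keeps inputs and labels aligned batch-for-batch.
for epoch in range(2):
    for x_, y_ in zip(train_loader, target_loader):
        assert len(x_) == len(y_)
        # ... forward pass on x_, compute loss against y_, step ...
```

Note that shuffle=True would break this pairing, since each loader would shuffle independently.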

Thank you! This seems to have solved my main issue, but using the above code made the loader very slow. If I use it inside a loop:

for epoch in range(epochs):
    for x_ in train_loader:
        print(x_)

it takes five seconds to print the first batch, and then I get an error that all available RAM has been used and the process crashes!

Try setting num_workers=0 in the DataLoader, and also reducing the batch size if you are having GPU memory issues. I see you are using torch.from_numpy as the transform; if it is slow, try following this thread to speed it up.
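A sketch of the convert-up-front idea, assuming the slowdown comes from running torch.from_numpy inside __getitem__ on every single access: do one bulk conversion in __init__ and just index the resulting tensor afterwards. The class name and data here are hypothetical stand-ins:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class FastDset(Dataset):
    """Hypothetical variant that converts the numpy array once, up front,
    instead of applying a transform on every __getitem__ call."""

    def __init__(self, x_train):
        # One bulk conversion; torch.from_numpy shares memory with the
        # numpy array, so this does not copy the data.
        self.x = torch.from_numpy(x_train)

    def __getitem__(self, index):
        return self.x[index]

    def __len__(self):
        return len(self.x)

x_train = np.random.randn(1000, 3).astype(np.float32)  # stand-in data
loader = DataLoader(FastDset(x_train), batch_size=128,
                    shuffle=False, num_workers=0)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([128, 3])
```

With num_workers=0 everything runs in the main process, which also rules out worker processes duplicating the dataset in memory.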