I’m trying to train a network. I need to load x images for training, but only half that many for validation. Could you please tell me how I should handle this in my data loader?
Do I need two __getitem__ methods? Is there an attribute that shows whether it is loading validation or training data?
You should handle the data splitting beforehand.
For each dataset (train, val, test) create a Dataset and wrap it in a DataLoader.
Then you can iterate your DataLoaders and train or test your model.
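In case it helps, here is a minimal sketch of doing the split beforehand with torch.utils.data.random_split. The tensor shapes, the 8/4 split sizes, the batch size, and the seed are assumptions chosen for illustration, not values from your setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Fake data standing in for the full set of images (shapes are arbitrary).
X = torch.randn(12, 3, 16, 16)
y = torch.randint(0, 10, (12,))
full_dataset = TensorDataset(X, y)

# Split once, before training: 8 samples for training, 4 for validation.
# The seeded generator only makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_dataset, val_dataset = random_split(full_dataset, [8, 4], generator=generator)

# Each split gets its own DataLoader.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4, shuffle=False)

print(len(train_dataset), len(val_dataset))
```

After the split, the two datasets do not share any samples, and each loader only ever draws from its own dataset.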
I’ve already handled this. The problem is that the __getitem__ function is used for both validation and training data. It receives an index into the val/train set array, loads the data, and returns it.
This shouldn’t be a problem, since you are creating separate Datasets for training and validation.
I created a small snippet, which might help you:

import torch
from torch.utils.data import Dataset, DataLoader

# Create fake data
X_train = torch.randn(100, 3, 16, 16)
y_train = torch.Tensor(100).random_(0, 10).long()
X_val = torch.randn(50, 3, 16, 16)
y_val = torch.Tensor(50).random_(0, 10).long()

class MyDataset(Dataset):
    def __init__(self, X, y):
        self.data = X
        self.target = y

    def __getitem__(self, index):
        # index always refers to this dataset's own samples
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        # The DataLoader uses this to decide which indices to request
        return len(self.data)

# One Dataset and one DataLoader per split
train_dataset = MyDataset(X_train, y_train)
val_dataset = MyDataset(X_val, y_val)
train_loader = DataLoader(train_dataset)
val_loader = DataLoader(val_dataset)

for data, target in train_loader:
    pass  # Your training procedure...

for data, target in val_loader:
    pass  # Your validation procedure...
The problem is that I have a different number of samples for training and testing, which means that during validation the index in __getitem__ is between 0 and 3, whereas during training it is between 0 and 7. The question is how to define an if in __getitem__.
In my example code I used 100 samples for the training set and 50 samples for validation.
You don’t need to concat both datasets if you want to use them for training and validation, respectively.
Is my code not working for you?
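To illustrate why no if should be needed in __getitem__, here is a small sketch that records which indices each DataLoader actually requests. RecordingDataset is a hypothetical helper written only for this demonstration, and the sizes 8 and 4 are assumptions chosen to mirror the index ranges you mentioned.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RecordingDataset(Dataset):
    """Hypothetical dataset that remembers every index passed to __getitem__."""

    def __init__(self, X, y):
        self.data = X
        self.target = y
        self.seen = []

    def __getitem__(self, index):
        self.seen.append(index)
        return self.data[index], self.target[index]

    def __len__(self):
        return len(self.data)


train_dataset = RecordingDataset(torch.randn(8, 3, 16, 16), torch.randint(0, 10, (8,)))
val_dataset = RecordingDataset(torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,)))

# num_workers is left at its default of 0, so loading happens in this process
# and the `seen` lists are filled on these very objects.
for data, target in DataLoader(train_dataset, batch_size=2, shuffle=True):
    pass
for data, target in DataLoader(val_dataset, batch_size=2):
    pass

print(sorted(train_dataset.seen))  # indices 0..7
print(sorted(val_dataset.seen))    # indices 0..3
```

Each loader samples indices in the range given by the __len__ of its own dataset, so the same __getitem__ code serves both splits without knowing which one it belongs to.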