Different number of inputs for training and validation

I’m trying to train a network. I need to load x images during training, but only half as many for validation. Could you please tell me how I should handle this in my data loader?
Do I need two __getitem__ methods? Is there an attribute that shows whether it’s loading validation or training data?


You should handle the data splitting beforehand.
For each dataset (train, val, test) create a Dataset and wrap it in a DataLoader.
Then you can iterate your DataLoaders and train or test your model.
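
As a rough sketch of what splitting beforehand could look like, assuming all samples start out in a single tensor-backed dataset: torch.utils.data.random_split carves out the two subsets once, and each subset gets its own DataLoader. The sizes (100 and 50), the tensor shapes, the seed, and the batch size are made up for illustration and are not from the original question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Fake data standing in for the real images (shapes are an assumption)
X = torch.randn(150, 3, 16, 16)
y = torch.randint(0, 10, (150,))
full_dataset = TensorDataset(X, y)

# Split once, before training starts; the seed makes the split repeatable
generator = torch.Generator().manual_seed(0)
train_dataset, val_dataset = random_split(
    full_dataset, [100, 50], generator=generator
)

# One DataLoader per split
train_loader = DataLoader(train_dataset, batch_size=10, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=10, shuffle=False)
```

If the data already lives in separate folders or arrays per split, the random_split step is unnecessary and each split can be passed to its own Dataset directly.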


I’ve already handled this. The problem is that the same __getitem__ function is used for both validation and training data. It receives an index into the val/train set array, loads the data, and returns it.

This shouldn’t be a problem, since you are creating separate Datasets for training and validation.

I created a small snippet that might help you:

import torch
from torch.utils.data import Dataset, DataLoader

# Create fake data: 100 training samples and 50 validation samples
X_train = torch.randn(100, 3, 16, 16)
y_train = torch.randint(0, 10, (100,))
X_val = torch.randn(50, 3, 16, 16)
y_val = torch.randint(0, 10, (50,))

class MyDataset(Dataset):
    def __init__(self, X, y):
        self.data = X
        self.target = y

    def __getitem__(self, index):
        # index always lies in range(len(self)) for this particular instance
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return len(self.data)

def train():
    for data, target in train_loader:
        # Your training procedure...
        print(data.shape)
        print(target.shape)

def val():
    for data, target in val_loader:
        # Your validation procedure...
        print(data.shape)
        print(target.shape)

# One Dataset and one DataLoader per split
train_dataset = MyDataset(X_train, y_train)
val_dataset = MyDataset(X_val, y_val)
train_loader = DataLoader(train_dataset)
val_loader = DataLoader(val_dataset)

train()
val()


The problem is that I have a different number of samples for training and validation, which means that during validation the index in __getitem__ is between 0 and 3, while during training it is between 0 and 7. My question is how to define an if statement in __getitem__ to tell the two cases apart.

In my example code I used 100 samples for the training set and 50 samples for the validation set.
You don’t need to concatenate the two datasets if you want to use them for training and validation, respectively.
Is my code not working for you?
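
To illustrate why no if statement is needed in __getitem__: a DataLoader with the default sampler only requests indices in range(len(dataset)), so each dataset instance sees its own index range. Below is a minimal sketch using the 8 and 4 sample counts from the question. RecordingDataset and its seen list are hypothetical names, added only to log which indices get requested.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RecordingDataset(Dataset):
    """Hypothetical dataset that logs every index requested from it."""

    def __init__(self, X, y):
        self.data = X
        self.target = y
        self.seen = []  # indices passed to __getitem__

    def __getitem__(self, index):
        self.seen.append(index)
        return self.data[index], self.target[index]

    def __len__(self):
        return len(self.data)

# 8 training samples and 4 validation samples, as in the question
train_dataset = RecordingDataset(torch.randn(8, 3, 16, 16), torch.randint(0, 10, (8,)))
val_dataset = RecordingDataset(torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,)))

# Iterate each loader once (default settings: no shuffling, no worker processes)
for _ in DataLoader(train_dataset):
    pass
for _ in DataLoader(val_dataset):
    pass

print(sorted(train_dataset.seen))  # [0, 1, 2, 3, 4, 5, 6, 7]
print(sorted(val_dataset.seen))    # [0, 1, 2, 3]
```

Because the index range follows from __len__ of each instance, the same __getitem__ code serves both splits unchanged.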