Different number of inputs for training and validation

shahabty · January 18, 2018, 9:41pm

Hello,
I’m trying to train a network. I need to load x images for training, but only half of them for validation. Could you please tell me how I should handle this in my data loader?
Do I need two __getitem__ methods? Is there an attribute that indicates whether it’s loading validation or training data?

Thanks,

ptrblck · January 19, 2018, 12:11am

You should handle the data splitting beforehand.
For each dataset (train, val, test) create a Dataset and wrap it in a DataLoader.
Then you can iterate over your DataLoaders and train or test your model.
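
As a minimal sketch of splitting beforehand, assuming all samples sit in one tensor-backed dataset, `torch.utils.data.random_split` can produce two subsets of different sizes, each wrapped in its own DataLoader. The tensor shapes, the 8/4 split, and the batch size are made up for illustration and are not taken from the question.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical data: 12 fake images with integer class labels
images = torch.randn(12, 3, 16, 16)
labels = torch.randint(0, 10, (12,))
full_dataset = TensorDataset(images, labels)

# Split once, up front: 8 samples for training, 4 for validation
train_dataset, val_dataset = random_split(full_dataset, [8, 4])

# Each split gets its own DataLoader
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=4)

print(len(train_dataset), len(val_dataset))
```

Because the split happens before the loaders are built, the training and validation loops never need to know about each other.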

shahabty · January 19, 2018, 12:46am

I’ve already handled this. The problem is that the same __getitem__ function is used for both validation and training data. It receives an index into the val/train set array, loads the data, and returns it.

ptrblck · January 19, 2018, 10:12am

This shouldn’t be a problem, since you are creating separate Datasets for training and validation.

I created a small snippet, which might help you:

import torch
from torch.utils.data import Dataset, DataLoader

# Create fake data
X_train = torch.randn(100, 3, 16, 16)
y_train = torch.randint(0, 10, (100,))  # integer class labels in [0, 9]
X_val = torch.randn(50, 3, 16, 16)
y_val = torch.randint(0, 10, (50,))

class MyDataset(Dataset):
    def __init__(self, X, y):
        self.data = X
        self.target = y
    
    def __getitem__(self, index):
        x = self.data[index]
        y = self.target[index]
        return x, y

    def __len__(self):
        return len(self.data)

def train():
    for data, target in train_loader:
        # Your training procedure...
        print(data.shape)
        print(target.shape)
    
def val():
    for data, target in val_loader:
        # Your validation procedure...
        print(data.shape)
        print(target.shape)

train_dataset = MyDataset(X_train, y_train)
val_dataset = MyDataset(X_val, y_val)
train_loader = DataLoader(train_dataset)
val_loader = DataLoader(val_dataset)

train()
val()

shahabty · January 20, 2018, 9:39pm

The problem is that I have a different number of samples for training and testing, which means that during validation the index in __getitem__ is between 0 and 3, whereas during training it is between 0 and 7. The question is how to define an if in __getitem__.

ptrblck · January 20, 2018, 10:13pm

In my example code I used 100 samples for the training set and 50 samples for validation.
You don’t need to concatenate both datasets if you want to use them for training and validation, respectively.
Is my code not working for you?
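
To illustrate why no if should be needed in __getitem__, here is a small sketch. It assumes the index ranges 0 to 7 and 0 to 3 from the question correspond to 8 training and 4 validation samples; the class name `IndexLoggingDataset`, the `seen` list, and the tensor shapes are hypothetical and only there to record which indices arrive.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class IndexLoggingDataset(Dataset):
    """Same structure as MyDataset, but records every requested index."""
    def __init__(self, X, y):
        self.data = X
        self.target = y
        self.seen = []  # illustration only: indices passed to __getitem__

    def __getitem__(self, index):
        self.seen.append(index)
        return self.data[index], self.target[index]

    def __len__(self):
        return len(self.data)

# Assumed sizes: 8 training samples, 4 validation samples
train_dataset = IndexLoggingDataset(torch.randn(8, 3, 16, 16), torch.randint(0, 10, (8,)))
val_dataset = IndexLoggingDataset(torch.randn(4, 3, 16, 16), torch.randint(0, 10, (4,)))

# Default settings (num_workers=0), so indices are recorded in this process
for data, target in DataLoader(train_dataset, batch_size=2):
    pass
for data, target in DataLoader(val_dataset, batch_size=2):
    pass

print(sorted(train_dataset.seen))
print(sorted(val_dataset.seen))
```

Each DataLoader only requests indices in range(len(dataset)) for the dataset it wraps, so the same __getitem__ code serves both instances without knowing which split it belongs to.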