Unable to calculate validation loss in training loop

Hi,
I’m trying to fit a neural network using a training loop in PyTorch, but I can’t calculate the validation error inside the training loop because of an output/target size mismatch. The main problem is that the output size on the validation set always equals the output size of the last training batch, rather than the size of the input supplied by the validation data loader.

In the following code, prds is expected to have length 40, but it comes out with length 8, which is the size of y_pred in the last training iteration. As a result, the loss function fails because it receives one input of length 8 (prds) and one of length 40 (y_val). I would be very grateful if someone could help me find a way to get prds of the correct length.

Note: If I run the validation set outside the entire training loop (i.e. after all the epochs are over), the validation error is calculated. Here is my code:

net = MixedInputModel(emb_szs,len(contin_vars), 0.04, 1, [100,50], [0.0001,0.0001] ,y_range=y_range, use_bn=True, is_reg=True, is_multi=False)
loss = nn.MSELoss()
learning_rate = 1e-2
opt = optim.SGD(net.parameters(),lr = learning_rate,momentum = 0.9, weight_decay = 1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(opt,1)
for epoch in range(1): 
  losses,losses_val=[],[]
  net.train()
  dl = iter(md.trn_dl)

  for t in range(len(list(md.trn_dl))): #number of batches
    l = next(dl)
    x_cat, x_cont,y = l
    #net.train()
    #opt.zero_grad()
    #a. Forward pass: compu
    y_pred = net(V(x_cat),V(x_cont))
    #print(y_pred)
    ls = loss(y_pred, V(y))
    losses.append(ls) 

    # b.Use the optimizer object to zero all of the gradients for the variables to be updated (which are the learnable weights of the model)
    opt.zero_grad()
    #c. Backward pass: compute gradient of the loss with respect to model parameters
    ls.backward()
    #d. Calling the step function on an Optimizer makes an update to its parameters
    opt.step()
    scheduler.step()

  #validation loop 
  net.eval()
  vali_dl = iter(md.val_dl)
  for tt in range(len(list(md.val_dl))):
    vdl = next(vali_dl)
    xv_cat,xv_cont,y_val = vdl
    prds = net(V(xv_cat),V(xv_cont))
    ls_val = loss(prds, V(y_val))
    losses_val.append(ls_val)

print(losses_val)
print(losses)

It doesn’t seem you are mixing up the variable names, so I’m currently unsure what might be wrong.
Could you print the shapes of xv_cat, xv_cont, prds, and y_val?
Are you using some view operation inside your model with a fixed batch size?
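To illustrate the kind of bug the last question is asking about, here is a minimal sketch. TinyNet and its fixed_batch argument are hypothetical and are not taken from MixedInputModel; the sketch only shows how a view call with a hard-coded batch size could yield an output whose first dimension is 8 even when the validation batch holds 40 samples, and how printing the shapes exposes it.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model, not the MixedInputModel from the question."""

    def __init__(self, fixed_batch=None):
        super().__init__()
        self.fc = nn.Linear(4, 1)
        # Assumption for illustration: a batch size hard-coded into the model.
        self.fixed_batch = fixed_batch

    def forward(self, x):
        out = self.fc(x)
        # Buggy path: reshape to a fixed batch size.
        # Safe path: take the batch size from the input itself.
        batch = self.fixed_batch if self.fixed_batch is not None else x.size(0)
        return out.view(batch, -1)


x_val = torch.randn(40, 4)  # a validation batch of 40 samples
y_val = torch.randn(40, 1)

buggy_shape = tuple(TinyNet(fixed_batch=8)(x_val).shape)
fixed_shape = tuple(TinyNet()(x_val).shape)

print("input:", tuple(x_val.shape), "target:", tuple(y_val.shape))
print("output with fixed view:", buggy_shape)
print("output with x.size(0):", fixed_shape)
```

If the printed output shape follows the training batch size instead of the validation batch size, a fixed-size reshape somewhere in the forward pass is a likely candidate.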

Unrelated to this issue, here are a few minor points:

  • it looks like you are using the Variable class to wrap your tensors, which has been deprecated since PyTorch 0.4. If you are using a newer version, you can use tensors directly
  • you are appending the training and validation losses to lists while they are still attached to the computation graph. This increases memory usage, so you might eventually run out of memory. If you don’t need to call backward on these losses anymore and only store them for debugging purposes, use losses.append(ls.item()) instead.
  • to save some more memory, wrap your validation loop in torch.no_grad(), which avoids storing the intermediate activations that are needed for backpropagation:
with torch.no_grad():
    net.eval()
    ...
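Putting the last two suggestions together, a validation pass might look like the sketch below. The nn.Linear model, the random data, and the validate helper are placeholders assumed for illustration; they are not the MixedInputModel or md.val_dl from the question. The sketch iterates over the DataLoader directly, runs under torch.no_grad(), and stores plain Python floats via .item().

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Placeholder model and data, assumed only for this sketch.
net = nn.Linear(3, 1)
loss_fn = nn.MSELoss()
val_ds = TensorDataset(torch.randn(40, 3), torch.randn(40, 1))
val_dl = DataLoader(val_ds, batch_size=16)


def validate(net, val_dl, loss_fn):
    """Return one float loss per validation batch without building a graph."""
    net.eval()
    losses_val = []
    with torch.no_grad():  # no intermediate activations are kept
        for x_val, y_val in val_dl:
            prds = net(x_val)
            # .item() converts the 0-dim loss tensor to a Python float
            losses_val.append(loss_fn(prds, y_val).item())
    return losses_val


losses_val = validate(net, val_dl, loss_fn)
print(losses_val)
```

Remember to call net.train() again before the next training epoch, since net.eval() changes the behaviour of layers such as dropout and batch norm.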