Is my validation loss code correct? How to early stop and prevent overfitting?

Hi,
I am new to PyTorch. It would be helpful if you could comment on the code below and tell me whether I have made a mistake or whether there is a better way to do it.
Also, how do I stop training early and save the best model?

for epoch in range(num_epochs):
  train_loss = 0.
  full_val_loss = 0.

  model.train()
  # enumerate gives a per-batch index that restarts at 0 every epoch,
  # which is what the running-mean update below needs
  for batch_idx, (x, y, y2) in enumerate(loader):  # y2 is not used in this loop
    optimizer.zero_grad()
    x = embedding(x).to(device)
    y = y.to(device)  # targets must be on the same device as the model output
    output1 = model(x)
    loss = criterion1(output1, y)
    loss.backward()
    optimizer.step()
    # running mean of the batch losses; .item() returns a plain Python float
    train_loss = train_loss + ((1 / (batch_idx + 1)) * (loss.item() - train_loss))

  model.eval()
  with torch.no_grad():
    for batch_idx, (x, y) in enumerate(val_loader):
      x = embedding(x).to(device)
      y = y.to(device)
      output1 = model(x)
      val_loss = criterion1(output1, y)
      full_val_loss = full_val_loss + ((1 / (batch_idx + 1)) * (val_loss.item() - full_val_loss))

  print('Epoch [{}/{}], Loss: {:.4f} , val_loss: {:.4f} '.format(epoch+1, num_epochs, train_loss, full_val_loss))
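
The update on the train_loss and full_val_loss lines is an incremental mean. It only equals the average of the batch losses if batch_idx counts batches, starting at 0 within each epoch. A minimal sketch with made-up loss values (batch_losses is purely illustrative, not taken from the code above) that checks the formula against a plain average:

```python
import math

# Made-up per-batch losses, standing in for loss.item() values.
batch_losses = [0.9, 0.7, 0.8, 0.4]

running = 0.0
for batch_idx, loss in enumerate(batch_losses):
    # Same update as in the training loop: move the running value
    # toward the new loss by a 1/(n) step, where n = batch_idx + 1.
    running = running + (1 / (batch_idx + 1)) * (loss - running)

plain_mean = sum(batch_losses) / len(batch_losses)
print(running, plain_mean, math.isclose(running, plain_mean))
```

If the index does not restart at 0 for each pass over the loader, the first step no longer replaces the initial 0 with the first loss, and the result is biased toward 0.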

What is y2 for in the train loop?

Why no torch.no_grad()? I thought it was useful to have in evaluation mode since it prevents the autograd engine from creating the graph for the backwards pass, so you save memory when you do not need it.
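
A small sketch of what that means in practice. The tensor w and the toy expression are made up for illustration and are not part of the code above:

```python
import torch

# A leaf tensor that autograd tracks.
w = torch.ones(3, requires_grad=True)

# Normal forward pass: the result is attached to a graph.
y_tracked = (w * w).sum()

# Same computation under no_grad: no graph is recorded.
with torch.no_grad():
    y_untracked = (w * w).sum()

print(y_tracked.requires_grad, y_tracked.grad_fn is not None)
print(y_untracked.requires_grad, y_untracked.grad_fn)
```

The first result carries a grad_fn and can be backpropagated through; the second has requires_grad set to False and no grad_fn, so nothing is kept around for a backward pass.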

I thought that the graph is destroyed after you call .backward() and then created again next time you call forward().
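
One way to observe this, again with a made-up toy tensor w rather than the model from the question: a second .backward() call on the same loss fails, because the buffers saved for the backward pass were freed by the first call.

```python
import torch

w = torch.ones(3, requires_grad=True)
loss = (w * w).sum()

# First backward pass works and frees the saved intermediate buffers.
loss.backward()

second_call_failed = False
try:
    # The buffers are gone, so going through the same graph again raises.
    loss.backward()
except RuntimeError:
    second_call_failed = True

print(second_call_failed)
```

Running the forward computation again builds a fresh graph, which is why the usual training loop works without retain_graph=True.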

Yes, sorry, I said that wrong. The graph is destroyed when you call .backward(). Using torch.no_grad() may lower your GPU memory usage slightly, but not by a lot. I have tried both ways, and for some reason my GPU memory usage has never really been different. You can keep it in, however, if it impacts your performance. I am still confused about what y2 does, though.
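
On the early stopping part of the question: one common approach is to track the best validation loss and stop once it has not improved for a set number of epochs. Below is a minimal sketch, not a definitive implementation. The EarlyStopper class, its patience and min_delta parameters, and the fake loss values are all hypothetical names and data made up for this example.

```python
import copy


class EarlyStopper:
    """Hypothetical helper: remembers the best validation loss and
    signals when it has not improved for `patience` epochs in a row."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: keep a copy of the state and reset the counter.
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses standing in for full_val_loss per epoch.
fake_val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(fake_val_losses):
    state = {"epoch": epoch}  # stand-in for model.state_dict()
    if stopper.step(val_loss, state):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss, stopper.best_state)
```

In the loop from the question, you would call stopper.step(full_val_loss, model.state_dict()) after the validation pass and break when it returns True. Afterwards the best weights can be restored with model.load_state_dict(stopper.best_state) or written to disk with torch.save(stopper.best_state, path), where path is whatever file name you choose.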