Hi,
I have seen many different ways of writing the training/validation part of a CNN; here is mine:
model.train()
for e in range(epoch):
    train_sum_loss = 0.0
    validation_sum_loss = 0.0
    # training loop
    for inputs, labels in train_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        batch_loss = loss(outputs.squeeze(), labels)
        batch_loss.backward()
        optimizer.step()
        train_sum_loss += batch_loss.item()
    # validation loop
    model.eval()
    for inputs, labels in validation_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs)
        batch_loss = loss(outputs.squeeze(), labels)
        validation_sum_loss += batch_loss.item()
The results seem good: the network converges and there is no overfitting.
But if I move model.train() to just before the training loop (mirroring the model.eval() before the validation loop), the NN overfits really fast.
Which of these two versions is the correct one? Which results can I trust?
The model.train() call needs to go inside the epoch loop, just before the training loop. If you leave it outside as in your snippet, the model is only in training mode for the first epoch; once model.eval() is called, all subsequent epochs run in evaluation mode. So the fast overfitting you see with the corrected placement reflects the model's true training behavior, and those are the results to work with.
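To make that concrete, here is a minimal sketch of the corrected structure, reusing the names from your snippet:

for e in range(epoch):
    model.train()  # re-enable training mode (dropout, batchnorm active) at every epoch
    for inputs, labels in train_loader:
        ...  # forward pass, loss, backward(), optimizer.step() as in your code
    model.eval()  # switch to evaluation mode before validating
    for inputs, labels in validation_loader:
        ...  # forward pass and loss accumulation only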
I apologize. I just double-checked the documentation. You do not need to use set_grad_enabled(): torch.no_grad() temporarily disables gradient tracking inside its context, so any tensor computed there has requires_grad=False.
import torch

x = torch.randn(3, requires_grad=True)
print(x.requires_grad)             # True
print((x ** 2).requires_grad)      # True: the result is tracked by autograd
with torch.no_grad():
    print((x ** 2).requires_grad)  # False: tracking is disabled in this block
print((x ** 2).requires_grad)      # True: tracking resumes outside the block
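Applied to the validation loop from the original snippet, a sketch of how I would use it (assuming the rest of the setup is unchanged):

model.eval()
with torch.no_grad():  # no autograd graph is built, saving memory and compute
    for inputs, labels in validation_loader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        outputs = model(inputs)
        batch_loss = loss(outputs.squeeze(), labels)
        validation_sum_loss += batch_loss.item()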