Why do we need to call `torch.no_grad()` for validation loop even after calling `loss.backward()`?

A typical PyTorch training loop looks like this:

for epoch in range(n_epochs):

    # Training
    for data in train_dataloader:
        inputs, targets = data
        optimizer.zero_grad()
        output = model(inputs)
        train_loss = criterion(output, targets)
        train_loss.backward()
        optimizer.step()

    # Validation
    with torch.no_grad():
        for inputs, targets in val_dataloader:
            output = model(inputs)
            val_loss = criterion(output, targets)

Since the gradients are only computed when train_loss.backward() is called, why do we still need to wrap the validation part in with torch.no_grad()? Isn't the purpose of torch.no_grad() simply to disable gradient calculation?

Yes, no_grad() disables gradient calculation and reduces memory usage, because the intermediate activations that autograd would otherwise store for the backward pass are never saved in the first place.
Since you never call backward() during the validation phase, no_grad() is an optional optimization: the loop runs correctly without it, but every forward pass would then needlessly build an autograd graph and hold on to those activations until the output goes out of scope.
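
To see what no_grad() actually changes, here is a minimal, self-contained sketch (the tensors and shapes are arbitrary, purely for illustration): outside the context manager the output carries a grad_fn and autograd keeps what it needs for a backward pass; inside it, no graph is built at all.

import torch

x = torch.randn(8, 3, requires_grad=True)
w = torch.randn(3, 2, requires_grad=True)

# Regular forward pass: autograd records the graph and keeps the
# intermediate results it would need for a backward pass.
y = (x @ w).sum()
print(y.requires_grad, y.grad_fn)   # True <SumBackward0 object at ...>

# Inside no_grad the graph is never built, so nothing extra is stored,
# and calling y.backward() here would raise a RuntimeError.
with torch.no_grad():
    y = (x @ w).sum()
print(y.requires_grad, y.grad_fn)   # False None

The validation loop in your example behaves the same way: without no_grad() it still produces the same val_loss, it just wastes memory and a little compute building a graph that is never used.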