That’s because you’re still returning `loss`, which (unless the forward pass ran under `torch.no_grad()`) holds a reference to the whole computation graph so that you can call `loss.backward()` later.
Put everything in a function (including the GPU copies of the inputs and targets) so those references go out of scope when it returns, or drop them explicitly with `del`.
...
def step(inputs, targets):  # was `input`, which shadowed the builtin and didn't match the body
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss, predicted_output = forward_pass(model, inputs, targets, loss_criterion)
    error_ = error(predicted_output.data.cpu(), targets.data.cpu().long().squeeze())
    # .item() detaches from the graph, so nothing graph-holding escapes this function
    running_loss.update(loss.item(), inputs.size(0))
    running_error.update(error_.item(), inputs.size(0))

for i in range(50):
    for idx, (inputs, targets) in enumerate(loaders):
        step(inputs, targets)
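The reason the function scope helps can be shown without torch at all. Here is a minimal sketch using a plain Python object as a stand-in for a tensor: an object referenced only inside a function is freed when the function returns, while a returned reference (like a returned `loss`) keeps it alive until you `del` it. The `Tensor` class and function names below are purely illustrative.

```python
import gc
import weakref

class Tensor:
    """Stand-in for a tensor that holds a large buffer / autograd graph."""
    pass

def step_local():
    t = Tensor()
    return weakref.ref(t)  # only a weak reference escapes the function

def step_returning():
    t = Tensor()
    return t, weakref.ref(t)  # the returned value keeps t alive

ref = step_local()
gc.collect()
print(ref() is None)   # True: freed as soon as the function returned

kept, ref2 = step_returning()
gc.collect()
print(ref2() is None)  # False: the caller still holds a strong reference

del kept               # the explicit `del` from the advice above
gc.collect()
print(ref2() is None)  # True: now it can be freed
```

The same lifetime rule is what frees the GPU copies of `inputs` and `targets` when `step` returns, provided nothing graph-holding (like `loss` itself) is returned.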