Detaching tensors from graph on the fly

Hey, I am not sure what the best practice is here.

Say I have a standard training loop and want to save my predictions to a list (that I will concatenate at the end to compute some prediction metrics). Should my output go through .detach().cpu() at each batch, or is it OK to do it at the very end when I concatenate them?

e.g.:

Version A:

x_list = []
for batch in batches:
    batch = batch.to(device)
    x_out = model(batch)
    loss = criterion(x_out, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # detach and copy to CPU once per batch
    x_list.append(x_out.detach().cpu())
x_list = torch.cat(x_list)

or
Version B:

x_list = []
for batch in batches:
    batch = batch.to(device)
    x_out = model(batch)
    loss = criterion(x_out, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    x_list.append(x_out)
# detach and copy to CPU once, at the very end
x_list = torch.cat(x_list).detach().cpu()

I assume Version A is more memory efficient on the GPU, but might it slow my code down due to the constant per-batch copy operations?

You should .detach() the tensor before appending it to the list, as otherwise the output would keep the computation graph of every batch alive, so the memory usage would grow with each iteration. You don't necessarily need to move the data back to the CPU at every step, as I assume x_out is not huge.
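
A minimal sketch of that middle ground, assuming the same model, criterion, optimizer, and batches as in your snippets, would detach every batch (so the graph reference is dropped right away) but keep the outputs on the GPU and do a single device-to-host copy at the end:

x_list = []
for batch in batches:
    batch = batch.to(device)
    x_out = model(batch)
    loss = criterion(x_out, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # drop the graph reference immediately, but keep the tensor on the GPU
    x_list.append(x_out.detach())
# single device-to-host copy after the loop
x_list = torch.cat(x_list).cpu()

If you want to check the memory behavior yourself, calling torch.cuda.memory_allocated() before and after the loop shows how much memory the stored outputs (and any retained graph) actually take.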