Hi, I’d like to ask how to store CUDA tensors without incurring a GPU-to-CPU transfer at the end of every training step.
The snippet below shows the pattern I want to avoid, since .item() forces a synchronizing device-to-host copy each step:
# We assume that loss_history is a Python list
# and loss is a CUDA tensor of shape [1]
loss_history.append(loss.item())
Does the following implementation avoid the I/O problem?
loss_history.append(loss)
Please advise! Sorry for being a PyTorch noob!
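For context, here is a minimal sketch of what I think the full pattern would look like (the toy loop and tensor names are made up for illustration). My understanding is that appending the raw loss avoids the per-step transfer but keeps each step's autograd graph alive unless detach() is called, so I've included that too:

```python
import torch

# Hypothetical toy setup; falls back to CPU when no GPU is present.
device = "cuda" if torch.cuda.is_available() else "cpu"

loss_history = []
for step in range(3):
    x = torch.randn(4, device=device, requires_grad=True)
    loss = (x ** 2).mean()
    # detach() drops the reference to the computation graph; without it,
    # every stored loss keeps its whole graph alive and memory grows.
    loss_history.append(loss.detach())

# A single device-to-host transfer at the end, instead of one per step.
loss_values = torch.stack(loss_history).cpu().tolist()
print(len(loss_values))
```

Is this the right way to defer the transfer, or is there a more idiomatic approach?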