Unknown source of out of memory


I’m implementing a network that, in its forward pass, does some computation with intermediate variables, which are deleted afterwards using del. However, during the training phase, even with a batch size of 1, I get an out-of-memory error, and I don’t know its cause. I’m guessing it’s related to the gradients, because in the validation phase I don’t get any error.

My code looks like:

C = my_subnetwork1(input_data)
# allocated memory 9 Gb | used memory 0.1 Gb
for i in range(5):
    D = some_arithmetic_computation(C, i)
    C = torch.softmax(D, 0)
    del D

return C

However, if I inspect the allocated memory and the used memory at the end of each loop iteration, the used memory keeps increasing in training mode; it grows like this: 1 Gb -> 1.97 -> 3.02 -> 4.2 Gb.
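To check my guess that something (presumably the autograd graph) is still holding on to the intermediate tensors, here is a minimal pure-Python sketch of what I suspect is happening. The Graph class is a hypothetical stand-in for autograd’s record of saved tensors; del only removes the local name, not the object another reference keeps alive:

```python
class Graph:
    """Hypothetical stand-in for autograd's record of saved tensors."""
    def __init__(self):
        self.saved = []

    def record(self, t):
        self.saved.append(t)

graph = Graph()

def some_op(x):
    out = [v * 2 for v in x]
    graph.record(out)          # the graph keeps a reference to the result
    return out

D = some_op([1.0] * 1000)
del D                          # removes the local name only
# the list itself stays alive, because graph.saved still references it
print(len(graph.saved[0]))     # -> 1000
```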

Can you point me to a direction to solve this?

Thank you in advance!

Depending on the operations used, the intermediate tensors may be needed to calculate the gradients, so you won’t be able to delete them unless you disable gradient calculation, e.g. by wrapping the code in a with torch.no_grad() block.
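A minimal sketch of the validation-style version of your loop, with hypothetical stand-ins (a torch.nn.Linear for my_subnetwork1, C + i for some_arithmetic_computation). Inside torch.no_grad() no autograd graph is recorded, so each intermediate really is freed once its last Python reference goes away:

```python
import torch

# hypothetical stand-in for my_subnetwork1
model = torch.nn.Linear(8, 8)
input_data = torch.randn(8)

# validation / inference: no autograd graph is built, so intermediates
# are released as soon as their last Python reference is dropped
with torch.no_grad():
    C = model(input_data)
    for i in range(5):
        D = C + i                  # placeholder for some_arithmetic_computation
        C = torch.softmax(D, 0)
        del D                      # actually frees D here: no graph holds it

print(C.requires_grad)  # False: no gradients will be computed through C
```

During training you cannot simply delete these tensors, since backward() needs them; there the memory growth across iterations is expected, and the usual fix is to make sure the graph is not kept alive across batches (e.g. by not accumulating loss tensors in a Python list).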