Hi, I received a model from someone else, and I am trying to train it with our flow.
For this model, the most convenient way is to generate a prediction for a single sample at a time. So I am doing something like this:
total_loss = 0
for i in range(len(dataset)):
    pred = model(dataset[i])
    gt = labels[i]
    if i % batch_size == 0:
        total_loss.backward()
        optimizer.step()
        total_loss = 0
    else:
        total_loss += CrossEntropy(pred, gt)
I understand it is not best practice to accumulate the loss this way. My observation is that when I use a batch_size larger than 4, it runs for several epochs and then crashes with a CUDA out-of-memory error.
Is there a way in PyTorch to get around this?
I found some solutions online that suggest calling ".item()" on total_loss, but I think that would detach it from the computation graph and make it non-differentiable.
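For reference, here is a minimal sketch of the gradient-accumulation pattern I have seen suggested (the toy model, data, and batch_size below are placeholders, not my real setup). The idea is to call backward() on each per-sample loss immediately, so only the gradients accumulate rather than the whole graph, and to use .item() only for logging after backward() has already run:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model and data (assumptions for illustration).
torch.manual_seed(0)
model = nn.Linear(8, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
dataset = [torch.randn(1, 8) for _ in range(10)]
labels = [torch.randint(0, 3, (1,)) for _ in range(10)]
batch_size = 4

running_loss = 0.0  # plain Python float, used for logging only
optimizer.zero_grad()
for i in range(len(dataset)):
    pred = model(dataset[i])
    # Scale by batch_size so the accumulated gradient matches a real batch average.
    loss = criterion(pred, labels[i]) / batch_size
    loss.backward()               # frees this sample's graph right away
    running_loss += loss.item()   # .item() is safe here: gradients already computed
    if (i + 1) % batch_size == 0:
        optimizer.step()          # apply the accumulated gradients
        optimizer.zero_grad()
        running_loss = 0.0
```

If I understand correctly, this avoids the OOM because each per-sample graph is released as soon as its backward() runs, instead of all of them being kept alive inside one big total_loss tensor.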