Hi, I'm new to PyTorch.
If I call loss.backward() for every 0.1% of my training set but only call optimizer.step() for every 1% of my training set, what problems could that cause?
Due to the characteristics of my training data, I wrote my code as below:
```python
BATCH_SIZE = ...  # parameter from user

for epoch in range(1, 10001):
    i = 0
    # (..........)
    for ..... in training_generator:
        # (.......)
        loss.backward()
        i += 1
        if i % BATCH_SIZE == 0:
            optimizer.step()
            optimizer.zero_grad()
```
As far as I know, loss.backward() accumulates gradients by summation. Is this kind of gradient accumulation okay? Is there any way to divide the accumulated gradient by BATCH_SIZE before the optimizer step?
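For example, is something like the sketch below a reasonable way to do it? This is just a minimal, self-contained sketch of what I mean; the tiny nn.Linear model, the random data, and the learning rate are made-up placeholders, not my real setup.

```python
import torch
import torch.nn as nn

# Placeholder model, loss, and optimizer just to make the sketch runnable.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
BATCH_SIZE = 10  # number of mini-batches to accumulate before each step

i = 0
for _ in range(100):  # stand-in for looping over training_generator
    x, y = torch.randn(4, 10), torch.randn(4, 1)  # stand-in for my real data
    loss = loss_fn(model(x), y)
    # Scale the loss so that the summed gradients end up being the mean
    # over the BATCH_SIZE accumulated mini-batches.
    (loss / BATCH_SIZE).backward()
    i += 1
    if i % BATCH_SIZE == 0:
        optimizer.step()
        optimizer.zero_grad()
```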
In my case, I use the Adam optimizer.