Hi, I'm new to PyTorch.
If I call `loss.backward()` once for every 0.1% of my training set, but call `optimizer.step()` only once for every 1% of it, what problems could that cause?
Due to the characteristics of my training data, I wrote my code as below:
```python
BATCH_SIZE = ...  # parameter supplied by the user

for epoch in range(1, 10001):
    i = 0
    # (..........)
    for ..... in training_generator:
        # (.......)
        loss.backward()            # gradients accumulate (summed) across mini-batches
        i += 1
        if i % BATCH_SIZE == 0:
            optimizer.step()       # one parameter update per BATCH_SIZE mini-batches
            optimizer.zero_grad()  # reset the accumulated gradients
```
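For reference, here is a minimal self-contained version of the pattern I'm describing; the model, data, loss, and learning rate below are just placeholders, not my real setup:

```python
import torch
import torch.nn as nn

# Placeholder model, loss, and hyperparameters -- not my real setup.
model = nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

BATCH_SIZE = 10  # accumulate gradients over this many mini-batches

# Dummy stand-in for my training_generator: 100 mini-batches of 4 samples each.
training_generator = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(100)]

for epoch in range(1, 3):  # range(1, 10001) in my real code
    i = 0
    for x, y in training_generator:
        loss = criterion(model(x), y)
        loss.backward()            # gradients are summed into each parameter's .grad
        i += 1
        if i % BATCH_SIZE == 0:
            optimizer.step()       # one parameter update per BATCH_SIZE mini-batches
            optimizer.zero_grad()  # clear the accumulated gradients
```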
As far as I know, `loss.backward()` accumulates gradients by summation. Is this kind of gradient accumulation okay? And is there a way to divide the accumulated gradient by BATCH_SIZE?
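For example, would scaling each mini-batch loss before calling backward, as in the sketch below, be equivalent to dividing the accumulated gradient by BATCH_SIZE? (This is my guess, not something I've confirmed.)

```python
# Scaling the loss by 1/BATCH_SIZE scales its gradient contribution too,
# so after BATCH_SIZE backward calls the accumulated gradient is a mean
# over the mini-batches rather than a sum.
(loss / BATCH_SIZE).backward()
```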
In my case, I use the Adam optimizer.