Non-scalar backward and a self-implemented mini-batch loop

My program uses a non-scalar backward pass and a self-implemented mini-batch loop like the one below (because each data sample is huge):

for i in range(len(batch_list)):
    output = network(batch_list[i])            # non-scalar output
    grad = grad_fn(output, teacher[i])         # compute gradient w.r.t. the output
    torch.autograd.backward([output], [grad])  # non-scalar backward

optimizer.step()
optimizer.zero_grad()

It seems my network isn't training correctly, and I think the section I cropped out above is the problem.

(Note that, for simplicity, this isn't the original code, but it captures the essence of my flow.)

Is there anything weird here?
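For reference, here is a self-contained, runnable version of the cropped snippet; the tiny linear network, the squared-error `grad_fn`, and all shapes are hypothetical stand-ins for the pieces that were cropped out:

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear network, three single-sample
# "mini-batches", and a squared-error grad_fn.
network = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(network.parameters(), lr=0.1)
batch_list = [torch.randn(1, 4) for _ in range(3)]
teacher = [torch.randn(1, 2) for _ in range(3)]

def grad_fn(output, target):
    # Gradient of 0.5 * ||output - target||^2 with respect to output.
    return (output - target).detach()

for i in range(len(batch_list)):
    output = network(batch_list[i])            # non-scalar output
    grad = grad_fn(output, teacher[i])         # gradient w.r.t. the output
    torch.autograd.backward([output], [grad])  # accumulates into each .grad

optimizer.step()
optimizer.zero_grad()
```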

With a self-implemented loop like this, how does mini-batching work? I think I should divide the gradient by the batch size.

My question relates to this one.

Quoting SimonW:
All autograd does is calculate the gradient; it has no notion of batching, and I don't see how it could behave differently under different batching mechanisms.

You can divide the grad by the length of the batch. This is needed especially when the batches have different lengths.

If the batches are all the same length, dividing by the batch size can be thought of as being absorbed into the learning rate, in which case it's not needed.
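As a sketch of that point, with toy tensors assumed for illustration: scaling each per-sample output-gradient by `1/n` before the backward call reproduces the gradient of the mean loss over the whole batch.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)
w = torch.zeros(3, requires_grad=True)
n = len(x)

# Per-sample non-scalar backward, scaling each output-gradient by 1/n.
for i in range(n):
    output = x[i] * w                        # non-scalar output
    grad = torch.full_like(output, 1.0 / n)  # divide by the batch length
    torch.autograd.backward([output], [grad])
per_sample_grad = w.grad.clone()

# The same gradient from one backward over the mean of the batch loss.
w.grad = None
loss = (x * w).sum() / n
loss.backward()

print(torch.allclose(per_sample_grad, w.grad))
```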


I appreciate your reply.

In this case,

torch.autograd.backward([output], [grad])

is the grad of each data sample in the mini-batch accumulated?

Thanks

Yes, it is accumulated, as mentioned in the post. The code looks fine.
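To illustrate the accumulation with toy tensors (assumed for illustration): calling `torch.autograd.backward` once per sample leaves the sum of the per-sample gradients in `.grad`, which matches one backward over the stacked batch.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
data = torch.randn(4, 3)

# Per-sample non-scalar backward: gradients accumulate in w.grad.
for i in range(len(data)):
    output = data[i] * w            # non-scalar output
    grad = torch.ones_like(output)  # stand-in for grad_fn(...)
    torch.autograd.backward([output], [grad])
accumulated = w.grad.clone()

# One backward over the whole stacked batch gives the same result.
w.grad = None
output = data * w
torch.autograd.backward([output], [torch.ones_like(output)])

print(torch.allclose(accumulated, w.grad))
```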


Thank you 🙂