Graph attention networks normally do not support batched input, so I would like to know whether I can implement stochastic gradient descent by feeding one sample at a time, accumulating the loss, and finally dividing the loss by a batch_size that I define myself. Does this achieve the same result as feeding the data as a batch?
It should be like this:
### method1
```python
batch_size = 64
loss_batch = 0
for i in range(batch_size):
    output = model(data)   # data.shape == (224, 224, 3)
    loss = ...             # calculate the loss for this output
    loss_batch = loss_batch + loss
loss_batch = loss_batch / batch_size
loss_batch.backward()
```
### method2
```python
output = model(data_batch)   # data_batch.shape == (batch_size, 224, 224, 3)
loss_batch = ...             # calculate the loss for the batched output
loss_batch.backward()
```
Yes, though your method1 is suboptimal: it is better to call loss.backward() inside the loop and accumulate the gradients, because each backward call releases that iteration's backward graph instead of keeping all of them in memory until the end. That said, you should prefer method2 unless some of the operations you need are not implemented for batched input.
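A minimal sketch of the accumulation variant described above, assuming a generic PyTorch model, a loss criterion, an optimizer named opt, and a list of (data, target) pairs named samples (all placeholder names, not from the original post):

```python
import torch
from torch import nn

def train_virtual_batch(model: nn.Module,
                        criterion: nn.Module,
                        opt: torch.optim.Optimizer,
                        samples) -> None:
    """Accumulate gradients over `samples`, then apply a single optimizer step.

    Hypothetical helper: calling loss.backward() once per sample frees that
    sample's autograd graph, so memory stays flat no matter how many samples
    make up the virtual batch.
    """
    batch_size = len(samples)
    opt.zero_grad()
    for data, target in samples:
        output = model(data)                  # forward pass on one sample
        loss = criterion(output, target)      # per-sample loss
        (loss / batch_size).backward()        # scale, then free this sample's graph
    opt.step()                                # one update for the whole virtual batch
```

Dividing each per-sample loss by batch_size before backward keeps the accumulated gradient equal to the gradient of the mean loss, which is what method1 computes by dividing at the end.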
Thanks for your reply.
I still have one question though.
If I accumulate the gradients with loss.backward() n times and then update the parameters with a single opt.step(), is that equivalent to n rounds of loss.backward() and opt.step() with batch_size = 1, or to one loss.backward() and opt.step() with batch_size = n?
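For reference, the two update schedules being compared might look like this (a sketch with placeholder names model, criterion, opt, and samples, not from the original thread):

```python
import torch
from torch import nn

def accumulate_then_step(model: nn.Module, criterion: nn.Module,
                         opt: torch.optim.Optimizer, samples) -> None:
    """n calls to loss.backward(), then a single opt.step()."""
    opt.zero_grad()
    for data, target in samples:              # n individual samples
        loss = criterion(model(data), target)
        (loss / len(samples)).backward()      # gradients add up across samples
    opt.step()                                # one update based on all n samples

def step_every_sample(model: nn.Module, criterion: nn.Module,
                      opt: torch.optim.Optimizer, samples) -> None:
    """n separate backward/step pairs, i.e. batch_size = 1 updates."""
    for data, target in samples:
        opt.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        opt.step()                            # parameters change before the next sample
```

The first variant applies one update computed from all n samples, while the second updates the parameters between samples.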