Batch gradient descent (Vanilla)


I would like to use batch gradient descent (BGD), and I am not sure I understand how to use it in PyTorch (yes, I already searched this forum, but I still do not understand).
The SGD implementation performs a single step, and the user has to select the data points randomly. So is it correct to say that BGD is minibatch SGD with batch_size equal to the number of data points?
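As a sanity check on that equivalence, here is a tiny plain-Python sketch (toy 1-D least-squares data, not from this thread) showing that the gradient of the batch-averaged loss equals the average of the per-sample gradients, which is exactly what one optimizer step on a full-size batch uses:

```python
# Toy data and model y_hat = w * x with squared-error loss (all hypothetical).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5
N = len(xs)

def mean_loss(w):
    # Loss averaged over the whole "batch" (all N points).
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / N

def sample_grad(w, x, y):
    # Per-sample gradient: d/dw (w*x - y)^2 = 2*x*(w*x - y)
    return 2.0 * x * (w * x - y)

# Average of per-sample gradients = what minibatch SGD with batch_size == N uses.
batch_grad = sum(sample_grad(w, x, y) for x, y in zip(xs, ys)) / N

# Numerical derivative of the averaged loss agrees with it.
eps = 1e-6
numeric = (mean_loss(w + eps) - mean_loss(w - eps)) / (2 * eps)
print(abs(batch_grad - numeric) < 1e-6)
```

So a single step on the full dataset is one batch-gradient-descent step; what the loss averages or sums over is what the gradient averages or sums over.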

If that is true, I have the following implementation in PyTorch:

trainloader = DataLoader(src_data, batch_size=len(src_data))
optimizer   = SGD(model.parameters(), lr=0.1)
criterion   = MSELoss(reduction='sum')
for epoch in range(num_epochs):
   for inputs, labels in trainloader:
      optimizer.zero_grad()
      outputs = model(inputs)
      loss    = criterion(outputs, labels)
      loss.backward()
      optimizer.step()

but the results are quite bad (especially with the same learning rate used for SGD).
If my implementation is correct, I guess that BGD misses the/a local optimum and overshoots it.

Even though the definition of SGD applies to a single data point, in PyTorch, as you said, SGD with batch_size > 1 will behave like BGD.

Your code is, I think, okay. However, could you check the optimizer part of your code? It seems model.parameters() is missing in the SGD initialization. Please also try reducing the lr to see the result; lr=1e-3 would be a good initial try.
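The learning-rate suggestion is easy to see on a toy full-batch problem. This plain-Python sketch (made-up 1-D data, hypothetical names, not the poster's model) runs batch gradient descent with two step sizes; the large one overshoots the optimum on every step and diverges, while the small one converges:

```python
# Toy least-squares problem: y = 2 * x exactly, so the optimum is w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
N = len(xs)

def run(lr, steps=50):
    """Full-batch gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / N
        w -= lr * grad
    return w

print(run(0.2))   # step too large for this problem's curvature: diverges
print(run(0.01))  # converges close to w = 2
```

With reduction='sum' the effect is amplified further, since the loss (and hence the gradient) scales with the batch size, so a learning rate tuned for small minibatches can easily be too large for a full-size batch.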

Thank you!
I forgot it in this code snippet (I just edited it), but it is not missing in my actual code.

I will try lower lr :wink: