Hi,

I would like to use batch gradient descent (BGD), but I am not sure I understand how to use it in PyTorch (yes, I already searched this forum, but I still do not understand).

The SGD implementation performs a single update step, and the user has to select the data points randomly. So is it correct to say that BGD is mini-batch SGD with batch_size equal to the number of data points?
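
To make the question concrete, here is a minimal pure-Python sketch (no PyTorch) of what I mean. It assumes a toy 1-D model `y = w * x` with a squared-error loss; the data values and the helper name `grad_sum` are made up for illustration. A "mini-batch" that contains every data point should yield the full-batch gradient, which is the sum of the per-sample gradients.

```python
# Toy data for y = w * x (hypothetical values, for illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def grad_sum(w, batch):
    """Gradient of sum((w*x - y)**2) over the batch, with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)

w = 0.0
data = list(zip(xs, ys))

# SGD-style: one gradient per individual data point.
per_sample = [grad_sum(w, [pair]) for pair in data]

# A "mini-batch" whose size equals the dataset size: batch gradient descent.
full_batch = grad_sum(w, data)

print(per_sample)  # [-4.0, -16.0, -36.0, -64.0]
print(full_batch)  # -120.0
```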

If that is true, I have the following implementation in PyTorch:

```
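import torch
from torch.nn import MSELoss
from torch.optim import SGD

# One batch containing the whole dataset, so each step uses the full gradient
trainloader = torch.utils.data.DataLoader(src_data, batch_size=len(src_data))
optimizer = SGD(model.parameters(), lr=0.1)
criterion = MSELoss(reduction='sum')  # loss is summed over all data points
for epoch in range(num_epochs):  # num_epochs: number of passes, defined elsewhere
    for inputs, labels in trainloader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()  # a single parameter update per epoch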
```

but the results are quite bad (especially with the same learning rate I used for SGD).

If my implementation is correct, I guess that BGD overshoots the (or a) local optimum and goes past it.
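
One thing I could check is whether `reduction='sum'` plays a role here: a summed loss gives a gradient that grows with the number of data points, so with the whole dataset in one batch the same `lr` may take a much larger step than it did for SGD. Below is a minimal sketch of that check. It assumes a toy linear model and random data (both hypothetical stand-ins, not my real `model` and `src_data`), and the helper name `weight_grad` is mine.

```python
import torch
from torch.nn import Linear, MSELoss

torch.manual_seed(0)
n = 100                          # hypothetical dataset size
inputs = torch.randn(n, 3)       # random stand-in for the real inputs
labels = torch.randn(n, 1)       # random stand-in for the real labels
model = Linear(3, 1)             # toy model, one output per sample

def weight_grad(reduction):
    """Full-batch gradient of the MSE loss w.r.t. the weights for a given reduction."""
    model.zero_grad()
    loss = MSELoss(reduction=reduction)(model(inputs), labels)
    loss.backward()
    return model.weight.grad.clone()

grad_sum = weight_grad('sum')
grad_mean = weight_grad('mean')

# With one output per sample, the summed-loss gradient is n times the mean-loss one.
ratio = (grad_sum.norm() / grad_mean.norm()).item()
print(ratio)  # approximately n
```

If the ratio comes out close to `n`, then `lr=0.1` with the summed loss corresponds to a far larger effective step than the one I used for SGD, and either `reduction='mean'` or a learning rate divided by `n` might be the fairer comparison.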