When training neural-network models in PyTorch, does it matter where we place the call to `backward()`? For example, which of the two variants below is correct?

Calculating the gradient per batch:

```
for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss.item()
        nn_model.zero_grad()
        loss.backward()
        optimizer.step()
    loss_list.append(loss_sum / num_train_obs)
```

Calculating the gradient once per epoch:

```
for e in range(epochs):
    loss_sum = 0.0
    for i in batches_list:
        out = nn_model(i)
        loss = loss_function(out, actual)
        loss_sum += loss  # accumulate the tensor (not .item()) so the graph is kept
    nn_model.zero_grad()
    loss_sum.backward()
    optimizer.step()
    loss_list.append(loss_sum.item() / num_train_obs)
```
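For reference, here is a minimal self-contained sketch of the first (per-batch) variant. The toy linear model, random data, and names like `model` and `batches` are my own stand-ins for `nn_model`, `batches_list`, and `actual` from the snippets above, not part of the original question:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup standing in for the question's model and data.
model = nn.Linear(4, 1)
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(5)]
epochs = 3
num_train_obs = sum(x.shape[0] for x, _ in batches)

loss_list = []
for e in range(epochs):
    loss_sum = 0.0
    for x, y in batches:
        out = model(x)
        loss = loss_function(out, y)
        loss_sum += loss.item() * x.shape[0]  # running total for logging only
        optimizer.zero_grad()  # clear gradients left over from the previous batch
        loss.backward()        # gradients for this batch only
        optimizer.step()       # parameter update after every batch
    loss_list.append(loss_sum / num_train_obs)

print(loss_list)
```

Note that `.item()` detaches the value from the graph, so the running `loss_sum` here is only for logging; `backward()` is always called on the batch `loss` tensor itself.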