I am implementing a dependency parsing model in PyTorch and am a bit confused about the situation explained below.

When calculating the loss and backpropagating through the model, I tried several approaches.

- When I use the code below exactly, with batch size 1 (one batch per iteration):
- The loss appears to decrease, but the predictions do not improve even after 20 epochs.

- When I use the code below exactly, with batch size 100 (or 1000):
- I get an error:
`RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.`

- As the error message suggests, when I use `inside_loss.backward(retain_graph=True)` instead of `inside_loss.backward()`, execution takes far too long, and the loss both **increases** and **decreases** randomly.

- When I comment out the `inside_loss` lines and uncomment the lines after the for loop:
- The loss does not change at all.
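To make sure I understand the error in the second case, here is a minimal standalone snippet (not my parser code) that I believe reproduces it: two losses share part of a graph, and I think every `inside_loss` in my loop similarly shares the graph rooted at `x`:

```
import torch

# Two losses that share a piece of the graph. The first backward() frees
# the saved buffers of the shared node, so the second backward() raises
# the same RuntimeError as above.
w = torch.randn(3, requires_grad=True)
shared = (w * w).sum()   # this node saves tensors for its backward pass
loss1 = shared * 2.0
loss2 = shared * 3.0
loss1.backward()         # frees the buffers saved by `shared`
try:
    loss2.backward()     # walks the freed part of the graph again
except RuntimeError as err:
    print(err)           # "Trying to backward through the graph a second time ..."
```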

The code is here:

```
def forward(self, x):
    # x holds the scores for my vocab. It has grad, so I cannot change
    # its values in place; take a clone instead.
    x_prime = x.clone()
    # loss = Variable(torch.zeros(1), requires_grad=True)
    loss_v = torch.zeros(1)
    for i in range(x_prime.size(0)):
        # Some operations that change x_prime's values
        # Calculate sentence_probs and sentence_scores from x_prime's values
        eisner_values = eisner_torch(sentence_probs)
        # x_prime was changed above, so update x
        x = x_prime
        # Get the gold dependencies
        gold_deps = return_gold_deps(i, sentence_to_dependencies)
        if gold_deps is None:
            continue
        mask = np.greater(np.asarray(eisner_values), -1)
        # Calculate hinge loss
        inside_loss = hinge(sentence_scores, eisner_values, gold_deps, mask, 1)
        # Accumulate the total loss in the batch
        loss_v += inside_loss.data
        inside_loss.backward()
        # Optimizer step
        if self.opt is not None:
            self.opt.step()
            self.opt.optimizer.zero_grad()
    # loss.data = loss_v.data
    # loss.backward()
    # Optimizer step
    return loss_v
```
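For reference, here is the pattern I think I should be following instead (a toy linear model, not my actual parser): accumulate the loss as a graph-connected tensor, call `backward()` once per batch, and step/zero the optimizer outside the inner loop. Is this the right direction?

```
import torch

# Toy example (not my parser): accumulate the loss as a tensor (no .data),
# then one backward() and one optimizer step per batch.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

batch = [torch.randn(4) for _ in range(8)]
targets = [torch.randn(1) for _ in range(8)]

optimizer.zero_grad()
total_loss = torch.zeros(1)
for x, y in zip(batch, targets):
    loss = (model(x) - y).pow(2).sum()  # stays in the graph
    total_loss = total_loss + loss      # tensor accumulation, no .data
total_loss.backward()                   # single backward per batch
optimizer.step()
print(total_loss.item())
```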

I use the Adam optimizer for this task:

```
model_opt = NoamOpt(model_size=d_model, factor=1, warmup=200,
                    optimizer=torch.optim.Adam(model.parameters(), lr=0,
                                               betas=(0.9, 0.98), eps=1e-9))
```

What are the problems here?

How can I solve them?

Thanks in advance.