Hi Nil!

Presumably elements of your `gradient` array share parts of the same *computation graph*. That is, the computations of `gradient[0]` and, for example, `gradient[1]` partially overlap. Calling `gradient[0].backward()` deletes `gradient[0]`’s computation graph, including any parts of it that are shared by `gradient[1]`’s computation graph. So when you then call `gradient[1].backward()`, parts of `gradient[1]`’s computation graph have already been deleted, leading to the error you report.
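Here is a minimal sketch of that failure mode (all names here are hypothetical stand-ins for your `gradient` entries):

```
import torch

# two losses that share part of one computation graph through the
# intermediate tensor y
x = torch.ones(3, requires_grad=True)
y = x.exp()              # shared intermediate result
loss0 = y.sum()          # plays the role of gradient[0]
loss1 = (y * y).sum()    # plays the role of gradient[1]

loss0.backward()         # frees the graph, including the shared part
try:
    loss1.backward()     # needs the already-freed shared part
except RuntimeError as err:
    print("RuntimeError:", err)
```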

`retain_graph = True` tells autograd not to delete the computation graph, so you could do something like:

```
optim.zero_grad()
for i, grad in enumerate(gradient):  # gradient is a 50 dimensional array
    loss = grad
    if i < 49:
        loss.backward(retain_graph=True)
    else:
        loss.backward()
optim.step()
```

(The final call to `loss.backward()` does not have `retain_graph = True` because you do need to delete the computation graph at some point, typically before calling `optim.step()` and / or performing the next forward pass.)

This is a perfectly reasonable way to use autograd and `.backward()`.

However, it’s likely to be inefficient, because you repeat (the shared part of) the backward pass fifty times.

`loss.backward()` computes the gradient of `loss` with respect to the parameters on which `loss` depends and accumulates that gradient into those parameters’ `.grad` properties. But computing the gradient is a linear operation (so that `grad_of (a + b) = grad_of (a) + grad_of (b)`).
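You can check this linearity directly with a small (hypothetical) example: backpropagating two losses separately accumulates the same `.grad` as backpropagating their sum.

```
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# two backward passes, one per loss (they share the graph through y)
y = x.exp()
a = y.sum()
b = (2 * y).sum()
a.backward(retain_graph=True)   # keep the shared graph alive for b
b.backward()
grad_separate = x.grad.clone()

# one backward pass on the sum of the two losses
x.grad = None
y = x.exp()
loss_total = y.sum() + (2 * y).sum()
loss_total.backward()

print(torch.allclose(grad_separate, x.grad))  # True
```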

So you are likely better off with:

```
optim.zero_grad()
loss_total = 0
for grad in gradient:
    loss_total = loss_total + grad
loss_total.backward()
optim.step()
```

This only performs a single backward pass (rather than fifty) and, up to numerical round-off error, computes the same final gradient (as stored in the various parameters’ `.grad` properties) as does the version that called `.backward()` fifty times.

As an aside, you will probably also achieve additional efficiency (and code cleanliness) if you can arrange your computation so that `gradient` is a single one-dimensional pytorch tensor of length fifty that is computed all at once with pytorch tensor operations rather than an array of fifty length-one pytorch tensors that is computed entry by entry.
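For example (a hypothetical sketch in which the fifty entries are a simple function of a parameter tensor `params` – your real per-entry computation will differ):

```
import torch

params = torch.randn(10, requires_grad=True)
optim = torch.optim.SGD([params], lr=0.1)

# compute all fifty entries at once as a single 1-d tensor ...
weights = torch.arange(1.0, 51.0)   # stand-in for the real per-entry computation
gradient = weights * params.sum()   # shape (50,), one computation graph

# ... and backpropagate their sum in a single backward pass
optim.zero_grad()
gradient.sum().backward()
optim.step()
```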

Best.

K. Frank