Maybe I’m doing something wrong here, but using gradient clipping like

```
# Clip the total gradient norm, then apply a manual SGD update
nn.utils.clip_grad_norm_(model.parameters(), clip)
for p in model.parameters():
    p.data.add_(p.grad.data, alpha=-lr)
```

makes my network train much slower than with `optimizer.step()`.

Here’s what it looks like with gradient clipping, with `clip=5`:

```
Epoch: 1/10... Step: 10... Loss: 4.4288
Epoch: 1/10... Step: 20... Loss: 4.4274
Epoch: 1/10... Step: 30... Loss: 4.4259
Epoch: 1/10... Step: 40... Loss: 4.4250
Epoch: 1/10... Step: 50... Loss: 4.4237
Epoch: 1/10... Step: 60... Loss: 4.4223
Epoch: 1/10... Step: 70... Loss: 4.4209
Epoch: 1/10... Step: 80... Loss: 4.4193
Epoch: 1/10... Step: 90... Loss: 4.4188
Epoch: 1/10... Step: 100... Loss: 4.4174
```

And without gradient clipping, everything else equal:

```
Epoch: 1/10... Step: 10... Loss: 3.2837
Epoch: 1/10... Step: 20... Loss: 3.1901
Epoch: 1/10... Step: 30... Loss: 3.1512
Epoch: 1/10... Step: 40... Loss: 3.1296
Epoch: 1/10... Step: 50... Loss: 3.1170
Epoch: 1/10... Step: 60... Loss: 3.0758
Epoch: 1/10... Step: 70... Loss: 2.9787
Epoch: 1/10... Step: 80... Loss: 2.9104
Epoch: 1/10... Step: 90... Loss: 2.8271
Epoch: 1/10... Step: 100... Loss: 2.6813
```

There is probably something I don’t understand, but I’m just switching out those two bits of code.
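
To make the comparison concrete, here is a minimal sketch of the two variants side by side on a toy problem. Everything in it is a stand-in rather than my actual setup: the `nn.Linear` model, the random data, the `lr` value, and in particular the choice of plain `torch.optim.SGD` without momentum are all assumptions. Under those assumptions, clipping followed by the manual update and clipping followed by `optimizer.step()` should produce the same parameters; with a different optimizer (Adam, for example) the two updates are not equivalent, so the loss curves would not be directly comparable.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny model and random data, not the real network.
model_manual = nn.Linear(4, 2)
model_optim = copy.deepcopy(model_manual)  # identical starting weights
x, y = torch.randn(8, 4), torch.randn(8, 2)
lr, clip = 0.1, 5.0

# Variant A: clip the gradient norm, then apply a hand-written SGD update.
loss = nn.functional.mse_loss(model_manual(x), y)
loss.backward()
nn.utils.clip_grad_norm_(model_manual.parameters(), clip)
with torch.no_grad():
    for p in model_manual.parameters():
        p.add_(p.grad, alpha=-lr)

# Variant B: clip the gradient norm, then call optimizer.step().
# Assumes plain SGD with no momentum or weight decay.
optimizer = torch.optim.SGD(model_optim.parameters(), lr=lr)
optimizer.zero_grad()
loss = nn.functional.mse_loss(model_optim(x), y)
loss.backward()
nn.utils.clip_grad_norm_(model_optim.parameters(), clip)
optimizer.step()

# Compare the resulting parameters of the two variants.
same = all(
    torch.allclose(a, b)
    for a, b in zip(model_manual.parameters(), model_optim.parameters())
)
print(same)
```

If the two variants agree on a toy case like this but diverge in the real training loop, the difference more likely comes from the optimizer or learning rate than from the clipping call itself.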