I have a sequence-to-sequence model built with `nn.Transformer`, and in my training loop I have:

```
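for i in range(x.size(1)):
    # reset gradients so they do not accumulate across iterations
    optimizer.zero_grad()
    # feed the decoder the first i + 1 target steps only
    pred = model(x, tgt[:, 0:i + 1, :])
    loss = loss_fn(pred, ground_truth[:, 0:i + 1, :])
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
    cnt += 1
return total_loss / cnt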
```

where `x.size()` is `[batch, seq, features]`. So the target slice I pass to the model grows by one step in each iteration, but of course this takes a LONG time. Can this be done in parallel somehow?
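In case it helps to reproduce the problem, here is a minimal self-contained sketch of the same loop. The tensor sizes, `nn.MSELoss`, the SGD optimizer, the small layer counts, and `batch_first=True` are placeholders chosen for illustration (they are assumptions, not part of the snippet above), and `train_one_batch` is a hypothetical wrapper name.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, kept tiny so the example runs quickly.
batch, seq, features = 2, 5, 8

# batch_first=True matches the [batch, seq, features] layout (assumption).
model = nn.Transformer(
    d_model=features,
    nhead=2,
    num_encoder_layers=1,
    num_decoder_layers=1,
    dim_feedforward=16,
    dropout=0.0,
    batch_first=True,
)
loss_fn = nn.MSELoss()  # assumed loss for continuous features
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer

# Random stand-ins for the real data.
x = torch.randn(batch, seq, features)
tgt = torch.randn(batch, seq, features)
ground_truth = torch.randn(batch, seq, features)


def train_one_batch(x, tgt, ground_truth):
    """Run the growing-prefix loop: one forward/backward pass per prefix length."""
    total_loss, cnt = 0.0, 0
    for i in range(x.size(1)):
        optimizer.zero_grad()  # added so gradients do not accumulate
        pred = model(x, tgt[:, 0:i + 1, :])  # decoder sees i + 1 target steps
        loss = loss_fn(pred, ground_truth[:, 0:i + 1, :])
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        cnt += 1
    return total_loss / cnt, cnt


avg_loss, steps = train_one_batch(x, tgt, ground_truth)
print(f"forward/backward passes for one batch: {steps}")
```

The loop performs one full forward and backward pass per prefix length, so a single batch costs `seq` passes, which is where the time goes.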