# With a Transformer, can teacher forcing be done in parallel?

I have a sequence-to-sequence model built with `nn.Transformer`, and in my training loop I have:

```python
for i in range(x.size(1)):
    optimizer.zero_grad()  # clear gradients from the previous step
    pred = model(x, tgt[:, 0:i + 1, :])
    loss = loss_fn(pred, ground_truth[:, 0:i + 1, :])
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
    cnt += 1
```
where `x.size()` is `[batch, seq, features]`. So the target sequence grows by one step on each iteration, but of course this takes a LONG time. Can this be done in parallel somehow?
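
In case it helps, here is a minimal sketch of what I imagine the parallel version might look like. The shapes are toy values, and I am assuming the model accepts a `tgt_mask` the way `nn.Transformer.forward` does. Is a causal mask like this the right direction?

```python
import torch
import torch.nn as nn

# Toy setup mirroring the shapes in the question (all sizes hypothetical)
batch, seq, features = 8, 16, 32
x = torch.randn(batch, seq, features)             # source: [batch, seq, features]
tgt = torch.randn(batch, seq, features)           # teacher-forced decoder input
ground_truth = torch.randn(batch, seq, features)  # expected decoder output

model = nn.Transformer(d_model=features, nhead=4, batch_first=True)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# Causal mask: position i can only attend to positions <= i, which should
# reproduce the effect of the incremental loop in a single forward pass
tgt_mask = nn.Transformer.generate_square_subsequent_mask(seq)

optimizer.zero_grad()
pred = model(x, tgt, tgt_mask=tgt_mask)  # one forward pass over the full sequence
loss = loss_fn(pred, ground_truth)
loss.backward()
optimizer.step()
```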