I have a sequence-to-sequence model built with
nn.Transformer, and in my training loop I have:
    for i in range(x.size(1)):
        pred = model(x, tgt[:, 0:i + 1, :])
        loss = loss_fn(pred, ground_truth[:, 0:i + 1, :])
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        cnt += 1
    return total_loss / cnt
The tensors are shaped [batch, seq, features], so the target sequence fed to the decoder grows by one step in each iteration. Of course, this takes a LONG time. Can this be done in parallel somehow?
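For reference, one common way to get the same per-prefix predictions in a single forward pass is to feed the whole target at once with a causal (look-ahead) mask, so position i can only attend to positions 0..i. The following is a minimal sketch under assumptions, not a drop-in fix: the sizes, the MSELoss, the Adam optimizer, and the helper name train_step are all hypothetical, and it assumes tgt is already the decoder input (typically the ground truth shifted right by one step).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only to keep the sketch small.
batch, src_len, tgt_len, d_model = 2, 5, 4, 16

# batch_first=True matches the [batch, seq, features] layout from the question.
# dropout=0.0 is an assumption made so the outputs are deterministic.
model = nn.Transformer(
    d_model=d_model, nhead=4,
    num_encoder_layers=1, num_decoder_layers=1,
    dim_feedforward=32, dropout=0.0, batch_first=True,
)
loss_fn = nn.MSELoss()  # assumed loss; the original loss_fn is not shown
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in data with the same roles as x, tgt, ground_truth in the question.
x = torch.randn(batch, src_len, d_model)
tgt = torch.randn(batch, tgt_len, d_model)
ground_truth = torch.randn(batch, tgt_len, d_model)

# Causal mask: -inf above the diagonal blocks attention to future positions,
# 0 on and below the diagonal leaves past and current positions visible.
causal_mask = torch.triu(
    torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1
)

def train_step(x, tgt, ground_truth):
    """One optimizer step over all target positions at once."""
    optimizer.zero_grad()
    pred = model(x, tgt, tgt_mask=causal_mask)  # [batch, tgt_len, d_model]
    loss = loss_fn(pred, ground_truth)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With the mask in place, the output at position i depends only on tgt[:, 0:i + 1, :], which is the same information the loop version gives the decoder at step i, so one call to train_step(x, tgt, ground_truth) covers every prefix. Note that this takes one optimizer step per batch rather than one per prefix, so the loss values will not match the loop version exactly.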