# With a Transformer, can teacher forcing be done in parallel?

I have a sequence-to-sequence model built with `nn.Transformer`, and in my training loop I have:

```python
for i in range(x.size(1)):
    optimizer.zero_grad()  # clear gradients from the previous step
    pred = model(x, tgt[:, 0:i + 1, :])
    loss = loss_fn(pred, ground_truth[:, 0:i + 1, :])
    loss.backward()
    optimizer.step()
    total_loss += loss.item()
    cnt += 1
```
where `x.size()` is `[batch, seq, features]`. So the target sequence grows by one step on each iteration, but of course this takes a LONG time. Can this be done in parallel somehow?
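
In case it helps, here is a minimal sketch of what I imagine the parallel version might look like. The shapes are toy values, and I am assuming the model accepts a `tgt_mask` the way `nn.Transformer.forward` does. Is a causal mask like this the right direction?

```python
import torch
import torch.nn as nn

# Toy setup mirroring the shapes in the question (all sizes hypothetical)
batch, seq, features = 8, 16, 32
x = torch.randn(batch, seq, features)             # source: [batch, seq, features]
tgt = torch.randn(batch, seq, features)           # teacher-forced decoder input
ground_truth = torch.randn(batch, seq, features)  # expected decoder output

model = nn.Transformer(d_model=features, nhead=4, batch_first=True)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

# Causal mask: position i can only attend to positions <= i, which should
# reproduce the effect of the incremental loop in a single forward pass
tgt_mask = nn.Transformer.generate_square_subsequent_mask(seq)

optimizer.zero_grad()
pred = model(x, tgt, tgt_mask=tgt_mask)  # one forward pass over the full sequence
loss = loss_fn(pred, ground_truth)
loss.backward()
optimizer.step()
```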