Answered my own question on this thread.
The code I replaced it with looks like this:
# Model requires both "inputs" and "targets"
for i in range(2, targets.size(1)):
    opt.zero_grad()
    trimmed_tgt = targets[:, :i].contiguous()
    in_tgt = trimmed_tgt[:, :-1]
    exp_tgt = trimmed_tgt[:, 1:]
    # The decoder input length changes every iteration, so the causal mask
    # (and the tgt padding mask) must be regenerated/sliced to match it
    tgt_mask = nn.Transformer.generate_square_subsequent_mask(in_tgt.size(1))
    # Some code missing here, assume in_tgt gets converted to in_tgt_emb
    out = model(inp_emb, in_tgt_emb, tgt_mask=tgt_mask,
                src_key_padding_mask=inp_padding_mask,
                tgt_key_padding_mask=tgt_padding_mask)
    # Note: if criterion is CrossEntropyLoss, the logits and targets need
    # flattening, e.g. criterion(out.reshape(-1, out.size(-1)), exp_tgt.reshape(-1))
    loss = criterion(out, exp_tgt)
    loss.backward()
    opt.step()
    sch.step()
I'm not sure whether I'm supposed to accumulate the losses in the loop or not, but this seems to be giving more realistic results than what I was getting before.
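For what it's worth, with a causal tgt_mask the per-prefix loop usually isn't needed at all: one forward pass over the full shifted target scores every prefix at once, so there is no per-step loss to accumulate. Below is a minimal, self-contained sketch of that single-pass teacher-forcing setup; all the names, sizes, and the tiny model here are illustrative assumptions, not your actual model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy sizes, purely for illustration
vocab, d_model, batch, src_len, tgt_len = 20, 16, 4, 7, 6

embed = nn.Embedding(vocab, d_model)
model = nn.Transformer(d_model=d_model, nhead=4,
                       num_encoder_layers=1, num_decoder_layers=1,
                       dim_feedforward=32, batch_first=True)
to_vocab = nn.Linear(d_model, vocab)          # assumed output projection
criterion = nn.CrossEntropyLoss()
opt = torch.optim.Adam(
    list(model.parameters()) + list(embed.parameters()) + list(to_vocab.parameters()),
    lr=1e-3)

src = torch.randint(0, vocab, (batch, src_len))
targets = torch.randint(0, vocab, (batch, tgt_len))

# Shift once: decoder sees targets[:-1], is trained to predict targets[1:]
in_tgt, exp_tgt = targets[:, :-1], targets[:, 1:]
# Causal mask prevents each position from attending to later positions,
# so this one pass is equivalent to training on every prefix
tgt_mask = nn.Transformer.generate_square_subsequent_mask(in_tgt.size(1))

opt.zero_grad()
out = model(embed(src), embed(in_tgt), tgt_mask=tgt_mask)
logits = to_vocab(out)                        # (batch, tgt_len - 1, vocab)
loss = criterion(logits.reshape(-1, vocab), exp_tgt.reshape(-1))
loss.backward()
opt.step()
```

The loss is averaged over all positions in one backward pass, which is the usual way to train an encoder-decoder transformer with teacher forcing.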