Question about dealing with the pad token when computing the loss

Good morning.

I’m training an image captioning model and was wondering whether there is any difference between the following two pieces of code.

import torch
import torch.nn as nn

# Approach 1: index 0 is the pad token; let the loss ignore it
criterion = nn.CrossEntropyLoss(ignore_index=0)

# compute loss
loss = criterion(pred, target)
loss.backward()
optimizer.step()
# Approach 2: mask the pad positions manually
# (reduction='none' keeps the per-token losses so the mask can be applied;
# with the default reduction='mean' the loss is already a scalar and
# multiplying it by a mask would be wrong)
criterion = nn.CrossEntropyLoss(reduction='none')

# compute loss, then zero out positions where target is the pad token
pad_mask = torch.ne(target, 0).float()
loss = criterion(pred, target)
loss = (loss * pad_mask).sum() / pad_mask.sum()
loss.backward()
optimizer.step()
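For reference, here is a minimal self-contained sketch of the comparison I mean, using random logits and a made-up target sequence (the `pred`, `target`, and vocabulary size are just placeholders, with index 0 as the pad token):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size = 10
pred = torch.randn(6, vocab_size)          # logits for 6 token positions
target = torch.tensor([3, 0, 5, 0, 1, 2])  # 0 = pad token

# Approach 1: CrossEntropyLoss skips pad positions and averages
# over the remaining (non-ignored) targets
loss_a = nn.CrossEntropyLoss(ignore_index=0)(pred, target)

# Approach 2: keep per-token losses, mask pads, average over non-pad tokens
per_token = nn.CrossEntropyLoss(reduction='none')(pred, target)
mask = torch.ne(target, 0).float()
loss_b = (per_token * mask).sum() / mask.sum()

print(torch.allclose(loss_a, loss_b))
```

If I understand the docs correctly, the two should match as long as the manual version also divides by the number of non-pad tokens rather than the full sequence length.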

Thanks