I am working on an image captioning task with PyTorch.
In seq2seq models, padding is used to handle variable-length sequences.
Additionally, a mask is multiplied by the calculated loss (a vector, not a scalar) so that the padded positions do not affect the loss.
In TensorFlow, I can do this as below.

# targets is an int64 tensor of shape (batch_size, padded_length) which contains word indices.
# masks is a tensor of shape (batch_size, padded_length) which contains 0 or 1 (0 if pad, otherwise 1).
outputs = decoder(...)  # unnormalized scores of shape (batch_size, padded_length, vocab_size)
outputs = tf.reshape(outputs, (-1, vocab_size))
targets = tf.reshape(targets, [-1])
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=outputs, labels=targets)  # loss of shape (batch_size*padded_length,)
masks = tf.reshape(masks, [-1])
loss = losses * masks
In PyTorch, nn.CrossEntropyLoss() returns a scalar, not a per-element tensor, so I cannot multiply the loss by the masks.
criterion = nn.CrossEntropyLoss()
outputs = decoder(features, inputs)  # (batch_size, padded_length, vocab_size)
loss = criterion(outputs.view(-1, vocab_size), targets.view(-1))  # this gives a scalar, not a tensor
How can I solve this problem?
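For reference, here is a minimal sketch of the behavior I am after, assuming a recent PyTorch version where nn.CrossEntropyLoss accepts reduction='none' (the tensor shapes and values below are made up for illustration):

```python
import torch
import torch.nn as nn

# Toy dimensions for illustration only.
batch_size, padded_length, vocab_size = 2, 3, 5

outputs = torch.randn(batch_size, padded_length, vocab_size)   # unnormalized scores
targets = torch.randint(0, vocab_size, (batch_size, padded_length))
masks = torch.tensor([[1, 1, 0],
                      [1, 0, 0]], dtype=torch.float)           # 0 marks padding

# reduction='none' keeps one loss value per token instead of averaging to a scalar.
criterion = nn.CrossEntropyLoss(reduction='none')
losses = criterion(outputs.view(-1, vocab_size), targets.view(-1))  # (batch_size*padded_length,)

# Zero out the padded positions and average over the real tokens.
loss = (losses * masks.view(-1)).sum() / masks.sum()
```

Alternatively, nn.CrossEntropyLoss also takes an ignore_index argument, which skips positions whose target equals that index; that would avoid the explicit mask multiplication if the padding token has a dedicated index.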