Can torch.nn.AdaptiveLogSoftmaxWithLoss specify a target value that is ignored and does not contribute to the input gradient, like the ignore_index argument of torch.nn.CrossEntropyLoss?

In many cases we need to pad text so that all sequences are the same length and can be processed in batches. So I think being able to specify a target value that is ignored, and that contributes nothing to the input gradient, is necessary.
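For context, here is a minimal sketch of the workaround I have in mind, assuming a hypothetical PAD_IDX and simply masking out the padded positions before calling forward (since AdaptiveLogSoftmaxWithLoss's loss is a mean over the samples it is given, the padded positions then contribute no gradient):

```python
import torch
import torch.nn as nn

PAD_IDX = 0  # hypothetical padding index for illustration

in_features, n_classes = 32, 1000
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs=[100, 500])

# A padded batch: (batch, seq_len, in_features) hidden states and (batch, seq_len) targets
hidden = torch.randn(4, 10, in_features)
targets = torch.randint(1, n_classes, (4, 10))
targets[:, 7:] = PAD_IDX  # pretend the sequence tails are padding

flat_hidden = hidden.view(-1, in_features)
flat_targets = targets.view(-1)

# Drop padded positions before the loss, so they contribute no gradient
mask = flat_targets != PAD_IDX
out = asm(flat_hidden[mask], flat_targets[mask])
loss = out.loss  # mean negative log-likelihood over non-padded positions only
```

Is something like this the intended way to handle padding, or is there built-in support I am missing?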