@ptrblck thank you for your response. My confusion stems from the fact that TensorFlow allows us to use softmax in conjunction with BCE loss. Yes, I have a 4-class classification problem. My batch size is 1000 and my sequence length is 100, so the output has shape [1000, 100, 4], and the last dimension corresponds to the class probabilities. If I use sigmoid, I need it only on the third dimension. nn.CrossEntropyLoss doesn't seem applicable, as the dimensions are not right. How should I proceed in this case?
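
To make the shapes concrete, here is a minimal sketch of the setup. Random tensors stand in for my model output, and the variable names and the integer-index target format are assumptions for illustration only, not my actual code.

```python
import torch
import torch.nn.functional as F

# Sizes taken from the description above.
batch_size, seq_len, num_classes = 1000, 100, 4

# Stand-in for the model output: one raw score per class, per time step.
logits = torch.randn(batch_size, seq_len, num_classes)

# Normalize over the last (class) dimension only.
probs = F.softmax(logits, dim=-1)

# Assumed target format: one class index per time step.
targets = torch.randint(0, num_classes, (batch_size, seq_len))

print(probs.shape)    # torch.Size([1000, 100, 4])
print(targets.shape)  # torch.Size([1000, 100])
```

Each `probs[i, t]` is a length-4 vector that sums to 1, which is what I mean by the last dimension holding the class probabilities.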