Regarding softmax layer

  • How can I find out whether my layer applies softmax or not?

nn.CrossEntropyLoss applies F.log_softmax internally on the input. The usual “layers” such as nn.ConvXd, nn.Linear etc. do not apply any non-linearity for you.
This does of course not necessarily hold for custom user-defined layers.
If you are unsure about a specific layer, please refer to the docs, which would mention if an activation function is applied internally.
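Besides reading the docs, one way to check a model is to walk over its submodules and look for softmax modules. The snippet below is a minimal sketch: `find_softmax_modules` and the two toy models are hypothetical names made up for illustration, not part of any repo. Note that it only finds softmax registered as a module; a functional call such as F.softmax inside a custom forward method would not show up, so for custom layers you still need to read the forward code.

```python
import torch.nn as nn


def find_softmax_modules(model: nn.Module):
    """Return (name, module) pairs for every softmax-like submodule."""
    softmax_types = (nn.Softmax, nn.LogSoftmax, nn.Softmax2d)
    return [(name, module) for name, module in model.named_modules()
            if isinstance(module, softmax_types)]


# Hypothetical toy models, used only for illustration.
plain = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))
with_softmax = nn.Sequential(nn.Linear(8, 3), nn.Softmax(dim=1))

print(find_softmax_modules(plain))         # no softmax module registered
print(find_softmax_modules(with_softmax))  # finds the nn.Softmax submodule
```

If the list is empty and the forward method contains no functional softmax call, the model returns raw logits, which is what nn.CrossEntropyLoss expects.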

Thanks @ptrblck. I read the docs of my GitHub repo, but they didn't mention any non-linearity being applied internally.
Could you please glance through my GitHub repo code to check whether a softmax function is applied to the last layer?

I am using the ImageNet dataset.

In the ImageNet example I cannot find the usage of a softmax and nn.CrossEntropyLoss is used, which looks right.

Thanks @ptrblck for your reply. From this conversation I understand that softmax is used only for calculating the CrossEntropyLoss rather than for classification.

Out of curiosity, can I print or get the softmax output which is used in CrossEntropyLoss?

Yes, you can add a softmax or log_softmax operation and e.g. print the output values.
As long as you don’t feed these values into nn.CrossEntropyLoss there won’t be a problem.
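As a rough sketch of the above, the snippet below prints the softmax probabilities for inspection and checks that nn.CrossEntropyLoss on the raw logits matches F.nll_loss applied to F.log_softmax. The random `logits` and the `target` values are made-up stand-ins for a real model output and real labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)           # stand-in for model output: 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 2])  # made-up class indices

# Softmax for inspection only: every row is non-negative and sums to 1.
probs = F.softmax(logits, dim=1)
print(probs)

# The criterion takes the raw logits and applies log_softmax itself.
loss_builtin = nn.CrossEntropyLoss()(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(loss_builtin.item(), loss_manual.item())

# Passing probabilities instead would apply softmax twice and give a different value.
loss_wrong = nn.CrossEntropyLoss()(probs, target)
print(loss_wrong.item())
```

The last line is there only to show why the probabilities should not be passed to the criterion.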

  • I have sent my output to a softmax function and I am getting positive values in the range (0, 1).

  • I have inference code that takes multiple images from a folder. There I have neither a loss function (cross entropy) nor softmax. Do I need to add cross entropy to my inference code, or is it OK as is?

  • Will adding softmax to my inference code change my predictions, or is it (softmax) just used to convert logits into probabilities?

Here is the code for inference:

    import time

    import torch
    import torchvision
    import torchvision.transforms as transforms
    import torchvision.datasets as datasets

    # model, args, util and AverageMeter are defined elsewhere in the repo
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[1./255., 1./255., 1./255.])

    data_transforms = {
        'predict': transforms.Compose([
            transforms.Resize((256, 256)),
            # the rest of this pipeline is cut off in the post;
            # ToTensor and normalize are assumed here
            transforms.ToTensor(),
            normalize,
        ])
    }

    dataset = {'predict': datasets.ImageFolder("/content/XNOR-Net-PyTorch/ImageNet/networks/", data_transforms['predict'])}
    dataloader = {'predict': torch.utils.data.DataLoader(dataset['predict'], batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True)}

    batch_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    end = time.time()
    global bin_op
    bin_op = util.BinOp(model)

    for input, labels in dataloader['predict']:
        with torch.no_grad():
            input_var = torch.autograd.Variable(input)
            # compute output
            output = model(input_var)

  1. That sounds right.

  2. If you want to calculate the “test loss” (and have targets during inference), you can use the criterion to calculate it. Otherwise, if you just want the predictions, it’s not necessary.

  3. The predictions won’t change, and you can get the predicted class index for each sample via torch.argmax(output, dim=1), where output can be the logits or the probabilities.
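Points 2 and 3 can be checked with a small sketch. Here `output` is a random tensor standing in for model(input) on a batch of 5 images with 10 classes, and `target` holds made-up labels that would only exist if your inference data is labelled.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
output = torch.randn(5, 10)  # stand-in for model(input): 5 images, 10 classes

# Predicted class per sample, once from the logits and once from the probabilities.
pred_from_logits = torch.argmax(output, dim=1)
pred_from_probs = torch.argmax(F.softmax(output, dim=1), dim=1)
same = torch.equal(pred_from_logits, pred_from_probs)
print(pred_from_logits, pred_from_probs, same)

# Optional "test loss", only possible when targets are available.
target = torch.randint(0, 10, (5,))          # made-up labels
test_loss = F.cross_entropy(output, target)  # takes logits, not probabilities
print(test_loss.item())
```

For a batched output the dim=1 argument matters: without it, torch.argmax returns a single index into the flattened tensor rather than one class per sample.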

Thanks @ptrblck for all your replies.

Referring to the example by @Ganga, I understand that during training
the cross entropy loss is computed, and that softmax is calculated inside that function before the loss is obtained.

However, during model inference there is no explicit usage of softmax in the code, yet the output from model(input) gives probabilities?

How does the model give probabilities? Where is the softmax declared, so that it is executed on the logits when the model is called for inference?

You are correct that nn.CrossEntropyLoss will internally apply F.log_softmax and thus no softmax activation is used in the model.
The model thus outputs logits, not probabilities, which can take any value in the range (-Inf, +Inf).
If you want to calculate the probability for each class during inference, you can apply F.softmax(output, dim=1) on the output and process the probabilities further (just don’t calculate the loss with them).
However, to get the predicted classes, you could use torch.argmax(output, dim=1), which will return the same predicted class index using the logits or probabilities, since softmax is monotonic: it does not change the order of the values, so the highest logit gets the highest probability.
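Put together, an inference step that reports probabilities might look like the sketch below. The tiny `model` and the random `images` are hypothetical placeholders for a trained network and a real batch; only the softmax and top-k handling is the point here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for a trained classifier that returns logits.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
model.eval()

images = torch.randn(2, 3, 8, 8)  # placeholder batch: 2 images, 3 channels, 8x8 pixels

with torch.no_grad():
    logits = model(images)            # unbounded raw scores
    probs = F.softmax(logits, dim=1)  # applied explicitly, outside the model
    top_probs, top_classes = torch.topk(probs, k=3, dim=1)

for i in range(images.size(0)):
    print(f"image {i}: classes {top_classes[i].tolist()}, "
          f"probabilities {top_probs[i].tolist()}")
```

The first column of `top_classes` is the same class index that torch.argmax(logits, dim=1) returns, so the softmax only adds the probability values and leaves the predictions unchanged.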
