Regarding softmax layer

  • How can I find whether my layer has softmax or not?

nn.CrossEntropyLoss applies F.log_softmax internally on the input. The usual “layers” such as nn.ConvXd, nn.Linear etc. do not apply any non-linearity for you.
This does, of course, not necessarily hold for custom user-defined layers.
If you are unsure about a specific layer, please refer to the docs, which will mention whether an activation function is applied internally.
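If it helps, here is a minimal sketch (using a hypothetical model, just for illustration) that checks the registered submodules for a softmax layer. Note that it only finds module-based activations; an F.softmax call inside forward would not show up this way and would require a look at the code:

    import torch.nn as nn

    # Hypothetical model, just for illustration
    model = nn.Sequential(
        nn.Linear(10, 5),
        nn.ReLU(),
        nn.Linear(5, 2),
    )

    # Check all submodules for a (log-)softmax layer; this only catches
    # module-based activations, not functional calls inside forward()
    has_softmax = any(isinstance(m, (nn.Softmax, nn.LogSoftmax))
                      for m in model.modules())
    print(has_softmax)  # False for this model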


Thanks @ptrblck, I read the docs of my GitHub repo and they didn’t mention any non-linearity applied internally.
Could you please glance through my GitHub repo code to check whether a softmax function is applied to the last layer?


I am using the ImageNet dataset.

In the ImageNet example I cannot find any usage of a softmax; nn.CrossEntropyLoss is used, which looks right.

Thanks @ptrblck for your reply. From this conversation I came to know that softmax is used only for calculating the CrossEntropyLoss rather than for classification.

Can I print or get the softmax output which is used in CrossEntropyLoss, just out of curiosity?

Yes, you can add a softmax or log_softmax operation and e.g. print the output values.
As long as you don’t feed these values into nn.CrossEntropyLoss there won’t be a problem.
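For example (with dummy logits standing in for the model output, just for illustration):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    criterion = nn.CrossEntropyLoss()

    output = torch.randn(4, 10)           # dummy logits from the model
    target = torch.randint(0, 10, (4,))   # dummy targets

    probs = F.softmax(output, dim=1)      # probabilities, for printing only
    print(probs)

    loss = criterion(output, target)      # feed the raw logits, not probs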

  • I have passed my output through a softmax function and I am getting positive values in the range (0, 1).

  • I have inference code for running multiple images from a folder; there I have neither a loss function (cross entropy) nor a softmax. Do I need to add cross entropy to my inference code, or is it OK as it is?

  • If I add softmax to my inference code, will there be any change in my predictions, or is softmax just used to convert the logits into probabilities?

Here is the code for inference:

    import time
    import torch
    import torchvision
    import torchvision.transforms as transforms
    import torchvision.datasets as datasets

    # Note: this std divides each channel by 255 instead of using the usual
    # ImageNet std values [0.229, 0.224, 0.225]
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[1./255., 1./255., 1./255.])

    torchvision.set_image_backend('PIL')

    data_transforms = {
        'predict': transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.CenterCrop(input_size),  # input_size is defined elsewhere in the script
            transforms.ToTensor(),
            normalize,
        ])
    }

    dataset = {'predict': datasets.ImageFolder(
        "/content/XNOR-Net-PyTorch/ImageNet/networks/",
        data_transforms['predict'])}
    dataloader = {'predict': torch.utils.data.DataLoader(
        dataset['predict'], batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True)}

    # AverageMeter, util, model and args come from the repo's training script
    batch_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()

    model.eval()
    end = time.time()
    global bin_op
    bin_op = util.BinOp(model)
    bin_op.binarization()

    for input, labels in dataloader['predict']:
        with torch.no_grad():
            # Variable is deprecated; tensors can be passed to the model directly.
            # compute output (raw logits, no softmax applied)
            output = model(input)
  1. That sounds right.

  2. If you want to calculate the “test loss” (and have targets during inference), you can use the criterion to calculate it. Otherwise, if you just want the predictions, it’s not necessary.

  3. The predictions won’t change and you can get the predicted class index via torch.argmax(output, dim=1), where output can be the logits or the probabilities.
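For reference, a sketch of points 2 and 3 applied to the inference loop above (reusing model and dataloader['predict'] from the posted code, with nn.CrossEntropyLoss as the criterion):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()

    model.eval()
    with torch.no_grad():
        for input, labels in dataloader['predict']:
            output = model(input)                # logits
            loss = criterion(output, labels)     # optional "test loss"
            preds = torch.argmax(output, dim=1)  # predicted class indices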

Thanks @ptrblck for all your replies.

Referring to the example by @Ganga, I understand that during training
the cross entropy loss is obtained, and inside that function the softmax is also calculated before the loss is computed.

However, during model inference there is no explicit usage of softmax in the code, yet the output from model(input) gives probabilities?

How does the model give probabilities? Where is the softmax declared such that when the model is called (for inference) softmax is executed on the logits?

You are correct that nn.CrossEntropyLoss will internally apply F.log_softmax and thus no softmax activation is used in the model.
The model thus outputs logits, which have values in the range [-Inf, +Inf].
If you want to calculate the probability for each class during inference, you can apply F.softmax on the output and process the probabilities further (just don’t calculate the loss with them).
However, to get the predicted classes, you could use torch.argmax(output, dim=1), which will return the same predicted class index for the logits and the probabilities, since softmax does not change the order of the values and the highest logit will get the highest probability.
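A quick numerical check (with made-up logits) that the argmax is the same for logits and probabilities:

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[2.0, -1.0, 0.5],
                           [0.1,  3.0, -2.0]])
    probs = F.softmax(logits, dim=1)

    # softmax is monotonic, so both calls return the same class indices
    print(torch.argmax(logits, dim=1))  # tensor([0, 1])
    print(torch.argmax(probs, dim=1))   # tensor([0, 1])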


Hi @ptrblck, is there any advantage to activating the last layer with softmax if I already use CrossEntropyLoss as the loss function? Internally, it uses log_softmax. What do you think?

No, using softmax on your outputs and passing them to nn.CrossEntropyLoss is wrong and might stall your training. As described before: you could still use it in case you want to print the probabilities or use them in another way, but don’t calculate the loss with the softmax output using nn.CrossEntropyLoss or nn.NLLLoss.
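To illustrate why: nn.CrossEntropyLoss applies log_softmax internally, so passing probabilities means a softmax is effectively applied twice, which flattens the distribution and weakens the gradients. A small numerical sketch with made-up logits:

    import torch
    import torch.nn.functional as F

    logits = torch.tensor([[4.0, 0.0, -4.0]])
    probs = F.softmax(logits, dim=1)
    # the distribution nn.CrossEntropyLoss would effectively work with
    double = F.softmax(probs, dim=1)
    print(probs)   # ~[0.98, 0.02, 0.00] -> confident prediction
    print(double)  # ~[0.57, 0.22, 0.21] -> much flatter, weak gradient signal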
