Please see the following thread for an implementation:

You should use LogSoftmax. You have to pass the output of Softmax
through log() anyway to calculate the cross entropy, and the
implementation of LogSoftmax is numerically more stable than the
mathematically (but not numerically) equivalent log(Softmax).
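To illustrate the point, here is a minimal sketch (my own example, not from the thread) comparing the two on logits with a large spread; the naive `log(softmax(...))` underflows to `-inf` while `log_softmax` stays finite:

```python
import torch
import torch.nn.functional as F

# Logits with a large spread: softmax underflows to exactly 0 for the
# small entries, so taking log() afterwards produces -inf.
logits = torch.tensor([[1000.0, 0.0, -1000.0]])

naive = torch.log(F.softmax(logits, dim=1))   # contains -inf entries
stable = F.log_softmax(logits, dim=1)         # all entries finite

print(naive)
print(stable)
```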

If you don’t naturally have soft target labels (probabilities across the
classes), I don’t see any value in ginning up soft labels by adding
noise to your 0, 1 (one-hot) labels. Just use CrossEntropyLoss
with your hard labels.

(If your hard labels are encoded as 0, 1-style one-hot labels you will
have to convert them to integer categorical class labels, as those are
what CrossEntropyLoss requires.)

I want to use label smoothing with criterion = nn.CrossEntropyLoss() and a batch size of 64. The labels are random numbers between 0.8 and 0.9, and the outputs come from a sigmoid. The code is

label = (0.9 - 0.8) * torch.rand(b_size) + 0.8
label = label.to(device).type(torch.LongTensor)
# Forward pass real batch through D
netD = netD.float()
output = netD(real_cpu).view(-1)
# Calculate loss on all-real batch
output1 = torch.zeros(64, 64)
for ii in range(64):
    output1[:, ii] = ii
for ii in range(64):
    output1[ii, :] = output[ii].type(torch.LongTensor)
errD_real = criterion(output1, label)

and the error is:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Applying .type(torch.LongTensor) makes all the labels and outputs become 0, and without it I get an error.

This is probably late to answer. I am also not sure if it would work, but what if you try inserting a manual cross-entropy function inside the forward pass:

soft_loss = -soft_label * log(hard_label)

then apply the hard loss on top of the soft loss:

loss = -sum(hard_label * soft_loss)

…but then you would have to take exp() of the soft loss to counteract applying log twice. I wonder how it would turn out.
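For reference, here is one way such a manual soft-target cross entropy is often written (my own sketch, not the formulation above): take `-sum(q * log p)` per sample with `log_softmax` for stability, and average over the batch. Because nothing is cast to LongTensor, gradients flow normally, avoiding the RuntimeError in the question.

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, soft_targets):
    # -sum_c q_c * log p_c per sample, averaged over the batch.
    # log_softmax keeps this numerically stable, as noted earlier.
    return -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

logits = torch.randn(8, 5, requires_grad=True)

# Smoothed targets: 0.9 on one class, remaining 0.1 spread over the rest,
# so every row sums to 1.
targets = torch.full((8, 5), 0.1 / 4)
targets[torch.arange(8), torch.randint(0, 5, (8,))] = 0.9

loss = soft_cross_entropy(logits, targets)
loss.backward()   # gradients flow, unlike with the LongTensor cast
print(loss.item())
```

Note that recent PyTorch versions also accept class probabilities directly as the target of nn.CrossEntropyLoss, which may make a manual version unnecessary.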