I have printed both tensors, `y` and `DerivateX_g`, and put the output in an image for convenience.

Neither of them seems to have a `grad_fn`.

Did you create these tensors with `requires_grad=True`?

Yes, I created those tensors with `requires_grad=True`.

That won't work and would fit the 3rd point I mentioned.

You would need to create the computation graph with differentiable operations, which will create a result tensor with a valid `grad_fn`.

How do I do that exactly?

Here is a simple code snippet:

```
import torch
import torch.nn as nn

x = torch.randn(1, 1, requires_grad=True)
lin = nn.Linear(1, 1)  # your model or manual operations
out = lin(x)
print(out.grad_fn)  # a backward node, not None
out.backward()
```

Your model might of course be more complicated.
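To illustrate the difference between `requires_grad` and `grad_fn`, here is a small sketch (shapes are arbitrary): a tensor you create yourself with `requires_grad=True` is a leaf and has no `grad_fn`, a result of a differentiable operation gets one, and calling `backward()` on a tensor that is not attached to any graph raises the error discussed in this thread.

```python
import torch
import torch.nn as nn

# A leaf tensor created by the user: requires_grad=True, but no grad_fn
x = torch.randn(1, 1, requires_grad=True)
print(x.is_leaf, x.grad_fn)  # True None

# A tensor produced by a differentiable operation gets a grad_fn
lin = nn.Linear(1, 1)
out = lin(x)
print(out.grad_fn is not None)  # True

# backward() on a tensor that is not part of any graph raises the RuntimeError
no_graph = torch.randn(1, 1)  # requires_grad=False, grad_fn=None
try:
    no_graph.sum().backward()
    error_message = ""
except RuntimeError as e:
    error_message = str(e)
print(error_message)
```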

Okay, but my GAN model works with BCELoss; when I try to use my own loss function (WGANLOSS), it doesn't work.

All the built-in loss functions work, but not the WGANLOSS function that I created myself.

The loss function I am using is: `total_loss = (torch.mean(h_fake) - torch.mean(h_real))`

Could you check the `grad_fn` attributes of `h_fake`, `h_real`, and `total_loss`?

If that's the complete loss calculation, it should not detach anything from the computation graph.
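As a quick check, here is a minimal sketch of the posted critic loss with a hypothetical toy critic (an `nn.Sequential` with arbitrary sizes) and random stand-in batches. If the whole calculation stays differentiable, `h_fake`, `h_real`, and `total_loss` should all print a backward node rather than `None`.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in critic and data; replace with your own model and batches
critic = nn.Sequential(nn.Linear(4, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1))
real = torch.randn(8, 4)
fake = torch.randn(8, 4)  # in a GAN this would come from the generator

h_real = critic(real)
h_fake = critic(fake)

# Critic loss as posted: mean(h_fake) - mean(h_real)
total_loss = torch.mean(h_fake) - torch.mean(h_real)

# All three should have a grad_fn; a None here points to a detach, no_grad or dtype cast
print(h_fake.grad_fn, h_real.grad_fn, total_loss.grad_fn)

total_loss.backward()
print(critic[0].weight.grad is not None)  # True
```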

This worked for me in March 2020.

This just happened to me. What I did wrong was passing the input images to the loss instead of the model output: rather than computing `logits = model(images)` and then `loss = criterion(logits, labels)`, I was passing `loss = criterion(images, labels)` directly.
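A minimal sketch of that mistake, with a hypothetical `nn.Linear` model and arbitrary shapes: when the loss is computed on the raw inputs, nothing in it depends on a parameter, so the loss has no `grad_fn`; passing the model output restores the graph.

```python
import torch
import torch.nn as nn

# Hypothetical model and data shapes, for illustration only
model = nn.Linear(10, 3)
criterion = nn.CrossEntropyLoss()
images = torch.randn(4, 10)          # plain input, requires_grad=False
labels = torch.randint(0, 3, (4,))

wrong = criterion(images, labels)    # bypasses the model: nothing to backprop through
print(wrong.requires_grad, wrong.grad_fn)  # False None

logits = model(images)
right = criterion(logits, labels)    # depends on the model parameters
print(right.grad_fn is not None)     # True
right.backward()
```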

In my case, the code that was supposed to compute some gradients was wrapped in:

```
with torch.no_grad():
    # some code
    ...
```

Removing the expression did the trick.

Yep

I had written this beforehand:

```
with torch.no_grad():
```
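A small sketch of why the `torch.no_grad()` block causes this error (toy `nn.Linear` model, arbitrary shapes): operations inside the block are not recorded, so their results have `requires_grad=False` and no `grad_fn`, even though the model parameters themselves still require gradients.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
x = torch.randn(2, 3)

with torch.no_grad():
    out_no_grad = model(x)       # the graph is not recorded inside no_grad
print(out_no_grad.requires_grad, out_no_grad.grad_fn)  # False None

out = model(x)                   # outside no_grad, the graph is recorded
print(out.requires_grad, out.grad_fn is not None)      # True True
out.sum().backward()
```

`torch.no_grad()` is meant for inference or evaluation code, so the training forward pass should stay outside of it.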

Hi

I have the same issue, but my network is not a simple neural net: the output of the network is passed to other functions, and the loss is calculated on the outputs of these functions. I have implemented these functions myself and I am using MSELoss, but I get this error.

Can anyone help me?
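Without seeing the functions, one common cause (an assumption here) is that a custom function leaves autograd, e.g. by going through NumPy, calling `.item()` or `.detach()`, or casting to an integer dtype. A minimal sketch with two hypothetical post-processing functions, `post_process_torch` and `post_process_numpy`:

```python
import torch
import torch.nn as nn

def post_process_torch(y):
    # Hypothetical post-processing built only from torch ops: stays in the graph
    return torch.sigmoid(y) * 2.0 + y.pow(2)

def post_process_numpy(y):
    # Hypothetical post-processing that round-trips through NumPy: breaks the graph
    arr = y.detach().cpu().numpy()
    return torch.from_numpy(arr * 2.0)

model = nn.Linear(5, 1)
x = torch.randn(8, 5)
target = torch.randn(8, 1)
criterion = nn.MSELoss()

y = model(x)
loss_ok = criterion(post_process_torch(y), target)
print(loss_ok.grad_fn is not None)  # True
loss_ok.backward()

loss_broken = criterion(post_process_numpy(y), target)
print(loss_broken.grad_fn)  # None, so backward() would raise the RuntimeError
```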

Same in my case. I spent several hours trying to fix this issue and am very glad that your snippet made my day!

Hi ptrblck,

I want to use label smoothing with `criterion = nn.CrossEntropyLoss()` and a batch size of 64. The labels are random numbers between 0.8 and 0.9, and the outputs come from a Sigmoid. The code is:

```
b_size = 64
label = (0.9 - 0.8) * torch.rand(b_size) + 0.8
label = label.to(device).type(torch.LongTensor)
# Forward pass real batch through D
netD = netD.float()
output = netD(real_cpu).view(-1)
# Calculate loss on all-real batch
output1 = torch.zeros(64, 64)
for ii in range(64):
    output1[:, ii] = ii
for ii in range(64):
    output1[ii, :] = output[ii].type(torch.LongTensor)
errD_real = criterion(output1, label)
```

And the error is:

`RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`

By applying `.type(torch.LongTensor)`, all the labels and outputs become 0!

You won't be able to convert your `output`, which is a floating point tensor, to a `LongTensor` without detaching it from the computation graph, since gradient calculations are only implemented for floating point values.

If you want to apply label smoothing, I would recommend having a look at this post.
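A short sketch of the cast described above, using a random stand-in for the Sigmoid output: casting to an integer dtype returns a tensor outside the graph, and since Sigmoid values lie in (0, 1) they are truncated to 0, which is also why all the values became 0.

```python
import torch

logits = torch.randn(4, requires_grad=True)
output = torch.sigmoid(logits)        # float tensor with a grad_fn
print(output.grad_fn is not None)     # True

output_long = output.long()           # cast to an integer dtype
print(output_long.requires_grad, output_long.grad_fn)  # False None
print(output_long)                    # values in (0, 1) are truncated to 0

# Integer tensors cannot require gradients at all
try:
    torch.zeros(3, dtype=torch.long, requires_grad=True)
    error_message = ""
except RuntimeError as e:
    error_message = str(e)
print(error_message)
```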

Many thanks for your help.

My case is actually a binary classification with labels 0 and 1. However, I want to use label smoothing: instead of 1, a random number between 0.8 and 0.9, and instead of 0, labels between 0.01 and 0.3.

How can I use the posted code as you suggested? I have 64 labels, so I think the number of classes will be 64 for me because the labels are all different? Or would you recommend another loss function that accepts float targets, or a cross-entropy implementation that accepts float targets?

If I remove the `.type(torch.LongTensor)`, I get an error saying that it expected Long but got Float.
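For a binary setup with Sigmoid outputs, one option (a sketch, not the linked post's approach) is `nn.BCELoss`, which takes probabilities and float targets in [0, 1], so smoothed labels need no `LongTensor` cast. The random `scores` below are a hypothetical stand-in for `netD(real_cpu).view(-1)`; if the discriminator returns raw logits instead, `nn.BCEWithLogitsLoss` is the numerically safer equivalent.

```python
import torch
import torch.nn as nn

b_size = 64
# Stand-in for netD(...).view(-1): sigmoid outputs in (0, 1), one per sample
scores = torch.randn(b_size, requires_grad=True)
output = torch.sigmoid(scores)

# Smoothed float targets: real ~ U(0.8, 0.9), fake ~ U(0.01, 0.3)
real_label = torch.empty(b_size).uniform_(0.8, 0.9)
fake_label = torch.empty(b_size).uniform_(0.01, 0.3)

# nn.BCELoss accepts float targets, so no LongTensor cast is needed
criterion = nn.BCELoss()
errD_real = criterion(output, real_label)
errD_fake = criterion(output, fake_label)  # in a GAN this would use the fake batch's output
(errD_real + errD_fake).backward()
print(errD_real.grad_fn is not None)  # True
```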

I'm not sure how your use case is a binary classification if you are dealing with 64 classes.

Anyway, I just tried the linked code and it's working. Are you seeing any issues with it?

```
import torch

# LabelSmoothingLoss is the class from the linked post (not part of torch)
criterion = LabelSmoothingLoss(64, smoothing=0.1)
x = torch.randn(10, 64, requires_grad=True)
y = torch.randint(0, 64, (10,))
loss = criterion(x, y)
```
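For reference, newer PyTorch releases (1.10 and later) also support label smoothing and float probability targets directly in `nn.CrossEntropyLoss`, so a custom class may not be needed there. A sketch with the same shapes as above:

```python
import torch
import torch.nn as nn

x = torch.randn(10, 64, requires_grad=True)  # logits for 64 classes
y = torch.randint(0, 64, (10,))              # integer class indices

# Built-in label smoothing (PyTorch >= 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
loss = criterion(x, y)
loss.backward()

# Since 1.10 the target may also be a float tensor of class probabilities
soft_targets = torch.softmax(torch.randn(10, 64), dim=1)
loss_soft = nn.CrossEntropyLoss()(x, soft_targets)
```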

Hey,

I'm having a similar problem with this `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn` error.

I'm doing some facial analysis with this network. I have a bit of experience working in PyTorch, but I'm currently out of my depth. Since I'm doing this as part of a larger project, I decided to wrap the FAN in a PyTorch Lightning module which, as far as I know, shouldn't affect how freezing parameters works. I mention it in case there is a problem with wrapping a network like this:

```
from pytorch_lightning import LightningModule
from torch.optim import RMSprop
# FAN comes from the network linked above; its import is not shown here

class LightningFAN(LightningModule):
    def __init__(self):
        LightningModule.__init__(self)
        self._fan = FAN(num_modules=4)
    ...
    def train_freeze(self):
        # Freeze every FAN parameter first
        for i, param in enumerate(self._fan.parameters()):
            param.requires_grad = False
        # Then unfreeze only the last stack
        trainable = [self._fan.l3, self._fan.m3, self._fan.top_m_3, self._fan.conv_last3, self._fan.bn_end3]
        for module in trainable:
            for param in module.parameters():
                param.requires_grad = True
    def forward(self, x):
        return self._fan.forward(x)
    ...
    def other_lightning_module_methods():
        pass
    ...
    def configure_optimizers(self):
        optimizer = RMSprop(self.parameters(), lr=self.hparams.learning_rate, weight_decay=0.0)
        return optimizer
```

When I call the train method on this, it errors. If I don't freeze anything, it works fine.

As a test, I started unfreezing layers until it trained correctly. I wrote a quick function to see which modules could be frozen without problems:

```
for key, val in self._fan.__dict__["_modules"].items():
    print("Key: ", key)
    try:
        print("Weight Requires Grad: ", val.weight.requires_grad)
    except Exception:
        pass
    try:
        print("Weight Grad Function: ", val.weight.grad_fn, val.weight.grad)
    except Exception:
        pass
    try:
        print("Bias Requires Grad: ", val.bias.requires_grad)
    except Exception:
        pass
    try:
        print("Bias Grad Function: ", val.bias.grad_fn, val.bias.grad)
    except Exception:
        pass
    print()
```

And as output I get:

```
Key: conv1
Weight Requires Grad: False
Weight Grad Function: None None
Bias Requires Grad: False
Bias Grad Function: None None
Key: bn1
Weight Requires Grad: False
Weight Grad Function: None None
Bias Requires Grad: False
Bias Grad Function: None None
Key: conv2
Key: conv3
Key: conv4
Key: m0
Key: top_m_0
Key: conv_last0
Weight Requires Grad: False
Weight Grad Function: None None
Bias Requires Grad: False
Bias Grad Function: None None
Key: bn_end0
Weight Requires Grad: False
Weight Grad Function: None None
Bias Requires Grad: False
Bias Grad Function: None None
Key: l0
Weight Requires Grad: False
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: bl0
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: al0
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: m1
Key: top_m_1
Key: conv_last1
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: bn_end1
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: l1
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: bl1
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: al1
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: m2
Key: top_m_2
Key: conv_last2
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: bn_end2
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: l2
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: bl2
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: al2
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: m3
Key: top_m_3
Key: conv_last3
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: bn_end3
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
Key: l3
Weight Requires Grad: True
Weight Grad Function: None None
Bias Requires Grad: True
Bias Grad Function: None None
```

The problem actually occurs when I freeze the gradient for the `l0` module's bias. Why would that be? Does this mean that none of the layers after that one are computing gradients? Also, why do no layers have gradient functions at this stage?

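The `.grad_fn` attribute is populated for activations during the forward pass by default, not for parameters. Parameters are leaf tensors, so their `grad_fn` stays `None`, and their `.grad` is only filled once `backward()` has been called.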
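A minimal sketch of this with a toy `nn.Linear` layer (shapes are arbitrary): the parameter is a leaf without a `grad_fn`, the activation produced in the forward pass has one, and `.grad` of the parameter only appears after the backward pass.

```python
import torch
import torch.nn as nn

lin = nn.Linear(2, 1)
print(lin.weight.is_leaf, lin.weight.grad_fn)  # True None: parameters are leaves
print(lin.weight.grad)                          # None: no backward pass yet

x = torch.randn(3, 2)
out = lin(x)                                    # activation created in the forward pass
print(out.grad_fn is not None)                  # True

out.sum().backward()
print(lin.weight.grad is not None)              # True: .grad is filled by backward()
```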

Is your training working fine without using Lightning? If not, could you post an executable code snippet, so that we could have a look?
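One possible explanation for the `l0` observation (an assumption, since the training step isn't shown): if the loss only depends on outputs computed entirely from frozen parameters, that loss has no `grad_fn` and `backward()` raises exactly this error. A sketch with a hypothetical toy model (not FAN), where `head0` only sees frozen layers and `head1` is trainable:

```python
import torch
import torch.nn as nn

# Hypothetical toy model, not FAN: a shared "backbone" feeding two heads
backbone = nn.Linear(8, 8)
head0 = nn.Linear(8, 1)
head1 = nn.Linear(8, 1)

# Freeze everything, then unfreeze head1 only (similar to train_freeze above)
for module in (backbone, head0, head1):
    for p in module.parameters():
        p.requires_grad = False
for p in head1.parameters():
    p.requires_grad = True

x = torch.randn(4, 8)
feat = backbone(x)   # only frozen params involved -> no grad_fn
out0 = head0(feat)   # still only frozen params -> no grad_fn
out1 = head1(feat)   # trainable head1 -> has a grad_fn

out1.pow(2).mean().backward()  # works: gradients reach head1 only

try:
    out0.pow(2).mean().backward()
    error_message = ""
except RuntimeError as e:
    error_message = str(e)
print(error_message)

# Optionally, hand only the trainable parameters to the optimizer
trainable = [p for m in (backbone, head0, head1) for p in m.parameters() if p.requires_grad]
optimizer = torch.optim.RMSprop(trainable, lr=1e-3)
```

If the real loss uses an output that depends on a trainable parameter (even only a bias), the graph exists and `backward()` succeeds, which would match the behavior described above.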