RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

I have this error in my code. I have searched the internet but I can't solve the problem. My code is:

```python
def init_param(size, std=1.0):
    return (torch.randn(size) * std).requires_grad_()


def train_epoch(lr, params):
    for xb, yb in dltrain:
        ...  # (body truncated in the original post)
        for p in weight, bias:
            ...  # (body truncated in the original post)
```

The following code is the source of your error -

You are essentially re-creating pred here which causes it to lose its computation graph.
It looks like a classification problem; try exploring loss functions like this one rather than defining your own like you’ve done here which also naturally has its requires_grad=False.
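For instance, a minimal sketch of using a built-in classification loss; the shapes and names here are illustrative, not taken from your model:

```python
import torch
import torch.nn as nn

# Built-in losses keep the computation graph intact, so backward() works.
criterion = nn.CrossEntropyLoss()

logits = torch.randn(8, 10, requires_grad=True)  # stand-in for a model output
targets = torch.randint(0, 10, (8,))             # integer class labels

loss = criterion(logits, targets)
print(loss.requires_grad)   # True: the loss carries a grad_fn
loss.backward()             # gradients flow back into logits
```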

RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

Here's my training code, and the error comes from `loss.backward()`. I would like to keep requires_grad set to False, as I don't want the gradients, but I have to keep them on for the model to work. Can you please help?

Here is another snippet.

Could you please post the code for the forward method of your custom model class Net?
Rather than screenshots you could include the code by enclosing between ```.

Also, apparently you are using the training_step from Lightning - I don't use Lightning, so I'm not sure, but if there's any related code, please post it.

```python
class ImageClassificationBase(nn.Module):

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.nll_loss(out, labels)        # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                    # Generate predictions
        loss = F.nll_loss(out, labels)        # Calculate loss
        acc = accuracy(out, labels)           # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()   # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()      # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))
```

Then the model:

```python
class Net(ImageClassificationBase):
    def __init__(self):
        super(Net, self).__init__()
        # (the layer arguments below are truncated in the original post)
        self.features = nn.Sequential(nn.Conv2d(in_channels=3, out_channels=16, ...),
                                      nn.MaxPool2d(2, 2),
                                      ...)
        self.avgpool = nn.AdaptiveAvgPool2d((56, 56))
        self.classifier = nn.Sequential(nn.Linear(56 * 56 * 96, 256),
                                        nn.Linear(256, 512),
                                        nn.Linear(512, 17))

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = x.view(-1, 56 * 56 * 96)
        x = self.classifier(x)
        return x
```

```python
for param in model.parameters():
    param.requires_grad = True
```

```python
def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.Adam):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            # (the following lines appear truncated in the original post)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history
```

```python
num_epochs = 100   # sets the number of epochs
opt_func = torch.optim.Adam
lr = 0.01

# fit the model on the training data and record the result after each epoch
history = fit(num_epochs, lr, model, train_load, valid_load, opt_func)
```


After a few epochs the validation loss keeps going up while the training loss decreases, and accuracy is 20-30% at best. I want to turn the gradient requirement off in the `param.requires_grad` section, but I keep running into "RuntimeError: element 0 does not require grad", etc.

Doesn’t this mean the model is already training and there’s no error when calling loss.backward() like you stated?

Anyway, please point me to the exact line where you face this error, as I didn’t quite understand this -

Hello, the model runs when requires_grad is True, but I want to run it with:

```python
for param in model.parameters():
    param.requires_grad = False
```

and then it shows me this error.

Yes, that’s expected.
Parameters are the leaf nodes and the ones with respect to which the gradient of the loss tensor is calculated. So, they need to have requires_grad=True.

Since you are already evaluating with torch.no_grad, there are no unnecessary gradient calculations taking place.
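This can be reproduced in a few lines (a standalone sketch, not your model):

```python
import torch

w = torch.randn(3, requires_grad=True)   # trainable leaf tensor
x = torch.randn(3)                       # plain input; gradients not needed

loss = (w * x).sum()                     # loss gets a grad_fn via w
loss.backward()                          # works and populates w.grad

w2 = torch.randn(3)                      # requires_grad=False (the default)
loss2 = (w2 * x).sum()                   # no grad_fn: nothing to differentiate

failed = False
try:
    loss2.backward()
except RuntimeError as err:
    failed = True
    print(err)  # element 0 of tensors does not require grad and does not have a grad_fn
```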

The loss problem has nothing to do with the parameters requiring gradients.

Try regularisation techniques.
Potentially, try learning rate schedulers and read about early stopping.
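A minimal sketch of what that could look like; the validation losses below are placeholders and the patience values are illustrative:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Lower the learning rate when the validation loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=2)

# Placeholder validation losses standing in for real evaluation results.
simulated_val_losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]

best_val, patience, bad_epochs, stopped_at = float('inf'), 3, 0, None
for epoch, val_loss in enumerate(simulated_val_losses):
    scheduler.step(val_loss)            # scheduler watches the val loss
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # early stopping
            stopped_at = epoch
            break

print(best_val, stopped_at)
```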

Could you recommend anything for this, even based on the network and training parameters? My accuracy won't go above 33%; ideally 75-80% would be good. I really appreciate your help.

This worked for me. Thanks.

I am trying to recreate a CNN model from Keras and I am facing a runtime error on my loss.backward() (if I remove the call, the model runs without learning).
I have checked the model summary and saw that it ran fine with its channels and tensors; however, due to my lack of knowledge I cannot locate the bug. Can anyone give me advice?

Then the model:

```python
from typing import List

class DNA_CNN_test2(nn.Module):  # deepcre model
    def __init__(self,
                 seq_len: int = 1000,
                 #num_filters: List[int] = [64, 128, 64],
                 kernel_size: int = 8,
                 p=0.25):  # dropout value
        super().__init__()
        self.seq_len = seq_len
        window_size = int(seq_len * (8 / 3000))
        # CNN module
        # sequential container: the forward() method of Sequential accepts any
        # input and forwards it to the first module it contains
        self.conv_net = nn.Sequential()
        #num_filters = [4] + num_filters
        self.model = nn.Sequential(
            # conv block 1
            nn.Conv1d(4, 64, kernel_size=kernel_size, padding='same'),
            nn.Conv1d(64, 64, kernel_size=kernel_size, padding='same'),
            # conv block 2
            nn.Conv1d(64, 128, kernel_size=kernel_size, padding='same'),
            nn.Conv1d(128, 128, kernel_size=kernel_size, padding='same'),
            # conv block 3
            nn.Conv1d(128, 64, kernel_size=kernel_size, padding='same'),
            nn.Conv1d(64, 64, kernel_size=kernel_size, padding='same'),
            nn.Linear(64 * (seq_len // window_size**3), 1))
            #nn.Linear(128, 64),
            #nn.Linear(64*seq_len, 1))

    def forward(self, xb: torch.Tensor):
        """Forward pass."""
        # reshape view to (batch_size, 4 channels - OHE(DNA), seq_len)
        # permute to put the channel dimension in the correct order
        xb = xb.permute(0, 2, 1).mean(dim=[1, 2], keepdim=True).squeeze(dim=-1)
        out = self.conv_net(xb)
        return out
```

loss_batch, train and validation steps:

```python
# +--------------------------------+
# | Training and fitting functions |
# +--------------------------------+

def loss_batch(model, loss_func, xb, yb, opt=None, verbose=False):
    """Apply the loss function to a batch of inputs. If no optimizer
    is provided, skip the backprop step."""
    if verbose:
        print('loss batch ****')
        print("xb shape:", xb.shape)
        print("yb shape:", yb.shape)
        print("yb shape:", yb.squeeze(1).shape)

    # get the batch output from the model given your input batch
    # ** This is the model's prediction for the y labels! **
    xb_out = model(xb.float())
    if verbose:
        print("model out pre loss", xb_out.shape)
        #print('xb_out', xb_out)
    loss = loss_func(xb_out, yb.float())  # for MSE/regression
    # __FOOTNOTE 2__
    if opt is not None:  # if opt
        loss.backward()
        opt.step()
        opt.zero_grad()

    return loss.item(), len(xb)

def train_step(model, train_dl, loss_func, device, opt):
    """Execute 1 set of batched training within an epoch."""
    # Set model to Training mode
    model.train()
    tl = []  # train losses
    ns = []  # batch sizes, n
    # loop through train DataLoader
    for xb, yb in train_dl:
        # put on GPU
        xb, yb =,
        # provide opt so backprop happens
        t, n = loss_batch(model, loss_func, xb, yb, opt=opt)
        # collect train loss and batch sizes
        tl.append(t)
        ns.append(n)
    # average the losses over all batches
    train_loss = np.sum(np.multiply(tl, ns)) / np.sum(ns)
    return train_loss

def val_step(model, val_dl, loss_func, device):
    """Execute 1 set of batched validation within an epoch."""
    # Set model to Evaluation mode
    model.eval()
    with torch.no_grad():
        vl = []  # val losses
        ns = []  # batch sizes, n
        # loop through validation DataLoader
        for xb, yb in val_dl:
            # put on GPU
            xb, yb =,
            # Do NOT provide opt here, so backprop does not happen
            v, n = loss_batch(model, loss_func, xb, yb)
            # collect val loss and batch sizes
            vl.append(v)
            ns.append(n)
    # average the losses over all batches
    val_loss = np.sum(np.multiply(vl, ns)) / np.sum(ns)
    return val_loss

def fit(epochs, model, loss_func, opt, train_dl, val_dl, device, patience=1000):
    """Fit the model params to the training data, eval on unseen data.
    Loop for a number of epochs and keep track of train and val losses
    along the way."""
    # keep track of losses
    train_losses = []
    val_losses = []
    # loop through epochs
    for epoch in range(epochs):
        # take a training step
        train_loss = train_step(model, train_dl, loss_func, device, opt)
        train_losses.append(train_loss)

        # take a validation step
        val_loss = val_step(model, val_dl, loss_func, device)
        val_losses.append(val_loss)
        print(f"E{epoch} | train loss: {train_loss:.3f} | val loss: {val_loss:.3f}")

    return train_losses, val_losses

def run_model(train_dl, val_dl, model, device,
              lr=1e-2, epochs=50, lossf=None, opt=None):
    """Given train and val DataLoaders and a NN model, fit the model to the
    training data. By default, use MSE loss and an SGD optimizer."""
    # define optimizer
    if opt:
        optimizer = opt
    else:  # if no opt provided, just use SGD
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    # define loss function
    if lossf:
        loss_func = lossf
    else:  # if no loss function provided, just use MSE
        loss_func = torch.nn.MSELoss()
    # run the training loop
    train_losses, val_losses = fit(
        epochs, model, loss_func, optimizer, train_dl, val_dl, device)

    return train_losses, val_losses
```
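(As an aside, the reason `opt` is never passed to `loss_batch` inside `val_step` can be reproduced in isolation: any tensor computed under `torch.no_grad()` has no `grad_fn`, so calling `backward()` on it raises exactly the error in this thread. Standalone sketch:)

```python
import torch

model = torch.nn.Linear(4, 1)
x = torch.randn(2, 4)

with torch.no_grad():
    out = model(x)            # autograd recording is disabled here
print(out.requires_grad)      # False: out has no grad_fn

failed = False
try:
    out.sum().backward()
except RuntimeError:          # element 0 of tensors does not require grad ...
    failed = True
```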


RuntimeError                              Traceback (most recent call last)
Cell In[51], line 5
      2 DNA_CNN_test2 = DNA_CNN_test2(seq_len)
----> 5 DNA_CNN_test2_train_losses_lr4, DNA_CNN_test2_val_losses_lr4 = run_model(
      6     train_dl, 
      7     val_dl, 
      8     DNA_CNN_test2,
      9     device,
     10     epochs=100,
     11     lr= 1e-2
     12 )

Cell In[42], line 139, in run_model(train_dl, val_dl, model, device, lr, epochs, lossf, opt)
    136     loss_func = torch.nn.MSELoss()
    138 # run the training loop
--> 139 train_losses, val_losses = fit(
    140                             epochs, 
    141                             model, 
    142                             loss_func, 
    143                             optimizer, 
    144                             train_dl, 
    145                             val_dl, 
    146                             device)
    148 return train_losses, val_losses

Cell In[42], line 106, in fit(epochs, model, loss_func, opt, train_dl, val_dl, device, patience)
    103 # loop through epochs
    104 for epoch in range(epochs):
    105     # take a training step
--> 106     train_loss = train_step(model,train_dl,loss_func,device,opt)
    107     train_losses.append(train_loss)
    109     # take a validation step

Cell In[42], line 54, in train_step(model, train_dl, loss_func, device, opt)
     51 xb, yb =,
     53 # provide opt so backprop happens
---> 54 t, n = loss_batch(model, loss_func, xb, yb, opt=opt)
     56 # collect train loss and batch sizes
     57 tl.append(t)

Cell In[42], line 32, in loss_batch(model, loss_func, xb, yb, opt, verbose)
     29 # __FOOTNOTE 2__
     31 if opt is not None: # if opt
---> 32     loss.backward()
     33     opt.step()
     34     opt.zero_grad()

File /mnt/biostat/environments/parkj/dna2rna/lib/python3.11/site-packages/torch/, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    477 if has_torch_function_unary(self):
    478     return handle_torch_function(
    479         Tensor.backward,
    480         (self,),
    485         inputs=inputs,
    486     )
--> 487 torch.autograd.backward(
    488     self, gradient, retain_graph, create_graph, inputs=inputs
    489 )

File /mnt/biostat/environments/parkj/dna2rna/lib/python3.11/site-packages/torch/autograd/, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    195     retain_graph = create_graph
    197 # The reason we repeat same the comment below is that
    198 # some Python versions print out the first line of a multi-line function
    199 # calls in the traceback and some print out the last line
--> 200 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202     allow_unreachable=True, accumulate_grad=True)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Based on your code, self.conv_net is called in the forward method, and it is an empty nn.Sequential container. I'm not sure if the input will be directly returned in this case, but if so, it would explain the error.

Thank you for the reply. My input is written in a specific way, as it is a DNA sequence and its corresponding value:

```python
class SeqDatasetOHE(Dataset):
    """Dataset for one-hot-encoded sequences."""
    def __init__(self, df, seq_col, target_col):
        # +--------------------+
        # | Get the X examples |
        # +--------------------+
        # extract the DNA from the appropriate column in the df
        self.seqs = list(df[seq_col].values)
        self.seq_len = len(self.seqs[0])
        # one-hot encode sequences, then stack in a torch tensor
        self.ohe_seqs = torch.stack([torch.tensor(one_hot_encode(x)) for x in self.seqs])

        # +------------------+
        # | Get the Y labels |
        # +------------------+
        self.labels = torch.tensor(list(df[target_col].values)).unsqueeze(1)

    def __len__(self): return len(self.seqs)

    def __getitem__(self, idx):
        # Given an index, return a tuple of an X with its associated Y
        # This is called inside DataLoader
        seq = self.ohe_seqs[idx]
        label = self.labels[idx]
        return seq, label
```

#### building dataloader - batch size setting
Currently the batch size is 4096 and it is a fastai dataloader.

```python
## construct DataLoaders from Datasets.
def build_dataloaders(train_df, test_df, seq_col='seq', target_col='score',
                      batch_size=4096, shuffle=True):
    """Given a train and test df with some batch construction
    details, put them into custom SeqDatasetOHE() objects.
    Give the Datasets to the DataLoaders and return."""
    # Batch size – the number of samples in each batch.
    # Shuffle – whether you want the data to be reshuffled or not.
    # create Datasets
    train_ds = SeqDatasetOHE(train_df, seq_col=seq_col, target_col=target_col)
    test_ds = SeqDatasetOHE(test_df, seq_col=seq_col, target_col=target_col)

    # Put Datasets into DataLoaders
    train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=shuffle)
    test_dl = DataLoader(test_ds, batch_size=batch_size)

    return train_dl, test_dl

train_dl, val_dl = build_dataloaders(train_df, val_df)
```

Therefore I used xb.permute to arrange the data for the forward function. Would this be causing the error?

No, permute won't break the graph. Did you understand the concern in my previous post, and did you check the forward method? It seems no trainable parameters or layers are used at all, since you are calling into self.conv_net, not self.model.
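The behaviour is easy to confirm in isolation: an empty `nn.Sequential` acts as the identity, so its "output" is just the raw input, disconnected from every parameter (standalone sketch):

```python
import torch
import torch.nn as nn

empty = nn.Sequential()      # no layers registered
x = torch.randn(2, 8)
out = empty(x)               # identity: the input passes straight through

print(out is x)              # True: literally the same tensor
print(out.grad_fn)           # None: not connected to any parameter
```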

Thank you for the reply. Apologies for the misunderstanding, as I am new to PyTorch. I will try a new model and come back to the forum again!

Hi, I have this model:

```python
class FakeNews_Classifier(pl.LightningModule):

    def __init__(self, config: dict):
        super().__init__()
        self.config = config
        self.pretrained_model = AutoModel.from_pretrained(config['model_name'], return_dict=True)
        self.hidden = torch.nn.Linear(self.pretrained_model.config.hidden_size, self.pretrained_model.config.hidden_size)
        self.classifier = torch.nn.Linear(self.pretrained_model.config.hidden_size, self.config['n_labels'])
        self.loss_func = nn.BCEWithLogitsLoss(reduction='mean')
        self.dropout = nn.Dropout()

    def forward(self, input_ids, attention_mask, labels=None):
        # roberta layer
        output = self.pretrained_model(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = torch.mean(output.last_hidden_state, 1)
        # final logits
        pooled_output = self.dropout(pooled_output)
        pooled_output = self.hidden(pooled_output)
        pooled_output = F.relu(pooled_output)
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        # calculate loss
        loss = 0
        if labels is not None:
            loss = self.loss_func(logits.view(-1, self.config['n_labels']), labels.view(-1, self.config['n_labels']))
        return loss, logits

    def training_step(self, batch, batch_index):
        loss, outputs = self(**batch)
        self.log("train loss ", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": outputs, "labels": batch["labels"]}

    def validation_step(self, batch, batch_index):
        loss, outputs = self(**batch)
        self.log("validation loss ", loss, prog_bar=True, logger=True)
        return {"val_loss": loss, "predictions": outputs, "labels": batch["labels"]}

    def predict_step(self, batch, batch_index):
        loss, outputs = self(**batch)
        return outputs

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=self.config['lr'], weight_decay=self.config['weight_decay'])
        total_steps = self.config['train_size'] / self.config['batch_size']
        warmup_steps = math.floor(total_steps * self.config['warmup'])
        scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)
        return [optimizer], [scheduler]
```
and I got this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Can you please tell me how to fix the error? I am new and I'm trying my best to understand this; please help me.

Thanks, that solved the issue.


I read all the topics regarding this error:

  File "/opt/homebrew/Caskroom/miniforge/base/envs/vitmm310/lib/python3.10/site-packages/torch/autograd/", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

and I cannot match an answer to my case.

I got a Perceiver model from Hugging Face, defined as below:

```python
config = PerceiverConfig(d_model=self._token_size, num_labels=self._num_labels)
decoder = PerceiverClassificationDecoder(
    trainable_position_encoding_kwargs=dict(num_channels=config.d_latents, index_dims=1),
    # (remaining arguments truncated in the original post)
)
return PerceiverModel(config, decoder=decoder)
```

token_size = 800
num_labels = 7

As input I pass a tensor of shape [batch_size, 32, 800], and labels as a tensor of shape [batch_size, 7].

I made my training loop as follows:

```python
criterion = torch.nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=envi_builder.config.learning_rate)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=envi_builder.config.step_size, gamma=0.5)

for epoch in range(envi_builder.config.n_epochs):
    loop = tqdm(dataloader_train, leave=True)
    for (inputs, labels) in loop:

        inputs =
        labels =

        outputs = model(inputs=inputs)
        logits = outputs.logits

        loss = criterion(logits, labels)
```


For the loss tensor's attributes I see:

grad_fn = None
requires_grad = False
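One generic way to locate where the graph gets cut is to check `requires_grad`/`grad_fn` on the intermediate tensors; the first one showing `grad_fn=None` marks the break (illustrative sketch, not the Perceiver model itself):

```python
import torch

def report(name, t):
    # A tensor is still attached to the autograd graph iff it has a grad_fn
    # (or is a leaf with requires_grad=True).
    print(f"{name}: requires_grad={t.requires_grad}, grad_fn={t.grad_fn}")

w = torch.randn(3, requires_grad=True)
h = w * 2           # still attached to the graph
out = h.detach()    # simulates an accidental break

report("h", h)      # requires_grad=True, grad_fn=<MulBackward0 ...>
report("out", out)  # requires_grad=False, grad_fn=None -> graph cut here
```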

The Perceiver model was taken from Hugging Face:

```python
from transformers import PerceiverConfig, PerceiverModel
from transformers.models.perceiver.modeling_perceiver import (
    # (import list truncated in the original post)
```

Any ideas?


I added that before the epoch loop,

and the logits from the model started to have a grad_fn.

It seems that something wrong is going on inside the model's forward() in this part:

```python
        sequence_output = encoder_outputs[0]

        logits = None
        if self.decoder:
            if subsampled_output_points is not None:
                output_modality_sizes = {
                    "audio": subsampled_output_points["audio"].shape[0],
                    "image": subsampled_output_points["image"].shape[0],
                    "label": 1,
                }
            else:
                output_modality_sizes = modality_sizes
            decoder_query = self.decoder.decoder_query(
                inputs, modality_sizes, inputs_without_pos, subsampled_points=subsampled_output_points
            )
            decoder_outputs = self.decoder(
                ...  # (arguments truncated in the original post)
            )
            logits = decoder_outputs.logits
```

which is still something that I cannot understand.