Neither of the posted code snippets is properly formatted, minimal, or executable, so please fix them and repost in case you are stuck and need some debugging help.
The third one up is a download of a Jupyter Notebook file. You can copy and paste it into a .ipynb file and load it into Jupyter Notebook. You can then step through it and hopefully see the exception.
The very last one is the dump of the error stack; I was hoping an expert could look at the stack and tell me what caused the problem.
The root cause of your original error is unknown, and we would need a minimal and executable code snippet to debug it further.
The new issue is caused by a wrong target shape in the loss calculation:
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size = 2
nb_classes = 10
output = torch.randn(batch_size, nb_classes, requires_grad=True)
target = torch.randint(0, nb_classes, (batch_size,))

# works: target holds class indices with shape [batch_size]
loss = criterion(output, target)

# fails since the target now has the wrong shape [batch_size, 1]
loss = criterion(output, target.unsqueeze(1))
# RuntimeError: 0D or 1D target tensor expected, multi-target not supported
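In other words, nn.CrossEntropyLoss expects class indices of dtype torch.long with shape [batch_size] here; target.unsqueeze(1) turns that into [batch_size, 1], which the criterion rejects as an unsupported multi-target input.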
Thanks. I think I accidentally posted one of my debug runs instead of the error stack from the code that AK presented.
The loss.grad_fn = None is coming back from cross_entropy().
I cannot figure out what "tensor[0]" it is referring to.
Thanks so much for the help.
The correct error stack for AK's code is as follows.
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 30
     27 for p in parameters:
     28     p.grad = None
---> 30 loss.backward()
     32 # update
     33 lr = 0.1 if i < 10000 else 0.01

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    512 if has_torch_function_unary(self):
    513     return handle_torch_function(
    514         Tensor.backward,
    515         (self,),
   (...)
    520         inputs=inputs,
    521     )
--> 522 torch.autograd.backward(
    523     self, gradient, retain_graph, create_graph, inputs=inputs
    524 )

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\autograd\__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    261     retain_graph = create_graph
    263 # The reason we repeat the same comment below is that
    264 # some Python versions print out the first line of a multi-line function
    265 # calls in the traceback and some print out the last line
--> 266 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    267     tensors,
    268     grad_tensors,
    269     retain_graph,
    270     create_graph,
    271     inputs,
    272     allow_unreachable=True,
    273     accumulate_grad=True,
    274 )

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
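For anyone hitting this later: "element 0 of tensors" is the first tensor handed to backward(), i.e. loss itself, and the error means loss has no grad_fn. Below is a minimal sketch (my own toy repro, not AK's code) of how this happens when nothing in the computation requires gradients:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(2, 10)            # requires_grad defaults to False
target = torch.randint(0, 10, (2,))
loss = criterion(logits, target)
print(loss.grad_fn)                    # None -> loss is detached from any graph
loss.backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn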
Hi, I am also getting the same error. This is the model architecture:
import torch.nn as nn
from transformers import AutoModel
from peft import PeftConfig, PeftModel

class LLM(nn.Module):
    def __init__(self, device='cuda'):
        super().__init__()
        self.peft_model_name = 'castorini/repllama-v1-7b-lora-passage'  # or 'castorini/repllama-v1-7b-lora-doc'
        self.base_model_name = '/Llama-2-7b-hf/snapshots/8cca527612d856d7d32bd94f8103728d614eb852'
        # self.tokenizer = AutoTokenizer.from_pretrained(self.base_model_name)
        self.model = self.get_model()
        self.device = device
        self.model = self.model.to(self.device)

    def get_model(self):
        # load the base model and merge the LoRA adapter weights into it
        config = PeftConfig.from_pretrained(self.peft_model_name)
        base_model = AutoModel.from_pretrained(self.base_model_name)
        model = PeftModel.from_pretrained(base_model, self.peft_model_name)
        model = model.merge_and_unload()
        return model

    def forward(self, _input):
        _outputs = self.model(input_ids=_input["input_ids"], attention_mask=_input["attention_mask"])
        return _outputs
On this model, I am just extracting the last hidden state and passing it to the cross-entropy loss.
I checked that I am not using torch.no_grad() or .eval(), but I am still getting the same error.
[rank0]:   trainer.train()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
[rank0]:   return inner_training_loop(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]:   tr_loss_step = self.training_step(model, inputs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 3250, in training_step
[rank0]:   self.accelerator.backward(loss)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 2130, in backward
[rank0]:   self.scaler.scale(loss).backward(**kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
[rank0]:   torch.autograd.backward(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank0]:   _engine_run_backward(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]:   return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank0]: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
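One thing that may be worth checking in this setup (a hedged suggestion, not a confirmed fix): whether the merged model still has trainable parameters and whether the loss tensor is attached to the autograd graph. A quick sanity check, assuming model is the LLM instance above and loss is the tensor passed to backward():

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "trainable parameters")   # 0 would explain the error above
print(loss.requires_grad, loss.grad_fn)         # grad_fn is None when no input to the loss requires grad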
Hello! Did you manage to resolve this issue? I am encountering the same problem too.