Neither of the posted code snippets is properly formatted, minimal, or executable, so please fix them and repost in case you are stuck and need some debugging help.
The third one up is a download of a Jupyter Notebook file. You can copy and paste it into a .ipynb file and load it into Jupyter Notebook. You can then step through it and hopefully see the exception.
The very last one is the dump of the error stack; I was hoping an expert could look at the stack and tell me what caused the problem.
The root cause of your original error is unknown, and we would need a minimal and executable code snippet to debug it further.
The new issue is caused by a wrong target shape in the loss calculation:
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size = 2
nb_classes = 10
output = torch.randn(batch_size, nb_classes, requires_grad=True)
target = torch.randint(0, nb_classes, (batch_size,))

# works: target holds class indices with shape [batch_size]
loss = criterion(output, target)

# fails since the target now has the wrong shape [batch_size, 1]
loss = criterion(output, target.unsqueeze(1))
# RuntimeError: 0D or 1D target tensor expected, multi-target not supported
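In other words, nn.CrossEntropyLoss expects class indices of dtype torch.long with shape [batch_size] here; target.unsqueeze(1) turns that into [batch_size, 1], which the criterion rejects as an unsupported multi-target input.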
Thanks. I think I accidentally posted one of my debug runs instead of the error stack from the code that AK presented.
The loss.grad_fn = None is coming back from cross_entropy().
I cannot figure out what "tensor[0]" it is referring to.
Thanks so much for the help.
The correct error stack for AK's code is as follows.
RuntimeError                              Traceback (most recent call last)
Cell In[11], line 30
     27 for p in parameters:
     28     p.grad = None
---> 30 loss.backward()
     32 # update
     33 lr = 0.1 if i < 10000 else 0.01

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    512 if has_torch_function_unary(self):
    513     return handle_torch_function(
    514         Tensor.backward,
    515         (self,),
   (...)
    520         inputs=inputs,
    521     )
--> 522 torch.autograd.backward(
    523     self, gradient, retain_graph, create_graph, inputs=inputs
    524 )

File ~\AppData\Local\Programs\Python\Python312\Lib\site-packages\torch\autograd\__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    261     retain_graph = create_graph
    263 # The reason we repeat the same comment below is that
    264 # some Python versions print out the first line of a multi-line function
    265 # calls in the traceback and some print out the last line
--> 266 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    267     tensors,
    268     grad_tensors,
    269     retain_graph,
    270     create_graph,
    271     inputs,
    272     allow_unreachable=True,
    273     accumulate_grad=True,
    274 )

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
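For anyone hitting this later: "element 0 of tensors" is the first tensor handed to backward(), i.e. loss itself, and the error means loss has no grad_fn. Below is a minimal sketch (my own toy repro, not AK's code) of how this happens when nothing in the computation requires gradients:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(2, 10)            # requires_grad defaults to False
target = torch.randint(0, 10, (2,))
loss = criterion(logits, target)
print(loss.grad_fn)                    # None -> loss is detached from any graph
loss.backward()
# RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn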
Hi, I am also getting the same error. This is the model architecture:
import torch.nn as nn
from transformers import AutoModel
from peft import PeftConfig, PeftModel

class LLM(nn.Module):
    def __init__(self, device='cuda'):
        super().__init__()
        self.peft_model_name = 'castorini/repllama-v1-7b-lora-passage'  # or 'castorini/repllama-v1-7b-lora-doc'
        self.base_model_name = '/Llama-2-7b-hf/snapshots/8cca527612d856d7d32bd94f8103728d614eb852'
        # self.tokenizer = AutoTokenizer.from_pretrained(self.base_model_name)
        self.model = self.get_model()
        self.device = device
        self.model = self.model.to(self.device)

    def get_model(self):
        # load the base model and merge the LoRA adapter weights into it
        config = PeftConfig.from_pretrained(self.peft_model_name)
        base_model = AutoModel.from_pretrained(self.base_model_name)
        model = PeftModel.from_pretrained(base_model, self.peft_model_name)
        model = model.merge_and_unload()
        return model

    def forward(self, _input):
        _outputs = self.model(input_ids=_input["input_ids"], attention_mask=_input["attention_mask"])
        return _outputs
On this model, I am just extracting the last hidden state and passing it to the cross-entropy loss.
I checked that I am not using torch.no_grad() or .eval(), but I am still getting the same error.
[rank0]:   trainer.train()
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
[rank0]:   return inner_training_loop(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
[rank0]:   tr_loss_step = self.training_step(model, inputs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 3250, in training_step
[rank0]:   self.accelerator.backward(loss)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py", line 2130, in backward
[rank0]:   self.scaler.scale(loss).backward(**kwargs)
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
[rank0]:   torch.autograd.backward(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
[rank0]:   _engine_run_backward(
[rank0]: File "/opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
[rank0]:   return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[rank0]: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
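One thing that may be worth checking in this setup (a hedged suggestion, not a confirmed fix): whether the merged model still has trainable parameters and whether the loss tensor is attached to the autograd graph. A quick sanity check, assuming model is the LLM instance above and loss is the tensor passed to backward():

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(len(trainable), "trainable parameters")   # 0 would explain the error above
print(loss.requires_grad, loss.grad_fn)         # grad_fn is None when no input to the loss requires grad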
Hello! Did you manage to resolve this issue? I am encountering the same problem too.