AttributeError: 'LambdaLR' object has no attribute 'param_groups' in deberta-v3-large

AttributeError Traceback (most recent call last)
in ()
1 for fold in range(5):
----> 2 training_loop(train_df.iloc[0:500,:],fold)
3

4 frames
in training_loop(df, fold)
30 for epoch in range(Cfg.epochs):
31 criterion=nn.CrossEntropyLoss(weight=weights.to(torch.float16).to(device),reduction='none')
---> 32 loss,preds,labels=train_fn(model,train_loader,scheduler,optimizer,criterion)
33 print(f'training loss for {fold}th fold and {epoch+1} epoch is {loss}')
34 train_score=pearsonr(preds,labels)[0]

in train_fn(model, train_loader, optimizer, scheduler, criterion)
24 scaler.scale(loss).backward()
25 if (batch_idx+1)%Cfg.accumulation_steps==0 or (batch_idx+1)==len(train_loader):
---> 26 scaler.step(optimizer)
27 scaler.update()
28 optimizer.zero_grad()

/usr/local/lib/python3.7/dist-packages/torch/cuda/amp/grad_scaler.py in step(self, optimizer, *args, **kwargs)
332
333 if optimizer_state["stage"] is OptState.READY:
--> 334 self.unscale_(optimizer)
335
336 assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."

/usr/local/lib/python3.7/dist-packages/torch/cuda/amp/grad_scaler.py in unscale_(self, optimizer)
277 found_inf = torch.full((1,), 0.0, dtype=torch.float32, device=self._scale.device)
278
--> 279 optimizer_state["found_inf_per_device"] = self.unscale_grads(optimizer, inv_scale, found_inf, False)
280 optimizer_state["stage"] = OptState.UNSCALED
281

/usr/local/lib/python3.7/dist-packages/torch/cuda/amp/grad_scaler.py in unscale_grads(self, optimizer, inv_scale, found_inf, allow_fp16)
200 per_device_and_dtype_grads = defaultdict(lambda: defaultdict(list)) # type: ignore[var-annotated]
201 with torch.no_grad():
→ 202 for group in optimizer.param_groups:
203 for param in group["params"]:
204 if param.grad is None:

AttributeError: 'LambdaLR' object has no attribute 'param_groups'

Based on the error message, it seems you are passing the scheduler to scaler.step() instead of the optimizer.
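
For reference, the usual AMP pattern with gradient accumulation keeps the two objects under separate names and only ever passes the optimizer to scaler.step() (a minimal sketch reusing names from your snippet such as Cfg.accumulation_steps; compute_loss is a hypothetical helper, not your exact code):

for batch_idx, batch in enumerate(train_loader):
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch, criterion)  # hypothetical helper: forward pass + loss
    scaler.scale(loss / Cfg.accumulation_steps).backward()
    if (batch_idx + 1) % Cfg.accumulation_steps == 0 or (batch_idx + 1) == len(train_loader):
        scaler.step(optimizer)   # must be the optimizer object, never the LR scheduler
        scaler.update()
        optimizer.zero_grad()
        scheduler.step()         # advance the LR schedule separately, after the optimizer update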

Thanks for the reply!
As you can see in the code, I am passing scaler.step(optimizer) at line 26.

The code doesn't show how optimizer is created, so my guess is that you are overwriting it with the scheduler, as seen here:

# works
p = nn.Parameter(torch.randn(1).cuda())
optimizer = torch.optim.Adam([p], lr=1e-3)

scaler = torch.cuda.amp.GradScaler()

loss = p * 2
scaler.scale(loss).backward()
scaler.step(optimizer) # works

# breaks
p = nn.Parameter(torch.randn(1).cuda())
optimizer = torch.optim.Adam([p], lr=1e-3)
optimizer = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

scaler = torch.cuda.amp.GradScaler()

loss = p * 2
scaler.scale(loss).backward()
scaler.step(optimizer)
# AttributeError: 'LambdaLR' object has no attribute 'param_groups'
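
Keeping the scheduler under its own name avoids the error (a minimal sketch; when exactly you call scheduler.step() depends on whether your schedule is per-step or per-epoch):

# fixed: the scheduler gets its own name, only the optimizer goes into scaler.step()
p = nn.Parameter(torch.randn(1).cuda())
optimizer = torch.optim.Adam([p], lr=1e-3)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

scaler = torch.cuda.amp.GradScaler()

loss = p * 2
scaler.scale(loss).backward()
scaler.step(optimizer)  # the optimizer, not the scheduler
scaler.update()
scheduler.step()        # advance the learning-rate schedule separately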

Thanks for the reply!

Sorry if I misunderstood your comment "The code doesn't show what optimizer is": are you asking which optimizer I am using, or are you referring to something else?
I am sure that I am not confusing the scheduler with the optimizer as in your example (optimizer = torch.optim.Adam([p], lr=1e-3) followed by optimizer = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)).
FYI, I am using the AdamW optimizer with get_cosine_schedule_with_warmup().

That’s interesting. Could you post a minimal, executable code snippet which would reproduce the issue, please?
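
Something along these lines would already be enough to show where the scheduler ends up; a hypothetical skeleton that swaps deberta-v3-large for a tiny stand-in model, assuming AdamW plus get_cosine_schedule_with_warmup as you describe:

import torch
import torch.nn as nn
from transformers import get_cosine_schedule_with_warmup

device = "cuda"
model = nn.Linear(10, 1).to(device)  # tiny stand-in for deberta-v3-large
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = get_cosine_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=100)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(4, 10, device=device)
y = torch.randn(4, 1, device=device)

with torch.cuda.amp.autocast():
    loss = nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()
scaler.step(optimizer)  # passing the scheduler here instead reproduces the AttributeError
scaler.update()
scheduler.step()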