Below is the output with torch.autograd.set_detect_anomaly(True) enabled:
[W python_anomaly_mode.cpp:104] Warning: Error detected in IndexBackward. Traceback of forward call that caused the error:
File "/home/experiment.py", line 481, in <module>
experiment('gym-experiment', variant=vars(args))
File "/home/experiment.py", line 438, in experiment
outputs = trainer.train_iteration(num_steps=variant['num_steps_per_iter'], iter_num=iter+1, print_logs=True)
File "/home/training/trainer.py", line 31, in train_iteration
train_loss = self.train_step()
File "/home/training/seq_trainer.py", line 24, in train_step
action_preds = action_preds.reshape(-1, act_dim)[attention_mask.reshape(-1) > 0]
(function _print_stack)
28%|█████████████████████████ | 2819/10000 [07:29<19:03, 6.28it/s]
Traceback (most recent call last):
File "/home/experiment.py", line 481, in <module>
experiment('gym-experiment', variant=vars(args))
File "/home/experiment.py", line 438, in experiment
outputs = trainer.train_iteration(num_steps=variant['num_steps_per_iter'], iter_num=iter+1, print_logs=True)
File "/home/training/trainer.py", line 31, in train_iteration
train_loss = self.train_step()
File "/home/training/seq_trainer.py", line 119, in train_step
loss.backward()
File "/home/anaconda3/envs/project_3_7/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/anaconda3/envs/project_3_7/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: Function 'IndexBackward' returned nan values in its 0th output.
But I am not able to infer much from this: action_preds only becomes NaN after the model parameters themselves have already become NaN. On further debugging, I found that the first parameter to contain NaN values is:

transformer.h0.ln_1.weight

which corresponds to (ln_1): LayerNorm((128,), eps=1e-05, elementwise_affine=True) in my model configuration. The base model I am using is GPT2.
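For reference, I located it with a parameter scan along these lines (a minimal sketch; the helper name first_nan_param is just illustrative, not taken from my actual code):

import torch

def first_nan_param(model):
    # Walk the parameters in registration order and report the first
    # one that contains a NaN or inf entry.
    for name, param in model.named_parameters():
        if not torch.isfinite(param).all():
            return name
    return None

# Run after every optimizer step; the first hit was transformer.h0.ln_1.weight.
bad = first_nan_param(model)
if bad is not None:
    print(f"NaN/inf first appeared in: {bad}")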
Based on these observations, can you suggest what might be wrong, and how I can compute this loss without running into NaNs?
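For context, the failing section of seq_trainer.py looks roughly like this. The masked indexing line and loss.backward() are taken verbatim from the traceback above; the act_dim / action_target lines and the MSE loss are filled in as assumptions about the surrounding code:

# Sketch of the failing train_step; the anomaly detector flags the
# boolean-mask indexing (IndexBackward) once action_preds holds NaNs.
act_dim = action_preds.shape[2]

# Flatten (batch, seq_len, act_dim) and keep only timesteps where the
# attention mask is set -- seq_trainer.py line 24 in the traceback.
action_preds = action_preds.reshape(-1, act_dim)[attention_mask.reshape(-1) > 0]
action_target = action_target.reshape(-1, act_dim)[attention_mask.reshape(-1) > 0]

# MSE over the valid (unmasked) action predictions -- assumed loss.
loss = torch.mean((action_preds - action_target) ** 2)
loss.backward()  # seq_trainer.py line 119, where the RuntimeError is raised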