Hello, I’m running the default implementation of Microsoft’s DeBERTa with only one GPU, and the implementation does not use nn.DataParallel, which differs from this issue.
Running command:
.DeBERTa/experiments/glue/sst2_large.sh
Stacktrace:
11/19/2020 14:50:11|INFO|SST-2|00| device=cuda, n_gpu=1, distributed training=False, world_size=1
11/19/2020 14:50:15|INFO|SST-2|00| Training batch size = 32
11/19/2020 14:50:15|INFO|SST-2|00| Num steps = 12627
/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py:151: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:491.)
query_layer += self.transpose_for_scores(self.q_bias.unsqueeze(0).unsqueeze(0))
/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py:152: UserWarning: Output 2 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:491.)
value_layer += self.transpose_for_scores(self.v_bias.unsqueeze(0).unsqueeze(0))
11/19/2020 14:50:15|ERROR|SST-2|00| Uncatched exception happened during execution.
Traceback (most recent call last):
File "/content/DeBERTa/DeBERTa/apps/train.py", line 448, in <module>
main(args)
File "/content/DeBERTa/DeBERTa/apps/train.py", line 255, in main
train_model(args, model, device, train_data, eval_data)
File "/content/DeBERTa/DeBERTa/apps/train.py", line 62, in train_model
trainer.train()
File "/content/DeBERTa/DeBERTa/training/trainer.py", line 136, in train
self._train_step(batch, bs_scale)
File "/content/DeBERTa/DeBERTa/training/trainer.py", line 201, in _train_step
loss, sub_size = self.loss_fn(self, self.model, sub)
File "/content/DeBERTa/DeBERTa/apps/train.py", line 58, in loss_fn
_, loss = model(**data)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/apps/sequence_classification.py", line 44, in forward
position_ids=position_ids, output_all_encoded_layers=True)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/deberta.py", line 120, in forward
output_all_encoded_layers=output_all_encoded_layers, return_att = return_att)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 187, in forward
output_states = layer_module(next_kv, attention_mask, return_att, query_states = query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 132, in forward
query_states=query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 84, in forward
self_output = self.self(hidden_states, attention_mask, return_att, query_states=query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py", line 152, in forward
value_layer += self.transpose_for_scores(self.v_bias.unsqueeze(0).unsqueeze(0))
RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.
Observation: changing to torch==1.13.0 resolves the issue.
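For anyone who cannot change the torch version: based on the warnings above, a possible workaround (my own sketch, not an official fix from the repo) is to replace the in-place `+=` on the tensors returned by `split()` in `disentangled_attention.py` with out-of-place additions. The shapes and bias tensors below are illustrative, not DeBERTa's real ones; only the pattern matters.

```python
import torch

# Minimal sketch of the pattern (names mirror disentangled_attention.py,
# but the shapes here are illustrative, not DeBERTa's real ones).
qkv = torch.randn(1, 4, 12, requires_grad=True)
query_layer, key_layer, value_layer = torch.split(qkv, 4, dim=-1)

q_bias = torch.zeros(4)
v_bias = torch.zeros(4)

# In-place adds on views returned by split() are what trigger the
# UserWarning / internal assert:
#     query_layer += q_bias
#     value_layer += v_bias
# Out-of-place adds create fresh tensors and avoid modifying the views:
query_layer = query_layer + q_bias
value_layer = value_layer + v_bias
```

The out-of-place form allocates a new tensor instead of writing into a view of a multi-output op, which is exactly what the deprecation warning asks for.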