RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED in GPU

Hello, I’m running the default implementation of Microsoft’s DeBERTa with only one GPU. The implementation does not use nn.DataParallel, so this differs from this issue.

Running command:
./DeBERTa/experiments/glue/sst2_large.sh

Stacktrace:

11/19/2020 14:50:11|INFO|SST-2|00| device=cuda, n_gpu=1, distributed training=False, world_size=1
11/19/2020 14:50:15|INFO|SST-2|00|   Training batch size = 32
11/19/2020 14:50:15|INFO|SST-2|00|   Num steps = 12627
/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py:151: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  /pytorch/torch/csrc/autograd/variable.cpp:491.)
  query_layer += self.transpose_for_scores(self.q_bias.unsqueeze(0).unsqueeze(0))
/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py:152: UserWarning: Output 2 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at  /pytorch/torch/csrc/autograd/variable.cpp:491.)
  value_layer += self.transpose_for_scores(self.v_bias.unsqueeze(0).unsqueeze(0))
11/19/2020 14:50:15|ERROR|SST-2|00| Uncatched exception happened during execution.
Traceback (most recent call last):
  File "/content/DeBERTa/DeBERTa/apps/train.py", line 448, in <module>
    main(args)
  File "/content/DeBERTa/DeBERTa/apps/train.py", line 255, in main
    train_model(args, model, device, train_data, eval_data)
  File "/content/DeBERTa/DeBERTa/apps/train.py", line 62, in train_model
    trainer.train()
  File "/content/DeBERTa/DeBERTa/training/trainer.py", line 136, in train
    self._train_step(batch, bs_scale)
  File "/content/DeBERTa/DeBERTa/training/trainer.py", line 201, in _train_step
    loss, sub_size = self.loss_fn(self, self.model, sub)
  File "/content/DeBERTa/DeBERTa/apps/train.py", line 58, in loss_fn
    _, loss = model(**data)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/DeBERTa/DeBERTa/apps/sequence_classification.py", line 44, in forward
    position_ids=position_ids, output_all_encoded_layers=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/DeBERTa/DeBERTa/deberta/deberta.py", line 120, in forward
    output_all_encoded_layers=output_all_encoded_layers, return_att = return_att)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 187, in forward
    output_states = layer_module(next_kv, attention_mask, return_att, query_states = query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 132, in forward
    query_states=query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 84, in forward
    self_output = self.self(hidden_states, attention_mask, return_att, query_states=query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py", line 152, in forward
    value_layer += self.transpose_for_scores(self.v_bias.unsqueeze(0).unsqueeze(0))
RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.
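
For context, the two UserWarnings earlier in the log point at the likely cause: query_layer and value_layer are views returned by a multi-output op (a split), and the in-place += modifies those views, which recent PyTorch versions reject. Below is a minimal, hypothetical sketch of the same pattern together with an out-of-place workaround; the tensor names and shapes are made up for illustration, and this is not the DeBERTa code or an official fix.

import torch

# Hypothetical shapes; the point is only that split() returns multiple views.
x = torch.randn(2, 4, 6, requires_grad=True)
qkv = x * 1.0                              # differentiable intermediate
query, key, value = qkv.split(2, dim=-1)   # each output is a view of qkv
bias = torch.randn(2, requires_grad=True)

# An in-place update on such a view is what the warnings above flag; in the
# report it ends in the internal assert, so it is left commented out here:
# query += bias

# Out-of-place workaround: create a new tensor instead of mutating the view.
query = query + bias

loss = (query * value).sum()
loss.backward()                            # backprop works with the out-of-place add
print(x.grad.shape, bias.grad.shape)
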

Note: Changing to torch==1.13.0 solves the issue.

PyTorch 1.13 hasn’t been released yet :stuck_out_tongue: Which version raises this error, and is 1.3 the working fallback?

Sorry, that was a typo. I’ve tested with 1.7, 1.6, and 1.5, and the error persists. With 1.3.0 I was able to run it.

Thanks for the update. Could you create an issue on GitHub so that we can track and fix it, please?