Hello, I’m running the default implementation of Microsoft’s DeBERTa with only one GPU, and the implementation does not use nn.DataParallel, which differs from this issue.
Running command:
.DeBERTa/experiments/glue/sst2_large.sh
Stacktrace:
11/19/2020 14:50:11|INFO|SST-2|00| device=cuda, n_gpu=1, distributed training=False, world_size=1
11/19/2020 14:50:15|INFO|SST-2|00| Training batch size = 32
11/19/2020 14:50:15|INFO|SST-2|00| Num steps = 12627
/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py:151: UserWarning: Output 0 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:491.)
query_layer += self.transpose_for_scores(self.q_bias.unsqueeze(0).unsqueeze(0))
/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py:152: UserWarning: Output 2 of SplitBackward is a view and is being modified inplace. This view is an output of a function that returns multiple views. Inplace operators on such views are being deprecated and will be forbidden starting from version 1.8. Consider using `unsafe_` version of the function that produced this view or don't modify this view inplace. (Triggered internally at /pytorch/torch/csrc/autograd/variable.cpp:491.)
value_layer += self.transpose_for_scores(self.v_bias.unsqueeze(0).unsqueeze(0))
11/19/2020 14:50:15|ERROR|SST-2|00| Uncatched exception happened during execution.
Traceback (most recent call last):
File "/content/DeBERTa/DeBERTa/apps/train.py", line 448, in <module>
main(args)
File "/content/DeBERTa/DeBERTa/apps/train.py", line 255, in main
train_model(args, model, device, train_data, eval_data)
File "/content/DeBERTa/DeBERTa/apps/train.py", line 62, in train_model
trainer.train()
File "/content/DeBERTa/DeBERTa/training/trainer.py", line 136, in train
self._train_step(batch, bs_scale)
File "/content/DeBERTa/DeBERTa/training/trainer.py", line 201, in _train_step
loss, sub_size = self.loss_fn(self, self.model, sub)
File "/content/DeBERTa/DeBERTa/apps/train.py", line 58, in loss_fn
_, loss = model(**data)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/apps/sequence_classification.py", line 44, in forward
position_ids=position_ids, output_all_encoded_layers=True)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/deberta.py", line 120, in forward
output_all_encoded_layers=output_all_encoded_layers, return_att = return_att)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 187, in forward
output_states = layer_module(next_kv, attention_mask, return_att, query_states = query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 132, in forward
query_states=query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/bert.py", line 84, in forward
self_output = self.self(hidden_states, attention_mask, return_att, query_states=query_states, relative_pos=relative_pos, rel_embeddings=rel_embeddings)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/DeBERTa/DeBERTa/deberta/disentangled_attention.py", line 152, in forward
value_layer += self.transpose_for_scores(self.v_bias.unsqueeze(0).unsqueeze(0))
RuntimeError: diff_view_meta->output_nr_ == 0 INTERNAL ASSERT FAILED at "/pytorch/torch/csrc/autograd/variable.cpp":363, please report a bug to PyTorch.
Observation: changing to torch==1.13.0 resolves the issue.
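For anyone who cannot change the torch version: based on the warnings above, a possible workaround (my own sketch, not an official fix from the repo) is to replace the in-place `+=` on the tensors returned by `split()` in `disentangled_attention.py` with out-of-place additions. The shapes and bias tensors below are illustrative, not DeBERTa's real ones; only the pattern matters.

```python
import torch

# Minimal sketch of the pattern (names mirror disentangled_attention.py,
# but the shapes here are illustrative, not DeBERTa's real ones).
qkv = torch.randn(1, 4, 12, requires_grad=True)
query_layer, key_layer, value_layer = torch.split(qkv, 4, dim=-1)

q_bias = torch.zeros(4)
v_bias = torch.zeros(4)

# In-place adds on views returned by split() are what trigger the
# UserWarning / internal assert:
#     query_layer += q_bias
#     value_layer += v_bias
# Out-of-place adds create fresh tensors and avoid modifying the views:
query_layer = query_layer + q_bias
value_layer = value_layer + v_bias
```

The out-of-place form allocates a new tensor instead of writing into a view of a multi-output op, which is exactly what the deprecation warning asks for.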