Reaching `transformer` attribute while model is wrapped in DataParallel

Hi folks.

I successfully managed to use Hugging Face Transformers with PyTorch on a single GPU.

Now I’m trying to use multiple GPUs with DataParallel. While wrapped in DataParallel, my model begins as follows:

DataParallel(
  (module): DataParallel(
    (module): CustomTransformerModel(
      (transformer): RobertaForSequenceClassification(
        (roberta): RobertaModel(
          (embeddings): RobertaEmbeddings(
            (word_embeddings): Embedding(50265, 768, padding_idx=1)
            (position_embeddings): Embedding(514, 768, padding_idx=1)
            (token_type_embeddings): Embedding(1, 768)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
[...]
etc..

Whereas when NOT wrapped in DataParallel, the model looks like this:

CustomTransformerModel(
  (transformer): RobertaForSequenceClassification(
    (roberta): RobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(50265, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
[...]
etc

When I try to reach the transformer while it is wrapped in DataParallel, I run into the (predictable) error: AttributeError: 'DataParallel' object has no attribute 'transformer'.

I ran a quick search and found this thread: How to reach model attributes wrapped by nn.DataParallel?

but model.module.transformer still raises AttributeError: 'DataParallel' object has no attribute 'transformer'.

The thread is old though (2017).

How can I reach the transformer attribute with PyTorch 1.3.1? Thanks!

Looks like you have applied the DataParallel class twice to the model, so the inner module is itself another DataParallel wrapper. That is why model.module.transformer fails: model.module is still a DataParallel, and you need one more .module hop to get back to your CustomTransformerModel.
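
A minimal sketch of how you could reach it, assuming the double wrap came from calling nn.DataParallel on an already wrapped model (the names below are just for illustration):

import torch.nn as nn

# Two DataParallel layers means two .module hops before your own class.
transformer = model.module.module.transformer

# A slightly more robust option is to peel off wrappers until the
# underlying model is reached, however many times it was wrapped:
def unwrap(m):
    while isinstance(m, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        m = m.module
    return m

transformer = unwrap(model).transformer

Better yet, wrap the model only once; then the usual model.module.transformer from the linked thread works as expected.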