Hi folks.
I successfully managed to use Hugging Face transformers with PyTorch on a single GPU.
Now I'm trying to use multiple GPUs with DataParallel. When wrapped in DataParallel, my model's printout begins as follows:
DataParallel(
  (module): DataParallel(
    (module): CustomTransformerModel(
      (transformer): RobertaForSequenceClassification(
        (roberta): RobertaModel(
          (embeddings): RobertaEmbeddings(
            (word_embeddings): Embedding(50265, 768, padding_idx=1)
            (position_embeddings): Embedding(514, 768, padding_idx=1)
            (token_type_embeddings): Embedding(1, 768)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
[...]
etc.
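For reference, a nested structure like the one above is what you get when DataParallel ends up applied twice. Here is a minimal sketch that reproduces that printout (roberta-base and the trivial forward are just placeholders, not my actual training code):

import torch.nn as nn
from transformers import RobertaForSequenceClassification

class CustomTransformerModel(nn.Module):
    # Thin wrapper around the Hugging Face model (simplified)
    def __init__(self):
        super().__init__()
        self.transformer = RobertaForSequenceClassification.from_pretrained("roberta-base")

    def forward(self, input_ids, attention_mask=None, labels=None):
        return self.transformer(input_ids, attention_mask=attention_mask, labels=labels)

model = CustomTransformerModel()
model = nn.DataParallel(model)  # first wrap
model = nn.DataParallel(model)  # a second wrap reproduces the nested printout above
print(model)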
Whereas when NOT wrapped in DataParallel, the model looks like this:
CustomTransformerModel(
  (transformer): RobertaForSequenceClassification(
    (roberta): RobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(50265, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
[...]
etc.
When I try to reach the transformer attribute while the model is wrapped in DataParallel, I run into the (predictable) error: AttributeError: 'DataParallel' object has no attribute 'transformer'.
I ran a quick search and found this thread: How to reach model attributes wrapped by nn.DataParallel?
but model.module.transformer still raises AttributeError: 'DataParallel' object has no attribute 'transformer'.
The thread is old though (2017).
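To make the failing accesses concrete (a minimal sketch; model is the DataParallel object whose printout is shown at the top):

model.transformer         # AttributeError: 'DataParallel' object has no attribute 'transformer'
model.module.transformer  # same AttributeError, presumably because model.module is itself
                          # a DataParallel, matching the nested printout above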
How can I reach the transformer attribute with PyTorch 1.3.1? Thanks!!