Hi folks.
I successfully managed to use Hugging Face transformers with PyTorch on a single GPU. Now I'm trying to use multiple GPUs with DataParallel. While wrapped in DataParallel, my model begins as follows:
DataParallel(
(module): DataParallel(
(module): CustomTransformerModel(
(transformer): RobertaForSequenceClassification(
(roberta): RobertaModel(
(embeddings): RobertaEmbeddings(
(word_embeddings): Embedding(50265, 768, padding_idx=1)
(position_embeddings): Embedding(514, 768, padding_idx=1)
(token_type_embeddings): Embedding(1, 768)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
[...]
etc.
Whereas when NOT wrapped in DataParallel, the model looks like this:
CustomTransformerModel(
(transformer): RobertaForSequenceClassification(
(roberta): RobertaModel(
(embeddings): RobertaEmbeddings(
(word_embeddings): Embedding(50265, 768, padding_idx=1)
(position_embeddings): Embedding(514, 768, padding_idx=1)
(token_type_embeddings): Embedding(1, 768)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
[...]
etc.
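For context, wrapping the model in DataParallel follows the usual pattern (a minimal sketch with placeholder names, not my exact training code):

import torch
import torch.nn as nn

model = CustomTransformerModel()    # the RoBERTa-based model printed above
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate the model across the available GPUs
model.to("cuda")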
When I try to access the transformer attribute while the model is wrapped in DataParallel, I run into the (predictable) error: AttributeError: 'DataParallel' object has no attribute 'transformer'.
I ran a quick search and found this thread: How to reach model attributes wrapped by nn.DataParallel? Still, model.module.transformer raises AttributeError: 'DataParallel' object has no attribute 'transformer'.
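In other words, the failing access looks like this (a minimal sketch; model is the DataParallel-wrapped object whose print-out is shown at the top):

transformer = model.module.transformer
# -> AttributeError: 'DataParallel' object has no attribute 'transformer'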
The thread is old, though (2017). How can I access the transformer attribute with PyTorch 1.3.1? Thanks!!