Speed up model transformation to DistributedDataParallel

Hi, I have a model defined as the following:

Doc2VecModel(
  (word_embeddings): Embedding(23026, 400)
  (doc_embeddings): Embedding(23026, 400)
  (linear): Linear(in_features=400, out_features=23026, bias=True)
  (linear2): Linear(in_features=800, out_features=400, bias=True)
  (log_softmax): LogSoftmax(dim=1)
)

The program takes a long time to execute when wrapping the model in DDP, i.e.:

DDP(model, device_ids=[gpu_id])

However, if I reduce the size of the model to something like this:

Doc2VecModel(
  (word_embeddings): Embedding(400, 400)
  (doc_embeddings): Embedding(400, 400)
  (linear): Linear(in_features=400, out_features=400, bias=True)
  (linear2): Linear(in_features=400, out_features=400, bias=True)
  (log_softmax): LogSoftmax(dim=1)
)

then it executes almost instantly. Is there a way to speed up the model transformation for the first model?

In DDP initialization the model parameters are broadcast to all ranks in the DDP group, so there is not much way around that. The initialization is a one-time cost, so if training runs over multiple hours, the initialization will be a minimal fraction of the total runtime.
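To illustrate the amortization argument with hypothetical numbers (the 30-second init cost and 4-hour training run below are assumptions for the sketch, not measurements of your model):

```python
# Hypothetical numbers showing how a one-time DDP init cost amortizes
# over a long training run.
init_seconds = 30.0          # assumed one-time DDP wrap / broadcast cost
train_seconds = 4 * 3600.0   # assumed 4-hour training run

ratio = init_seconds / (init_seconds + train_seconds)
print(f"init share of total runtime: {ratio:.2%}")  # well under 1%
```

The longer the run, the smaller that share gets, which is why the init cost usually isn't worth optimizing.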

The first model has 23026 num_embeddings and out_features, which is about 58 times larger than model 2's 400, so it makes sense that the second model is much faster. It would be like initializing model 2 about 58 times.
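The gap can be made concrete by counting parameters from the printed layer shapes. A quick sketch (the per-dimension ratio 23026/400 is about 58x; the total parameter ratio works out to roughly 44x, since only some layers scale with the vocabulary size):

```python
# Count parameters of the two Doc2VecModel configurations
# from the layer shapes printed above.
def embedding_params(num_embeddings, dim):
    # an Embedding holds a num_embeddings x dim weight matrix
    return num_embeddings * dim

def linear_params(in_features, out_features):
    # weight matrix plus bias vector
    return in_features * out_features + out_features

model1 = (
    embedding_params(23026, 400)    # word_embeddings
    + embedding_params(23026, 400)  # doc_embeddings
    + linear_params(400, 23026)     # linear
    + linear_params(800, 400)       # linear2
)
model2 = (
    embedding_params(400, 400) * 2  # both embeddings
    + linear_params(400, 400)       # linear
    + linear_params(400, 400)       # linear2
)
print(model1, model2, model1 / model2)
# → 27974626 640800 ~43.7x more parameters to broadcast
```

So DDP has tens of millions of extra parameters to broadcast for the first model, which accounts for the slower wrap.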