Speed up model transformation to DistributedDataParallel

Hi, I have a model defined as the following:

Doc2VecModel(
  (word_embeddings): Embedding(23026, 400)
  (doc_embeddings): Embedding(23026, 400)
  (linear): Linear(in_features=400, out_features=23026, bias=True)
  (linear2): Linear(in_features=800, out_features=400, bias=True)
  (log_softmax): LogSoftmax(dim=1)
)

The program takes a long time to execute when wrapping the model in DDP, i.e.:

DDP(model, device_ids=[gpu_id])

However, if I reduce the size of the model to something like this:

Doc2VecModel(
  (word_embeddings): Embedding(400, 400)
  (doc_embeddings): Embedding(400, 400)
  (linear): Linear(in_features=400, out_features=400, bias=True)
  (linear2): Linear(in_features=400, out_features=400, bias=True)
  (log_softmax): LogSoftmax(dim=1)
)

then it executes almost instantly. Is there a way to speed up the model transformation for the first model?

In DDP initialization the model parameters are broadcast to all ranks in the DDP group, so there is not much way around that. The initialization is a one-time cost, so if training runs over multiple hours, the initialization will be a minimal fraction of the total runtime.
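To illustrate the amortization argument with hypothetical numbers (the 30-second init cost and 4-hour training run below are assumptions for the sketch, not measurements of your model):

```python
# Hypothetical numbers showing how a one-time DDP init cost amortizes
# over a long training run.
init_seconds = 30.0          # assumed one-time DDP wrap / broadcast cost
train_seconds = 4 * 3600.0   # assumed 4-hour training run

ratio = init_seconds / (init_seconds + train_seconds)
print(f"init share of total runtime: {ratio:.2%}")  # well under 1%
```

The longer the run, the smaller that share gets, which is why the init cost usually isn't worth optimizing.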

The first model has 23026 num_embeddings and out_features, which is about 58 times larger than model 2's 400, so it makes sense that the second model is much faster. It would be like initializing model 2 about 58 times.
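The gap can be made concrete by counting parameters from the printed layer shapes. A quick sketch (the per-dimension ratio 23026/400 is about 58x; the total parameter ratio works out to roughly 44x, since only some layers scale with the vocabulary size):

```python
# Count parameters of the two Doc2VecModel configurations
# from the layer shapes printed above.
def embedding_params(num_embeddings, dim):
    # an Embedding holds a num_embeddings x dim weight matrix
    return num_embeddings * dim

def linear_params(in_features, out_features):
    # weight matrix plus bias vector
    return in_features * out_features + out_features

model1 = (
    embedding_params(23026, 400)    # word_embeddings
    + embedding_params(23026, 400)  # doc_embeddings
    + linear_params(400, 23026)     # linear
    + linear_params(800, 400)       # linear2
)
model2 = (
    embedding_params(400, 400) * 2  # both embeddings
    + linear_params(400, 400)       # linear
    + linear_params(400, 400)       # linear2
)
print(model1, model2, model1 / model2)
# → 27974626 640800 ~43.7x more parameters to broadcast
```

So DDP has tens of millions of extra parameters to broadcast for the first model, which accounts for the slower wrap.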