Hi, I’m using AllenNLP to do distributed BERT training.
In their code, the model has some custom methods, e.g. `get_metrics` and `get_regularization_penalty`. After wrapping it with DDP, there is a comment that says:
# Using `DistributedDataParallel`(ddp) brings in a quirk wrt AllenNLP's `Model` interface and its
# usage. A `Model` object is wrapped by `ddp`, but assigning the wrapped model to `self.model`
# will break the usages such as `Model.get_regularization_penalty`, `Model.get_metrics`, etc.
# Hence a reference to Pytorch's object is maintained in the case of distributed training and in the
# normal case, reference to `Model` is retained. This reference is only used in
# these places: `model.__call__`, `model.train` and `model.eval`.
My question is: what is the relationship between `self.model` and its wrapped version, `self._pytorch_model`?
An AllenNLP `Model` instance (call it `m`) is itself a regular `torch.nn.Module`. In distributed training, `ddp_m = DistributedDataParallel(m)` wraps it and keeps a reference to it as `ddp_m.module`.
Now, AllenNLP doesn’t want to special-case every call site with `self.model.module if isinstance(self.model, DDP) else self.model`, so the trainer keeps `self.model` pointing at the plain model `m` (so `get_metrics`, `get_regularization_penalty`, etc. keep working), and stores the DDP wrapper as `self._pytorch_model`, which is only used for the forward pass and for `train()`/`eval()`.
So in the distributed case you should find that `trainer._pytorch_model.module is trainer.model` returns `True`; they are indeed the very same object.
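To make the reference structure concrete, here is a minimal pure-Python sketch of the pattern (no torch needed to run it). `DDPWrapper` and `Trainer` below are hypothetical stand-ins for `torch.nn.parallel.DistributedDataParallel` and AllenNLP’s trainer, not the real classes:

```python
class Model:
    """Stand-in for AllenNLP's Model (itself a torch.nn.Module subclass)."""
    def __call__(self, x):
        return x * 2  # pretend forward pass

    def get_metrics(self):
        # AllenNLP-specific API that a DDP wrapper would NOT expose
        return {"accuracy": 1.0}


class DDPWrapper:
    """Stand-in for DistributedDataParallel: holds the model as .module."""
    def __init__(self, module):
        self.module = module

    def __call__(self, x):
        # real DDP also synchronizes gradients across ranks here
        return self.module(x)


class Trainer:
    """Stand-in for AllenNLP's trainer, keeping both references."""
    def __init__(self, model, distributed=False):
        # always the plain Model: metrics, regularization penalty, etc.
        self.model = model
        # forward passes go through this one so DDP hooks fire
        self._pytorch_model = DDPWrapper(model) if distributed else model


trainer = Trainer(Model(), distributed=True)
assert trainer._pytorch_model.module is trainer.model  # same object
trainer.model.get_metrics()        # AllenNLP API still reachable
trainer._pytorch_model(3)          # forward goes through the wrapper
```

In the non-distributed case (`distributed=False`), both attributes point at the very same `Model` object, which is why the rest of the trainer code never has to branch on `isinstance(..., DDP)`.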
Yes, in the above sense: going through DDP just adds one extra level of hierarchy (the `.module` attribute).