Missing key error when training with DistributedDataParallel

It seems you saved the state_dict from a single-GPU model and are now loading it into your DDP model. DDP wraps the original model, so all of its submodules live under .module, and every key in its state_dict carries a module. prefix,
e.g. self.model.module.backbone._conv_stem.
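
A quick way to confirm the mismatch is to compare the keys on both sides (a minimal sketch; ddp_model and state_dict stand in for your DDP-wrapped model and your loaded checkpoint):

```python
# Keys of the DDP-wrapped model carry the "module." prefix:
print(list(ddp_model.state_dict().keys())[:3])
# -> ['module.backbone._conv_stem.weight', ...]

# Keys saved from the unwrapped single-GPU model do not:
print(list(state_dict.keys())[:3])
# -> ['backbone._conv_stem.weight', ...]
```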
I'd recommend loading the state_dict into the wrapped module instead:
self.model.module.load_state_dict(state_dict)
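
Putting it together, a minimal sketch (assuming the checkpoint was saved with torch.save(model.state_dict(), ...) from the single-GPU run; "checkpoint.pth" is a placeholder path):

```python
import torch

# self.model is assumed to already be DDP-wrapped, e.g.
#   self.model = DistributedDataParallel(base_model, device_ids=[local_rank])

state_dict = torch.load("checkpoint.pth", map_location="cpu")

# Load into the underlying module (self.model.module), whose keys
# have no "module." prefix and therefore match the saved state_dict.
self.model.module.load_state_dict(state_dict)
```

Equivalently, you could rewrite the checkpoint keys to add the module. prefix and load into self.model directly, but going through .module is simpler.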

You can find more details in this thread.