How to load a model on a single GPU that was trained with DataParallel?

It always says that there is a mismatch in the layers. I guess the model is not being loaded entirely onto one GPU.

For DataParallel models, it’s recommended to store model.module.state_dict(), as explained here.
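A minimal sketch of that idea, assuming the wrapped network lives in the wrapper's .module attribute (as it does for nn.DataParallel). The helper name unwrapped_state_dict is hypothetical, not part of PyTorch:

```python
def unwrapped_state_dict(model):
    """Return the state_dict of the underlying network.

    nn.DataParallel keeps the original network in its `.module` attribute.
    A model that is not wrapped has no such attribute, so getattr falls
    back to the model itself and the keys never get a "module." prefix.
    """
    inner = getattr(model, "module", model)
    return inner.state_dict()
```

You would then save it with something like torch.save(unwrapped_state_dict(model), "checkpoint.pth") (the file name is just an example). One caveat: if your own unwrapped model happens to have a submodule named module, getattr will pick that up instead, so in that case check isinstance(model, nn.DataParallel) explicitly.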
If you’ve stored model.state_dict() instead, every key will start with the "module." prefix, which you would have to remove as described here.
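A minimal sketch of removing the prefix, assuming the checkpoint is the plain state_dict (an ordered mapping of parameter names to tensors). The helper strip_module_prefix is hypothetical:

```python
from collections import OrderedDict

def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key.

    Keys that do not start with the prefix are kept unchanged, so calling
    this on a checkpoint saved without DataParallel is harmless.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            key = key[len(prefix):]
        cleaned[key] = value
    return cleaned
```

Usage would look roughly like state_dict = torch.load("checkpoint.pth", map_location="cpu") followed by model.load_state_dict(strip_module_prefix(state_dict)), where model is the plain, unwrapped network on the single GPU (or CPU).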

I did exactly this, but when I load the model it still says there is a mismatch.

Could you post the error message, please?

Error(s) in loading state_dict for Cyclic:
Missing key(s) in state_dict
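To see which keys actually differ, a minimal sketch is to compare the key sets directly. The helper diff_state_dict_keys is hypothetical; it only assumes you can list the model's expected keys and the checkpoint's keys:

```python
def diff_state_dict_keys(expected_keys, loaded_keys):
    """Compare the keys a model expects with the keys found in a checkpoint.

    Returns (missing, unexpected), both sorted:
    - missing: keys the model expects but the checkpoint lacks
    - unexpected: keys in the checkpoint that the model does not define
    """
    expected, loaded = set(expected_keys), set(loaded_keys)
    missing = sorted(expected - loaded)
    unexpected = sorted(loaded - expected)
    return missing, unexpected
```

For example, diff_state_dict_keys(model.state_dict().keys(), checkpoint.keys()). If every unexpected key is just a missing key with "module." in front, the prefix was not stripped; if the names differ in other ways, the Cyclic definition used for loading probably differs from the one used for training. PyTorch's model.load_state_dict(checkpoint, strict=False) also returns the missing_keys and unexpected_keys, which gives the same information.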