PyTorch How to optimize only part of a DataParallel model?

How can one optimize only part of a DataParallel model while preserving the data-parallel behaviour?

For example:

parallel_model = DataParallel(model)
optimizer = torch.optim.SGD(parallel_model.last_conv_layer.parameters(), lr=0.01)
# torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'last_conv_layer'

And if one uses parallel_model.module.last_conv_layer.parameters() to obtain the trainable weights, would DataParallel still function as expected?

Thanks in advance.

Yes, accessing the underlying layers via the .module attribute will work, since DataParallel stores the wrapped model there. Alternatively, you could create the optimizer before wrapping the model in nn.DataParallel; the wrapper does not copy the parameters, so the optimizer still references the same tensors.
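A minimal sketch of both options (the SmallNet / last_conv names are illustrative stand-ins for the model in the question):

```python
import torch
import torch.nn as nn


class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 8, 3, padding=1)
        self.last_conv = nn.Conv2d(8, 1, 1)  # the only layer we want to train

    def forward(self, x):
        return self.last_conv(self.features(x))


model = SmallNet()
parallel_model = nn.DataParallel(model)

# Option 1: reach through the .module attribute after wrapping.
optimizer = torch.optim.SGD(
    parallel_model.module.last_conv.parameters(), lr=0.01
)

# Option 2: build the optimizer from the unwrapped model before (or after)
# wrapping; DataParallel keeps a reference to the original module, so these
# are the exact same parameter tensors.
optimizer2 = torch.optim.SGD(model.last_conv.parameters(), lr=0.01)

# DataParallel does not copy the model; .module is the original object.
assert parallel_model.module is model
```

During the backward pass, DataParallel reduces the gradients from the replicas onto the parameters of the original module, so either optimizer updates the intended weights.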