How can one optimize only part of a DataParallel model, while preserving the data-parallel behaviour in an appropriate way?
For example:
parallel_model = DataParallel(model)
optimizer = torch.optim.SGD(parallel_model.last_conv_layer.parameters(), lr=0.01)
This raises the error:
torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'last_conv_layer'
And if one uses parallel_model.module.last_conv_layer.parameters()
to obtain the trainable weights, would DataParallel still function as expected?
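To make the question concrete, here is a minimal sketch of the pattern being asked about. The model class and the layer names (`Net`, `features`, `last_conv_layer`) are hypothetical stand-ins; `DataParallel` does store the wrapped network in its `.module` attribute, so the submodule has to be reached through it:

```python
import torch
import torch.nn as nn

# Hypothetical small model with a `last_conv_layer` attribute,
# mirroring the layer name used in the question above.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.last_conv_layer = nn.Conv2d(8, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.last_conv_layer(torch.relu(self.features(x)))

model = Net()
parallel_model = nn.DataParallel(model)

# DataParallel keeps a reference to the wrapped network in `.module`,
# so the submodule's parameters are reached through it:
optimizer = torch.optim.SGD(
    parallel_model.module.last_conv_layer.parameters(), lr=0.01
)

# The parameters obtained this way are the very same tensor objects
# as in the unwrapped model (DataParallel does not copy the module;
# it re-replicates it from `.module` on every forward pass):
assert all(
    p_wrapped is p_plain
    for p_wrapped, p_plain in zip(
        parallel_model.module.last_conv_layer.parameters(),
        model.last_conv_layer.parameters(),
    )
)
```

Since the optimizer holds the same tensors that `DataParallel` replicates on each forward, updating only these parameters should not interfere with the data-parallel replication itself.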
Thanks in advance.