How can one optimize only part of a DataParallel model, while preserving the data-parallel behaviour in an appropriate way?
For example:
parallel_model = DataParallel(model)
optimizer = torch.optim.SGD(parallel_model.last_conv_layer.parameters(), lr=0.01)
This raises the error:
torch.nn.modules.module.ModuleAttributeError: 'DataParallel' object has no attribute 'last_conv_layer'
And if one uses parallel_model.module.last_conv_layer.parameters()
to obtain the trainable weights, would DataParallel still function as expected?
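To make the question concrete, here is a minimal sketch of the pattern being asked about. The model class and the layer names (`Net`, `features`, `last_conv_layer`) are hypothetical stand-ins; `DataParallel` does store the wrapped network in its `.module` attribute, so the submodule has to be reached through it:

```python
import torch
import torch.nn as nn

# Hypothetical small model with a `last_conv_layer` attribute,
# mirroring the layer name used in the question above.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.last_conv_layer = nn.Conv2d(8, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.last_conv_layer(torch.relu(self.features(x)))

model = Net()
parallel_model = nn.DataParallel(model)

# DataParallel keeps a reference to the wrapped network in `.module`,
# so the submodule's parameters are reached through it:
optimizer = torch.optim.SGD(
    parallel_model.module.last_conv_layer.parameters(), lr=0.01
)

# The parameters obtained this way are the very same tensor objects
# as in the unwrapped model (DataParallel does not copy the module;
# it re-replicates it from `.module` on every forward pass):
assert all(
    p_wrapped is p_plain
    for p_wrapped, p_plain in zip(
        parallel_model.module.last_conv_layer.parameters(),
        model.last_conv_layer.parameters(),
    )
)
```

Since the optimizer holds the same tensors that `DataParallel` replicates on each forward, updating only these parameters should not interfere with the data-parallel replication itself.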
Thanks in advance.