I want to train only the last fc layer of my pretrained CNN model, which is wrapped in DistributedDataParallel.
I tried putting the whole model in eval mode and then switching only the fc layer back to train mode:
model.module.eval()
model.module.fc.train()
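To show what those two calls do in isolation, here is a minimal sketch without DistributedDataParallel or apex. `TinyNet` is a hypothetical stand-in for my model; only the attribute name `fc` comes from my real code, and because there is no DDP wrapper here, the calls go on `model` directly instead of `model.module`. As far as I understand, `eval()` and `train()` only toggle each module's `training` flag and leave `requires_grad` untouched, which the sketch checks.

```python
import torch.nn as nn


# Hypothetical stand-in for the pretrained CNN (not my real architecture).
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3),
            nn.BatchNorm2d(8),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(x)


model = TinyNet()
model.eval()      # sets training=False on the model and every submodule
model.fc.train()  # sets training=True on fc only

# Per-module train/eval flag (the root module has an empty name, so skip it).
modes = {name: m.training for name, m in model.named_modules() if name}

# Parameters that will still receive gradients in backward().
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

print(modes)
print(trainable)
```

In this sketch only `fc` ends up in train mode, but every parameter, including the conv and batch norm ones, still has `requires_grad=True`.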
and I got the following error message:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/app/train_action_model_apex.py", line 466, in main_worker
    train_model(args, root_dir)
  File "/app/train_action_model_apex.py", line 235, in train_model
    trainer.train_epoch(epoch, use_amp=True)
  File "/app/trainers/action_model_trainer.py", line 202, in train_epoch
    self.optimize_model(loss_dict[self.update_loss_name], use_amp)
  File "/app/trainers/action_model_trainer.py", line 68, in optimize_model
    scaled_loss.backward()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/apex/amp/handle.py", line 117, in scale_loss
    yield (loss.float())*loss_scale
  File "/app/trainers/action_model_trainer.py", line 68, in optimize_model
    scaled_loss.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: expected scalar type Half but found Float
How can I fix this properly?