How to do transfer learning using CPU with a model trained using DataParallel?

yx0123 · August 11, 2020, 1:20pm

I found a pretrained model online that was trained using DataParallel and I want to retrain the final layer using my own dataset. I load the model and modify the last layer as follows:

model_ft = torch.load(base_model, map_location={'cuda:0': 'cpu'})
for param in model_ft.module.parameters():
    param.requires_grad = False
num_ftrs = model_ft.module.classifier.in_channels
features = list(model_ft.module.classifier.children())[:-1] # remove last layer
features.extend([nn.Conv2d(num_ftrs, n_classes*n_atom_outputs, kernel_size=1, bias=True)])
model_ft.module.classifier = nn.Sequential(*features)

When I run the code, I get the following error during forward propagation:

Traceback (most recent call last):
  File "code/train.py", line 157, in train_model
    outputs = model(inputs)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in __call__
    result = self.forward(*input, **kwargs)
…
…
…
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/parallel/_functions.py", line 94, in _get_stream
    if _streams[device] is None:
IndexError: list index out of range

I believe this is because the model was wrapped with DataParallel, but I do not have a GPU. How can I remove DataParallel from a pretrained model? Or is there a better way to resolve the error? Thank you!

user_123454321 · August 11, 2020, 2:04pm

Can you try this…

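# Assumes the file holds a state_dict saved from a DataParallel-wrapped model
model_dict = torch.load(model_path, map_location='cpu')
for (key, val) in list(model_dict.items()):
    if key.startswith('module.'):
        # Strip only the leading 'module.' prefix that DataParallel adds
        del model_dict[key]
        model_dict[key[len('module.'):]] = val
model.load_state_dict(model_dict)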

yx0123 · August 11, 2020, 2:19pm

I tried and got the following error:

  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 366, in __getattr__
    type(self).__name__, name))
AttributeError: 'DataParallel' object has no attribute 'items'

I also tried using

for (key, val) in list(model_dict.module.items()):

and got this error:

  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 366, in __getattr__
    type(self).__name__, name))
AttributeError: 'DPN' object has no attribute 'items'

user_123454321 · August 11, 2020, 2:22pm

Ah, so the model was not saved as a dictionary; the object itself was saved? Try this:

model = torch.load(model_path, map_location='cpu')
model = model.module
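To make the two cases above easier to tell apart, here is a minimal sketch that reports what kind of object was loaded and unwraps it only when it really is a DataParallel wrapper. The helper names describe_checkpoint and unwrap are made up for illustration, and a small nn.Linear stands in for the real checkpoint. On recent PyTorch releases, loading a fully pickled model with torch.load may also need weights_only=False; that is an assumption about your version, not something from this thread.

```python
import torch.nn as nn


def describe_checkpoint(obj):
    """Say what kind of object torch.load() returned (hypothetical helper)."""
    if isinstance(obj, nn.DataParallel):
        return "DataParallel-wrapped model"
    if isinstance(obj, nn.Module):
        return "plain model"
    if isinstance(obj, dict):
        return "dictionary (probably a state_dict)"
    return type(obj).__name__


def unwrap(model):
    """Return the underlying network if `model` is wrapped in DataParallel."""
    while isinstance(model, nn.DataParallel):
        model = model.module
    return model


# Stand-in for: loaded = torch.load(model_path, map_location='cpu')
loaded = nn.DataParallel(nn.Linear(4, 2))
print(describe_checkpoint(loaded))

model = unwrap(loaded)
print(describe_checkpoint(model))
```

Calling unwrap once, right after loading, means the rest of the training script never sees the wrapper, so there is no need to replace model with model.module in several places.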

yx0123 · August 11, 2020, 2:41pm

I have already manually replaced model_ft with model_ft.module in my code. I can access the model attributes, but the IndexError actually occurs during the forward propagation step.

user_123454321 · August 11, 2020, 3:08pm

Oh ok…
It works for a small module I tried…
Can you send me the model file?
If not, you can try opening it in Google Colab with a GPU and then saving it without the DataParallel wrapper. I know, not very elegant…
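A rough sketch of what "save it without the data parallel" could look like: save the state_dict of the inner module rather than the wrapper, then load it into a freshly built model on the CPU machine. The make_net function is a hypothetical stand-in for however the real architecture is constructed, and the temporary file path is only for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_net():
    # Hypothetical stand-in for the real architecture.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# On the machine where the wrapped model loads fine:
wrapped = nn.DataParallel(make_net())

# wrapped.state_dict() has keys prefixed with 'module.';
# wrapped.module.state_dict() does not, so save the inner module's weights.
path = os.path.join(tempfile.mkdtemp(), "weights_no_dataparallel.pth")
torch.save(wrapped.module.state_dict(), path)

# On the CPU-only machine: rebuild the architecture and load the weights.
cpu_model = make_net()
cpu_model.load_state_dict(torch.load(path, map_location="cpu"))
```

Saving only the weights also avoids pickling the wrapper object and its GPU device list along with the model.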

yx0123 · August 11, 2020, 3:20pm

Hi, you can download the model file from this link: https://s3.amazonaws.com/hello-tc/spacenet3-code/trained_models/model01.pth

Could you send me your code if it works with this model? Really appreciate your help!

This is the source code that was used to train the original models: https://github.com/SpaceNetChallenge/RoadDetector/blob/master/pfr-solution/code/train.py
I’m using the same code but edited it to load the pretrained model instead of the DPN model, and froze the first few layers of the pretrained model.

yx0123 · August 12, 2020, 2:11am

I tried this and it works! I guess when I manually replaced model with model.module, I missed some occurrences. Thanks!
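For anyone landing here later, the whole flow could be sketched roughly as follows: unwrap once after loading, freeze everything, replace the last layer of the classifier, and give the optimizer only the new parameters. TinyNet is a hypothetical stand-in for the pretrained network (it is not the real DPN), and n_new_outputs, the input sizes, and the MSE loss are placeholders chosen just to make the example run.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the pretrained network (not the real DPN)."""

    def __init__(self, n_out=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Conv2d(8, 8, kernel_size=1), nn.ReLU(),
            nn.Conv2d(8, n_out, kernel_size=1))

    def forward(self, x):
        return self.classifier(self.features(x))


# Stand-in for: loaded = torch.load(base_model, map_location='cpu')
loaded = nn.DataParallel(TinyNet())

# Unwrap once, so the forward pass never goes through DataParallel.
model = loaded.module if isinstance(loaded, nn.DataParallel) else loaded
model = model.cpu()

# Freeze all pretrained weights.
for param in model.parameters():
    param.requires_grad = False

# Replace the last classifier layer; the new layer is trainable by default.
n_new_outputs = 5  # placeholder for n_classes * n_atom_outputs
old_head = model.classifier[-1]
layers = list(model.classifier.children())[:-1]
layers.append(nn.Conv2d(old_head.in_channels, n_new_outputs,
                        kernel_size=1, bias=True))
model.classifier = nn.Sequential(*layers)

# Optimize only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

# One CPU training step on random placeholder data.
inputs = torch.randn(2, 3, 16, 16)
targets = torch.randn(2, n_new_outputs, 16, 16)
outputs = model(inputs)
loss = nn.functional.mse_loss(outputs, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The key point is that the forward pass is called on the unwrapped model, not on the DataParallel object that was loaded.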