How to use DataParallel with some layers fixed?

I’m trying to fine-tune a model on 2 GPUs. I want to freeze some layers and train the remaining parameters.
I wrapped the model in DataParallel to spread the computation over both GPUs, but only 1 GPU is actually used.

Here is part of my code:

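import torch

# Wrap the model for data parallelism on GPUs 0 and 1, then move it to the GPU
model_ft = torch.nn.DataParallel(model, device_ids=[0, 1])
model_ft = model_ft.cuda()

# model_ft.module is the wrapped model (same as list(model_ft.children())[0])
first_child = list(model_ft.module.children())[0]
fix = torch.nn.Sequential(*list(first_child.children())[:6]).parameters()   # first 6 sub-layers: frozen
base = torch.nn.Sequential(*list(first_child.children())[6:]).parameters()  # remaining sub-layers: fine-tuned
boost = torch.nn.Sequential(*list(model_ft.module.children())[1:]).parameters()  # later top-level layers: trained

# Freeze the first 6 sub-layers
for param in fix:
    param.requires_grad = False

# Only the unfrozen parameter groups are given to the optimizer
optimizer_ft = torch.optim.SGD([
    {'params': base, 'lr': 0.001},
    {'params': boost, 'lr': 0.001}
], lr=0.001, momentum=0)

# model, criterion, exp_lr_scheduler and train_model are defined elsewhere
model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epochs=70)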

Using ‘nvidia-smi’, I can see that only 1 GPU is occupied while the other one stays idle.
If I don’t freeze these layers, DataParallel works fine and both GPUs are used.
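Besides nvidia-smi, one way to check from inside PyTorch whether DataParallel really splits a batch is to wrap the model in a small probe that records the device and batch size seen by each forward call. Probe, SEEN and probe_split below are hypothetical helpers written for illustration, not part of PyTorch; on a machine without CUDA the wrapper just calls the module once on the CPU.

```python
import torch
import torch.nn as nn

# Module-level list, so every DataParallel replica appends to the same object.
SEEN = []


class Probe(nn.Module):
    """Hypothetical helper: records (device, batch size) for each forward call."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        SEEN.append((str(x.device), x.shape[0]))
        return self.inner(x)


def probe_split(model, batch_size=8, in_features=8):
    """Run one batch through DataParallel and return what each replica saw."""
    SEEN.clear()
    wrapped = Probe(model)
    if torch.cuda.is_available():
        wrapped = wrapped.cuda()
    wrapped = nn.DataParallel(wrapped)  # uses all visible GPUs by default
    x = torch.randn(batch_size, in_features)
    if torch.cuda.is_available():
        x = x.cuda()
    with torch.no_grad():
        wrapped(x)
    return sorted(SEEN)


if __name__ == "__main__":
    print(probe_split(nn.Linear(8, 4)))
```

With two working GPUs one would expect two entries, one per device, whose batch sizes add up to the full batch; a single entry suggests the input is not being scattered.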
