I think I found the cause of the weird error (see below) that I get when trying to freeze layers with the method shown below (a context manager also fails to freeze them): it happens when using DataParallel (model = nn.DataParallel(model)) across multiple GPUs. I've been running my model on 2 identical GPUs (GTX 1080), and when I tried to freeze weights I got the error shown at the bottom. When I don't apply DataParallel and just use a single GPU, freezing layers works like a charm, as shown in countless examples across the internet and on this forum, but as soon as DataParallel is used it throws a weird error that doesn't make sense to me, since it implies the layers I'm trying to freeze are not leaf variables.
Could anyone help me understand what is happening and whether this is a bug in PyTorch? (Regardless of whether it is or not, I'd like to know if there is a workaround.)
    for param in model.parameters():
        param.requires_grad = False
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn’t require differentiation use var_no_grad = var.detach().
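For now, the only workaround I can think of is to freeze the parameters while they are still plain leaf tensors, i.e. before wrapping the model in DataParallel, or by reaching through the wrapper via model.module. This is just a rough sketch with a placeholder model (MyModel stands in for my actual network), not something I've verified on my exact setup:

    import torch
    import torch.nn as nn

    model = MyModel()        # placeholder for my actual model
    model.cuda()

    # freeze everything while the parameters are still ordinary leaf tensors
    for param in model.parameters():
        param.requires_grad = False

    # only wrap in DataParallel afterwards
    model = nn.DataParallel(model)

    # alternatively, if the model is already wrapped, go through the underlying module
    for param in model.module.parameters():
        param.requires_grad = False

Would this be the recommended approach, or is there a cleaner way to freeze layers while keeping DataParallel?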