No gradient in layers in the text classification tutorial


I was running the text classification tutorial exactly as in opacus/building_text_classifier.ipynb (pytorch/opacus on GitHub, master branch), but I get the following error when I try to train:

AttributeError: The following layers do not have gradients: ['module.bert.encoder.layer.11.attention.self.query.weight', 'module.bert.encoder.layer.11.attention.self.query.bias', 'module.bert.encoder.layer.11.attention.self.key.weight', 'module.bert.encoder.layer.11.attention.self.key.bias', 'module.bert.encoder.layer.11.attention.self.value.weight', 'module.bert.encoder.layer.11.attention.self.value.bias', 'module.bert.encoder.layer.11.attention.output.dense.weight', 'module.bert.encoder.layer.11.attention.output.dense.bias', 'module.bert.encoder.layer.11.attention.output.LayerNorm.weight', 'module.bert.encoder.layer.11.attention.output.LayerNorm.bias', 'module.bert.encoder.layer.11.intermediate.dense.weight', 'module.bert.encoder.layer.11.intermediate.dense.bias', 'module.bert.encoder.layer.11.output.dense.weight', 'module.bert.encoder.layer.11.output.dense.bias', 'module.bert.encoder.layer.11.output.LayerNorm.weight', 'module.bert.encoder.layer.11.output.LayerNorm.bias', 'module.bert.pooler.dense.weight', 'module.bert.pooler.dense.bias', 'module.classifier.weight', 'module.classifier.bias']. Are you sure they were included in the backward pass?

Could someone help me understand why this is happening?
I’m on Ubuntu and am using Python 3.8.5.


Based on cell 8, it seems you are freezing some layers and training only the others:

trainable_layers = [model.bert.encoder.layer[-1], model.bert.pooler, model.classifier]
total_params = 0
trainable_params = 0

for p in model.parameters():
    p.requires_grad = False
    total_params += p.numel()

for layer in trainable_layers:
    for p in layer.parameters():
        p.requires_grad = True
        trainable_params += p.numel()

print(f"Total parameters count: {total_params}") # ~108M
print(f"Trainable parameters count: {trainable_params}") # ~7M

So I would assume that the frozen parameters do not have valid gradients.
However, I’m unsure where this message is raised from and whether it is actually an error, so could you explain the issue a bit more?
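To illustrate the assumption about frozen parameters, here is a minimal sketch in plain PyTorch. The two-layer model is a hypothetical stand-in for the BERT classifier, not the tutorial's model; it only shows that parameters with requires_grad = False end up with no .grad after backward().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in: layer 0 plays the frozen encoder, layer 2 the trainable classifier.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Freeze everything, then unfreeze only the last layer (same pattern as cell 8).
for p in model.parameters():
    p.requires_grad = False
for p in model[2].parameters():
    p.requires_grad = True

loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Record which parameters received a gradient in the backward pass.
grad_status = {name: p.grad is not None for name, p in model.named_parameters()}
for name, has_grad in grad_status.items():
    print(f"{name}: has grad = {has_grad}")
```

Only the unfrozen layer's weight and bias should report a gradient here.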

Hmm makes sense. The issue arises when virtual_step() is called:
… in
… line 282, in virtual_step
… line 435, in virtual_step
… line 179, in clip_and_accumulate
… line 263, in _named_grad_samples
where the error is thrown

I’m unsure what virtual_step() does; I assume it comes from a third-party library?
Do you know if this method expects all .grad attributes to be set? If so, could you filter out the frozen parameters when passing them to the optimizer?
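The filtering suggested above could look roughly like the sketch below. It uses a small hypothetical model and plain torch.optim.SGD, not the tutorial's setup, and whether it has any effect on the Opacus error is not established in this thread.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; layer 0 is frozen, layer 2 stays trainable.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.1)

# Count the scalar parameters the optimizer actually manages.
n_optimized = sum(p.numel() for group in optimizer.param_groups for p in group["params"])
print(f"Parameters seen by the optimizer: {n_optimized}")
```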

Hi @anna_l !
Thanks for your question and for taking interest in opacus.

I’d need some more info to be able to help, as I wasn’t able to reproduce the issue in my setup.

  • Can you please share which versions of transformers and opacus you are using?
  • Does the error happen on the first training iteration or later?
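One way to collect the version information asked for above is the standard-library importlib.metadata module (available from Python 3.8). The helper name installed_version is just an illustrative choice, and the package names are assumed to match their PyPI names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or a placeholder if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


# Package names assumed to be the PyPI distribution names.
versions = {name: installed_version(name) for name in ("torch", "transformers", "opacus")}
for name, version in versions.items():
    print(f"{name}: {version}")
```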

To comment on some of the discussion points above:

  • virtual_step() is a method defined in PrivacyEngine in opacus. It is a way to simulate large batches without a heavy memory footprint.
  • In our tutorial we indeed freeze some layers, as correctly pointed out. However, the error above lists trainable layers as not having gradients, which is not what should happen. (e.g. bert.encoder.layer.11 is bert.encoder.layer[-1])
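For readers unfamiliar with simulating large batches, the general idea can be sketched with ordinary gradient accumulation in plain PyTorch. This is not the Opacus implementation and it leaves out the differential-privacy parts (the traceback above shows virtual_step() calling clip_and_accumulate); the model, data and micro-batch size are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: one logical batch of 8 samples, processed 2 at a time.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss(reduction="sum")
data, target = torch.randn(8, 4), torch.randn(8, 1)

# Reference: gradient of the mean loss computed on the full batch at once.
reference = torch.autograd.grad(loss_fn(model(data), target) / len(data), model.weight)[0]

micro_batches = list(zip(data.split(2), target.split(2)))
optimizer.zero_grad()
for i, (x, y) in enumerate(micro_batches):
    # Scale by the logical batch size so the accumulated sum equals the full-batch mean.
    loss = loss_fn(model(x), y) / len(data)
    loss.backward()  # gradients add up in .grad across micro-batches
    if i + 1 == len(micro_batches):
        accumulated = model.weight.grad.clone()
        optimizer.step()  # a single real update for the whole logical batch
        optimizer.zero_grad()

print(torch.allclose(accumulated, reference, atol=1e-6))
```

Only one micro-batch of activations is held in memory at a time, which is where the memory saving comes from.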

Hi @ffuuugor, pardon the slow reply. It happens on the first training iteration, and my transformers version is 4.6.1.

Sorry, but I’m still having trouble reproducing the issue.
I’ve tried multiple package versions (opacus 0.13, 0.14, master), but none produce the error you’ve described.

Can you maybe share a Colab notebook with the error to help find the reason?

PS: While investigating this we found and fixed a fairly bad memory inefficiency, so thanks for pointing us that way :)