Hi,
I tried the tutorial notebook on Text Classification. It works well. However, if I don't freeze any layers, there is a problem during the training step that I don't understand (what I mean by freezing is sketched right after the traceback). More specifically:
/usr/local/lib/python3.7/dist-packages/opacus/optimizers/optimizer.py in clip_and_accumulate(self)
397 g.view(len(g), -1).norm(2, dim=-1) for g in self.grad_samples
398 ]
--> 399 per_sample_norms = torch.stack(per_param_norms, dim=1).norm(2, dim=1)
400 per_sample_clip_factor = (self.max_grad_norm / (per_sample_norms + 1e-6)).clamp(
401 max=1.0
RuntimeError: stack expects each tensor to be equal size, but got [8] at entry 0 and [1] at entry
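For context, by "freeze" I mean something along these lines (a rough sketch; the layer names assume the BertForSequenceClassification model used in the tutorial, so adjust as needed):

# keep only the last encoder block and the classification head trainable;
# everything else, including the embeddings, stays frozen
for p in model.parameters():
    p.requires_grad = False
for layer in [model.bert.encoder.layer[-1], model.bert.pooler, model.classifier]:
    for p in layer.parameters():
        p.requires_grad = True

With that freezing in place training runs fine; without it I get the error above.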
Any idea? Thanks
Hello @long21wt
Thank you for reporting this. This is likely a bug in our tutorial. Do you mind sending us the full stack trace, along with our template Colab, and posting the link here?
Please paste your Colab link here. Remember: SET IT TO PUBLIC.
Thank you. Here is the link:
As far as I know, it seems like you would need to modify the forward method of BERT (see lxuechen/private-transformers: make differentially private training of transformers easy (github.com)).
RoBERTa, on the other hand, works out of the box with Opacus in other experiments.
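For example, something along these lines has worked in those experiments (a rough sketch with placeholder data and hyperparameters, not the tutorial code):

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import RobertaForSequenceClassification
from opacus import PrivacyEngine

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# dummy pre-tokenized batch, just to show the wiring
input_ids = torch.randint(0, model.config.vocab_size, (16, 64))
labels = torch.randint(0, 2, (16,))
train_loader = DataLoader(TensorDataset(input_ids, labels), batch_size=8)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

for input_ids, labels in train_loader:
    optimizer.zero_grad()
    loss = model(input_ids=input_ids, labels=labels).loss
    loss.backward()
    optimizer.step()

No per-sample-gradient shape error comes up here, presumably because RoBERTa builds its position ids from the input batch instead of broadcasting a single row.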
Best
Thanks for creating this. We are looking into it!
Hi,
After a while, I’m back to this issue. By printing the model’s parameters:
for n, p in model.named_parameters():
    print("{:50s} {}".format(n, list(p.grad_sample.shape) if hasattr(p, "grad_sample") else None))
I found that position_embeddings causes the problem for the optimizer: its grad_sample has a batch dimension of 1 while the other parameters have 7 (see the output below; the only stopgap I have found is sketched after it). Do you have any idea how to fix this?
_module.bert.embeddings.word_embeddings.weight [7, 28996, 768]
_module.bert.embeddings.position_embeddings.weight [1, 512, 768]
_module.bert.embeddings.token_type_embeddings.weight [7, 2, 768]
_module.bert.embeddings.LayerNorm.weight [7, 768]
_module.bert.embeddings.LayerNorm.bias [7, 768]
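The stopgap is to exclude that parameter from training by freezing it before wrapping the model with Opacus (this is just my workaround, not an official fix):

# freeze the offending parameter before the privacy engine wraps the model,
# so no per-sample gradient is computed or clipped for it
for n, p in model.named_parameters():
    if "position_embeddings" in n:
        p.requires_grad = False

This should avoid the stack error, since frozen parameters don't get a grad_sample, but of course the position embeddings are then not updated.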
I will try to take a look. In the meantime, I believe functorch can alleviate the issue because it computes per-sample gradients in a different way (using the "no_op" version of the grad sample module; see e.g. https://github.com/pytorch/opacus/blob/main/examples/cifar10.py).
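To illustrate what I mean (this is just the underlying functorch mechanism on a toy model, not the Opacus wiring from the example):

import torch
from functorch import make_functional, vmap, grad

model = torch.nn.Linear(4, 2)
fmodel, params = make_functional(model)

def loss_fn(params, x, y):
    # treat each example as a batch of one inside the vmapped function
    logits = fmodel(params, x.unsqueeze(0))
    return torch.nn.functional.cross_entropy(logits, y.unsqueeze(0))

x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))

# grad() differentiates w.r.t. params, vmap() maps that over the batch,
# so every parameter ends up with a per-sample gradient of shape [8, ...]
per_sample_grads = vmap(grad(loss_fn), in_dims=(None, 0, 0))(params, x, y)
for p, g in zip(params, per_sample_grads):
    print(p.shape, g.shape)

With the "no_op" grad sample module, Opacus does not compute grad_sample itself, so per-sample gradients computed this way need to be attached to the parameters (as p.grad_sample) before the optimizer step.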