I use
    for p in model.parameters():
        p.requires_grad = False
to freeze a T5 model, but when I print the parameters that require grad, there is still one parameter of size 32000x512. What is this? Is it the embedding matrix? Should I freeze it too? It seems the backward pass still computes gradients for this one remaining parameter.
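A minimal diagnostic sketch in plain PyTorch: iterating over `named_parameters()` instead of `parameters()` prints the name of whatever is still trainable, which usually identifies the module directly. The toy `ModuleDict` is a hypothetical stand-in for a T5 model, not its real architecture. The helper names `freeze_all` and `trainable_params` are illustrative. Replacing the embedding after the freeze loop is only one assumed way a trainable parameter can survive, and it is not offered as the cause in this case.

```python
import torch.nn as nn


def freeze_all(model: nn.Module) -> None:
    """Disable gradients for every parameter currently reachable via model.parameters()."""
    for p in model.parameters():
        p.requires_grad = False


def trainable_params(model: nn.Module):
    """Return (name, shape) for each parameter that still requires grad."""
    return [(name, tuple(p.shape)) for name, p in model.named_parameters() if p.requires_grad]


# Hypothetical stand-in for a T5-like model: a 32000x512 embedding plus a linear layer.
toy = nn.ModuleDict({
    "shared": nn.Embedding(32000, 512),
    "proj": nn.Linear(512, 512),
})

freeze_all(toy)
before = trainable_params(toy)  # nothing is trainable right after the loop
print(before)

# Illustrative assumption: a module created *after* freezing starts with requires_grad=True,
# so the freeze loop has to be re-run (or the new parameter frozen explicitly).
toy["shared"] = nn.Embedding(32000, 512)
after = trainable_params(toy)
print(after)  # the parameter name points at the module that is still trainable
```

Printing names this way shows which module the 32000x512 tensor belongs to, which is more informative than the shape alone.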