You’re not using an optimizer with momentum, by chance?
For those, the momentum will cause updates even when the gradients are zero.
(It’s probably also doing funny things to the optimizer’s statistics, but with dropout we rarely think about that too much.)
As a trick, you can (at least you could the last time I checked) set the gradients to None instead of just zeroing them; the optimizer will then skip those parameters rather than update them.
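
To make that concrete, here is a minimal sketch of the trick (the model and the choice of which parameter to freeze are made up for illustration): after `backward()`, set the gradient of the parameter you want to keep fixed to `None` before calling `step()`, and SGD with momentum will skip it instead of applying its momentum buffer.

```python
import torch

# Hypothetical example: a tiny model with SGD + momentum.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x = torch.randn(8, 4)
loss = model(x).sum()

optimizer.zero_grad()
loss.backward()

# Suppose we want to keep the bias fixed: with momentum, zeroing its gradient
# is not enough (the momentum buffer would still move it), but setting the
# gradient to None makes the optimizer skip the parameter entirely.
model.bias.grad = None

optimizer.step()
```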
Some people like TensorboardX.
Best regards
Thomas