As you note, most of the time it should make no difference.
You might have more than one model in the same optimizer. GANs typically use two separate optimizers, but you might have a feature extractor and two heads that are separate modules trained together; here the optimizer contains more than just one model's parameters (see the sketch below).
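A minimal sketch of that case, with hypothetical module sizes, where one optimizer holds the parameters of a shared backbone and two heads:

```python
import torch
import torch.nn as nn
from itertools import chain

# Hypothetical setup: a shared feature extractor with two task-specific heads.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head_a = nn.Linear(32, 4)
head_b = nn.Linear(32, 2)

# One optimizer holds the parameters of all three modules.
optimizer = torch.optim.SGD(
    chain(backbone.parameters(), head_a.parameters(), head_b.parameters()),
    lr=0.01,
)

x = torch.randn(8, 16)
features = backbone(x)
loss = head_a(features).sum() + head_b(features).sum()
loss.backward()
optimizer.step()

# optimizer.zero_grad() clears the gradients of all three modules,
# while backbone.zero_grad() alone would leave the heads' gradients intact.
optimizer.zero_grad()
```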
Or the optimizer might cover just part of a model (e.g. when finetuning a pretrained model). There is no need to zero the grad for parameters that don't require gradients and so cannot have one, as in the sketch below.
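A sketch of the finetuning case, assuming a small stand-in model whose first layers are frozen:

```python
import torch
import torch.nn as nn

# Hypothetical finetuning setup: freeze the pretrained body, train only the new head.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
for p in model[:2].parameters():
    p.requires_grad_(False)  # frozen parameters never receive a .grad

# The optimizer only sees the parameters that are actually trained.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

loss = model(torch.randn(8, 16)).sum()
loss.backward()
optimizer.step()

# optimizer.zero_grad() touches only the trainable head; model.zero_grad()
# would also iterate over the frozen parameters, whose .grad stays None anyway.
optimizer.zero_grad()
```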
But I wouldn't worry about it too much. In my experience, it is more common to use optimizer.zero_grad(), so I'd probably go with that unless you have a reason not to.