I have a similar problem. My current solution is to write my own apply functions on top of named_modules() and named_parameters(), which add filtering by name or class for modules and by name for parameters.
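For anyone curious, here is a rough sketch of what I mean by filtered apply functions. The helper names (apply_to_modules, apply_to_parameters) and their arguments are just my own choices for illustration, not anything provided by PyTorch, and the toy model is only there to make it runnable.

```python
import torch.nn as nn


def apply_to_modules(model, fn, name_filter=None, class_filter=None):
    """Call fn(name, module) on every submodule that passes the filters.

    Hypothetical helper: name_filter is a predicate on the module name,
    class_filter is a class (or tuple of classes) for isinstance().
    Returns the names of the modules fn was applied to.
    """
    matched = []
    for name, module in model.named_modules():
        if class_filter is not None and not isinstance(module, class_filter):
            continue
        if name_filter is not None and not name_filter(name):
            continue
        fn(name, module)
        matched.append(name)
    return matched


def apply_to_parameters(model, fn, name_filter=None):
    """Call fn(name, param) on every parameter whose name passes the filter."""
    matched = []
    for name, param in model.named_parameters():
        if name_filter is not None and not name_filter(name):
            continue
        fn(name, param)
        matched.append(name)
    return matched


# Toy model, only for demonstration.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),
    nn.ReLU(),
    nn.Conv2d(8, 4, 3),
)

# Xavier-initialise the weights of Conv2d modules only (filter by class).
conv_names = apply_to_modules(
    model,
    lambda name, module: nn.init.xavier_normal_(module.weight),
    class_filter=nn.Conv2d,
)

# Zero every parameter whose name ends in "bias" (filter by name).
bias_names = apply_to_parameters(
    model,
    lambda name, param: nn.init.zeros_(param),
    name_filter=lambda name: name.endswith("bias"),
)
```

Filtering by class keeps BatchNorm2d out of the Xavier call entirely, while the name filter on parameters still reaches its bias.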
Weight initialisation methods like xavier_normal_() won’t work on BatchNorm2d, even though it has a ‘weight’ parameter, because that parameter is only a 1-D tensor and the Xavier methods need at least two dimensions to compute fan-in and fan-out.
An additional question: can the parameters that can be initialised this way be uniquely identified?
I suppose I could have a tensor shape check and a specific apply method for weight initialisation, or a lambda test, but I’m still not sure if every parameter named ‘weight’ with 2 or more dimensions can be initialised this way.
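The shape-check idea could look roughly like the sketch below. The function name xavier_init_weights and the min_dim argument are my own invention, and the model is a throwaway example. Note that this only guarantees xavier_normal_() will run without error; it does not settle whether Xavier is the right choice for every such parameter (nn.Embedding, for instance, also has a 2-D ‘weight’).

```python
import torch.nn as nn


def xavier_init_weights(model, min_dim=2):
    """Xavier-initialise parameters named 'weight' with at least min_dim dims.

    Hypothetical helper. Returns two lists of parameter names:
    those that were initialised and those that were skipped.
    """
    initialised, skipped = [], []
    for name, param in model.named_parameters():
        # The last component of the dotted name is the attribute name.
        is_weight = name.split(".")[-1] == "weight"
        if is_weight and param.dim() >= min_dim:
            nn.init.xavier_normal_(param)
            initialised.append(name)
        else:
            skipped.append(name)
    return initialised, skipped


# Throwaway model, only used to inspect which parameters get selected.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3),
    nn.BatchNorm2d(8),
    nn.Flatten(),
    nn.Linear(8, 2),
)

initialised, skipped = xavier_init_weights(model)
print("initialised:", initialised)
print("skipped:", skipped)
```

Here the BatchNorm2d weight lands in the skipped list because it is 1-D, along with all the biases.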