Updating the weights in a multioutput neural network

I guess the issue you are seeing in the previous example is caused by using an optimizer with internal states, such as Adam.
Even if the gradients are zero for certain parameters, Adam can still update them once its running estimates have been initialized, since the momentum buffers are non-zero.
By setting the gradients to None you change this behavior, and Adam will skip the update for every parameter whose .grad is None.
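
Here is a minimal sketch of the difference (the layer size, learning rate, and input are arbitrary and only used for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# toy single-layer "model" standing in for one output head
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# first step with real gradients so Adam creates its running estimates
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

# case 1: gradients explicitly set to zero -> Adam still moves the weights,
# because its exp_avg / exp_avg_sq buffers are non-zero
for p in model.parameters():
    p.grad = torch.zeros_like(p)
before = model.weight.detach().clone()
optimizer.step()
print("updated with zero grads:", not torch.allclose(before, model.weight))  # True

# case 2: gradients set to None -> Adam skips these parameters entirely
for p in model.parameters():
    p.grad = None
before = model.weight.detach().clone()
optimizer.step()
print("updated with grad=None:", not torch.allclose(before, model.weight))   # False
```

Depending on your PyTorch version you can also call `optimizer.zero_grad(set_to_none=True)` to get the second behavior instead of assigning None manually.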