I am currently learning about hyperparameter optimization in neural networks and recently came across techniques like hyperparameter gradient descent.
I am curious whether this method (hyperparameter gradient descent) could be used to optimize things like padding and stride in the backward pass in PyTorch. If not, how would you tune these parameters when training a model with PyTorch?
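For context on what I mean by tuning them: since padding and stride are discrete integers (they change the output shape, so they are not differentiable in the usual sense), the only approach I can think of is something like a grid search over candidate values. Here is a minimal sketch of that idea, with a placeholder scoring function standing in for actual training (in practice one would train a model per configuration and compare validation accuracy):

```python
import itertools

def conv_output_size(n, kernel, padding, stride):
    """Standard conv output-size formula: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def evaluate(padding, stride):
    # Placeholder score for illustration only; in practice, train the
    # model with this (padding, stride) and return validation accuracy.
    return -abs(padding - 1) - abs(stride - 1)

def grid_search(input_size=32, kernel=3,
                paddings=(0, 1, 2), strides=(1, 2)):
    best_config, best_score = None, float("-inf")
    for padding, stride in itertools.product(paddings, strides):
        # Skip configurations that would produce an empty output
        if conv_output_size(input_size, kernel, padding, stride) <= 0:
            continue
        score = evaluate(padding, stride)
        if score > best_score:
            best_config, best_score = (padding, stride), score
    return best_config

print(grid_search())  # with the dummy score above, (1, 1) wins
```

Is this kind of exhaustive search really the standard way, or is there something smarter built into the PyTorch ecosystem?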