I want to compute gradients with respect to a function w, where w is an MLP that selects part of my input data to use. I then train another neural network on the selected data as usual. The pseudocode would be something like:
x_train, y_train = w(training_data)
output = model(x_train)
update(model)
update(w)
How do I make sure that w is included in backpropagation, so its parameters also receive gradients?
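For reference, here is a runnable sketch of this setup in PyTorch. Two assumptions not in the original: the "selection" is done with a soft mask (sigmoid scaling) rather than hard indexing, because indexing/argmax ops cut the gradient to w; and both networks share one optimizer so a single `loss.backward()` / `opt.step()` updates them together. All shapes and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical data: 16 samples with 8 features each.
n_features, n_samples = 8, 16
training_data = torch.randn(n_samples, n_features)
y_train = torch.randn(n_samples, 1)

# w: the selector MLP. Hard indexing would break the gradient, so it
# emits a soft mask in (0, 1) that scales each feature instead.
w = nn.Sequential(nn.Linear(n_features, n_features), nn.Sigmoid())

# The downstream model trained on the (softly) selected features.
model = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

# One optimizer over BOTH parameter sets, so update(model) and update(w)
# happen in the same step.
opt = torch.optim.Adam(list(w.parameters()) + list(model.parameters()), lr=1e-3)

for _ in range(5):
    opt.zero_grad()
    mask = w(training_data)           # differentiable "selection"
    x_train = training_data * mask    # scale features; do not index
    output = model(x_train)
    loss = nn.functional.mse_loss(output, y_train)
    loss.backward()                   # gradients flow into model AND w
    opt.step()

# w was part of the graph: its parameters received gradients.
print(all(p.grad is not None for p in w.parameters()))
```

The key point is that w stays in the computation graph only if every op between `training_data` and `loss` is differentiable; if you need genuinely discrete selection, a relaxation such as Gumbel-softmax (`torch.nn.functional.gumbel_softmax`) is the usual workaround.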