I am writing a neural network whose output is not used directly in the loss function, but rather as the input to a simulation model. After the simulation has run, I use the simulated value and the real value (target) to compute what is essentially an MSE loss.
In pseudocode my loss looks something like this:
import torch.nn.functional as F

def simulation_loss(sim_params, batch):
    # sim_params is the NN model output, RUN_SIMULATION the external model
    sim_value = RUN_SIMULATION(sim_params)
    sim_value.requires_grad = True
    real_value = batch['target']
    loss = F.mse_loss(input=sim_value, target=real_value)
    return loss
It seems that, because I am cutting off the original gradient from nn_model_output, the NN model does not converge.
How can I use this sort of loss function that includes a simulation model? Do I need to define a custom torch.autograd.Function with a backward call? And if so, how should I pass the gradient if no derivative of the simulation model is available? Or should I treat the simulation as an activation function?
Yes, detaching a tensor from the computation graph and calling .requires_grad = True on the already detached tensor will not reattach it to the original computation graph.
Yes, if your simulation model is not using differentiable PyTorch operations.
I don’t know why no derivative is possible, but in this case you might not be able to backpropagate and compute gradients at all.
All activation functions are differentiable, so it’s also unclear how to understand this statement.
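To illustrate the first point, here is a minimal example: re-enabling requires_grad on a detached tensor just creates a new leaf, so nothing flows back to the original tensor:

import torch

model_output = torch.randn(4, requires_grad=True)  # stands in for nn_model_output
detached = model_output.detach()
detached.requires_grad = True  # `detached` becomes a new, unconnected leaf tensor

(detached ** 2).sum().backward()
print(detached.grad)      # gradient exists on the new leaf
print(model_output.grad)  # None - nothing reached the original tensor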
The simulation model is a finite-element model running separately in MATLAB. I am not quite sure what a feasible derivative of the simulation model might look like. So does that mean this kind of combined computation between the NN and the simulation model is not possible?
I wouldn’t claim it’s impossible, but you would still need to figure out how the backward pass could be calculated. In the end you need to pass gradients to the simulation model (e.g. from the loss) and backpropagate them to the trainable PyTorch model.
You could try to “estimate” these gradients if a direct computation is not possible, but the gradient has to flow through the simulation model somehow (unless you don’t want to backpropagate at all and would rather use another optimization method).
I could use the discretized output of the simulation and compute the slope from the difference, which would give a simplified derivative. This might work.
As this is the first time I am writing such a custom torch.autograd.Function, could you please give me a rough structure of the backward call? Thank you very much!
This tutorial shows you how the backward is implemented.
The backward method expects a grad_output tensor, representing the incoming gradient w.r.t. the layer’s output, calculates a new gradient w.r.t. its inputs, and returns it.
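As a rough sketch, not a definitive implementation: assuming your MATLAB FE model is wrapped in a Python function run_simulation (a hypothetical helper, e.g. via the MATLAB Engine API) and using the finite-difference idea you mentioned to approximate the gradient, a custom Function could look like this:

import torch

class SimulationFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, sim_params):
        # forward runs outside autograd, so the external simulation can be called directly
        sim_value = run_simulation(sim_params)  # hypothetical wrapper around the MATLAB model
        ctx.save_for_backward(sim_params, sim_value)
        return sim_value

    @staticmethod
    def backward(ctx, grad_output):
        # grad_output is the incoming gradient w.r.t. the simulation output
        sim_params, sim_value = ctx.saved_tensors
        eps = 1e-3  # finite-difference step size, needs tuning for your model
        grad_input = torch.zeros_like(sim_params)
        for i in range(sim_params.numel()):
            # perturb one parameter and rerun the simulation (one extra run per parameter)
            perturbed = sim_params.detach().clone()
            perturbed.view(-1)[i] += eps
            dsim_dparam = (run_simulation(perturbed) - sim_value) / eps
            # chain rule: weight the estimated slope by the incoming gradient
            grad_input.view(-1)[i] = (grad_output * dsim_dparam).sum()
        # return one gradient per input of forward
        return grad_input

You would then use it as:

sim_value = SimulationFunction.apply(nn_model_output)
loss = F.mse_loss(sim_value, batch['target'])
loss.backward()  # gradients now flow through the estimated derivative into the NN

Note that this costs one extra simulation run per parameter in each backward pass, so it only stays feasible for a small number of simulation parameters.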