Hi all, I'm having trouble getting a differential machine learning model to work. It seems the problem is in the training part, more specifically in the loss computation.
The differential ML model has this loss:
loss = 0.5 * self.criterion(output, label) + 0.5 * self.criterion(gradients * self.lambda_j, dydx_batch * self.lambda_j)
while the standard ML model has this one:
loss = self.criterion(output, label)
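For context, here is a minimal, self-contained sketch of my training step with the two losses side by side. The model, criterion, lambda_j and the batch tensors below are just placeholders for my real ones, and the grad_outputs of ones corresponds to the fill_(1) case I ask about below:

import torch
import torch.nn as nn

# toy stand-ins for my real model and data
model = nn.Sequential(nn.Linear(4, 16), nn.Softplus(), nn.Linear(16, 1))
criterion = nn.MSELoss()
lambda_j = torch.ones(1, 4)                      # per-input scaling of the gradient term

input = torch.randn(32, 4, requires_grad=True)   # x batch
label = torch.randn(32, 1)                       # y batch
dydx_batch = torch.randn(32, 4)                  # target pathwise derivatives dy/dx

output = model(input)

# standard model: plain prediction loss
loss_standard = criterion(output, label)

# differential model: prediction loss + loss on the gradients dy/dx
gradients = torch.autograd.grad(outputs=output,
                                inputs=input,
                                grad_outputs=torch.ones_like(output),
                                create_graph=True,
                                retain_graph=True,
                                allow_unused=True)[0]
loss_diff = 0.5 * criterion(output, label) \
          + 0.5 * criterion(gradients * lambda_j, dydx_batch * lambda_j)
loss_diff.backward()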
In the differential one, when I use:
gradients = torch.autograd.grad(outputs=output,
                                inputs=input,
                                grad_outputs=output.data.new(output.shape).fill_(0),  # <-- fill_(0)
                                create_graph=True,
                                retain_graph=True,
                                allow_unused=True)[0]
I get the same result as with the standard model, and also the same result as when I compute the gradients with:
output.sum().backward(retain_graph=True)
gradients = input.grad
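To make the comparison concrete, this is the kind of standalone toy check I run for the two ways of getting the gradients (the model and shapes here are made up, not my real setup):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 8), nn.Tanh(), nn.Linear(8, 1))
x = torch.randn(5, 3, requires_grad=True)
y = model(x)

# way 1: autograd.grad with an explicit grad_outputs tensor
g1 = torch.autograd.grad(outputs=y,
                         inputs=x,
                         grad_outputs=torch.ones_like(y),
                         create_graph=True,
                         retain_graph=True,
                         allow_unused=True)[0]

# way 2: sum the outputs, call backward, and read the accumulated .grad of the input
y.sum().backward(retain_graph=True)
g2 = x.grad

print(torch.allclose(g1, g2))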
On the other hand, when I do:
gradients = torch.autograd.grad(outputs=output,
                                inputs=input,
                                grad_outputs=output.data.new(output.shape).fill_(1),  # <-- fill_(1)
                                create_graph=True,
                                retain_graph=True,
                                allow_unused=True)[0]
the result is different, but the prediction is a little bit worse…
So, what's the reason to use fill_(1)? I tested other values, for example 2 or -1, and the results were even worse.
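Concretely, the only thing I change between these runs is the constant used for grad_outputs, something like this (fill_value is just how I parameterize it here, on a toy model):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 8), nn.Tanh(), nn.Linear(8, 1))
input = torch.randn(5, 3, requires_grad=True)
output = model(input)

fill_value = 1.0  # the runs only differ in this constant (I also tried 0, 2 and -1)
gradients = torch.autograd.grad(outputs=output,
                                inputs=input,
                                grad_outputs=torch.full_like(output, fill_value),
                                create_graph=True,
                                retain_graph=True,
                                allow_unused=True)[0]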
And my other doubt: what's the difference between computing the gradients one way versus the other?