Below is a toy version of my code. (The actual model and USERFUNC are more complex.)
```python
import torch
import torch.nn as nn

def USERFUNC(network_output):
    # Embed the real output as a complex value, scale it, take the magnitude
    comp = network_output + 1j * network_output
    comp = comp * 123
    func_out = torch.abs(comp)
    return func_out

model = nn.Linear(1, 1)
criterion = nn.MSELoss()

x = torch.randn(1, 1)
out = model(x)
out_c = out.clone()

lossA = criterion(out, x)
out_prime = USERFUNC(out_c)
lossB = criterion(out_prime, x)

a = 0.5
loss = lossA + a * lossB
```
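For completeness, this is roughly how I run a training step with the combined loss (a minimal sketch; the optimizer and learning rate here are placeholders, not my actual settings):

```python
import torch
import torch.nn as nn

def USERFUNC(network_output):
    # Toy version: complex embedding, scale, magnitude
    comp = network_output + 1j * network_output
    comp = comp * 123
    return torch.abs(comp)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # placeholder optimizer

x = torch.randn(1, 1)
out = model(x)
lossA = criterion(out, x)
lossB = criterion(USERFUNC(out.clone()), x)
loss = lossA + 0.5 * lossB

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Both loss terms should contribute to the weight gradient
print(model.weight.grad)
```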
The toy code seems to run normally, but the actual training results look strange. I suspect the cause is gradient-related: if I train with lossA alone, everything behaves normally, but as soon as lossB is added, the results go wrong.
I first suspected USERFUNC itself, but that should not be the problem, because I checked that gradients propagate through it by calling it on a = torch.randn(3, 256, 256, device='cuda', requires_grad=True).
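Concretely, the gradient check I mean looks roughly like this (a minimal sketch using the toy USERFUNC above, run on CPU so it works anywhere):

```python
import torch

def USERFUNC(network_output):
    # Toy version: complex embedding, scale, magnitude
    comp = network_output + 1j * network_output
    comp = comp * 123
    return torch.abs(comp)

# Verify that gradients flow through USERFUNC back to its input
a = torch.randn(3, 256, 256, requires_grad=True)
out = USERFUNC(a)
out.sum().backward()

print(a.grad is not None)              # gradients reach the input
print(torch.isfinite(a.grad).all())    # no NaN/Inf in the gradient
```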
What could the problem be? I have been stuck on this for months… Please let me know if you need more explanation.