I ran into this issue while working on CT image reconstruction. I have written a small script (attached) that reproduces the issue.

script: https://colab.research.google.com/drive/1k6QTHNMnA1OJfuRWfXfOil1ZP5yxsJrz

Here is what the script does. I load the MNIST images and apply the ray transform operator to them. Let x_0 denote the ground-truth images and y = A(x_0) their CT measurements. I create the Radon transform operator A and then wrap it as a PyTorch module using ODL (https://github.com/odlgroup/odl). I then compute an estimate of x_0 by minimizing J(x) = || y - A(x) ||_2^2 w.r.t. x (basically, running Landweber-type iterations, but using Adam instead of SGD).

Surprisingly, the gradient of J(x) w.r.t. x goes to 0 after just one iteration, regardless of the initialization (which is random Gaussian, by the way).

The problem does not seem to be with the optimizer settings in PyTorch: as a sanity check, I minimized J_1(x) = || x_0 - x ||_2^2, and that works perfectly fine. I also tested a different forward operator A' (one that removes a set of pixels from the image and keeps the rest), and that works perfectly fine as well.

I would really appreciate any help in fixing this strange issue. Thanks in advance.
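For reference, here is a minimal, self-contained sketch of the same pipeline in plain PyTorch, assuming a linear forward operator. A fixed random matrix M (a hypothetical stand-in, not the actual ray transform) plays the role of A. It is wrapped in a custom torch.autograd.Function whose backward applies the adjoint M^T, which, as far as I understand, is how the gradient of a wrapped linear ODL operator is computed. The sketch checks the custom backward with torch.autograd.gradcheck, compares the autograd gradient of J(x) with the analytic gradient 2 A^T (A x - y), and then runs the Adam-based Landweber-type loop.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the ray transform: a fixed random matrix acting on
# flattened 28x28 images (NOT the actual ODL operator).
N_PIX, N_MEAS = 28 * 28, 500
M = torch.randn(N_MEAS, N_PIX, dtype=torch.float64) / N_PIX ** 0.5


class LinearOp(torch.autograd.Function):
    """Forward: y = M x. Backward: apply the adjoint M^T to the upstream gradient."""

    @staticmethod
    def forward(ctx, x):
        return x @ M.T  # (batch, N_PIX) -> (batch, N_MEAS)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out @ M  # adjoint: (batch, N_MEAS) -> (batch, N_PIX)


A = LinearOp.apply

# Finite-difference check of the custom backward (gradcheck expects float64).
x_small = torch.randn(1, N_PIX, dtype=torch.float64, requires_grad=True)
gradcheck_ok = torch.autograd.gradcheck(A, (x_small,))

# Compare autograd's gradient of J(x) = ||y - A(x)||^2 with 2 A^T (A x - y).
x0 = torch.rand(4, N_PIX, dtype=torch.float64)  # stand-in "ground truth"
y = A(x0)
x = torch.randn(4, N_PIX, dtype=torch.float64, requires_grad=True)
J = ((y - A(x)) ** 2).sum()
J.backward()
analytic = 2 * (A(x.detach()) - y) @ M
grad_ok = torch.allclose(x.grad, analytic)
print("gradcheck:", gradcheck_ok, "analytic match:", grad_ok)

# Landweber-type iterations, using Adam instead of plain gradient descent.
x = torch.randn(4, N_PIX, dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2)
losses, grad_norms = [], []
for it in range(200):
    opt.zero_grad()
    loss = ((y - A(x)) ** 2).sum()
    loss.backward()
    losses.append(loss.item())
    grad_norms.append(x.grad.norm().item())
    opt.step()
print("loss: %.3f -> %.3f" % (losses[0], losses[-1]))
```

The names LinearOp, M, N_PIX and N_MEAS are illustrative only. Swapping A for the ODL-wrapped module (reshaped to the image layout it expects) would let the same gradient comparison be run on the real operator; a mismatch there would point at the wrapper's backward pass rather than at the optimizer.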

P.S. It is surprising that, although the gradient vanishes, the cost keeps changing.
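One possible explanation for this, assuming the gradient really is a zero-valued tensor (not None) after the first step: Adam's update is built from running averages of past gradients, so it keeps moving the parameters for a while even when the current gradient is exactly zero. The toy sketch below feeds Adam a unit gradient once and zeros afterwards; the parameter still changes at every step. It only illustrates the optimizer's behavior and does not by itself explain why the gradient vanishes.

```python
import torch

# A single toy parameter (hypothetical, for illustration only).
p = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.Adam([p], lr=0.1)

values = []
for step in range(5):
    opt.zero_grad()
    # Non-zero gradient on the first step only, then exactly-zero gradients.
    p.grad = torch.ones(1) if step == 0 else torch.zeros(1)
    opt.step()
    values.append(p.item())

# The parameter keeps moving after step 0, driven by Adam's first-moment estimate.
print(values)
```

Note that torch.optim optimizers skip a parameter whose .grad is None, whereas a zero tensor still goes through the Adam update, so the two cases behave differently.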