Weights are not updated when using custom function before loss calculation

Hello community,

I am trying to implement a model-based GAN, where the generator predicts simulation parameters and its output is fed to a custom Monte Carlo simulation function to produce simulated images. I provide these simulated images, together with true images, to the discriminator to calculate the loss.

I then try to update both the generator and the discriminator based on the losses calculated in the stage above.

The problem is that the generator's weights are not updating.

  1. Is this approach feasible in PyTorch, given that I have to break the graph on purpose?

  2. Is there a better way to address this issue?

  3. Would it work if I defined my simulation script as a custom torch layer?

I am attaching a simple example flow for reference.

import math

import numpy as np
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    def __init__(self, in_features):
        super().__init__()
        self.disc = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.disc(x)


class Generator(nn.Module):
    def __init__(self, n_params, scalar, test_transform=None):
        super().__init__()
        self.scalar = scalar
        self.test_transform = test_transform

        self.gen = nn.Sequential(
            nn.Linear(n_params, 2 * n_params),
            nn.ReLU(),
            nn.Linear(2 * n_params, 2 * n_params),
            nn.ReLU(),
            nn.Linear(2 * n_params, n_params),
            nn.Tanh(),
        )

    def forward(self, x):
        p1 = self.gen(x)
        # detaching and converting to numpy cuts p1 out of the autograd graph
        params = self.scalar.inverse_transform(p1.clone().detach().numpy())
        # note: this overwrites the inverse-transformed params from the line above
        params = p1 + torch.tensor([100, 1.0e-12, 100, 1e-3, -1 * math.pi / 180])
        # Generate simulation -- simulate_image is a non-torch (numpy) function,
        # so the graph from params is broken here as well
        im = simulate_image(params[0])
        fake = self.test_transform(image=np.swapaxes(np.array(im), 1, -1))
        fake = torch.from_numpy(fake["image"])
        fake = fake.contiguous()
        return fake

I also tried passing the p1 tensor directly to my simulation function, without converting it to numpy, but that did not work either.

Hi Truss!

As to your first question: probably not. To make pytorch’s autograd and
gradient-descent-based optimizers work, you would have to “unbreak” the
graph, patching it back together somehow.

The core problem is that it is hard to make the result of a Monte Carlo
simulation differentiable with respect to your model parameters.

Consider the derivative of the area of a circle with respect to its radius.
Now consider calculating the area with a Monte Carlo “simulation”
(integration). Sample two-dimensional points uniformly from the square
x in [-1.0, 1.0]; y in [-1.0, 1.0] and count the fraction of
those points that fall in a circle of radius r <= 1.0. You then have
area = 4 * fraction. In this scheme, how would you calculate the
derivative of area with respect to r?
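
Written with tensor operations, a minimal (untested) sketch of that
estimator makes the issue concrete: the inside-the-circle test is a
comparison, so the estimate carries no gradient with respect to r:

import torch

# Monte Carlo estimate of the area of a circle of radius r <= 1.0,
# using points sampled uniformly from the square [-1, 1] x [-1, 1]
r = torch.tensor(0.8, requires_grad=True)
pts = torch.empty(100000, 2).uniform_(-1.0, 1.0)
inside = (pts.norm(dim=1) <= r)        # boolean comparison -- not differentiable
area = 4.0 * inside.float().mean()     # close to pi * r**2

print(area.requires_grad)              # False: the comparison cut r out of the graph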

You can imagine generating “simple” random numbers (say uniform
or normal), using them to drive a Monte-Carlo process that is
implemented with pytorch differentiable tensor functions, and then
using autograd to compute the gradient of some final loss function
with respect to parameters that characterize the Monte-Carlo process
(as well as with respect to the random numbers, if you so choose).

But whether you can do this in practice depends on the details of
your Monte-Carlo process.
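
As a minimal (untested) sketch of that idea, unrelated to the details of
your specific simulation: draw the simple random numbers once, express
the rest of the computation in torch ops, and autograd can differentiate
the Monte Carlo estimate with respect to the parameters:

import torch

# Monte Carlo estimate of E[x**2] for x = mu + sigma * eps, eps ~ N(0, 1)
mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(1.5, requires_grad=True)

eps = torch.randn(100000)            # "simple" random numbers
samples = mu + sigma * eps           # differentiable in mu and sigma
estimate = (samples**2).mean()       # E[x**2] = mu**2 + sigma**2

estimate.backward()
print(mu.grad, sigma.grad)           # approximately 2 * mu and 2 * sigma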

As to your third question: yes, in principle. (But it may be impractical
or impossible for your use case.)

You would do this by defining a custom autograd function (your “custom
torch layer”).

It will have a forward() method that can do almost anything – numpy
calculations, a Monte Carlo simulation, any sort of black box. But it
also has to have a matching backward() method that calculates the
gradient (or an approximation to, or surrogate for, the gradient) of the
output of forward() with respect to its input.

It’s this backward() method that “patches the graph back together.”
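
A skeleton of such a custom autograd function might look like the
following; simulate_image_np and surrogate_grad are hypothetical
placeholders for your black-box simulation and for whatever exact,
approximate, or surrogate gradient you are able to provide:

import torch

class SimulationLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, params):
        ctx.save_for_backward(params)
        # black-box simulation (numpy, Monte Carlo, ...) -- hypothetical helper
        image = simulate_image_np(params.detach().cpu().numpy())
        return torch.as_tensor(image, dtype=params.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        (params,) = ctx.saved_tensors
        # must return d(loss) / d(params), shaped like params: an exact gradient,
        # a finite-difference estimate, or a surrogate -- hypothetical helper
        return surrogate_grad(params, grad_output)

# usage: fake = SimulationLayer.apply(params)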

Best.

K. Frank

Thank you @KFrank for the detailed answer.
My assumptions for 1 and 2 were similar.
I will try approach 3; it might work, as you suggested.
I don’t require gradients for the simulation layer itself, as I only want to update the parameter-prediction network.
Does it make sense to return precalculated dummy gradients from the backward() function of the custom layer?
Thank you in advance!

Hi Truss!

I don’t really understand what you are saying here.

But let me ask this question: Should the update you apply to your
prediction-network parameters depend in any way on the output
of the simulation layer? If not, I can’t see any reason that you would
need to run the simulation layer to perform the prediction-network
update. If so, it would seem that you would have to backpropagate
gradients of your final loss function (that depends on your simulation
layer) through the simulation layer back to the prediction network.

This doesn’t make sense to me. If your custom simulation layer
matters, dummy gradients will obscure its structure and make it not
matter to the prediction-layer update. (And if your simulation layer
doesn’t matter, you should be able to update your prediction layer
based solely on its output without passing anything – forward or
backward – through the simulation layer.)

At a conceptual level – ignoring gradients and specific optimization
algorithms – your loss function is supposed to tell you how good a
job the parameters of your prediction network are doing. What role
does your simulation play in telling you whether your prediction
network is performing well or poorly? Or to ask the same question
more concretely, how does your final loss function depend on your
simulation layer and its output?

Best.

K. Frank

The flow of my network is such that the parameters are learnt by a neural network. In a simple way, it can be compared with a normal GAN, except that the generator here does not generate an image but just a bunch of parameters. These parameters are updated by calculating losses in the discriminator, comparing the simulated image with the true image.

The simulator generates images from the parameters learnt by my prediction network (the Generator), and this simulated image from my simulation algorithm is then compared with the true image in the discriminator, where the losses are also calculated. So, in a way, the network is not important at all in the simulation stage. I hope this answers it. But when I build the network as in my sample code, it does not update any weights in my parameter-prediction part (the Generator class weights).

To summarize,

gen = Generator()
learnt_params = gen(noisy_input)

# Generate simulated image
fake = simulate_image(learnt_params)

disc_real = disc(real).view(-1)
lossD_real = criterion(disc_real, torch.ones_like(disc_real))
disc_fake = disc(fake).view(-1)
lossD_fake = criterion(disc_fake, torch.zeros_like(disc_fake))
lossD = (lossD_real + lossD_fake) / 2
disc.zero_grad()
lossD.backward(retain_graph=True)
opt_disc.step()

### Train Generator: min log(1 - D(G(z))) <-> max log(D(G(z)))
output = disc(fake).view(-1)
lossG = criterion(output, torch.ones_like(output))

I hope the issue is clearer now.
Thank you.

Yes, the simulated image generated by the simulation layer (only numpy functions) is compared with the real image in the discriminator stage to train the discriminator, and the discriminator is then used to calculate losses for the generator.
In simple words, I am using the usual GAN structure, but instead of the generator generating the whole image from noise, I use the generator to generate parameters which I then use in the simulation layer.
I hope this is clearer now.

The role of the simulation is just to produce an image from the predicted parameters, but it is what makes the loss calculation possible, as I explained before. I don’t have any known true values for the parameters; I can only compare real and simulated images for how well their distributions match.
Here is the sample code…

fake_params = gen(params)
simulated_image = simulate_image(fake_params)  # consists of only numpy functions
fake = torch.from_numpy(simulated_image)  # back to a tensor (the graph to gen is broken here)
disc_real = disc(real)
disc_fake = disc(fake)
lossD_real = criterion(disc_real, torch.ones_like(disc_real))
lossD_fake = criterion(disc_fake, torch.zeros_like(disc_fake))
lossD = (lossD_real + lossD_fake) / 2
disc.zero_grad()
lossD.backward(retain_graph=True)
opt_disc.step()

output = disc(fake)
lossG = criterion(output, torch.ones_like(output))
gen.zero_grad()
lossG.backward()
opt_gen.step()

I hope the connection and role of the simulator are clearer now.

Hi Truss!

Reading between the lines, you are, as I understand it, trying to train
the discriminator to tell simulated images apart from real ones, and to
train the generator to generate simulation parameters that lead to
simulated images that are hard to tell apart from real ones.

So, yes, in order to update the generator “weights” you do need to
backpropagate the final discriminator loss back through the simulator.
This is the only way the generator knows whether it did a good job
of generating parameters that produce simulated images that look
realistic to the discriminator.

The simulation process is an important, non-trivial part of how the
generator “weights” produce the simulated images that are used to
calculate the final loss, so to train the generator “weights” you have
to backpropagate through it properly.
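
To make this concrete (a sketch only, reusing the hypothetical
SimulationLayer from my earlier reply): if the simulation step stays on
the autograd graph, the generator update receives gradients through it:

fake_params = gen(noise)
fake = SimulationLayer.apply(fake_params)   # stays on the autograd graph

output = disc(fake).view(-1)
lossG = criterion(output, torch.ones_like(output))

gen.zero_grad()
lossG.backward()    # gradients reach the generator's weights through the simulator
opt_gen.step()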

Best.

K. Frank