Backward error on KL divergence

Hi, I’m trying to optimize a distribution using KL divergence. Here’s the code:

import torch

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
mu2 = torch.tensor([0.5, 0.5])
b1 = torch.distributions.Binomial(1,mu1)
b2 = torch.distributions.Binomial(1,mu2)
opt = torch.optim.Adam(params=[mu1])
kl = torch.distributions.kl_divergence

eps = 100
for i in range(eps):
    opt.zero_grad()
    l = kl(b1, b2).mean()
    l.backward()
    opt.step()

When I changed eps to 1, everything worked as normal. However, if I increased eps, I got the following error:
“Trying to backward through the graph a second time, but the buffers have already been freed…”

Hi,

I think you need to rebuild the computation graph in each iteration: calling .backward() frees the graph that was built during the forward pass, so it can’t be backpropagated through again in the next iteration. See more details in this thread.
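
To illustrate with a minimal sketch (not your exact code, just an assumed toy example): the graph is created when the result tensor is computed and is freed by the first .backward(), so a second backward through the same graph fails with the error you saw.

import torch

x = torch.tensor([1.0], requires_grad=True)
y = (x * 2).sum()   # a small graph from x to y is built here
y.backward()        # after this call, the graph's buffers are freed
# y.backward()      # a second backward through the same graph raises the error above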


Yes, I think

b1 = torch.distributions.Binomial(1,mu1)
b2 = torch.distributions.Binomial(1,mu2)

should be in the for-loop.
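
In other words, something like this (a sketch of the full corrected loop, with the two construction lines moved inside so a fresh graph is built from mu1 on every iteration):

import torch

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
mu2 = torch.tensor([0.5, 0.5])
opt = torch.optim.Adam(params=[mu1])
kl = torch.distributions.kl_divergence

eps = 100
for i in range(eps):
    opt.zero_grad()
    # rebuild the distributions so the graph is recreated each iteration
    b1 = torch.distributions.Binomial(1, mu1)
    b2 = torch.distributions.Binomial(1, mu2)
    l = kl(b1, b2).mean()
    l.backward()
    opt.step()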


Hi Sebastian,

Can you please explain why these two distributions need to be reinstantiated on each forward pass?

My understanding was that instantiating the class doesn’t execute any code that the graph depends on. The kl method does the forward pass. Why does the class initialisation matter?

Oh I see, it’s because mu1 and mu2 are parameters and every step depends on them. Thanks!

Actually no, I still don’t see it. It looks like each iteration of the for-loop is independent of the previous one. What am I missing?