Hi, I’m trying to optimize a distribution using KL divergence. Here’s the code:
import torch

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
mu2 = torch.tensor([0.5, 0.5])
b1 = torch.distributions.Binomial(1, mu1)
b2 = torch.distributions.Binomial(1, mu2)
opt = torch.optim.Adam(params=[mu1])
kl = torch.distributions.kl_divergence
eps = 100
for i in range(eps):
    opt.zero_grad()
    l = kl(b1, b2).mean()
    l.backward()
    opt.step()
With eps set to 1 everything worked as expected, but for any larger value I got the following error:
“Trying to backward through the graph a second time, but the buffers have already been freed…”
You need to rebuild the computation graph on every iteration: by default, calling .backward() frees the buffers of the graph that was built during the forward pass, so the distributions (and with them the KL term) have to be re-created inside the loop. See this thread for more details.
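A minimal sketch of what that looks like (keeping your variable names; the only change is moving the Binomial construction inside the loop so a fresh graph from mu1 is built on every pass):

import torch

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
mu2 = torch.tensor([0.5, 0.5])
opt = torch.optim.Adam(params=[mu1])
kl = torch.distributions.kl_divergence
eps = 100
for i in range(eps):
    opt.zero_grad()
    # re-create the distributions each iteration so kl() builds a new graph
    b1 = torch.distributions.Binomial(1, mu1)
    b2 = torch.distributions.Binomial(1, mu2)
    l = kl(b1, b2).mean()
    l.backward()
    opt.step()

Strictly speaking only b1 has to be re-created, since b2 does not depend on any tensor that requires grad, but re-creating both is cheap and keeps the loop simple.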
Can you please explain why these two distributions need to be re-instantiated on each forward pass?
My understanding was that instantiating the class doesn’t execute any code that the graph depends on, and that the kl call is what does the forward pass. Why does class initialisation matter?
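For reference, a minimal sketch (assuming a recent PyTorch; the exact constructor internals may differ between versions) showing that instantiation alone already creates tensors attached to mu1's autograd graph:

import torch

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
b1 = torch.distributions.Binomial(1, mu1)

# The constructor broadcasts its arguments with ordinary tensor ops, and derived
# parameters such as logits are computed from mu1 (lazily, on first access) and
# cached on the instance, so they carry grad_fn entries linking back to mu1:
print(b1.probs.grad_fn is not None)   # expected: True
print(b1.logits.grad_fn is not None)  # expected: True

Because these cached tensors belong to the graph of the iteration in which they were first used, backward() in that iteration frees their buffers, and reusing the same instances afterwards makes the next backward() traverse the already-freed part of the graph, which is exactly the error above.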