Hi, I’m trying to optimize a distribution using KL divergence. Here’s the code:

import torch

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
mu2 = torch.tensor([0.5, 0.5])
# distributions are created once, outside the training loop
b1 = torch.distributions.Binomial(1, mu1)
b2 = torch.distributions.Binomial(1, mu2)
opt = torch.optim.Adam(params=[mu1])
kl = torch.distributions.kl_divergence
eps = 100  # number of optimization steps
for i in range(eps):
    opt.zero_grad()
    l = kl(b1, b2).mean()
    l.backward()
    opt.step()

When I set eps to 1, everything works as expected. However, if I increase eps, I get the following error:
“Trying to backward through the graph a second time, but the buffers have already been freed…”

I think you need to rebuild the computation graph in each iteration. Calling .backward() frees the buffers of the graph that was built before, so that graph cannot be backpropagated through a second time. See this thread for more details.
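For what it's worth, here is a minimal sketch of what rebuilding the graph each iteration could look like. It assumes the same values as the original post; the names steps, loss and losses are mine and only for illustration. b1 is created inside the loop so every step gets a fresh graph from mu1. I left b2 outside on the assumption that, since mu2 does not require grad, it carries no graph to free.

```python
import torch
from torch.distributions import Binomial, kl_divergence

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
mu2 = torch.tensor([0.5, 0.5])

# mu2 does not require grad, so b2 holds no autograd graph
# and is assumed to be safe to reuse across iterations.
b2 = Binomial(1, mu2)

opt = torch.optim.Adam(params=[mu1])

steps = 100   # illustrative name for the number of optimization steps
losses = []   # kept only to inspect how the loss evolves

for _ in range(steps):
    opt.zero_grad()
    # Re-create b1 so that everything derived from mu1 is recomputed
    # and a fresh graph exists for this step's backward pass.
    b1 = Binomial(1, mu1)
    loss = kl_divergence(b1, b2).mean()
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Constructing a distribution object is cheap, so doing it every step should not be a noticeable cost compared to the backward pass.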

Can you please explain why these two distributions need to be reinstantiated on each forward pass?

My understanding was that the class definition doesn't execute any code that the graph depends on. The kl method does the forward pass. Why does class initialisation matter?
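One way to test that assumption is to look at what the distribution object holds on to. As far as I know, Binomial computes logits from probs lazily and caches the result on the instance, and the Binomial/Binomial KL uses those logits. If that is right, part of the graph lives on the object rather than being rebuilt by each kl call. The sketch below only checks this and is not a statement about every PyTorch version; the variable names are mine.

```python
import torch
from torch.distributions import Binomial

mu1 = torch.tensor([0.3, 0.9], requires_grad=True)
b1 = Binomial(1, mu1)

# Access the derived logits twice on the same instance.
first = b1.logits
second = b1.logits

# If the value is cached, both accesses return the very same tensor.
cached = first is second

# If the tensor has a grad_fn, it is attached to a graph rooted at mu1.
has_graph = first.grad_fn is not None

print(cached, has_graph)
```

If both checks come out true, then reusing b1 means the second backward pass walks through a graph segment whose buffers were already freed by the first one, and the cached logits would also be stale after opt.step() changes mu1.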