I’m working with energy-based models, using contrastive divergence to train my model.
In each iteration, I generate samples from my model using a Markov chain Monte Carlo (MCMC) algorithm:
mcmc_samples = generate_samples(weights, n_samples)
Then I compute the energy of these samples and the energy of my training samples. The loss is the difference between the two mean energies.
The complete training process looks something like this:
import torch

weights = torch.nn.Parameter(...)
optimizer = torch.optim.SGD([weights], lr=lr)

for i in range(iterations):
    mcmc_samples = generate_samples(weights, n_samples)
    mcmc_energies = get_energies(weights, mcmc_samples)
    train_samples_energies = get_energies(weights, samples)
    loss = -mcmc_energies.mean() + train_samples_energies.mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
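For context, generate_samples runs a chain in which every update involves weights. My real sampler is different; purely to illustrate the shape of the loop, a Langevin-style chain for a toy quadratic energy (the shapes and the update rule here are assumptions, not my actual code) could look like this:

def generate_samples(weights, n_samples, n_steps=1000, step_size=0.01):
    # Toy assumption: per-sample energy E(x) = 0.5 * x @ weights @ x.T with a
    # symmetric positive-definite weight matrix, so dE/dx = x @ weights.
    n_features = weights.shape[-1]
    x = torch.randn(n_samples, n_features, device=weights.device)
    for _ in range(n_steps):
        grad_x = x @ weights  # energy gradient w.r.t. the samples; uses weights
        x = x - step_size * grad_x + (2 * step_size) ** 0.5 * torch.randn_like(x)
    return x

When weights is an nn.Parameter, the returned samples depend on weights through every one of the n_steps updates, so autograd keeps the intermediate states of the whole chain alive for a potential backward pass; with a plain tensor, nothing is recorded.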
This works well for small models, but for large models I get a CUDA out-of-memory exception when calling generate_samples. I do not get an exception when I call generate_samples with the same arguments outside the training process, when weights is a plain tensor and not a parameter.
I understand that this is because, when using parameters, more memory is required to store the data needed later for the gradient calculation. But in this case the gradient should not depend on how the samples were generated: the MCMC samples can be treated as constants in the loss.
Is there a way to tell PyTorch to ignore the generate_samples line of code during the gradient calculation?
Maybe there’s a different approach to using PyTorch with Monte Carlo sampling?
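To make the question concrete, I imagine something like the following, where the sampling runs without building an autograd graph and the samples enter the loss as constants; I'm not sure whether torch.no_grad() is the right tool here or whether it interferes with the loss gradient:

with torch.no_grad():
    # run the MCMC chain without recording operations for autograd
    mcmc_samples = generate_samples(weights, n_samples)

# gradients still flow through the energies, just not through the sampling
mcmc_energies = get_energies(weights, mcmc_samples)
train_samples_energies = get_energies(weights, samples)
loss = -mcmc_energies.mean() + train_samples_energies.mean()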
Currently I have solved this by computing the gradients myself and implementing the optimization step as well, but I would like to use PyTorch's automatic differentiation and out-of-the-box optimizers.
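For completeness, my current workaround looks roughly like the sketch below; energy_grad is only a stand-in name for my hand-written gradient of the mean energy with respect to the weights (the helper and its signature are just for illustration):

with torch.no_grad():
    mcmc_samples = generate_samples(weights, n_samples)
    # hand-written contrastive-divergence update:
    # d(loss)/d(weights) = d(mean data energy)/d(weights) - d(mean model energy)/d(weights)
    grad = energy_grad(weights, samples) - energy_grad(weights, mcmc_samples)
    weights -= lr * grad  # manual SGD step instead of optimizer.step()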