Hello!
This is a snippet of my code (`create_graph=True` is required in my use case, and `energy` is a large CNN):
```python
for _ in range(n_steps):
    # call the module rather than .forward() so hooks still run
    sample_energy = energy(sample)
    # create_graph=True keeps the higher-order graph, since the score
    # feeds into the final loss
    sample_score = grad(sample_energy.sum(), sample, create_graph=True)[0]
    value, sample = f(value, sample, sample_energy, sample_score)

loss = g(value)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
What are my options to reduce the memory footprint of these model + grad calls? `f` and `g` perform common, uninteresting tensor operations like multiplication, summation, etc.