Two issues I'm encountering while optimizing an Ellipse to a Circle as a beginner

I have pretty simple code:

import torch
import random

image_width, image_height = 128, 128

def apply_ellipse_mask(img, pos, axes):
    mask = torch.zeros_like(img)
    for r in range(image_height):
        for c in range(image_width):
            val = ((c - pos[0])**2) / axes[0]**2 + ((r - pos[1])**2) / axes[1]**2
            assert not torch.isnan(val)
            mask[r][c] = torch.where(0.9 < val < 1, torch.tensor(1.0),  torch.tensor(0.0))

    return img * (1.0 - mask) + mask

random.seed(0xced)

sphere_radius = image_height / 3
sphere_position = torch.tensor([image_width / 2, image_height / 2 ,0], requires_grad=True)

ref_image = apply_ellipse_mask(torch.zeros(image_width, image_height, requires_grad=True), sphere_position, [sphere_radius, sphere_radius, sphere_radius])

ellipsoid_pos = torch.tensor([sphere_position[0], sphere_position[1], 0], requires_grad=True)
ellipsoid_axes = torch.tensor([image_width / 3 + (random.random() - 0.5) * image_width / 5, image_height / 3 + (random.random() - 0.5) * image_height / 5, image_height / 2], requires_grad=True)

optimizer = torch.optim.Adam([ellipsoid_axes], lr=0.1)
criterion = torch.nn.MSELoss()
for _ in range(100):

    optimizer.zero_grad()
    current_image = torch.zeros(image_width, image_height, requires_grad=True)
    current_image = apply_ellipse_mask(current_image, ellipsoid_pos, ellipsoid_axes)

    # mixed_img = (((ref_image + current_image) / 2.0) * 255).byte()
    # mixed_img = Image.fromarray(mixed_img.numpy(), mode='L')
    # mixed_img.show()
    loss = criterion(current_image, ref_image)
    loss.backward()
    print(_, loss)
    optimizer.step()

However, I’m encountering two issues that, as a PyTorch beginner, I find hard to understand.

  1. The first issue is that this code throws an error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

I don’t understand why it would be trying to backward through the same graph a second time. Or am I directly accessing saved tensors after they were freed? Either way, I don’t see why this happens.

  2. The other issue is the following:

ellipsoid_axes.grad is None after calling loss.backward(). I believe the issue arises in
apply_ellipse_mask, in the line

mask[r][c] = torch.where(0.9 < val < 1, torch.tensor(1.0), torch.tensor(0.0))

How can I fix this?

Hi Cedric!

You have several problems here, but let me start with a core
conceptual issue:

In order to compute your ellipse-mismatch loss function, you
compare two zero-one mask images. The problem is that
such a loss function isn’t (usefully) differentiable. As you vary,
for example, ellipsoid_axes, some pixel will remain at zero
over some range and then discontinuously change to one,
where it will remain over some further range. The derivative
of that pixel with respect to ellipsoid_axes will be zero over
the ranges where the pixel is constant and undefined (or inf,
if you prefer) where the pixel jumps from zero to one.

Pytorch uses gradient descent to minimize loss functions, but
when your gradient is zero (almost) everywhere, gradient
descent doesn’t have the information it needs to know in which
direction in parameter space to move to make the loss function
smaller.

(If you were to use some other algorithm that could minimize
your non-differentiable loss function effectively, you would be
able to fit your ellipse reasonably well.)

You could make your current general scheme work by letting
your “masked” pixel values vary continuously from zero to
one as you cross your 0.9 < val < 1 boundaries. Then your
MSELoss loss function would end up being differentiable, and
gradient descent would work to optimize it.
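
One possible way to do this is to replace the hard torch.where() test
with a product of two sigmoids, computed over a coordinate grid so that
the whole mask stays connected to pos and axes. Here is a minimal sketch
(apply_soft_ellipse_mask and the sharpness parameter are names I am
introducing for illustration, not part of your code):

import torch

image_width, image_height = 128, 128

def apply_soft_ellipse_mask(img, pos, axes, sharpness=20.0):
    # coordinate grids instead of a per-pixel python loop
    rows = torch.arange(image_height, dtype=img.dtype).unsqueeze(1)   # (H, 1)
    cols = torch.arange(image_width, dtype=img.dtype).unsqueeze(0)    # (1, W)

    val = ((cols - pos[0]) ** 2) / axes[0] ** 2 + ((rows - pos[1]) ** 2) / axes[1] ** 2

    # smooth version of the 0.9 < val < 1 band: the mask rises near
    # val = 0.9 and falls near val = 1.0, so it varies continuously
    # (and differentiably) with pos and axes
    mask = torch.sigmoid(sharpness * (val - 0.9)) * torch.sigmoid(sharpness * (1.0 - val))

    return img * (1.0 - mask) + mask

Larger values of sharpness bring the soft band closer to your original
hard mask, at the cost of smaller (but still nonzero) gradients away
from the band.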

You define several tensors incorrectly by initializing them with
requires_grad=True. ref_image is one of them.

As a general rule, you only want to set requires_grad=True
for tensors that you will be training (optimizing) and that
therefore need to have gradients with respect to them
computed.

In your case these would be ellipsoid_pos and ellipsoid_axes.
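
For example (blank_image and the literal axis values are placeholders I
am making up, not taken from your code):

# only the tensors being optimized carry requires_grad = True
ellipsoid_pos  = torch.tensor([image_width / 2, image_height / 2, 0.0], requires_grad=True)
ellipsoid_axes = torch.tensor([40.0, 45.0, 64.0], requires_grad=True)

# plain data tensors (the blank canvas, the reference image) do not
blank_image = torch.zeros(image_width, image_height)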

Your first error (the backward-through-the-graph-a-second-time
RuntimeError) is sort of moot once you stop setting requires_grad = True
on tensors that don’t need it, but it’s worth understanding why it
occurs in your specific (incorrect) code.

You run:

ref_image = apply_ellipse_mask(torch.zeros(image_width, image_height, requires_grad=True), sphere_position, [sphere_radius, sphere_radius, sphere_radius])

once, outside of your optimization loop. The (unnamed) tensor
to which you apply apply_ellipse_mask() is defined with
requires_grad = True and therefore when you call
loss.backward(), autograd backpropagates through
the call to apply_ellipse_mask(), releasing the part of the
computation graph that connects ref_image to the unnamed
tensor. Because ref_image is computed only once, outside
of the optimization loop, that part of the graph is not rebuilt
and you get your backward-a-second-time error when
loss.backward() is called a second time in the second
iteration of your optimization loop.
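
Since ref_image is fixed target data, one simple fix is to build it
without tracking gradients at all, for example under torch.no_grad()
(or, equivalently, compute it as you do now and then call
ref_image = ref_image.detach()):

with torch.no_grad():
    ref_image = apply_ellipse_mask(
        torch.zeros(image_width, image_height),   # no requires_grad here
        sphere_position,
        [sphere_radius, sphere_radius, sphere_radius],
    )

Then loss.backward() only has to walk the graph that is rebuilt inside
each iteration of your loop, and the error goes away.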

Your diagnosis of the second issue is correct. val is connected
to ellipsoid_axes in pytorch’s computation graph, but, because
your torch.where() is not (usefully) differentiable, mask[r][c]
is not connected to val. So backpropagation never reaches
ellipsoid_axes and its gradient is not set.
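
You can see this detachment with a tiny toy example (the names here are
made up for illustration):

x = torch.tensor(2.0, requires_grad=True)
val = x ** 2                                                 # connected to x
out = torch.where(val < 1.0, torch.tensor(1.0), torch.tensor(0.0))
print(out.requires_grad)                                     # False -- the graph stops here

requires_grad propagates through the two value branches of torch.where(),
not through the boolean condition, so with constant branches the output
carries no graph back to x (or, in your case, back to ellipsoid_axes).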

As outlined above, make your loss function (and all steps
that lead up to it) (usefully) differentiable and loss.backward()
will backpropagate to ellipsoid_axes and populate its
gradient.
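
As a quick sanity check (continuing with the apply_soft_ellipse_mask
sketch from above, so the same caveats apply):

current_image = apply_soft_ellipse_mask(torch.zeros(image_width, image_height), ellipsoid_pos, ellipsoid_axes)
loss = torch.nn.functional.mse_loss(current_image, ref_image)
loss.backward()
print(ellipsoid_axes.grad)   # now a tensor of finite values, not None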

Best.

K. Frank
