Train a custom parameter vector

I want to optimize a custom vector of parameters that is used to generate a 2D matrix.
This matrix is then compared to a ground-truth image. It’s similar to a generative network, except that the model itself doesn’t generate the image.

So, I have something like this:

import torch

class DipoleModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    # the trainable vector of parameters
    self.weight = torch.nn.Parameter(torch.rand(9))

  def forward(self, input_image):
    x, y, z = self.weight[:3]
    pitch, yaw, roll = self.weight[3:6]
    w, h = self.weight[6:8]
    m = self.weight[-1]

    # generate the image (numpy 2D array); some_package is an external, non-PyTorch library
    output_image = some_package.generate_image(m=m, position=(x, y, z), size=(w, h), rotation=(pitch, yaw, roll))
    output_image = torch.tensor(output_image, requires_grad=True)

    return output_image

I have to set requires_grad=True on output_image here, because otherwise I get this error:

element 0 of tensors does not require grad and does not have a grad_fn

The training loop then looks something like this:

model = DipoleModel()
model.train()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.1)
loss_function = torch.nn.MSELoss()

# ground truth image
input_image = torch.rand((320, 320))

for epoch in range(100):
  optimizer.zero_grad()
  output_image = model(input_image)
  loss = loss_function(input_image, output_image)
  loss.backward()
  optimizer.step()

But the optimizer doesn’t do anything: the parameters don’t change, and neither does the loss.

What would be the correct way to do this?

Creating a new tensor via:

output_image = torch.tensor(output_image, requires_grad=True)

will detach it from the computation graph. Setting requires_grad=True does not undo that: it only makes PyTorch track differentiable operations applied to this new tensor after this line of code, so no gradient can flow back to self.weight.
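To make the detachment concrete, here is a minimal sketch. The NumPy outer product is only a hypothetical stand-in for some_package.generate_image, chosen because the same computation is easy to repeat with PyTorch ops for comparison.

```python
import numpy as np
import torch

weight = torch.nn.Parameter(torch.rand(9))

# Hypothetical stand-in for the external generator: pure NumPy, no autograd.
weight_np = weight.detach().numpy()
image_np = np.outer(weight_np, weight_np)

# Re-wrapping the result creates a brand-new leaf tensor with no history.
output_image = torch.tensor(image_np, requires_grad=True)
(output_image ** 2).mean().backward()

detached_grad_fn = output_image.grad_fn   # None: nothing links it to weight
weight_grad_detached = weight.grad        # None: no gradient reached weight

# The same computation expressed with PyTorch ops stays in the graph.
tracked_image = torch.outer(weight, weight)
(tracked_image ** 2).mean().backward()
weight_grad_tracked = weight.grad         # a tensor of shape (9,)
```

In the first half the gradient stops at output_image, which is why the optimizer in the question has nothing to apply.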
If some_package.generate_image does not use PyTorch operations, you would need to implement the backward pass yourself via a custom autograd.Function, as described here.
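As a rough sketch of what such a custom autograd.Function can look like when the derivative of the NumPy code is known analytically: render_numpy, GRID, and the two-parameter Gaussian bump below are made up for illustration and are not part of some_package.

```python
import numpy as np
import torch

GRID = np.linspace(-1.0, 1.0, 32)

def render_numpy(theta):
    """Hypothetical NumPy-only renderer: a 1D Gaussian bump."""
    amplitude, center = theta
    return amplitude * np.exp(-(GRID - center) ** 2)

class NumpyRender(torch.autograd.Function):
    @staticmethod
    def forward(ctx, theta):
        theta_np = theta.detach().cpu().numpy().astype(np.float64)
        ctx.theta_np = theta_np  # keep the inputs for the backward pass
        return torch.as_tensor(render_numpy(theta_np), dtype=theta.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        amplitude, center = ctx.theta_np
        bump = np.exp(-(GRID - center) ** 2)
        # Jacobian of the image w.r.t. (amplitude, center), shape (32, 2).
        jac = np.stack([bump, amplitude * bump * 2.0 * (GRID - center)], axis=1)
        # Chain rule: contract the upstream gradient with the Jacobian.
        grad_theta = grad_output.detach().cpu().numpy() @ jac
        return torch.as_tensor(grad_theta, dtype=grad_output.dtype)

theta = torch.tensor([1.5, 0.2], dtype=torch.float64, requires_grad=True)
loss = (NumpyRender.apply(theta) ** 2).sum()
loss.backward()
custom_grad = theta.grad.clone()
```

backward must return one gradient per input of forward, with the same shape as that input, so here it returns a tensor of shape (2,).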

Ok, that is a good lead, thank you. I’ve implemented something like this, based on the example you referenced:

import torch
from tqdm import tqdm

class DipoleModel(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input_vector):
        # some_package is not PyTorch-based, so hand it plain NumPy data
        output_image = some_package.generate_image(input_vector.detach().numpy())
        output_image = torch.tensor(output_image, dtype=torch.float32)
        return output_image

    @staticmethod
    def backward(ctx, grad_output):
        # grad_output has the same shape as the image
        # do something with the grad_output?
        pass


# Initialize parameters
initial_parameters = torch.nn.Parameter(torch.rand(9), requires_grad=True)

# Target image (a fixed target does not need gradients)
target_image = torch.rand((320, 320), dtype=torch.float32)

# Loss function
criterion = torch.nn.MSELoss()

num_epochs = 100

lr = 0.01

# Optimization loop
pbar = tqdm(range(num_epochs))
for epoch in pbar:

    generated_image = DipoleModel.apply(initial_parameters)

    loss = criterion(generated_image, target_image)
    loss.backward()

    # manual gradient descent step
    with torch.no_grad():
        initial_parameters -= lr * initial_parameters.grad
        initial_parameters.grad = None

The main problem now is that, from PyTorch’s point of view, the parameter vector doesn’t directly participate in generating the image, so it doesn’t know how the image depends on the vector.

And when backward is called, grad_output has the same shape as the generated and target images (320 x 320 in this case), which I somehow have to reduce to a vector of 9 values to update the parameters.
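For what it's worth, the chain rule says how that reduction has to work. The symbols are my own notation, not from the code above: L is the scalar loss, I is the generated image, and θ is the 9-element parameter vector, so grad_output holds ∂L/∂I.

```latex
\frac{\partial L}{\partial \theta_k}
  = \sum_{i=1}^{320} \sum_{j=1}^{320}
    \frac{\partial L}{\partial I_{ij}} \,
    \frac{\partial I_{ij}}{\partial \theta_k},
  \qquad k = 1, \dots, 9
```

So each of the 9 entries is a weighted sum of grad_output, with the weights given by how every pixel changes when that one parameter changes. The missing piece is the derivative of the image with respect to the parameters, which has to come from the generator.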

But because there’s no direct connection between the two, it’s quite tricky.
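One possible way to bridge that gap when the generator is a black box is to estimate the image derivatives numerically inside backward. This is only a sketch under assumptions: generate_image below is a hypothetical four-parameter stand-in for some_package.generate_image, and central finite differences are my choice here, not something the library provides. They cost two extra generator calls per parameter on every backward pass.

```python
import numpy as np
import torch

def generate_image(params, size=32):
    """Hypothetical black-box stand-in for some_package.generate_image."""
    ys, xs = np.mgrid[0:size, 0:size] / size
    x0, y0, width, amp = params
    return amp * np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (width ** 2 + 1e-3))

class BlackBoxImage(torch.autograd.Function):
    @staticmethod
    def forward(ctx, params):
        params_np = params.detach().cpu().numpy().astype(np.float64)
        ctx.params_np = params_np
        return torch.as_tensor(generate_image(params_np), dtype=params.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        eps = 1e-4  # finite-difference step; may need tuning per generator
        grad_np = grad_output.detach().cpu().numpy()
        grad_params = np.zeros_like(ctx.params_np)
        for k in range(ctx.params_np.size):
            step = np.zeros_like(ctx.params_np)
            step[k] = eps
            # Central difference: how the whole image moves with parameter k.
            d_image = (generate_image(ctx.params_np + step)
                       - generate_image(ctx.params_np - step)) / (2 * eps)
            # Reduce the image-shaped gradient to one number per parameter.
            grad_params[k] = np.sum(grad_np * d_image)
        return torch.as_tensor(grad_params, dtype=grad_output.dtype)

target = torch.as_tensor(generate_image(np.array([0.6, 0.4, 0.2, 1.0])),
                         dtype=torch.float32)
params = torch.nn.Parameter(torch.tensor([0.5, 0.5, 0.3, 0.8]))
optimizer = torch.optim.Adam([params], lr=0.01)

losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(BlackBoxImage.apply(params), target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

This only helps if the generator responds smoothly to small parameter changes. If it is noisy or piecewise constant, the finite-difference estimates will be unreliable and a gradient-free optimizer may be a better fit.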