Differentiable image indexing

I’m working on a neural network that predicts a rotating aperture. The network outputs the machine parameters of this aperture, but for calculating the loss I translate them into an image in order to calculate an effective targeted 3D volume. Unfortunately, my algorithm for creating these images is not differentiable, so I do not obtain any gradients for it. Depending on the output, image pixels are set to open (one) or closed (zero) (basically the last line of the code below).

I’m using pytorch-lightning, but I do not think this is related to my problem since this index operation is not differentiable.

    # leaf: (NUM_LEAFS, 2) tensor of left/right leaf positions (defined elsewhere)
    img = torch.zeros(IMG_SIZE)

    # isocenter as image origin
    isocenter = 0.5 * torch.Tensor(IMG_SIZE)

    # calc image positions for every leaf pair: open = 1, closed = 0
    for leaf_index in range(NUM_LEAFS):
        leaf_pair = leaf[leaf_index, :]
        leaf_pair = torch.round(leaf_pair)
        # convert leaf positions to pixel columns relative to the isocenter
        x1 = isocenter[1] + leaf_pair[0] * IMG_RES[1]
        x2 = isocenter[1] + leaf_pair[1] * IMG_RES[1]

        x1 = x1.clamp(0, IMG_SIZE[1] - 1).int()
        x2 = x2.clamp(0, IMG_SIZE[1] - 1).int()
        # each leaf pair covers IMG_RES[0] image rows
        y1 = leaf_index * IMG_RES[0]
        y2 = y1 + IMG_RES[0]
        img[y1:y2, x1:x2] = 1
    

Does anyone have an idea how to overcome this problem? Any help is much appreciated, thanks!

I think you would need to be more specific about your problem setting. Maybe you can enhance your code snippet to be self-contained (i.e. runnable), maybe even with just one leaf.

Also, if we think of the backward as a functional that maps dloss/doutput to dloss/dinput, you would need to say which output has a gradient and which input should have one, and maybe something about your loss. Indexing itself is discrete and thus cannot produce meaningful gradients, but people have found ways to formulate losses on coordinates (e.g. Intersection over Union) to deal with the need to predict shapes. Very likely the solution involves some way of talking more about coordinates and less about pixels.
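
To make the coordinate idea concrete, here is a minimal sketch of a loss written directly on the leaf positions, treating each leaf pair as a 1D interval. All names are made up and this is not necessarily the right loss for your setting:

    import torch

    # pred, target: (NUM_LEAFS, 2) tensors of left/right leaf positions
    def interval_iou(pred, target, eps=1e-6):
        inter_left = torch.maximum(pred[:, 0], target[:, 0])
        inter_right = torch.minimum(pred[:, 1], target[:, 1])
        intersection = (inter_right - inter_left).clamp(min=0)
        union = (pred[:, 1] - pred[:, 0]) + (target[:, 1] - target[:, 0]) - intersection
        return intersection / (union + eps)

    pred = torch.tensor([[-90.0, 110.0]], requires_grad=True)
    target = torch.tensor([[-100.0, 100.0]])
    loss = 1 - interval_iou(pred, target).mean()
    loss.backward()  # pred.grad is populated, no pixels involved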

Best regards

Thomas

Hi Thomas,

thanks for your answer. I added some details to my code so that it is runnable; I hope this is understandable and detailed enough, since it contains the principle of the method/problem.

    import torch
    import matplotlib.pyplot as plt

    # 30 closed leaf pairs, 20 open pairs (from -100 to 100), 30 closed pairs
    closed = torch.Tensor([0., 0.])
    aperture = torch.Tensor([-100., 100.])
    leafs = torch.cat([closed.repeat(30, 1), aperture.repeat(20, 1), closed.repeat(30, 1)])

    IMG_SIZE = (400, 400)
    IMG_RES = (5, 1)

    img = torch.zeros(IMG_SIZE)
    img = img.type_as(leafs)

    # isocenter as image origin
    isocenter = 0.5 * torch.Tensor(IMG_SIZE)
    isocenter = isocenter.type_as(leafs)

    # calc image positions for every leaf pair: open = 1, closed = 0
    for leaf_index in range(leafs.shape[0]):
        leaf_pair = leafs[leaf_index, :]
        leaf_pair = torch.round(leaf_pair)
        x1 = isocenter[1] + leaf_pair[0] * IMG_RES[1]
        x2 = isocenter[1] + leaf_pair[1] * IMG_RES[1]

        x1 = x1.clamp(0, IMG_SIZE[1] - 1).int()
        x2 = x2.clamp(0, IMG_SIZE[1] - 1).int()
        y1 = leaf_index * IMG_RES[0]
        y2 = y1 + IMG_RES[0]
        img[y1:y2, x1:x2] = 1

    plt.imshow(img)
    plt.show()

The procedure dealing with these images to obtain a 3D volume is more complex, but differentiable, because it only involves stacking up the images and rotating them; in the end I sum up all cubes and get an effective volume. So I actually do need these binary images in order to calculate these steps.
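
For illustration, a toy check that the stacking/rotating/summing part indeed keeps gradients with respect to the image values (using torchvision's rotate with bilinear interpolation as a stand-in for the actual rotation code; sizes, angles and depth are made-up placeholders):

    import torch
    import torchvision.transforms.functional as TF

    imgs = torch.rand(4, 400, 400, requires_grad=True)  # placeholder aperture images, one per rotation step
    angles = [0.0, 45.0, 90.0, 135.0]                    # placeholder rotation steps

    cubes = []
    for img, angle in zip(imgs, angles):
        cube = img.unsqueeze(0).repeat(50, 1, 1)         # extrude the 2D image into a cube
        cube = TF.rotate(cube, angle, interpolation=TF.InterpolationMode.BILINEAR)
        cubes.append(cube)

    effective_volume = torch.stack(cubes).sum(dim=0)     # sum up all cubes
    effective_volume.mean().backward()
    print(imgs.grad.abs().sum())                         # non-zero: this part is differentiable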

I do have the real aperture values, so I would do this for both the real and the predicted apertures and take the L1/L2 loss between the two 3D volumes. I have already trained on predicting the raw parameters, which worked out well.

I’ve already had a look at methods proposed in other threads concerning my issue, like grid sampling or intersection over union as you suggested. However, I do not think they can be applied in my case, since the actual image is only an intermediate step. There might be no real workaround which preserves the postprocessing of these images.

Best regards,
Marco

I must admit I don’t understand how you’d go from pixels to loss.
If you later compare where the prediction “has 1” and where the target does, you’re exactly in the setting that IoU is meant to solve.
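
Another direction that would keep your image-based postprocessing is to make the image itself a soft, differentiable function of the leaf positions, e.g. with sigmoids over the pixel coordinates instead of hard 0/1 indexing. A minimal sketch under these assumptions (all names and the temperature value are guesses, not your actual setup):

    import torch

    def soft_aperture(leafs, img_size=(400, 400), img_res=(5, 1), temperature=1.0):
        isocenter = 0.5 * torch.tensor(img_size, dtype=leafs.dtype)
        xs = torch.arange(img_size[1], dtype=leafs.dtype)      # pixel x coordinates
        x1 = isocenter[1] + leafs[:, 0] * img_res[1]           # left leaf edge per pair
        x2 = isocenter[1] + leafs[:, 1] * img_res[1]           # right leaf edge per pair
        # rows: close to 1 between x1 and x2, smoothly falling to 0 outside
        rows = torch.sigmoid((xs[None, :] - x1[:, None]) / temperature) \
             * torch.sigmoid((x2[:, None] - xs[None, :]) / temperature)
        # each leaf pair covers img_res[0] image rows
        return rows.repeat_interleave(img_res[0], dim=0)

    # img now depends on leafs in a differentiable way, so gradients can flow
    # back from the 3D volume / loss into the predicted leaf positions.
    img = soft_aperture(leafs)  # leafs as in the snippet above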

Going from the aperture images to the resulting 3D volume, for which I would calculate the loss, takes a bit more coding effort, involving rotations and summing up the volume of each rotation step. I think this approach is worth considering; since the resulting volume is highly irregular, calculating IoU will be more complex, but I will check it out. Thanks, Thomas, for your effort.