I have some images “X” which I pass through a network that outputs bounding boxes. Based on these bounding boxes I crop “X” to get “cropped X”. Each “cropped X” is converted to grayscale and its mean is computed (per image).

If the mean is < 0.8, I set it to 0; otherwise it keeps its value. This is then fed into a SmoothL1Loss (with target 0), with the hope of training the network to output bounding boxes that contain actual content rather than blank regions, i.e. crops whose grayscale mean is lower than 0.8.
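A minimal sketch of that thresholding step (the variable names are illustrative, not from my actual code): `torch.where` zeroes out means below 0.8 while staying differentiable through the branch that keeps its value, and the result is compared against a zero target with SmoothL1Loss.

```python
import torch

# Illustrative stand-in for the per-crop grayscale means.
means = torch.tensor([0.95, 0.60, 0.85], requires_grad=True)

# Means below 0.8 are replaced by 0; torch.where routes gradients only
# through the branch that is actually selected.
thresholded = torch.where(means < 0.8, torch.zeros_like(means), means)

loss = torch.nn.functional.smooth_l1_loss(
    thresholded, torch.zeros_like(thresholded))
```

One side effect worth noting: crops whose mean is already below 0.8 contribute zero loss and zero gradient, so only the “blank” boxes get pushed toward lower means.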

Up until the computation of the bounding boxes, I understand the gradient flow. From there on, I would appreciate some advice/insights.

I slice the images “X” based on the bounding boxes, i.e.:

```python
temp_x_2 = temp_x[:, int(bbxes[i, j, 2]):int(bbxes[i, j, 2] + bbxes[i, j, 0]),
                     int(bbxes[i, j, 3]):int(bbxes[i, j, 3] + bbxes[i, j, 1])]
```

(A) Do I need to set “requires_grad” to True for temp_x? Note that bbxes has requires_grad set to True.
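A quick illustrative check with stand-in tensors (not my actual code): the crop is just a view of the image tensor, so its `requires_grad` follows the image, while `int(...)` turns the box coordinate into a plain Python number that autograd cannot see.

```python
import torch

# Stand-in image without gradients and a box tensor with gradients.
x = torch.rand(3, 32, 32)
box = torch.tensor([4.0, 8.0], requires_grad=True)

# Slicing with plain ints is a view of x; the int() calls strip the
# coordinates out of the autograd graph entirely.
crop = x[:, int(box[0]):int(box[0]) + 16, int(box[1]):int(box[1]) + 16]
```

In this sketch the crop carries no gradient even though `box` does, which is the part of the gradient flow I would double-check first.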

Following that, I manually convert them to grayscale and compute the mean, all using torch operations, i.e.:

```python
temp_x_2[0, :, :] = temp_x_2[0, :, :] * 299 / 1000
temp_x_2[1, :, :] = temp_x_2[1, :, :] * 587 / 1000
temp_x_2[2, :, :] = temp_x_2[2, :, :] * 114 / 1000
temp_x_2 = torch.sum(temp_x_2, dim=0)
… = torch.mean(temp_x_2)
```

(B) Does this look (1) correct and (2) efficient? I avoid Pillow operations to keep the gradient flowing.

Finally, when I pass the means into SmoothL1Loss with the target set to 0, I get “leaf variable has been moved into the graph interior”. (C) Why is this the case?
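For reference, one pattern that commonly triggers leaf-related in-place errors (the exact message varies between PyTorch versions) is writing in place into a view of a leaf tensor that requires grad; cloning the crop first gives autograd a safe copy to modify. A sketch with stand-in tensors:

```python
import torch

# Stand-in for an image that requires grad.
x = torch.rand(3, 8, 8, requires_grad=True)

# Without the clone(), the in-place channel writes below would hit a view
# of the leaf tensor x and autograd raises an in-place/leaf error.
# clone() copies the storage while staying differentiable.
crop = x[:, 1:5, 2:6].clone()
crop[0] = crop[0] * 299 / 1000
crop[1] = crop[1] * 587 / 1000
crop[2] = crop[2] * 114 / 1000

mean = crop.sum(dim=0).mean()
mean.backward()
```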

Any insights & advice would be greatly appreciated!