# A More Forgiving Vision NN Model?

Hi,

I’m training a convolutional neural network to recognize features in an image, using a method similar to this paper: Stacked Hourglass

The targets I am using are artificial heatmaps generated from the feature (x, y) locations. The reason I am using heatmaps instead of single points is that I want the training to be more forgiving. I just need the heatmap’s maximum value near the feature location.

However, the network is training itself TO the heatmap - so I end up with a blobby looking output that isn’t recognizing what I want it to.

What I want to try instead is training the model on the L2 distance between the feature location and the location of the maximum value of the network output.

What I need is help converting the network output to the right format.

```
input_image = [B, C, H, W]
target = [B, C, loc]  # where loc = [y, x]
```

I can calculate the maximum no problem…

```python
def maximum(tensor):
    assert tensor.dim() == 4, "Tensor must be of size 4 [BxCxHxW]"

    batches = []
    for b in tensor:
        joints = []
        for c in b:
            h, w = c.size()
            maxValue = float("-inf")
            ymax = 0
            xmax = 0
            for y in range(h):
                for x in range(w):
                    value = c[y, x]
                    if value > maxValue:
                        maxValue = value
                        ymax = y
                        xmax = x
            joints.append([ymax, xmax])
        batches.append(joints)

    return torch.Tensor(batches)
```

But that function breaks backpropagation (I get this error):

```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
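For what it’s worth, the error happens because reading the values out as Python numbers and rebuilding a tensor from them creates a brand-new leaf tensor with no connection to the computation graph. A minimal reproduction:

```python
import torch

x = torch.rand(2, 3, requires_grad=True)

# Rebuilding a tensor from plain Python numbers detaches it
# from the autograd graph entirely:
y = torch.Tensor([[float(v) for v in row] for row in x])

print(y.requires_grad)  # False -- the graph is gone
print(y.grad_fn)        # None
```

Any loss computed from `y` therefore has nothing to backpropagate through.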

Therefore I need to use the `torch.max()` function, but I can’t for the life of me figure out how to do so, since it calculates maximums over only one dimension at a time…

So, how do I convert `[B, C, H, W]` to `[B, C, 2]` but maintain the gradient so the loss function and optimizer can operate normally?

Ideally I’d like for the model to also be exportable to ONNX and hopefully then to CoreML.

Thanks,

• Use `inp.view(B, C, -1).max(2)` and `numpy.unravel_index` (or just `//` and `%`) afterwards to recover `(y, x)`.
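A minimal sketch of that suggestion (the function name `argmax_2d` is my own): flatten the spatial dimensions, take the argmax over the flattened axis, then split the flat index back into row and column.

```python
import torch

def argmax_2d(t):
    """Return the (y, x) index of the per-channel maximum as [B, C, 2]."""
    B, C, H, W = t.shape
    flat = t.view(B, C, -1).argmax(dim=2)  # [B, C] flat indices into H*W
    ymax = torch.div(flat, W, rounding_mode="floor")
    xmax = flat % W
    return torch.stack((ymax, xmax), dim=2)

# quick check: a single peak at (y=2, x=3)
heat = torch.zeros(1, 1, 4, 5)
heat[0, 0, 2, 3] = 1.0
coords = argmax_2d(heat)  # tensor([[[2, 3]]])
```

Note that this replaces the Python loops but is still not differentiable: `argmax` returns an integer tensor with no `grad_fn`, which is what the second bullet addresses.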
• You cannot differentiate argmax; its gradient is zero almost everywhere. People have suggested alternatives, e.g. a soft-argmax (or argsoftmax - I cannot remember) defined as something like `(torch.arange(W * H).view(1, 1, W * H) * torch.softmax(inp.view(B, C, -1), dim=2)).sum(dim=2)`, i.e. the expected index under the softmax weights. It’s not always entirely clear how to interpret that, but it seems that some people like it.
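A sketch of a 2-D version of that soft-argmax, computing the expected y and x coordinates separately. The function name and the `beta` sharpening parameter are my own additions - larger `beta` pushes the result toward the hard argmax:

```python
import torch

def soft_argmax_2d(t, beta=100.0):
    """Differentiable (y, x) estimate of the per-channel maximum, [B, C, 2]."""
    B, C, H, W = t.shape
    probs = torch.softmax(beta * t.view(B, C, -1), dim=2).view(B, C, H, W)
    ys = torch.arange(H, dtype=t.dtype, device=t.device)
    xs = torch.arange(W, dtype=t.dtype, device=t.device)
    # expected coordinates under the softmax distribution
    y = (probs.sum(dim=3) * ys).sum(dim=2)  # [B, C]
    x = (probs.sum(dim=2) * xs).sum(dim=2)  # [B, C]
    return torch.stack((y, x), dim=2)       # [B, C, 2], differentiable

heat = torch.zeros(1, 1, 4, 5)
heat[0, 0, 2, 3] = 10.0
heat.requires_grad_(True)
coords = soft_argmax_2d(heat)  # close to tensor([[[2., 3.]]])
coords.sum().backward()        # gradients flow back to `heat`
```

Since this is built only from `softmax`, `arange`, elementwise multiplies, and reductions, ONNX export should be straightforward in my understanding, but verify against your exporter and CoreML converter versions.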