I’m training a convolutional neural network to recognize features in an image, using a method similar to this paper: Stacked Hourglass
The targets I am using are artificial heatmaps generated from the feature (x, y) locations. I use heatmaps rather than single points to make the training more forgiving: all I really need is for the heatmap’s maximum to land near the feature location.
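To illustrate, a single target channel looks roughly like this (a minimal sketch only; the unnormalized Gaussian and the `sigma` value here are just illustrative, not the exact targets I generate):

```python
import torch

def gaussian_heatmap(loc, height, width, sigma=2.0):
    """Unnormalized Gaussian blob centered at loc = (y, x)."""
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)  # [H, 1]
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)   # [1, W]
    y0, x0 = loc
    return torch.exp(-((ys - y0) ** 2 + (xs - x0) ** 2) / (2 * sigma ** 2))

# e.g. a 64x64 target whose maximum sits at (y=20, x=30)
target = gaussian_heatmap((20.0, 30.0), 64, 64)
```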
However, the network is learning to reproduce the heatmap itself, so I end up with a blobby-looking output that isn’t recognizing what I actually want it to.
What I want to try instead is training the model on the L2 distance between the feature location and the location of the maximum value in the network’s output.
What I need is help converting the network output to the right format.
```
input_image = [B, C, H, W]
target      = [B, C, loc]    # where loc = [y, x]
```
I can calculate the maximum no problem…
```python
import torch

def maximum(tensor):
    """Find the (y, x) location of the per-channel maximum: [B, C, H, W] -> [B, C, 2]."""
    assert tensor.dim() == 4, "Tensor must be of size 4 [BxCxHxW]"
    batches = []
    for b in tensor:                  # iterate over the batch
        joints = []
        for c in b:                   # iterate over the channels
            h, w = c.size()
            maxValue = c[0, 0]
            ymax, xmax = 0, 0
            for y in range(h):
                for x in range(w):
                    value = c[y, x]
                    if value > maxValue:
                        maxValue = value
                        ymax, xmax = y, x
            joints.append([ymax, xmax])
        batches.append(joints)
    return torch.Tensor(batches)      # fresh tensor: detached from the graph
```
But that function breaks backpropagation (I get this error):
```
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
Therefore I need to use the `torch.max()` function, but I can’t for the life of me figure out how, since it only calculates maximums over one dimension at a time…
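For reference, the hard maximum itself can at least be vectorized by flattening H and W into one dimension before taking the argmax. This replaces the Python loops above, but the result is still integer indices, so it doesn’t solve the gradient problem:

```python
import torch

def hard_argmax(tensor):
    """[B, C, H, W] -> [B, C, 2] of integer (y, x) indices. Not differentiable."""
    B, C, H, W = tensor.shape
    flat_idx = tensor.view(B, C, -1).argmax(dim=-1)            # [B, C]
    return torch.stack((flat_idx // W, flat_idx % W), dim=-1)  # [B, C, 2]
```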
So, how do I convert `[B, C, H, W]` to `[B, C, 2]` while maintaining the gradient, so the loss function and optimizer can operate normally?
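For concreteness, here is the kind of differentiable relaxation I have in mind (a minimal sketch, not something I have verified: a “soft-argmax” that takes a spatial softmax over each channel and then the expected (y, x) coordinate under it; the sharpness factor `beta` is an arbitrary placeholder):

```python
import torch
import torch.nn.functional as F

def soft_argmax(heatmaps, beta=100.0):
    """Differentiable [B, C, H, W] -> [B, C, 2] of expected (y, x) coordinates.

    beta sharpens the softmax so the expectation concentrates near the true
    maximum; gradients flow through softmax, multiplication and summation.
    """
    B, C, H, W = heatmaps.shape
    probs = F.softmax(heatmaps.view(B, C, -1) * beta, dim=-1)  # [B, C, H*W]
    probs = probs.view(B, C, H, W)
    ys = torch.arange(H, dtype=heatmaps.dtype, device=heatmaps.device)
    xs = torch.arange(W, dtype=heatmaps.dtype, device=heatmaps.device)
    expected_y = (probs.sum(dim=3) * ys).sum(dim=2)            # [B, C]
    expected_x = (probs.sum(dim=2) * xs).sum(dim=2)            # [B, C]
    return torch.stack((expected_y, expected_x), dim=-1)       # [B, C, 2]
```

With something like this, the loss would presumably just be `F.mse_loss(soft_argmax(output), target)`, with `target` as float coordinates of shape `[B, C, 2]`.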
Ideally I’d also like the model to be exportable to ONNX, and hopefully from there to CoreML.
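As far as I can tell, an approach built only from `softmax`, element-wise multiplication and `sum` (like the sketch above) should export cleanly. A minimal export sketch, with a stand-in module and placeholder shapes and filename:

```python
import torch
import torch.nn as nn

# stand-in for the real network; anything producing [B, C, H, W] heatmaps works here
model = nn.Conv2d(3, 16, kernel_size=3, padding=1)
dummy = torch.randn(1, 3, 256, 256)  # placeholder input shape

torch.onnx.export(model, dummy, "heatmap_model.onnx", opset_version=11)
```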