So I used the MobileNetV3 architecture to train an object detector that predicts heat maps instead of bounding boxes.
I didn’t change anything in the MobileNetV3 architecture except raising the number of output neurons to 2704 (52 x 52) and using a sigmoid activation, so the output can be reshaped into a 52x52 map. This should let me compare the output with the target heat maps using an MSE loss and get good results from the model.
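A minimal PyTorch sketch of that kind of head, assuming the backbone hands a flat feature vector to the last Linear layer. `FEATURE_DIM` and `LinearHeatmapHead` are hypothetical names, and 1280 is only a placeholder width, not necessarily what your MobileNetV3 variant produces:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical: width of the feature vector entering the final Linear layer.
FEATURE_DIM = 1280
HEATMAP_SIZE = 52  # 52 * 52 = 2704 output neurons


class LinearHeatmapHead(nn.Module):
    def __init__(self, in_features: int = FEATURE_DIM, size: int = HEATMAP_SIZE):
        super().__init__()
        self.size = size
        self.fc = nn.Linear(in_features, size * size)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (N, in_features) -> (N, 2704) -> sigmoid -> reshape to (N, 1, 52, 52)
        probs = torch.sigmoid(self.fc(features))
        return probs.view(-1, 1, self.size, self.size)


head = LinearHeatmapHead()
features = torch.randn(4, FEATURE_DIM)                   # stand-in for backbone output
target = torch.rand(4, 1, HEATMAP_SIZE, HEATMAP_SIZE)    # ground-truth heat maps in [0, 1]
pred = head(features)
loss = F.mse_loss(pred, target)                          # plain, unweighted MSE
```

With a plain MSE like this every pixel counts equally, so when most of the 52x52 map is background the loss can be pushed down just by predicting values near zero everywhere.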
The problem is that this doesn’t happen. After some research I found out that I need to give the background pixels lower weights than the foreground ones. My question is: how can I do something like this?
And if someone has tried training with heatmaps: is my configuration good, or is there something better?
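One way to weight the background pixels less, sketched under the assumption that a pixel counts as foreground when its target value is above a threshold. `weighted_mse_loss`, `fg_weight`, `bg_weight` and `threshold` are hypothetical names with illustrative defaults, not tuned values:

```python
import torch


def weighted_mse_loss(pred, target, fg_weight=10.0, bg_weight=1.0, threshold=0.5):
    """MSE in which foreground pixels (target > threshold) weigh more.

    fg_weight, bg_weight and threshold are illustrative hyperparameters.
    """
    weights = torch.where(
        target > threshold,
        torch.full_like(target, fg_weight),   # foreground pixels
        torch.full_like(target, bg_weight),   # background pixels
    )
    squared_error = (pred - target) ** 2
    # Normalise by the total weight so the loss scale stays comparable
    # when the weights or the foreground/background ratio change.
    return (weights * squared_error).sum() / weights.sum()
```

Dividing by the sum of weights rather than the pixel count is a design choice; with `fg_weight == bg_weight` it reduces to the ordinary mean squared error.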
I haven’t studied the architecture of MobileNetV3, but to generate heatmaps (or activation maps in general), a fully convolutional network should suffice: a feature-extraction stem followed by a 1x1 conv that reduces the number of channels to 1 while keeping the spatial dimensions as they are.
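A rough PyTorch sketch of such a fully convolutional setup. The two-layer `stem` is a hypothetical stand-in for a real backbone's convolutional features, chosen so that a 208x208 input gives a 52x52 map; the 1x1 conv at the end is the part described above:

```python
import torch
import torch.nn as nn


class FCNHeatmap(nn.Module):
    """Fully convolutional heatmap predictor (illustrative only).

    `stem` is a small hypothetical feature extractor that downsamples by 4;
    in practice it would be replaced by a real backbone's conv layers.
    """

    def __init__(self, feat_channels: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(feat_channels),
            nn.ReLU(inplace=True),
        )
        # 1x1 conv: collapses the channels to 1 and leaves H and W untouched
        self.head = nn.Conv2d(feat_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns raw logits of shape (N, 1, H/4, W/4); apply sigmoid or a
        # logits-based loss afterwards.
        return self.head(self.stem(x))


model = FCNHeatmap()
logits = model(torch.randn(2, 3, 208, 208))
```

Because there is no Linear layer, the output keeps its spatial layout, and the same model accepts other input sizes (the map is then roughly a quarter of the input resolution).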
As for the class imbalance problem, you can counter it using Balanced/Weighted Cross Entropy or Dice loss. These loss functions are commonly used to tackle class imbalance in detection/segmentation tasks.
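A minimal sketch of the balanced cross entropy idea for a single-channel heatmap, assuming (near-)binary targets and raw logits from the network. The weighting used here (beta = fraction of background pixels in the batch) is one common formulation, and `balanced_bce_with_logits` is a hypothetical helper:

```python
import torch
import torch.nn.functional as F


def balanced_bce_with_logits(logits, target, eps=1e-6):
    """Class-balanced binary cross entropy on per-pixel logits.

    beta is the share of background pixels in the batch, so the rare
    foreground pixels get weight beta and the common background pixels
    get weight (1 - beta). Assumes `target` is (close to) binary.
    """
    beta = 1.0 - target.mean()            # fraction of background pixels
    beta = beta.clamp(eps, 1.0 - eps)     # avoid an all-zero weight
    weights = beta * target + (1.0 - beta) * (1.0 - target)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)
```

A simpler alternative is `nn.BCEWithLogitsLoss(pos_weight=...)`, which scales only the positive (foreground) term by a fixed factor, for example the background-to-foreground pixel ratio of your dataset.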
Yeah, I did it this way, but through a Linear layer and then reshaping the output.
I thought about cross entropy, but the link here didn’t advise it.
Are you sure that the link says Cross Entropy is not a good choice? Nevertheless, Balanced Cross Entropy/Dice loss has shown good results for problems like these, and it is always better to try and see before arriving at a conclusion.
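For the Dice loss option, a minimal soft Dice sketch for a single-channel map. It assumes raw logits and binary targets; `soft_dice_loss` is a hypothetical helper and `smooth=1.0` is an illustrative constant:

```python
import torch


def soft_dice_loss(logits, target, smooth=1.0):
    """Soft Dice loss for a single-channel heatmap/mask.

    Computed per sample over all pixels, then averaged over the batch.
    `smooth` keeps the ratio defined when prediction and target are both empty.
    """
    probs = torch.sigmoid(logits)
    dims = tuple(range(1, probs.dim()))          # every dim except the batch dim
    intersection = (probs * target).sum(dim=dims)
    denom = probs.sum(dim=dims) + target.sum(dim=dims)
    dice = (2.0 * intersection + smooth) / (denom + smooth)
    return 1.0 - dice.mean()
```

Since Dice measures the overlap between predicted and true foreground, correctly predicted background pixels add almost nothing to it, which is why it copes well with imbalanced masks. It is also common to add it to a BCE term.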
Check out the EAST paper: https://arxiv.org/abs/1704.03155
I got this error. Does anyone have any clues for a solution?
Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss2d_forward
Are you passing a ground-truth tensor of type float to the loss function?
I solved it. It seems Cross Entropy only accepts Long as the data type for the target (class indices).
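A minimal sketch of the per-pixel cross entropy setup with Long targets, assuming the network outputs 2 channels (background, foreground) instead of a single sigmoid map; the 0.5 threshold and the class weights are hypothetical, illustrative choices. Recent PyTorch releases also accept float class-probability targets with the same shape as the input, but integer class indices, as used here, need `.long()`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed two-class output: (N, C=2, H, W) raw scores.
logits = torch.randn(4, 2, 52, 52)
heatmap = torch.rand(4, 52, 52)            # float ground truth in [0, 1]

# Class-index targets must be integer (Long) with shape (N, H, W);
# thresholding the heatmap at 0.5 is a hypothetical choice.
target = (heatmap > 0.5).long()

# `weight` rescales each class: background 1.0, foreground 10.0
# (illustrative values only), i.e. a weighted cross entropy.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))
loss = criterion(logits, target)
```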