So I used the MobileNetV3 architecture to train an object detector that predicts heat maps instead of bounding boxes.
I didn't change anything in the MobileNetV3 architecture except raising the number of output neurons to 2704 (52 x 52), so that the output can be reshaped into a 52x52 image, with a sigmoid activation function. This should let me compare the output against the target heat maps with an MSE loss and get good results from the model.
The problem is that this doesn't happen. After some research I found out that I need to give the background pixels a lower weight than the foreground ones. My question is: how can I do something like this?
And if anyone has tried training with heat maps, is my configuration good or is there something better?
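For the weighting asked about above, here is a minimal NumPy sketch of a per-pixel weighted MSE. The function name `weighted_mse`, the weights (10 for foreground, 1 for background) and the threshold of 0.1 are all illustrative assumptions, not values from this thread, so they would need tuning on real data.

```python
import numpy as np

def weighted_mse(pred, target, fg_weight=10.0, bg_weight=1.0, fg_threshold=0.1):
    """MSE in which foreground pixels count more than background pixels.

    pred, target: arrays of the same shape with values in [0, 1].
    fg_weight, bg_weight and fg_threshold are illustrative, untuned values.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # A pixel is treated as foreground when the target heat map exceeds the threshold.
    weights = np.where(target > fg_threshold, fg_weight, bg_weight)
    # Divide by the total weight so the loss scale does not depend on object size.
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))

# Toy 52x52 target with one small 4x4 blob.
target = np.zeros((52, 52))
target[20:24, 30:34] = 1.0
all_background = np.zeros_like(target)  # a model that predicts background everywhere

plain = float(np.mean((all_background - target) ** 2))
weighted = weighted_mse(all_background, target)
print(plain, weighted)
```

With plain MSE, predicting background everywhere is almost free because the blob covers only 16 of 2704 pixels. The weighted version penalizes that trivial solution more heavily.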
I haven't studied the architecture of MobileNetV3, but to generate heatmaps (or activation maps in general), a fully convolutional network should suffice: a feature extraction stem followed by a 1x1 conv that brings the number of channels down to 1 while keeping the spatial dimensions as they are.
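To make the 1x1 conv idea concrete, here is a rough NumPy sketch of such a head applied to a feature map. The feature map is random stand-in data, and the 96 channels and the name `heatmap_head` are assumptions for illustration only. In a real model the features would come from the backbone and the weights would be learned.

```python
import numpy as np

def heatmap_head(features, weight, bias=0.0):
    """Apply a 1x1 convolution with one output channel, then a sigmoid.

    features: (C, H, W) feature map from some backbone (random data below).
    weight:   (C,) kernel of the 1x1 convolution.
    """
    # A 1x1 conv is a per-pixel dot product over the channel axis,
    # so the spatial dimensions H and W are left unchanged.
    logits = np.einsum("c,chw->hw", weight, features) + bias
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
features = rng.normal(size=(96, 52, 52))  # 96 channels is an arbitrary choice
weight = rng.normal(size=96) * 0.1
heatmap = heatmap_head(features, weight)
print(heatmap.shape)  # (52, 52)
```

Compared with a fully connected layer of 2704 outputs, this head has far fewer parameters and every output pixel is computed only from the features at the same location.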
Coming to the problem of class imbalance, you can counter it using Balanced/Weighted Cross Entropy or Dice loss. These loss functions are generally used to tackle class imbalance in detection/segmentation tasks.
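As a rough sketch of the two losses, assuming a single-channel heat map with values in [0, 1]: the balanced cross entropy below weights the foreground term by the fraction of background pixels (beta) and the background term by 1 - beta, which is one common way to balance the classes. The function names and the `eps` value are my own choices for illustration.

```python
import numpy as np

def balanced_bce(pred, target, eps=1e-7):
    """Class-balanced binary cross entropy, averaged over pixels."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    # beta is the fraction of background pixels, so the rare foreground
    # pixels receive the larger weight.
    beta = 1.0 - target.mean()
    loss = -(beta * target * np.log(pred)
             + (1.0 - beta) * (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())

def dice_loss(pred, target, eps=1e-7):
    """1 - Dice coefficient; eps avoids division by zero on empty maps."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    return float(1.0 - (2.0 * intersection + eps)
                 / (np.sum(pred) + np.sum(target) + eps))

# Toy 52x52 target with one 4x4 blob, and a prediction that is all background.
target = np.zeros((52, 52))
target[20:24, 30:34] = 1.0
all_background = np.zeros_like(target)

print(balanced_bce(all_background, target), dice_loss(all_background, target))
print(balanced_bce(target, target), dice_loss(target, target))
```

Both losses should come out high for the all-background prediction and close to zero for the perfect one, which is the behaviour plain MSE lacks when the foreground is tiny.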
Are you sure that the link says Cross Entropy is not a good choice? Either way, Balanced Cross Entropy and Dice loss have shown good results for problems like these, and it is always better to try them and see before arriving at a conclusion.
Check out the EAST paper: https://arxiv.org/abs/1704.03155