I added several attention mechanisms (SE, CBAM, CA, ECA) to the ResNet-50 backbone of a CenterNet model, but all results stay within 1% (F1) of the baseline.
I tried inserting the attention modules after the convolutions of layers 1-4, and also after the first and last layers in the code below, but in every case the results barely changed.
def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)

    x = self.layer1(x)  # 128x128x64  → 128x128x256
    x = self.layer2(x)  # 128x128x256 → 64x64x512
    x = self.layer3(x)  # 64x64x512   → 32x32x1024
    x = self.layer4(x)

    x = self.avgpool(x)
    x = x.view(x.size(0), -1)
    return self.fc(x)
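For reference, a minimal sketch of what "adding attention after a layer" can look like, using a Squeeze-and-Excitation (SE) block in PyTorch. Module and variable names here are illustrative, not taken from the code above:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels by global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: BxCxHxW -> BxCx1x1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gate in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excite: rescale each channel

# Attach after a stage, e.g. x = self.se2(self.layer2(x)) in forward().
x = torch.randn(2, 256, 64, 64)
se = SEBlock(256)
out = se(x)
print(out.shape)  # torch.Size([2, 256, 64, 64])
```

Note the block is a pure channel reweighting, so the output shape matches the input and it can be dropped in after any stage without changing the rest of the network.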
But I have read papers that also added attention to the (ResNet) backbone, and their ablation experiments showed a 2-3% improvement just from adding attention.
What data/task are you testing the approach on? How do you measure performance? Are you using the same metrics as the original papers? There are many things that could be going on.
Are you testing the approach on the same data the original papers used? Could you point us to these papers and share your implementation?
Note that a large amount of academic research overestimates performance gains. There are many reasons for this, but one of them is that papers which do not show large improvements are simply accepted less often. This survivorship bias leads to over-optimistic evaluations of newly proposed methods.
Thank you for your response. I'm currently working on table structure recognition, a less common object detection task that detects the individual cells of a table. The model I am using is CenterNet, proposed in 2019 and widely used for object detection in recent years, and I train it on the publicly available WTW dataset.
I refer to a number of articles published in general journals that also use a modified CenterNet to detect various objects. They differ in minor tweaks to the network, but what they have in common is that they all add some attention mechanism.
The metrics used are F1 score and AP, which are standard in object detection.
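For concreteness, here is a toy illustration of how the F1 score mentioned above is computed; the counts are made up and not from the WTW dataset:

```python
# Hypothetical detection counts (not real experimental results).
tp, fp, fn = 90, 10, 20  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # 90 / 100 = 0.9
recall = tp / (tp + fn)     # 90 / 110 ≈ 0.818

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.857
```

A sub-1% change in F1 can therefore come from a handful of boxes flipping between true and false positives, which is easily within run-to-run variance.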
Since I'm still in my first year of graduate school and just starting in this direction, I wanted to begin with some general improvements to the model. I read a top-conference paper from 2021 (and another from this year) that made a significant contribution to table structure recognition using an improved CenterNet, which is why I chose it as the base model. Indeed, the original papers' overestimation of their own performance can be felt in some of the reproduction experiments.
I do have some bolder ideas for improvements, but they are difficult for me to implement in Python, so I would like to start by writing an ordinary journal paper using some common modules, as many smaller journals publish, to fulfill the task my advisor gave me.
I'd like to try some slightly more complex changes next; maybe I should strengthen my programming skills first.