Trying to understand the significance of the repeated same convolutions in CenterNet heatmap generation

Hello altruists,
I am new to this domain and am trying to understand one block of a model. I am looking at the code for the paper “CenterNet: Objects as Points”. Let’s say we have a feature map from the backbone with shape [12, 256, 100, 100]. This feature map is fed into the block below:

bbox_tower = Sequential(
    Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    GroupNorm(32, 256, eps=1e-05, affine=True)
    ReLU()
    Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    GroupNorm(32, 256, eps=1e-05, affine=True)
    ReLU()
    Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    GroupNorm(32, 256, eps=1e-05, affine=True)
    ReLU()
    Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    GroupNorm(32, 256, eps=1e-05, affine=True)
    ReLU()
)

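So, in this block we apply a same convolution (3x3 kernel, stride 1, padding 1, so the spatial size is preserved) four times, each followed by Group Normalization and ReLU. I am trying to understand what we expect to gain from repeating these same convolutions on the feature map.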
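To make the question concrete, here is a minimal sketch of how I understand the tower, rebuilt from the printout above. The helper name make_tower and the reduced batch size (2 instead of 12, just to keep it light) are my own assumptions, not names or values from the repository. It only checks that the shape is unchanged and works out how large a neighbourhood the four stacked 3x3 convolutions cover.

```python
import torch
from torch import nn


def make_tower(channels=256, num_convs=4, groups=32):
    """Hypothetical helper: stack (Conv2d -> GroupNorm -> ReLU) num_convs times."""
    layers = []
    for _ in range(num_convs):
        layers += [
            # 3x3 kernel, stride 1, padding 1: output H and W equal input H and W
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.GroupNorm(groups, channels, eps=1e-05, affine=True),
            nn.ReLU(),
        ]
    return nn.Sequential(*layers)


bbox_tower = make_tower()

# Stand-in for the backbone feature; batch reduced from 12 to 2 (assumption).
feature = torch.randn(2, 256, 100, 100)

with torch.no_grad():
    out = bbox_tower(feature)

print(out.shape)  # same shape as the input feature

# Each stride-1 3x3 convolution widens the convolutional receptive field by 2,
# so four of them cover a 9x9 window of the input feature map.
# (GroupNorm statistics are not counted in this figure.)
receptive_field = 1 + 4 * (3 - 1)
print(receptive_field)
```

So, as far as I can tell, the spatial size and channel count never change; what changes is that each output location mixes information from a larger neighbourhood through four learned nonlinear layers.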

Then the output from the bbox_tower is passed to the below block:

agn_hm = Conv2d(256, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

I am assuming that in the above block a class-agnostic heatmap is generated from the bbox_tower output. Am I right?
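Here is a small sketch of what I think this head does, under my own assumptions: the random tensor tower_out stands in for the bbox_tower output (batch reduced to 2), and the sigmoid at the end is my guess at how the raw scores become per-pixel probabilities; I have not confirmed where the actual code applies it.

```python
import torch
from torch import nn

# Head from the printout: 256 input channels -> 1 output channel, same spatial size.
agn_hm = nn.Conv2d(256, 1, kernel_size=3, stride=1, padding=1)

# Stand-in for the bbox_tower output (assumption: random values, batch of 2).
tower_out = torch.randn(2, 256, 100, 100)

with torch.no_grad():
    logits = agn_hm(tower_out)       # one raw score per spatial location
    heatmap = torch.sigmoid(logits)  # assumption: sigmoid maps scores into [0, 1]

print(logits.shape)   # a single channel, so there is no per-class dimension
print(heatmap.shape)
```

If this is right, "class agnostic" just means the output has one channel instead of one channel per class, so each location scores "is there an object centre here" regardless of category.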

Could anyone please give me a brief idea of how the above code works, and explain how exactly we obtain a class-agnostic heatmap from the two modules above?

Thanks in advance!!!