Any suggestions for a regression task in deep learning?

Background

I am studying how to use video data to predict regression values over time. First, I tried using a backbone with pretrained parameters. Here is part of my code.

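import einops
import numpy as np
import torch
import torch.nn as nn

class Mynet_LSTM(nn.Module):
    def __init__(self, model, num_classes):
        super().__init__()
        # Backbone without its final fully connected layer; outputs (N, 2048, 1, 1).
        self.featureModel = nn.Sequential(*list(model.children())[:-1])
        self.linear = nn.Sequential(nn.LayerNorm(2048), nn.Linear(2048, 256))
        # batch_first=True because forward() passes (b, t, d); the nn.LSTM default is (t, b, d).
        self.rnn = nn.LSTM(256, 128, 2, bidirectional=True, dropout=0.3, batch_first=True)
        self.classifier = nn.Sequential(nn.LayerNorm(256), nn.Linear(256, num_classes))

        # Re-initialize linear layers: uniform weights in [-1/sqrt(n), 1/sqrt(n)], zero bias.
        for m in self.modules():
            if isinstance(m, nn.Linear):
                n = m.in_features
                y = 1.0 / np.sqrt(n)
                m.weight.data.uniform_(-y, y)
                m.bias.data.fill_(0)

    def forward(self, x):
        b, c, t, h, w = x.shape
        # Fold time into the batch axis so the 2D backbone sees one frame per row.
        x = einops.rearrange(x, 'b c t h w -> (b t) c h w', b=b, t=t)
        # flatten(1) instead of squeeze(), which would also drop the batch axis when b * t == 1.
        x = self.featureModel(x).flatten(1)  # -> (b t), 2048
        x = self.linear(x)  # -> (b t), 256
        x = einops.rearrange(x, '(b t) d -> b t d', b=b, t=t)
        x, _ = self.rnn(x)  # -> b, t, 256 (2 directions * 128)
        x = einops.rearrange(x, 'b t d -> (b t) d', b=b, t=t)
        x = self.classifier(x)
        x = einops.rearrange(x, '(b t) d -> b t d', b=b, t=t)
        return x

# this is an example of how to use my net
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
mymodel = Mynet_LSTM(model, 2)
frame_for_example = torch.rand(2, 3, 16, 400, 400)  # b, c, t, h, w
output = mymodel(frame_for_example)  # -> b, t, num_classes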
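
One thing worth checking in a model like this is how nn.LSTM interprets its input. By default it expects (seq, batch, feature), and it only treats the first axis as the batch when batch_first=True. The following is a minimal sketch, not part of the original code, that tests whether perturbing one sample changes the output of another sample when the input is laid out as (b, t, d). The helper name leaks_across_batch is hypothetical, and the layer sizes are copied from the model above.

```python
import torch
import torch.nn as nn


def leaks_across_batch(batch_first: bool) -> bool:
    """Return True if changing sample 0 changes the LSTM output for sample 1."""
    torch.manual_seed(0)
    # eval() disables dropout so both forward passes are deterministic.
    rnn = nn.LSTM(256, 128, 2, bidirectional=True, batch_first=batch_first).eval()

    x = torch.randn(2, 16, 256)  # intended layout: (b, t, d)
    x_perturbed = x.clone()
    x_perturbed[0] += torch.randn(16, 256)  # modify sample 0 only

    with torch.no_grad():
        out, _ = rnn(x)
        out_perturbed, _ = rnn(x_perturbed)

    # If sample 1's output moved, information from sample 0 reached it.
    return not torch.allclose(out[1], out_perturbed[1], atol=1e-6)


if __name__ == "__main__":
    print("batch_first=False leaks:", leaks_across_batch(False))
    print("batch_first=True leaks:", leaks_across_batch(True))
```

If the check reports a leak, the recurrent layer is running along the batch axis instead of the time axis, so each clip's prediction depends on the other clips in the same batch.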

The results are not as good as I expected, as shown in the picture below. Training worked well, but performance became much worse during validation. After a few days of debugging, I found that no matter what the input data is, my network always outputs "the last given ground_truth". I am using nn.MSELoss() as my criterion.

The blue line is my target value and the orange line is my prediction, using MSE as my criterion.
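
A collapse like this can also be checked numerically rather than by eye. The sketch below assumes the validation predictions and targets have been collected into two tensors of the same shape. The helper name collapse_report is hypothetical. It reports the spread of the predictions and compares the model's MSE with the MSE of a constant predictor that always outputs the target mean.

```python
import torch


def collapse_report(preds: torch.Tensor, targets: torch.Tensor) -> dict:
    """Summarize whether predictions look (near-)constant across inputs."""
    preds = preds.flatten().float()
    targets = targets.flatten().float()

    model_mse = torch.mean((preds - targets) ** 2).item()
    # MSE of a trivial predictor that always outputs the mean target.
    baseline_mse = torch.mean((targets.mean() - targets) ** 2).item()

    return {
        "pred_std": preds.std(unbiased=False).item(),
        "target_std": targets.std(unbiased=False).item(),
        "model_mse": model_mse,
        "baseline_mse": baseline_mse,
    }


if __name__ == "__main__":
    targets = torch.linspace(-1.0, 1.0, 100)
    constant_preds = torch.full((100,), 0.3)  # mimics an input-independent output
    print(collapse_report(constant_preds, targets))
```

A pred_std close to zero, together with a model_mse no better than baseline_mse, suggests the network is ignoring its input.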

I also tried CCC (concordance correlation coefficient) as my criterion, since CCC is the metric commonly used for my task. My results are shown below; even after training for more than 5-6 epochs, the results stay the same.

The blue line is my target value and the orange line is my prediction, using a CCC implementation I found on GitHub as my criterion.
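
For reference, here is a minimal sketch of a CCC-based loss. It is written from the standard definition CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2) and is not the GitHub implementation mentioned above. The function name ccc_loss and the eps term are assumptions added for illustration and numerical stability.

```python
import torch


def ccc_loss(preds: torch.Tensor, targets: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Return 1 - CCC, computed over all elements of preds and targets."""
    preds = preds.flatten().float()
    targets = targets.flatten().float()

    pred_mean, target_mean = preds.mean(), targets.mean()
    # Population (biased) variance and covariance, used consistently.
    pred_var = ((preds - pred_mean) ** 2).mean()
    target_var = ((targets - target_mean) ** 2).mean()
    cov = ((preds - pred_mean) * (targets - target_mean)).mean()

    ccc = 2.0 * cov / (pred_var + target_var + (pred_mean - target_mean) ** 2 + eps)
    return 1.0 - ccc


if __name__ == "__main__":
    targets = torch.linspace(-1.0, 1.0, 100)
    print(ccc_loss(targets.clone(), targets).item())          # perfect agreement
    print(ccc_loss(torch.full((100,), 0.5), targets).item())  # constant prediction
```

With this definition a constant prediction has zero covariance with the target, so its CCC is 0 and the loss is 1 regardless of which constant is predicted.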

Question

I would like to know why this strange result happens.
Are there any suggestions for training a regression network?
Thanks
–Joanna