Getting the same output for every sample in validation


I have built a CNN-GRU network for a custom video dataset. Every frame goes through the CNN, the CNN outputs are fed to a GRU, and a single prediction for the video is obtained at the end. I have been using this structure for a while with different datasets, and now I am facing a problem that never occurred before.

I am giving 1 video sample per batch, and in validation I get exactly the same regressed output (to ~15 significant figures) for every video within an epoch (in training the outputs differ). From one epoch to the next the validation outputs change, but within an epoch they are again identical across samples. In addition, the training and validation accuracies do not improve. I have checked the following:

  • The inputs I feed to the model differ, even though the outputs are identical.
  • I disabled the bias terms to check whether they were dominating the weights. Nothing changed.

I don’t know where else to look. Any help would be appreciated.

2 Random ideas.
Have you checked the hidden state? Perhaps you are reinitializing it each time you feed a sample.
Try to check whether it behaves like that with other weights. If so, it’s probably a bug in your code rather than a validation problem.
Are the outputs of the CNN different for different inputs? If they are, the problem is happening in the RNN.
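
A rough way to do that last check is to hook each stage and compare what it produces for two different inputs. This is only a sketch: `TinyCnnGru` and `stage_differences` are hypothetical names, and the tiny model is just a stand-in that assumes your model exposes `cnn`, `rnn` and `linear` submodules like the one you describe.

```python
import torch
import torch.nn as nn


class TinyCnnGru(nn.Module):
    """Hypothetical stand-in for the CNN-GRU model discussed in the thread."""

    def __init__(self, feat_dim=8, hidden=16):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 4, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(4, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.linear = nn.Linear(hidden, 1)

    def forward(self, x):
        b, t, c, h, w = x.size()
        feats = self.cnn(x.view(b * t, c, h, w)).view(b, t, -1)
        r_out, _ = self.rnn(feats)
        return self.linear(r_out[:, -1, :])


def stage_differences(model, x1, x2, stages=("cnn", "rnn", "linear")):
    """Max absolute difference between the outputs for x1 and x2 at each stage.

    Note: this switches the model to eval mode, as in validation.
    """
    captured = {}

    def make_hook(name):
        def hook(module, inputs, output):
            # nn.GRU returns (output, h_n); keep only the output sequence.
            out = output[0] if isinstance(output, tuple) else output
            captured.setdefault(name, []).append(out.detach())
        return hook

    handles = [getattr(model, n).register_forward_hook(make_hook(n)) for n in stages]
    model.eval()
    with torch.no_grad():
        model(x1)
        model(x2)
    for handle in handles:
        handle.remove()
    return {name: (a - b).abs().max().item() for name, (a, b) in captured.items()}


torch.manual_seed(0)
model = TinyCnnGru()
x1 = torch.rand(1, 5, 3, 16, 16)
x2 = torch.rand(1, 5, 3, 16, 16)
diffs = stage_differences(model, x1, x2)
print(diffs)
```

The first stage whose difference drops to (almost) zero is where the input information is being lost.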


Sorry for the late response.

  • My hidden states seem to be working fine. They are not reinitialized at every step.
  • I have observed that the outputs of the CNN are very close to each other. But I don’t think the problem is in either the CNN or the RNN. There is probably another bug.

I found out that even randomly generated input videos produce the same output. But whenever the parameters of the model change (through training), the output changes. So I conclude that there might be some kind of overfitting, but I don’t know why. I am adding the code for investigation. I am still looking for help :((
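
Since the outputs differ in training but not in validation, one more thing that might be worth ruling out is whether the behaviour depends on `train()` versus `eval()` mode, for example through BatchNorm or Dropout layers. This is only a guess on my side. The sketch below uses a toy model as a stand-in, and `output_spread` is a hypothetical helper; it probes a copy of the model so that the BatchNorm running statistics of the original are not touched.

```python
import copy

import torch
import torch.nn as nn


def output_spread(model, inputs, train_mode):
    """Largest gap between the outputs obtained for different inputs.

    A value near zero means every input is mapped to (almost) the same output.
    """
    probe = copy.deepcopy(model)  # leave the original model untouched
    probe.train(train_mode)
    with torch.no_grad():
        outs = torch.stack([probe(x).flatten() for x in inputs])
    return (outs.max(dim=0).values - outs.min(dim=0).values).max().item()


torch.manual_seed(0)
# Toy stand-in; the real model would be the CNN-GRU network.
toy_model = nn.Sequential(
    nn.Linear(10, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Linear(8, 1),
)
random_inputs = [torch.rand(4, 10) for _ in range(3)]

spread_train = output_spread(toy_model, random_inputs, train_mode=True)
spread_eval = output_spread(toy_model, random_inputs, train_mode=False)
print("train mode spread:", spread_train)
print("eval mode spread:", spread_eval)
```

If the spread collapses only in eval mode, the layers that behave differently between the two modes would be the first place to look.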

This is my train function.

def train(model, optimizer, epoch, train_loader, args):
    traindebug = True
    total_loss = 0
    outlist = []
    sig = nn.Sigmoid()
    #criterion = nn.MSELoss()
    for batch_idx, (data, target,vname) in tqdm(enumerate(train_loader),total = len(train_loader)):
        data = torch.FloatTensor(data)
        if args["cuda"]:
            data, target = data.cuda(), target.cuda()

        data, target = Variable(data), Variable(target).float()
        # Generating random tensor for trial
        randt = torch.rand(1, 200, 3, 224, 224)
        randt = randt.cuda()
        randt = Variable(randt)
        randout = model(randt)
        output = model(data)
        del randt
        del randout
        #loss = criterion(output, target)
        if losstype == 'mae':
            loss = F.l1_loss(output, target)
        elif losstype == 'cross_entropy':
            loss = F.binary_cross_entropy(output, target)
        elif losstype == 'BCEWithLogits':
            pos_weight = torch.ones([class_count])
            criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight).cuda()
            # When using this, the sigmoid may need to be removed from the output
            loss = criterion(output, target)
        elif losstype == 'sigmoid':
            loss = sig(torch.abs(output-target))
        total_loss += loss.item()
    total_loss /= len(train_loader.dataset)
    print('Train epoch {} Average loss: {:.4f}'.format(epoch, total_loss))

    return outlist, [total_loss]

This is my validation function:

def test(model, test_loader, epoch, args, is_cuda):
    testdebug = True
    test_loss = 0
    outlist = []
    for batch_idx, (data, target,vname) in tqdm(enumerate(test_loader),total = len(test_loader)):
        data = torch.FloatTensor(data)
        if is_cuda:
            data, target = data.cuda(), target.cuda()

        with torch.no_grad():
            data, target = Variable(data), Variable(target).float()
            output = model(data)
            randt = torch.rand(1, 200, 3, 224, 224)
            randt = randt.cuda()
            randt = Variable(randt)
            randout = model(randt)
            y_pred = torch.round(torch.sigmoid(output))

            if losstype == 'mae':    
                loss = F.l1_loss(output, target)
            elif losstype == 'cross_entropy' or losstype == 'sigmoid':
                loss = F.l1_loss(output.round(), target)
                loss = (y_pred == target).float().min()
            del randt
            del randout
            test_loss += loss.item() / target.size()[0]

    test_loss /= len(test_loader.dataset)
    acc = 1 - test_loss
    print('Validation epoch {} Average accuracy: {:.4f}'.format(epoch,acc))

    return outlist, [acc]

This is the forward function of the model:

    def forward(self, x):
        batch_size, timesteps, C, H, W = x.size()
        self.c_in = x.view(batch_size * timesteps, C, H, W)
        self.c_out = self.cnn(self.c_in)
        self.r_in = self.c_out.view(batch_size, timesteps, -1)
        self.r_out, self.h_n = self.rnn(self.r_in)
        return self.linear(self.r_out[:, -1, :])

If anyone needs more information about the code, I can provide it right away.

A few things. Variable has been deprecated since 0.4.
I’m wondering about a couple of things:
What is the 'sigmoid' loss type? Sigmoid is an activation function, not a loss. You shouldn’t round.
Why do you use pos_weight = torch.ones([class_count])? A pos_weight of all ones does not change the loss, so it doesn’t make sense to me.
Have a look at the docs about it.

Also, you shouldn’t use rounding and type casting. Even if you ideally want your output to be binary, at training time the outputs have to cover an interval. They cannot be binary.

Lastly, L1 is usually not the best choice if you can use other losses.
Why do you have

            elif losstype == 'cross_entropy' or losstype == 'sigmoid':
                loss = F.l1_loss(output.round(), target)

The branch is named cross_entropy, but then you use L1 instead?
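
For reference, this is roughly what I would expect a training step to look like without Variable. It is a sketch under assumptions: the model is taken to return raw logits (no sigmoid at the end), `train_step` is a hypothetical helper, the linear model is a stand-in, and the 3:1 class imbalance behind `pos_weight` is made up. Your posted loop does not show `zero_grad()`, `backward()` or `step()`; maybe they were just left out of the post, but without them the weights are never updated.

```python
import torch
import torch.nn as nn


def train_step(model, optimizer, criterion, data, target):
    """One optimisation step on plain tensors (no Variable wrapper needed)."""
    model.train()
    optimizer.zero_grad()
    logits = model(data)                      # raw scores, no sigmoid applied
    loss = criterion(logits, target.float())  # BCEWithLogitsLoss applies the sigmoid itself
    loss.backward()
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
model = nn.Linear(5, 1)  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# pos_weight only has an effect when it differs from 1, e.g. the ratio
# (number of negatives / number of positives). 3.0 is an assumed value.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))

data = torch.randn(8, 5)
target = torch.randint(0, 2, (8, 1))
losses = [train_step(model, optimizer, criterion, data, target) for _ in range(20)]
print(losses[0], losses[-1])
```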

Hi again,

You are right about the things you point out. After countless minor changes, some sections of the code no longer mean anything or are not implemented in the best way.

  • I started with Variable and did not change that. I will remove it in later versions of this code.
  • I thought the network might tend to give intermediate outputs (0.5 when the target is 0 or 1) because I do not punish intermediate errors that much. The sigmoid type of “custom” loss does not penalize big mistakes much more than small ones, but that did not work either. That was just a trial; I normally use cross entropy.
  • Rounding is used only in testing, where I predict 1 if the regressed output is closer to 1 and 0 otherwise. So the outputs are not binary in training.
  • I know L1 might not be the best loss in testing, but when I test a sample with the method I explained, the loss is either 1 or 0, so it can be thought of as the accuracy. You can think of it as a not-the-best implementation of accuracy.
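
To make the last two points concrete, the accuracy I have in mind could be written more directly as in the sketch below. It assumes the model returns raw logits, and `binary_accuracy` is a hypothetical helper, not what my current code does line by line.

```python
import torch


def binary_accuracy(logits, targets, threshold=0.5):
    """Fraction of predictions that match the 0/1 targets."""
    preds = (torch.sigmoid(logits) >= threshold).float()
    return (preds == targets.float()).float().mean().item()


logits = torch.tensor([[2.0], [-1.0], [0.3], [-0.2]])
targets = torch.tensor([[1.0], [0.0], [0.0], [0.0]])
acc = binary_accuracy(logits, targets)
print(acc)  # 3 of the 4 predictions match, so 0.75
```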

Some parts were written so quickly that I did not create the optimal structure, but even so, I think it should work fine.

If you have binary states, 0 or 1, and your ground truth is binary, the network will learn to map to those values when trained with BCEWithLogitsLoss.

Obviously the initial values will tend to be around 0.5, but they should become “more” binary over the epochs. Don’t underestimate the loss you use; it’s very important.
What are you trying to predict? Classification, it seems. I would go with cross entropy. Make sure it’s done properly. Then you can try reducing the frame rate. Are you using a pre-trained CNN? That would be a good idea.
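
Reducing the frame rate can be as simple as slicing the time dimension before the forward pass. A minimal sketch, assuming the (batch, timesteps, C, H, W) layout from your forward function; `subsample_frames` is a hypothetical helper and the small spatial size is only to keep the example light.

```python
import torch


def subsample_frames(video, stride):
    """Keep every `stride`-th frame of a (batch, timesteps, C, H, W) tensor."""
    return video[:, ::stride]


video = torch.rand(1, 200, 3, 8, 8)
short = subsample_frames(video, stride=4)
print(short.shape)  # torch.Size([1, 50, 3, 8, 8])
```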

I’m encountering similar problems with my Transformer-GRU network. Have you solved this?