Two-class ResNet18 CNN gets only half the accuracy during the validation step. What can cause this?

What could cause a validation step to look so different from a training step?

The network is almost the same as this Places365 one, but with only two classes.
The validation dataset is equally divided between the two classes.

It almost looks like the network is ignoring one of the classes during training.
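One way to check this suspicion is to count how often each class is predicted on a validation batch; if one class never appears, the network really is ignoring it. A minimal sketch with made-up logits standing in for the model output:

```python
import torch

# Hypothetical check: count per-class predictions on one batch.
# The logits below are dummy values for 3 samples and 2 classes.
logits = torch.tensor([[2.0, 0.1],
                       [1.5, 0.3],
                       [0.9, 0.2]])
preds = logits.argmax(dim=1)
counts = torch.bincount(preds, minlength=2)
print(counts.tolist())  # [3, 0] here: class 1 is never predicted
```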


Do you calculate the validation accuracy using the complete validation dataset or are you plotting the accuracy sequentially?

Is the training or validation dataset imbalanced?

The test dataset is a little imbalanced: 1,700,000 samples for the first class and 1,000,000 for the second.

I’m plotting it sequentially, logging each validation batch, but I also calculate and print the average; it is about 50%.
Plotting part:

from tensorboardX import SummaryWriter
writer = SummaryWriter()


        if i % args.print_freq == 0:
            print('Epoch: [{0}][{1}/{2}]\t'
                  'Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t'
                  'Data {data_time.val:.3f} ({data_time.avg:.3f})\t'
                  'Loss {loss.val:.4f} ({loss.avg:.4f})\t'
                  'Prec@1 {top1.val:.3f} ({top1.avg:.3f})'.format(
                   epoch, i, len(train_loader), batch_time=batch_time,
                   data_time=data_time, loss=losses, top1=top1))
        writer.add_scalar('loss', loss.item(), globaliter_train/args.print_freq)
        writer.add_scalar('Accuracy', top1.val, globaliter_train/args.print_freq)
        globaliter_train += 1

Thanks for the information.
In that case your model might be overfitting to one class.
To counter it, you could use a weighted criterion or oversample the minority class with WeightedRandomSampler.
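Both options can be sketched roughly as below, using the class counts you mentioned (1.7M / 1.0M); the dummy `targets` tensor and the weighting scheme (inverse class frequency) are just one common choice, not the only one:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Hypothetical per-class counts from this thread: 1.7M / 1.0M samples
class_counts = torch.tensor([1_700_000.0, 1_000_000.0])

# Option 1: weighted criterion - weight each class by inverse frequency
class_weights = class_counts.sum() / (2 * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

# Option 2: oversample the minority class with WeightedRandomSampler.
# targets holds one label per training sample (dummy labels here);
# each sample's draw probability is the inverse of its class count.
targets = torch.tensor([0, 0, 0, 1, 1])
sample_weights = 1.0 / class_counts[targets]
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(targets),
                                replacement=True)
# loader = DataLoader(train_dataset, batch_size=112, sampler=sampler)
```

Note that you would use one of the two, not both at once, and the sampler must be passed to the training DataLoader in place of `shuffle=True`.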

I tried using both in the past, but either I used them incorrectly or they just did not work.
I created a balanced version of the training dataset, and will post the results here if it solves my issues.

Balanced datasets have not solved my issue.
This is one epoch of validation, and it looks just like the batches from before.

What could be the issue ?
For comparison, this is a graph using Transfer Learning from the Places365 dataset:

This has 8 epochs of training and validation.

Does one batch contain 64 samples?
If so, how do you calculate the validation accuracy?

I just made a correction in the last response. The blue graph represents one epoch of training, not one batch. In that case, the batch size was 112 images, and the loop ran 650 times to cover the entire validation dataset of 73,200 images (I just noticed I had a /10 division when plotting the graph, and that’s why it looks like 64/65 steps).

I also found out that I was plotting the top1 value instead of prec1. I do not think this can be the issue, since the average top1 was calculated to be around 50%. Nevertheless, I will start a test with prec1.val instead of top1.val in a few minutes.
This is what I am doing:

        # measure accuracy and record loss
        prec1 = accuracy(output.data, target, topk=(1, ))[0]  # accuracy() returns a list, one entry per k
        losses.update(loss.item(), input.size(0))
        top1.update(prec1.item(), input.size(0))

        writer.add_scalar('Validation Loss', loss.item(), globaliter_val)
        writer.add_scalar('Validation Accuracy', top1.val, globaliter_val)

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k"""
    maxk = max(topk)
    batch_size = target.size(0)

    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        correct_k = correct[:k].view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res
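Since `accuracy()` returns a list with one entry per requested k, the top-1 value has to be indexed out before calling `.item()`. A quick self-contained check with dummy tensors (the output/target values are made up for illustration):

```python
import torch

def accuracy(output, target, topk=(1,)):
    """Computes the precision@k for the specified values of k (same as above)."""
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))
    res = []
    for k in topk:
        correct_k = correct[:k].view(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

# Dummy logits for 4 samples, 2 classes; predictions are [0, 1, 0, 1]
output = torch.tensor([[0.9, 0.1],
                       [0.2, 0.8],
                       [0.7, 0.3],
                       [0.4, 0.6]])
target = torch.tensor([0, 1, 1, 1])
prec1 = accuracy(output, target)[0]  # note the [0]: one list entry per k
print(prec1.item())  # 3 of 4 predictions match the target -> 75.0
```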

Update: The change I made to the plot function did not affect the end result. Same shape, but now going to 650 steps instead of 65.

I ran the same model with the same dataset and the same parameters using Keras, and it converged.
I’m not sure what is happening with the PyTorch version.