I am working on a 4-class classification problem on a 1D dataset with ~145,000 samples and 70 features. MLP architecture: [70 - 400 - 4], trained with Adam at a learning rate of 10^-5 and a batch size of 32. Currently I get ~55% training accuracy and ~52-53% validation accuracy, and I was hoping that addressing the class imbalance would improve performance.
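For reference, a minimal sketch of the setup described above. The ReLU hidden activation and log-softmax output are my assumptions (the log-softmax is implied by the use of `F.nll_loss` later), not stated in the post:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    """[70 - 400 - 4] MLP; ReLU and log-softmax are assumed choices."""
    def __init__(self, n_features=70, n_hidden=400, n_classes=4):
        super().__init__()
        self.fc1 = nn.Linear(n_features, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # log-probabilities, suitable for F.nll_loss
        return F.log_softmax(self.fc2(F.relu(self.fc1(x))), dim=1)

model = MLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

out = model(torch.randn(32, 70))   # one batch of 32 samples
print(out.shape)                   # torch.Size([32, 4])
```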

I tried the following two approaches to deal with the class imbalance.

**Try 1: Weighted sampling**

```python
u = np.unique(labels_t)
counts = np.histogram(labels_t, bins=np.arange(min(u), max(u) + 2))[0]
class_weights = 1.0 / torch.Tensor(counts)

# WeightedRandomSampler expects one weight per *sample*, not per class,
# and num_samples is the number of draws per epoch -- passing batch_size
# here would yield only a single batch per epoch.
sample_weights = class_weights[labels_t]
sampler = torch.utils.data.WeightedRandomSampler(
    sample_weights.double(), num_samples=len(labels_t))

train_data = torch.utils.data.TensorDataset(features_t, labels_t)
# sampler is mutually exclusive with shuffle, so shuffle is left off.
train_loader = torch.utils.data.DataLoader(train_data, batch_size, sampler=sampler)

val_data = torch.utils.data.TensorDataset(features_v, labels_v)
validation_loader = torch.utils.data.DataLoader(val_data, batch_size, shuffle=False)
```
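As a sanity check (not from the original post), one can verify on synthetic imbalanced labels that per-sample inverse-frequency weights make the sampler draw each class roughly equally often:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Synthetic imbalanced labels: class 0 is 10x more common than class 1.
labels = torch.tensor([0] * 1000 + [1] * 100)

# Per-class inverse-frequency weights, expanded to one weight per sample.
counts = torch.bincount(labels).float()
class_weights = 1.0 / counts
sample_weights = class_weights[labels]

sampler = WeightedRandomSampler(sample_weights.double(), num_samples=len(labels))
drawn = torch.tensor(list(sampler))   # indices drawn for one "epoch"

# Count how often each class was actually drawn.
drawn_counts = torch.bincount(labels[drawn], minlength=2)
print(drawn_counts)   # roughly equal, around 550 each
```

If the drawn counts still mirror the raw class frequencies, the weights are being applied per class instead of per sample.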

**Try 2: Weighted Loss**

```python
u = np.unique(labels_t)
counts = np.histogram(labels_t, bins=np.arange(min(u), max(u) + 2))[0]
weights = 1.0 / torch.Tensor(counts)

loss = F.nll_loss(output, target, weight=weights)
```

(I changed this in both the training and validation functions.)
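A small self-contained sketch of the weighted-loss idea, with hypothetical class counts and random logits (none of these numbers come from the post). One common refinement is to normalize the weights so they sum to the number of classes, which keeps the weighted loss on a scale comparable to the unweighted one:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

counts = torch.tensor([1000., 400., 300., 100.])   # hypothetical class counts
weights = 1.0 / counts
# Normalize so weights sum to the number of classes (scale-preserving).
weights = weights / weights.sum() * len(counts)

logits = torch.randn(8, 4)
log_probs = F.log_softmax(logits, dim=1)           # F.nll_loss expects log-probs
target = torch.tensor([0, 1, 2, 3, 0, 1, 2, 3])

loss = F.nll_loss(log_probs, target, weight=weights)
print(loss.item())
```

Note that class weights are usually applied only to the training loss; weighting the validation loss changes the metric you are monitoring rather than the model.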

Neither of these gives any improvement over simply training without addressing the class imbalance. Is there anything I'm overlooking?

Thanks in advance!