RNN is not learning anything

Hi,
I implemented an RNN model, but it seems that it is not learning from the data.
My training data is unbalanced (1871 zeroes and 229 ones), and at each epoch I get an accuracy varying between 63% and 65%. When I print out the predicted values, I see that they are always zero.

Can you please help me find a solution to this problem? I have experimented with WeightedRandomSampler, RandomSampler and varying the weights in the CrossEntropyLoss criterion. Each time I end up with all predictions equal to zero. Below is my code:

batch_size = 100
train_data = myDatasetTrain()
test_data = myDatasetTest()
num_train = len(train_data)
indices = list(range(num_train))
split = int(num_train * 7 / 10)
train_idx, valid_idx = indices[:split], indices[split:]
train_sampler = SequentialSampler(train_idx)
valid_sampler = SequentialSampler(valid_idx)
train_loader = DataLoader(dataset=train_data, sampler=train_sampler, batch_size=batch_size, drop_last=True)
valid_loader = DataLoader(dataset=train_data, sampler=valid_sampler, batch_size=batch_size, drop_last=True)
test_loader = DataLoader(dataset=test_data, shuffle=True, batch_size=batch_size, drop_last=True)

weight = [0.3, 0.7]
class_weights = torch.FloatTensor(weight)
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01) 

N_INPUTS = 16
N_NEURONS = 20
N_OUTPUTS = 2
n_layers = 20
drop_prob = 0.85

class ImageRNN(nn.Module):
    def __init__(self, batch_size, n_inputs, n_neurons, drop_prob, n_outputs, n_layers):
        super(ImageRNN, self).__init__()

        self.n_neurons = n_neurons
        self.batch_size = batch_size
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.n_layers = n_layers
        self.drop_prob = drop_prob
        self.basic_rnn = nn.RNN(self.n_inputs, self.n_neurons, self.n_layers)
        self.dropout = nn.Dropout(drop_prob)
        self.FC = nn.Linear(self.n_neurons, self.n_outputs)

    def init_hidden(self):
        # one hidden state per layer: (n_layers, batch, n_neurons)
        return torch.zeros(self.n_layers, self.batch_size, self.n_neurons)

    def forward(self, X):
        X = X.unsqueeze(dim=0)               # (batch, features) -> (seq_len=1, batch, features)
        self.batch_size = X.size(1)
        self.hidden = self.init_hidden()
        output, self.hidden = self.basic_rnn(X, self.hidden)
        out = self.dropout(output)
        out = self.FC(out)

        return out.view(-1, self.n_outputs)  # (batch, n_outputs) class logits


model = ImageRNN(batch_size, N_INPUTS, N_NEURONS, drop_prob, N_OUTPUTS, n_layers)
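
As a quick shape check (a minimal sketch that only uses the definitions above), one dummy batch can be pushed through the model:

# dummy batch: 100 samples with 16 features each, matching N_INPUTS
dummy = torch.randn(batch_size, N_INPUTS)
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)  # expected: torch.Size([100, 2]) -- one row of class logits per sample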

n_epochs = 300
valid_loss_min = np.inf


for epoch in range(n_epochs):
    train_loss = 0.0
    train_acc = 0.0

    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.detach().item()
        train_acc += get_accuracy(output, target, batch_size)
    print('Epoch: %d | Loss: %.4f | Train Accuracy: %.2f'
          % (epoch, train_loss / len(train_loader), train_acc / len(train_loader)))
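
(get_accuracy isn't included above; for reference, a minimal version consistent with how it is called in the loop, returning the batch accuracy in percent, could be:)

def get_accuracy(logits, target, batch_size):
    # fraction of correct class predictions in the batch, as a percentage
    preds = torch.argmax(logits, dim=1)
    return 100.0 * (preds == target).sum().item() / batch_size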

Thank you so much in advance!!!

Did neither the WeightedRandomSampler nor the loss function weighting help at all?
Is your data sorted in any way? If so, could you use class weighting for your criterion and a SubsetRandomSampler?

Also, could you post the code for using the WeightedRandomSampler so that we can make sure it was used properly?
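
For example, a rough sketch of a shuffled split with SubsetRandomSampler plus inverse-frequency class weights (reusing the names from your code) could look like this:

import numpy as np
from torch.utils.data.sampler import SubsetRandomSampler

indices = np.random.permutation(len(train_data))   # shuffle before splitting
split = int(len(train_data) * 0.7)
train_idx, valid_idx = indices[:split], indices[split:]

train_loader = DataLoader(train_data, sampler=SubsetRandomSampler(train_idx),
                          batch_size=batch_size, drop_last=True)
valid_loader = DataLoader(train_data, sampler=SubsetRandomSampler(valid_idx),
                          batch_size=batch_size, drop_last=True)

# class weighting in the criterion, e.g. inverse class frequency (1871 zeros, 229 ones)
class_counts = torch.tensor([1871., 229.])
criterion = nn.CrossEntropyLoss(weight=class_counts.sum() / class_counts)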

Thank you for your help!
I probably define the weights in the wrong way…
weights=[0.5, 0.5]
weights = torch.DoubleTensor(weights)

As far as I understand, I assign equal probability to each class, and indeed it gives approximately equal numbers of zeroes and ones in the train_loader.

sampler = torch.utils.data.sampler.WeightedRandomSampler(weights, 300, replacement=True)

Is it incorrect?

I defined the weights in another way, like in the example you provided:

class_sample_counts=[1871,229]
weights = 1. / torch.tensor(class_sample_counts, dtype=torch.float)
samples_weights = weights[train_data.y]

sampler = torch.utils.data.sampler.WeightedRandomSampler(
    weights=samples_weights,
    num_samples=len(samples_weights),
    replacement=True)

The model is still not learning anything; my results are almost the same, varying around 50% with each epoch:

train_acc 50.42857142857143
train_acc 50.0
train_acc 50.19047619047619
train_acc 49.0

Why does this happen?

Could you check if the target distribution in each batch is now approx. equal? If so, you could play around with some hyperparameters, e.g. lowering the learning rate might help.
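
Something like this would show the per-batch label counts (a quick sketch assuming 0/1 targets as in your print-outs):

for data, target in train_loader:
    n_ones = (target == 1).sum().item()
    print('ones:', n_ones, '| zeros:', target.numel() - n_ones)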

Whenever you are facing such problems, simply print the data output by the DataLoader; in most cases you will find your bug that way. So, is the problem solved now?

Now the data from the train_loader seems to be correct; it looks like this:

d tensor([[-0.5081, -0.6468, -0.6676, …, 0.2078, 0.2027, 0.2055],
[-1.2508, -1.1893, -1.3128, …, 8.4809, 8.6478, 8.4245],
[ 0.6323, 0.5460, 0.6442, …, -0.2325, -0.2392, -0.2345],
…,
[-0.0174, 0.0557, 0.2380, …, -0.3327, -0.3175, -0.3333],
[-0.6470, -0.6476, -0.5937, …, -0.3838, -0.3793, -0.3828],
[ 0.1735, 0.2239, 0.2804, …, -0.1445, -0.1382, -0.1391]])
t tensor([0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1,
1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1,
1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1,
1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1,
1, 0, 0, 1])


But why is the NN not learning anything?

It looks correct when I print from the train_loader. The predictions look like random 0s and 1s at each epoch:

data tensor([[ 0.1041, -0.1237, -0.2148, …, -0.4751, -0.4867, -0.4756],
[ 0.1770, 0.2372, 0.1080, …, -0.6973, -0.7080, -0.6980],
[-1.0250, -0.9747, -1.1029, …, 1.8648, 1.8507, 1.8675],
…,
[-0.8286, -0.8282, -1.0329, …, 0.7890, 0.8069, 0.7968],
[-0.5453, -0.4008, -0.4164, …, 0.2117, 0.2365, 0.2151],
[ 1.2337, 1.2667, 1.1302, …, -0.3912, -0.3921, -0.3910]])
target tensor([1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1,
1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0,
1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0])
pred tr tensor([0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0,
1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 1, 0, 0])

Maybe your hyperparameters or the model architecture are not suitable for the data.
I would recommend taking a small sample of your data (e.g. just 10 samples) and trying to overfit it with your code as a sanity check, as in the sketch below. If your model isn't able to learn this small data sample, you might have other bugs in your code.
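
A rough sketch of that sanity check (torch.utils.data.Subset just picks out a handful of samples; ideally include a few from each class):

from torch.utils.data import Subset

small_set = Subset(train_data, list(range(10)))      # tiny dataset of 10 samples
small_loader = DataLoader(small_set, batch_size=10, shuffle=True)

model.train()
for epoch in range(500):                             # many epochs on the tiny set
    for data, target in small_loader:
        optimizer.zero_grad()
        loss = criterion(model(data), target)
        loss.backward()
        optimizer.step()
# a working model and pipeline should drive this loss close to 0, i.e. overfit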

The target distribution in each batch is now approximately equal, but varying the learning rate doesn't help either.
Still, with each epoch I get an accuracy of 49-50%, and the predictions look like random numbers (the data/target/pred output is the same as above).

You are absolutely right: I can't overfit even a 10-sample dataset. I just tried a feedforward NN as well, with the same result. Maybe there is a problem with the format of the data I feed to the NN…
Can you please have a look?

This is how I load the data into the notebook; the data is originally stored in separate text files:

class myDatasetTrain(Dataset):
    def __init__(self):
        path = "..."  # directory containing the *trn.ssv files
        all_files = glob.glob(os.path.join(path, "*trn.ssv"))
        list_ = []
        for file_ in all_files:
            df = pd.read_csv(file_, index_col=None, sep=r'\s+', header=None)
            list_.append(df)
        frame_train = pd.concat(list_)

        self.len = frame_train.shape[0]
        # features: all columns except the last, standardized (scaler is defined elsewhere)
        self.x = torch.from_numpy(scaler.fit_transform(frame_train.iloc[:, :-1])).float()
        # labels: last column, with 255 mapped to class 1
        self.y = torch.from_numpy(frame_train.iloc[:, -1].replace(255, 1).values).long()

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len

batch_size = 1
train_data = myDatasetTrain()
class_sample_counts = [1871, 229]
weights = 1. / torch.tensor(class_sample_counts, dtype=torch.float)
samples_weights = weights[train_data.y]

sampler = torch.utils.data.sampler.WeightedRandomSampler(
    weights=samples_weights,
    num_samples=len(samples_weights),
    replacement=True)

train_loader = DataLoader(dataset=train_data, sampler=sampler, batch_size=batch_size, drop_last=True)

And this is what I get when I print from the train_loader:

for d, t in train_loader:
    print('d', d)
    print('t', t)

d tensor([[-1.0678, -1.0603, -0.8322, -1.0248, 0.8292, 0.7950, 0.5398, 0.8180,
-0.4322, -0.0671, 0.9796, -0.4490, 0.1750, 0.1924, 0.1636, 0.2060]])
t tensor([1])
d tensor([[ 1.1501, 1.0182, 0.8165, 0.8637, -1.0991, -0.9765, -0.8587, -0.9004,
1.3153, 0.8916, 0.4266, 0.9797, -0.9870, -0.9993, -1.0128, -1.0002]])
t tensor([0])
d tensor([[ 0.2682, 0.1219, 0.1722, 0.1781, -0.4644, -0.3578, -0.4453, -0.4443,
0.4593, 0.2648, 0.6636, 0.6395, -1.1267, -1.0854, -1.0255, -1.0883]])
t tensor([1])
d tensor([[ 0.2682, 0.1219, 0.1722, 0.1781, -0.4644, -0.3578, -0.4453, -0.4443,
0.4593, 0.2648, 0.6636, 0.6395, -1.1267, -1.0854, -1.0255, -1.0883]])
t tensor([1])
d tensor([[ 0.2682, 0.1219, 0.1722, 0.1781, -0.4644, -0.3578, -0.4453, -0.4443,
0.4593, 0.2648, 0.6636, 0.6395, -1.1267, -1.0854, -1.0255, -1.0883]])
t tensor([1])