Hi! I'm learning to build a feedforward neural network (FNN) with PyTorch on a GPU, and I came across a strange problem: the test accuracy of my model jumps to 100% at around epoch 20.
After debugging, I found that the labels of the training data and test data had been changed. By the time the test accuracy reached 100%, the test labels were all identical, and so were the training labels.
But I really can't figure out where my code changes the labels. T_T
My Python version is 2.7, my torch version is 0.2.0_2, and my GPU is a Titan Xp. I'm using PyCharm.
Here is my code.
I used the data provided on Michael Nielsen's site, Neural Networks and Deep Learning. Thanks to him for leading me into this area.
import random
import pickle
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
### load data
f = open('/home/data/mnist.pkl', 'rb')  # pickle files should be opened in binary mode
training_data, validation_data, test_data = pickle.load(f)
f.close()
### append each label column to its data matrix
training_data = torch.cat((torch.from_numpy(training_data[0]).float(),
                           torch.from_numpy(training_data[1]).float().view(-1, 1)), dim=1)  # labels reshaped to a column
validation_data = torch.cat((torch.from_numpy(validation_data[0]).float(),
                             torch.from_numpy(validation_data[1]).float().view(-1, 1)), dim=1)
test_data = torch.cat((torch.from_numpy(test_data[0]).float(),
                       torch.from_numpy(test_data[1]).float().view(-1, 1)), dim=1)
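To make sure the stacking worked, I print the shapes right after loading; the sizes in the comments are what I expect from Nielsen's pickle (50,000 training rows and 10,000 validation/test rows, 784 pixels plus one label column):
print(training_data.size())    # expected: (50000, 785) -- 784 pixels + 1 label
print(test_data.size())        # expected: (10000, 785)
print(test_data[:, 784].min())  # smallest label, should be 0
print(test_data[:, 784].max())  # largest label, should be 9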
Here is the FNN itself.
### Feedforward Neural Network
class FNN_NET(nn.Module):
    def __init__(self):
        super(FNN_NET, self).__init__()
        self.linear1 = nn.Linear(784, 30)
        self.linear2 = nn.Linear(30, 30)
        self.linear3 = nn.Linear(30, 10)

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = F.relu(self.linear3(x))
        return x
model = FNN_NET()
model.cuda()
batch_size = 30
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
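Before training, I also ran a quick smoke test of the net on a made-up batch, just to confirm the wiring; the dummy input here is random and only illustrative:
dummy = Variable(torch.randn(4, 784).cuda())  # fake batch of 4 "images"
out = model(dummy)
print(out.size())  # expected: (4, 10)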
This is how I trained the net.
def train(epoch):
    model.train()
    data_temp = training_data  # train with training_data or validation_data
    random.shuffle(data_temp)
    mini_batches = [data_temp[k:k + batch_size] for k in xrange(0, len(data_temp), batch_size)]
    for batch_idx, data_mini_batch in enumerate(mini_batches):
        ###
        data, target = torch.split(data_mini_batch, 784, dim=1)  # split the data and target
        target = vetcorize_result(len(data_mini_batch), target)  # vectorize the target
        target = torch.from_numpy(target).float()  # change the data type to use on the GPU
        data, target = data.cuda(), target.cuda()
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.mse_loss(output, target, size_average=True)
        loss.backward()
        optimizer.step()
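For comparison, when I batch tensors elsewhere I usually draw shuffled mini-batches by index with torch.randperm and index_select rather than shuffling the tensor itself; this is just a sketch, with my own variable names:
perm = torch.randperm(training_data.size(0))    # random row order
shuffled = training_data.index_select(0, perm)  # copies the rows in that order
mini_batches = [shuffled[k:k + batch_size]
                for k in xrange(0, shuffled.size(0), batch_size)]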
The test.
def test(epoch):
    model.eval()
    test_loss = 0
    correct = 0
    data_temp = test_data  # we can also test the model using the validation data
    random.shuffle(data_temp)
    mini_batches = [data_temp[k:k + batch_size] for k in xrange(0, len(data_temp), batch_size)]
    for batch_idx, data_mini_batch in enumerate(mini_batches):
        ###
        data, target = torch.split(data_mini_batch, 784, dim=1)  # split the data and target
        target_temp = target  # keep the raw labels to compute "correct" later
        target = vetcorize_result(len(data_mini_batch), target)  # vectorize the target
        target = torch.from_numpy(target).float()  # change the data type
        ###
        data, target = data.cuda(), target.cuda()  # move to the GPU
        data, target = Variable(data), Variable(target)
        ###
        output = model(data)
        test_loss += F.mse_loss(output, target, size_average=True).data[0]  # sum up batch losses
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.cpu().eq(target_temp.long()).sum()
    test_loss /= len(mini_batches)
    print('Test epoch: {}\ttest loss: {:.6f}\ttest accuracy: {:.3f}%'.format(
        epoch, test_loss, 100 * correct / float(len(data_temp))))
Here I turn each label into a one-hot vector.
def vetcorize_result(batch_size, target):
    result = np.zeros((batch_size, 10))
    target = np.int_(target.numpy())
    for idx in range(batch_size):
        result[idx, target[idx]] = 1
    return result
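As an aside, the loop above can be replaced by NumPy fancy indexing; a minimal equivalent sketch (the name vectorize_result_fast is mine):
def vectorize_result_fast(batch_size, target):
    result = np.zeros((batch_size, 10))
    labels = np.int_(target.numpy()).ravel()   # flatten the (batch, 1) label column
    result[np.arange(batch_size), labels] = 1  # set one index per row
    return result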
Finally, the training and testing of the model.
for epoch in range(1, 30):
    train(epoch)
    test(epoch)
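And this is roughly the check I used while debugging to see that the labels had been changed; it is only a hypothetical snippet, relying on column 784 of my stacked tensors holding the label:
import collections
labels = np.int_(test_data[:, 784].numpy())
print(collections.Counter(labels.tolist()))  # healthy data shows all 10 digits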