My neural network cannot learn features with my data loading method

cat-loves-donuts · July 20, 2019, 10:20pm

I know this question might not be very suitable to ask here, but I really need some help.

I copied the code from the tutorial on the PyTorch website: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

I ran this code and it worked perfectly. Because I had already downloaded the CIFAR-10 data set, I decided to load the data directly from my computer. However, I only changed the data loading method and the neural network stopped working. The most interesting thing is that I used the same data loading method to train another NN on the MNIST data set and it worked perfectly, but it does not work on CIFAR-10.

I downloaded the CIFAR-10 data set, found some code to convert the data into .JPG images, and sorted them into 10 different folders with names like "cat", "airplane" and so on. I do not know what is wrong, but this is my data loading code, written as 3 functions:

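import torch
import torchvision

def load_train_dataset():
    # Raw string so the backslashes in the Windows path are not read as escapes.
    data_path = r'C:\Users\…\CIFR-10\train'
    # ImageFolder takes one subfolder per class and uses the folder name as the label.
    train_dataset = torchvision.datasets.ImageFolder(
        root=data_path,
        transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=64,
        num_workers=0,
        shuffle=True
    )
    return train_loader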

def load_test_dataset():
    # Raw string so the backslashes in the Windows path are not read as escapes.
    data_path = r'C:\Users\…\CIFR-10\test'
    test_dataset = torchvision.datasets.ImageFolder(
        root=data_path,
        transform=torchvision.transforms.ToTensor()
    )
    test_loader = torch.utils.data.DataLoader(
        test_dataset,
        batch_size=64,
        num_workers=0,
        shuffle=False
    )
    return test_loader

def test_dataset():
    # Returns the test Dataset itself (no DataLoader), e.g. to index single images.
    data_path = r'C:\Users\…\CIFR-10\test'
    test_dataset = torchvision.datasets.ImageFolder(
        root=data_path,
        transform=torchvision.transforms.ToTensor()
    )
    return test_dataset
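
ImageFolder derives each label from the class subfolder name, with the names sorted and numbered from 0, so it can help to confirm that every class folder is present and populated. Below is a minimal stdlib-only sketch that mirrors that mapping. The helper names `folder_class_to_idx` and `images_per_class` are hypothetical, and the demo directory is a throwaway stand-in for the real path. On a real dataset the same mapping is available as the `class_to_idx` attribute of the ImageFolder object.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Map each class subfolder of `root` to an index, in sorted-name order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


def images_per_class(root):
    """Count the files inside each class subfolder."""
    return {
        name: len(os.listdir(os.path.join(root, name)))
        for name in folder_class_to_idx(root)
    }


# Demo on a temporary directory (hypothetical layout with only two classes).
with tempfile.TemporaryDirectory() as root:
    for name in ("cat", "airplane"):
        os.makedirs(os.path.join(root, name))
    # One empty placeholder file standing in for an image.
    open(os.path.join(root, "cat", "0.jpg"), "wb").close()
    demo_mapping = folder_class_to_idx(root)
    demo_counts = images_per_class(root)

print(demo_mapping)  # {'airplane': 0, 'cat': 1}
print(demo_counts)   # {'airplane': 0, 'cat': 1}
```

For the CIFAR-10 layout described above, one would expect 10 keys and roughly equal counts per class; an empty or missing folder would show up here immediately.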

ptrblck · July 20, 2019, 10:57pm

Do you get an error or what do you mean by “the neural network stopped working”?

cat-loves-donuts · July 20, 2019, 11:14pm

Well thank you so much for helping me.

I mean the loss is not changing. I tried it a couple of times. The loss always stays between 2.28 and 2.34 and does not go down, and the test accuracy is 9% to 20%.

I trained on the 50,000 training images and tested on the 10,000 test images.
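
For context, a loss stuck near 2.30 on a 10-class problem is what cross-entropy gives when the model assigns equal probability to every class, since ln(10) ≈ 2.303, and 10% accuracy is what random guessing would give. A quick check of that arithmetic (the function name is illustrative only):

```python
import math


def chance_level_loss(num_classes):
    """Cross-entropy (natural log) of a uniform prediction over num_classes classes."""
    return -math.log(1.0 / num_classes)


print(round(chance_level_loss(10), 3))  # 2.303
```

So the reported numbers suggest the network is still at its starting point rather than diverging.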

ptrblck · July 20, 2019, 11:29pm

Could you try to add the Normalization to your datasets?
Did you change something else besides the origin of the dataset?

cat-loves-donuts · July 20, 2019, 11:32pm

No, I never changed anything in the data set, and I checked the data set to see if I mixed up or missed some data. But everything is fine.

Do you mean a BatchNorm layer? When I used a BatchNorm layer, the results were too perfect to believe, so I decided to remove that layer.

ptrblck · July 20, 2019, 11:34pm

No, I meant the Normalization in the transform from the tutorial:

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
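
As a rough illustration of what this transform does: ToTensor scales pixel values to [0, 1], and Normalize then computes (x - mean) / std per channel, so a mean and std of 0.5 map that range to [-1, 1]. A plain-Python sketch of the per-value arithmetic (the `normalize` helper is hypothetical and is not the torchvision implementation):

```python
def normalize(x, mean=0.5, std=0.5):
    """Apply (x - mean) / std to a single pixel value in [0, 1]."""
    return (x - mean) / std


# Darkest, middle, and brightest pixel values after ToTensor.
print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0
```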

cat-loves-donuts · July 20, 2019, 11:55pm

Well, I tried that, but it looks like it does not work:
Train epoch: 100, loss: 2.299

Train epoch: 200, loss: 2.303

Train epoch: 300, loss: 2.299

Train epoch: 400, loss: 2.302
I do not know why…

ptrblck · July 21, 2019, 12:10am

Where did you download the data from?

cat-loves-donuts · July 21, 2019, 12:46pm

I downloaded it from the website: https://www.cs.toronto.edu/~kriz/cifar.html

And I used some code to rewrite them as .JPG files.

ptrblck · July 21, 2019, 12:57pm

Could you post the code or give some information about how you are converting the data to JPG? I would like to check whether there is an error in the processing pipeline.

cat-loves-donuts · July 22, 2019, 12:42pm

Well, I am sorry I deleted that code…

But I tried changing the data loading method to the tutorial style and used it with my training code. Unsurprisingly, the NN still cannot learn…

Well I suppose my training code must have some problems…

But I cannot find it…

Could you please help me to check the code?

import torch
import torch.nn as nn
import torch.optim as optim

def train_model(model, learning_rate, nsamples, load_train_dataset):
    model.train(True)
    epoch = 0  # counts batches (iterations), not full passes over the data
    print_loss = 0
    criterion = nn.CrossEntropyLoss()
    optimizer_module = optim.SGD(model.parameters(), lr=learning_rate)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    # load_train_dataset is the DataLoader; nsamples caps the number of batches.
    for i, (data, target) in enumerate(load_train_dataset):
        data = data.to(device)
        target = target.to(device)
        if i > nsamples:
            break
        else:
            # train network
            optimizer_module.zero_grad()
            out = model(data)
            loss = criterion(out, target)
            print_loss = loss.item()

        loss.backward()
        optimizer_module.step()
        epoch += 1
        if epoch % 100 == 0:
            print('\n Train epoch: {}, loss: {:.4}'.format(epoch, loss.item()))
    return print_loss

And the training result is this:
Train epoch: 100, loss: 2.277

Train epoch: 200, loss: 2.28

Train epoch: 300, loss: 2.306

Train epoch: 400, loss: 2.331

Train epoch: 500, loss: 2.263

Train epoch: 600, loss: 2.284

Train epoch: 700, loss: 2.255
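
One thing worth noting when reading this log: the counter named `epoch` in `train_model` increases once per batch, so each printed line is 100 batches, not 100 passes over the data. A small sketch of how many batches one full pass would contain, assuming the 50,000 training images mentioned earlier, the `batch_size=64` loader from the first post, and the DataLoader default of keeping the last partial batch (the helper name is hypothetical):

```python
import math


def batches_per_pass(num_images, batch_size):
    """Number of batches in one pass when the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


print(batches_per_pass(50000, 64))  # 782
```

Under those assumptions, a log that ends at 700 covers less than one full pass over the training set. Whether that explains the flat loss is not established by anything in this thread.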