How does the CIFAR10 tutorial make sure the test set is actually the test set if both train and test are loaded from the same path?

Brando_Miranda · April 12, 2018, 12:32am

I was looking at:

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

and was getting weird errors. How do I know that the dataloader is actually using the right data set if both are using the same path? Is the argument train=False enough?

SimonW · April 12, 2018, 1:27am

If you are not trusting a very clearly named flag, perhaps the best way to clear your doubt is not asking on a forum, but reading the source code: https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

Brando_Miranda · April 12, 2018, 2:21pm

Instead of being publicly rude and trying to humiliate me, you can point out your suggestion and skip the shaming.

SimonW · April 12, 2018, 8:12pm

I am sorry about my attitude, and I should have said it in a better way. I apologize for it.

However, I still think that the flag is very clearly named, and it makes sense for a directory root to contain both train and test splits. But this doesn’t justify my attitude. I’m sorry.
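
To illustrate why one root directory can safely hold both splits, here is a simplified sketch (not torchvision's actual code) of how a train flag can select different files under the same root. The file names follow the layout of the CIFAR-10 "python version" archive; the function name split_files is hypothetical.

```python
import os

# File layout of the extracted CIFAR-10 "python version" archive.
BASE_FOLDER = "cifar-10-batches-py"
TRAIN_FILES = [f"data_batch_{i}" for i in range(1, 6)]
TEST_FILES = ["test_batch"]


def split_files(root, train):
    """Return the batch files a loader would read for the requested split.

    Hypothetical helper: it only mimics the idea of a `train` flag.
    """
    names = TRAIN_FILES if train else TEST_FILES
    return [os.path.join(root, BASE_FOLDER, name) for name in names]


train_paths = split_files("./data", train=True)
test_paths = split_files("./data", train=False)

# Same root, but the two splits never share a file.
print(set(train_paths).isdisjoint(test_paths))
```

For the authoritative behavior, the linked cifar.py remains the reference.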

Brando_Miranda · April 12, 2018, 8:23pm

No worries Simon I understand.

The reason I asked is that I have nearly zero test error, which makes no sense. So I just didn’t know where to look anymore… My question also reflects my own frustration (as yours did), because this is very simple and very clearly labeled.

ptrblck · April 12, 2018, 8:26pm

Could you explain your training procedure a bit?
Probably there is some “information leakage” or your model is just badass!
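
One rough way to test for the kind of leakage mentioned above is to fingerprint every sample and count test samples that also appear in the training set. This is only a sketch on toy byte strings. With real images one would pass the raw pixel bytes, and it only catches exact duplicates. The helper names are hypothetical.

```python
import hashlib


def fingerprint(sample):
    """Hash the raw bytes of one sample."""
    return hashlib.sha256(sample).hexdigest()


def count_overlap(train_samples, test_samples):
    """Count test samples whose bytes also occur in the training set."""
    train_hashes = {fingerprint(s) for s in train_samples}
    return sum(1 for s in test_samples if fingerprint(s) in train_hashes)


# Toy stand-ins for image data.
train = [bytes([i, i + 1, i + 2]) for i in range(10)]
test_clean = [bytes([i, i + 1, i + 2]) for i in range(100, 105)]
test_leaky = test_clean + [train[3]]  # one training sample leaked into test

print(count_overlap(train, test_clean))  # 0
print(count_overlap(train, test_leaky))  # 1
```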

SimonW · April 12, 2018, 8:55pm

Cifar 10 is quite small scale, so maybe your model is just really good at it.

If you share your training procedure/model, I’d be happy to discuss.

Brando_Miranda · April 13, 2018, 3:42am

Ok finally! I got a small reproducible example that gets this ridiculously low test error:

github.com/brando90/my_cifar10_pytorch/blob/master/my_cifar10.py

import torch
from torch.autograd import Variable
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

from math import inf

from pdb import set_trace as st

def error_criterion(outputs,labels):
    max_vals, max_indices = torch.max(outputs,1)
    train_error = (max_indices != labels).sum().data[0]/max_indices.size()[0]
    return train_error

def evalaute_mdl_data_set(loss,error,net,dataloader,enable_cuda,iterations=inf):
    '''
    Evaluate the error of the model under some loss and error with a specific data set.
    '''

This file has been truncated.

the errors are as follows:

$ python my_cifar10.py
running main
[-1, -1], (train_loss: 2.3172860145568848, train error: 0.0054) , (test loss: 2.317185878753662, test error: 0.0038)
about to start training
[0, 5], (train_loss: 2.22599835395813, train error: 0.015160000000000002) , (test loss: 2.0623881816864014, test error: 0.0066)
[1, 5], (train_loss: 2.014406657218933, train error: 0.00896) , (test loss: 1.9619578123092651, test error: 0.0195)
[2, 5], (train_loss: 1.9428715705871582, train error: 0.01402) , (test loss: 1.918603539466858, test error: 0.0047)
[3, 5], (train_loss: 1.9434458494186402, train error: 0.01192) , (test loss: 1.9194672107696533, test error: 0.0125)
[4, 5], (train_loss: 1.8804980754852294, train error: 0.00794) , (test loss: 1.8549214601516724, test error: 0.004)
[5, 5], (train_loss: 1.8573726177215577, train error: 0.010159999999999999) , (test loss: 1.8625996112823486, test error: 0.0158)
[6, 5], (train_loss: 1.8454653739929199, train error: 0.01524) , (test loss: 1.8155865669250488, test error: 0.0122)
[7, 5], (train_loss: 1.8140610456466675, train error: 0.01066) , (test loss: 1.808283805847168, test error: 0.0101)
[8, 5], (train_loss: 1.8036894083023072, train error: 0.00832) , (test loss: 1.799634575843811, test error: 0.007)
[9, 5], (train_loss: 1.8023016452789307, train error: 0.0077399999999999995) , (test loss: 1.8030155897140503, test error: 0.0114)
Done

This is my terrific breakthrough model @ptrblck. The amazing magnificent 1 conv 2 FCs NN. AlexNet^2. X'''D

ptrblck · April 13, 2018, 5:50am

So this is the MirandaNet
I’ll try it on my machine later and have a look at the code.

ptrblck · April 13, 2018, 9:08am

I had a look at your code and it seems your error calculation overflows.

In this method you are calculating the error:

def error_criterion(outputs,labels):
    max_vals, max_indices = torch.max(outputs,1)
    train_error = (max_indices != labels).sum().data[0]/max_indices.size()[0]
    return train_error

The comparison (max_indices != labels) returns a torch.ByteTensor, whose sum can overflow with your batch size of 10000.
Adding a .float() call before the sum, i.e. (max_indices != labels).float().sum()..., gives a train error of ~0.622 and a test error of ~0.640.

Did you not get an error? I got a RuntimeError when trying to run your code:

RuntimeError: value cannot be converted to type uint8_t without overflow: 8821
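
The effect of such an overflow can be illustrated without PyTorch. The sketch below assumes the 8-bit sum silently wraps modulo 256 (an assumption about the behavior on the affected version, not something confirmed in this thread) and reuses the count 8821 from the RuntimeError above. The helper name is hypothetical.

```python
UINT8_MODULUS = 256  # an unsigned 8-bit integer holds values 0..255


def wrapped_uint8_sum(flags):
    """Sum 0/1 flags the way a wrapping 8-bit accumulator would."""
    total = 0
    for flag in flags:
        total = (total + flag) % UINT8_MODULUS
    return total


batch_size = 10000
num_wrong = 8821  # count taken from the RuntimeError message
mismatches = [1] * num_wrong + [0] * (batch_size - num_wrong)

true_error = sum(mismatches) / batch_size
wrapped_error = wrapped_uint8_sum(mismatches) / batch_size

print(true_error)                      # 0.8821
print(wrapped_uint8_sum(mismatches))   # 117
```

Under this assumption a wrapped count can never exceed 255, so with a batch of 10000 the reported error could be at most 0.0255, no matter how poor the model is.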

Brando_Miranda · April 13, 2018, 1:14pm

No, I never got a runtime error, which is weird. The printed errors are what my code and GPU produced. My versions of PyTorch and torchvision are:

torch (0.3.1.post2)
torchvision (0.2.0)

Is the way I’m tracking the errors wrong (or perhaps slow)? Is this what I’m supposed to be doing, or how is it supposed to be done? (I wish PyTorch just had its own built-in class for this.) [Thanks so much for the help and your humor!]

Brando_Miranda · April 13, 2018, 5:43pm

Simon, so the bug is in the way I compute the error. Do you have any advice on it? What is the standard thing to do in PyTorch? I am sure there is something, right? I’m not the first one tracking the error. Is there no error class or module PyTorch provides that is free of this problem?

Brando_Miranda · April 13, 2018, 7:48pm

What’s the difference between doing (max_indices != labels).int().sum() and (max_indices != labels).sum().float()?

Brando_Miranda · April 13, 2018, 8:02pm

cross posted:

SimonW · April 13, 2018, 8:27pm

(a != b).sum() will automatically return a long tensor in the next release. So you don’t need to change much.

The overflow happens in .sum(), so it is better to cast before it.
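
A minimal sketch of an error function that casts before summing, as suggested above. It assumes a recent PyTorch release (0.4 or later, for .item() and tensor argmax) rather than the 0.3.1 used in this thread, and the name error_rate is hypothetical.

```python
import torch


def error_rate(outputs, labels):
    """Fraction of samples whose top-scoring class differs from the label."""
    predictions = outputs.argmax(dim=1)
    # Cast BEFORE summing so the count is never accumulated in 8 bits.
    mismatches = (predictions != labels).float()
    return mismatches.sum().item() / labels.size(0)


# Four samples, two classes; only the third prediction is wrong.
outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = torch.tensor([1, 0, 0, 0])
print(error_rate(outputs, labels))  # 0.25
```

Casting after the sum, as in .sum().float(), would convert a count that may already have overflowed, which is why the order matters on versions where the sum stays in 8 bits.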