How does the CIFAR10 tutorial make sure the test set is actually the test set if train and test are both loaded from the same path?

I was looking at:

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

and was getting weird errors. How do I know that the dataloader is actually using the right dataset if both use the same path? Is the argument train=False enough?

If you are not trusting a very clearly named flag, perhaps the best way to clear your doubt is not asking on a forum, but reading the source code: https://github.com/pytorch/vision/blob/master/torchvision/datasets/cifar.py

Instead of being publicly rude and trying to humiliate me, you can simply make your suggestion and skip the shaming.

I am sorry about my attitude; I should have said it in a better way. I apologize for it.

However, I still think that the flag is very clearly named, and it makes sense for a directory root to contain both train and test splits. But this doesn’t justify my attitude. I’m sorry.

No worries Simon I understand.

The reason I asked is because I have nearly zero test error which makes no sense. So I just didn’t know where to look anymore…my question also reflects my own frustration (as yours) because this is very simple and very clearly labeled.

Could you explain your training procedure a bit?
Probably there is some “information leakage” or your model is just badass! :wink:

Cifar 10 is quite small scale, so maybe your model is just really good at it.

If you share your training procedure/model, I’d be happy to discuss.

Ok finally! I got a small reproducible example that gets this ridiculously low test error:

the errors are as follows:

$ python my_cifar10.py
running main
[-1, -1], (train_loss: 2.3172860145568848, train error: 0.0054) , (test loss: 2.317185878753662, test error: 0.0038)
about to start training
[0, 5], (train_loss: 2.22599835395813, train error: 0.015160000000000002) , (test loss: 2.0623881816864014, test error: 0.0066)
[1, 5], (train_loss: 2.014406657218933, train error: 0.00896) , (test loss: 1.9619578123092651, test error: 0.0195)
[2, 5], (train_loss: 1.9428715705871582, train error: 0.01402) , (test loss: 1.918603539466858, test error: 0.0047)
[3, 5], (train_loss: 1.9434458494186402, train error: 0.01192) , (test loss: 1.9194672107696533, test error: 0.0125)
[4, 5], (train_loss: 1.8804980754852294, train error: 0.00794) , (test loss: 1.8549214601516724, test error: 0.004)
[5, 5], (train_loss: 1.8573726177215577, train error: 0.010159999999999999) , (test loss: 1.8625996112823486, test error: 0.0158)
[6, 5], (train_loss: 1.8454653739929199, train error: 0.01524) , (test loss: 1.8155865669250488, test error: 0.0122)
[7, 5], (train_loss: 1.8140610456466675, train error: 0.01066) , (test loss: 1.808283805847168, test error: 0.0101)
[8, 5], (train_loss: 1.8036894083023072, train error: 0.00832) , (test loss: 1.799634575843811, test error: 0.007)
[9, 5], (train_loss: 1.8023016452789307, train error: 0.0077399999999999995) , (test loss: 1.8030155897140503, test error: 0.0114)
Done

This is my terrific breakthrough model @ptrblck. The amazing, magnificent 1-conv, 2-FC NN. AlexNet^2. XD :rofl::joy:

So this is the MirandaNet :wink:
I’ll try it on my machine later and have a look at the code.

I had a look at your code and it seems your error calculation overflows.

In this method you are calculating the error:

def error_criterion(outputs,labels):
    max_vals, max_indices = torch.max(outputs,1)
    train_error = (max_indices != labels).sum().data[0]/max_indices.size()[0]
    return train_error

The comparison (max_indices != labels) returns a torch.ByteTensor, whose sum can overflow with your batch size of 10000.
Adding a .float() call to this line, i.e. (max_indices != labels).float().sum()..., gives a train error of ~0.622 and a test error of ~0.640.

Did you not get an error? I got a RuntimeError when trying to run your code:

RuntimeError: value cannot be converted to type uint8_t without overflow: 8821
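A rough way to see why the printed errors stay near 0.01: assuming the ByteTensor sum in the older release silently wraps modulo 256 instead of raising (an assumption that fits the printed output, not something checked against that release), the count of mismatches is reduced to its lowest 8 bits before the division. The helper name wrapped_uint8_count below is hypothetical and the sketch is plain Python arithmetic, not PyTorch code:

```python
def wrapped_uint8_count(true_count: int) -> int:
    """Value left in an 8-bit unsigned accumulator after counting true_count items."""
    return true_count % 256  # uint8 can only hold 0..255


batch_size = 10000
mismatches = 8821  # the value reported in the RuntimeError above

wrapped = wrapped_uint8_count(mismatches)
print(mismatches / batch_size)  # error rate from the real count: 0.8821
print(wrapped / batch_size)     # error rate from the wrapped count: 0.0117
```

Under that assumption the reported error can never exceed 255 / 10000, which would explain why every printed value is below about 0.0255.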

No, I never got a runtime error :confused: weird. The printed errors are what my code and GPU produced. My versions of PyTorch and torchvision are:

torch (0.3.1.post2)
torchvision (0.2.0)

Is the way I'm tracking the errors wrong (or perhaps slow)? Is this what I'm supposed to be doing, or how is it supposed to be done? (I wish PyTorch just had its own built-in class for this :frowning: [thanks so much for the help and your humor! :slight_smile: ] )

Simon, so the problem is the way I compute the error. Do you have any advice on it? What is the standard thing to do in PyTorch? I am sure there is something, right? I'm not the first one tracking the error. Is there no error class or module that PyTorch provides that is error free?

What's the difference between doing (max_indices != labels).int().sum() and (max_indices != labels).sum().float()?

cross posted:

(a != b).sum() will automatically return a long tensor in the next release. So you don’t need to change much. :slight_smile:

It overflows at .sum, so casting before it is better.
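For reference, a minimal sketch of an error function that casts before reducing, so the accumulation happens in floating point. It assumes a recent PyTorch release (argmax and .item() may not be available in 0.3.1), and the name error_rate and the toy tensors are made up for illustration:

```python
import torch


def error_rate(outputs: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of misclassified samples in a batch of class scores."""
    predictions = outputs.argmax(dim=1)  # index of the highest score per row
    # Cast the comparison result to float BEFORE reducing, so the
    # reduction cannot wrap around in a narrow integer type.
    return (predictions != labels).float().mean().item()


# Toy batch: 4 samples, 2 classes (hypothetical values).
outputs = torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2], [0.1, 0.8]])
labels = torch.tensor([0, 1, 1, 1])
print(error_rate(outputs, labels))  # 1 of 4 predictions is wrong: 0.25
```

Casting after the sum, as in (max_indices != labels).sum().float(), only converts a value that may already have overflowed, which is why the cast belongs before the reduction.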
