yunjey
(Yunjey)
March 4, 2017, 10:32am
1
I want to create a PyTorch tutorial using the MNIST dataset.
In TensorFlow, there is a simple way to download, extract, and load the MNIST dataset, as shown below.
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./mnist/data/")
x_train = mnist.train.images # numpy array
y_train = mnist.train.labels
x_test = mnist.test.images
y_test = mnist.test.labels
Is there any simple way to handle this in PyTorch?
yunjey
(Yunjey)
March 4, 2017, 10:37am
2
It seems MNIST is supported in torchvision.datasets.
I was confused because the PyTorch documentation does not mention MNIST.
Yes, it's already there - see here:
http://pytorch.org/docs/data.html
and here:
http://pytorch.org/docs/torchvision/datasets.html#mnist
The code looks something like this:

import torch
from torchvision import datasets, transforms

batch_size = 64   # args.batch_size in the original example script
kwargs = {}       # e.g. {'num_workers': 1, 'pin_memory': True} when training on CUDA

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean/std
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=batch_size, shuffle=True, **kwargs)
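If you want plain NumPy arrays like the TensorFlow snippet above, rather than a DataLoader, the underlying tensors can also be pulled out directly. A minimal sketch, assuming a torchvision version where the MNIST object exposes .data and .targets (older releases named these .train_data/.train_labels and .test_data/.test_labels):

from torchvision import datasets

# Download once; the raw image and label tensors can then be read straight
# off the dataset objects, no DataLoader needed.
train_set = datasets.MNIST('../data', train=True, download=True)
test_set = datasets.MNIST('../data', train=False, download=True)

x_train = train_set.data.numpy()     # shape (60000, 28, 28), dtype uint8
y_train = train_set.targets.numpy()  # shape (60000,)
x_test = test_set.data.numpy()       # shape (10000, 28, 28), dtype uint8
y_test = test_set.targets.numpy()    # shape (10000,)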
How do you subset the MNIST training data? It's 60,000 images; how can you reduce it to, say, 2,000?
Here's the code:
>>> from torchvision import datasets, transforms
>>>
>>>
>>> train_all_mnist = datasets.MNIST('../data', train=True, download=True,
... transform=transforms.Compose([
... transforms.ToTensor(),
... transforms.Normalize((0.1307,), (0.3081,))
... ]))
Files already downloaded
>>> train_all_mnist
<torchvision.datasets.mnist.MNIST object at 0x7f89a150cfd0>
How do I subset train_all_mnist?
Or, alternatively, I could just download it again and hack this line to 2000:
https://github.com/pytorch/vision/blob/master/torchvision/datasets/mnist.py#L64
It's a bit ugly - does anyone know a neater way to do this?
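One neater option, sketched below on the assumption that your PyTorch version provides torch.utils.data.Subset (on older versions, torch.utils.data.sampler.SubsetRandomSampler can play the same role):

import torch
from torchvision import datasets, transforms

train_all_mnist = datasets.MNIST('../data', train=True, download=True,
                                 transform=transforms.Compose([
                                     transforms.ToTensor(),
                                     transforms.Normalize((0.1307,), (0.3081,))
                                 ]))

# Wrap the full dataset with the first 2000 indices (or a random permutation
# of indices, if you want a random subset).
train_subset = torch.utils.data.Subset(train_all_mnist, list(range(2000)))
print(len(train_subset))  # 2000

train_loader = torch.utils.data.DataLoader(train_subset, batch_size=64, shuffle=True)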
yunjey
(Yunjey)
March 21, 2017, 3:28am
5
What is your purpose in subsetting the training dataset?
I'm interested in Omniglot, which is like an inverse MNIST: lots of classes, each with a small number of examples.
Take a look here.
By the way - thank you for your tutorials - they are very clear and helpful to learn from.
Best regards,
Ajay
smth
March 21, 2017, 9:00pm
7
Omniglot is in this pull request: pytorch:master ← ludc:master, opened 25 Jan 2017.
Ha, thank you!
Spent an hour hacking together my own loader - but this looks better!
Seems to be the easiest dataset for experimenting with one-shot learning?
What's the current best methodology for Omniglot? Who or what's doing the best at the moment?
smth
March 21, 2017, 9:20pm
10
@pranv set the record on Omniglot recently with his paper:
Attentive Recurrent Comparators
https://arxiv.org/abs/1703.00767
AjayTalati
(Ajay Talati)
March 21, 2017, 10:09pm
11
Thanks for that.
It looks like the DRAW model I implemented in Torch years ago, without the VAE and the decoder/generative canvas.
I thought you might like this implementation of a GAN on Omniglot:
Code for training a GAN on the Omniglot dataset using the network described in:
Task Specific Adversarial Cost Function
ritchieng
(Ritchie Ng)
May 29, 2017, 5:48am
12
Have you found a better way to do this?
Nope, sorry - I've been totally snowed under the past couple of months and haven't had any time to work on it.
If you're referring to the alternative cost functions for GANs, I don't think they make much difference.
If you're referring to non-Gaussian attention mechanisms for the DRAW encoder, I don't know of any better approach than @pranav's, as mentioned above. I think he's open-sourced his code?
Cheers,
Aj
pranav
(Pranav Shyam)
May 29, 2017, 6:23pm
14
The code for Attentive Recurrent Comparators is here: https://github.com/pranv/ARC
It includes Omniglot data downloading and iterating scripts along with all the models proposed in the paper (the nets are written and trained with Theano).
I will try to submit a PR for torchvision.datasets.Omniglot if I find some time.
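For anyone reading this later: torchvision did eventually get an Omniglot dataset. A minimal usage sketch, assuming a torchvision release that includes datasets.Omniglot:

from torchvision import datasets, transforms

# background=True loads the "background" (training) alphabets,
# background=False loads the evaluation alphabets.
omniglot_train = datasets.Omniglot('../data', background=True, download=True,
                                   transform=transforms.ToTensor())

image, character_class = omniglot_train[0]
print(image.shape, character_class)  # torch.Size([1, 105, 105]), class index 0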