Creating a DataLoader for unsupervised learning (MNIST, SVHN)

It looks pretty good now: I get the features from the conv net, run anomaly detection on them, and the results look pretty good (around 10 and robust :slight_smile:).
I have 2 last questions regarding the dataloader.

I have a list of predictions, but I can't match the predictions to an index/image from the dataloader. Is there a way to get the indices of the batch? It would suffice for me to know which dataset each feature vector comes from (0/1 or mnist/svhn would be great).

When I show the images using the dataloader, the data from MNIST looks good, but the images from SVHN look like vertical lines against the background, which is nothing like the original images. I show them by iterating over the first images in the dataloader, which means that each run shows different images.
The rescaling is almost negligible, as the transform rescales the images from 28x28 to 32x32 (3 channels, of course), so I wouldn't expect such a drastic change. Any idea why this happens?

  1. You could just return the current dataset name (or a specific dataset index, e.g. 0 for MNIST, 1 for SVHN):
    def __getitem__(self, index):
        if index < self.mnist_len:
            x = self.mnist_data[index]
            if self.mnist_transform:
                x = self.mnist_transform(x)
            print('Returning MNIST sample at index {}'.format(index))
            dset = 'mnist'
        else:
            index = index - self.mnist_len
            x = self.svhn_data[index]
            if self.svhn_transform:
                x = self.svhn_transform(x)
            print('Returning SVHN data at index {}'.format(index))
            dset = 'svhn'
        return x, dset
  2. Are you using a view operation on the SVHN data sample to change the axes, i.e. to push the channel dimension to dim2?
    If so, use permute instead, since view only reinterprets the memory layout and will create artifacts if you use it to permute the axes.
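The first suggestion above can be sketched end to end as follows. This is only a minimal illustration: `TwoSourceDataset`, `fake_mnist` and `fake_svhn` are hypothetical stand-ins (random-free dummy tensors, not the real datasets), and the dataset index (0 or 1) is returned instead of a name. With the default collate function, the loader then yields the data batch and a tensor of source indices side by side.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TwoSourceDataset(Dataset):
    """Hypothetical stand-in for the combined MNIST/SVHN dataset."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __len__(self):
        return len(self.first) + len(self.second)

    def __getitem__(self, index):
        # Return the sample together with the index of its source dataset.
        if index < len(self.first):
            return self.first[index], 0
        return self.second[index - len(self.first)], 1


# Dummy data standing in for the transformed images (assumed shape [3, 32, 32]).
fake_mnist = torch.zeros(6, 3, 32, 32)
fake_svhn = torch.ones(4, 3, 32, 32)

loader = DataLoader(TwoSourceDataset(fake_mnist, fake_svhn),
                    batch_size=5, shuffle=False)

sources = []
for x, dset in loader:
    # x is the image batch, dset holds one source index per sample.
    sources.extend(dset.tolist())
```

After the loop, `sources` lines up one-to-one with the feature vectors computed from each `x`, as long as both are collected in the same order.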
  1. Super easy, just return another number. I thought there might be a built-in method in PyTorch. Great, now I have a list.

  2. The dataset includes both MNIST and SVHN datasets. I used:

dataiter = iter(loader)
images = dataiter.next()
for i in range(9):
    plt.subplot(3, 3, i+1)
    plt.imshow(images[i, 0])

Adding permute doesn't help; I just get a blank 1.0x1.0 rectangle:

plt.imshow(images[i, 0].permute(2, 0, 1))

Yeah, you are right.
Apparently the SVHN data is stored as a numpy array, which should be passed as [H, W, C] to ToPILImage.
Change the call to self.svhn_transform to:

x = self.svhn_transform(x.transpose(1, 2, 0))
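To see why the axes have to be moved rather than reinterpreted, here is a small NumPy sketch. The array `chw` is a made-up 3-channel image, not real SVHN data; it is assumed to be in [C, H, W] layout as described above. `transpose` relocates the axes, while `reshape` (the NumPy analogue of `view`) keeps the memory order and therefore mixes channels and pixels.

```python
import numpy as np

# Made-up channels-first image, layout [C, H, W].
c, h, w = 3, 4, 5
chw = np.arange(c * h * w).reshape(c, h, w)

# Correct: move the channel axis to the end, giving [H, W, C].
hwc_transposed = chw.transpose(1, 2, 0)

# Incorrect for this purpose: same target shape, but the values are
# simply read in the original memory order, so pixels get scrambled.
hwc_reshaped = chw.reshape(h, w, c)

# Pixel (row 1, col 2) of channel 0 is preserved by transpose.
pixel_preserved = hwc_transposed[1, 2, 0] == chw[0, 1, 2]

# The two results have the same shape but different contents.
same_contents = np.array_equal(hwc_transposed, hwc_reshaped)
```

Both arrays have shape `(4, 5, 3)`, which is why the mistake is easy to miss until the image is displayed.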

Did you try that with plt.imshow(images[i, 0]) or with something else?

I tried it with this command. Is it not working?

No, I get a blank rectangle.

In that case it would probably be easiest to update torchvision, as I can't debug the code currently. Would that be possible, or do you need to use an older version?

I forgot to mention that there's also an error:
File "", line 3, in
plt.imshow(images[i, 0])

TypeError: list indices must be integers or slices, not tuple.

It's a bit odd, considering that this line displayed the images when I first tried to show them from the dataloader.
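One possible explanation, assuming `__getitem__` was changed to return `(x, dset)` as suggested earlier in the thread: each batch from the loader is then a two-element list rather than a single tensor, and indexing a list with `[i, 0]` raises exactly this error. The sketch below uses plain Python lists as stand-ins for the real batch (the names `batch`, `images_batch` and `dsets` are made up for illustration).

```python
# Stand-in for what the loader yields once __getitem__ returns (x, dset):
# a list [images, dsets] instead of a single batch of images.
images_batch = [[[0.0, 0.1], [0.2, 0.3]] for _ in range(4)]  # 4 fake "images"
dsets = ['mnist', 'svhn', 'mnist', 'svhn']
batch = [images_batch, dsets]


def try_tuple_index(obj):
    """Return the error message raised by obj[0, 0], or None if it works."""
    try:
        obj[0, 0]
    except TypeError as exc:
        return str(exc)
    return None


# Indexing the whole batch with a tuple fails, as in the traceback.
error_message = try_tuple_index(batch)

# Unpacking first separates the images from the dataset labels.
images, sources = batch
```

With a real loader, `images, sources = next(dataiter)` would play the same role, after which `images[i, 0]` indexes the image tensor again.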

I just didn't want the update to break older code. It would also require me to update to PyTorch 1.1 (I currently have 0.4). If torchvision 0.2.1 causes problems then maybe I should update, though I'd prefer not to do it now.