Creating a DataLoader for unsupervised learning (MNIST, SVHN)

Nimrod_Daniel · May 30, 2019, 4:17pm

It looks pretty good now: I get the features from the conv net, run anomaly detection on them, and the results look pretty good (around 10 and robust).
I have 2 last questions regarding the dataloader.

I have a list of predictions, but I can’t match a prediction to an index/image from the dataloader. Is there a way to get the indices of the samples in a batch? It would suffice for me to know which dataset each feature vector comes from (0/1 or mnist/svhn would be great).

When I show the images using the dataloader, the data from MNIST looks good, but the images from SVHN look like vertical lines against the background, which is nothing like the original images. I show them by iterating over the first images in the dataloader; since the loader is random, each run shows different images.
The rescaling is almost negligible, as the transform only rescales the images from 28x28 to 32x32 (3 channels, of course), so I wouldn’t expect such a drastic change. Any idea why this happens?

ptrblck · May 30, 2019, 4:30pm

You could just return the current dataset name (or a specific dataset index, e.g. 0 for MNIST, 1 for SVHN):

    def __getitem__(self, index):
        if index < self.mnist_len:
            x = self.mnist_data[index]
            if self.mnist_transform:
                x = self.mnist_transform(x)
            print('Returning MNIST sample at index {}'.format(index))
            dset = 'mnist'
        else:
            index = index - self.mnist_len
            x = self.svhn_data[index]
            if self.svhn_transform:
                x = self.svhn_transform(x)
            print('Returning SVHN data at index {}'.format(index))
            dset = 'svhn'
        return x, dset

Are you using a view operation on the SVHN data sample to change the axes, i.e. to push the channel dimension to dim2?
If so, use permute instead, since view will create artifacts if you try to permute the axes.

Nimrod_Daniel · May 30, 2019, 6:00pm

Super easy, just return another value. I thought there might be a built-in method in PyTorch. Great, now I have a list.
The dataset includes both MNIST and SVHN datasets. I used:

    dataiter = iter(loader)
    images = dataiter.next()
    for i in range(9):
        plt.subplot(3, 3, i+1)
        plt.imshow(images[i, 0])

Adding permute doesn’t help, I just get a blank 1.0x1.0 rectangle:

    plt.imshow(images[i, 0].permute(2, 0, 1))

ptrblck · May 30, 2019, 6:37pm

Yeah, you are right.
Apparently the SVHN data is stored as a numpy array, which should be passed as [H, W, C] to ToPILImage.
Change the call to self.svhn_transform to:

    x = self.svhn_transform(x.transpose(1, 2, 0))
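As a rough illustration of why the axis order matters, here is a minimal NumPy sketch. It assumes a channels-first `[3, 32, 32]` array filled with counter values as a stand-in for one SVHN sample; the names `chw`, `hwc` and `scrambled` are made up for the example. `transpose` moves the axes, while `reshape` (like `view` on a tensor) only relabels the same memory order, which is what produces the striped artifacts.

```python
import numpy as np

# Hypothetical stand-in for one SVHN sample stored channels-first: [C, H, W].
# Counter values are used so that every entry is unique and easy to track.
chw = np.arange(3 * 32 * 32).reshape(3, 32, 32)

# transpose moves the axes: pixel (h, w) keeps its three channel values.
hwc = chw.transpose(1, 2, 0)

# reshape keeps the flat memory order and only relabels the axes,
# so rows, columns and channels get mixed together.
scrambled = chw.reshape(32, 32, 3)

print(hwc.shape, scrambled.shape)               # both are (32, 32, 3)
print(np.array_equal(hwc[5, 7], chw[:, 5, 7]))  # True: channels preserved
print(np.array_equal(hwc, scrambled))           # False: reshape scrambled them
```

Both results have the shape `[H, W, C]`, so the shape alone does not tell you whether the image content survived.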

Nimrod_Daniel · May 30, 2019, 6:50pm

Did you try that with plt.imshow(images[i, 0]) or with something else?

ptrblck · May 30, 2019, 7:16pm

I tried it with this command. Is it not working?

Nimrod_Daniel · May 30, 2019, 7:23pm

No, I get a blank rectangle.

ptrblck · May 30, 2019, 7:25pm

In that case it would probably be easiest to update torchvision, as I can’t debug the code currently. Would that be possible or do you need to use an older version?

Nimrod_Daniel · May 30, 2019, 7:26pm

I forgot to mention that there’s also an error:

    File "", line 3, in
        plt.imshow(images[i, 0])

    TypeError: list indices must be integers or slices, not tuple

It’s a bit odd considering that this line showed the images when I first tried to show the images from the dataloader.

I just didn’t want the update to break older code. It would also require me to update to PyTorch 1.1 (I have 0.4 now). If torchvision 0.2.1 causes problems then maybe I should update, though I’d prefer not to do it now.
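
A possible explanation for the TypeError, assuming `__getitem__` now returns `(x, dset)` as suggested above: with default collation each batch should come back as a two-element list (the image batch and the dataset names), so `dataiter.next()` no longer yields a bare tensor and `images[i, 0]` ends up indexing a list. The sketch below uses NumPy arrays as hypothetical stand-ins for the tensors; the batch size and names are made up.

```python
import numpy as np

# Hypothetical stand-ins for one batch, assuming __getitem__ returns (x, dset):
# the loader then yields [images, names] instead of a bare image batch.
images = np.zeros((9, 3, 32, 32), dtype=np.float32)
names = ['mnist'] * 5 + ['svhn'] * 4
batch = [images, names]

try:
    batch[0, 0]  # same pattern as images[i, 0] on the un-unpacked batch
except TypeError as err:
    print(err)   # list indices must be integers or slices, not tuple

# Unpacking the batch first restores the old indexing.
images, names = batch
print(images[0, 0].shape, names[0])
```

If this is the cause, unpacking with `images, names = dataiter.next()` before the plotting loop should bring the images back without updating any package.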