Loading 2 classes from FashionMNIST

Hello

I have an assignment due and I am positively freaking out. I have spent 5 hours trying to load 2 classes from the FashionMNIST dataset, but I simply cannot figure it out.

I have tried this:

# boolean mask selecting class 1 (trouser) and class 7 (sneaker)
idx = (torch.as_tensor(trainset.targets) == 1) | (torch.as_tensor(trainset.targets) == 7)
dset_train = torch.utils.data.Subset(trainset, np.where(idx)[0])

but is that alone enough?

I then use it to load a batch of 8 images from the trouser/sneaker classes, which works, but when I try applying StandardScaler() or PCA afterwards, it will not work.

It states that ‘only one element tensors can be converted to Python scalars’, and if I try using the subset in any code, it throws attribute errors, e.g. AttributeError: ‘Subset’ object has no attribute ‘numpy’.

I am trying to apply PCA to only 2 classes (trouser/sneaker) from the FashionMNIST dataset.

Help please.

Yes, it should be enough to only select the samples with class 1 and class 7, as seen here:

idx = (torch.as_tensor(dataset.targets) == 1) | (torch.as_tensor(dataset.targets) == 7)
dset_train = torch.utils.data.Subset(dataset, np.where(idx)[0])
loader = torch.utils.data.DataLoader(dset_train, batch_size=64)

for data, target in loader:
    print(target)

How are you trying to apply the StandardScaler?
As described in the docs, you would need to fit it first and can then apply it to numpy arrays.
Passing a Dataset will most likely not work.
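For instance, fitting on a flattened 2D array could look like this (a rough sketch using a random stand-in tensor in place of the real FashionMNIST data):

```python
import numpy as np
import torch
from sklearn import preprocessing

# Stand-in for FashionMNIST: raw uint8 images of shape (N, 28, 28),
# mimicking how torchvision stores the dataset internally
data = torch.randint(0, 256, (100, 28, 28), dtype=torch.uint8)

# StandardScaler expects a 2D array, so flatten each image
# to a 784-dimensional row vector first
flat = data.numpy().reshape(len(data), -1).astype(np.float32)

scaler = preprocessing.StandardScaler()
scaled = scaler.fit_transform(flat)  # fit on the training data, then transform it

print(scaled.shape)  # (100, 784)
```

With the real dataset you would flatten `trainset.data` the same way; the fitted scaler can then `transform` the validation/test arrays without refitting.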

I applied it like this

scaler = preprocessing.StandardScaler()

scaleddset = scaler.fit_transform(dset_train)

but then it gives me the ‘only one element tensors can be converted to Python scalars’ error, so… I just assumed it was because of the way I was loading the data…?

How would I go about fitting it first and then applying it to numpy arrays? ):

Sorry about this - also, when I try to convert dset_train to a numpy array, it gives me the ‘Subset’ object has no attribute ‘numpy’ error?

I have also tried converting the trainset to a numpy array before making the subset, but that did not work either; it gives me a ‘ValueError: too many values to unpack (expected 2)’ error.

StandardScaler expects a numpy array, not a torch.utils.data.Dataset, as its input.
The FashionMNIST dataset stores the raw images in its internal .data attribute (a uint8 tensor).
Since you want to apply scikit-learn preprocessing methods to it, I would recommend writing a custom Dataset by deriving from FashionMNIST, calling fit_transform on the StandardScaler in the __init__ method, and transforming the data in __getitem__.
If you want to use the same scaler for the training and validation (and test) datasets, you could create the object once and pass it to the custom dataset implementations.
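Regarding the missing .numpy() attribute: a Subset is just a view into the dataset and has no such method. One way around it (a sketch, assuming each sample is an (image, label) tensor pair) is to iterate the Subset and stack the samples into plain numpy arrays:

```python
import numpy as np
import torch
from torch.utils.data import Subset, TensorDataset

# Stand-in dataset: 10 fake 28x28 images with labels, mimicking FashionMNIST
images = torch.rand(10, 28, 28)
labels = torch.tensor([1, 7, 3, 1, 7, 0, 1, 7, 2, 1])
dataset = TensorDataset(images, labels)

# Keep only classes 1 and 7, as in the thread
idx = (labels == 1) | (labels == 7)
dset_train = Subset(dataset, np.where(idx)[0])

# A Subset has no .numpy(); iterate it and stack each sample instead,
# flattening the images so the result is ready for StandardScaler/PCA
X = np.stack([img.numpy().reshape(-1) for img, _ in dset_train])
y = np.array([int(target) for _, target in dset_train])

print(X.shape)        # (7, 784)
print(np.unique(y))   # [1 7]
```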

I don’t really know how to go about writing a custom Dataset with those things sdfhksjdg

I am only using the training data for the visualisation with the PCA

You could take a look at this tutorial, which shows how to write a custom Dataset. Since this task is an assignment, I’m not comfortable providing the code here.


Been trying but still have yet to get it working properly ): thank you though

I have gotten this far:

class mydataset(Dataset):
    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        image = self.features[index]
        image = image.numpy()
        label = self.labels[index]
        if label == 1:
            return image, label
        elif label == 7:
            return image, label

I’m not sure the last bit is right (the if statement), and it states that image does not have a numpy() attribute.
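For reference, here is one minimal sketch of how such a Dataset could look (an illustration under assumptions, not the only design; the name ScaledDataset and the keep parameter are made up). The numpy() error usually means the items in features are not torch tensors, so converting everything to numpy once in __init__ sidesteps it; filtering the labels there also means __getitem__ never falls through the if/elif and returns None:

```python
import numpy as np
import torch
from sklearn import preprocessing
from torch.utils.data import Dataset

class ScaledDataset(Dataset):
    """Hypothetical Dataset that standardises flattened images up front."""

    def __init__(self, features, labels, keep=(1, 7)):
        # Filter the wanted classes here instead of in __getitem__,
        # so every index returns a valid sample
        mask = np.isin(labels.numpy(), keep)
        # Flatten to (N, 784) once; StandardScaler works on 2D arrays
        flat = features.numpy().reshape(len(features), -1).astype(np.float32)

        self.scaler = preprocessing.StandardScaler()
        self.features = self.scaler.fit_transform(flat[mask])
        self.labels = labels.numpy()[mask]

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        # Already numpy, so no .numpy() call is needed here
        return self.features[index], self.labels[index]

# Stand-in data mimicking FashionMNIST's raw tensors
features = torch.randint(0, 256, (10, 28, 28), dtype=torch.uint8)
labels = torch.tensor([1, 7, 3, 1, 7, 0, 1, 7, 2, 5])

dset = ScaledDataset(features, labels)
print(len(dset))         # 6
print(dset[0][0].shape)  # (784,)
```

With the real dataset you would pass `trainset.data` and `trainset.targets` instead of the stand-in tensors.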