I am new to PyTorch, to image classification and training, and to all the array manipulation that goes with them, so please bear with me.
I am trying to train a scikit-learn model on Fashion-MNIST and then a CNN on the same dataset. It is an exercise to understand what is happening and to get some practice.
I am obtaining the data with scikit-learn's `fetch_openml(name="Fashion-MNIST")`:

```python
>>> data1 = fetch_openml(name="Fashion-MNIST")
>>> data1.target.shape  # the classes, "y"
(70000,)
>>> np.shape(data1.data)
(70000, 784)
```
It turns out the data above is a pandas DataFrame, so I converted it to NumPy with:
```python
Xdata = data1.data.to_numpy()
Ydata = data1.target.to_numpy()

x_train, x_test, y_train, y_test = train_test_split(
    Xdata, Ydata, test_size=0.2, train_size=0.8, random_state=2
)
```
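A possible issue with the conversion step, offered as a hedged sketch rather than a definitive fix: the OpenML copy of Fashion-MNIST appears to store the class labels as strings (a categorical column), and `torch.from_numpy` cannot convert an object or string array, so it is safer to cast the labels to integers explicitly. Scaling the pixels from 0..255 to [0, 1] and reshaping to `(N, 1, 28, 28)` once, in NumPy, also removes the need for a hard-coded reshape inside the training loop. The helper `prepare_arrays` and the random stand-in data are hypothetical; in practice you would pass `Xdata`/`Ydata` or the split arrays.

```python
import numpy as np

def prepare_arrays(x, y):
    """Hypothetical helper: turn flat pixel rows and string labels into
    CNN-ready arrays (float32 images in [0, 1], int64 class indices)."""
    # (N, 784) -> (N, 1, 28, 28): one channel of 28x28 pixels, scaled to [0, 1]
    images = np.asarray(x, dtype=np.float32).reshape(-1, 1, 28, 28) / 255.0
    # labels such as "7" (object/str dtype) -> 7 (int64), which cross_entropy expects
    labels = np.asarray(y).astype(np.int64)
    return images, labels

# Stand-in data shaped like the OpenML arrays (assumption: random pixels, string labels)
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(8, 784)).astype(np.float64)
y_demo = np.array(["0", "9", "3", "3", "1", "7", "2", "5"], dtype=object)

images, labels = prepare_arrays(x_demo, y_demo)
print(images.shape, images.dtype, labels.dtype)
```

With the arrays already in this form, the `Dataset` only has to wrap them, and the model no longer needs `.float()` or `.long()` casts on every batch.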
Custom Dataset for the DataLoader:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CData(Dataset):
    def __init__(self, x_data, y_data):
        self.x_data, self.y_data = torch.from_numpy(x_data), torch.from_numpy(y_data)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, i):
        return self.x_data[i], self.y_data[i]

trainloader = DataLoader(CData(x_train, y_train), batch_size=32)
testloader = DataLoader(CData(x_test, y_test))
```
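One way to see what the `DataLoader` actually yields is to pull a single batch and print its shape, dtype, and value range, then compare one sample against the original NumPy row. The sketch below also swaps the custom class for `torch.utils.data.TensorDataset`, which indexes the tensors the same way `CData` does. It assumes the arrays were already converted to float32 images and int64 labels; the random arrays are stand-ins for `x_train`/`y_train`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in arrays (assumption): 64 images already shaped (N, 1, 28, 28), int64 labels
rng = np.random.default_rng(0)
x_np = rng.random((64, 1, 28, 28), dtype=np.float32)
y_np = rng.integers(0, 10, size=64).astype(np.int64)

# TensorDataset indexes the tensors in parallel, like the CData class
dataset = TensorDataset(torch.from_numpy(x_np), torch.from_numpy(y_np))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Inspect one batch: shapes, dtypes and pixel range
images, labels = next(iter(loader))
print(images.shape, images.dtype, labels.shape, labels.dtype)
print("pixel range:", images.min().item(), images.max().item())

# Check that a sample round-trips unchanged from the NumPy source
img0, lbl0 = dataset[0]
assert torch.equal(img0, torch.from_numpy(x_np[0]))
assert lbl0.item() == y_np[0]
```

To eyeball an image, `matplotlib.pyplot.imshow(images[0, 0], cmap="gray")` should show a recognizable garment if the reshape is correct; noise or stripes usually mean the pixel order or shape is off.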
Excerpt of the training loop:

```python
for epoch in range(epochs):
    for images, labels in trainloader:
        images, labels = images.to(device), labels.to(device)
        images = images.reshape([32, 1, 28, 28])
        # print(type(images))
        # wrap input images in a Variable wrapper
        images = Variable(images)
        optimizer.zero_grad()
        outputs = net(images.float())
        # Calculate the loss
        loss = F.cross_entropy(outputs, labels.long())
        # Calculate gradient w.r.t. the loss
        loss.backward()
        # Optimizer takes one step
        optimizer.step()
        # get the predicted class from the maximum value in the output-list of class scores
        pred = outputs.argmax(dim=1, keepdim=True)
        correct = pred.eq(labels.view_as(pred)).sum().item()
        train_acc = correct / batch_size  # calculate the accuracy
    # scheduler.step()
    print(train_acc)
```
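Here is a hedged sketch of the same loop with a few changes that may explain odd numbers: `reshape(-1, 1, 28, 28)` instead of a hard-coded 32 (which fails whenever a batch has a different size, e.g. a smaller final batch or the test loader's default `batch_size=1`), no `Variable` (merged into `Tensor` since PyTorch 0.4, so tensors track gradients directly), and accuracy accumulated over the whole epoch instead of being overwritten by each batch. The `SmallCNN` model, the random stand-in data, and the hyperparameters are assumptions for illustration, not the original `net`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

class SmallCNN(nn.Module):
    """Hypothetical stand-in for `net`: two conv blocks and a linear classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # -> (N, 16, 14, 14)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -> (N, 32, 7, 7)
        return self.fc(x.flatten(1))

def train_one_epoch(net, loader, optimizer, device):
    net.train()
    correct, seen, total_loss = 0, 0, 0.0
    for images, labels in loader:
        images = images.to(device).float().reshape(-1, 1, 28, 28)  # any batch size
        labels = labels.to(device).long()
        optimizer.zero_grad()
        outputs = net(images)
        loss = F.cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * labels.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)  # count the real samples, not batch_size
    return total_loss / seen, correct / seen

# Random stand-in data (assumption): 100 samples, so the last batch has only 4
torch.manual_seed(0)
x = torch.rand(100, 784)
y = torch.randint(0, 10, (100,))
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
net = SmallCNN().to(device)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(2):
    epoch_loss, epoch_acc = train_one_epoch(net, loader, optimizer, device)
    print(f"epoch {epoch}: loss={epoch_loss:.3f} acc={epoch_acc:.3f}")
```

Returning the epoch averages keeps the per-batch noise out of the printed numbers; evaluating on `testloader` the same way (under `torch.no_grad()` and `net.eval()`) gives a comparable test accuracy.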
My results seem a little weird, so I want to know:
- How can I check that the images are loaded correctly into the tensor Dataset?
- Is there a more efficient or cleaner way to do the data loading and reshaping?
- Is there a built-in CNN model I can run on my dataset to tell whether the problem lies in my data or in my model? If so, how would I use it?
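On the last question: `torchvision.models` does ship ready-made architectures such as `resnet18`, though they expect 3-channel input, so the first convolution would need replacing for 1-channel Fashion-MNIST. A cheaper check that separates data problems from model problems is to try to overfit one small batch: a working model and pipeline should push the loss on that batch close to zero, and if it cannot, the labels, dtypes, or shapes are likely suspects. The sketch below assumes a minimal `nn.Sequential` CNN and random stand-in data; with real data the batch would come from `trainloader`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Minimal CNN (assumption; any small model works for this check)
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),
)

# One fixed batch. With real data: next(iter(trainloader)), then reshape the
# images to (N, 1, 28, 28) float32 and cast the labels to int64.
images = torch.rand(16, 1, 28, 28)
labels = torch.randint(0, 10, (16,))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step}: loss={loss.item():.4f}")

with torch.no_grad():
    logits = model(images)
    final_loss = F.cross_entropy(logits, labels).item()
    final_acc = (logits.argmax(dim=1) == labels).float().mean().item()
print(f"final loss={final_loss:.4f}, batch accuracy={final_acc:.2f}")
```

If the same check on a real batch stalls near the starting loss (about ln 10 ≈ 2.3 for 10 classes), the data pipeline is the first place to look.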
I am converting from NumPy/scikit-learn to a PyTorch Dataset because I started with the scikit-learn model first.