Hi, friends,
The use case of my self-built PyTorch deep-learning model is patients' medical-appointment booking behaviour. Specifically, it predicts how many days in advance (excluding weekends and public holidays) a patient will book a medical appointment. The range is 2 to 20 days.
This is actually quite a simple use case. However, my model's accuracy stays very low: even after 1,000 epochs it is only 30.9%, and the average loss is still as high as 2.058354.
In my model, each data tensor is a vector composed of the following 7 fields:
- Gender: 1 – male, 2 – female
- Age
- Area: I take the first 3 digits of a patient's residential postal code and map them to an integer from 0 to 999.
- Medical Examination: 1 – yes, 0 – no
- Blood Test: 1 – yes, 0 – no
- Urine Test: 1 – yes, 0 – no
- Fasting: 1 – yes, 0 – no
The labels are obtained by subtracting 2 from the booking-in-advance days, giving a range from 0 to 18, i.e. 19 classes in total.
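To make the encoding concrete, here is a small sketch with a made-up patient record (the helper names and values are just illustrative, not my real data or code):

```python
def encode_record(gender, age, postal_code, med_exam, blood_test, urine_test, fasting):
    """Encode one patient record as the 7-field feature vector.

    gender: 1 = male, 2 = female; postal_code: string whose first 3 digits
    map to an Area integer in 0..999; the last four fields are 0/1 flags.
    """
    area = int(str(postal_code)[:3])  # first 3 digits of the postal code
    return [gender, age, area, med_exam, blood_test, urine_test, fasting]

def encode_label(days_in_advance):
    """Map booking-in-advance days (2..20) to a class index (0..18)."""
    assert 2 <= days_in_advance <= 20
    return days_in_advance - 2

# A hypothetical patient: female, 45, postal code 560123, blood test, fasting
features = encode_record(2, 45, "560123", 0, 1, 0, 1)
label = encode_label(5)  # booked 5 working days in advance -> class 3
print(features, label)
```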
In total I prepared 377 training samples and 94 test samples for the model.
My model basically follows the Fashion-MNIST example from the official tutorials on pytorch.org (see the page Optimizing Model Parameters — PyTorch Tutorials 1.12.0+cu102 documentation). I use the following hyperparameters: the hidden layers of the neural network have size 512, the learning rate is 0.01, and the batch size is 10. (At the end of this post I attach my full Python source code for reference.)
Can any experts help me analyse and diagnose why my accuracy is so low? I suspect one or more of the following reasons:
- The training data set is too small.
- 1,000 epochs are not enough.
- The model itself is not effective enough (do I need a different architecture, or a better loss function?).
- Other hyperparameters, e.g. batch size and learning rate, may be misconfigured.
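On top of these, I also wonder whether feeding raw Age and Area values (0 to 999) alongside 0/1 flags hurts training. Here is a minimal sketch of the min-max scaling I could try; the per-field ranges below are my own assumptions, not something from the tutorial:

```python
# Hypothetical (min, max) range for each of the 7 fields, used to
# min-max scale every value into [0, 1] before training.
FIELD_RANGES = [(1, 2), (0, 100), (0, 999), (0, 1), (0, 1), (0, 1), (0, 1)]

def scale_features(row):
    """Scale each field of a 7-field record into [0, 1]."""
    scaled = []
    for value, (lo, hi) in zip(row, FIELD_RANGES):
        scaled.append((value - lo) / (hi - lo))
    return scaled

print(scale_features([2, 50, 560, 0, 1, 0, 1]))
```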
Thanks a lot for any help. If anybody needs more info to investigate the issue, please let me know and I will do my best to provide it.
My source code for the PyTorch deep-learning model:
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
import numpy as np
import pandas as pd
class CustomDataset(Dataset):
    'Characterizes a dataset for PyTorch'
    def __init__(self, csv_file):
        """
        Args:
            csv_file (string): Path to the csv file.
        """
        raw_data = pd.read_csv(csv_file)
        raw_data = torch.tensor(raw_data.to_numpy())
        x_size = list(raw_data.size())[1]
        self.data_tensor = raw_data[:, :x_size - 1].clone()
        self.data_tensor = self.data_tensor.type(torch.float32)
        self.label_tensor = raw_data[:, x_size - 1:].clone()
        self.label_tensor = self.label_tensor.flatten()
        # nn.CrossEntropyLoss expects integer class indices as targets
        self.label_tensor = self.label_tensor.type(torch.long)
        print(f"self.label_tensor.size() = {self.label_tensor.size()}")

    def __len__(self):
        'Denotes the total number of samples'
        return len(self.data_tensor)

    def __getitem__(self, index):
        data, label = self.data_tensor[index], self.label_tensor[index]
        return data, label
# Raw strings so the backslashes in Windows paths are not treated as escapes
training_data_csv_file = r"D:\Tools\PyTorch\Deep-Learning-Model\input_data\training_data.csv"
training_data = CustomDataset(training_data_csv_file)
test_data_csv_file = r"D:\Tools\PyTorch\Deep-Learning-Model\input_data\testing_data.csv"
test_data = CustomDataset(test_data_csv_file)
train_dataloader = DataLoader(training_data, batch_size=10, shuffle=True)  # shuffle training samples each epoch
test_dataloader = DataLoader(test_data, batch_size=10)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(7, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 19),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork()
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    print(f'size = {size}')
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        print(f'pred = {pred}')
        print(f'y = {y}')
        loss = loss_fn(pred, y)
        print(f'loss = {loss}')
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
learning_rate = 0.01
batch_size = 10
def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
epochs = 1000
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")