Hi, friends,
I am now exploring another PyTorch deep-learning model, this time on patients’ medical-appointment booking behaviour. The goal is to predict on which weekday (Monday to Friday) a patient will book a medical appointment, with labels ranging from 0 (Monday) to 4 (Friday).
This is not a very complex use case. However, the accuracy of my model is quite low: even after 1,000 epochs, the accuracy is only 25.3% and the average loss is 1.608337.
The low accuracy is not the whole problem, though. Worse, after debugging the model I found that its outputs all converge to label 4 (Friday). It is understandable that label 4 has the highest share (is this due to a “relaxing Friday” many people appreciate?) of both the training data (26.2%) and the testing data (25.3%). Since the model predicts 4 for every sample, its accuracy is exactly the 25.3% share of label 4 in the test set. Still, all predictions collapsing to a single label is quite abnormal.
This collapse to a single predicted label is shown in the output below:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
testing-->pred = tensor([[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038]])
testing-->y = tensor([0, 2, 0, 2, 2, 2, 4, 1, 4, 0])
testing-->loss = 1.7076622247695923
testing-->pred.argmax(1) = tensor([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
testing-->(pred.argmax(1) == y) = tensor([False, False, False, False, False, False, True, False, True, False])
testing-->corr = 2.0
testing-->pred = tensor([[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038]])
testing-->y = tensor([0, 4, 4, 4, 1, 1, 4, 2, 1, 4])
testing-->loss = 1.4987714290618896
testing-->pred.argmax(1) = tensor([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
testing-->(pred.argmax(1) == y) = tensor([False, True, True, True, False, False, True, False, False, True])
testing-->corr = 5.0
testing-->pred = tensor([[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038],
[-0.4787, 0.1151, 0.0034, 0.0423, 0.3038]])
testing-->y = tensor([2, 1, 2, 4, 3, 4, 0, 4, 4])
testing-->loss = 1.5375866889953613
testing-->pred.argmax(1) = tensor([4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
testing-->(pred.argmax(1) == y) = tensor([False, False, False, True, False, True, False, True, True])
testing-->corr = 4.0
Test Error:
Accuracy: 25.3%, Avg loss: 1.608337
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In my model, each input data tensor is a vector composed of the following 7 fields:
- Gender: 1 – male, 2 – female
- Age
- Area: I take the first 3 digits of a patient’s residential postal code and map them to an integer from 0 to 999.
- Medical Examination: 1 – yes, 0 – no
- Blood Test: 1 – yes, 0 – no
- Urine Test: 1 – yes, 0 – no
- Fasting: 1 – yes, 0 – no
The labels are from 0 (Monday) to 4 (Friday), 5 categories in total.
In total I have prepared 397 training samples and 99 testing samples for the model.
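To make the input format concrete, here is a minimal sketch of how one sample is encoded (the values in `row` are made up for illustration; my real CSV rows have the same layout, with the label in the last column):
import torch
# Hypothetical CSV row (made-up values): gender=2 (female), age=35,
# area=123 (first 3 digits of a postal code such as "123456"),
# medical examination=1, blood test=1, urine test=0, fasting=0,
# and label=4 (Friday) in the last column.
row = [2, 35, 123, 1, 1, 0, 0, 4]
data = torch.tensor(row[:-1], dtype=torch.float32)  # the 7 input features
label = torch.tensor(row[-1], dtype=torch.long)     # class index 0-4
print(data)   # tensor([  2.,  35., 123.,   1.,   1.,   0.,   0.])
print(label)  # tensor(4)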
My model follows the Fashion-MNIST example from the official pytorch.org tutorials (see the hyperlink Optimizing Model Parameters — PyTorch Tutorials 1.12.0+cu102 documentation). I use the following hyperparameters: the hidden-layer size of the neural network is 512, the learning rate is 0.01, and the batch size is 10. (The full Python source code is attached at the end of this post for your reference.)
Can anybody help me analyse and diagnose why all predicted values converge to a single label? Is this due to overfitting, or is the learning rate still too high?
Thanks for the help. If you need more info to investigate the issue, please just let me know.
The source code of my PyTorch deep-learning model:
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor, Lambda
import numpy as np
import pandas as pd
class CustomDataset(Dataset):
    'Characterizes a dataset for PyTorch'

    def __init__(self, csv_file):
        """
        Args:
            csv_file (string): Path to the csv file.
        """
        raw_data = pd.read_csv(csv_file)
        raw_data = torch.tensor(raw_data.to_numpy())
        x_size = list(raw_data.size())[1]
        # All columns except the last one are the input features.
        self.data_tensor = raw_data[:, :x_size-1].clone()
        self.data_tensor = self.data_tensor.type(torch.float32)
        # The last column holds the labels; cast to long because
        # nn.CrossEntropyLoss expects integer class indices.
        self.label_tensor = raw_data[:, x_size-1:].clone()
        self.label_tensor = self.label_tensor.flatten().type(torch.long)
        print(f"self.label_tensor.size() = {self.label_tensor.size()}")

    def __len__(self):
        'Denotes the total number of samples'
        return len(self.data_tensor)

    def __getitem__(self, index):
        data, label = self.data_tensor[index], self.label_tensor[index]
        return data, label
# Raw strings so the backslashes in the Windows paths are not read as
# escape sequences (e.g. \t in "\training_data.csv").
training_data_csv_file = r"D:\Tools\PyTorch\Medex-Deep-Learning-Model\input_data\training_data.csv"
training_data = CustomDataset(training_data_csv_file)
test_data_csv_file = r"D:\Tools\PyTorch\Medex-Deep-Learning-Model\input_data\testing_data.csv"
test_data = CustomDataset(test_data_csv_file)
train_dataloader = DataLoader(training_data, batch_size=10)
test_dataloader = DataLoader(test_data, batch_size=10)
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(7, 512),    # 7 input features
            nn.ReLU(),
            nn.Linear(512, 512),  # hidden layer of size 512
            nn.ReLU(),
            nn.Linear(512, 5),    # 5 output classes (Monday to Friday)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
model = NeuralNetwork()
def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    print(f'size = {size}')
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        print(f'pred = {pred}')
        print(f'y = {y}')
        loss = loss_fn(pred, y)
        print(f'loss = {loss}')
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")
learning_rate = 0.01
batch_size = 10
def test_loop(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            print(f'testing-->pred = {pred}')
            print(f'testing-->y = {y}')
            loss = loss_fn(pred, y).item()
            print(f'testing-->loss = {loss}')
            test_loss += loss
            print(f'testing-->pred.argmax(1) = {pred.argmax(1)}')
            print(f'testing-->(pred.argmax(1) == y) = {(pred.argmax(1) == y)}')
            corr = (pred.argmax(1) == y).type(torch.float).sum().item()
            print(f'testing-->corr = {corr}')
            correct += corr
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
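The snippet above stops before the driver loop; the missing part follows the tutorial’s pattern (a minimal sketch, assuming the 1,000 epochs mentioned earlier):
epochs = 1000
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
print("Done!")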