Hello, I am trying to perform binary classification of voice pathologies (healthy vs. pathology). I use voice samples of the vowel /a/ at normal pitch from the SVD database.
Below is my PyTorch code:
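import os

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms

### Define the transforms (ImageNet mean/std normalization)
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}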
### Define the data directory
data_dir = 'D:\\SVD_jan_19\\a_normal_pitch_data\\1_Model_Input'
### Create data loaders
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=32, shuffle=True) for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
print(dataset_sizes)
class_names = image_datasets['train'].classes
class_names
### Load the pre-trained model
model = models.vgg16(pretrained=True)
### Retrain all the layers
for name, param in model.named_parameters():
    param.requires_grad = True
### Modify the final fully connected layer for binary classification
num_features = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_features, 2)
### Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
### Move the model to the GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
### Training loop
num_epochs = 10
for epoch in range(num_epochs):
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()
        else:
            model.eval()
        running_loss = 0.0
        running_corrects = 0
        for inputs, labels in dataloaders[phase]:
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            with torch.set_grad_enabled(phase == 'train'):
                outputs = model(inputs)
                #print(outputs.shape)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)
                if phase == 'train':
                    loss.backward()
                    optimizer.step()
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels.data)
        epoch_loss = running_loss / dataset_sizes[phase]
        epoch_acc = running_corrects.double() / dataset_sizes[phase]
        print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')
print("Training complete!")
I am just running a standard PyTorch example. My dataset is balanced (250 healthy samples + 250 pathology samples). I use mel spectrogram images as input, resized to 224 x 224.
Output:
{'train': 400, 'val': 100}
train Loss: 0.6902 Acc: 0.5875
val Loss: 0.6375 Acc: 0.6300
train Loss: 0.5373 Acc: 0.7425
val Loss: 0.8208 Acc: 0.6200
train Loss: 0.5475 Acc: 0.7375
val Loss: 0.6349 Acc: 0.6900
train Loss: 0.5105 Acc: 0.7550
val Loss: 0.6297 Acc: 0.6600
train Loss: 0.4729 Acc: 0.7750
val Loss: 0.7141 Acc: 0.5800
train Loss: 0.4320 Acc: 0.8050
val Loss: 0.7001 Acc: 0.7000
train Loss: 0.4262 Acc: 0.8275
val Loss: 0.6138 Acc: 0.6800
train Loss: 0.3604 Acc: 0.8350
val Loss: 0.6550 Acc: 0.7100
train Loss: 0.3314 Acc: 0.8600
val Loss: 0.6402 Acc: 0.7000
train Loss: 0.2681 Acc: 0.9075
val Loss: 0.7375 Acc: 0.7100
Training complete!
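To summarize the log above in numbers, here is a minimal plain-Python sketch. The values are copied by hand from the output, and the helper name `summarize` is only illustrative. It computes the train/validation accuracy gap per epoch and finds the epoch with the lowest validation loss:

```python
# Per-epoch values copied by hand from the training log above (epochs 1..10).
train_loss = [0.6902, 0.5373, 0.5475, 0.5105, 0.4729,
              0.4320, 0.4262, 0.3604, 0.3314, 0.2681]
val_loss = [0.6375, 0.8208, 0.6349, 0.6297, 0.7141,
            0.7001, 0.6138, 0.6550, 0.6402, 0.7375]
train_acc = [0.5875, 0.7425, 0.7375, 0.7550, 0.7750,
             0.8050, 0.8275, 0.8350, 0.8600, 0.9075]
val_acc = [0.6300, 0.6200, 0.6900, 0.6600, 0.5800,
           0.7000, 0.6800, 0.7100, 0.7000, 0.7100]


def summarize(val_loss, train_acc, val_acc):
    """Return (per-epoch accuracy gaps, 1-based epoch with the lowest val loss)."""
    # Positive gap means the model does better on training data than on validation data.
    acc_gap = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]
    # Index of the smallest validation loss, converted to a 1-based epoch number.
    best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i]) + 1
    return acc_gap, best_epoch


acc_gap, best_epoch = summarize(val_loss, train_acc, val_acc)
for epoch, gap in enumerate(acc_gap, start=1):
    print(f"epoch {epoch:2d}: train-val accuracy gap = {gap:+.4f}")
print(f"lowest validation loss at epoch {best_epoch}")
```

The training loss falls steadily while the validation loss stays roughly in the 0.61 to 0.82 range, and the accuracy gap widens toward the last epoch. Note also that with only 100 validation images, a single image changes the validation accuracy by 0.01, so the epoch-to-epoch swings are fairly noisy.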
Observation: training seems to go well, but validation does not. Training accuracy climbs to about 0.91, while validation accuracy stays around 0.6 to 0.7 and validation loss does not decrease, so I think it is an overfitting problem. I face the same problem with every pretrained model I use; I tried ResNet and others too. In the research literature, authors seem to succeed at this classification with pretrained models, reporting more than 90% accuracy. Any recommendations from the community?