Overfitting problem during voice pathology classification

Hello, I am trying to perform binary classification of voice pathologies (healthy vs. pathology). I use voice samples of /a/ at normal pitch from the SVD database.
Below is my PyTorch code:

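import os

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms

data_transforms = {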
    'train': transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

### Define the data directory
data_dir = 'D:\\SVD_jan_19\\a_normal_pitch_data\\1_Model_Input'

### Create data loaders
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}

dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=32, shuffle=True) for x in ['train', 'val']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}
print(dataset_sizes)

class_names = image_datasets['train'].classes
print(class_names)

### Load the pre-trained  model
model = models.vgg16(pretrained=True)

### Retrain all the layers
for name, param in model.named_parameters():
    param.requires_grad = True

### Modify the final fully connected layer for binary classification
num_features = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_features, 2)

### Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9) 

### Move the model to the GPU if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

### Training loop
num_epochs = 10
for epoch in range(num_epochs):
    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()
        else:
            model.eval()

        running_loss = 0.0
        running_corrects = 0

        for inputs, labels in dataloaders[phase]:
            inputs = inputs.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()

            with torch.set_grad_enabled(phase == 'train'):
                outputs = model(inputs)
                #print(outputs.shape)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)

                if phase == 'train':
                    loss.backward()
                    optimizer.step()

            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels)

        epoch_loss = running_loss / dataset_sizes[phase]
        epoch_acc = running_corrects.double() / dataset_sizes[phase]

        print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

print("Training complete!")

I am just running a PyTorch example. My dataset is balanced (250 healthy samples + 250 pathology samples). I use mel spectrogram images as input, resized to 224 x 224.
Output:

{'train': 400, 'val': 100}
train Loss: 0.6902 Acc: 0.5875
val Loss: 0.6375 Acc: 0.6300
train Loss: 0.5373 Acc: 0.7425
val Loss: 0.8208 Acc: 0.6200
train Loss: 0.5475 Acc: 0.7375
val Loss: 0.6349 Acc: 0.6900
train Loss: 0.5105 Acc: 0.7550
val Loss: 0.6297 Acc: 0.6600
train Loss: 0.4729 Acc: 0.7750
val Loss: 0.7141 Acc: 0.5800
train Loss: 0.4320 Acc: 0.8050
val Loss: 0.7001 Acc: 0.7000
train Loss: 0.4262 Acc: 0.8275
val Loss: 0.6138 Acc: 0.6800
train Loss: 0.3604 Acc: 0.8350
val Loss: 0.6550 Acc: 0.7100
train Loss: 0.3314 Acc: 0.8600
val Loss: 0.6402 Acc: 0.7000
train Loss: 0.2681 Acc: 0.9075
val Loss: 0.7375 Acc: 0.7100
Training complete!
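
To put numbers on the gap, here is a minimal sketch (separate from the training script) that parses the log above and reports the train/val accuracy gap per epoch, the epoch with the lowest validation loss, and a rough standard error for the final validation accuracy. The helper names `parse_log` and `accuracy_std_error` are made up for illustration, and the standard error assumes a simple binomial approximation over the 100 validation samples.

```python
import math
import re

# Log copied verbatim from the run above (400 train / 100 val samples).
LOG = """\
train Loss: 0.6902 Acc: 0.5875
val Loss: 0.6375 Acc: 0.6300
train Loss: 0.5373 Acc: 0.7425
val Loss: 0.8208 Acc: 0.6200
train Loss: 0.5475 Acc: 0.7375
val Loss: 0.6349 Acc: 0.6900
train Loss: 0.5105 Acc: 0.7550
val Loss: 0.6297 Acc: 0.6600
train Loss: 0.4729 Acc: 0.7750
val Loss: 0.7141 Acc: 0.5800
train Loss: 0.4320 Acc: 0.8050
val Loss: 0.7001 Acc: 0.7000
train Loss: 0.4262 Acc: 0.8275
val Loss: 0.6138 Acc: 0.6800
train Loss: 0.3604 Acc: 0.8350
val Loss: 0.6550 Acc: 0.7100
train Loss: 0.3314 Acc: 0.8600
val Loss: 0.6402 Acc: 0.7000
train Loss: 0.2681 Acc: 0.9075
val Loss: 0.7375 Acc: 0.7100
"""

LINE_RE = re.compile(r"^(train|val) Loss: ([0-9.]+) Acc: ([0-9.]+)$")


def parse_log(text):
    """Return {'train': [(loss, acc), ...], 'val': [(loss, acc), ...]}."""
    history = {"train": [], "val": []}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            phase, loss, acc = match.groups()
            history[phase].append((float(loss), float(acc)))
    return history


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy measured on n samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


history = parse_log(LOG)

# Accuracy gap (train minus val) for each epoch.
gaps = [t_acc - v_acc
        for (_, t_acc), (_, v_acc) in zip(history["train"], history["val"])]

# Epoch (1-based) with the lowest validation loss.
val_losses = [loss for loss, _ in history["val"]]
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__) + 1

# Uncertainty of the final validation accuracy with 100 validation samples.
final_val_acc = history["val"][-1][1]
std_err = accuracy_std_error(final_val_acc, n=100)

for epoch, gap in enumerate(gaps, start=1):
    print(f"epoch {epoch:2d}: train-val accuracy gap = {gap:+.4f}")
print(f"lowest val loss at epoch {best_epoch}: {val_losses[best_epoch - 1]:.4f}")
print(f"final val acc = {final_val_acc:.4f} +/- {std_err:.4f} (1 std. error)")
```

On this log the lowest validation loss is at epoch 7 (0.6138), and the final accuracy gap is about 0.20. With only 100 validation samples, one standard error on the final validation accuracy is roughly 0.045, so some of the epoch-to-epoch swings in validation accuracy may be noise rather than a real change in the model.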

Observation: training seems to go well, with training accuracy climbing to about 0.91, but validation does not follow. Validation accuracy stays between 0.58 and 0.71, and validation loss fluctuates between 0.61 and 0.82 without a clear downward trend. I think this is an overfitting problem. I face the same problem with every pretrained model I have tried, including ResNet and others. In the research literature, authors seem to succeed at this classification with pretrained models, reporting more than 90% accuracy. Any recommendations from the community?