Wanting to make a simple age estimator

I am planning to build an age detector with 10 classes, each covering a range of ages: 2-6 years old, 7-12 years old, and so on.
I use a pretrained ResNet18 model. During training I did not freeze any layers; I let all the parameters update. The loss function is cross-entropy, with the Adam optimizer and a learning-rate scheduler. The dataset contains 10000 images for training and about 3000 images for validation.
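
To make the labelling concrete, here is a minimal sketch of how an age could be mapped to a class index. `age_to_class` is a hypothetical helper, not part of my training code, and it only uses the two ranges stated above (2-6 and 7-12); the remaining eight upper bounds would be appended to the list in the same way.

```python
from bisect import bisect_left


def age_to_class(age, lowest_age, upper_bounds):
    """Return the index of the age range that contains `age`.

    `upper_bounds` holds the inclusive upper bound of each class in
    ascending order, e.g. [6, 12] for the ranges 2-6 and 7-12.
    """
    if age < lowest_age or age > upper_bounds[-1]:
        raise ValueError(f"age {age} is outside the labelled ranges")
    # bisect_left finds the first upper bound that is >= age
    return bisect_left(upper_bounds, age)


# Only the two ranges named in the post; the others are not listed here.
known_upper_bounds = [6, 12]
first_class = age_to_class(5, 2, known_upper_bounds)   # falls in 2-6
second_class = age_to_class(7, 2, known_upper_bounds)  # falls in 7-12
```

The resulting integer is the kind of target `nn.CrossEntropyLoss` expects for each image.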

The problem that keeps me stuck is that although the training loss keeps decreasing, the validation loss does not. No matter how I change the hyperparameters, the validation loss drops to a minimum of about 33 and then bounces back to 60+, which is its value at the start of training. The validation accuracy peaks at only 46%.
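
A note on these numbers: the training loop below adds up `loss.item()` once per batch and records the sum, so values such as 33 and 60+ are epoch totals of batch-mean losses, not per-sample averages. Because the training and validation loaders have different numbers of batches, the two curves are on different scales. A minimal sketch of a per-sample average that would make them comparable, assuming each batch loss is already a mean over its batch (the default reduction of `nn.CrossEntropyLoss`); `mean_loss_per_sample` is a hypothetical helper, not part of the code in this post:

```python
def mean_loss_per_sample(batch_losses, batch_sizes):
    """Combine per-batch mean losses into a single per-sample mean.

    Each batch loss is weighted by its batch size so that a smaller
    final batch does not count as much as a full one.
    """
    if len(batch_losses) != len(batch_sizes):
        raise ValueError("need exactly one batch size per batch loss")
    weighted_sum = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return weighted_sum / sum(batch_sizes)


# Two batches: 32 samples with mean loss 2.0, then 16 samples with mean loss 1.0.
epoch_loss = mean_loss_per_sample([2.0, 1.0], [32, 16])
```

In the loop this would mean collecting `loss.item()` and `labels.size(0)` for every batch and passing both lists to the helper at the end of each phase.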

This is my code. Please have a look at what might be causing this problem. Or is there a problem with the dataset, such as not having enough images for training?

    if train_mode:
        train_loader, test_loader = data_load()
        for iteration in range(iteration_start, epoch):
            time_start = time.time()
            for current_mode in ['train', 'valid']:
                total_loss = 0
                total_accuracy = 0
                if current_mode == 'train':
                    loader = train_loader
                    model.train()
                else:
                    loader = test_loader
                    model.eval()

                for batch in loader:
                    images, labels = batch
                    images = images.to(device)
                    labels = labels.to(device)

                    with torch.set_grad_enabled(current_mode == 'train'):
                        output = model(images)
                        loss = loss_function(output, labels)
                        total_loss += loss.item()
                        if current_mode == 'train':
                            optimizer.zero_grad()
                            loss.backward()
                            optimizer.step()
                        else:
                            accuracy = calculate_accuracy(output, labels)
                            total_accuracy += accuracy

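                # total_loss is the sum of the per-batch losses over the whole epoch
                record_loss(total_loss, current_mode, iteration)
                if current_mode == 'valid':
                    # ReduceLROnPlateau defaults to mode='min', so step on the validation loss;
                    # stepping on accuracy instead would need mode='max'
                    scheduler.step(total_loss)
                    record_accuracy(total_accuracy / len(loader), iteration)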

This is my model:

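    import torch.nn as nn
    from torchvision import models

    def custom_model():
        # Start from ImageNet-pretrained weights; all layers stay trainable
        my_model = models.resnet18(pretrained=True)
        # for paras in my_model.parameters():
        #     paras.requires_grad = False

        # Replace the final fully connected layer with a new classification head
        num_fc_layer = my_model.fc.in_features
        custom_fc_layers = nn.Sequential(
            nn.BatchNorm1d(num_fc_layer),
            nn.Dropout(0.5),
            nn.Linear(num_fc_layer, num_class)
        )
        my_model.fc = custom_fc_layers

        return my_model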

And this is how I set up the optimizer and scheduler:

    if __name__ == '__main__':
        model = custom_model()
        model.to(device)
        optimizer = optim.Adam(model.parameters(), lr=learning_rate)
        scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=1, verbose=True)
        loss_function = nn.CrossEntropyLoss()
        main()