RMSprop optimiser doesn't work with BCE loss

When I run this code with the RMSprop optimiser, the model doesn't learn anything, but if I use Adam it works. How can I make it work, and what could be the cause? I would really appreciate some help.

I think it has to do with the model, because when I use transfer learning the model works properly, but when I use my own model it doesn't. I couldn't figure out what exactly causes the problem, though.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
from torchvision.datasets import ImageFolder

def get_dataset(train_path, val_path, BATCHSIZE):
    transform_train = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(20),
        transforms.RandomAffine(degrees=0, translate=(0.2, 0.2), scale=(0.8, 1.2), shear=0.2),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    transform_valid = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])

    train_dataset = ImageFolder(train_path, transform=transform_train)
    val_dataset = ImageFolder(val_path, transform=transform_valid)

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCHSIZE, shuffle=True)
    valid_loader = torch.utils.data.DataLoader(val_dataset, batch_size=BATCHSIZE, shuffle=True)

    return train_loader, valid_loader


def build_model():
    model = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Conv2d(64, 128, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
        nn.Flatten(),
        nn.Linear(128 * 28 * 28, 1),
        nn.Sigmoid()
    )

    return model


def train_model(train_loader, valid_loader, num_epochs=10, learning_rate=0.001):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = build_model().to(device)
    criterion = nn.BCELoss()
    optimizer = optim.RMSprop(model.parameters(), lr=learning_rate)

    for epoch in range(num_epochs):
        model.train()
        train_loss = 0.0

        for images, labels in train_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item() * images.size(0)

        train_loss /= len(train_loader.dataset)

        print(f"Epoch [{epoch+1}/{num_epochs}], Train Loss: {train_loss:.4f}")

    model.eval()
    valid_loss = 0.0
    correct = 0

    with torch.no_grad():
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.float().unsqueeze(1).to(device)

            outputs = model(images)
            loss = criterion(outputs, labels)
            valid_loss += loss.item() * images.size(0)
            predicted = (outputs >= 0.5).float()
            correct += (predicted == labels).sum().item()

    valid_loss /= len(valid_loader.dataset)
    accuracy = correct / len(valid_loader.dataset)
    print(f"Valid Loss: {valid_loss:.4f}, Valid Accuracy: {accuracy:.4f}")

train_loader, valid_loader = get_dataset(train_path, val_path, 32)
train_model(train_loader, valid_loader, num_epochs=10, learning_rate=0.001)

Could you describe the difference between this post and this topic, or is it basically a double post with some rephrasing?

The problem is the same, but I have rephrased it and narrowed it down. The code is also different and smaller. I will delete my previous post if that is not allowed; both describe the same problem. I thought my previous post might be too long, so this post might be a better fit for this forum.

Update: I have deleted my previous post. The problem as described here is the same problem. To avoid confusion, I have updated it. Sorry for my double post.

Can someone please help me with this problem?

Update: I found that this code works when using BCEWithLogitsLoss instead of BCELoss. I don't know why, but at least it works.
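For reference, a minimal sketch of that change (my own illustration, not the original model: the tiny linear model below is a stand-in for the conv stack in build_model). The idea is to drop the final nn.Sigmoid() so the model outputs raw logits, train with nn.BCEWithLogitsLoss, and apply torch.sigmoid only when probabilities are needed. BCEWithLogitsLoss fuses the sigmoid and the BCE term into one numerically stable operation, which may be why it behaves better here:

```python
import torch
import torch.nn as nn

# Stand-in model for illustration: same idea as build_model() above,
# but WITHOUT a trailing nn.Sigmoid() -- it must output raw logits.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(4, 1),
)

criterion = nn.BCEWithLogitsLoss()  # sigmoid + BCE in one stable op

images = torch.randn(8, 4)                      # dummy batch
labels = torch.randint(0, 2, (8, 1)).float()    # binary targets as floats

logits = model(images)            # raw scores; no sigmoid in the model
loss = criterion(logits, labels)  # expects logits, not probabilities
loss.backward()

# At evaluation time, convert logits to probabilities explicitly:
probs = torch.sigmoid(logits.detach())
predicted = (probs >= 0.5).float()
```

The rest of the training loop stays the same; only the loss function and the model's last layer change.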