Model training failing

I’m trying to create a “simple” model that takes a standard (3-channel) 36x36 image and outputs a probability that the image contains a certain object of interest.

I found a simple model (I don’t remember where, so I can’t credit it) and adjusted the parameters to fit our needs. Since we’re planning on using C++, I trace the model in Python and load the traced module from C++ (a rough sketch of the loading side follows the model definition below).

This is my PyTorch model (Python):

import torch


class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(3, 32, kernel_size=3, stride=1)
        self.conv2 = torch.nn.Conv2d(32, 64, kernel_size=3, stride=1)
        self.pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        # spatial size: 36 -> conv(3x3) -> 34 -> pool -> 17 -> conv(3x3) -> 15 -> pool -> 7
        self.fc1 = torch.nn.Linear(64 * 7 * 7, 128)
        self.fc2 = torch.nn.Linear(128, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = torch.relu(self.fc1(x))
        x = self.sigmoid(self.fc2(x))
        return x.squeeze(1)
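
On the C++ side, the traced module is loaded roughly like this (a sketch; “model.pt” is a placeholder file name):

// Requires <torch/script.h>.
// Load the module that was traced and saved on the Python side
// (torch.jit.trace(...) followed by traced.save("model.pt")).
auto model = torch::jit::load("model.pt");

// Sanity check: a dummy batch of one 3-channel 36x36 image should yield a single probability.
const auto dummyInput = torch::rand({ 1, 3, 36, 36 });
const auto probability = model.forward({ dummyInput }).toTensor();
std::clog << "Probability: " << probability.item<float>() << '\n';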

Since the model works on a “non-standard PyTorch” dataset, I had to write a custom dataset class. I’ll only include the overloaded get function, since it’s really the only “moving part”; the rest of the class is boilerplate, roughly along the lines of the sketch below.
The dataset expects a “main” directory containing two sub-directories, “Positive” and “Negative”, each holding the corresponding images.
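
A rough skeleton of the surrounding class (simplified, with placeholder member names):

class BacteriaDataset : public torch::data::Dataset<BacteriaDataset>
{
public:
    explicit BacteriaDataset(std::filesystem::path mainDirectory)
        : _bacteriaImageDirectory(std::move(mainDirectory))
    {
    }

    // The overridden get function is shown below.

    torch::optional<std::size_t> size() const override
    {
        // Total number of images across "Positive" and "Negative"
        return _imageCount;
    }

private:
    const std::string positiveDirectoryName = "Positive";
    const std::string negativeDirectoryName = "Negative";

    std::filesystem::path _bacteriaImageDirectory;
    std::size_t _imageCount{}; // total image count, e.g. 2160 in my current runs
};

And the overloaded get function itself: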

torch::data::Example<> get(const std::size_t index)
{
    constexpr auto labels = 2;

    const auto indexAdjusted = index / labels;
    const auto labelIndex = indexAdjusted % labels;

    const std::string_view labelName = labelIndex == 0 ?
        negativeDirectoryName : positiveDirectoryName;

    // `_bacteriaImageDirectory` is an `std::filesystem::path` to the "Main" directory
    const auto directory = _bacteriaImageDirectory / labelName;

    auto directoryIterator = std::filesystem::directory_iterator(directory);
    std::advance(directoryIterator, indexAdjusted);

    auto image = cv::imread(directoryIterator->path().string());
    cv::cvtColor(image, image, cv::ColorConversionCodes::COLOR_BGR2RGB);

    auto tensorImage = torch::from_blob(image.data, { 3, image.cols, image.rows }, torch::kByte)
        .to(torch::kFloat32) / 255.0f;

    const auto tensorLabel = torch::tensor(static_cast<float>(labelIndex), torch::kFloat32);

    return { tensorImage.clone(), tensorLabel };
}
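
The data loader used in the training loop is built from this dataset roughly like so (a sketch; mainDirectory is a placeholder, and the batch size of 1 matches the per-sample .item<float>() calls below):

// Stack<> collates the individual Example<>s into batched tensors.
auto dataset = BacteriaDataset{ mainDirectory }
    .map(torch::data::transforms::Stack<>());

// Batch size 1; the default sampler shuffles using the dataset's size().
auto dataLoader = torch::data::make_data_loader(
    std::move(dataset),
    torch::data::DataLoaderOptions().batch_size(1));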

This is my training loop; currently I run the training on 2160 images:

// Since `model` is a loaded "jit" module, I have to manually convert the module parameters 
// to a valid type that can be accepted by `torch::optim::Adam`
std::vector<torch::Tensor> modelParameters;
modelParameters.reserve(model.parameters().size());

std::transform(model.parameters().begin(), model.parameters().end(),
               std::back_insert_iterator(modelParameters),
               [](const auto& p)
               {
                   return p;
               });

auto optimizer = torch::optim::Adam(modelParameters);

for(const auto& [image, label] : *dataLoader)
{
    optimizer.zero_grad();
    
    const auto output = model.forward({image}).toTensor();
    
    const auto prediction = output.item<float>();
    
    auto loss = torch::nn::functional::binary_cross_entropy(output, label);
    
    loss.backward();
    optimizer.step();
    
    
    std::clog
        << "Loss: " << loss.item<double>()
        << ", Prediction: " << prediction
        << ", Label: " << static_cast<std::uint32_t>(label.item<float>())
        << '\n';
}

Attached are the plotted results of the training. I don’t have much experience working with neural networks, but I don’t think training is supposed to look like that.

What am I missing? I’ve been stuck here for a while; any help is very appreciated.

I have somewhat of an update.

I changed torch::nn::functional::binary_cross_entropy to torch::nn::BCELoss.
I now train for 5 epochs. Before training, I set the model to training mode with model.train().

My training loop now looks like so:

auto criterion = torch::nn::BCELoss();
model.train();

auto optimizer = torch::optim::Adam(modelParameters, torch::optim::AdamOptions{});

for (std::size_t i = 0; i < 5; ++i) 
{
    std::clog << "Epoch " << i + 1 << '/' << 5 << '\n';
    
    for(const auto& [image, label] : *dataLoader)
    {
        optimizer.zero_grad();
        
        const auto output = model.forward({image}).toTensor();
        
        const auto prediction = output.item<float>();
        
        auto loss = criterion->forward(output, label);
        
        loss.backward();
        optimizer.step();
        
        std::clog
            << "Loss: " << loss.item<double>()
            << ", Prediction: " << prediction
            << ", Label: " << static_cast<std::uint32_t>(label.item<float>())
            << '\n';
    }
}

It’s mostly the same, other than the few changes stated above.
This has given me some improvement in the results.

Predictions for positive images hover around 70%-75%, while predictions for negative images are typically below 10%.
(The plotted loss graph still looks horrendous.)

My question is: is this right? Although I see improvements, I don’t exactly understand why.
Is it because I effectively “added” more training data by running more epochs?
Is the change from torch::nn::functional::binary_cross_entropy to torch::nn::BCELoss that drastic?
Or is model.train() that important?