My Discriminator model collapsed and always returns 1s

I built a Discriminator model similar to the one from the DCGAN tutorial, with the main differences that I use biases and drop the batch normalization.

When training starts, the model shows good loss optimization, but after several training batches the Discriminator suddenly collapses into producing all 1s and the loss jumps:

[epoch - 0, 78/3166]
Discriminator real mean: 0.995487093925, Discriminator fake mean: 0.000025399677
discriminatorRealError: 0.288825571537, discriminatorFakeError: 0.001625579316
[epoch - 0, 79/3166]
Discriminator real mean: 1.000000000000, Discriminator fake mean: 1.000000000000
discriminatorRealError: 0.000000000000, discriminatorFakeError: 64.000000000000

discriminatorRealError is the loss for real data, discriminatorFakeError is the loss for 'fake' data. (The fake error of exactly 64 is the batch size: the loss is a sum of absolute errors, so with all 64 outputs saturated at 1 it comes out to 64.)

After the model collapses, the final sigmoid layer always returns 1, which makes all gradients in the model zero, and training stops.
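This is roughly how I confirm it (a sketch that walks the parameters of discriminator.model from the code below, right after the backward() calls):

with torch.no_grad():
    # print per-parameter gradient norms; after the collapse every one of them is ~0
    for name, param in discriminator.model.named_parameters():
        if param.grad is not None:
            print(f'{name}: grad norm = {param.grad.norm().item():.3e}')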

Batch normalization layers help for a while but do not fix the problem, so I purposely dropped them to make the issue easier to reproduce.
I keep the biases because I'm experimenting and don't see a reason to drop them.
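(For reference, a block with batch normalization would look roughly like this; a sketch of a single block, following the DCGAN tutorial convention of bias=False on the convolution feeding the BatchNorm:)

import torch.nn as nn

# Sketch of one discriminator block in the BatchNorm variant
block = nn.Sequential(
    nn.Conv2d(24, 36, 5, stride=2, bias=False),  # bias is redundant before BatchNorm
    nn.BatchNorm2d(36),
    nn.LeakyReLU(),
)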

My code:

import torch
import torch.nn as nn
import torchvision
import numpy as np
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
import pathlib
from torchinfo import summary

imageWidth = 178
imageHeight = 218
batch_size = 64
path = "C:\\Users\\PC\\Downloads\\celeba"
netPath = pathlib.Path(__file__).parent.resolve()

class DiscriminatorNet:
    def __init__(self):
        self.model = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2, bias=True),
            nn.LeakyReLU(),
            nn.Conv2d(24, 24, 5, stride=2, bias=True),
            nn.LeakyReLU(),
            nn.Conv2d(24, 36, 5, stride=2, bias=True),
            nn.LeakyReLU(),
            nn.Conv2d(36, 68, 5, stride=3, bias=True),
            nn.LeakyReLU(),
            nn.Conv2d(68, 1, kernel_size=(7, 5), bias=True),  # collapses the 7x5 feature map to a single 1x1 output per image
            nn.Sigmoid()
        )
    
    def forward(self, input):
        return self.model(input)

if __name__ == '__main__':
    device = torch.device("cuda:0")

    discriminator = DiscriminatorNet()
    discriminator.model.to(device)

    # summary(discriminator.model, input_size=(batch_size, 3, imageHeight, imageWidth))
    # exit()

    discriminator.model.load_state_dict(torch.load(netPath / 'discriminatorBefore.pth', weights_only=True))
    print(f'loaded from {netPath}')

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    dataset = torchvision.datasets.ImageFolder(root=path, transform=transform)

    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                             shuffle=False, num_workers=2)

    train = True

    if train:
        discriminatorOpt = torch.optim.Adam(discriminator.model.parameters(), betas=(0.5, 0.999))

        for epoch in range(5):
            for i, batch in enumerate(dataloader):
                inputs = batch[0].to(device)

                # labels
                trueLabels = torch.full((inputs.size(0),), 1, dtype=torch.float32, device=device)
                fakeLabels = torch.full((inputs.size(0),), 0, dtype=torch.float32, device=device)

                discriminatorOpt.zero_grad()

                # calculating discriminator loss for the real data
                discriminatorRealsOutput = discriminator.model(inputs).view(-1)
                discriminatorRealError = torch.abs(discriminatorRealsOutput - trueLabels).sum()
                discriminatorRealError.backward()
                
                # calculating discriminator loss for the 'fake' data
                noise = torch.rand(inputs.size(0), 3, imageHeight, imageWidth, device=device)
                discriminatorFakeOutputs = discriminator.model(noise).view(-1)
                discriminatorFakeError = torch.abs(discriminatorFakeOutputs - fakeLabels).sum()
                discriminatorFakeError.backward()

                #gradD = discriminator.model[-2].weight.grad
                discriminatorOpt.step()

                print(f'[epoch - {epoch}, {i}/{len(dataloader)}]')
                print(f'Discriminator real mean: {discriminatorRealsOutput.view(-1).mean():.12f}, Discriminator fake mean: {discriminatorFakeOutputs.view(-1).mean():.12f}')
                print(f'discriminatorRealError: {discriminatorRealError:.12f}, discriminatorFakeError: {discriminatorFakeError:.12f}')
            

This behavior doesn't reproduce consistently; it depends on the weights and biases PyTorch generated when initializing the Discriminator. This is what the discriminator model looks like in my example before and after the training.
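To reproduce a particular run without saving discriminatorBefore.pth, I could also pin the seeds before constructing the model (a sketch; the seed value is arbitrary):

import random
import numpy as np
import torch

seed = 0  # arbitrary
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)        # fixes the initial weights and biases
torch.cuda.manual_seed_all(seed)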

It would be nice to know how I can profile this, or to hear that I'm doing something fundamentally wrong. The model is trying to outsmart me, and I don't like that.
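For example, this is the kind of instrumentation I had in mind (a sketch; it hooks the last Conv2d, the layer just before the Sigmoid, and prints the raw logits every batch so the moment they saturate is visible):

# register after building the discriminator, before the training loop
pre_sigmoid = discriminator.model[-2]  # the final Conv2d, right before nn.Sigmoid

def log_logits(module, inputs, output):
    # raw logits; sigmoid of a large positive logit is ~1, which is what the collapse looks like
    print(f'pre-sigmoid logits: min={output.min().item():.3f}, '
          f'max={output.max().item():.3f}, mean={output.mean().item():.3f}')

hook_handle = pre_sigmoid.register_forward_hook(log_logits)
# ... training loop ...
# hook_handle.remove()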