Custom Sigmoid (the network is not trained)

I tried to write my own custom sigmoid layer, but it doesn’t work: the network does not train. If I replace the activation function with the standard torch.nn.Sigmoid(), it starts learning and reaches great accuracy. I can’t figure out what’s wrong with my custom sigmoid :frowning: Help me please, guys! Thanks

Sigmoid custom module:

class SigmoidCustom(torch.nn.Module):
    def __init__(self):
        super(SigmoidCustom, self).__init__()
    
    def forward(self, x):
        return 1. / (1 + torch.exp(-x))

Network:

class LinearNet(torch.nn.Module):
    def __init__(self, in_features, out_features, hid_neurons=200):
        super(LinearNet, self).__init__()
        self.lin1 = LinearCustom(in_features, hid_neurons, bias=True)
        self.act1 = SigmoidCustom()
        self.lin2 = LinearCustom(hid_neurons, hid_neurons, bias=True)
        self.act2 = SigmoidCustom()
        self.lin3 = LinearCustom(hid_neurons, out_features, bias=True)
        self.softmax = torch.nn.Softmax(dim=1)
    
    def forward(self, x):
        # x - bs x in_feat
        x = self.lin1(x) # bs x hid_neurons
        x = self.act1(x) # bs x hid_neurons
        x = self.lin2(x) # bs x hid_neurons
        x = self.act2(x) # bs x hid_neurons
        x = self.lin3(x) # bs x out_feat
        return x

    def inference(self, x):
        x = self.softmax(x)
        return x
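
LinearCustom is my own linear layer (not shown above). A minimal stand-in that just mirrors torch.nn.Linear (y = x @ weight.T + bias) could look something like this, in case you want to run the code:

class LinearCustom(torch.nn.Module):
    def __init__(self, in_features, out_features, bias=True):
        super(LinearCustom, self).__init__()
        # small random weights and zero bias, just for illustration
        self.weight = torch.nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features)) if bias else None

    def forward(self, x):
        # x: bs x in_features -> bs x out_features
        return torch.nn.functional.linear(x, self.weight, self.bias)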

Hi Yaroslav!

Your SigmoidCustom works for me:

>>> import torch
>>> torch.__version__
'1.9.0'
>>> _ = torch.manual_seed (2022)
>>> class SigmoidCustom(torch.nn.Module):
...     def __init__(self):
...         super(SigmoidCustom, self).__init__()
...
...     def forward(self, x):
...         return 1. / (1 + torch.exp(-x))
...
>>> s = torch.randn (5)
>>> t = s.clone()
>>> s.requires_grad = True
>>> t.requires_grad = True
>>> s.sigmoid().sum().backward()
>>> s.grad
tensor([0.2477, 0.2433, 0.2467, 0.2061, 0.2474])
>>> SigmoidCustom() (t).sum().backward()
>>> t.grad
tensor([0.2477, 0.2433, 0.2467, 0.2061, 0.2474])
>>> torch.allclose (s.grad, t.grad)
True

Best.

K. Frank

Hi Frank!
Thank you for your answer! Yes, I also checked that the output of my module matches the library version. But for some reason the network still does not train with my sigmoid module, and I can’t figure out why :frowning: Maybe torch.nn.Sigmoid handles gradients in some special way… idk (a small gradient check at extreme inputs is sketched after the training code below).

here is the whole network training code:

import numpy as np
import torch
import torchvision

train_data = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=None)
test_data = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=None)

X_train, y_train = train_data.data, train_data.targets
X_test, y_test = test_data.data, test_data.targets
X_train = X_train.float()
X_test = X_test.float()
X_train = X_train.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

in_feat = 28 * 28
out_feat = 10
net = LinearNet(in_feat, out_feat, hid_neurons=25)
net = net.to(device)

optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()

num_epoch = 500
batch_size = 100

history_loss = {'train': [], 'test': []}
history_acc = {'train': [], 'test': []}

def generate_batch(X, y, batch_size):
    idx = torch.randperm(len(X))[:batch_size]
    return X[idx], y[idx]

iter_per_epoch = int(np.ceil(len(X_train) / batch_size))

X_test, y_test = X_test.to(device), y_test.to(device)

for epoch in range(num_epoch):

    total_epoch_loss = 0.0
    total_epoch_acc = 0.0
    for iter in range(iter_per_epoch):
        X_batch, y_batch = generate_batch(X_train, y_train, batch_size)
        X_batch = X_batch.to(device)
        y_batch = y_batch.to(device)

        output = net(X_batch)
        pred = torch.argmax(output, dim=1)
        batch_acc = (pred == y_batch).float().mean()
        total_epoch_acc += batch_acc

        loss = criterion(output, y_batch)
        total_epoch_loss += loss

        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

    total_epoch_acc /= iter_per_epoch
    total_epoch_loss /= iter_per_epoch
    history_acc['train'].append(total_epoch_acc)
    history_loss['train'].append(total_epoch_loss)

    # validation
    with torch.no_grad():
        output = net(X_test)
        pred = torch.argmax(output, dim=1)
        acc = (pred == y_test).float().mean()
        loss = criterion(output, y_test)
        history_acc['test'].append(acc)
        history_loss['test'].append(loss)

        if epoch % 10 == 9: 
            print('Epoch {}, validation acc = {:.4f}'.format(epoch+1, acc))
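
A quick way to test that guess (just a hypothetical check, separate from the training code above): compare the gradients of the two sigmoids at a large negative input, where torch.exp(-x) overflows to inf in float32.

x1 = torch.tensor([-100.0], requires_grad=True)
x2 = torch.tensor([-100.0], requires_grad=True)

(1. / (1 + torch.exp(-x1))).sum().backward()  # custom formula: exp(100) overflows to inf
torch.sigmoid(x2).sum().backward()            # built-in: backward uses y * (1 - y)

print(x1.grad)  # may come out as nan (0 * inf inside autograd)
print(x2.grad)  # stays finite (essentially 0)

If the custom formula produces nan here while the built-in gradient stays finite, that difference could matter for unnormalized inputs like the raw MNIST pixels used above.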

Hi Yaroslav!

Here’s a little more information:

It turns out that the gradient for torch.sigmoid() is somewhat
worse than it could be. Your SigmoidCustom displays similarly
imperfect behavior, but is rather better than torch.sigmoid().

See this post:

Nonetheless, I think it is unlikely that this difference between your
SigmoidCustom and pytorch’s standard implementation explains
the issue you are seeing. Both versions of sigmoid() are basically
okay, and the imperfection in their gradients is more of an edge case.

I would only expect this to matter if your network were already
close to training unstably.
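
To make that edge case concrete, here is a small illustration (hypothetical numbers, float32): for a sufficiently large positive input, torch.sigmoid() rounds to exactly 1.0, so its backward formula y * (1 - y) returns a gradient of exactly 0, while the elementwise expression still propagates a tiny nonzero gradient.

a = torch.tensor([20.0], requires_grad=True)
b = torch.tensor([20.0], requires_grad=True)

torch.sigmoid(a).sum().backward()             # forward rounds to exactly 1.0
(1. / (1 + torch.exp(-b))).sum().backward()   # forward also rounds to 1.0

print(a.grad)   # exactly 0, since y * (1 - y) = 1 * 0
print(b.grad)   # small but nonzero (roughly exp(-20), about 2e-9)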

I would suggest trying something like this:

class LinearNet(torch.nn.Module):
    def __init__(self, in_features, out_features, hid_neurons=200, useCustom=True):
        super(LinearNet, self).__init__()
        if useCustom:
            sig = SigmoidCustom()
        else:
            sig = torch.nn.Sigmoid()
        self.lin1 = LinearCustom(in_features, hid_neurons, bias=True)
        self.act1 = sig
        self.lin2 = LinearCustom(hid_neurons, hid_neurons, bias=True)
        self.act2 = sig
        self.lin3 = LinearCustom(hid_neurons, out_features, bias=True)
        self.softmax = torch.nn.Softmax(dim=1)

This way you can switch between the standard and custom sigmoid()
with a single flag, without risking letting some other bug creep in
by changing additional code.
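
For example (hypothetical call, sizes taken from the MNIST code above):

net_custom = LinearNet(28 * 28, 10, hid_neurons=25, useCustom=True)
net_standard = LinearNet(28 * 28, 10, hid_neurons=25, useCustom=False)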

Best.

K. Frank