RNN Gradients not updating

Hey all.
I’m very new to PyTorch, and fairly new to neural networks in general.
I was trying to build a neural net that can guess a gender given a name, and I based it off of the PyTorch RNN tutorial that classifies names by nationality.
I got the code to run without errors, but the loss hardly changes, making me think the weights aren’t updating…
Is this a problem with my input/output/target tensor setup? Or perhaps something wrong with my training function? I’m very lost, and any help would be appreciated :cold_sweat:
Here’s my code:
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import unicodedata
import string
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import random
from torch.autograd import Variable

"""------GLOBAL VARIABLES------"""

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)
all_names = {}
genders = ["Female", "Male"]

"""-------DATA EXTRACTION------"""

def findFiles(path):
    return glob.glob(path)

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for file in findFiles("/home/andrew/PyCharm/PycharmProjects/CantStop/data/names/*.txt"):
    gender = file.split("/")[-1].split(".")[0]
    names = readLines(file)
    all_names[gender] = names

"""-----DATA INTERPRETATION-----"""

def nameToTensor(name):
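    # One-hot encode the name: tensor of shape (len(name), 1, num_letters)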
    tensor = torch.zeros(len(name), 1, num_letters)
    for index, letter in enumerate(name):
        tensor[index][0][all_letters.find(letter)] = 1
    return tensor

def outputToGender(output):
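    # topk(1) returns the top value and its index; the index is the predicted class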
    gender, gender_index = output.data.topk(1)
    if gender_index[0][0] == 0:
        return "Female"
    return "Male"

"""------NETWORK SETUP------"""

class Net(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(Net, self).__init__()
        self.hidden_size = hidden_size
        #Layer 1
        self.Lin1 = nn.Linear(input_size+hidden_size, int((input_size+hidden_size)/2))
        self.ReLu1 = nn.ReLU()
        self.Batch1 = nn.BatchNorm1d(int((input_size+hidden_size)/2))
        #Layer 2
        self.Lin2 = nn.Linear(int((input_size+hidden_size)/2), output_size)
        self.ReLu2 = nn.ReLU()
        self.Batch2 = nn.BatchNorm1d(output_size)
        self.softMax = nn.LogSoftmax()
        #Hidden layer
        self.HidLin = nn.Linear(input_size+hidden_size, hidden_size)
        self.HidReLu = nn.ReLU()
        self.HidBatch = nn.BatchNorm1d(hidden_size)

    def forward(self, input, hidden):
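        # Concatenate the current letter with the previous hidden state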
        comb = torch.cat((input, hidden), 1)
        hidden = self.HidBatch(self.HidReLu(self.HidLin(comb)))
        output1 = self.Batch1(self.ReLu1(self.Lin1(comb)))
        output2 = self.softMax(self.Batch2(self.ReLu2(self.Lin2(output1))))
        return output2, hidden

    def initHidden(self):
        return Variable(torch.zeros(1, self.hidden_size))

NN = Net(num_letters, 128, 2)

"""------TRAINING------"""

def getRandomTrainingEx():
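    # Pick a random gender, then a random name of that gender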
    gender = genders[random.randint(0, 1)]
    name = all_names[gender][random.randint(0, len(all_names[gender])-1)]
    gender_tensor = Variable(torch.LongTensor([genders.index(gender)]))
    name_tensor = Variable(nameToTensor(name))
    return gender_tensor, name_tensor, gender

def train(input, target):
    hidden = NN.initHidden()

    loss_func = nn.NLLLoss()

    alpha = 0.01

    NN.zero_grad()

    for i in range(input.size()[0]):
        output, hidden = NN(input[i], hidden)

    loss = loss_func(output, target)
    loss.backward()
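    # Manual SGD step: w <- w - alpha * w.grad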
    for w in NN.parameters():
        w.data.add_(-alpha, w.grad.data)

    return output, loss

for i in range(5000):
    gender_tensor, name_tensor, gender = getRandomTrainingEx()
    output, loss = train(name_tensor, gender_tensor)

    if i%500 == 0:
        print("Guess: %s, Correct: %s, Loss: %s" % (outputToGender(output), gender, loss.data[0]))

And here’s the output:

Guess: Male, Correct: Male, Loss: 0.6931471824645996
Guess: Male, Correct: Female, Loss: 0.7400936484336853
Guess: Male, Correct: Male, Loss: 0.6755779385566711
Guess: Female, Correct: Female, Loss: 0.6648257374763489
Guess: Male, Correct: Male, Loss: 0.6765623688697815
Guess: Female, Correct: Male, Loss: 0.7330614924430847
Guess: Female, Correct: Female, Loss: 0.6565149426460266
Guess: Male, Correct: Female, Loss: 0.6946508884429932
Guess: Female, Correct: Female, Loss: 0.6621525287628174
Guess: Male, Correct: Male, Loss: 0.6662092804908752

Process finished with exit code 0

Generally speaking, if a net doesn't train, I'd simplify things right down, conceptually, until it's simple enough to get working, then gradually make it complicated again. It's totally fine, for example, to train on just one example of each class. It'll overtrain of course, but if it fails to overtrain on just two examples, then there's a bug still to fix. If it does overtrain on two examples, then you can start adding more examples. Similarly, you can shorten the names to just a few letters. Doing both of these things means everything happens much faster, which means you can find problems faster. A minimal sketch of that two-example sanity check is below.
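Here's a minimal sketch of what I mean, reusing the nameToTensor helper and the num_letters global from your post. The stripped-down RNN (no BatchNorm, no ReLU, following the structure of the tutorial you started from) and the two placeholder names are my own choices for illustration, not a fix for your exact code; the point is just to confirm that a tiny net can drive the loss on two examples toward zero:

class TinyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(TinyRNN, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        comb = torch.cat((input, hidden), 1)
        return self.softmax(self.i2o(comb)), self.i2h(comb)

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

tiny = TinyRNN(num_letters, 32, 2)
optimizer = optim.SGD(tiny.parameters(), lr=0.05)
loss_func = nn.NLLLoss()

# One example per class; "Anna" and "John" are arbitrary placeholders.
examples = [(nameToTensor("Anna"), torch.LongTensor([0])),  # Female
            (nameToTensor("John"), torch.LongTensor([1]))]  # Male

for step in range(300):
    total = 0.0
    for name_tensor, target in examples:
        optimizer.zero_grad()
        hidden = tiny.initHidden()
        # Feed the name one letter at a time, carrying the hidden state
        for i in range(name_tensor.size(0)):
            output, hidden = tiny(name_tensor[i], hidden)
        loss = loss_func(output, target)
        loss.backward()
        optimizer.step()
        total += loss.item()
    if step % 50 == 0:
        print("step %d, mean loss %.4f" % (step, total / 2))

If the loss here doesn't fall toward zero, the bug is in the training loop itself; if it does, start adding examples, letters, and layers back in until something breaks, and that's where to look.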