My first model - can't make it work

Hi everyone,

I’m just starting out with NNs, and for my first NN written from scratch I wanted to replicate the net in this tutorial: NLP From Scratch: Classifying Names with a Character-Level RNN — PyTorch Tutorials 1.7.1 documentation, but with a dataset, a dataloader and an actual RNN unit.

The following is my current code:

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
import string
from torch.utils.data import DataLoader, Dataset
import numpy as np

class RNN(nn.Module):

    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(87, 128, batch_first=True)
        self.linear = nn.Linear(128, 1)

    def forward(self, x, h):
        y, h = self.rnn(x, h)
        y = self.linear(y)
        y = F.softmax(y)
        return y, h


class NameDataset(Dataset):
    def __init__(self, file_path):
        self.names = []
        self.labels = []
        self.all_names = []
        self.all_categories = []
        self.all_letters = []
        self.__readfiles(file_path)

        self.labels_to_index = {}

        for i, label in enumerate(self.all_categories):
            self.labels_to_index[label] = i
        self.labels = [self.labels_to_index[l] for l in self.labels]

        self.letter_to_index = {}
        self.all_letters.sort()

        for i, letter in enumerate(self.all_letters):
            self.letter_to_index[letter] = i

        for name in self.all_names:
            self.names.append(self.__encode_name(name))


    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        name = self.names[index]
        label = self.labels[index]
        return name, label

    def __readfiles(self, file_path):
        for filename in os.listdir(file_path):
            label = os.path.splitext(os.path.basename(filename))[0]
            self.all_categories.append(label)
            with open(os.path.join(file_path, filename), "r", encoding='utf-8') as f:
                for name in f.read().strip().split('\n'):
                    self.all_names.append(name)
                    self.labels.append(label)
                    for letter in name:
                        if letter not in self.all_letters:
                            self.all_letters.append(letter)

    def __encode_name(self, name):
        oh_name = np.zeros((len(name), len(self.all_letters)))
        for i, char in enumerate(name):
            oh_char = np.zeros(len(self.all_letters))
            oh_char[self.letter_to_index[char]] = 1
            oh_name[1] = oh_char
        return oh_name

dataset = NameDataset(r"data/names")
train_loader = DataLoader(dataset, batch_size=1, shuffle=True)

rnn = RNN()

criterion = nn.CrossEntropyLoss
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.0001)
epochs = 10000


for epoch in range(1, epochs+1):
    h0 = torch.zeros(1, 1, 128)
    for x, y in train_loader:
        h = h0
        optimizer.zero_grad()
        y_hat, h = rnn(x, h)

        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()

    print('%d %d%% %.4f' % (epoch, epoch / epochs * 100, loss))

I realize there is quite a bit to refactor besides my problem, but for now I’m wondering why it’s not working at all.

I’m getting the following exception:

C:\ProgramData\Anaconda3\envs\pythonProject\python.exe H:/PycharmProjects/pythonProject/RNNFromScratch.py
Traceback (most recent call last):
  File "H:/PycharmProjects/pythonProject/RNNFromScratch.py", line 91, in <module>
    y_hat, h = rnn(x, h)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "H:/PycharmProjects/pythonProject/RNNFromScratch.py", line 17, in forward
    y, h = self.rnn(x, h)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\rnn.py", line 234, in forward
    result = _impl(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: expected scalar type Double but found Float

Any help would be much appreciated!

Hi,

I think the issue is that you are loading your data using numpy, and by default numpy uses the float64 data type, which is double in PyTorch. I think if you add

name = name.astype(np.float32)

before returning an item, it would solve the issue. (Your label is already a plain Python int after the labels_to_index mapping, and CrossEntropyLoss expects integer class targets, so it does not need a cast.)

The issue is that nn.Linear and other nn modules use float (float32 in numpy terms) for their parameters, so there is a mismatch between your loaded data and the initialized weights.
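
A minimal sketch of that fix inside __getitem__ (untested):

def __getitem__(self, index):
    # numpy defaults to float64 ("Double" in PyTorch); cast to float32 to match the weights
    name = self.names[index].astype(np.float32)
    label = self.labels[index]  # int class index; the default collate turns it into a LongTensor
    return name, label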

Another point: I can see you are not using any particular function from numpy, so you could write your entire data-loading flow using PyTorch itself.
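
For example, torch.nn.functional has a one_hot helper, so the one-hot encoding could become something like this (a sketch, assuming your letter_to_index dict and 87 distinct letters):

letter_indices = torch.tensor([letter_to_index[c] for c in name])
oh_name = F.one_hot(letter_indices, num_classes=87).float()  # (seq_len, 87), float32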

Bests

Thank you, that helped. Now I’m getting other exceptions, but I’m going to try to solve those myself before I bother you with them :slight_smile:

Nice!

You will probably find many similar issues on this forum; you just need to use the proper keywords. :wink:

So, I’m thoroughly stuck on this one. I keep running into problems with dimensions.
I noticed that in my dataset, __getitem__ returns x of shape (seq_len, input_size).
However, when I iterate over the dataloader, I get shape (1, seq_len, input_size).

For example:

torch.Size([9, 87]) torch.Size([1, 18]) <-- x, y inside the dataset
torch.Size([1, 9, 87]) <-- x from the dataloader
torch.Size([1, 1, 18]) <-- y from the dataloader

Why is that?
If the point is to automatically add a batch dimension, why does it add it batch-first?

Yes, exactly. The point is that all operations are defined batch-wise, so iterating over the dataloader creates batches; that is why you pass batch_size as an argument to the dataloader.

About the second part of your question: I am not sure what you mean by batch first, but if you mean that the RNN expects [seq, batch, input], then you can just permute the dimensions using tensor.permute(dims).

x = torch.randn(1, 9, 87)
x_for_rnn = x.permute(1, 0, 2)
print(x_for_rnn.shape)   # [9, 1, 87]

But in your first post you set batch_first=True for the RNN layer; in that case the RNN expects [batch, seq, input], which matches what the dataloader is giving you.
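
A quick sketch of the batch_first=True case for comparison (note that the hidden state is [num_layers, batch, hidden] either way):

rnn_bf = nn.RNN(input_size=87, hidden_size=128, batch_first=True)
x = torch.randn(1, 9, 87)    # (batch, seq, input) -- what the dataloader gives you
h0 = torch.zeros(1, 1, 128)  # (num_layers, batch, hidden), regardless of batch_first
y, h = rnn_bf(x, h0)
print(y.shape)               # torch.Size([1, 9, 128])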

Bests

Thank you very much, I’m sure permute will come in handy in the future :slight_smile:
For now I fixed it by setting batch_size to None and adding that dimension “manually”.
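
Roughly like this:

train_loader = DataLoader(dataset, shuffle=True, batch_size=None)  # no automatic batching
x, y = next(iter(train_loader))
x = x.unsqueeze(1)  # (seq_len, input_size) -> (seq_len, 1, input_size)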

After another few issues, my model is training now. I hope it’s also learning :smiley:

Okay, for the record, here is my final code:

import os
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker


all_labels = []
all_names = []
all_categories = []
all_letters = []
labels_to_index = {}
letter_to_index = {}


def read_files(file_path):
    for filename in os.listdir(file_path):
        label = os.path.splitext(os.path.basename(filename))[0]
        all_categories.append(label)
        with open(os.path.join(file_path, filename), "r", encoding='utf-8') as f:
            for name in f.read().strip().split('\n'):
                all_names.append(name)
                all_labels.append(label)
                for letter in name:
                    if letter not in all_letters:
                        all_letters.append(letter)


def encode_name(name):
    # one-hot encode the name: one row per character
    oh_name = torch.zeros(len(name), len(all_letters))
    for i, char in enumerate(name):
        oh_name[i][letter_to_index[char]] = 1
    return oh_name


def encode_label(label_index):
    oh_label = torch.zeros(1, len(all_categories))
    oh_label[0][label_index] = 1
    return oh_label


def predict(name):
    output, h = rnn(encode_name(name).unsqueeze(1), torch.zeros(1, 1, 128))
    output = output[output.size()[0] - 1]  # keep only the last time step
    return torch.argmax(output, 1)


"""Assign proper values to global variables"""


read_files(r"data/names")

all_letters.sort()

for i, label in enumerate(all_categories):
    labels_to_index[label] = i
all_labels = [labels_to_index[l] for l in all_labels]

for i, letter in enumerate(all_letters):
    letter_to_index[letter] = i


"""Define Dataset"""


class NameDataset(Dataset):

    def __init__(self):
        self.names = []
        self.labels = []

        for name in all_names:
            self.names.append(encode_name(name))

        for label in all_labels:
            self.labels.append(encode_label(label))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        name = self.names[index]
        label = all_labels[index]
        return name, label


"""Define Model"""


class RNN(nn.Module):

    def __init__(self, input_size, hidden_size, output_size):
        super(RNN, self).__init__()
        self.rnn = nn.RNN(input_size, hidden_size)
        self.linear = nn.Linear(hidden_size, output_size)

    def forward(self, x, h):
        y, h = self.rnn(x, h)
        y = self.linear(y)
        y = F.log_softmax(y, dim=2)
        return y, h


dataset = NameDataset()
train_loader = DataLoader(dataset, shuffle=True, batch_size=None)
h0 = torch.zeros(1, 1, 128)  # (num_layers, batch, hidden)


"""Train"""


if os.path.exists("RNN.pt"):
    rnn = torch.load("RNN.pt")
else:
    rnn = RNN(87, 128, len(all_categories))  # 87 = len(all_letters) for this dataset
    criterion = nn.NLLLoss()
    optimizer = torch.optim.Adam(rnn.parameters(), lr=0.0003)
    epochs = 30
    all_losses = []

    print("Training...")
    for epoch in range(1, epochs+1):
        current_loss = 0
        correct = []
        for x, y in train_loader:
            h = h0
            optimizer.zero_grad()
            x = x.unsqueeze(1)  # add the batch dimension: (seq_len, 1, input_size)

            y_hat, h = rnn(x, h)
            y_hat = y_hat[y_hat.size()[0]-1]  # classify on the last time step only
            y = torch.tensor([y])

            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()

            current_loss += loss.item()  # .item() so we don't keep the autograd graph around

            if torch.argmax(y_hat, 1) == y:
                correct.append(1)

        current_accuracy = len(correct) / len(train_loader)
        current_avg_loss = current_loss / len(train_loader)
        all_losses.append(current_avg_loss)
        print('%d %d%% loss: %.4f acc: %.4f' % (epoch, epoch / epochs * 100, current_avg_loss, current_accuracy))

    torch.save(rnn, 'RNN.pt')
    plt.figure()
    plt.plot(all_losses)


"""Evaluate Model"""


# Keep track of correct guesses in a confusion matrix
n_categories = len(all_categories)
confusion = torch.zeros(n_categories, n_categories)
n_confusion = 10000


# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
    for x, y in train_loader:
        x = x.unsqueeze(1)
        y_hat, hidden = rnn(x, h0)
        y_hat = y_hat[y_hat.size()[0] - 1]

        category_i = y
        guess_i = torch.argmax(y_hat, 1)

        confusion[category_i][guess_i] += 1
        break  # take one (shuffled) sample per outer iteration

# Normalize by dividing every row by its sum
for i in range(n_categories):
    confusion[i] = confusion[i] / confusion[i].sum()

# Set up plot
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(confusion.numpy())
fig.colorbar(cax)

# Set up axes
ax.set_xticklabels([''] + all_categories, rotation=90)
ax.set_yticklabels([''] + all_categories)

# Force label at every tick
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

plt.show()

"""Act on user input"""
while True:
    user_input = input("Enter a name: ")
    output = predict(user_input)
    print(all_categories[output])

Confusion matrix:

[Figure 1: confusion matrix heatmap]

It seems to suggest the model is doing fine. I’m not shocked that Scottish and English aren’t that easy to separate.

I’d love to get some more general feedback on my code, especially the PyTorch parts. Is there anything I did in some unconventional way, or something I could have done better?

Everything seems fine to me except the following line:

y = torch.tensor([y])

Isn’t y already a tensor? What is the goal of this line?

Bests

Thanks a bunch for taking a look at it!

No, it appears to be a plain integer. Should the dataloader make it a tensor?
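
I believe that is because of batch_size=None: it disables automatic batching, and without batching the default conversion only turns numpy arrays into tensors, so a plain Python int passes through unchanged. With a numeric batch_size, the default collate_fn would stack the labels into a tensor. A quick way to check:

loader = DataLoader(dataset, batch_size=1)
x, y = next(iter(loader))
print(type(y))  # torch.Tensor here; with batch_size=None, y stays an int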

Ok, I did not notice this before, but this is not a good way of constructing a dataset. You are accessing all_names and all_labels from outside the class without passing in any argument that refers to them. Try to make the dataset construction standalone by just giving it a path, etc.

On another note, you are using label = all_labels[index] in __getitem__ instead of self.labels[index]. These issues are not directly related to PyTorch, but fixing them will make your debugging easier.
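
For example, a rough sketch of a standalone version (reusing your __readfiles / __encode_name helpers from the first post as _read_files / _encode_name; untested):

class NameDataset(Dataset):
    def __init__(self, file_path):
        # everything is derived from file_path; no module-level globals
        self.all_names, self.all_labels = [], []
        self.all_categories, self.all_letters = [], []
        self._read_files(file_path)  # fills the four lists above, as in your first version
        self.letter_to_index = {l: i for i, l in enumerate(sorted(self.all_letters))}
        self.labels_to_index = {c: i for i, c in enumerate(self.all_categories)}
        self.labels = [self.labels_to_index[l] for l in self.all_labels]
        self.names = [self._encode_name(n) for n in self.all_names]

    def __len__(self):
        return len(self.names)

    def __getitem__(self, index):
        return self.names[index], self.labels[index]  # self.labels, not the global list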


Thanks, I really appreciate the feedback, and you’re absolutely right.