How to pass separate features into LSTM

Hi, I am new to PyTorch (and machine learning in general) and wanted to check whether what I'm doing makes any sense at all. I am attempting to predict race (Asian, Black, Hispanic, White) from first name, last name, and the racial distribution of the person's zip code. These should be three separate features, the thought being that, e.g., the last name might have more predictive power than the first name. For example, a row of my data would be

("John", "Li", [0.10, 0.40, 0.20, 0.30])

Currently, my model is

import torch
from torch import nn

class FirstLastZctaLSTM(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, output_size: int) -> None:
        super(FirstLastZctaLSTM, self).__init__()

        self.hidden_size = hidden_size
        self.ltsm_cell = nn.LSTM(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size + 4, output_size)  # + 4 for the zip code race distribution
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(
        self,
        name: torch.Tensor,
        pct: torch.Tensor,
        hidden: tuple[torch.Tensor, torch.Tensor],
    ):
        _, hidden = self.ltsm_cell(name.view(1, 1, -1), hidden)  # one character: (seq_len=1, batch=1, input_size)
        combined = torch.cat([hidden[0].squeeze(0), pct], dim=1)
        output = self.h2o(combined)
        output: torch.Tensor = self.softmax(output)

        return output, hidden

    def init_hidden(self):
        return (
            torch.zeros(1, 1, self.hidden_size, device=DEVICE),
            torch.zeros(1, 1, self.hidden_size, device=DEVICE),
        )

I use a character-level encoding

import string
VALID_NAME_CHARS = f"{string.ascii_lowercase} '-"
VALID_NAME_CHARS_DICT = {c: i for i, c in enumerate(VALID_NAME_CHARS)}
VALID_NAME_CHARS_LEN = len(VALID_NAME_CHARS)

def encode_name(
    name: str,
    valid_name_chars_dict: dict[str, int],
    valid_name_chars_len: int,
    device: torch.device,
) -> torch.Tensor:
    encoded = torch.zeros(len(name), 1, valid_name_chars_len, device=device)
    for idx, c in enumerate(name):
        encoded[idx][0][valid_name_chars_dict[c]] = 1

    return encoded

Then I combine the encoded tensors like so:

name = torch.cat(
    [
        encode_name(first_name, VALID_NAME_CHARS_DICT, VALID_NAME_CHARS_LEN, DEVICE),
        encode_name(last_name, VALID_NAME_CHARS_DICT, VALID_NAME_CHARS_LEN, DEVICE),
    ],
    dim=0,
)

As an example, for the name “michael kitts”, the final name tensor has size [12, 29] and the percent data looks like

tensor([[6.0413e-03, 8.2485e-04, 3.7458e-02, 9.0547e-01]])

I pass it into the model

for name, pct, race in dataloader:
    name = name.squeeze()  # dataloader adds an extra batch size dimension
    model.zero_grad(set_to_none=True)
    hidden = model.init_hidden()
    for i in range(name.size()[0]):
        output, hidden = model(name[i], pct, hidden)
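
The part of the training step not shown above might look roughly like the following (a sketch assuming an nn.NLLLoss criterion, which pairs with the LogSoftmax output, and an optimizer defined elsewhere; both are illustrative):

criterion = nn.NLLLoss()  # expects log-probabilities and a LongTensor of class indices

for name, pct, race in dataloader:
    name = name.squeeze()  # dataloader adds an extra batch size dimension
    model.zero_grad(set_to_none=True)
    hidden = model.init_hidden()
    for i in range(name.size()[0]):
        output, hidden = model(name[i], pct, hidden)
    loss = criterion(output, race)  # only the output after the last character is used
    loss.backward()
    optimizer.step()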

It runs with no errors, but I am curious whether this is doing what I want it to be doing. Since I concatenate first and last name, does it mean that I am basically just passing in the full name, just without the space?

Thanks a lot.

Your code is a bit confusing to read, so maybe I've got some things wrong.

You’re using an nn.LSTM instead of an nn.LSTMCell, but you treat it like a cell since you give it only one character at a time as input; hence the need for a loop that iterates over all characters in the name. You can feed an nn.LSTM only one character at a time, but it is a bit odd.
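
To make the difference concrete, here is a rough sketch of the two options (shapes and sizes are just illustrative):

seq_len, input_size, hidden_size = 12, 29, 64

lstm = nn.LSTM(input_size, hidden_size)      # consumes a whole sequence in one call
cell = nn.LSTMCell(input_size, hidden_size)  # consumes one time step per call

name = torch.randn(seq_len, 1, input_size)   # (seq_len, batch=1, input_size)

# Option 1: nn.LSTM over the full sequence -- no Python loop needed
output, (h_n, c_n) = lstm(name)              # h_n: (1, 1, hidden_size)

# Option 2: nn.LSTMCell, stepping through the characters manually
h = torch.zeros(1, hidden_size)
c = torch.zeros(1, hidden_size)
for t in range(seq_len):
    h, c = cell(name[t], (h, c))             # name[t]: (1, input_size)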

It’s unclear what pct is, so I assume it’s the tensor with the race distribution that you concatenate with the current hidden state. According to your code, you concatenate pct at each time step. You can do this, but it does not seem intuitive: pct is a single vector associated with your whole training sample, i.e., the name, and not with the individual characters. In other words, pct itself is not sequential in nature. The more intuitive solution would be to feed only the name to the LSTM and concatenate pct with just the last hidden state.

Thanks a lot for the response, and sorry for the confusion! You are right about what pct is. Regarding passing each character, I was attempting to adapt Predict Nationality Based On Name In Python - AskPython. Does a model like this make more sense, then? Now the entire name is passed to the LSTM, so when I combine it with the pct vector it is equivalent to only using it at the last hidden state?

class FirstLastZctaLSTM(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, output_size: int) -> None:
        super(FirstLastZctaLSTM, self).__init__()

        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size + 4, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(
        self,
        name: torch.Tensor,
        pct: torch.Tensor,
        hidden: tuple[torch.Tensor, torch.Tensor],
    ):
        _, hidden = self.lstm(name, hidden)
        combined = torch.cat([hidden[0].squeeze(0), pct], dim=1)
        output = self.h2o(combined)
        output: torch.Tensor = self.softmax(output)

        return output, hidden

    def init_hidden(self):
        return (
            torch.zeros(1, 1, self.hidden_size, device=DEVICE),
            torch.zeros(1, 1, self.hidden_size, device=DEVICE),
        )

For predicting the nationality based on names, there’s actually a PyTorch tutorial you can have a look at as well.

Note that hidden[0].squeeze(0) only works here because you initialize the nn.LSTM with the default parameters num_layers=1 and bidirectional=False. For example, hidden[0][-1] is a bit cleaner, as it picks the last layer's hidden state and thus also handles the case where you want to use more than one layer. If you want to use a bidirectional LSTM, you need to be a tad more careful.
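
In code, that could look roughly like this (a sketch, with pct as in your model):

h_n, c_n = hidden            # h_n: (num_layers * num_directions, batch, hidden_size)

# Unidirectional, any number of layers: take the top layer's hidden state
last_hidden = h_n[-1]        # (batch, hidden_size)

# Bidirectional: the last two entries are the forward and backward states of the top
# layer; note that h2o would then need an input size of 2 * hidden_size + 4
# last_hidden = torch.cat([h_n[-2], h_n[-1]], dim=1)   # (batch, 2 * hidden_size)

combined = torch.cat([last_hidden, pct], dim=1)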

That being said, concatenating pct to the last hidden state seems more intuitive to me. Whether the results are better or worse is a different story, but that’s how I would approach this task. With neural networks, in general, there are many alternatives. For example, you could first push the last hidden state through an additional linear layer (e.g., to reduce its dimension) before concatenating it with pct. Which one “makes more sense” is difficult to answer :).
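
A rough sketch of that variant (reduced_size is just an illustrative name for the smaller dimension):

class FirstLastZctaLSTMReduced(nn.Module):
    def __init__(self, input_size: int, hidden_size: int, reduced_size: int, output_size: int) -> None:
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.reduce = nn.Linear(hidden_size, reduced_size)  # shrink the hidden state first
        self.h2o = nn.Linear(reduced_size + 4, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, name, pct, hidden):
        _, hidden = self.lstm(name, hidden)
        reduced = torch.relu(self.reduce(hidden[0][-1]))     # (batch, reduced_size)
        combined = torch.cat([reduced, pct], dim=1)
        return self.softmax(self.h2o(combined)), hidden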
