How to use LSTM for a time-series classification task?

Problem

Given a dataset of 48-hour sequences of hospital records, each with a binary target indicating whether the patient survives, the model should predict survival when given an unseen 48-hour test sequence.

Data

I have constructed a dummy dataset as follows:

import torch

input_ = torch.randn(100, 48, 76)      # 100 samples, 48 time steps, 76 features
target_ = torch.randint(0, 2, (100,))  # binary survival targets

and load the training data as follows:

from torch.utils.data import DataLoader, TensorDataset

trainloader = DataLoader(TensorDataset(input_, target_),
                         batch_size=50,
                         shuffle=True)

Model

I have constructed an LSTM-based model as follows:

import torch.nn as nn
import torch.nn.functional as F

class LSTMClassification(nn.Module):

    def __init__(self, input_dim, hidden_dim, target_size):
        super(LSTMClassification, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, target_size)

    def forward(self, input_):
        lstm_out, (h, c) = self.lstm(input_)
        logits = self.fc(lstm_out[-1])
        scores = F.sigmoid(logits)
        return scores

with the training function as follows:

import numpy as np
import torch.optim as optim

def train(model, n_epochs):
    loss_function = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.1)

    history = {
        'loss': []
    }
    for epoch in range(n_epochs):
        losses = []
        for i, data in enumerate(trainloader, 0):
            inputs, labels = data

            model.zero_grad()

            tag_scores = model(inputs)

            labels = labels.unsqueeze(1)
            
            loss = loss_function(tag_scores, labels)
            
            loss.backward()
            optimizer.step()
            losses.append(float(loss))
        avg_loss = np.mean(losses)
        history['loss'].append(avg_loss)
        print("Epoch {} / {}: Loss = {:.3f}".format(epoch+1, n_epochs, avg_loss))
    return history

The model is constructed as follows:

model = LSTMClassification(input_.shape[2],
                           hidden_dim=6,
                           target_size=1)

However, when I train the model, I get the following error:

ValueError: Target size (torch.Size([50, 1])) must be the same as input size (torch.Size([48, 1]))

I’m not sure how to get my model to yield a tensor of size (50, 1), i.e., one output (0 or 1) for each sequence in the batch.

I assume you want to index the last time step in this line of code:

logits = self.fc(lstm_out[-1])

which is wrong, since you are using batch_first=True and, according to the docs, the output shape would be [batch_size, seq_len, num_directions * hidden_size], so you might want to use self.fc(lstm_out[:, -1]) instead.
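For reference, a minimal sketch of the corrected forward pass with only the indexing changed (whether the sigmoid should stay is addressed in the follow-up below):

    def forward(self, input_):
        lstm_out, (h, c) = self.lstm(input_)
        # lstm_out: [batch_size, seq_len, hidden_dim] with batch_first=True,
        # so index the last time step along dim 1, not dim 0:
        logits = self.fc(lstm_out[:, -1])  # [batch_size, target_size] = [50, 1]
        scores = torch.sigmoid(logits)     # F.sigmoid is deprecated in favor of torch.sigmoid
        return scores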

Thank you @ptrblck. A few follow-up questions referring to the following code snippet:

    def forward(self, input_):
        lstm_out, (h, c) = self.lstm(input_)
        logits = self.fc(lstm_out[-1])
        scores = F.sigmoid(logits)
        return scores
  1. lstm_out[:, -1] would be the same as h[-1]? I checked it with torch.equal(lstm_out[:, -1], h[-1]) and it returns True. Just want to confirm.
  2. Since I’m using BCEWithLogitsLoss, do I need the sigmoid activation at the end of the model, given that BCEWithLogitsLoss has a built-in sigmoid activation?
    2.1. Is it only during the prediction phase that I need a sigmoid activation for classification, as follows?
def binary_acc(y_pred, y_test):
    # y_pred are raw logits; apply sigmoid and round to get 0/1 predictions
    y_pred_tag = torch.round(torch.sigmoid(y_pred))

    correct_results_sum = (y_pred_tag == y_test).sum().float()
    acc = correct_results_sum / y_test.shape[0]
    acc = torch.round(acc * 100)  # percentage

    return acc
  1. For bidirectional=False layers, the tensors should be equal, yes.
  2. nn.BCEWithLogitsLoss expects raw logits as the model output, so you should not apply a sigmoid on the output before passing it to the loss function.
  3. During the prediction phase you could apply a sigmoid and use a threshold to get the class labels, e.g.:
threshold = 0.5
pred = torch.sigmoid(output) > threshold

Your rounding approach would also work, but the threshold would allow you to pick a point on the ROC curve.
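Putting (2) and (3) together, a minimal sketch of the adjusted model and its usage (an assumption based on the points above, not code from the thread; note that nn.BCEWithLogitsLoss also expects float targets, hence the .float() cast):

class LSTMClassification(nn.Module):

    def __init__(self, input_dim, hidden_dim, target_size):
        super(LSTMClassification, self).__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, target_size)

    def forward(self, input_):
        lstm_out, (h, c) = self.lstm(input_)
        # return raw logits; BCEWithLogitsLoss applies the sigmoid internally
        return self.fc(lstm_out[:, -1])

# training step: float targets with the same shape as the logits
loss = loss_function(model(inputs), labels.unsqueeze(1).float())

# prediction: apply the sigmoid explicitly and threshold
preds = torch.sigmoid(model(inputs)) > threshold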

Thank you @ptrblck.

Number (3) would be the same for multi-class prediction also, right?

That is, the output of the last layer would be an array of logits, one per class, and during prediction a sigmoid is applied to get per-class probabilities.

Yes, you could also apply the sigmoid for a multi-label classification, where zero, one, or multiple classes can be “active”.
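As an illustration, a hypothetical multi-label sketch (assuming the logits-returning model from the sketch above and an arbitrary target_size=3, neither of which is from the thread):

num_classes = 3  # hypothetical number of classes
model = LSTMClassification(input_.shape[2],
                           hidden_dim=6,
                           target_size=num_classes)

output = model(input_)         # [100, 3] raw logits, one per class
probs = torch.sigmoid(output)  # independent per-class probabilities
preds = probs > 0.5            # zero, one, or multiple classes can be "active"

For training this setup, nn.BCEWithLogitsLoss takes float targets of the same [batch_size, num_classes] shape as the logits.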
