Wrong results when JIT exporting an LSTM module

imaluengo · March 4, 2019, 12:43pm

Below is a self-contained example:

import torch
from torch import nn

T = 10
F = 20
device = torch.device('cuda')

print('Generating data')
data = (torch.rand(1, T, F) * 0.1).to(device)

print('Loading model')
model = nn.LSTM(F, F, num_layers=1, batch_first=True, bidirectional=True, dropout=0)
model = model.eval().to(device)

print('Tracing model')
tmodel = torch.jit.trace(model, (data,))
tmodel.save('/tmp/test.pt')

print('Productionizing model')
pmodel = torch.jit.load('/tmp/test.pt', map_location=device)

print('Forwarding data')
with torch.no_grad():
    o1 = model(data)[0]
    o2 = tmodel(data)[0]
    o3 = pmodel(data)[0]

assert (o1 == o2).all()   # WORKS
assert (o2 == o3).all()   # FAILS

The above example shows 3 different versions of the same model:

model: raw LSTM
tmodel: traced LSTM
pmodel: dumped and loaded tmodel

model and tmodel compute the same output. In contrast, pmodel outputs wrong values and NaNs.
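To quantify the mismatch instead of relying on exact equality, a helper along these lines could be used. This is only a minimal sketch: `compare_outputs` and its default tolerances are illustrative choices of mine, not part of PyTorch.

```python
import torch


def compare_outputs(a, b, rtol=1e-5, atol=1e-6):
    """Summarize how two output tensors differ (hypothetical helper)."""
    # Move both tensors to CPU float so the comparison is device independent.
    a = a.detach().float().cpu()
    b = b.detach().float().cpu()

    # Only compare magnitudes where both values are finite, so that NaNs
    # do not hide the size of the remaining differences.
    finite = torch.isfinite(a) & torch.isfinite(b)
    diff = (a - b).abs()
    max_diff = diff[finite].max().item() if finite.any() else float('nan')

    return {
        'nan_in_a': int(torch.isnan(a).sum()),
        'nan_in_b': int(torch.isnan(b).sum()),
        'max_abs_diff': max_diff,
        # allclose treats NaN as unequal by default, so any NaN gives False.
        'allclose': torch.allclose(a, b, rtol=rtol, atol=atol),
    }
```

Calling `compare_outputs(o2, o3)` after the forward pass would report the NaN counts and the largest finite difference between the traced and the reloaded model.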

Reproducible in both PyTorch 1.0.0 and 1.0.1.

Am I doing something wrong?

imaluengo · March 4, 2019, 2:29pm

Just updated the code by replacing .cuda() with .to(device). The bug is still reproducible when using device = torch.device('cuda'), but the code seems to work with device = torch.device('cpu').
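To run the same check on each device, the example can be wrapped in a function that takes the device as an argument. This is a rough sketch under my own assumptions: `roundtrip_matches` is a hypothetical name, the temporary file path and the `atol` value are arbitrary, and a tolerance-based comparison replaces the exact equality used above.

```python
import os
import tempfile

import torch
from torch import nn


def roundtrip_matches(device, T=10, F=20, seed=0):
    """Trace an LSTM, save and reload it, and compare against the raw module."""
    torch.manual_seed(seed)
    data = (torch.rand(1, T, F) * 0.1).to(device)

    model = nn.LSTM(F, F, num_layers=1, batch_first=True, bidirectional=True)
    model = model.eval().to(device)

    tmodel = torch.jit.trace(model, (data,))

    # Save to a temporary directory so the check leaves no files behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'lstm.pt')
        tmodel.save(path)
        pmodel = torch.jit.load(path, map_location=device)

    with torch.no_grad():
        o1 = model(data)[0]
        o3 = pmodel(data)[0]

    # allclose returns False if either output contains NaNs.
    return torch.allclose(o1, o3, atol=1e-6)


# Check the CPU always, and CUDA only when a GPU is available.
devices = ['cpu'] + (['cuda'] if torch.cuda.is_available() else [])
for name in devices:
    print(name, roundtrip_matches(torch.device(name)))
```

On the affected versions I would expect this to print True for 'cpu' and False for 'cuda', matching the behaviour described above.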

imaluengo · March 4, 2019, 2:38pm

Issue no longer reproducible with torch-nightly-1.0.0.dev20190304.