Wrong results when JIT exporting an LSTM module

Below is a self-contained example:

import torch
from torch import nn

T = 10
F = 20
device = torch.device('cuda')

print('Generating data')
data = (torch.rand(1, T, F) * 0.1).to(device)

print('Loading model')
model = nn.LSTM(F, F, num_layers=1, batch_first=True, bidirectional=True, dropout=0)
model = model.eval().to(device)

print('Tracing model')
tmodel = torch.jit.trace(model, (data,))

print('Productionizing model')
tmodel.save('/tmp/test.pt')
pmodel = torch.jit.load('/tmp/test.pt', map_location=device)

print('Forwarding data')
with torch.no_grad():
    o1 = model(data)[0]
    o2 = tmodel(data)[0]
    o3 = pmodel(data)[0]

assert (o1 == o2).all()   # WORKS
assert (o2 == o3).all()   # FAILS

The above example shows 3 different versions of the same model:

  • model: raw LSTM
  • tmodel: traced LSTM
  • pmodel: dumped and loaded tmodel

model and tmodel produce identical output. By contrast, pmodel outputs wrong values and NaNs.

Reproducible in both versions 1.0.0 and 1.0.1.

Am I doing something wrong?

Just updated the code by replacing .cuda() with .to(device). The bug is still reproducible with device = torch.device('cuda'), but the code seems to work with device = torch.device('cpu').
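For reference, here is a minimal CPU-only version of the same round trip (a sketch; the temp-file path and tensor sizes are illustrative, and I dropped the redundant dropout=0 since num_layers=1). On CPU the saved-and-loaded traced model matches the traced one:

```python
import os
import tempfile

import torch
from torch import nn

T, F = 10, 20
device = torch.device('cpu')

# Same setup as above, but on CPU.
data = (torch.rand(1, T, F) * 0.1).to(device)
model = nn.LSTM(F, F, num_layers=1, batch_first=True,
                bidirectional=True).eval().to(device)

# Trace, dump to disk, and load back.
tmodel = torch.jit.trace(model, (data,))
path = os.path.join(tempfile.mkdtemp(), 'test.pt')
tmodel.save(path)
pmodel = torch.jit.load(path, map_location=device)

with torch.no_grad():
    o2 = tmodel(data)[0]
    o3 = pmodel(data)[0]

# On CPU the round-tripped model agrees with the traced one.
assert torch.allclose(o2, o3)
```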

Issue no longer reproducible with torch-nightly-1.0.0.dev20190304.