JIT speedup is great, but there is a memory leak with dropout

I tried using JIT, following the examples in https://github.com/pytorch/pytorch/blob/master/test/test_jit.py

I saw big speedups in my network - 50% faster with JIT than without it - and it is very straightforward to use. While testing it, I found a bug in JIT:

If I add a dropout layer to the FC layers, a JIT-compiled forward pass of the net starts to leak memory badly in the backward pass. I made a simple test case below. When it runs, you will see GPU/CPU memory usage climb rapidly in nvidia-smi/top, and it will soon crash with an out-of-memory exception. Without the dropout layer, or without the backward pass, it works fine.

import torch
from torch import jit
import torch.nn as nn
from torch.autograd import Variable

class TestNet(nn.Module):
    """Small FC net: Linear -> ReLU -> Dropout -> Linear -> Sigmoid."""
    def __init__(self):
        super(TestNet, self).__init__()
        self.net1 = nn.Linear(100, 200)
        self.net2 = nn.Linear(200, 1)
        self.sigmoid = nn.Sigmoid()
        self.ReLU = nn.ReLU(inplace=False)
        self.drop = nn.Dropout(0.5)

    def forward(self, V):
        return self.sigmoid(self.net2(self.drop(self.ReLU(self.net1(V))))).squeeze()


use_cuda = True
net = TestNet()
criterion = nn.BCELoss()
if use_cuda:
    net.cuda()
    criterion.cuda()
    V = Variable(torch.randn(100, 100)).cuda()
    label = Variable(torch.rand(100)).cuda()   # BCELoss targets must lie in [0, 1]
else:
    V = Variable(torch.randn(100, 100))
    label = Variable(torch.rand(100))

res = net(V)                    # eager forward pass works fine
fwd = jit.compile(net.forward)  # JIT-compile the forward pass
for i in range(1000000):
    r = fwd(V)
    err = criterion(r, label)
    err.backward()              # memory usage grows on every iteration here
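
If you prefer to confirm the growth from inside the script rather than watching nvidia-smi, below is a minimal sketch of the same loop that also prints the allocated GPU memory every few hundred iterations. It assumes use_cuda is True and a PyTorch build that provides torch.cuda.memory_allocated() (newer releases do; otherwise just watch nvidia-smi as above). With the dropout layer or the backward() call removed, the printed number stays flat.

# Variant of the loop above that reports allocated GPU memory.
# Assumes torch.cuda.memory_allocated() exists in your PyTorch build.
for i in range(10000):
    r = fwd(V)
    err = criterion(r, label)
    err.backward()
    if i % 500 == 0:
        # Bytes currently held by tensors on the GPU; climbs steadily with
        # the Dropout layer in place, stays flat without it.
        print(i, torch.cuda.memory_allocated())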

Thanks for reporting the memory leak on GitHub, we’ll track and fix it.