GRU creates large FloatTensor and ByteTensor

I found some weird behavior in the GRU module and would like to know if it is expected.
It seems the GRU module creates a large ByteTensor object when called, which causes GPU memory issues.

Here is a small program that replicates the behavior.

import torch
import torch.nn as nn
from torch.autograd import Variable
import gc

gru = nn.GRU(256, 256).cuda()
x = Variable(torch.randn(20, 64, 256)).cuda()  # randn avoids an uninitialized tensor
output, h = gru(x)

# Print every live tensor tracked by the garbage collector
for obj in gc.get_objects():
    if torch.is_tensor(obj):
        print(type(obj), obj.size())

Here is the result I got:

<class 'torch.cuda.FloatTensor'> torch.Size([768, 256])
<class 'torch.cuda.FloatTensor'> torch.Size([768, 256])
<class 'torch.cuda.FloatTensor'> torch.Size([768])
<class 'torch.cuda.FloatTensor'> torch.Size([768])
<class 'torch.cuda.FloatTensor'> torch.Size([20, 64, 256])
<class 'torch.cuda.FloatTensor'> torch.Size([1, 64, 256])
<class 'torch.cuda.FloatTensor'> torch.Size([394752])
<class 'torch.cuda.FloatTensor'> torch.Size([20, 64, 256])
<class 'torch.cuda.FloatTensor'> torch.Size([1, 64, 256])
<class 'torch.cuda.ByteTensor'> torch.Size([7864320])

When I am not using the GPU, this problem does not happen.

Thank you

On the GPU, the cuDNN backend needs workspace buffers. This memory usage is a cuDNN requirement and is expected for now.
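To confirm the buffer comes from cuDNN, you can disable the cuDNN backend via torch.backends.cudnn.enabled, which forces the native RNN implementation. This is a minimal sketch (written against the current tensor API, with a CPU fallback so it runs anywhere), not a recommended default: the native path is typically much slower.

```python
import torch
import torch.nn as nn

# Disabling cuDNN makes RNN modules use the native implementation,
# which does not allocate the large cuDNN workspace buffer.
# Trade-off: the native path is usually slower than cuDNN.
torch.backends.cudnn.enabled = False

device = 'cuda' if torch.cuda.is_available() else 'cpu'
gru = nn.GRU(256, 256).to(device)
x = torch.randn(20, 64, 256, device=device)  # (seq_len, batch, input_size)
output, h = gru(x)
```

Rerunning the gc.get_objects() loop after this should show no large ByteTensor, at the cost of throughput.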

Is there any way to control their size?
I'm trying to use LSTMs, but the byte tensors' sizes are exploding and leaving me OOM.