Torch.nn in context of a cuda stream


When I execute a torch.nn model in the context of a CUDA stream, some of the kernels run on this stream, while others are executed on the default stream. See profiler output and sample program below. Is there a way to run all the kernels in the context of the custom stream?


import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class MNISTConvNet(nn.Module):

    def __init__(self):
        super(MNISTConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, 5)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(10, 20, 5)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

        self.cuda(0)    # compute on GPU

    def forward(self, input):
        x = self.pool1(F.relu(self.conv1(input)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x

def main():
        n = 100000
        net = MNISTConvNet()
        input = Variable(torch.randn(n, 1, 28, 28).cuda(0))
        out = torch.FloatTensor(n, 10).pin_memory()
        out.copy_(net(input).data, async=True)

if __name__ == '__main__':

The CuDNN kernels dont seem to be respecting the current stream. @ngimel is this expected?

It’s not cudnn not respecting streams, it’s pytorch cudnn bindings never calling cudnnSetStream to properly set stream. I vaguely remember some problems with that, Sam might know more.

Hi @smth, @ngimel,

Could you follow up on the stream bindings?


I’m tracking this bug in
Will give it a priority and fix.

Hi @smth, could you give it the higher priority?

this is fixed in master via