Conv3d and AvgPool3d interactions yield errors in CUDA mode only

The network below runs successfully on the CPU, but on CUDA it throws "RuntimeError: Can only downcast contiguous tensors" when backpropagating through the AvgPool3d layer. If either the first or the last Conv3d layer is removed, the network runs successfully. Any explanation of why this happens, or suggestions on how to fix it, would be appreciated. I am using PyTorch 0.2.0 on Python 3.5 with CUDA 8.0, installed from the pip binary, on Ubuntu 16.04.

import torch
from torch import nn
from torch.autograd import Variable

class ConvBlock(nn.Module):

    def __init__(self, in_channels, out_channels):
        super(ConvBlock, self).__init__()
        self.net = nn.Conv3d(in_channels=in_channels, out_channels=out_channels,
                             kernel_size=(3, 5, 5), stride=(3, 3, 3),
                             padding=(1, 0, 0), bias=False)

    def forward(self, x):
        return self.net(x)

class DepthConcatBlock(nn.Module):

    def __init__(self, channels):
        super(DepthConcatBlock, self).__init__()
        self.branch1 = nn.Conv3d(in_channels=channels, out_channels=channels,
                                kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=0)

        self.branch2 = nn.AvgPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch2(x)], dim=1)


net = nn.Sequential(
    ConvBlock(3, 3),
    DepthConcatBlock(3),
    ConvBlock(6, 6)
    )

net = net.cuda()
x = Variable(torch.randn([1, 3, 75, 288, 360])).cuda()
y = net(x)
l = torch.sum(y)
l.backward()

Traceback (most recent call last):
  File "debug.py", line 39, in <module>
    l.backward()
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/usr/local/lib/python3.5/dist-packages/torch/autograd/function.py", line 91, in apply
    return self._forward_cls.backward(self, *args)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/thnn/pooling.py", line 419, in backward
    grad_input = AvgPool3dBackward.apply(input, grad_output, ctx.kernel_size, ctx.stride)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/thnn/pooling.py", line 435, in forward
    ctx.stride[0], ctx.stride[2], ctx.stride[1])
RuntimeError: Can only downcast contiguous tensors at /pytorch/torch/lib/tmp_install/include/THC/THCDeviceTensor-inl.cuh:295
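
As a sanity check that the failure is not simply a shape mismatch between the two branches of the concatenation, the spatial sizes can be worked out by hand. The sketch below is plain Python and does not need PyTorch. The helper names out_size and out_shape are my own (not PyTorch API), and it assumes the default floor rounding and a dilation of 1 for both Conv3d and AvgPool3d.

```python
def out_size(size, kernel, stride, padding=0):
    # Output length along one dimension for a conv/pool layer,
    # assuming dilation 1 and floor rounding (the defaults).
    return (size + 2 * padding - kernel) // stride + 1


def out_shape(dhw, kernel, stride, padding=(0, 0, 0)):
    # Apply out_size independently to depth, height and width.
    return tuple(out_size(n, k, s, p)
                 for n, k, s, p in zip(dhw, kernel, stride, padding))


# Input from the example above: [1, 3, 75, 288, 360] -> (D, H, W) = (75, 288, 360)
dhw = (75, 288, 360)

# ConvBlock: kernel (3, 5, 5), stride (3, 3, 3), padding (1, 0, 0)
after_conv1 = out_shape(dhw, (3, 5, 5), (3, 3, 3), (1, 0, 0))

# Both branches of DepthConcatBlock use kernel (1, 3, 3), stride (1, 2, 2),
# no padding, so they must agree for torch.cat along dim=1 to work.
branch1 = out_shape(after_conv1, (1, 3, 3), (1, 2, 2))
branch2 = out_shape(after_conv1, (1, 3, 3), (1, 2, 2))

# Final ConvBlock, same hyperparameters as the first one.
after_conv2 = out_shape(branch1, (3, 5, 5), (3, 3, 3), (1, 0, 0))

print(after_conv1, branch1, branch2, after_conv2)
```

The two branches come out with identical spatial sizes, which matches the fact that the forward pass and the CPU backward pass both succeed.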

Hmm, this looks weird. Could you open an issue in the PyTorch repo?
I had a quick look at the underlying C code, and it does make the required tensors contiguous, so I am not sure what is causing this.
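
In the meantime, one thing that might be worth trying is forcing contiguity around the pooling layer yourself. This is only a sketch: it assumes the offending tensor is either the input to AvgPool3d or the gradient that torch.cat hands back to it, which I have not confirmed, and I have not run it on 0.2.0 with CUDA. The class name ContiguousGradAvgPool3d is made up for illustration and is not part of PyTorch.

```python
import torch
from torch import nn


class ContiguousGradAvgPool3d(nn.Module):
    """Hypothetical wrapper (not a PyTorch class) around nn.AvgPool3d.

    It makes the input contiguous before pooling and makes the incoming
    gradient contiguous before it reaches the pooling backward pass.
    """

    def __init__(self, kernel_size, stride=None):
        super(ContiguousGradAvgPool3d, self).__init__()
        self.pool = nn.AvgPool3d(kernel_size=kernel_size, stride=stride)

    def forward(self, x):
        # Assumption: a non-contiguous input could be the trigger.
        out = self.pool(x.contiguous())
        # Assumption: the gradient slice coming back from torch.cat could
        # be non-contiguous. A hook may return a replacement gradient.
        if out.requires_grad:
            out.register_hook(lambda grad: grad.contiguous())
        return out
```

To try it, replace the AvgPool3d in DepthConcatBlock with ContiguousGradAvgPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2)). If the error persists, then the non-contiguous tensor is created somewhere else and this workaround does not apply.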

I’ve opened an issue: https://github.com/pytorch/pytorch/issues/2996