MaxPool3d raises 'RuntimeError: Can only downcast contiguous tensors' after deconvolution and permute [.contiguous() applied]

Running the following code raises "RuntimeError: Can only downcast contiguous tensors".
It only happens in CUDA mode: the same pattern works with conv2d, and it works in CPU mode. Tested on 0.2.0 and 0.3.0.

```python
from __future__ import print_function
from __future__ import division
# torch libs
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
# other libs
import numpy as np

# Works (CPU, conv2d)
corr = Variable(torch.FloatTensor(np.ones((128,5,7,7))))
f = Variable(torch.FloatTensor(np.ones((8,5,3,3))))
out = F.conv2d(corr,f)
print('conv',out.size())
out = out.view((128,8,1,5,5)).permute(0,2,1,3,4).contiguous()
x,i = F.max_pool3d(out,(8,1,1),stride=(8,1,1),return_indices=True)
print('conv out',out.size())

# Fails (CUDA, conv_transpose2d)
corr = Variable(torch.FloatTensor(np.ones((128,5,3,3)))).cuda()
f = Variable(torch.FloatTensor(np.ones((5,8,3,3)))).cuda()
out = F.conv_transpose2d(corr,f)
print('deconv',out.size())
out = out.view((128,8,1,5,5)).permute(0,2,1,3,4).contiguous()
x,i = F.max_pool3d(out,(8,1,1),stride=(8,1,1),return_indices=True)
print('deconv out',out.size())
```
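For comparison, here is a minimal sketch of the failing case rewritten for the plain tensor API of PyTorch 0.4 and later (no `Variable`), which runs on CUDA when available and prints the layout of the tensor handed to `max_pool3d`. One thing worth checking: PyTorch's contiguity test skips size-1 dimensions, so `.contiguous()` may return the permuted tensor unchanged, with a stride on the size-1 dimension that differs from a freshly allocated tensor. Whether that is what trips the CUDA kernel in 0.2.0/0.3.0 is an assumption, not something confirmed here, and the helper name `pooled_after_deconv` is hypothetical.

```python
import torch
import torch.nn.functional as F


def pooled_after_deconv(device):
    """Hypothetical helper: the failing case above, using plain tensors."""
    corr = torch.ones(128, 5, 3, 3, device=device)
    f = torch.ones(5, 8, 3, 3, device=device)  # (in_channels, out_channels, kH, kW)
    out = F.conv_transpose2d(corr, f)          # -> (128, 8, 5, 5)
    out = out.view(128, 8, 1, 5, 5).permute(0, 2, 1, 3, 4).contiguous()
    # Inspect the layout that max_pool3d receives; .contiguous() may be a
    # no-op here because only a size-1 dimension has an unusual stride.
    print("is_contiguous:", out.is_contiguous(), "stride:", out.stride())
    x, i = F.max_pool3d(out, (8, 1, 1), stride=(8, 1, 1), return_indices=True)
    return out, x, i


device = "cuda" if torch.cuda.is_available() else "cpu"
out, x, i = pooled_after_deconv(device)
print("pooled:", tuple(x.shape))
```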

Is there a way to work around this?
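Two possible workarounds, sketched under assumptions that the issue itself does not confirm. Option 1 assumes the problem is tied to the size-1 dimension created by `view` + `permute`: since moving a size-1 dimension does not change element order, a single `view(128, 1, 8, 5, 5)` gives the same values with default strides. Option 2 avoids `max_pool3d` entirely by taking `torch.max` over the depth dimension and rebuilding `max_pool3d`-style flat indices (`d * H * W + h * W + w`), so it should not depend on the cause. Random inputs are used so the per-pixel maxima are unique and the indices can be compared.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Random inputs so that the maxima (and thus the indices) are unique.
corr = torch.randn(128, 5, 3, 3, device=device)
f = torch.randn(5, 8, 3, 3, device=device)
out = F.conv_transpose2d(corr, f)  # (128, 8, 5, 5)

# Layout used in the issue: (128, 1, 8, 5, 5) via a size-1 dim and permute.
permuted = out.view(128, 8, 1, 5, 5).permute(0, 2, 1, 3, 4).contiguous()

# Option 1: moving a size-1 dim does not reorder elements, so one view
# yields the same values with freshly computed contiguous strides.
direct = out.view(128, 1, 8, 5, 5)
assert torch.equal(direct, permuted)
x1, i1 = F.max_pool3d(direct, (8, 1, 1), stride=(8, 1, 1), return_indices=True)

# Option 2: skip max_pool3d and reduce over the depth dim (size 8) directly.
x2, d_idx = permuted.max(dim=2, keepdim=True)  # both (128, 1, 1, 5, 5)
# Rebuild max_pool3d-style indices into each flattened (D, H, W) volume:
# index = d * H * W + h * W + w
H, W = 5, 5
hs = torch.arange(H, device=device).view(1, 1, 1, H, 1)
ws = torch.arange(W, device=device).view(1, 1, 1, 1, W)
i2 = d_idx * (H * W) + hs * W + ws
```

Option 2 is more code, but it only relies on `torch.max` over a dimension, which does not care how the input is strided.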

Relevant issue for reference: https://github.com/pytorch/pytorch/issues/4835