Cuda compilation tools, release 8.0, V8.0.61
ts1 size: torch.Size([16, 1, 441])
ts2 Size: torch.Size([16, 441, 10])
I also tried the following snippet in place of torch.bmm and torch.matmul, but got the same error:
B, H, W = batch_kernel.size()
ts1 = batch_kernel.view((B, 1, H * W))
ts2 = self.weight.expand((B, ) + self.size)
s1,s2,s3 = ts2.size()
#ts3 = torch.bmm(ts1, ts2)
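out = torch.Tensor(B, s3)
for i, batch_v in enumerate(ts1):
    # batch_v is (1, H*W) and ts2[i] is (H*W, s3), so the product is (1, s3)
    out[i] = (batch_v @ ts2[i]).squeeze(0)
return out  # ts3 is never assigned here, since the bmm call is commented out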
However, when I run the same project on PyTorch >= 1.0, torch.bmm works fine, but the final results are worse with the newer version.
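To help separate a shape problem from a CUDA or version problem, the same computation can be run on the CPU with random stand-in tensors. This is only a minimal sketch: the sizes come from the printed shapes above, `weight` is a hypothetical placeholder for self.weight, and it assumes a PyTorch version that provides torch.empty (0.4 or later). It checks that the per-sample loop and torch.bmm give the same result.

```python
import torch

torch.manual_seed(0)
B, K, N = 16, 441, 10  # sizes taken from the printed shapes above

# Random stand-ins (assumptions, not the real data):
# ts1 plays the role of batch_kernel.view((B, 1, H * W)),
# weight plays the role of self.weight before expand.
ts1 = torch.randn(B, 1, K)
weight = torch.randn(1, K, N)
ts2 = weight.expand(B, K, N)

# Batched product: (B, 1, K) x (B, K, N) -> (B, 1, N)
ts3 = torch.bmm(ts1, ts2)
bmm_out = ts3.view(B, -1)

# Per-sample loop computing the same thing one batch element at a time
out = torch.empty(B, N)
for i in range(B):
    out[i] = (ts1[i] @ ts2[i]).squeeze(0)

# Largest elementwise difference between the two approaches
max_diff = (bmm_out - out).abs().max().item()
print(bmm_out.shape, max_diff)
```

If the two outputs agree on the CPU but the original code still fails on the GPU, the shapes are probably not the cause, and the CUDA 8.0 / old PyTorch combination becomes the more likely suspect.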