Libtorch bmm returns 0s on device

When using bmm, it outputs the correct result on the CPU, but always returns 0s on the GPU, which is not expected. The code is as follows:
#include <torch/torch.h>
#include <iostream>

void test_bmm() {
    // Batched matrix multiply on the CPU: works as expected.
    auto t1 = torch::randn({1, 3, 4});
    auto t2 = torch::randn({1, 4, 2});
    auto r = torch::bmm(t1, t2);
    std::cout << r << std::endl; // correct result

    // Same computation on the GPU: always returns zeros.
    t1 = t1.cuda();
    t2 = t2.cuda();
    auto r2 = torch::bmm(t1, t2);
    std::cout << r2 << std::endl; // always 0s
    /*
    (1,.,.) =
      0  0
      0  0
      0  0
    [ CUDAFloatType{1,3,2} ]
    */
}

I'd like to know what's wrong with it.
Thanks!

Were you able to use the GPU in libtorch before, and if so, did you change anything in your setup (e.g. a driver update)?
Also, are you able to use the Python frontend properly, or does it also return invalid values?
Which PyTorch version and device are you using?
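
As a quick sanity check, something like this (a minimal sketch, assuming a standard libtorch CUDA build) would tell us whether the CUDA backend is reachable at all from your C++ code, independent of bmm:

#include <torch/torch.h>
#include <iostream>

void check_cuda() {
    // Can libtorch see a CUDA device at all?
    std::cout << "CUDA available: " << torch::cuda::is_available() << std::endl;
    std::cout << "Device count:   " << torch::cuda::device_count() << std::endl;

    // A trivial elementwise op on the GPU; if this already prints zeros,
    // the problem is not specific to bmm.
    auto a = torch::ones({2, 2}, torch::kCUDA);
    auto b = torch::full({2, 2}, 3.0, torch::kCUDA);
    std::cout << (a + b) << std::endl; // expected: all 4s
}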

I can use the GPU for other workloads, such as VPF and GPU path tracing (they are in the same project as libtorch). The driver hasn't been updated.
It's a pure C++ environment, so I haven't tried the Python frontend.
libtorch version: 1.8.1, device: 1080Ti.
Also, bmm works fine in another environment, which has a different driver version and doesn't include the GPU path tracing code.
It also fails for tensor add etc.
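
For example, a minimal add test along these lines (a simplified sketch, not the exact project code) already shows the problem here:

void test_add() {
    auto a = torch::ones({2, 2});
    auto b = torch::ones({2, 2});
    std::cout << (a + b) << std::endl;                          // CPU: all 2s, as expected

    auto r = a.cuda() + b.cuda();
    std::cout << r.cpu() << std::endl;                          // here: all 0s instead of 2s
    std::cout << torch::allclose(r.cpu(), a + b) << std::endl;  // here: 0 (false)
}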

Thanks for the update.
Given that you are using 1.8.1 and a 1080Ti (sm_61), I think you are hitting this issue, which is already fixed in later releases. Could you update libtorch and rerun your workload?
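
If you want to double-check which architecture your build sees, a quick query like this could confirm the sm_61 part (a small sketch using ATen's device-properties helper; I'm assuming your build has the ATen CUDA headers on the include path, so the exact include may differ):

#include <ATen/cuda/CUDAContext.h>
#include <iostream>

void print_compute_capability() {
    // Query the properties of the current CUDA device through ATen.
    const cudaDeviceProp* prop = at::cuda::getCurrentDeviceProperties();
    // A 1080Ti should report 6.1 here (i.e. sm_61).
    std::cout << prop->name << ": sm_" << prop->major << prop->minor << std::endl;
}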

It works with libtorch 1.9.0 now. Thanks