Libtorch C++ GPU forward fails, but CPU succeeds

I installed PyTorch from source on Windows 10 (CUDA 10.0, VS 2017, RTX 2080) in order to get libtorch.
Although libtorch C++ works successfully on the CPU, it crashes when running on the GPU.
It loads the model successfully, but crashes when executing the forward() function.

To Reproduce

Compile and execute the following code; the error occurs:

#include <torch/script.h> // One-stop header.

#include <cassert>
#include <iostream>
#include <memory>
#include <vector>

int main(int argc, const char* argv[]) {
  std::cout << "ok 1\n";
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }
  std::cout << "ok 2\n" << argv[1] << std::endl;

  // Deserialize the ScriptModule from a file using torch::jit::load().
  std::shared_ptr<torch::jit::script::Module> module = torch::jit::load(argv[1]);
  std::cout << "ok 3\n";
  assert(module != nullptr);
  std::cout << "ok\n";

  // Create a vector of CPU inputs.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));
  std::cout << "ok 4\n";

  // Execute the model on the CPU and turn its output into a tensor.
  auto output = module->forward(inputs).toTensor();
  std::cout << "ok 5\n";
  std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

  // Create a vector of CUDA inputs.
  std::cout << "ok 6\n";
  std::vector<torch::jit::IValue> inputs2;
  inputs2.push_back(torch::ones({1, 3, 224, 224}).to(at::kCUDA));
  std::cout << "ok 7\n";

  // Execute the model on the GPU and turn its output into a tensor.
  auto output2 = module->forward(inputs2).toTensor();
  std::cout << "ok 8\n";
  std::cout << output2.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';

  return 0;
}



Expected behavior

The program should print "ok 8".

However, "ok 8" did not appear; the program crashed after "ok 7":

λ .\predict.exe
ok 1
ok 2
ok 3
ok 4
ok 5
-0.4035 0.4926 -0.0028 0.0435 -0.0177
[ Variable[CPUFloatType]{1,5} ]
ok 6
ok 7



  • PyTorch Version (e.g., 1.0): pytorch master 1.0.1
  • OS (e.g., Linux): win10
  • How you installed PyTorch ( conda , pip , source): source
  • Build command you used (if compiling from source):
  • Python version: python 3.6.6
  • CUDA/cuDNN version: CUDA 10.0
  • GPU models and configuration: RTX 2080
  • Any other relevant information: VS 2017

Additional context

I compiled the newest PyTorch source and built in the Release x64 configuration.
Can anybody help me? I am very confused!
Thanks very much!

I am facing the same problem now. Do you have any solution yet?
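One thing that may be worth checking (an assumption based on the code above, not confirmed by the logs): `torch::jit::load` leaves the module's parameters on the CPU, so the second forward call feeds a CUDA input into a CPU-resident model. If your libtorch build provides `script::Module::to`, moving the module to the GPU before the CUDA forward may fix the crash. A minimal sketch:

```cpp
#include <torch/script.h>

#include <iostream>
#include <memory>
#include <vector>

int main(int argc, const char* argv[]) {
  if (argc != 2) {
    std::cerr << "usage: example-app <path-to-exported-script-module>\n";
    return -1;
  }

  std::shared_ptr<torch::jit::script::Module> module = torch::jit::load(argv[1]);

  // Move the module's parameters and buffers to the GPU
  // *before* running a forward pass with CUDA inputs.
  module->to(at::kCUDA);

  // Build the input on the same device as the model.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}).to(at::kCUDA));

  auto output = module->forward(inputs).toTensor();
  std::cout << output.slice(/*dim=*/1, /*start=*/0, /*end=*/5) << '\n';
  return 0;
}
```

This requires a CUDA-enabled libtorch build and an available GPU, so it cannot be verified here; treat it as a starting point rather than a confirmed fix.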