PyTorch vs CNTK on Windows

Hi all,

I am working on a previously developed system on Windows that uses CNTK from C++ code at inference time. I am considering switching it to PyTorch.

  • Is there any performance comparison on Windows between CNTK and PyTorch (C++)?
  • Is PyTorch reliably tested on Windows, especially when using traced networks in C++?

Note: I really cannot move away from Windows due to other issues.

Thank you

I’m not sure if there is a performance comparison.
We also run the CI tests on Windows machines, so it should be covered.
@peterjc123 might chime in on this topic as the expert. :wink:


I haven’t used CNTK either. As for traced networks, they should be covered by the CI tests.


Thank you.

I ran a test on Windows comparing C++ inference time between PyTorch and CNTK. Unfortunately, CNTK seems to be faster! I simply create a tensor and pass it through a traced network. I tested resnet50, resnet18, and vgg16. Loading the model takes MUCH longer with PyTorch, but that is OK. The forward pass is also slower with PyTorch!
My GPU is a GTX 1050.
This is the code:

#include <torch/script.h>

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

int num = 1000;
std::string smodel = "address/resnet18_cuda_trace.pt";
// Load the traced module once, outside the timed loop.
torch::jit::script::Module module = torch::jit::load(smodel);

at::Tensor out;
auto start = std::chrono::high_resolution_clock::now();

for (int times = 0; times < num; times++) {
    // Dummy 1x3x224x224 input created directly on the GPU.
    auto ten = torch::ones({ 1, 3, 224, 224 }, torch::kCUDA);
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(ten);
    // forward() takes the vector of IValues directly; copying the result
    // back to the CPU also synchronizes with the GPU.
    out = module.forward(inputs).toTensor().to(torch::kCPU);
}

auto finish = std::chrono::high_resolution_clock::now();
auto msSinceStart = std::chrono::duration_cast<std::chrono::milliseconds>(finish - start).count();

std::cout << msSinceStart << "  " << out.sizes() << std::endl;

Is there anything I can do to make this simple code faster?

Did you use the same libs (CUDA, cudnn, etc.) for both frameworks?
Depending on your workload and the libraries used, you might get different results.

Also, how large is the difference, and for which models?
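
One more unofficial thing worth trying: make sure the one-time CUDA/cuDNN initialization and the autograd overhead aren’t counted inside the timed loop. This is just a rough sketch of what I mean, reusing module and num from your snippet (and assuming <torch/torch.h> is included so torch::NoGradGuard is available):

// Disable autograd bookkeeping, since we only run inference here.
// (assumes #include <torch/torch.h> for torch::NoGradGuard)
torch::NoGradGuard no_grad;

// A few untimed warm-up iterations so cuDNN algorithm selection and other
// one-time initialization don't end up inside the measurement.
auto input = torch::ones({ 1, 3, 224, 224 }, torch::kCUDA);
for (int warmup = 0; warmup < 10; warmup++) {
    module.forward({ input });
}

auto start = std::chrono::high_resolution_clock::now();
at::Tensor out;
for (int times = 0; times < num; times++) {
    // The copy back to the CPU also synchronizes with the GPU.
    out = module.forward({ input }).toTensor().to(torch::kCPU);
}
auto finish = std::chrono::high_resolution_clock::now();

std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(finish - start).count()
          << " ms for " << num << " iterations, output " << out.sizes() << std::endl;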

If you’re only doing inference, you could also consider exporting your model via ONNX and running it with ONNX Runtime. If you’re tightly integrated with Windows, you might benefit from the WinRT APIs distributed as part of the OS.
You’ll probably want to run some benchmarks for your use case, but I’ve found ORT to generally have excellent performance even without tuning. PyTorch for training+development and ORT for deployment is a pretty good combo :slight_smile:
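
For reference, this is roughly what a minimal ONNX Runtime C++ call could look like, assuming the model has already been exported to ONNX beforehand; the file name and the "input"/"output" tensor names below are placeholders that depend on how you export it:

#include <onnxruntime_cxx_api.h>

#include <vector>

// Hypothetical model file produced by an earlier ONNX export of the traced net.
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet18");
Ort::SessionOptions opts;
Ort::Session session(env, L"resnet18.onnx", opts);  // wide-char path on Windows

// Dummy 1x3x224x224 input filled with ones, matching the benchmark above.
std::vector<float> data(1 * 3 * 224 * 224, 1.0f);
std::vector<int64_t> shape{ 1, 3, 224, 224 };
Ort::MemoryInfo mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value input = Ort::Value::CreateTensor<float>(
    mem, data.data(), data.size(), shape.data(), shape.size());

// "input" / "output" are whatever names were given at export time.
const char* input_names[] = { "input" };
const char* output_names[] = { "output" };
auto outputs = session.Run(Ort::RunOptions{ nullptr },
                           input_names, &input, 1,
                           output_names, 1);
float* logits = outputs.front().GetTensorMutableData<float>();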
