Hi guys,
I have been trying to copy a tensor from the CPU to the GPU with `non_blocking`, following the tutorial (Tensor Creation API — PyTorch main documentation), like this:
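#include <torch/torch.h>
#include <chrono>
#include <iostream>

using namespace std::chrono;

int main()
{
    auto t0 = high_resolution_clock::now();
    // Create a random float tensor on the CPU.
    torch::Tensor cpu_tensor = torch::randn({128, 3, 24, 24}, torch::kFloat);
    // Copy it to the GPU, requesting a non-blocking transfer.
    torch::Tensor gpu_tensor = cpu_tensor.to(torch::kCUDA, /*non_blocking=*/true);
    auto t1 = high_resolution_clock::now();
    // Elapsed time measured in microseconds, printed in seconds.
    std::cout << "Time Taken: " << static_cast<double>(duration_cast<microseconds>(t1 - t0).count()) / 1e6 << std::endl;
    return 0;
}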
Result: "Time Taken: 0.08" (seconds)
That is very slow compared with the same operation in Python, which takes 0.0001 seconds. Can anyone tell me whether I am doing something wrong or using the API incorrectly? Thanks in advance for your answers, guys.