Hello all!
I have written a program to test a '.pt' module with libtorch,
but I found that a single inference takes about 360 ms. The core code is:
auto startTime = std::chrono::high_resolution_clock::now();
at::Tensor result = m_module.forward({tensor}).toTensor();
at::cuda::CUDAStream stream = at::cuda::getCurrentCUDAStream();
AT_CUDA_CHECK(cudaStreamSynchronize(stream));
auto endTime = std::chrono::high_resolution_clock::now();
float totalTime = std::chrono::duration<float, std::milli>(endTime - startTime).count();
printf("(%s) >>> infer cost time = %.3f ms\n", m_name.c_str(), totalTime);
return result;
Output:
>>> infer cost time = 364.698 ms
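One thing that may be worth checking: the code above times a single call, and the first few GPU calls typically include one-time costs such as CUDA context setup and kernel or algorithm selection. A minimal sketch of a timing helper that discards some warm-up runs and averages the rest is below. The name `averageMillis` and its parameters are hypothetical and not part of libtorch; the callable passed in is assumed to block until the GPU work is done (for example by calling `cudaStreamSynchronize` as in the code above).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not a libtorch API): runs `fn` `warmupIters` times
// without timing, then returns the mean wall-clock time per call in
// milliseconds over `timedIters` further calls.
// `fn` must block until its work is complete, otherwise only the launch
// overhead is measured.
inline double averageMillis(const std::function<void()>& fn,
                            int warmupIters, int timedIters) {
    assert(fn && "fn must be a callable");
    for (int i = 0; i < warmupIters; ++i) {
        fn();  // untimed warm-up runs absorb one-time initialization costs
    }
    if (timedIters <= 0) {
        return 0.0;  // nothing to average
    }
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < timedIters; ++i) {
        fn();
    }
    auto end = std::chrono::steady_clock::now();
    double total =
        std::chrono::duration<double, std::milli>(end - start).count();
    return total / timedIters;
}

// Assumed usage with the code from this post (needs libtorch and CUDA):
//   double ms = averageMillis([&] {
//       m_module.forward({tensor});
//       AT_CUDA_CHECK(cudaStreamSynchronize(at::cuda::getCurrentCUDAStream()));
//   }, 10, 100);
```

If the averaged time is far below 360 ms, the single measurement was probably dominated by start-up cost rather than by the model itself.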
My computer is an HP with an RTX 3070.
System: Ubuntu 18.04 with Linux kernel 5.4.0-99-generic
libtorch: 1.8.0+cu111
nvidia-driver: 470.74
cuda-runtime: 11.1
cuda-driver: 11.4
GPU info:
Total amount of Global Memory: 4051501056 bytes
Number of SMs: 40
Total amount of Constant Memory: 65536 bytes
Total amount of Shared Memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per SM: 1536
Maximum number of threads per block: 1024
Maximum size of each dimension of a block: 1024 x 1024 x 64
Maximum size of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 32 bytes
Clock rate: 1.29 GHz
Memory Clock rate: 6001 MHz
Memory Bus Width: 256-bit
nvidia-smi output:
Thu Feb 10 17:45:28 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   45C    P8    16W /  N/A |    603MiB /  7959MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
Could anyone help me?