I measured data transfer speeds in both directions, CPU to GPU and GPU to CPU, and found a significant difference between the two.
PCIe: Speed 8GT/s, Width x8
result:
CPU->GPU:5.30 GB/s
GPU->CPU:0.93 GB/s
What is causing this, and how can I speed up transfers from GPU to CPU?
My test code is here:
import torch
import time
tensor_size = 1024 * 32 # 32768 x 32768 float32 elements = 4 GiB
dtype = torch.float32
tensor = torch.randn((tensor_size, tensor_size), dtype=dtype, device='cuda')
target_tensor = torch.zeros((tensor_size, tensor_size), dtype=dtype, device='cpu')
torch.cuda.synchronize()
start_time = time.time()
num_iterations = 10
for _ in range(num_iterations):
    tensor.cpu()
torch.cuda.synchronize()
end_time = time.time()
total_data_transferred_gb = (num_iterations * tensor_size * tensor_size * tensor.element_size()) / (1024**3)
average_bandwidth_gb_per_s = total_data_transferred_gb / (end_time - start_time)
print(f"Average bandwidth: {average_bandwidth_gb_per_s:.2f} GB/s")
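One possible explanation, offered as a sketch rather than a confirmed diagnosis: `tensor.cpu()` allocates a fresh pageable host tensor on every iteration, so the timing likely includes host allocation and page faults in addition to the PCIe copy, and the preallocated CPU tensor is never used. Copying into a preallocated pinned (page-locked) buffer with `copy_` usually gives a fairer measurement. The helper names `bandwidth_gib_per_s` and `time_d2h` below are hypothetical, and the tensor is smaller than in the test above (256 MiB instead of 4 GiB) purely to keep the pinned allocation modest.

```python
import time

import torch


def bandwidth_gib_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GiB/s."""
    return num_bytes / (1024 ** 3) / seconds


def time_d2h(src: torch.Tensor, dst: torch.Tensor, num_iterations: int = 10) -> float:
    """Time repeated GPU->CPU copies into a preallocated host buffer."""
    dst.copy_(src)  # warm-up copy, excluded from the timing
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(num_iterations):
        # Reuse the same destination so no host allocation is timed.
        dst.copy_(src, non_blocking=True)
    torch.cuda.synchronize()  # wait for all copies before stopping the clock
    elapsed = time.perf_counter() - start
    total_bytes = num_iterations * src.numel() * src.element_size()
    return bandwidth_gib_per_s(total_bytes, elapsed)


if __name__ == "__main__" and torch.cuda.is_available():
    n = 1024 * 8  # 8192 x 8192 float32 = 256 MiB (assumed size, smaller than the original)
    src = torch.randn((n, n), dtype=torch.float32, device="cuda")
    pageable = torch.empty((n, n), dtype=torch.float32, device="cpu")
    pinned = torch.empty((n, n), dtype=torch.float32, device="cpu", pin_memory=True)
    print(f"pageable destination: {time_d2h(src, pageable):.2f} GiB/s")
    print(f"pinned destination:   {time_d2h(src, pinned):.2f} GiB/s")
```

If the pinned number comes out close to the CPU->GPU figure, the gap in the original test was probably host-side allocation overhead rather than the PCIe link itself.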