I am moving from CUDA C to PyTorch to do high-performance parallel computing.
If I need to add a constant to each slice of a tensor, the following way certainly works:
t = torch.ones(10, 10000, 1000)
k_a = range(10)
for i in range(10):
    t[i] = t[i] + k_a[i]
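For context, I believe the loop could also be written as a single broadcasted addition (smaller sizes here just for illustration); is this the recommended way?

```python
import torch

# Smaller sizes than my real workload, just for a quick demo.
t = torch.ones(10, 100, 50)

# Put the per-slice constants in a tensor with the same dtype as t,
# shaped (10, 1, 1) so it broadcasts across the trailing dimensions.
k_a = torch.arange(10, dtype=t.dtype).view(10, 1, 1)

# One vectorized op instead of a Python loop; it runs on whatever
# device holds the tensors (e.g. after t = t.cuda(); k_a = k_a.cuda()).
t = t + k_a
```
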
But to achieve better performance:
Do I need to copy k_a to GPU memory first?
Can I copy k_a to GPU constant memory?
Or is there anything else I can do to improve it?
Thanks,