Dear friends,
I am using PyTorch mainly for linear algebra tasks, and I have a question about transferring sections of a tensor from RAM to GPU RAM. For example:
```python
flag = 2  # start by loading the first chunk into tmp1
for i in range(40):
    if flag == 2:
        tmp1[:] = A[i*1000:(i+1)*1000, :]
        flag = 1
    elif flag == 1:
        c = c + tmp1[:]
        tmp2[:] = A[i*1000:(i+1)*1000, :]
        flag = 0
    elif flag == 0:
        c = c + tmp2[:]
        tmp1[:] = A[i*1000:(i+1)*1000, :]
        flag = 1
```
The tensor A is very large and lives in RAM; tmp1, tmp2 and c live in GPU memory. I want to load parts of A to the GPU and do calculations with them; as you can see, I am trying to split the calculation part from the loading part. My question is: does this happen asynchronously automatically, or do I need additional configuration?
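For context, my understanding is that host-to-device copies can overlap with compute only when the source lies in pinned (page-locked) host memory and the copy is issued with `non_blocking=True` on a separate CUDA stream. Here is a minimal double-buffered sketch of the pattern above written that way; the sizes and the per-chunk accumulation are illustrative, not my real workload, and the CPU fallback exists only so the snippet runs anywhere:

```python
import torch

# Illustrative sizes; the real A would be much larger.
n_chunks, chunk, cols = 40, 1000, 64
use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

A = torch.randn(n_chunks * chunk, cols)   # big tensor in host RAM
if use_cuda:
    A = A.pin_memory()                    # async H2D copies need page-locked memory

c = torch.zeros(chunk, cols, device=device)
bufs = [torch.empty(chunk, cols, device=device) for _ in range(2)]
copy_stream = torch.cuda.Stream() if use_cuda else None

def load(i, buf):
    """Copy chunk i of A into buf, on the copy stream when CUDA is available."""
    src = A[i * chunk:(i + 1) * chunk]
    if use_cuda:
        # the buffer may still be read by compute from a previous iteration
        copy_stream.wait_stream(torch.cuda.current_stream())
        with torch.cuda.stream(copy_stream):
            buf.copy_(src, non_blocking=True)
    else:
        buf.copy_(src)

load(0, bufs[0])                          # prefetch the first chunk
for i in range(n_chunks):
    if use_cuda:
        # compute must not start before this buffer's copy has finished
        torch.cuda.current_stream().wait_stream(copy_stream)
    if i + 1 < n_chunks:
        load(i + 1, bufs[(i + 1) % 2])    # this copy overlaps the compute below
    c = c + bufs[i % 2]

if use_cuda:
    torch.cuda.synchronize()
```

Without pinned memory, `copy_` and `.to()` fall back to synchronous transfers even with `non_blocking=True`; and if everything runs on the default stream, the copies and kernels are serialized in issue order, so no overlap happens.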