Clarification on at::parallel_for needed

On slide 43 of this document, I read that it is recommended to use at::parallel_for instead of OpenMP pragmas.

In another post here, the individual elements of the tensor are accessed via operator[], e.g.

torch::Tensor z_out = at::empty({z.size(0), z.size(1)}, z.options());
int64_t batch_size = z.size(0);

at::parallel_for(0, batch_size, 0, [&](int64_t start, int64_t end) {
  for (int64_t b = start; b < end; b++) {
    z_out[b] = z[b] * z[b];
  }
});

Is this the right way to do it, or should one still use a tensor accessor (even when using at::parallel_for)?
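For reference, here is my own sketch of what I assume the accessor variant would look like (the function name square_rows is mine, and I am assuming z is a contiguous 2-D float CPU tensor):

#include <torch/torch.h>
#include <ATen/Parallel.h>

torch::Tensor square_rows(const torch::Tensor& z) {
  torch::Tensor z_out = at::empty({z.size(0), z.size(1)}, z.options());
  // Accessors give direct element access without going through operator[],
  // which constructs a new Tensor object on every indexing call.
  auto z_acc = z.accessor<float, 2>();
  auto out_acc = z_out.accessor<float, 2>();
  const int64_t batch_size = z.size(0);
  const int64_t row_size = z.size(1);

  at::parallel_for(0, batch_size, 0, [&](int64_t start, int64_t end) {
    for (int64_t b = start; b < end; b++) {
      for (int64_t i = 0; i < row_size; i++) {
        out_acc[b][i] = z_acc[b][i] * z_acc[b][i];
      }
    }
  });
  return z_out;
}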

Thanks! Let me ask two follow-up questions:

  1. Is there a recommended way to implement nested loops with at::parallel_for? With OpenMP you can (if the logic of the loops allows for it) use the collapse(2) clause; see the sketch after this list for the kind of pattern I mean.
  2. Is there a similar approach for tensors residing in GPU memory, or do I need to implement a CUDA kernel?
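To make question 1 concrete, here is the kind of pattern I have in mind, again just my own sketch under the same 2-D float assumption (the function names and the flattened-index idea are my guesses, not something I found in the docs):

#include <ATen/ATen.h>
#include <ATen/Parallel.h>

// OpenMP: the two perfectly nested loops are merged into one iteration space.
void square_omp(const at::Tensor& z, at::Tensor& z_out) {
  auto z_acc = z.accessor<float, 2>();
  auto out_acc = z_out.accessor<float, 2>();
  const int64_t B = z.size(0), N = z.size(1);
  #pragma omp parallel for collapse(2)
  for (int64_t b = 0; b < B; b++) {
    for (int64_t i = 0; i < N; i++) {
      out_acc[b][i] = z_acc[b][i] * z_acc[b][i];
    }
  }
}

// My naive guess for at::parallel_for: flatten (b, i) into a single index
// range by hand and split it back inside the loop body.
void square_parallel_for(const at::Tensor& z, at::Tensor& z_out) {
  auto z_acc = z.accessor<float, 2>();
  auto out_acc = z_out.accessor<float, 2>();
  const int64_t B = z.size(0), N = z.size(1);
  at::parallel_for(0, B * N, 0, [&](int64_t start, int64_t end) {
    for (int64_t k = start; k < end; k++) {
      const int64_t b = k / N;  // recover the outer index
      const int64_t i = k % N;  // recover the inner index
      out_acc[b][i] = z_acc[b][i] * z_acc[b][i];
    }
  });
}

Is flattening the index range by hand like this the intended approach, or is there a better way?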

Many thanks in advance.