Should we set non_blocking to True?

Sorry for reviving an old thread, but I found your statement interesting.

Without any in-depth knowledge of how GPUs work, I assume that this means: the data transfer to the GPU can happen independently of the computation (i.e., tensor transformations), which is why non_blocking=True is a good option.

If, however, we wanted to do something that changes the data itself, say normalize it along some dimension, then it's not really going to help, because the updated data will have to be ready before the output = model(data) call.

Is this understanding of mine largely correct?

If I understand your description correctly, then yes, your general understanding is correct.
An asynchronous operation lets you execute other work in the meantime while the async operation (e.g., the host-to-device copy) runs in the background. If there is a data dependency between the two tasks, the data-dependent operation has to wait for the copy to finish. Also note that a host-to-device copy with non_blocking=True can only be truly asynchronous if the source CPU tensor is in pinned (page-locked) memory.
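As a loose analogy (plain Python, not actual CUDA code: a background thread stands in for the GPU copy engine, and the dict and event below are made-up names for illustration), the difference between independent work and data-dependent work looks like this:

```python
import threading
import time

def async_copy(buf, done_event):
    """Simulates a non-blocking host-to-device copy running in the background."""
    time.sleep(0.05)            # pretend the DMA transfer takes a while
    buf["on_device"] = True
    done_event.set()

data = {"on_device": False}
copy_done = threading.Event()

# Kick off the "copy" without waiting for it (like non_blocking=True).
threading.Thread(target=async_copy, args=(data, copy_done)).start()

# Independent work: overlaps with the copy, no need to wait.
independent_result = sum(i * i for i in range(1000))

# Data-dependent work: must wait until the copy has finished,
# just like `output = model(data)` must wait for `data` to arrive.
copy_done.wait()
assert data["on_device"]
```

The independent computation runs while the "copy" is still in flight; only the step that actually consumes the transferred data has to block on it.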


Hello! How do I use non_blocking=true in C++ with libtorch?

The same methods should also accept the non_blocking argument e.g. as seen in:

Module::to(at::Device device, at::ScalarType dtype, bool non_blocking)

Thanks, Patrick, for the example! I tried this, and it seems to work:

  int height = 400;
  int width = 400;
  std::vector<int64_t> dims = {1, height, width, 3};
  // Create the tensor on the CPU in pinned (page-locked) memory:
  // non_blocking only enables a truly asynchronous copy for pinned
  // host-to-device transfers. (Creating it directly on kCUDA would
  // make the .to(torch::kCUDA, ...) call below a no-op.)
  auto options = torch::TensorOptions()
                     .dtype(torch::kUInt8)
                     .device(torch::kCPU)
                     .pinned_memory(true)
                     .requires_grad(false);
  torch::Tensor tensor_image_style2 = torch::zeros(dims, options);
  bool non_blocking = true;
  bool copy_flag = false;
  tensor_image_style2 = tensor_image_style2.to(torch::kCUDA, non_blocking, copy_flag);