Run backward on a different CUDA stream than forward


I was wondering if it is possible to run the forward pass of a block on one stream and the backward pass of the same block on a different stream (but still on the same GPU)?

It seems like it used to be possible before this PR: Updates autograd engine to respect streams set in forward by mruberry · Pull Request #8354 · pytorch/pytorch · GitHub. Is there a way to support this use case now?

Could you check whether simply wrapping the backward() call in a separate stream context makes it run on that stream?
You should be able to use e.g. Nsight Systems to inspect which streams are used.
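A minimal sketch of that experiment (the function and variable names are illustrative, and a CUDA device is assumed). It wraps backward() in a side-stream context; as discussed below, the autograd engine still replays each backward op on the stream its forward op used, which an Nsight Systems trace would confirm:

```python
import torch

def backward_in_side_stream(model, inp):
    """Run forward on the current stream, then issue backward()
    from inside a different stream's context."""
    side = torch.cuda.Stream()
    out = model(inp)              # forward runs on the current (default) stream
    loss = out.sum()
    with torch.cuda.stream(side):
        # backward is *queued* from here, but autograd runs each backward
        # op on the stream recorded during its forward op
        loss.backward()
    torch.cuda.synchronize()
    return inp.grad is not None

if torch.cuda.is_available():
    m = torch.nn.Linear(4, 4).cuda()
    x = torch.randn(2, 4, device="cuda", requires_grad=True)
    backward_in_side_stream(m, x)
```

Running this under `nsys profile` would show the backward kernels landing on the forward's stream, not on `side`.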

No, backward still runs on the stream the forward ran on (checked with nsys).

It seems you are right, given this explanation from the docs:

Each backward CUDA op runs on the same stream that was used for its corresponding forward op. If your forward pass runs independent ops in parallel on different streams, this helps the backward pass exploit that same parallelism.
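The parallelism the docs describe can be demonstrated with a small sketch (hypothetical names, assumes a CUDA device): two independent branches run on separate streams in forward, so their backward ops can also overlap on those same streams.

```python
import torch

def parallel_branches(x, w1, w2):
    """Run two independent matmuls on separate streams; backward for
    each branch will replay on the stream its forward op used."""
    s1 = torch.cuda.Stream()
    s2 = torch.cuda.Stream()
    cur = torch.cuda.current_stream()
    s1.wait_stream(cur)  # ensure inputs produced on `cur` are ready
    s2.wait_stream(cur)
    with torch.cuda.stream(s1):
        a = x @ w1       # branch 1 on stream s1
    with torch.cuda.stream(s2):
        b = x @ w2       # branch 2 on stream s2
    cur.wait_stream(s1)  # rejoin before combining the results
    cur.wait_stream(s2)
    return a + b

if torch.cuda.is_available():
    x = torch.randn(8, 8, device="cuda", requires_grad=True)
    w1 = torch.randn(8, 8, device="cuda", requires_grad=True)
    w2 = torch.randn(8, 8, device="cuda", requires_grad=True)
    parallel_branches(x, w1, w2).sum().backward()
```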

What’s your use case for running backward on different streams than the forward used?