Assumptions around autograd and Python multi-threading

I’m wondering if it’s safe to assume that users will not call my gradient library from multiple Python threads simultaneously.

I.e., are there reasons to support this use case?

It seems current PyTorch doesn’t fully support this use case either: thread-global storage for .grad means that two different Python threads can interfere with each other. Example: https://gist.github.com/yaroslavvb/e53c83c40c8385cd90cdc15c7c61fa63
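As a rough illustration of that kind of interference (a hypothetical sketch, not PyTorch code: the `Param` class and its `grad` attribute here are stand-ins for a shared model parameter), two threads writing to the same shared .grad slot can clobber each other:

```python
import threading

class Param:
    """Stand-in for a shared model parameter whose .grad is global state."""
    def __init__(self):
        self.grad = None

param = Param()                     # shared between both threads
a_wrote = threading.Event()
b_wrote = threading.Event()
seen_by_a = []

def worker_a():
    param.grad = "grad from A"      # A stores its gradient
    a_wrote.set()
    b_wrote.wait()                  # meanwhile B runs its own "backward"...
    seen_by_a.append(param.grad)    # ...and A now reads B's gradient

def worker_b():
    a_wrote.wait()
    param.grad = "grad from B"      # B overwrites the shared .grad
    b_wrote.set()

ta = threading.Thread(target=worker_a)
tb = threading.Thread(target=worker_b)
ta.start(); tb.start(); ta.join(); tb.join()
print(seen_by_a[0])                 # "grad from B" -- A's result was clobbered
```

The events force a deterministic interleaving here just to make the clobbering reproducible; in real code the overwrite would happen nondeterministically.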

I don’t think we paid any particular attention to this case apart from: “It should not deadlock” and “It should compute correct gradients”.
As for the absence of thread-local storage, my feeling as a Python user was that, unless you redeclare something in your new thread, everything is global to all threads.
In particular, if you want to do learning in multiple threads (not that it is a useful thing to do), you should use two different models.
Do you think we should document / change this behavior?

Documenting this would be useful. In particular, the fact that there’s a global autograd engine shared among Python threads, and that a call into .backward will block until all concurrent .backward calls complete. Perhaps here? https://pytorch.org/docs/stable/autograd.html#torch.Tensor.backward
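The blocking behavior described above can be mimicked in plain Python with a single global lock (a sketch of the general pattern only, not PyTorch internals; `fake_backward` and `engine_lock` are hypothetical stand-ins):

```python
import threading

engine_lock = threading.Lock()      # stands in for the shared global engine
active = 0                          # how many "backward" calls run right now
max_active = 0                      # peak concurrency ever observed

def fake_backward():
    """Hypothetical stand-in: calls are serialized on the engine lock."""
    global active, max_active
    with engine_lock:               # a second caller blocks here until we finish
        active += 1
        max_active = max(max_active, active)
        # ... graph traversal / gradient computation would happen here ...
        active -= 1

threads = [threading.Thread(target=fake_backward) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)                   # 1: the four calls never overlapped
```

Any thread that calls `fake_backward` while another call is in flight simply blocks on the lock, which is the user-visible behavior the documentation would need to describe.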

Disallowing concurrent backward calls in a process seems fine, since Python’s support for concurrency is not great anyway.

The Python Global Interpreter Lock, or GIL, is, in simple terms, a mutex (a lock) that allows only one thread at a time to hold control of the Python interpreter. All the GIL does is ensure that only one thread executes Python bytecode at a time; control still switches between threads. What the GIL prevents, then, is making use of more than one CPU core (or separate CPUs) to run threads in parallel.

Python threading is great for creating a responsive GUI, or for handling multiple short web requests where I/O is the bottleneck rather than the Python code itself. It is not suitable for parallelizing computationally intensive Python code: due to the global interpreter lock, Python threads merely interleave and in fact execute serially. For actual CPU parallelism you should use the multiprocessing module to fork multiple processes that run in parallel, or delegate to a dedicated external library. Threading remains an appropriate model when you want to run multiple I/O-bound tasks simultaneously.