Assumptions around autograd and Python multi-threading

I don’t think we paid any particular attention to this case apart from: “It should not deadlock” and “It should compute correct gradients”.
As for the absence of thread-local storage, my feeling as a Python user is that, unless you redeclare something in your new thread, everything is shared across all threads.
In particular, if you want to do training in multiple threads (not that it is a useful thing to do), you should use a different model in each thread, as in the sketch below.
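A minimal sketch (assuming PyTorch and standard `threading`) of what I mean: each thread builds its own model and optimizer locally, so no parameters or optimizer state are shared between the threads. The layer sizes and hyperparameters here are just placeholders.

```python
import threading
import torch

def train_in_thread(thread_id):
    # Objects created inside the function are local to this thread's call,
    # so the threads never touch each other's parameters or optimizer state.
    model = torch.nn.Linear(10, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(100):
        x = torch.randn(32, 10)
        y = torch.randn(32, 1)
        loss = torch.nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"thread {thread_id} final loss: {loss.item():.4f}")

threads = [threading.Thread(target=train_in_thread, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If you instead created a single model at module level and trained it from both threads, nothing stops the threads from stepping on each other's gradients, which is exactly the "everything is shared" behavior described above.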
Do you think we should document or change this behavior?