Shared variable in GPU

Hi,

I want to run two parallel processes on one GPU: one for training (computation) and the other for communicating parameter updates with other GPUs. Both processes should be able to modify a shared variable, such as a buffer that stores the most recent parameters. Does anyone know whether this is possible?
I checked this documentation: https://pytorch.org/docs/stable/notes/multiprocessing.html. It mentions “multiprocessing.Queue”, but I am not sure whether it is suitable in my case. Are there any good examples?

This should be doable; see the example code in this post:
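
Independent of the linked example, here is a minimal sketch of the general pattern: two processes updating one shared buffer under a lock. It is not PyTorch code. It uses Python's standard `multiprocessing` with a CPU array as a stand-in for the GPU parameter buffer, and the names `trainer`, `communicator`, and `run`, as well as the constant updates, are made up for illustration. `torch.multiprocessing` is documented as a drop-in replacement for this module, and sharing CUDA tensors between processes requires the `spawn` or `forkserver` start method, which is why the sketch uses `spawn`.

```python
import multiprocessing as mp


def trainer(params, steps):
    # Hypothetical stand-in for the training process:
    # applies a local update of +1.0 to every parameter per step.
    for _ in range(steps):
        with params.get_lock():  # hold the lock for the whole update
            for i in range(len(params)):
                params[i] += 1.0


def communicator(params, steps):
    # Hypothetical stand-in for the communication process:
    # applies an update "received from other GPUs" (+0.5 here).
    for _ in range(steps):
        with params.get_lock():
            for i in range(len(params)):
                params[i] += 0.5


def run(num_params=4, steps=100):
    # "spawn" mirrors the start method needed when CUDA tensors are shared.
    ctx = mp.get_context("spawn")
    # Shared, zero-initialised array of doubles with a built-in recursive lock.
    params = ctx.Array("d", num_params)
    workers = [
        ctx.Process(target=trainer, args=(params, steps)),
        ctx.Process(target=communicator, args=(params, steps)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return list(params)


if __name__ == "__main__":
    print(run())
```

The lock is what keeps the two writers from interleaving inside a single update. A `multiprocessing.Queue` is a better fit when one process only needs to hand updates to the other rather than both writing to the same buffer in place.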