I tried to implement A3C using torch.multiprocessing, placing my models on different GPUs. However, I got an error saying that I cannot use CUDA with multiprocessing unless I switch to Python 3.4+ (I am currently using Python 2.7.6). The documentation recommends using DataParallel instead, but I want to implement an asynchronous algorithm. In addition, DataParallel is implemented with threads, while my agents perform many CPU operations and modify Python objects, so they may frequently block on Python’s GIL and slow training down. So how can I correctly implement A3C in PyTorch and keep all my parameters on CUDA?
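
To make the setup concrete, here is a rough sketch of the structure I have in mind. It assumes Python 3.4+, so that the `spawn` start method is available, which I cannot use on 2.7. `TinyPolicy`, `worker`, `launch`, and the squared-output loss are hypothetical placeholders, not a real A3C implementation. The sketch also keeps the shared parameters in CPU shared memory and only puts each worker's local copy on a GPU, which is one possible arrangement and not necessarily what I ultimately want.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for the actor-critic network."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


def worker(rank, shared_model, steps):
    # Each worker picks its own device: one GPU per rank if CUDA is
    # available, otherwise the CPU.
    if torch.cuda.is_available():
        device = torch.device("cuda", rank % torch.cuda.device_count())
    else:
        device = torch.device("cpu")

    torch.manual_seed(rank)
    local_model = TinyPolicy().to(device)
    # The optimizer updates the shared CPU parameters in place.
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.01)

    for _ in range(steps):
        # Pull the latest shared weights into the local copy.
        local_model.load_state_dict(shared_model.state_dict())

        x = torch.randn(8, 4, device=device)  # placeholder observations
        loss = local_model(x).pow(2).mean()   # placeholder loss, not A3C

        local_model.zero_grad()
        loss.backward()

        # Hand the local gradients to the shared parameters, then step
        # without any locking between workers.
        for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
            sp.grad = lp.grad.detach().to("cpu")
        optimizer.step()


def launch(num_workers=2, steps=5):
    # "spawn" starts fresh interpreters, so no CUDA state is inherited
    # from the parent process.
    ctx = mp.get_context("spawn")
    shared_model = TinyPolicy()
    shared_model.share_memory()  # move parameters into shared memory

    procs = [
        ctx.Process(target=worker, args=(rank, shared_model, steps))
        for rank in range(num_workers)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared_model
```

`launch()` would have to be called from under an `if __name__ == "__main__":` guard, since `spawn` re-imports the main module in every child process.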