I tried to implement A3C using torch.multiprocessing and put my models on different GPUs. However, it told me I cannot use CUDA with multiprocessing unless I switch to Python 3.4+ (I am currently using Python 2.7.6). The documentation recommends using DataParallel instead, but I want to implement an asynchronous algorithm. In addition, DataParallel is implemented with threads, while my agents may do a lot of CPU work and modify many Python objects, so they would frequently be blocked by Python's GIL and slow down. So how can I correctly implement A3C in PyTorch and store all my parameters in CUDA memory?
I got this working, and it is fast if you have the GPUs :) For Pong, it sped up convergence to 10 minutes on my DGX Station, compared to 45 minutes on CPU only.
I have posted new A3C-GPU versions in repos below:
discrete action spaces:
continuous action space version:
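For anyone who wants the general idea before reading the repos, here is a minimal sketch of one common way to combine torch.multiprocessing with CUDA. It is not the code from the repos above, only an illustration under some assumptions: the shared model lives in CPU shared memory, each worker keeps its own copy on a GPU (falling back to CPU if none is visible), and gradients are copied back to the shared model before each optimizer step. The "spawn" start method is what requires Python 3.4+. The names `pick_device`, `make_model`, `push_grads` and `worker` are hypothetical, and the tiny linear layer and squared-output loss are placeholders for a real actor-critic network and A3C loss.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp


def pick_device(rank):
    """Assign workers to the visible GPUs round-robin; use CPU if there are none."""
    n_gpus = torch.cuda.device_count()
    if n_gpus == 0:
        return torch.device("cpu")
    return torch.device("cuda", rank % n_gpus)


def make_model():
    # Hypothetical stand-in for an actor-critic network.
    return nn.Linear(4, 2)


def push_grads(local_model, shared_model):
    """Copy the worker's gradients onto the shared CPU parameters."""
    for lp, sp in zip(local_model.parameters(), shared_model.parameters()):
        if lp.grad is not None:
            sp.grad = lp.grad.detach().cpu().clone()


def worker(rank, shared_model, steps):
    device = pick_device(rank)
    local_model = make_model().to(device)
    # Plain SGD keeps no per-parameter state, so each worker can own an
    # optimizer that updates the shared parameters directly (Hogwild style).
    optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.01)
    for _ in range(steps):
        # Sync the local copy with the latest shared weights.
        local_model.load_state_dict(shared_model.state_dict())
        x = torch.randn(8, 4, device=device)  # placeholder for a rollout
        loss = local_model(x).pow(2).mean()   # placeholder for the A3C loss
        local_model.zero_grad()
        loss.backward()
        push_grads(local_model, shared_model)
        optimizer.step()


if __name__ == "__main__":
    # CUDA cannot be used in forked subprocesses, so use "spawn" (Python 3.4+).
    mp.set_start_method("spawn")
    shared_model = make_model()
    shared_model.share_memory()  # parameters move to shared memory
    procs = [mp.Process(target=worker, args=(rank, shared_model, 100))
             for rank in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Keeping the shared model on the CPU means only the local copies need GPU memory, at the cost of a host/device copy on every sync. An optimizer with internal state (Adam, RMSprop) would need that state shared across processes as well, which this sketch does not cover.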