According to the original A3C paper, there is one shared global model and multiple local models, one per worker process. A3C is designed for multi-core CPUs and parallel processing, with poor GPU utilization.
With a small modification such as model.cuda(), we can run the model on a GPU to speed up training.
If we have 2 GPUs in the computer, can we run A3C without further modification and share the global model between the workers?
The CPU is faster for sharing the model because we can do "hogwild" training: asynchronous updates to shared memory without any locks. Acquiring and releasing locks is actually quite time-consuming, and this is an advantage the CPU has over the GPU. The GPU requires locks for sharing in almost all cases, except for atomic operations.
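To make the lock-free idea concrete, here is a minimal sketch of hogwild-style updates using plain Python shared memory (not PyTorch; the parameter array, worker function, and the constant "gradient" step are all illustrative stand-ins). Each worker writes directly into the shared array with no lock, so updates from different processes can race and occasionally overwrite each other, which is exactly the trade-off hogwild accepts:

```python
import ctypes
import multiprocessing as mp


def worker(shared_params, steps):
    # Lock-free ("hogwild") updates: write straight into shared memory
    # without acquiring any lock. Races can drop some updates, but each
    # write is cheap because there is no lock acquire/release overhead.
    for _ in range(steps):
        for i in range(len(shared_params)):
            shared_params[i] += 0.01  # stand-in for a gradient step


def hogwild_demo(n_workers=4, n_params=8, steps=100):
    # lock=False returns a raw shared ctypes array with no synchronization
    # wrapper, mirroring the asynchronous CPU sharing described above.
    params = mp.Array(ctypes.c_double, n_params, lock=False)
    procs = [mp.Process(target=worker, args=(params, steps))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(params)
```

In real A3C code with PyTorch, the analogous step is calling `share_memory()` on the global model before spawning workers, so all processes update the same CPU-resident tensors without locks.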