High GPU memory usage per worker in A3C

I implemented the A3C algorithm to play StarCraft II. The shared model and each worker's local model are placed on the GPU via model.cuda(). This setup is much like A3G; the only difference is that in A3G the shared model lives on the CPU rather than the GPU.

A3C needs to collect n-step experience to update the local model, so I use a list to store the n-step trajectory. However, each worker consumes 1200 MB of GPU memory, which seems far more than the stored trajectory should require. Is this reasonable, or is it just caused by bad programming on my part?
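One common source of unexpected growth here is what the trajectory list actually holds. If the list keeps full observation tensors, or tensors that are still attached to the autograd graph beyond the n-step window, memory balloons far past the raw trajectory size. A minimal sketch of an n-step collection loop that keeps only what the update needs (log-probs and values, which must retain their graphs, plus plain-float rewards) is below; TinyA3CNet and all shapes are hypothetical stand-ins, not your actual StarCraft II model:

```python
import torch
import torch.nn as nn

class TinyA3CNet(nn.Module):
    """Hypothetical toy policy/value net standing in for the real model."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.body = nn.Linear(obs_dim, 16)
        self.policy = nn.Linear(16, n_actions)
        self.value = nn.Linear(16, 1)

    def forward(self, obs):
        h = torch.relu(self.body(obs))
        return torch.log_softmax(self.policy(h), dim=-1), self.value(h)

def nstep_loss(model, obs_seq, gamma=0.99):
    # Keep only log-probs/values (they need gradients) and float rewards;
    # raw observations are dropped right after the forward pass.
    log_probs, values, rewards = [], [], []
    for obs in obs_seq:
        logp, v = model(obs)
        action = torch.multinomial(logp.exp(), 1)
        log_probs.append(logp.gather(-1, action))
        values.append(v)
        rewards.append(1.0)  # dummy environment reward for illustration
    # n-step discounted returns (no bootstrap value, for brevity)
    R, returns = 0.0, []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    returns = torch.tensor(returns)
    values = torch.cat(values)
    log_probs = torch.cat(log_probs)
    adv = returns - values.detach()  # detach: no value grad through advantage
    policy_loss = -(log_probs * adv).sum()
    value_loss = (returns - values).pow(2).sum()
    return policy_loss + 0.5 * value_loss

model = TinyA3CNet()
obs_seq = [torch.randn(8) for _ in range(5)]
loss = nstep_loss(model, obs_seq)
loss.backward()  # the graphs for all 5 steps are freed after this call
```

The key point is that once backward() runs at the end of each n-step window, the intermediate activations are freed; if trajectory entries outlive that call (or observations are stored on the GPU "just in case"), the per-worker footprint keeps growing.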

How can I reduce the GPU memory usage in this case?
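Before optimizing, it may help to check how much of the 1200 MB is actually live tensors. The per-process figure reported by nvidia-smi includes the CUDA context itself (often several hundred MB per process) plus PyTorch's caching-allocator pool, so several workers each pay that fixed overhead. A small sketch for separating the two, assuming a reasonably recent PyTorch:

```python
import torch

def report_gpu_memory(device=0):
    """Return (allocated, reserved) bytes on `device`, or None without CUDA.

    `memory_allocated` counts bytes held by live tensors; `memory_reserved`
    counts bytes the caching allocator has grabbed from the driver.  The
    gap between `reserved` and what nvidia-smi shows is roughly the CUDA
    context overhead, which tensor-level optimizations cannot remove.
    """
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    return allocated, reserved
```

If `allocated` turns out to be small relative to 1200 MB, the cost is mostly fixed context overhead per worker process, and the usual remedies are fewer worker processes per GPU or keeping the local models (like A3G's shared model) on the CPU.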

Thank you!