Performance difference from Spinning Up VPG

Hi, I'm implementing Vanilla Policy Gradient (REINFORCE) with GAE for advantage estimation, using the Spinning Up implementation as a reference.
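For context, GAE builds each advantage from the TD residuals δ_t = r_t + γV(s_{t+1}) − V(s_t), accumulated backwards through the trajectory as A_t = δ_t + γλ A_{t+1}. My advantage computation follows this standard recursion; here is a simplified sketch (illustrative names and defaults, not my exact code):

import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.97):
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    # A_t = delta_t + gamma * lam * A_{t+1}, computed backwards over the trajectory
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages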

During training, I found a significant performance difference between my implementation and the Spinning Up one: mine took about 1255 seconds, while Spinning Up took only 169 seconds.
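Both profiles below were collected with Python's built-in cProfile; for reference, a run can be profiled with something like (assuming vpg.py is the script's entry point):

python -m cProfile -s tottime vpg.py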

Performance details

Spinning Up VPG

43216335 function calls (40435708 primitive calls) in 169.783 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1200300   18.082    0.000   18.082    0.000 {method 'matmul' of 'torch._C._TensorBase' objects}
     4050   14.348    0.004   14.348    0.004 {method 'run_backward' of 'torch._C._EngineBase' objects}
   808500   13.275    0.000   13.275    0.000 {built-in method tanh}
   200150   12.876    0.000   12.876    0.000 {method 'logsumexp' of 'torch._C._TensorBase' objects}
  1212750    9.589    0.000   38.420    0.000 functional.py:1355(linear)
3033950/404250    9.217    0.000   74.797    0.000 module.py:531(__call__)
     2830    6.419    0.002    6.419    0.002 {method 'read' of '_io.FileIO' objects}
    12450    5.643    0.000    5.643    0.000 {built-in method addmm}
   200300    4.866    0.000    4.866    0.000 {built-in method as_tensor}
  1212750    4.790    0.000    4.790    0.000 {method 't' of 'torch._C._TensorBase' objects}
   200000    4.332    0.000    7.135    0.000 cartpole.py:91(step)
   200150    3.459    0.000   17.169    0.000 categorical.py:44(__init__)
   404250    3.391    0.000   69.865    0.000 container.py:90(forward)
  1212750    3.138    0.000   42.537    0.000 linear.py:86(forward)
   200050    2.890    0.000  104.369    0.001 core.py:126(step)
        1    2.766    2.766  157.027  157.027 vpg.py:89(vpg)

Mine

39947968 function calls (37109426 primitive calls) in 1255.151 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4050  757.922    0.187  757.922    0.187 {method 'run_backward' of 'torch._C._EngineBase' objects}
     4000  307.777    0.077  335.365    0.084 vpg.py:248(_compute_value_function_loss)
     4150   27.596    0.007   27.596    0.007 {method 'mean' of 'torch._C._TensorBase' objects}
        1   22.205   22.205 1253.417 1253.417 vpg.py:59(learn)
  1200150   20.096    0.000   20.096    0.000 {method 'matmul' of 'torch._C._TensorBase' objects}
   808200   13.866    0.000   13.866    0.000 {built-in method tanh}
   200050   13.662    0.000   13.662    0.000 {method 'logsumexp' of 'torch._C._TensorBase' objects}
3232800/404100   10.787    0.000  101.064    0.000 module.py:531(__call__)
  1212300   10.297    0.000   42.361    0.000 functional.py:1355(linear)
    12150    6.157    0.001    6.157    0.001 {built-in method addmm}

As you can see, backpropagation takes up most of the execution time in my implementation.
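To narrow down where that backward time goes, one option is PyTorch's autograd profiler around a single update. A minimal self-contained sketch (the tiny network and batch here are only stand-ins to demonstrate the usage):

import torch
import torch.nn as nn

# Stand-in value network and batch, just to demonstrate the profiler usage
value_function = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
observations = torch.randn(4000, 4)
returns = torch.randn(4000)

with torch.autograd.profiler.profile() as prof:
    value_loss = ((value_function(observations).squeeze(-1) - returns) ** 2).mean()
    value_loss.backward()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))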

In both implementations, the value function is updated 80 times by default, like this:

for _ in range(self.n_value_gradients):  # 80 iterations by default
    all_values = self.value_function(all_observations_tensor)
    value_loss = self._compute_value_function_loss(all_values, discounted_returns_tensor)
    self.value_function.optimizer.zero_grad()
    value_loss.backward()
    self.value_function.optimizer.step()
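For comparison, Spinning Up computes its value loss as a single vectorized MSE over the whole batch, essentially ((v(obs) - ret)**2).mean(). A sketch of that style using my names (the .detach() on the targets is an assumption about where my version might differ; if the returns tensor carried autograd history, every backward() would re-traverse it):

def _compute_value_function_loss(self, values, returns):
    # One batched op over all samples; the targets must carry no autograd
    # history (detach them if they were derived from value estimates)
    return ((values - returns.detach()) ** 2).mean()

Given that _compute_value_function_loss alone accounts for ~308 s of internal time in my profile, I suspect mine is doing Python-level per-element work instead of one batched op, but I haven't pinned it down yet.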

So far, I have confirmed that the following parameters are the same in both implementations:

  • the number of network parameters (policy: 4610, value_fn: 4545; counted as in the snippet after this list)
  • network architecture (two hidden layers with 64 units each)
  • total environment interactions
  • number of value function updates
  • learning rate for both the policy and the value function
  • gym environment: CartPole-v0
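For the parameter counts in the first item, I used the standard PyTorch idiom; the helper name here is just for illustration:

def count_parameters(net):
    # Total number of elements across all parameter tensors of the module
    return sum(p.numel() for p in net.parameters())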

Could you give me some advice on how to improve this?

my implementation

Spinning Up

document

code