Significant time difference from a minor model architecture change

I’m running the PPO-Lagrangian algorithm from the following repo: GitHub - akjayant/PPO_Lagrangian_PyTorch: Implementation of PPO Lagrangian in PyTorch

After changing one of the model architectures from:

    class MLPCritic(nn.Module):
        def __init__(self, obs_dim, hidden_sizes, activation):
            super().__init__()
            self.v_net = mlp([obs_dim] + list(hidden_sizes) + [1], activation)

        def forward(self, obs):
            return torch.squeeze(self.v_net(obs), -1)  # Critical to ensure v has right shape.

To:

    class MLPCritic_affine_Q_wide(nn.Module):
        def __init__(self, obs_dim, ac_dim, hidden_sizes, activation):
            super().__init__()
            self.q_w_net = mlp([obs_dim] + list(hidden_sizes) + [obs_dim], activation)
            self.fuse_layer = mlp([obs_dim + ac_dim] + [1], activation)

        def forward(self, obs, act):
            w = self.q_w_net(obs)                                    # state-dependent features
            result = self.fuse_layer(torch.cat((w, act), dim=-1))    # fuse features with the action
            return result

During a simple test with:
obs_dim = 2
act_dim = 1
the backward pass time increased by roughly 10x, even though there is no significant increase in parameter count or model depth.
Why does such a minor change to the network have such a profound impact on the running time?

Without further details on the network, device, and so on, it’s a bit hard to tell. Feel free to share more details, ideally a fully reproducible example!
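
For reference, a minimal repro could look something like the sketch below. Note that the `mlp` helper, the hidden sizes `(64, 64)`, the `Tanh` activation, and the batch size are all assumptions on my side (the repo may use different values), so treat the timings only as a rough comparison:

import time
import torch
import torch.nn as nn

# Hypothetical stand-in for the repo's mlp() helper (Spinning-Up-style builder).
def mlp(sizes, activation, output_activation=nn.Identity):
    layers = []
    for i in range(len(sizes) - 1):
        act = activation if i < len(sizes) - 2 else output_activation
        layers += [nn.Linear(sizes[i], sizes[i + 1]), act()]
    return nn.Sequential(*layers)

obs_dim, act_dim, hidden_sizes = 2, 1, (64, 64)  # assumed sizes
v_net = mlp([obs_dim] + list(hidden_sizes) + [1], nn.Tanh)           # original critic
q_w_net = mlp([obs_dim] + list(hidden_sizes) + [obs_dim], nn.Tanh)   # new critic, first stage
fuse_layer = mlp([obs_dim + act_dim] + [1], nn.Tanh)                 # new critic, fuse stage

# With these (assumed) sizes the parameter counts should be very close.
print(sum(p.numel() for p in v_net.parameters()))
print(sum(p.numel() for p in q_w_net.parameters()) +
      sum(p.numel() for p in fuse_layer.parameters()))

obs = torch.randn(4096, obs_dim)
act = torch.randn(4096, act_dim)

def bench(fn, iters=100):
    # Time forward + backward; reduce to a scalar so backward() is well defined.
    start = time.perf_counter()
    for _ in range(iters):
        fn().sum().backward()
    return (time.perf_counter() - start) / iters

print("old critic:", bench(lambda: torch.squeeze(v_net(obs), -1)))
print("new critic:", bench(lambda: fuse_layer(torch.cat((q_w_net(obs), act), dim=-1))))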

Ideally, you should run your models under the PyTorch profiler to get a good understanding of what’s going on:
https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html

from torch.profiler import profile, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    net1(data).sum().backward()  # or net2(obs, act); reduce to a scalar before backward()
prof.export_chrome_trace("trace.json")

then open it in chrome://tracing in your browser.
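
If you prefer a quick textual summary over the trace viewer, the same `prof` object can also print an aggregated table of the most expensive ops directly:

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))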