Increasing model size leads to poor backward performance

I built a ResNet-like model with PyTorch 1.11.
I train on a machine with 2 CPUs (12 cores), 72 GB of RAM, and no GPU.
During training I measured the following forward and backward times per step:

|params     |forward, s|backward, s|
|-----------|----------|-----------|
|76 929 892 |2.81      |7.10       |
|80 075 818 |3.14      |7.76       |
|83 322 861 |3.32      |7.95       |
|86 634 044 |3.45      |8.47       |
|90 009 367 |3.58      |8.86       |
|93 516 104 |3.47      |8.92       |
|96 979 046 |3.90      |9.23       |
|100 547 209|4.01      |9.93       |
|104 179 512|4.27      |10.19      |
|107 875 955|4.31      |244.46     |
|111 710 028|4.17      |255.14     |

It looks like I hit the limit of some resource. The L3 cache, maybe?
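As a rough sanity check on the L3-cache guess (back-of-the-envelope arithmetic only, using the parameter counts from the table), fp32 weights take 4 bytes each, so even the smallest configuration is hundreds of MiB:

```python
# Back-of-the-envelope memory footprint for fp32 weights.
# Parameter counts are taken from the table above.

MIB = 2 ** 20  # bytes per MiB

def fp32_mib(n_params: int) -> float:
    """MiB needed to hold n_params float32 values (weights only)."""
    return n_params * 4 / MIB

for n in (76_929_892, 104_179_512, 107_875_955):
    # Gradients roughly double this; optimizer state and activations add more.
    print(f"{n:>11,} params -> {fp32_mib(n):6.1f} MiB of weights")
```

Even the smallest model (~293 MiB of weights, before gradients and activations) is already far larger than a typical L3 cache of tens of MB, so if this estimate is right, the working set exceeded L3 long before the sudden ~25x backward slowdown at ~108 M parameters.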

Any ideas on how to resolve this?

Thank you.