4090 25% slower than 3090 for pytorch transformers?

Is anyone else seeing severe performance issues on RTX 4090 cards with PyTorch and transformers?

Using the NVIDIA NGC Docker image 22.11-py3 didn't help at all. I also tried PyTorch 2.0/nightly. Nothing works. Pure CUDA benchmarks show the 4090 can scale to 450 W under a CUDA workload, and performance is nearly 2x the 3090, as advertised. But on my PyTorch transformer workload using the Hugging Face translation pipeline, the 3090 is consistently 25% faster.
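In case it helps others reproduce or rule out measurement error: a common pitfall when timing GPU work from Python is forgetting that CUDA calls are asynchronous, so wall-clock timing without `torch.cuda.synchronize()` can be misleading. Below is a minimal timing sketch (a generic matmul stand-in, not my exact translation pipeline; matrix sizes and iteration counts are arbitrary) showing the warm-up + synchronize pattern I'd use around the pipeline call as well:

```python
import time
import torch

# Fall back to CPU so the sketch runs anywhere; on the 4090/3090 this picks CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Arbitrary stand-in workload; replace with your pipeline(...) call.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# Warm-up iterations exclude one-time costs (CUDA context init, kernel caching).
for _ in range(3):
    a @ b
if device == "cuda":
    torch.cuda.synchronize()  # wait for queued GPU work before starting the clock

start = time.perf_counter()
iters = 20
for _ in range(iters):
    a @ b
if device == "cuda":
    torch.cuda.synchronize()  # make sure all GPU work finished before stopping
elapsed_ms = (time.perf_counter() - start) / iters * 1e3
print(f"{elapsed_ms:.3f} ms per iteration on {device}")
```

The same warm-up/synchronize bracketing around the translation pipeline call is what I used to get the 25% figure.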

Thanks a bunch.