GPU speed decreases immensely on deployment

Hi,
I developed a C++ application that performs image segmentation, using libtorch 1.7, CUDA Toolkit 10.2 and cuDNN 8. Everything works well on my computer (Windows 10, i7, GTX 1070) and on my colleague's. However, when deployed on a brand-new computer (Windows 10, i7, RTX 3070), the computation speed drops immensely.
Using only the CPU, it takes 2 minutes to load the network and perform one prediction. When using the GPU, however, it takes about 5 minutes just to load the network (whereas loading and predicting together should take about 10 seconds).
Libtorch detects the GPU and performs all operations, but really, really slowly.
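
One way to time the loading and prediction steps separately is sketched below. This is a minimal sketch using only the C++ standard library, not the code from the application; `elapsed_ms` is a hypothetical helper name introduced here for illustration.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not part of libtorch): runs `fn` once and returns
// the elapsed wall-clock time in milliseconds.
template <typename Fn>
double elapsed_ms(Fn&& fn) {
    // steady_clock is monotonic, so the result is not affected by
    // system clock adjustments during the measurement.
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The model loading call and the forward pass can each be wrapped in their own lambda and passed to `elapsed_ms`, which shows which of the two steps accounts for the slowdown. Because CUDA work is launched asynchronously, the GPU should be synchronized inside the lambda before it returns; otherwise the measured time may only cover the kernel launch.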

This is the new computer’s benchmark:

Everything else works well on the GPU. The NVIDIA driver we are using was released on 2021-04-14 (6 days ago).

Do you know where the problem might be?