Hello, newbie here.
I am fighting against my new setup: AMD 9950X, ROG Crosshair X870E motherboard, 192GB RAM, 2 x Samsung 990 Pro in software RAID 1, running Ubuntu Server 24.04 LTS. I tried an RTX 4090 at first, then bought an RTX 5090, which is what I am using now.
Training with PyTorch runs at 1 to 6 it/s on the 5090. I reached 9 it/s with the RTX 4090 a few days ago, but I don't know how.
Can you please help me find the bottleneck?
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
2.8.0.dev20250322+cu128
True
NVIDIA GeForce RTX 5090
python --version
Python 3.12.3
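To narrow down where the time goes, something like the sketch below could split each iteration into "waiting for the next batch" and "running the training step". It is only a rough idea, not my actual training code: profile_loop, batches, step_fn and sync are hypothetical names made up for illustration. With PyTorch, I assume batches would be the DataLoader, step_fn would do forward/backward/optimizer step, and sync would be torch.cuda.synchronize so that queued GPU work is included in the step time.

```python
import time


def profile_loop(batches, step_fn, sync=None, max_iters=100):
    """Split wall time into 'waiting for data' and 'running the step'.

    batches  : any iterable of batches (assumed: a DataLoader)
    step_fn  : callable doing one training step on a batch (hypothetical)
    sync     : optional callable that blocks until the device is idle
               (assumed: torch.cuda.synchronize)
    """
    data_s = 0.0
    step_s = 0.0
    n = 0
    it = iter(batches)
    while n < max_iters:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading / preprocessing
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        if sync is not None:
            sync()  # wait for asynchronous GPU work before stopping the clock
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        n += 1
    total = data_s + step_s
    return {
        "iters": n,
        "data_s": data_s,
        "step_s": step_s,
        "it_per_s": n / total if total > 0 else 0.0,
        "data_fraction": data_s / total if total > 0 else 0.0,
    }
```

If data_fraction comes out high, the GPU is probably waiting on the input pipeline (disk, CPU preprocessing, number of loader workers) rather than being the limit itself. If it is low, the time is going into the step, and the GPU side would be the place to look.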
Thanks for any feedback you may have.
All the best