I have a server with 2 x RTX 3090 GPUs.
I installed NVIDIA driver 460, CUDA 11.1, and PyTorch nightly (1.8) on Ubuntu 20, and tried running deep learning benchmarks.
The problem is that everything runs fine if I use a single GPU,
but the moment I run on both of them, the PC just shuts off.
I tried a stress test that loaded both GPUs to 100% utilization and it ran fine without crashing.
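For comparison, something close to that kind of sustained load can be reproduced in PyTorch itself; this is just a minimal sketch I put together (assuming both cards are visible as cuda:0 and cuda:1), not the actual stress tool I used:

import torch

# Minimal dual-GPU stress sketch: sustained large matmuls on each card.
# Assumes two visible CUDA devices (cuda:0 and cuda:1).
mats = [torch.randn(8192, 8192, device=f"cuda:{i}") for i in range(2)]
for _ in range(1000):
    for m in mats:
        m @ m  # each matmul is launched asynchronously on its own GPU
for i in range(2):
    torch.cuda.synchronize(i)  # wait for both GPUs to finish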
I tried limiting the GPUs' power to 200 W (using the sudo nvidia-smi -pl 200 command), started the PyTorch training script, and it crashed again,
so I guess it isn't a power supply issue (it's a SilverStone 1500 W power supply).
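To double-check what the cards actually draw under that cap, the power readings can be logged from NVML while the training runs in another terminal. A rough sketch, assuming the pynvml (nvidia-ml-py) package is installed:

import time
import pynvml

# Rough sketch: sample the instantaneous power draw of both GPUs.
# NVML reports power in milliwatts; run this alongside the training script.
pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(2)]
try:
    while True:
        watts = [pynvml.nvmlDeviceGetPowerUsage(h) / 1000.0 for h in handles]
        print("  ".join(f"GPU{i}: {w:6.1f} W" for i, w in enumerate(watts)))
        time.sleep(0.1)  # 10 Hz sampling; millisecond spikes can still slip through
finally:
    pynvml.nvmlShutdown()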
Here are the code lines I use for the two GPUs:

if torch.cuda.device_count() > 1:
    # Wrap the model so each batch is split across both GPUs
    model = nn.DataParallel(model)
model.to(device)

for input_images, labels in dataloaders['train']:
    # Enable CUDA: move each batch to the GPUs for model computation
    input_images, labels = input_images.to(device), labels.to(device)
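If it helps isolate things, here is a self-contained sketch that exercises nn.DataParallel the same way but with a throwaway model and random data (all sizes and names here are made up for illustration), so the shutdown can be reproduced independently of my dataset and training code:

import torch
import torch.nn as nn

# Self-contained DataParallel repro sketch with dummy data.
device = torch.device("cuda")
model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # splits each batch across both GPUs
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for step in range(1000):
    input_images = torch.randn(256, 4096, device=device)
    labels = torch.randint(0, 10, (256,), device=device)
    loss = loss_fn(model(input_images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()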
No, just CUDA 11.1.
I was told this might still be a PSU issue:
even though the stress test with 100% utilization on both GPUs passes, the GPUs + CPU might produce big power spikes during the PyTorch training that the PSU can't handle (1500 W, rated 80 PLUS Silver).
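As a rough back-of-envelope (my own assumed numbers, not measurements): 2 × 350 W stock TDP for the 3090s plus roughly 200 W for the CPU and the rest of the system is about 900 W sustained, comfortably inside 1500 W. But Ampere cards are reported to have millisecond-scale transients of roughly 1.5-2× TDP, so peaks around 2 × 600 W + 200 W ≈ 1400 W seem plausible, and a spike like that can trip the PSU's over-current protection and shut the machine off even when the average draw looks fine. As far as I understand, the nvidia-smi power limit is enforced by a relatively slow control loop, so such transients can still get through despite the 200 W cap.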
I got the recommendation to get a PSU with an 80 PLUS Platinum or 80 PLUS Titanium rating.
Any PSU above 1200 W usually requires a higher incoming voltage to reach its maximum stated wattage. What is the AC input voltage coming into the PSU? (This can vary by country.)
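For example (rough numbers, assuming roughly 85% efficiency at full load for an 80 PLUS Silver unit): delivering the full 1500 W DC would pull about 1500 / 0.85 ≈ 1765 W from the wall, which at 115 V is roughly 15.3 A, more than a standard 15 A North American circuit is meant to carry continuously. That's why many 1500 W units only reach their full rating on a 200-240 V input.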