I have a server with 2 x RTX 3090 GPUs.
I installed NVIDIA driver 460, CUDA 11.1, and the PyTorch nightly build (1.8) on Ubuntu 20, then tried running deep learning benchmarks.
The problem is that everything runs fine if I use a single GPU.
But the moment I run on both of them, the PC just shuts off.
- I ran a stress test that loaded both GPUs to 100% utilization, and it completed fine without crashing.
- I tried limiting the GPU power to 200 W (using 'sudo nvidia-smi -pl 200'), started the PyTorch training script, and it crashed again,
so I don't think it's a power supply issue (it's a SilverStone 1500 W power supply). A small power-logging sketch I could run alongside training is below.
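For reference, here is a rough sketch of something I could run in a second terminal to log per-GPU power draw and temperature once per second via nvidia-smi, to see whether a transient spike lines up with the shutdown (the helper name log_gpu_power is just for illustration; it assumes nvidia-smi is on the PATH):

import subprocess

def log_gpu_power(interval_s=1):
    # Print timestamp, GPU index, power draw, and temperature in CSV form,
    # repeating every interval_s seconds until interrupted.
    subprocess.run([
        "nvidia-smi",
        "--query-gpu=timestamp,index,power.draw,temperature.gpu",
        "--format=csv",
        "-l", str(interval_s),
    ])

if __name__ == "__main__":
    log_gpu_power()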
Here are the code lines I use to run on both GPUs:
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda")

model = models.resnet152(pretrained=False)
model.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)  # single-channel input
model.fc = nn.Linear(2048, 2)  # two output classes
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

for input_images, labels in dataloaders['train']:
    # Move the batch to the GPU; DataParallel scatters it across both devices
    input_images, labels = input_images.to(device), labels.to(device)
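In case a self-contained repro helps, below is a rough sketch along the same lines that drives both GPUs with random tensors instead of my real dataloader (the batch size, image size, and optimizer settings are just placeholders, not my actual training configuration):

import torch
import torch.nn as nn
from torchvision import models

# Minimal two-GPU sketch: same DataParallel setup, random data instead of the real loader.
device = torch.device("cuda")
model = models.resnet152(pretrained=False)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(2048, 2)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(1000):
    # Placeholder batch: 64 single-channel 224x224 images, 2 classes.
    images = torch.randn(64, 1, 224, 224, device=device)
    labels = torch.randint(0, 2, (64,), device=device)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()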
I don't know how to proceed from here; any suggestions would be appreciated.