Hello PyTorch community,
I am very new to PyTorch and I am trying to get this segmentation network running on my notebook with a GeForce GTX 1650. I'd be glad for any hints.
The following error occurs:
RuntimeError: CUDA out of memory. Tried to allocate 450.00 MiB (GPU 0; 3.82 GiB total capacity; 2.08 GiB already allocated; 182.75 MiB free; 609.42 MiB cached)
It obviously means that I don't have enough memory on my GPU. But I don't understand why: 3.8 GB - 2 GB - 0.6 GB = 1.2 GB of free space, not 180 MiB. In similar questions people say this is due to fragmentation, but how does that make sense when I only load a pretrained model onto my GPU? Where does the 2 GB of occupied space come from?
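For reference, this is roughly how I read those numbers out of PyTorch itself. It is just a diagnostic sketch using the PyTorch 1.1 names (memory_cached is what newer releases call memory_reserved):

import torch

device = torch.device("cuda:0")
# Memory currently held by tensors vs. memory held by the caching allocator:
print("allocated: %.1f MiB" % (torch.cuda.memory_allocated(device) / 1024**2))
print("cached:    %.1f MiB" % (torch.cuda.memory_cached(device) / 1024**2))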
Then I open up nvidia-smi, which puzzles me even more, as it reports only 10% utilization and just 285 MiB of memory in use:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1650    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8     1W /  N/A |    285MiB /  3914MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1214      G   /usr/lib/xorg/Xorg                            28MiB |
|    0      1720      G   /usr/lib/xorg/Xorg                           105MiB |
|    0      1984      G   /usr/bin/gnome-shell                         103MiB |
+-----------------------------------------------------------------------------+
I'm using PyTorch 1.1 with torchvision 0.3; the network does not work with newer versions due to some changes in boolean handling.
Here is a short summary of what gets called (I execute this script):
# (...)
model = DistributedDataParallel(model.cuda(device), device_ids=[device_id], output_device=device_id)
# (...)
model.eval()
# (...)
# Output of torch.cuda.memory_allocated(): 512089088 bytes (~488 MiB)
for it, batch in enumerate(dataloader):  # batch size 1
    with torch.no_grad():
        # (...)
        torch.cuda.empty_cache()  # does not change the allocated memory
        # Here the RuntimeError occurs:
        _, pred, _ = model(img=img, do_loss=False, do_prediction=True)
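If it helps with debugging, I could also log the peak usage around the failing call. This is only a minimal sketch of what I have in mind, assuming the same model, img and device_id as above and using torch.cuda.max_memory_allocated (which tracks the peak since the start of the program):

with torch.no_grad():
    before = torch.cuda.memory_allocated(device_id)
    # This is the call that raises the OOM error, so the peak would only
    # be printed if it ever succeeds:
    _, pred, _ = model(img=img, do_loss=False, do_prediction=True)
    peak = torch.cuda.max_memory_allocated(device_id)
    print("allocated before forward: %.1f MiB" % (before / 1024**2))
    print("peak during forward:      %.1f MiB" % (peak / 1024**2))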
I use pretrained weights and don't have the resources to train the model from scratch with a different architecture. Maybe the model is too large for my GPU, but it only needs about 500 MB at the start plus the 450 MB requested during prediction. How can it not fit on my GPU?
I'd be glad for any further tips I can look into.
Benjamin