Hi,
I’m trying to run the neural style transfer example but am running out of memory:
~/.conda/envs/ONE/lib/python3.7/site-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
    493         # We need this method only for lazy init, so we can remove it
    494         del _CudaBase.__new__
--> 495         return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
    496
    497
RuntimeError: CUDA error: out of memory
I have four V100s (16 GB each) available and am using device #2:
print(device)
device(type='cuda', index=2)
nvidia-smi shows that I’m nowhere close to exhausting the memory of device #2 when the error happens:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.59                 Driver Version: 390.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:61:00.0 Off |                    0 |
| N/A   66C    P0   262W / 300W |  16139MiB / 16160MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:62:00.0 Off |                    0 |
| N/A   30C    P0    40W / 300W |     11MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   31C    P0    54W / 300W |   1063MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   30C    P0    41W / 300W |     11MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     74738      C   python                                     16128MiB |
|    2     60794      C   ...0/u62/ivoliv/.conda/envs/ONE/bin/python  1052MiB |
+-----------------------------------------------------------------------------+
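To put numbers on "nowhere close", here is a quick sketch that works out the free memory per device. The dictionary simply holds the used/total figures copied by hand from the nvidia-smi output above, and `usage_mib` and `free_mib` are names I made up for illustration. It does not query the GPUs itself.

```python
# Memory figures in MiB, copied by hand from the nvidia-smi output above.
# Keys are GPU indices; values are (used, total).
usage_mib = {
    0: (16139, 16160),
    1: (11, 16160),
    2: (1063, 16160),
    3: (11, 16160),
}


def free_mib(usage):
    """Return the free memory in MiB for each GPU index."""
    return {gpu: total - used for gpu, (used, total) in usage.items()}


free = free_mib(usage_mib)
for gpu, mib in sorted(free.items()):
    print(f"GPU {gpu}: {mib} MiB free")
```

So device #2 still has roughly 15 GB free, while device #0 (used by a different process) is essentially full. I don't know whether anything in the example still touches device #0.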
Is this to be expected?