CUDA out of memory in my code

Hi, all,

I implemented an autoencoder with dense blocks in PyTorch. The total number of parameters is around 0.8M, the image size is 100x100, and the batch size for training is 100. However, when I run it on CUDA, a "CUDA out of memory" error occurs. Can anyone help me? My GPU card:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14       Driver Version: 430.14       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 745     Off  | 00000000:01:00.0  On |                  N/A |
| 25%   59C    P0    N/A /  N/A |   4037MiB /  4043MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|

Thanks.

Hi,

There is not much you can do to reduce GPU memory usage except reducing the batch size and using inplace operations for activations like ReLU where possible (if you see an error saying that a tensor needed for backward has been modified by an inplace operation, it means you cannot use them at that particular place).
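For example, here is a minimal sketch of both ideas together (the layer sizes are illustrative only, not taken from your model):

import torch
import torch.nn as nn

# ReLU(inplace=True) overwrites its input instead of allocating a new tensor,
# which saves some activation memory during the forward pass.
block = nn.Sequential(
    nn.Conv2d(1, 48, kernel_size=3, padding=1),
    nn.BatchNorm2d(48),
    nn.ReLU(inplace=True),
)

# Reducing the batch size (e.g. from 100 to 25) shrinks activation memory
# roughly linearly.
x = torch.randn(25, 1, 100, 100)
out = block(x)
out.mean().backward()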

Hi,

Thanks for your reply. My GPU card only has around 4 GB of memory. Will this error be resolved if I use a GPU card with more memory?

Best regards

This might be the case. If it’s possible to post the model definition and some dummy inputs (and targets) I could check the memory usage on larger GPUs.

Hi, ptrblck,

Thanks for your kind help. The code is available in the following GitHub repository:
   https://github.com/xclmj/Dense-Convolutional-Encoder-Decoder-Network-with-Dense-Block.git

You can download the data from: https://drive.google.com/drive/folders/1VkYtS2oe-vwapUjwIG_0GdFzR_RMw2xW?usp=sharing

Best regards

Thanks for the code.
I’ve used the default arguments and this script to run the test:

import torch
import torch.nn as nn
# DenseEDT is defined in the repository linked above

device = 'cuda'

# enters time in latent
model = DenseEDT(1,
                 2,
                 blocks=(4, 9, 4),
                 times=(5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100),
                 growth_rate=24,
                 drop_rate=0,
                 bn_size=100,
                 num_init_features=48,
                 bottleneck=False,
                 time_channels=1).to(device)


optimizer = torch.optim.Adam(model.parameters(), lr=0.001,
                             weight_decay=5e-4)
criterion = nn.MSELoss()

# dummy inputs and targets
data = torch.randn(10, 1, 224, 224, device=device)
times = torch.randn(10, device=device)

output = model(data, times)
target = torch.empty_like(output).normal_()

loss = criterion(output, target)
loss.backward()

print('mem allocated {:.3f}MB'.format(torch.cuda.memory_allocated()/1024**2))
> mem allocated 16.433MB
print('max mem allocated {:.3f}MB'.format(torch.cuda.max_memory_allocated()/1024**2))
> max mem allocated 2146.784MB
print('mem cached {:.3f}MB'.format(torch.cuda.memory_cached() / 1024**2))
> mem cached 2440.000MB

nvidia-smi shows a usage of approx. 3500MB using a Titan V GPU with 12GB memory.
The additional usage comes from the CUDA context, which will take some space on the GPU.
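To see whether your 4 GB card can fit the original setup, you could repeat the same measurement with the shapes from your first post. A sketch, assuming the same DenseEDT configuration accepts 100x100 inputs (the network's downsampling factors need to divide the spatial size, otherwise pad or resize):

# Hypothetical re-run with the setup from the first post:
# batch size 100 and 100x100 images.
torch.cuda.reset_max_memory_allocated()

data = torch.randn(100, 1, 100, 100, device=device)
times = torch.randn(100, device=device)

output = model(data, times)
loss = criterion(output, torch.empty_like(output).normal_())
loss.backward()

# If this peak plus a few hundred MB for the CUDA context exceeds ~4GB,
# the batch size has to be reduced.
print('max mem allocated {:.3f}MB'.format(
    torch.cuda.max_memory_allocated() / 1024**2))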