CUDA out of memory error, no matter the GPU size

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    solver.exec()
  File "/project/src/solver.py", line 195, in exec
    self.valid()
  File "/project/src/solver.py", line 234, in valid
    ctc_pred, state_len, att_pred, att_maps = self.asr_model(x, ans_len+VAL_STEP,state_len=state_len)
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/src/asr.py", line 61, in forward
    encode_feature,encode_len = self.encoder(audio_feature,state_len)
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/home/src/asr.py", line 314, in forward
    input_x,enc_len = self.vgg_extractor(input_x,enc_len)
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/src/asr.py", line 554, in forward
    feature = self.pool2(feature) # BSx128xT/4xD/4
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 148, in forward
    self.return_indices)
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/_jit_internal.py", line 132, in fn
    return if_false(*args, **kwargs)
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/functional.py", line 425, in _max_pool2d
    input, kernel_size, stride, padding, dilation, ceil_mode)[0]
  File "/home/anaconda2/envs/dlp/lib/python3.6/site-packages/torch/nn/functional.py", line 417, in max_pool2d_with_indices
    return torch._C._nn.max_pool2d_with_indices(input, kernel_size, _stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 24.12 MiB (GPU 0; 10.91 GiB total capacity; 9.25 GiB already allocated; 17.44 MiB free; 41.97 MiB cached)

I keep increasing the GPU memory, and I am now using 2 GPUs, but I still get this error even with a batch size of 1.

How could I resolve it?
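
For context, the traceback shows the crash happens inside self.valid(), so I suspect gradient tracking during validation might be part of the problem. Below is a minimal sketch of what I mean, using a stand-in model rather than my actual asr_model:

    import torch
    import torch.nn as nn

    # Stand-in model; the real code calls self.asr_model(...) in valid().
    model = nn.Linear(128, 10).cuda()
    x = torch.randn(1, 128).cuda()

    model.eval()
    with torch.no_grad():  # autograd records no graph, so intermediate
                           # activations are freed as soon as possible
        out = model(x)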

The fact that the error is identical with 2 GPUs also suggests the job is not actually using both of them.

How can I use 2 GPUs (via nn.DataParallel?), and is there any other way to get rid of this error?
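
For reference, this is how I understand nn.DataParallel is meant to be used; a minimal sketch with a stand-in linear layer, not my actual model:

    import torch
    import torch.nn as nn

    # Stand-in model; the real code would wrap asr_model instead.
    model = nn.Linear(128, 10)

    if torch.cuda.device_count() > 1:
        # DataParallel replicates the module on every visible GPU and
        # splits each input batch along dim 0.
        model = nn.DataParallel(model)
    model = model.cuda()

    x = torch.randn(4, 128).cuda()  # a batch of 4 is split 2+2 across 2 GPUs
    out = model(x)

If that is right, DataParallel splits the batch across GPUs, so with a batch size of 1 it would not reduce per-GPU memory anyway.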

Are you running this in Jupyter or as a .py script?