CUDA out of memory, but the error itself suggests there is free memory

Hi,

I’m trying to fine-tune gpt2, and while training (with a batch size of 1) I get:

Traceback (most recent call last):
  File "H:/PycharmProjects/pythonProject/DecisionSummariesLM.py", line 102, in <module>
    outputs = model(input_ids,
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 888, in forward
    transformer_outputs = self.transformer(
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 738, in forward
    outputs = block(
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 319, in forward
    feed_forward_hidden_states = self.mlp(self.ln_2(hidden_states))
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 261, in forward
    return self.dropout(h2)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\functional.py", line 983, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 3.19 GiB already allocated; 3.52 GiB free; 3.20 GiB reserved in total by PyTorch)

I don’t really understand why I’m getting this: the failed allocation is only 20 MiB, and the error itself reports 3.52 GiB free, with just 3.20 GiB reserved in total by PyTorch.

I have fine-tuned the same model on the same GPU before with different data, so why would a 20 MiB allocation fail?
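In case it helps to see the numbers PyTorch itself reports, here is a minimal sketch of the kind of check I can run before/after the forward pass (it only assumes the standard torch.cuda memory-stat functions and falls back to zeros on a machine without a GPU):

```python
import torch

def gpu_memory_report(device: int = 0) -> dict:
    """Return PyTorch's view of GPU memory on `device`, in MiB.

    `memory_allocated` is what tensors currently occupy; `memory_reserved`
    is the larger pool PyTorch's caching allocator has claimed from the
    driver. Returns zeros when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return {"allocated_mib": 0.0, "reserved_mib": 0.0}
    mib = 1024 ** 2
    return {
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
    }

if __name__ == "__main__":
    print(gpu_memory_report())
```

My understanding is that the “free” figure in the error message is free memory on the whole card, not memory PyTorch’s caching allocator can necessarily hand out for this request, but I’d appreciate confirmation.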