Hi,
I’m trying to fine-tune gpt2, and during training (with a batch size of 1) I get:
```
Traceback (most recent call last):
  File "H:/PycharmProjects/pythonProject/DecisionSummariesLM.py", line 102, in <module>
    outputs = model(input_ids,
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 888, in forward
    transformer_outputs = self.transformer(
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 738, in forward
    outputs = block(
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 319, in forward
    feed_forward_hidden_states = self.mlp(self.ln_2(hidden_states))
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 261, in forward
    return self.dropout(h2)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\modules\dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
  File "C:\ProgramData\Anaconda3\envs\pythonProject\lib\site-packages\torch\nn\functional.py", line 983, in dropout
    else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 3.19 GiB already allocated; 3.52 GiB free; 3.20 GiB reserved in total by PyTorch)
```
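For context, here is a stripped-down version of my training loop (simplified from my real script: names and hyperparameters are placeholders, and I use a tiny random-weight config here so the sketch runs anywhere, whereas the real run loads the pretrained `gpt2` weights and my own data):

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny config just so this sketch is self-contained; the real script uses
# GPT2LMHeadModel.from_pretrained("gpt2") instead.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, n_positions=128)
model = GPT2LMHeadModel(config)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Dummy batch of size 1, matching my real setup.
input_ids = torch.randint(0, config.vocab_size, (1, 128), device=device)

outputs = model(input_ids, labels=input_ids)  # the call on line 102 of my script
loss = outputs.loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```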
I don’t really understand why I’m getting this: the failed allocation is only 20 MiB, and the message itself says PyTorch has reserved just 3.20 GiB of the 8 GiB, with 3.52 GiB still free.
I have fine-tuned the same model on the same GPU before with different data, so why would a 20 MiB allocation fail?
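In case the exact numbers help, this is how I’ve been reading PyTorch’s memory counters (just the built-in `torch.cuda` stats; the `to_mib` helper is only for printing):

```python
import torch

def to_mib(num_bytes):
    # Small display helper: bytes -> MiB.
    return num_bytes / 2**20

if torch.cuda.is_available():
    # What my tensors actually occupy vs. what the caching allocator
    # has reserved from the driver.
    print(f"allocated: {to_mib(torch.cuda.memory_allocated()):.0f} MiB")
    print(f"reserved:  {to_mib(torch.cuda.memory_reserved()):.0f} MiB")
    print(torch.cuda.memory_summary())
```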