Hello everyone, I have had the following error for a few weeks now:
RuntimeError: CUDA error: invalid configuration argument
My use case is reinforcement learning with self-play.
My agent's input is a sequence of shape [B, Seq_len, Feat_size] (the batch size can change during training), which is fed directly into an LSTM with batch_first=True.
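Since I can't share the real code, here is a minimal sketch of the relevant part of the network (layer and hidden sizes are illustrative placeholders, not the real values):

```python
import torch
import torch.nn as nn

class Agent(nn.Module):
    """Illustrative sketch of the confidential model; sizes are placeholders."""

    def __init__(self, feat_size=18, hidden_size=64):
        super().__init__()
        # Input arrives as [B, Seq_len, Feat_size], hence batch_first=True
        self.lstm_1 = nn.LSTM(feat_size, hidden_size, batch_first=True)

    def forward(self, x):
        # Keep only the final hidden and cell states, as in the traceback below
        _, (h_n, c_n) = self.lstm_1(x)
        return h_n, c_n
```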
The error occurs consistently, after several million steps, at the initialization of the hidden and cell states.
Here is the traceback; unfortunately I have to truncate and anonymize some parts due to the confidentiality of my project.
**** *****
File "/home/*****/anaconda3/envs/test_env_soda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/adil.zouitine/project/*****/*.py", line 25, in forward
_, (h_n, c_n) = self.lstm_1(x)
File "/home/*****/anaconda3/envs/test_env_soda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/*****/anaconda3/envs/test_env_soda/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 570, in forward
zeros = torch.zeros(self.num_layers * num_directions,
RuntimeError: CUDA error: invalid configuration argument
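As the traceback shows, the failure surfaces inside nn.LSTM's default zero initialization of the hidden and cell states (although CUDA errors are reported asynchronously, so the true origin may be an earlier kernel; running with CUDA_LAUNCH_BLOCKING=1 might pinpoint it better). For reference, this is roughly what pre-allocating the states explicitly in the forward of the sketch above would look like, which bypasses that internal torch.zeros call; sizes are still placeholders:

```python
def forward(self, x):
    # x: [B, Seq_len, Feat_size]; num_layers/hidden_size assumed for illustration
    num_layers, num_directions, hidden_size = 1, 1, 64
    h_0 = torch.zeros(num_layers * num_directions, x.size(0), hidden_size,
                      device=x.device, dtype=x.dtype)
    c_0 = torch.zeros_like(h_0)
    # Passing (h_0, c_0) explicitly skips the internal zeros allocation
    # that appears in the traceback above
    _, (h_n, c_n) = self.lstm_1(x, (h_0, c_0))
    return h_n, c_n
```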
When the error occurred I was able to capture the problematic input tensor:
SIZE OF INPUT torch.Size([8, 10, 18]) (shape OK)
Device of input: cuda:0 (device OK)
Its values are between -1 and 1 (values OK)
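These prints come from a small sanity check wrapped around the failing call, roughly like this (a sketch, not the exact code):

```python
try:
    _, (h_n, c_n) = self.lstm_1(x)
except RuntimeError:
    print("SIZE OF INPUT", x.size())                       # torch.Size([8, 10, 18])
    print("Device of input:", x.device)                    # cuda:0
    print("Value range:", x.min().item(), x.max().item())  # within [-1, 1]
    raise
```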
Memory summary:
|===========================================================================|
| PyTorch CUDA memory summary, device ID 0 |
|---------------------------------------------------------------------------|
| CUDA OOMs: 0 | cudaMalloc retries: 0 |
|===========================================================================|
| Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed |
|---------------------------------------------------------------------------|
| Allocated memory | 7459 KB | 777955 KB | 27916 GB | 27916 GB |
| from large pool | 0 KB | 24750 KB | 11515 GB | 11515 GB |
| from small pool | 7459 KB | 756142 KB | 16400 GB | 16400 GB |
|---------------------------------------------------------------------------|
| Active memory | 7459 KB | 777955 KB | 27916 GB | 27916 GB |
| from large pool | 0 KB | 24750 KB | 11515 GB | 11515 GB |
| from small pool | 7459 KB | 756142 KB | 16400 GB | 16400 GB |
|---------------------------------------------------------------------------|
| GPU reserved memory | 780 MB | 780 MB | 780 MB | 0 B |
| from large pool | 40 MB | 40 MB | 40 MB | 0 B |
| from small pool | 740 MB | 740 MB | 740 MB | 0 B |
|---------------------------------------------------------------------------|
| Non-releasable memory | 45788 KB | 713821 KB | 33551 GB | 33551 GB |
| from large pool | 0 KB | 29630 KB | 15796 GB | 15796 GB |
| from small pool | 45788 KB | 713821 KB | 17755 GB | 17755 GB |
|---------------------------------------------------------------------------|
| Allocations | 428 | 20610 | 530960 K | 530960 K |
| from large pool | 0 | 7 | 2815 K | 2815 K |
| from small pool | 428 | 20610 | 528144 K | 528144 K |
|---------------------------------------------------------------------------|
| Active allocs | 428 | 20610 | 530960 K | 530960 K |
| from large pool | 0 | 7 | 2815 K | 2815 K |
| from small pool | 428 | 20610 | 528144 K | 528144 K |
|---------------------------------------------------------------------------|
| GPU reserved segments | 372 | 372 | 372 | 0 |
| from large pool | 2 | 2 | 2 | 0 |
| from small pool | 370 | 370 | 370 | 0 |
|---------------------------------------------------------------------------|
| Non-releasable allocs | 55 | 1454 | 280235 K | 280235 K |
| from large pool | 0 | 3 | 1615 K | 1615 K |
| from small pool | 55 | 1454 | 278620 K | 278620 K |
|===========================================================================|
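The table above is the output of torch.cuda.memory_summary(), printed where the exception is caught; note that there are no CUDA OOMs and current usage is only a few MB:

```python
# How the memory summary above was produced
print(torch.cuda.memory_summary(device=0, abbreviated=False))
```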
I am running on an NVIDIA RTX 3090 (24 GB).
Do you have any idea why this kind of error appears?
I didn't have any problems when I replaced the LSTM with a linear layer (flattening the input first); a sketch of that working variant is below.
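For comparison (sizes assumed, as above):

```python
import torch
import torch.nn as nn

class AgentLinear(nn.Module):
    """Sketch of the working variant; sizes are placeholders."""

    def __init__(self, seq_len=10, feat_size=18, hidden_size=64):
        super().__init__()
        self.fc = nn.Linear(seq_len * feat_size, hidden_size)

    def forward(self, x):
        # Flatten [B, Seq_len, Feat_size] -> [B, Seq_len * Feat_size]
        return self.fc(x.flatten(start_dim=1))
```

So the problem seems specific to the LSTM path rather than to the input tensor itself.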