Hi,
I’m facing a CUDA OOM error during evaluation, only in the prediction phase (model(inputs)).
The training loop works fine.
A few points:
- It happens with and without the training loop.
- The training and evaluation loops are in separate functions (each is called once per epoch), so it doesn’t look like a scope issue.
- I am evaluating under with torch.no_grad() and model.eval().
- The train and eval loaders have the same data types, shapes, and batch sizes.
- This framework works with lots of models. The problem occurs only in a new model I implemented, so I suspect the problem is in the model rather than the surrounding code.
- The problem occurs with and without mixed precision, but if I use torch.cuda.amp.autocast() before evaluation, the problem is solved. (Still, I want to know why.)
- If I call torch.cuda.empty_cache() inside the forward() method, the problem is allegedly solved.
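For context, this is roughly the eval pattern I mean (a minimal sketch with a stand-in Conv1d, not my real model): model.eval() switches off dropout/batch-norm updates, and torch.no_grad() stops autograd from keeping intermediate activations alive, which is normally what keeps evaluation memory low.

```python
import torch

# Stand-in tiny model, just to illustrate the eval pattern described above
model = torch.nn.Conv1d(10, 20, kernel_size=3)
model.eval()  # inference-mode behavior for dropout / batch-norm

x = torch.randn(4, 10, 256)
with torch.no_grad():       # no autograd graph, no saved activations
    out = model(x)

print(out.requires_grad)    # False: nothing was recorded for backward
print(out.shape)            # torch.Size([4, 20, 254])
```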
My model has 5 layers of the same kind, and the tensor grows because of the number of filters in the convolutional layers.
The initial input size is (2048, 10, 256) == (B, C, N).
The forward function of my model:
def forward(self, x, info_dict=None):
    self.process_info_dict(info_dict)
    tmp_res = []
    for d, m in self.slice_layers.items():
        if d == "remainder":
            tmp_res.append(m(x[:, self.slice_dict[d]]))
        else:
            b, c, n = (x.shape[0], len(self.slice_dict[d]), x.shape[2])
            tmp_x = x[:, self.slice_dict[d]].view(-1, 1, n)
            res = m(tmp_x)
            res = res.view(b, c * self.n_filters, self.output_dim[-1])
            tmp_res.append(res)
    x = torch.cat(tmp_res, 1)
    self.update_info_dict(info_dict)
    return x
where info_dict is a dictionary of a few hundred integers.
The error occurs on the line
x = torch.cat(tmp_res, 1)
The for loop executes only 3 times.
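One thing I noticed while checking this: torch.cat allocates a brand-new output tensor the size of all its inputs combined, so at that line the GPU briefly has to hold both the pieces in tmp_res and the concatenated result at the same time. A small sketch with toy shapes (illustrative sizes, not my real ones):

```python
import torch

# Three toy "tmp_res" pieces with different channel counts (dim 1)
pieces = [torch.empty(2, c, 6) for c in (3, 4, 5)]

# Total bytes held by the inputs
total_bytes = sum(p.numel() * p.element_size() for p in pieces)

# cat makes a NEW tensor; peak memory = inputs + output simultaneously
out = torch.cat(pieces, 1)

print(out.shape)  # torch.Size([2, 12, 6])
print(out.numel() * out.element_size() == total_bytes)  # True: output duplicates the inputs' bytes
```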
The model layers are defined here:
def create_layers(self, out_h, dtype_slices, kernel_size, dilation, padding, stride, groups):
    for i, (d, layer_out_h) in enumerate(out_h.items()):
        block_layers = []
        in_channels = dtype_slices[d] if d == "remainder" else 1
        out_channels = layer_out_h if d == "remainder" else int(layer_out_h / dtype_slices[d])
        layer = torch.nn.Conv1d(in_channels, out_channels, kernel_size=kernel_size,
                                dilation=dilation,
                                padding=padding, stride=stride, groups=groups[d])
        block_layers.append(layer)
        if self.activation:
            block_layers.append(self.activation)
        self.output_dim = conv_output_shape(self.output_dim, kernel_size, stride, padding)[0]
        self.slice_layers[d] = torch.nn.Sequential(*block_layers)
and all the layers are saved in:
self.slice_layers = torch.nn.ModuleDict()
The error message:
CUDA out of memory. Tried to allocate 5.00 GiB (GPU 0; 21.99 GiB total capacity; 10.30 GiB already allocated; 2.29 GiB free; 19.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
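In case it helps with diagnosing, this is the kind of allocator probe I can wrap around the failing line (a debugging sketch; the helper name report is mine, and it only reports real numbers on a CUDA device):

```python
import torch

def report(tag: str) -> str:
    """Print and return current CUDA allocator stats for the given tag."""
    if torch.cuda.is_available():
        alloc = torch.cuda.memory_allocated() / 2**30     # tensors currently alive
        reserved = torch.cuda.memory_reserved() / 2**30   # pool held by the caching allocator
        msg = f"{tag}: allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB"
    else:
        msg = f"{tag}: no CUDA device available"
    print(msg)
    return msg

report("before cat")
# x = torch.cat(tmp_res, 1)   # the failing line from forward() above
report("after cat")
```

Comparing "before cat" and "after cat" (and the same probe inside the training loop) should show whether evaluation really peaks higher, or whether the allocator is just fragmented, as the max_split_size_mb hint in the error suggests.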
Any idea what to check and how?
Thanks!