CUDA out of memory during evaluation (tried everything)

seanco92 · October 4, 2023, 7:46am

Hi,
I’m hitting a CUDA OOM error during evaluation, and only in the prediction step (model(inputs)).
The training loop works fine.
A few points:

It happens with and without the training loop.
The training and evaluation loops are in separate functions (each function is called once per epoch, so it does not look like a scope issue).
I am using with torch.no_grad() and model.eval().
The train and eval loaders have the same data types, the same shapes and the same batch sizes.
This framework works with lots of other models. The problem occurs in a new model I implemented, so I suspect that the problem is in the model and not in the code around it.
The problem occurs with and without mixed precision, but if I enable torch.cuda.amp.autocast() before evaluation, the problem is solved. (I still want to know why.)
If I call torch.cuda.empty_cache() inside the forward() method, the problem is apparently solved.
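To compare the points above with actual numbers, the peak memory of a single evaluation forward pass can be measured. This is only a minimal sketch: `peak_forward_memory_mib` and the toy Conv1d model are hypothetical names introduced for illustration and are not part of my framework. On a machine without CUDA the reported peak is simply 0.

```python
import torch


def peak_forward_memory_mib(model, inputs):
    """Run one no-grad forward pass; return (output, peak CUDA MiB allocated).

    The peak is reported as 0.0 when CUDA is unavailable or inputs are on CPU.
    """
    model.eval()
    use_cuda = torch.cuda.is_available() and inputs.is_cuda
    if use_cuda:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()  # start counting from here
    with torch.no_grad():
        out = model(inputs)
    peak_mib = 0.0
    if use_cuda:
        torch.cuda.synchronize()
        peak_mib = torch.cuda.max_memory_allocated() / 2**20
    return out, peak_mib


# Hypothetical stand-in for the real model: one small Conv1d layer.
toy_model = torch.nn.Conv1d(1, 4, kernel_size=3, padding=1)
toy_inputs = torch.randn(8, 1, 16)
toy_out, toy_peak = peak_forward_memory_mib(toy_model, toy_inputs)
print(tuple(toy_out.shape), f"{toy_peak:.1f} MiB")
```

Calling the same helper once inside and once outside an autocast context, on the real model and a real batch, should show how much of the peak depends on the activation dtype.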

My model has 5 layers of the same kind, and the input grows because of the number of filters in the convolutional layers.
The initial input size is (2048, 10, 256) == (B, C, N).
The forward function of my model:

def forward(self, x, info_dict=None):
    self.process_info_dict(info_dict)
    tmp_res = []
    for d, m in self.slice_layers.items():
        if d == "remainder":
            tmp_res.append(m(x[:, self.slice_dict[d]]))
        else:
            b, c, n = (x.shape[0], len(self.slice_dict[d]), x.shape[2])
            tmp_x = x[:, self.slice_dict[d]].view(-1, 1, n)
            res = m(tmp_x)
            res = res.view(b, c * self.n_filters, self.output_dim[-1])
            tmp_res.append(res)
    x = torch.cat(tmp_res, 1)
    self.update_info_dict(info_dict)
    return x

Here info_dict is a dictionary of a few hundred integers.
The error occurs on the line
x = torch.cat(tmp_res, 1)
The for loop is executed only 3 times.
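For context on why this line is the expensive one: torch.cat allocates a new result tensor while every entry of tmp_res is still alive, so at that moment roughly twice the size of the result is needed. A possible alternative is to write each result straight into a preallocated buffer. The sketch below is an assumption about how that could look, not what my model currently does. `cat_channels_preallocated`, `producers` and `channels` are hypothetical names, and the output dtype would have to match whatever the layers produce (for example float16 under autocast).

```python
import torch


def cat_channels_preallocated(producers, channels, batch, length,
                              dtype=torch.float32, device="cpu"):
    """Build the same tensor as torch.cat(parts, 1) without keeping all parts.

    producers: callables, each returning a (batch, channels[i], length) tensor.
    channels:  number of output channels of each producer, known in advance.
    """
    out = torch.empty(batch, sum(channels), length, dtype=dtype, device=device)
    start = 0
    for produce, c in zip(producers, channels):
        # The temporary returned by produce() can be freed right after the copy.
        out[:, start:start + c] = produce()
        start += c
    return out


# Tiny check against torch.cat on random data.
torch.manual_seed(0)
parts = [torch.randn(2, c, 5) for c in (3, 1, 4)]
producers = [lambda p=p: p for p in parts]
result = cat_channels_preallocated(producers, [3, 1, 4], batch=2, length=5)
print(torch.equal(result, torch.cat(parts, 1)))
```

In the real forward() each producer would be the call to the slice layer itself, so that only one intermediate result exists at a time in addition to the output buffer.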
The model layers are defined here:

def create_layers(self, out_h, dtype_slices, kernel_size, dilation, padding, stride, groups):
    for i, (d, layer_out_h) in enumerate(out_h.items()):
        block_layers = []
        in_channels = dtype_slices[d] if d == "remainder" else 1
        out_channels = layer_out_h if d == "remainder" else int(layer_out_h / dtype_slices[d])
        layer = torch.nn.Conv1d(in_channels, out_channels, kernel_size=kernel_size,
                                dilation=dilation, padding=padding, stride=stride,
                                groups=groups[d])
        block_layers.append(layer)
        if self.activation:
            block_layers.append(self.activation)
        self.output_dim = conv_output_shape(self.output_dim, kernel_size, stride, padding)[0]
        self.slice_layers[d] = torch.nn.Sequential(*block_layers)

All the layers are stored in:
self.slice_layers = torch.nn.ModuleDict()

error message:
CUDA out of memory. Tried to allocate 5.00 GiB (GPU 0; 21.99 GiB total capacity; 10.30 GiB already allocated; 2.29 GiB free; 19.40 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
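To put the numbers in this message into perspective, here is some back-of-the-envelope arithmetic. It assumes float32 tensors (4 bytes per element) and that the failed allocation still has batch size 2048. Both are assumptions, and `tensor_bytes` is a hypothetical helper written only for this calculation.

```python
GIB = 2**30
MIB = 2**20


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape."""
    n = 1
    for s in shape:
        n *= s
    return n * bytes_per_element


# Initial input (B, C, N) = (2048, 10, 256) in float32.
input_mib = tensor_bytes((2048, 10, 256)) / MIB

# The failed 5.00 GiB allocation, if it is float32:
failed_elements = 5 * GIB // 4
elements_per_sample = failed_elements // 2048  # assuming batch size 2048

# The same tensor in float16 (e.g. under autocast) would need half the memory.
failed_gib_fp16 = failed_elements * 2 / GIB

# Memory held by PyTorch's caching allocator but not used by live tensors.
reserved_not_allocated_gib = 19.40 - 10.30

print(f"input: {input_mib:.1f} MiB")
print(f"failed allocation: {failed_elements:,} elements, "
      f"{elements_per_sample:,} per sample")
print(f"same tensor in float16: {failed_gib_fp16:.2f} GiB")
print(f"reserved but not allocated: {reserved_not_allocated_gib:.2f} GiB")
```

So the input is only about 20 MiB, while the tensor that fails to allocate is 256 times larger. About 9 GiB is reserved by the allocator but not in use, which is the situation the fragmentation hint in the message refers to. The halved size in float16 may be part of why autocast makes the error go away, but that is a guess.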

Any idea what to check and how?
Thanks!