Saving NumPy Arrays in the Most Compressed Way

Hello everyone,

I have some large NumPy arrays of shape (4000, 200, 200, 20) and I'm looking for the most compressed way to save them. I've tried the .npz format, but each file still takes about 200 MB. These NumPy arrays are the inputs to my CNN, and due to their large size I'm struggling with a "CUDA out of memory" error.
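For reference, the saving currently looks roughly like this (np.savez_compressed as an example; the file name, array key, and the smaller placeholder shape below are just for illustration):

import numpy as np

# placeholder array; the real ones have shape (4000, 200, 200, 20)
arr = np.random.rand(40, 200, 200, 20).astype(np.float16)

np.savez_compressed("sample.npz", data=arr)   # zlib-compressed, unlike plain np.savez

loaded = np.load("sample.npz")["data"]        # decompressed when the key is accessed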

The “CUDA out of memory” error is raised if your GPU is running out of memory and is unrelated to how the NumPy arrays are stored on disk.
Try to reduce the batch size of your input tensors to lower the memory usage, or alternatively use torch.utils.checkpoint to trade compute for memory.
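For example, wrapping part of a model in torch.utils.checkpoint could look roughly like this (the layers here are just placeholders, not your model):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(20, 32, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())

    def forward(self, x):
        x = self.block1(x)
        # block2's intermediate activations are not stored;
        # they are recomputed during the backward pass instead
        x = checkpoint(self.block2, x)
        return x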

The batch size is set to 1. A large part of the memory is already used when I call my network (Model), before even getting to the forward pass:
In the debugging window:
y = Model(x)
-> class Model(Module):
       def __init__(self):
           ...  # some initializations

       def forward(self, input):
           # a large part of the memory has already been used before the forward pass, here
           tensor = torch.max(input, dim=2, keepdim=False)

If the OOM error is raised before the forward pass is executed, this would mean that either your model or the current batch is already too large for your GPU.
Are you pushing the complete dataset to the device?
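For reference, only the current batch should be moved to the device inside the training loop, along these lines (the dummy dataset and model below are just to illustrate the per-batch transfer):

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# stand-ins for the real dataset and model
dataset = TensorDataset(torch.randn(8, 20, 32, 32), torch.randint(0, 2, (8,)))
loader = DataLoader(dataset, batch_size=1)
model = torch.nn.Conv2d(20, 4, 3).to(device)

for data, target in loader:
    data = data.to(device)      # only the current batch lives on the GPU
    target = target.to(device)
    output = model(data)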

No, in each epoch only one of these (4000, 200, 200, 20) arrays is fed to the network at a time. There are 200 of these arrays in total in the dataset. The network itself is not a large one; it has 10 convolutional layers in total.

I don’t quite understand the use case, as you’ve mentioned the batch size is 1.
Which dimension of your input would correspond to the batch size and which shape are you feeding to the model?
If an array of the mentioned shape is fed to the model in each epoch, at which point does the OOM error occur?

The shape of the input is (1, 4000, 200, 200, 20), in which the first dimension is the batch size. Here is the GPU memory watchdog reading for each command during debugging:
y = Model(x)                     ######## Watchdog GPU memory usage: 1085 MiB
-> class Model(Module):
       def __init__(self):
           ...

       def forward(self, input):
           tensor = input        ######## Watchdog GPU memory usage: 8279 MiB

It seems the OOM happens when the GPU memory usage reaches 8469 MiB in one of the intermediate layers, but the largest part of the memory is already used before getting into the forward pass, which I guess is the root of the problem.
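To double-check the watchdog numbers, I can also query PyTorch's own counters at each step (note that nvidia-smi additionally includes the CUDA context and the cached memory, so its values are higher than memory_allocated):

import torch

print(f"{torch.cuda.memory_allocated() / 1024**2:.0f} MiB allocated")  # memory occupied by tensors
print(f"{torch.cuda.memory_reserved() / 1024**2:.0f} MiB reserved")    # memory held by the caching allocator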

Since you’ve mentioned you are using a CNN, I assume you are using nn.Conv3d layers and that the input has a channel dimension of 4000?
Could you post your model architecture as well as your GPU model (including its memory)?

The input is actually like a video, in which 4000 is the number of frames, the spatial window size is 200 in both the horizontal and vertical directions, and 20 is the number of channels. I am using two Tesla V100 PCIe 16GB GPUs.
The model architecture is a stack of two of these blocks:
1DTemporalConv - 1DChannelConv - 2DSpatialConv - 2DSpatialMaxPooling
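A rough sketch of one such block (the exact kernel sizes and channel counts here are placeholders, with the input laid out as (N, C, T, H, W)):

import torch.nn as nn

block = nn.Sequential(
    nn.Conv3d(20, 20, kernel_size=(3, 1, 1), padding=(1, 0, 0)),   # 1D temporal conv
    nn.Conv3d(20, 32, kernel_size=1),                              # 1x1x1 channel conv
    nn.Conv3d(32, 32, kernel_size=(1, 3, 3), padding=(0, 1, 1)),   # 2D spatial conv
    nn.MaxPool3d(kernel_size=(1, 2, 2)),                           # 2D spatial max pooling
)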

I also tried to run the network without training, and the same problem happens during testing.

Is it possible that the size of the input array in each epoch (200 MB on disk) causes the OOM problem? It seems that Variable(input.cuda()) triggers the problem before we even get into the forward pass.

We would still need to see the model to debug it further.
Could you post it with all necessary arguments and shapes to reproduce the OOM?

4000 * 200 * 200 * 20 * 16 bits ≈ 6 GB, so no mysteries there. 200 MB is the zipped .npz size on disk; you can’t train a network on that compressed representation.
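A quick sanity check of that arithmetic (assuming float16 storage):

elements = 4000 * 200 * 200 * 20      # ~3.2e9 values per array
print(elements * 2 / 1024**3)         # float16: ~6.0 GiB in memory
print(elements * 4 / 1024**3)         # float32 would be ~11.9 GiB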