MemoryError on Raspberry Pi 2


I am trying to train this on a Raspberry Pi 2, which has 512 MB of RAM. I reduced the original Google Speech Commands dataset to 12 classes and 34,000 training samples, and I am using a ResNet model with 8 layers.

class ResNet(BaseModel):
    def __init__(self, config):
        super().__init__(config)  # nn.Module must be initialized before submodules are registered
        self.n_layers = config["n_layers"]
        n_maps = config["n_feature_maps"]

        self.layers = nn.ModuleDict()

        self.layers["conv_0"] = nn.Conv2d(1, n_maps, (3, 3), padding=1, bias=False)

        for i in range(1, self.n_layers + 1):
        if config["use_dilation"]:
                # Dilation (and matching padding) doubles every 3 layers: 1, 2, 4, ...
                padding_size = int(2**((i-1) // 3))
                dilation_size = int(2**((i-1) // 3))
                self.layers[f"conv_{i}"] = nn.Conv2d(n_maps, n_maps, (3, 3), padding=padding_size, dilation=dilation_size, bias=False)
            else:
                self.layers[f"conv_{i}"] = nn.Conv2d(n_maps, n_maps, (3, 3), padding=1, bias=False)
            self.layers[f"bn_{i}"] = nn.BatchNorm2d(n_maps, affine=False)

        if "pool" in config:
            self.layers["pool"] = nn.AvgPool2d(config["pool"])

        self.layers["output"] = nn.Linear(n_maps, config["n_labels"])

        self.activations = nn.ModuleDict({
            "relu": nn.ReLU()
        })

    def forward(self, x):
        x = x.unsqueeze(1)
        x = self.layers["conv_0"](x)
        x = self.activations["relu"](x)

        if "pool" in self.layers:
            x = self.layers["pool"](x)

        prev_x = x
        for i in range(1, self.n_layers + 1):
            x = self.layers[f"conv_{i}"](x)
            x = self.activations["relu"](x)

            if i % 2 == 0:
                x = x + prev_x
                prev_x = x

            x = self.layers[f"bn_{i}"](x)

        x = x.view(x.size(0), x.size(1), -1)  # flatten spatial dims: (batch, n_maps, h*w)
        x = x.mean(2)
        x = self.layers["output"](x)
        return x

It trains successfully on my laptop; however, I get a MemoryError on the RPi 2. I split the wav list so that the files are not all loaded into memory at once (there are almost 1.5 GB of wav files, and the RPi 2 has only 512 MB of RAM). I also set the --num_workers argument to 0, since CPU memory gradually leaks when num_workers > 0 in the DataLoader. Still, I can't solve the problem.

PyTorch Version: 1.3.0
Python version: 3.7.3
OS: Raspbian GNU/Linux 10 (buster)
CPU: armv7l

What could be the problem?

Thank you so much!

Could you reduce the batch size to a single sample and rerun the code, in case your current batch size is larger?
If that doesn’t help, could you remove the data loading as a debugging step and feed a single sample as x = torch.randn(...) to the model to check the memory usage?
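While running that random-tensor check, it can also help to read the process's actual peak memory rather than guessing. A small stdlib-only helper (assuming a Linux system such as Raspbian, where `ru_maxrss` is reported in kilobytes) might look like this:

```python
import resource
import sys

def peak_rss_mb():
    """Return this process's peak resident set size in MB.

    On Linux (e.g. Raspbian) ru_maxrss is in kilobytes;
    on macOS it is in bytes, hence the platform check.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024

print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

Calling `peak_rss_mb()` before and after the forward pass gives a rough idea of how much memory the model itself consumes, separate from data loading.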


Thank you so much for the reply!

I’ve tried with batch_size=1 and got the same error. With a random tensor, the memory usage is 0.004444 MB.

This would most likely point to the data loading pipeline using a lot of memory.
If I understand this loader correctly, the wav files are all preloaded and just indexed in the __getitem__ method?
If so, switch to lazy loading: have __init__ collect only the file paths etc., and move the actual data loading into the __getitem__ method.
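As a rough illustration of that pattern (plain Python for brevity rather than a torch.utils.data.Dataset subclass, and with a hypothetical class name and raw-bytes loading standing in for the real wav/feature pipeline):

```python
import os

class LazySpeechDataset:
    """Sketch of a lazily loading dataset: __init__ stores only file
    paths and labels, so the full 1.5 GB of audio never sits in RAM;
    each sample is read from disk on demand in __getitem__."""

    def __init__(self, data_dir, labels):
        # Keep only lightweight metadata (paths, labels) in memory.
        self.paths = sorted(
            os.path.join(data_dir, f)
            for f in os.listdir(data_dir)
            if f.endswith(".wav")
        )
        self.labels = labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The file is loaded here, on demand, not in __init__;
        # in the real loader this is where decoding/feature
        # extraction would happen.
        with open(self.paths[index], "rb") as f:
            audio = f.read()
        return audio, self.labels[index]
```

With this shape, only the samples of the current batch are ever resident, which should fit comfortably in 512 MB even with the full dataset on disk.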