Does a forward hook used simply to extract features for an FPN add a delay?

Hello.
While implementing the SSD series, I found that extracting their features is quite tricky. Most implementations just modify the forward method, but in my opinion it would be better to use a forward hook.

So, with a simple experiment on the CPU, I found that it does not affect the parameters or gradients of the model.
But there was about a 20-second delay compared to the 'forward method way', which took 350 seconds.
(Considering the 60,000 inputs of MNIST, 20 / 60,000 s per input is a very short time, but there was still a delay.)

I registered the forward hooks only once, in the Model class, and the only function passed to register_forward_hook was a Python list append for 3 tensors.

But this is just an experimental conclusion, and I'm still not sure whether to use hooks when running a model.
If anyone knows whether register_forward_hook affects the model's running speed and why, or any other reason hooks are not recommended, please help me.

I think it depends on your use case which approach to take.
E.g. in case you are rewriting the model architecture and would like to constantly return the intermediates, I would probably stick to overriding the forward method, since it would also allow you to script the model (forward hooks are unsupported right now in torch.jit). It could also make the overall model architecture clearer, as new users wouldn't need to check if and which layers are using hooks.
On the other hand, I'm using hooks most of the time, as I'm usually more interested in a quick way to debug issues etc.
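
For illustration, a minimal sketch of the override-forward style (the module names are made up), which stays compatible with torch.jit.script:

import torch
import torch.nn as nn
from typing import Tuple

class TinyBackbone(nn.Module):
    # hypothetical two-stage backbone returning its intermediates directly
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.stage2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return f1, f2  # an FPN-style head can consume these directly

# scripting works because the intermediates flow through forward itself;
# the same model relying on forward hooks could not be scripted
scripted = torch.jit.script(TinyBackbone())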

In any case, I haven't seen a slowdown yet (but also didn't profile it), so could you post a minimal code snippet showing your usage and the slowdown, please?

Thank you for answering my question.
Here is my code.

from typing import List

import torch.nn as nn


class FeatureExtractor(nn.Module):
    def __init__(self, model: nn.Module, hook_layers: List[str]):
        super(FeatureExtractor, self).__init__()
        self.model = model
        self.hook_layers = hook_layers
        self.features = []

        # register the hook on every module whose qualified name was requested
        for name, module in self.model.named_modules():
            if name in self.hook_layers:
                module.register_forward_hook(self.extract())

    def extract(self):
        # returns a hook that appends the module's output to self.features
        def _extract(module, f_in, f_out):
            self.features.append(f_out)
        return _extract

    def forward(self, input):
        _ = self.model(input)
        return self.features
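
For context, it can be used like this (a hypothetical example with a torchvision backbone; the layer names are illustrative and should be checked via named_modules):

import torch
import torchvision

backbone = torchvision.models.mobilenet_v2()
# print([name for name, _ in backbone.named_modules()])  # inspect valid names
extractor = FeatureExtractor(backbone, ["features.6", "features.13", "features.18"])
feats = extractor(torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])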

And the simple experiment was set up as below:

  • The backbone of the model is a very lightly modified MobileNet V1, and three features from it are fed to only the classifier of RetinaNet.
  • The loss function is nn.CrossEntropyLoss, and the optimizer is Adam.
  • Training was done for only 1 epoch on the MNIST dataset with a batch size of 32.
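
For reference, a minimal sketch of how an epoch like this could be timed (illustrative; not the exact training script used here):

import time

def time_one_epoch(model, loader, criterion, optimizer):
    # wall-clock time for a single training epoch on the CPU
    start = time.perf_counter()
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return time.perf_counter() - start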

As a result, training with the modified forward method took about 350 seconds on average, and the 'hook way' took about 370 seconds on average.

But since these runs were done on a laptop CPU, the timings may not be stable, I think.

Based on your code snippet, it seems you are directly appending all intermediate features without detaching them, so I would assume memory usage increases in each iteration.
If that's the case, I would also expect to see a slowdown, as you would need to allocate memory during the training. Could you check this behavior?
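
A minimal sketch of what detaching inside the hook would look like (only appropriate if the stored features don't need to stay in the autograd graph, e.g. for inspection rather than for the FPN's loss):

    def extract(self):
        # drop-in replacement for FeatureExtractor.extract:
        # detach so the stored tensor no longer keeps the autograd graph alive
        def _extract(module, f_in, f_out):
            self.features.append(f_out.detach())
        return _extract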

About the memory problem caused by appending, I forgot that I had already included the line

self.backbone.features.clear()

in the main model's forward method, which clears the feature list before every forward pass.

From additional experiments, I found that it is difficult to conclude that the usage of hooks causes a meaningful delay: first, the experiments using hooks were faster in many cases; second, the time difference between the two methods was very small in every case.

About memory, it will depend on the training conditions such as the model or the dataset, but under my conditions the 'forward-method' run used 156,438 B on average and the 'hook-method' run used 157,681.5 B for the whole 1-epoch training, so the hook costs only about 1 KiB more memory.
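
For reference, a sketch of one way such Python-side allocations could be measured (illustrative; not necessarily how the numbers above were obtained), using the standard-library tracemalloc module:

import tracemalloc

tracemalloc.start()
# ... run one training epoch here ...
current, peak = tracemalloc.get_traced_memory()
print(f"current = {current} B, peak = {peak} B")
tracemalloc.stop()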

Anyway, I appreciate your help again, and I'll post the final code.

from typing import List

import torch.nn as nn


class FeatureExtractor(nn.Module):
    def __init__(self, model: nn.Module, hook_layers: List[str]):
        super(FeatureExtractor, self).__init__()
        self.model = model
        self.hook_layers = hook_layers
        self.features = []

        # register the hook on every module whose qualified name was requested
        for name, module in self.model.named_modules():
            if name in self.hook_layers:
                module.register_forward_hook(self.extract())

    def extract(self):
        # returns a hook that appends the module's output to self.features
        def _extract(module, f_in, f_out):
            self.features.append(f_out)

        return _extract

    def forward(self, input):
        # clear the features collected during the previous forward pass
        self.features.clear()
        _ = self.model(input)

        return tuple(self.features)
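
One refinement worth considering (not part of the code above): register_forward_hook returns a handle, so storing the handles allows the hooks to be removed later and the wrapped model restored, e.g.:

    # inside __init__, keep the handles instead of discarding them
    self.handles = [
        module.register_forward_hook(self.extract())
        for name, module in self.model.named_modules()
        if name in self.hook_layers
    ]

    # an extra method to detach all hooks when they are no longer needed
    def remove_hooks(self):
        for handle in self.handles:
            handle.remove()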