My question is simple: why replace the fused modules with
nn.Identity rather than simply delete them in the function
fuse_known_modules (see code)?
The profiler shows that this introduces some non-negligible delay and largely cancels the performance improvement gained by fusion. As you can see in the trace, the overhead of the two
nn.Identity modules is almost equivalent to the latency of
aten::batch_norm in the original fp32 model, which wipes out the gain from fusing conv1d with batch_norm.
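For reference, here is a minimal, self-contained repro of the behavior I'm asking about (the module names and shapes are my own, not from the actual model):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 4, kernel_size=3)
        self.bn = nn.BatchNorm1d(4)

    def forward(self, x):
        return self.bn(self.conv(x))

m = M().eval()  # conv+bn fusion requires eval mode
fused = fuse_modules(m, [['conv', 'bn']])

# the bn slot is replaced by nn.Identity rather than deleted,
# so the forward pass still dispatches through it
print(type(fused.bn))
```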
- Usually people either JIT-script the model or export it to get a final, maximally performant version. That step removes the nn.Identity modules.
- If you delete random modules, your model will error.
If you have a model like
self.lin = nn.Linear(...)
self.bn = nn.BatchNorm1d(...)
...
y = self.bn(self.lin(x))
then deleting self.bn doesn't just turn that call into a no-op: forward still references self.bn, so the model errors because it doesn't know where to send the activation after self.lin.
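A quick self-contained sketch of that failure mode (shapes are made up for illustration):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)
        self.bn = nn.BatchNorm1d(8)

    def forward(self, x):
        return self.bn(self.lin(x))

m = M().eval()
del m.bn  # delete the fused-away module instead of swapping in nn.Identity

try:
    m(torch.randn(2, 8))
except AttributeError as e:
    # forward still calls self.bn, so the call blows up
    print(e)
```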
- It makes debugging a lot easier. We normally debug quantization by comparing the activations at each module output between the quantized and non-quantized models. If you do a fusion like
original model:
location 1: A
location 2: B

fused model:
location 1: A+B
location 2: identity
then you can compare the output of location 2 in both models and get an apples-to-apples comparison. If you compare location 1, in the original model you'd get the output of A, while in the fused model you'd get the output of the combined A+B (i.e., what was B's output). If you delete location 2, there is no place in the fused model with the same location and the same operations.
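To illustrate the comparison, here is a rough sketch of that workflow using forward hooks (the model and the `output_at` helper are my own, not the actual debug tooling):

```python
import copy
import torch
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 4, kernel_size=3)  # "location 1"
        self.bn = nn.BatchNorm1d(4)                 # "location 2"

    def forward(self, x):
        return self.bn(self.conv(x))

def output_at(model, name, x):
    # capture the output of submodule `name` with a forward hook
    captured = {}
    handle = getattr(model, name).register_forward_hook(
        lambda mod, inp, out: captured.setdefault('y', out))
    model(x)
    handle.remove()
    return captured['y']

torch.manual_seed(0)
x = torch.randn(2, 4, 8)
ref = M().eval()
fused = fuse_modules(copy.deepcopy(ref), [['conv', 'bn']])

# "location 2" still exists in both models (bn vs. Identity),
# so its outputs line up apples to apples
y_ref = output_at(ref, 'bn', x)
y_fused = output_at(fused, 'bn', x)
print(torch.allclose(y_ref, y_fused, atol=1e-5))
```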