I have a trained and pruned neural network in PyTorch. After pruning it, I expected the inference time to be reduced, but it doesn't seem to be. Can someone please suggest a way to achieve this?
Basically, what I want is for the zeroed-out weights to be skipped in the forward pass of the graph so that the inference time is reduced.
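For context, here is a minimal sketch of why this happens and one possible workaround. The `torch.nn.utils.prune` utilities only apply a multiplicative mask, so the weight tensor stays dense and the full dense matmul still runs. One way to actually skip the zeros is to store the weight as a sparse tensor and multiply with `torch.sparse.mm` (the layer here is illustrative, and note that sparse kernels usually only beat dense ones at very high sparsity, and only on some hardware):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Illustrative layer; torch.nn.utils.prune only applies a
# multiplicative mask, so the dense matmul still runs in full.
layer = nn.Linear(1024, 1024)
prune.l1_unstructured(layer, name="weight", amount=0.9)  # zero 90% of weights
prune.remove(layer, "weight")  # bake the mask in; weight is still dense

# One way to actually skip the zeros: keep the weight as a sparse
# COO tensor and multiply with torch.sparse.mm.
w_sparse = layer.weight.detach().to_sparse()

def sparse_forward(x):
    # y = x @ W.T + b, computed as (W @ x.T).T + b with a sparse W
    return torch.sparse.mm(w_sparse, x.t()).t() + layer.bias

x = torch.randn(32, 1024)
y_dense = layer(x)
y_sparse = sparse_forward(x)
print(torch.allclose(y_dense, y_sparse, atol=1e-5))  # True
```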
Thanks @Soumya_Kundu, that post is about the same problem I mentioned here, but it doesn't contain a clear solution for how to achieve this in PyTorch.
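Since that thread reportedly doesn't spell out a concrete recipe, here is a hedged sketch of the more reliable route: structured pruning, where whole output rows (or channels) are zeroed and the layer is then physically rebuilt smaller, so ordinary dense kernels simply do less work. All names here are illustrative, and any downstream layer would need its input dimension shrunk to match:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Zero out entire output rows (structured pruning along dim 0),
# then physically rebuild a smaller layer from the surviving rows.
layer = nn.Linear(1024, 1024)
prune.ln_structured(layer, name="weight", amount=0.5, n=2, dim=0)
prune.remove(layer, "weight")

keep = layer.weight.abs().sum(dim=1) != 0   # surviving output units
small = nn.Linear(layer.in_features, int(keep.sum()))
with torch.no_grad():
    small.weight.copy_(layer.weight[keep])
    small.bias.copy_(layer.bias[keep])

# The smaller layer now does half the FLOPs with ordinary dense
# kernels. Any following layer must have its weight columns indexed
# with the same `keep` mask so the shapes line up.
x = torch.randn(32, 1024)
print(small(x).shape)  # torch.Size([32, 512])
```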