Does PyTorch ignore zero weights during forward?

As the title says, does PyTorch perform some kind of under-the-hood optimization for zeroed-out parameters in a network’s layers? I.e., would the forward() method of a normal network be slower than the same architecture with some of its weights set to 0?


No, we don’t :slight_smile:
Checking the content of a Tensor is in most cases more expensive than the potential gain you could get from it here.
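To make this concrete, here is a minimal sketch showing that a dense `nn.Linear` performs the full matrix multiply regardless of the stored weight values; zeroing the weights changes the math, not the amount of work. (The layer sizes here are arbitrary, chosen just for illustration.)

```python
import torch
import torch.nn as nn

# A dense linear layer always runs the full matmul; PyTorch never
# inspects the weight values to skip zero entries.
layer = nn.Linear(512, 512)
with torch.no_grad():
    layer.weight.zero_()  # all weights set to 0
    layer.bias.zero_()

x = torch.randn(8, 512)
out = layer(x)  # still a full 8x512 @ 512x512 matmul under the hood

print(out.abs().sum().item())  # 0.0 — the result is zero, but no FLOPs were saved
```

If you actually want to exploit sparsity, you have to make it explicit, e.g. via `torch.sparse` tensors or structured pruning, rather than relying on zeros in a dense weight matrix.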

Thank you very much!