As title says, does PyTorch perform some kind of under-the-hood optimizations for zeroed-out parameters in a network’s layers? I.e., would the forward() method of a normal network be slower than the same architecture but with some of its weights set to 0?
No, we don’t.
Inspecting the contents of a Tensor at runtime would, in most cases, cost more than any speedup you could gain from skipping the zeroed entries.
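You can check this yourself with a small timing sketch (the layer sizes and 90% sparsity level below are arbitrary choices for illustration): a linear layer with most of its weights set to 0 still goes through the same dense matmul kernel, which multiplies zeros like any other value.

```python
import time
import torch
import torch.nn as nn

torch.manual_seed(0)

dense = nn.Linear(1024, 1024)
zeroed = nn.Linear(1024, 1024)
with torch.no_grad():
    zeroed.weight.copy_(dense.weight)
    # Zero out ~90% of the weights; the tensor stays dense in memory.
    mask = torch.rand_like(zeroed.weight) < 0.9
    zeroed.weight[mask] = 0.0

x = torch.randn(256, 1024)

def time_forward(layer, reps=50):
    """Time repeated forward passes through `layer` (no autograd)."""
    with torch.no_grad():
        layer(x)  # warm-up
        start = time.perf_counter()
        for _ in range(reps):
            layer(x)
    return time.perf_counter() - start

t_dense = time_forward(dense)
t_zeroed = time_forward(zeroed)
print(f"dense: {t_dense:.4f}s, 90% zeroed: {t_zeroed:.4f}s")
```

The two timings should come out roughly equal: the kernel never looks at the values, so zeros buy nothing. Actually exploiting sparsity requires an explicitly sparse representation (e.g. `torch.sparse`) or structured pruning that shrinks the layer dimensions.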
Thank you very much!