Assuming I do not need to do backprop, is it possible to do a forward pass faster? I understand that it defeats the purpose of creating a computational graph and having an autograd engine. I'm not a computer scientist, so I just want to know whether it's possible — for example, while running in model.eval() mode. Is it viable to use a different algorithm for the forward pass when backprop is not required?
You can wrap the forward pass in a
with torch.no_grad() block to avoid creating the computation graph and storing the intermediate activations needed for backprop. This saves memory and some bookkeeping overhead, but it doesn't change the operations themselves.
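A minimal sketch of the difference, using a small made-up model for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical small model, just for demonstration
model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
model.eval()  # switches layers like dropout/batchnorm to inference behavior

x = torch.randn(4, 10)

# Normal forward pass: autograd records the graph because the
# parameters require gradients, so intermediates are kept alive.
y_train = model(x)
print(y_train.requires_grad)  # True

# Forward pass without graph construction: intermediate activations
# can be freed immediately, reducing memory usage.
with torch.no_grad():
    y_eval = model(x)
print(y_eval.requires_grad)  # False
```

Note that model.eval() and torch.no_grad() are independent: the first changes layer behavior, the second disables graph construction, so for inference you typically want both.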
If you are using a GPU, you could set
torch.backends.cudnn.benchmark = True to let cudnn benchmark all available algorithms for your workload and select the fastest one. Note that the first iteration for each new input shape will incur an overhead due to the profiling, so this helps most when the input shapes stay constant.