Should many intermediate computations be left in the forward propagation of the neural network?

Nguyen_Duy · September 8, 2022, 8:27am

Hello everyone,
I have a question as follows:
I need to use a temporary variable during the forward pass of a neural network. Specifically, the forward function takes an input x, and I compute a variable y from x using 3-4 operations inside forward. Is that a good idea, or should I compute y ahead of time, when the sample is generated?

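def forward(self, x, mask):
  z = self.linear1(x)
  # Example computations that derive y from x
  # (assuming x has shape (batch, n, d) and mask has shape (batch,))
  total = torch.sum(x, dim=(1, 2))  # shape: (batch,)
  total = total * mask
  # Reshape total so the division broadcasts per sample
  y = x / total.view(-1, 1, 1)
  # Continue forward propagation
  result = z + self.linear2(y)
  return result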

or

# y is created when the sample is generated
def forward(self, x, mask, y):
  z = self.linear1(x)
  result = z + self.linear2(y)
  return result

Is there any difference between the two approaches above in terms of:

GPU memory used during training
Size of the model

Sorry if my explanation is unclear. Thanks for any help.
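
For reference, here is a minimal sketch of how the two variants could be compared. The class names (InForward, Precomputed), the layer sizes, the input shapes, and the helpers count_parameters and peak_gpu_bytes are all assumptions made up for illustration; they are not part of my real model. The sketch only relies on the fact that model size is determined by the registered parameters, and both variants register the same two linear layers.

```python
import torch
import torch.nn as nn


class InForward(nn.Module):
    """Variant 1: y is derived from x inside forward."""

    def __init__(self, d_in=8, d_out=4):
        super().__init__()
        self.linear1 = nn.Linear(d_in, d_out)
        self.linear2 = nn.Linear(d_in, d_out)

    def forward(self, x, mask):
        z = self.linear1(x)
        # Assumed shapes: x is (batch, n, d_in), mask is (batch,)
        total = torch.sum(x, dim=(1, 2)) * mask
        y = x / total.view(-1, 1, 1)
        return z + self.linear2(y)


class Precomputed(nn.Module):
    """Variant 2: y is computed at sample generation and passed in."""

    def __init__(self, d_in=8, d_out=4):
        super().__init__()
        self.linear1 = nn.Linear(d_in, d_out)
        self.linear2 = nn.Linear(d_in, d_out)

    def forward(self, x, mask, y):
        z = self.linear1(x)
        return z + self.linear2(y)


def count_parameters(model):
    """Number of learnable scalars; this is what determines model size."""
    return sum(p.numel() for p in model.parameters())


def peak_gpu_bytes(step):
    """Run step() and return the peak CUDA memory allocated in bytes.

    Returns None when no GPU is available.
    """
    if not torch.cuda.is_available():
        return None
    torch.cuda.reset_peak_memory_stats()
    step()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated()
```

With these definitions, count_parameters gives the same number for both variants, since the extra operations in the first forward add no parameters. For GPU memory, peak_gpu_bytes could be wrapped around one training step of each variant (with the model and tensors moved to the GPU) to measure the difference directly instead of guessing.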