I have a question as follows:
I need a temporary variable during the forward pass of a neural network. Specifically, given an input x, I compute a variable y from x through 3-4 operations inside the forward function. Should I do it that way, or should I create y earlier, at sample-generation time, and pass it into forward?
```python
def forward(self, x, mask):
    z = self.linear1(x)
    # Some computations that create y from x (example)
    total = torch.sum(x, dim=(1, 2))
    total = total * mask
    y = x / total.view(-1, 1, 1)  # reshape so the division broadcasts over x
    # Continue forward propagation
    result = z + self.linear2(y)
    return result
```
```python
# y is created at sampling time and passed in
def forward(self, x, mask, y):
    z = self.linear1(x)
    result = z + self.linear2(y)
    return result
```
I would like to ask whether there is any difference between the two approaches above in terms of:
- GPU memory usage during training
- Size of the model
Sorry if my phrasing is unclear. Thanks for any help!
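For reference, here is a minimal sketch of both versions side by side (module names, dimensions, and the mask shape are made up for illustration). One thing it makes concrete: since y is derived from x with no learnable weights, both modules have exactly the same parameters, so the saved model size cannot differ; only the activations kept alive during training can.

```python
import torch
import torch.nn as nn

class ComputeYInside(nn.Module):
    """Version 1: y is derived from x inside forward."""
    def __init__(self, d):
        super().__init__()
        self.linear1 = nn.Linear(d, d)
        self.linear2 = nn.Linear(d, d)

    def forward(self, x, mask):
        z = self.linear1(x)
        total = torch.sum(x, dim=(1, 2))          # shape (batch,)
        total = total * mask                      # mask assumed shape (batch,)
        y = x / total.view(-1, 1, 1)              # broadcast over x's trailing dims
        return z + self.linear2(y)

class PrecomputedY(nn.Module):
    """Version 2: y is built at sample-generation time and passed in."""
    def __init__(self, d):
        super().__init__()
        self.linear1 = nn.Linear(d, d)
        self.linear2 = nn.Linear(d, d)

    def forward(self, x, mask, y):
        z = self.linear1(x)
        return z + self.linear2(y)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

# Identical parameter counts: y adds no weights, so model size is unchanged.
print(n_params(ComputeYInside(8)) == n_params(PrecomputedY(8)))
```

The remaining difference is where the intermediate tensors live: computed inside forward, y and its precursors are held by autograd for the backward pass; precomputed, y is part of each batch instead (larger dataset/dataloader footprint, and gradients no longer flow from y back to x).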