Hello everyone,

I have the following question:

I need to use a temporary variable during the forward pass of a neural network. Specifically, the forward pass takes an input x, and inside the forward function I compute a variable y from x using 3-4 operations. Is it fine to do that, or should I compute y up front, when the samples are generated, and pass it in?

```
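def forward(self, x, mask):
    z = self.linear1(x)
    # Example computations that derive y from x
    # keepdim=True keeps total at shape (batch, 1, 1) so it broadcasts against x;
    # mask is assumed to be broadcastable to that shape
    total = torch.sum(x, dim=(1, 2), keepdim=True)
    total = total * mask
    y = x / total
    # Continue forward propagation
    result = z + self.linear2(y)
    return result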
```

or

```
# Create y from sampling
def forward(self, x, mask, y):
    z = self.linear1(x)
    result = z + self.linear2(y)
    return result
```

Is there any difference between the two approaches above in terms of:

- GPU memory used when training
- Size of the model

Sorry if my wording is unclear. Thanks for any help.