I have a small dummy feedforward network defined in PyTorch, and I run inference with it as follows:
import torch
import torch.nn as nn

device = torch.device("cpu")
n_input, n_hidden, n_out = 100, 150, 1
batch_size = 5000

data_x = torch.randn(batch_size, n_input)
data_x = data_x.to(device)


def create_model():
    hidden_layers = [nn.Linear(n_hidden, n_hidden), nn.ReLU()]
    model = nn.Sequential(*([
        nn.Linear(n_input, n_hidden), nn.ReLU()] +
        hidden_layers * 20 +
        [nn.Linear(n_hidden, n_out), nn.Sigmoid()
    ]))
    return model.to(device)


def make_prediction(model, data_x):
    return model(data_x)


@profile
def main():
    model = create_model()

    y_pred = make_prediction(model, torch.randn(batch_size, n_input))
    y_pred = make_prediction(model, torch.randn(batch_size, n_input))
    y_pred = torch.rand((batch_size, n_out))


main()
I am interested in knowing how much memory each operation takes up. I profile the code with memory-profiler (by running python -m memory_profiler main.py) and get this result:
Filename: main.py

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
    26  270.777 MiB  270.777 MiB           1   @profile
    27                                         def main():
    28  271.773 MiB    0.996 MiB           1       model = create_model()
    29
    30  396.586 MiB  124.812 MiB           1       y_pred = make_prediction(model, torch.randn(batch_size, n_input))
    31  513.059 MiB  116.473 MiB           1       y_pred = make_prediction(model, torch.randn(batch_size, n_input))
    32  513.059 MiB    0.000 MiB           1       y_pred = torch.rand((batch_size, n_out))
Could someone please explain this result? Specifically, I do not understand why the total memory usage keeps accumulating after every inference call. Once inference has run and y_pred has been computed, why does torch keep holding on to around 120 MB?
To check that y_pred itself is not taking up that much memory, I create another random tensor with the same shape as y_pred at the end, and as you can see, it uses almost no memory. It appears that all of the ~120 MB is used for the intermediate computations when running data_x through the network, and yet that memory is not released once the computations are complete and y_pred has been calculated.
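As a rough sanity check on the ~120 MB figure, here is a back-of-the-envelope estimate in plain Python. It assumes float32 tensors (4 bytes per element) and that the output of every hidden-width Linear and ReLU layer stays alive after the forward pass; both are my assumptions, not something the profiler reports. The variable names below are only for illustration.

```python
# Rough size estimate for the tensors produced during one forward pass.
# Assumptions (not confirmed by the profiler): float32 elements, and every
# hidden-width layer output is kept in memory after the call returns.
n_input, n_hidden, n_out = 100, 150, 1
batch_size = 5000
bytes_per_float32 = 4
MIB = 2 ** 20

# The model has 1 input Linear + 20 hidden Linear layers of width n_hidden,
# each followed by a ReLU, so 42 outputs of shape (batch_size, n_hidden).
n_hidden_outputs = (1 + 20) * 2

hidden_activation_bytes = batch_size * n_hidden * bytes_per_float32
total_hidden_mib = n_hidden_outputs * hidden_activation_bytes / MIB

# y_pred has shape (batch_size, n_out).
y_pred_mib = batch_size * n_out * bytes_per_float32 / MIB

print(f"one hidden activation: {hidden_activation_bytes / MIB:.2f} MiB")
print(f"all hidden activations: {total_hidden_mib:.2f} MiB")
print(f"y_pred: {y_pred_mib:.3f} MiB")
```

Under these assumptions the hidden activations alone come to roughly 120 MiB per call, while y_pred is about 0.02 MiB, which is at least consistent with the increments in the profile above.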
Am I misunderstanding this, or does memory-profiler not work with torch? To rule out a GPU-related issue that memory-profiler cannot track, I am forcing everything to run on the CPU.
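One way to cross-check memory-profiler's numbers would be to read the process's resident set size directly. This is a minimal sketch using only the standard library; it assumes Linux, where /proc/self/statm is available, and the helper name rss_mib is hypothetical. On other platforms it falls back to the peak (not current) RSS, so it is only a rough substitute there.

```python
import os
import resource
import sys


def rss_mib():
    """Return the resident set size of this process in MiB."""
    try:
        # Linux: the second field of /proc/self/statm is resident pages.
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * os.sysconf("SC_PAGE_SIZE") / 2 ** 20
    except OSError:
        # Fallback: peak RSS only (bytes on macOS, kilobytes on Linux).
        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return peak / 2 ** 20 if sys.platform == "darwin" else peak / 1024


before = rss_mib()
buf = bytearray(b"x") * (50 * 2 ** 20)  # stand-in for a real workload
after = rss_mib()
print(f"RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

In main(), the same helper could be called before and after each make_prediction call to see whether the increments match what memory-profiler reports.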
Any help is appreciated. Thanks!
EDIT: I tried both @torch.no_grad() and torch.cuda.empty_cache(), and neither fixes the issue.