The code below seems to have a memory leak. I can’t figure out the exact cause, but it only occurs when using the L-BFGS optimizer.
from typing import Tuple

import torch

class SampleDataset(torch.utils.data.Dataset):
    """Simple dataset for collected samples."""

    def __init__(self, data: torch.Tensor, labels: torch.Tensor) -> None:
        self.data = [data[i].clone() for i in range(data.shape[0])]
        self.labels = [labels[i].clone() for i in range(labels.shape[0])]

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.data[idx], self.labels[idx]

    def __len__(self) -> int:
        return len(self.data)
def train_model(model: torch.nn.Module, dataloader, device: torch.device, num_epochs: int = 5):
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.LBFGS(
        model.parameters(), lr=1.0, max_iter=1, tolerance_change=-1, tolerance_grad=-1
    )
    for n in range(num_epochs):
        print("Epoch", n + 1)
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)

            def closure() -> torch.Tensor:
                optimizer.zero_grad()
                output = model(inputs)
                loss = criterion(output, labels)
                loss.backward()
                return loss

            optimizer.step(closure)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

activation_samples = torch.randn(363, 2560, 9, 9)
activation_labels = list(range(0, 121)) * 3
activation_labels = torch.as_tensor(activation_labels)
sample_data = activation_samples.reshape(activation_samples.shape[0], -1).double()

# Setup dataset
sample_dataset = SampleDataset(sample_data.cpu(), activation_labels.cpu())
dataloader = torch.utils.data.DataLoader(
    sample_dataset, batch_size=8, num_workers=0, shuffle=True
)

model = torch.nn.Linear(sample_data.shape[1], 121, bias=False).to(device).double()
model = model.train()
train_model(model, dataloader, device)
I don’t think this is a memory leak, but rather the expected high memory requirement of LBFGS.

From the docs:

This is a very memory intensive optimizer (it requires additional param_bytes * (history_size + 1) bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm.

By default a history_size of 100 is used, so:

model.weight.nelement() * 4 / 1024**3 * 101

i.e. ~9.44 GB would be needed in addition to the “standard” memory usage. A history_size of 10 uses ~6.8GB in my setup.
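The estimate can be reproduced with a few lines of plain arithmetic (assuming, as above, 4 bytes per parameter for the weight of the `Linear(2560 * 9 * 9, 121)` layer):

```python
# Extra L-BFGS state: param_bytes * (history_size + 1), per the docs.
nelement = 2560 * 9 * 9 * 121  # weight elements of the linear layer
param_bytes = nelement * 4     # 4 bytes per element, as in the estimate above

for history_size in (100, 10):
    extra_gib = param_bytes / 1024**3 * (history_size + 1)
    print(f"history_size={history_size}: ~{extra_gib:.2f} GiB of extra state")
# history_size=100: ~9.44 GiB of extra state
# history_size=10: ~1.03 GiB of extra state
```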
@ptrblck The code is running out of memory with 16GB of GPU memory. I also think that I did have it working at first, before I changed something slightly and ended up with the out of memory error, so I know it’s possible on my system.
The error message when it crashes for me is:
CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 14.76 GiB total capacity; 13.32 GiB already allocated; 41.75 MiB free; 13.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
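For completeness, the allocator knob the message mentions is set via an environment variable before launching the script; a minimal sketch (the 128 MiB value is just an illustrative choice, and since reserved is not much larger than allocated here, it may not help in this case):

```shell
# Cap the size of blocks the caching allocator will split, to reduce
# fragmentation. 128 is only an example value, not a recommendation.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```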
I’ve tested with PyTorch versions:
torch 1.10.0+cu111 &
The original code is also running OOM on a 40GB A100, so unless you changed the
history_size or the model itself, I wouldn’t know how you could fit it into the 16GB.
Is the script still running OOM after reducing the history_size?
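Reducing the history is just a constructor argument; a minimal sketch (the tiny model and lr here are placeholders, not the values from the original script):

```python
import torch

model = torch.nn.Linear(8, 2, bias=False)  # placeholder model

# history_size=10 keeps ~11x param_bytes of extra state instead of the
# ~101x implied by the default history_size=100.
optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=1, history_size=10)
```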
Thanks for testing with a 40GB device! I’m now thinking that my “successful” test could have used SGD (as I had the option for it as well) and the notebook history didn’t record that for some reason.
I had another, similar function that uses far fewer classes, and it works with L-BFGS, but I broke it around the same time since it only just fits in memory. I think that confused me a bit, as the issue with that one was that I needed to be more careful about creating and storing tensors.
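For anyone hitting the same confusion: if you collect activations without detaching them, each stored tensor keeps its whole autograd graph alive, which looks exactly like a leak. A minimal sketch of the safer pattern (the names are illustrative, not from my actual code):

```python
import torch

model = torch.nn.Linear(4, 4)
stored = []

for _ in range(3):
    out = model(torch.randn(2, 4))
    # .detach() drops the autograd graph and .cpu() moves the copy off the
    # GPU; appending `out` directly would keep every intermediate alive.
    stored.append(out.detach().cpu())

assert all(not t.requires_grad for t in stored)
```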