Bug report
Summary of problem: I have been encountering a steady increase in CPU RAM usage while using a PyTorch `DataLoader`. Merely loading the data causes CPU RAM usage to grow until the notebook eventually crashes. Note that I am actually training on GPU, not CPU, so my device is `cuda`. This notebook's `torch` version is:

`torch @ file:///tmp/torch/torch-1.13.0-cp37-cp37m-linux_x86_64.whl`

A very similar notebook which had no problem used this `torch` version:

`torch @ file:///tmp/torch/torch-1.11.0-cp37-cp37m-linux_x86_64.whl`

Steps to reproduce:
Simple Dataset:
```python
import cv2
from torch.utils.data import Dataset
from albumentations import Compose, Normalize
from albumentations.pytorch import ToTensorV2

class CustomDataset(Dataset):
    def __init__(self, dataframe, train, valid):
        self.dataframe = dataframe
        self.train = train
        self.valid = valid
        if self.train:
            self.transform = Compose([
                Normalize(p=1),
                ToTensorV2(p=1)])
        if self.valid:
            self.transform = Compose([
                Normalize(p=1),
                ToTensorV2(p=1)])

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, index):
        idx = self.dataframe['image_id'][index]     # Select image id
        image_path = self.dataframe['path'][index]  # Get image path
        image = cv2.imread(image_path)              # Read image
        image = self.transform(image=image)         # Apply transforms
        image = image['image']                      # Extract image from dictionary
        return (idx, image)
```
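For reference, `CustomDataset` expects a DataFrame with `image_id` and `path` columns. A minimal sketch of how such a frame might be built (the directory and file extension here are hypothetical placeholders for my actual data):

```python
import pandas as pd
from pathlib import Path

# Hypothetical image directory; adjust to the actual data layout
image_dir = Path('/kaggle/input/train_images')
paths = sorted(image_dir.glob('*.jpg'))
train_df = pd.DataFrame({
    'image_id': [p.stem for p in paths],  # file name without extension
    'path': [str(p) for p in paths],
})
```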
Loop:
```python
import torch
from torch.utils.data import DataLoader

# kf (a sklearn KFold instance), config, and device are defined elsewhere
for fold, (train_index, valid_index) in enumerate(kf.split(train_df)):
    train_data = train_df.iloc[train_index].reset_index(drop=True)
    train = CustomDataset(train_data, train=True, valid=False)
    train_loader = DataLoader(train,
                              batch_size=config.BATCH_SIZE_TRAIN,
                              shuffle=True,
                              num_workers=config.NUM_WORKERS,
                              drop_last=True)

    epochs = config.EPOCHS
    for epoch in range(epochs):
        for step, (idx, image) in enumerate(train_loader):
            # Move the batch to the GPU
            image = torch.tensor(image, device=device, dtype=torch.float32)
```
After a couple hundred steps, RAM usage reaches 100% and the notebook crashes; it does not even complete one epoch.
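To make the growth visible, the main process's resident memory can be logged each step. A minimal sketch using `psutil` (the inner loop mirrors the one above; the logging interval is arbitrary, and with `num_workers > 0` each worker process holds its own memory, which this does not count):

```python
import os
import psutil
import torch

process = psutil.Process(os.getpid())

for step, (idx, image) in enumerate(train_loader):
    image = torch.tensor(image, device=device, dtype=torch.float32)
    if step % 50 == 0:
        # Resident set size (RSS) of the main process, in MiB
        rss_mib = process.memory_info().rss / 2**20
        print(f'step {step}: RSS = {rss_mib:.0f} MiB')
```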