Hello,
I am dealing with memory fragmentation (growing RAM usage) when using a DataLoader. The following is my `__getitem__` function:
```python
def __getitem__(self, idx):
    # Get token
    token = self.tokens[idx]
    # Get drawing
    drawing = self.drawings[idx]
    stroke = stroke_data(drawing).astype(np.float32)
    # stroke = np.expand_dims(stroke, axis=0)
    stroke = np.transpose(stroke, (1, 0)).astype(np.float32)
    drawing = eval(drawing)
    label = self.labels[idx]
    # Plot the image
    img, img_gray = drawing_to_image(drawing, IMG_SIZE, IMG_SIZE)
    if self.transform:
        img = self.transform(image=img)["image"]
        img = np.transpose(img, (2, 0, 1)).astype(np.float32)
        # img = np.expand_dims(img, 0).astype(np.float32)
        img_gray = self.transform(image=img_gray)["image"]
        img_gray = np.expand_dims(img_gray, 0).astype(np.float32)
    return {
        "image": img,
        "image_gray": img_gray,
        "stroke": stroke,
        "token": token,
        "targets": label,
    }
```
Environment:
- Ubuntu 16.04
- AMD Ryzen 2700
- CUDA 9.0, cuDNN 7.0
I am running with the following options:
- num_workers=4
- shuffle=True
- batch_size=128
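For context, the DataLoader is constructed roughly as below (the `TensorDataset` here is just a hypothetical stand-in; my real dataset returns the dict shown above):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset; the real one returns a dict per sample
dataset = TensorDataset(torch.zeros(256, 3), torch.zeros(256))

loader = DataLoader(
    dataset,
    num_workers=4,
    shuffle=True,
    batch_size=128,
)
```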
The problem is that my RAM usage grows with every batch and is only released when an epoch ends.
I tried the following:
1. Setting `num_workers=0`
2. Changing `drawing = self.drawings[idx]` to `drawing = self.drawings[0]`
3. Calling `gc.collect()` every mini-batch or every 500 mini-batches
(1) and (3) can help, but they make training run too slow.
I could not figure it out. Do you have any solutions?
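Attempt (3) was essentially the loop sketched below (`GC_EVERY` and the training step are placeholders, not my actual code):

```python
import gc

GC_EVERY = 500  # collect every 500 mini-batches; I also tried every mini-batch

counts = []
for batch_idx in range(1000):  # stand-in for iterating over the DataLoader
    # ... forward/backward pass would go here ...
    if batch_idx % GC_EVERY == 0:
        counts.append(gc.collect())
```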
Thank you.