Memory fragmentation

Hello,

I am dealing with memory fragmentation when using a DataLoader.
The following is my __getitem__ function:

    def __getitem__(self, idx):

        # Get token
        token = self.tokens[idx]

        # Get drawing
        drawing = self.drawings[idx]
        stroke = stroke_data(drawing).astype(np.float32)
        # stroke = np.expand_dims(stroke, axis=0)
        stroke = np.transpose(stroke, (1, 0)).astype(np.float32)

        drawing = eval(drawing)
        label = self.labels[idx]

        # Plot the image
        img, img_gray = drawing_to_image(drawing, IMG_SIZE, IMG_SIZE)

        if self.transform:
            img = self.transform(image=img)["image"]
            img = np.transpose(img, (2, 0, 1)).astype(np.float32)
            # img = np.expand_dims(img, 0).astype(np.float32)

            img_gray = self.transform(image=img_gray)["image"]
            img_gray = np.expand_dims(img_gray, 0).astype(np.float32)

        return {
            "image": img,
            "image_gray": img_gray,
            "stroke": stroke,
            "token": token,
            "targets": label
        }
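
For context, the Dataset class that holds this data looks roughly like the sketch below; the class name, constructor arguments, and comments are placeholders, not my exact code:

    from torch.utils.data import Dataset

    class DrawingDataset(Dataset):
        """Keeps tokens, drawing strings, and labels in memory as Python lists."""

        def __init__(self, tokens, drawings, labels, transform=None):
            self.tokens = tokens        # per-sample token arrays
            self.drawings = drawings    # per-sample stroke strings
            self.labels = labels        # per-sample class labels
            self.transform = transform  # optional image transform

        def __len__(self):
            return len(self.labels)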

Environment:

  • Ubuntu 16.04, AMD Ryzen 2700
  • CUDA 9.0, cuDNN 7.0

I am running with the following DataLoader options (construction sketched after the list):

  • num_workers=4
  • shuffle=True
  • batch_size=128
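
The loader itself is built roughly like this; the names (DrawingDataset, train_dataset, train_transform) are placeholders for my actual ones:

    from torch.utils.data import DataLoader

    # Placeholder construction matching the options above
    train_dataset = DrawingDataset(tokens, drawings, labels, transform=train_transform)
    train_loader = DataLoader(
        train_dataset,
        batch_size=128,
        shuffle=True,
        num_workers=4,
    )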

The problem:
My RAM usage increases with every batch and is only released when the epoch ends.

I have tried the following:

  1. Setting num_workers=0
  2. Changing drawing = self.drawings[idx] to drawing = self.drawings[0]
  3. Calling gc.collect() every mini-batch or every 500 mini-batches (see the sketch after this list)
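
For (3), the workaround looks roughly like this (the loop body is elided; only the periodic collection call is the point):

    import gc

    for step, batch in enumerate(train_loader):
        # ... forward / backward / optimizer step ...

        # Force a garbage-collection pass every 500 mini-batches
        if step > 0 and step % 500 == 0:
            gc.collect()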

(1) and (3) help, but they make training too slow.

I cannot figure out the root cause. Do you have any suggestions?

Thank you.