Memory problem with dataloader

Hello,

I have run into a problem I am seeing for the first time. It is not related to GPU memory, since it occurs even when I run my dataloader code in an empty enumerate(loader) loop.

The code is this:

original_image_sequence = self.img_dict[self.index_dict[index]][0]
saliency_image_sequence = self.img_dict[self.index_dict[index]][1]
label_sequence = self.img_dict[self.index_dict[index]][2]

img_ = []
img_s_ = []
for i in range(len(original_image_sequence)):
    # load and transform the original frame, then add a leading dimension in place
    img = self.tun(PIL.Image.open(original_image_sequence[i]))
    img.unsqueeze_(0)
    img_.append(img)
    # same for the corresponding saliency frame
    img_s = self.tun(PIL.Image.open(saliency_image_sequence[i]))
    img_s.unsqueeze_(0)
    img_s_.append(img_s)

# concatenate the per-frame tensors along dim 0 into one tensor per sequence
ori_tensor = torch.cat(img_)
sal_tensor = torch.cat(img_s_)
return ori_tensor, sal_tensor, label_sequence

If I keep this loading code but return something arbitrary instead, like

return torch.Tensor(5)

then there is no memory problem. However, when I return the actual tensors, my “modified” memory keeps increasing without bound after every epoch (pages whose contents still have to be written to disk), while the used memory stays the same.

I have tried forcing garbage collection with gc and deleting the references with del, but it is strange: I am not referencing anything outside the function, yet the memory usage keeps growing.
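To make that concrete, here is roughly what that attempt looks like at the end of __getitem__ (just a sketch of the idea, not my exact code; the exact placement may differ):

import gc

# at the end of __getitem__, after building the sequence tensors:
ori_tensor = torch.cat(img_)
sal_tensor = torch.cat(img_s_)
del img_, img_s_   # drop the per-frame lists explicitly
gc.collect()       # force a garbage collection pass
return ori_tensor, sal_tensor, label_sequence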

Is this code in your __init__ or your __getitem__ function?
If it is in __init__, then of course the problem appears.

This code is in the __getitem__ of the dataloader.

My desktop computer should be fine, but it struggles, especially with a lower number of workers. If I increase num_workers, the problem is reduced but still there. The same code runs fine on a dedicated server.
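For reference, this is roughly how the loader is built and how I run the empty loop; the dataset variable, batch size, and worker count below are placeholders, not my exact values:

from torch.utils.data import DataLoader

loader = DataLoader(
    my_dataset,        # the Dataset whose __getitem__ is shown above (placeholder name)
    batch_size=4,      # placeholder value
    shuffle=True,
    num_workers=4,     # raising this reduces, but does not remove, the growth
)

for i, batch in enumerate(loader):
    pass               # empty loop, no model and no GPU work, yet "modified" memory grows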