Let me start off this post by saying I am a newbie with deep learning and PyTorch.

I am trying to train a DenseNet instance on a dataset of around 2500 images. Currently, I parse them from disk and load them all into main memory. I have 64 GB of main memory and 8 GB of GPU memory. Since main memory is cheap, I figured it would be faster to load everything into main memory at construction time rather than perform an SSD access every time `__getitem__` is called. However, when I try loading one batch, the GPU suddenly allocates around 5-6 GB of memory and I run out of GPU memory. When I instantiate my dataset object, my main memory usage also goes up by around 6-7 GB. Thus, I have the following questions:
- What is the common convention when making a dataset based on folders of images? Do you only create the tensor in `__getitem__`, even if the dataset fits in main memory?
- When I have a large 6-7 GB tensor, take a reference to a small subset of it, and call `.to(cuda)`, will the entire tensor be copied to the GPU?
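For context on the second question, here is a small CPU-side check I can run: basic indexing returns a view sharing the original storage, so the real question is whether `.to()` copies the view or the whole buffer (`.clone()` stands in for the device transfer below, since both materialize the view into fresh storage):

```python
import torch

# A large CPU tensor and a small view into it: basic indexing returns a view,
# so the slicing itself copies no data.
big = torch.zeros(1000, 1000)   # ~4 MB of float32
small = big[:10]                # view over the first 10 rows
assert small.data_ptr() == big.data_ptr()  # same underlying buffer

# .clone() (and likewise .to("cuda")) materializes only the view's elements
# into fresh storage, not the whole 1000x1000 buffer.
copied = small.clone()
assert copied.data_ptr() != big.data_ptr()
print(copied.untyped_storage().nbytes())  # 10 * 1000 * 4 = 40000 bytes
```

So a transfer like `small.to("cuda")` should only move the slice's elements, as I understand it, not the full 6-7 GB tensor.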
EDIT: I have since modified my dataset to only load images into memory when `__getitem__` is called, but I still see large GPU memory usage. Stepping through line by line with the debugger, the memory usage spikes when I call `model(x)`. Is DenseNet just a memory hog once an input is passed through?
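My current guess is that the spike at `model(x)` comes from autograd caching every intermediate activation for the backward pass, which is a lot for a densely connected network. A minimal sketch of the difference (using a tiny stand-in module, not the actual DenseNet):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; a real DenseNet behaves the same way, just with
# far more intermediate activations kept alive for backward.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 6),
)
x = torch.randn(4, 3, 32, 32)

# Training-mode forward: a graph is built and activations are retained.
y_train = model(x)
assert y_train.requires_grad

# Inference under no_grad: no graph, activations freed immediately.
with torch.no_grad():
    y_eval = model(x)
assert not y_eval.requires_grad
```

If that guess is right, evaluation passes should be wrapped in `torch.no_grad()`, and for training the usual lever is a smaller batch size.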
import glob
from os.path import join

import numpy
import torch
from PIL import Image

# Integer class labels (assumed here; defined elsewhere in my code)
GLASS, PAPER, CARDBOARD, PLASTIC, METAL, TRASH = range(6)


class TrashNetDataset(torch.utils.data.Dataset):
    def __init__(self, basedir: str = ""):
        self.basedir = basedir
        # Eagerly decode every image into main memory at construction time
        self.glass_list = [(x, GLASS) for x in self.parseImages("glass/*")]
        self.paper_list = [(x, PAPER) for x in self.parseImages("paper/*")]
        self.cardboard_list = [(x, CARDBOARD) for x in self.parseImages("cardboard/*")]
        self.plastic_list = [(x, PLASTIC) for x in self.parseImages("plastic/*")]
        self.metal_list = [(x, METAL) for x in self.parseImages("metal/*")]
        self.trash_list = [(x, TRASH) for x in self.parseImages("trash/*")]
        self.image_list = (self.glass_list + self.paper_list + self.cardboard_list
                           + self.plastic_list + self.metal_list + self.trash_list)
        self.data_len = len(self.image_list)

    def __len__(self):
        return self.data_len

    def __getitem__(self, index):
        image, label = self.image_list[index]
        # torch.Tensor() copies the numpy array into a new float32 tensor
        return torch.Tensor(image), label

    def parseImages(self, path: str):
        # Decode with PIL and normalize to [0, 1]; yields HWC float64 arrays
        return [numpy.asarray(Image.open(x)) / 255.0 for x in glob.glob(join(self.basedir, path))]
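For comparison, here is the shape of the lazy variant I switched to: it keeps only `(path, label)` pairs in memory and decodes in `__getitem__`. The class and parameter names are my own sketch, and the decoding function is injected so the backend is swappable (with PIL it would be something like `lambda p: numpy.asarray(Image.open(p)) / 255.0`):

```python
import numpy as np
import torch


class LazyTrashNetDataset(torch.utils.data.Dataset):
    """Stores only (path, label) pairs; images are decoded on demand."""

    def __init__(self, samples, loader):
        # samples: list of (path, label) tuples
        # loader: callable mapping a path to an HxWxC float array in [0, 1]
        self.samples = samples
        self.loader = loader

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        path, label = self.samples[index]
        array = self.loader(path)
        # HWC -> CHW float32, the layout torchvision models such as DenseNet expect
        return torch.from_numpy(array).permute(2, 0, 1).float(), label
```

This keeps construction-time memory proportional to the number of paths rather than the decoded pixel data, at the cost of one decode per `__getitem__` call.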