Thank you very much in advance for your help.
I want to train an image-classification NN but am running into the following memory error:
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 123681636352 bytes. Error code 12 (Cannot allocate memory)
I found this error strange: the failed allocation is 123681636352 bytes, i.e. roughly 115 GiB, and I am running on an ml.t2.xlarge SageMaker instance, which I assumed had plenty of memory. Here’s a bit of context on the code:
I wrote a Python script to:
i) load a file list (a CSV file) into the DataLoader
ii) convert input images from TIFF to NumPy to Torch
iii) permute the channels (TIFF arrays are (H, W, C), while Torch expects (C, H, W); a minimal sketch follows this list)
iv) crop each picture into a smaller picture to avoid memory problems (the dataset holds 380 GB of images; each .TIFF image is about 25 MB)
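To make step iii) concrete, here is a minimal, self-contained sketch of the channel permutation; the 512×512 size is an arbitrary stand-in, not the real image dimensions:

import numpy as np
import torch

# TIFF readers return arrays in (H, W, C) order; PyTorch expects (C, H, W)
arr = np.zeros((512, 512, 3), dtype=np.float32)  # dummy stand-in for tiff.imread(...)
img = torch.from_numpy(arr).permute(2, 0, 1)
print(img.shape)  # torch.Size([3, 512, 512])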
Here is the Python script:
# Imports as used below; `tiff` is presumably the tifffile package
import os

import pandas as pd
import torch
from torch.utils.data import Dataset
from torchvision import transforms
import tifffile as tiff

class load_csv(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)  # file list: image path, label
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        # read the TIFF as a NumPy array and convert to a (C, H, W) float tensor
        image = torch.from_numpy(tiff.imread(img_path)).permute(2, 0, 1).float()
        #Image.MAX_IMAGE_PIXELS = None
        image.transform = transforms.RandomResizedCrop(224)
        y_label = torch.tensor(int(self.annotations.iloc[index, 1]))
        #if self.transform:
        #    image = self.transform(image)
        return (image, y_label)
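For context, the dataset is consumed by a DataLoader in CNN.ipynb roughly like this; the file names and batch size below are my placeholders, since the notebook isn't reproduced here:

from torch.utils.data import DataLoader

# placeholder paths and batch size for illustration only
dataset = load_csv(csv_file="labels.csv", root_dir="images/",
                   transform=transforms.RandomResizedCrop(224))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for images, labels in loader:
    pass  # training loop goes here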
Now, I’m running into a memory-related problem that I don’t know how to resolve. I am running an ml.t2.xlarge SageMaker notebook with what should be plenty of memory, and the images loaded into the DataLoader have supposedly undergone a “transforms.RandomResizedCrop(224)” transformation, so they should be much smaller.
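One quick way I know to sanity-check that expectation is to index the dataset directly (again, `dataset` is my placeholder name) and look at the tensor that __getitem__ actually returns:

image, y_label = dataset[0]  # calls __getitem__ directly, bypassing the DataLoader
print(image.shape)           # expected torch.Size([3, 224, 224]) if the crop is applied
print(image.element_size() * image.nelement())  # tensor footprint in bytes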
Why am I running into this problem?
The project repo, containing CNN.ipynb and the Python script csv_loader.py (where the __getitem__ method used by the DataLoader is defined), can be found (here).
Thank you very much again,