I’m trying to train a network on Colab, but I’m running into a memory problem.
Training cannot even start, because I get the following message:
RuntimeError: DataLoader worker (pid 12945) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.
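From what I’ve read, this error means the DataLoader worker processes run out of space in /dev/shm, which is small on Colab, and the usual workaround seems to be lowering num_workers. A minimal sketch of what I could try (with 0, batches are loaded in the main process and shared memory isn’t used at all):

    from torch.utils.data import DataLoader

    # num_workers=0 loads batches in the main process, bypassing /dev/shm
    loader = DataLoader(dataset, batch_size=32, num_workers=0)

But before changing anything I’d like to understand whether the workers are really the issue.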
I’m new to PyTorch and Colab, and I’m not sure whether the problem is really the size of the data or something else in the code.
The dataset consists of 47,721 images, about 3.25 GB in total.
From it I create three dataloaders (the split helper is sketched after this list):
- training 60%
- validation 20%
- test 20%
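The split is done by a small helper, randomSplitDataset, which I haven’t pasted here; it is essentially a wrapper around torch.utils.data.random_split, along these lines (a simplified sketch, the real helper derives the sizes from the percentages above):

    from torch.utils.data import random_split

    # Simplified sketch of my randomSplitDataset helper: split a dataset into
    # two subsets according to a fraction (the exact fraction shown here is
    # only illustrative)
    def randomSplitDataset(dataset, trainFraction=0.75):
        trainSize = int(trainFraction * len(dataset))
        return random_split(dataset, [trainSize, len(dataset) - trainSize])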
For training I use minibatches of size 32.
I use the free version of Colab, which has about 12 GB of RAM. When I start the runtime, about 5 GB are already occupied, leaving roughly 7 GB free.
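For reference, this is a quick way to check those numbers from inside the notebook (psutil comes preinstalled on Colab):

    import psutil

    # Total vs. currently available RAM on the runtime
    mem = psutil.virtual_memory()
    print(f"total: {mem.total / 1e9:.1f} GB, available: {mem.available / 1e9:.1f} GB")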
As the model I use a pretrained GoogLeNet.
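I load it with torchvision, roughly like this (the exact weights argument depends on the torchvision version; older releases use pretrained=True instead):

    import torchvision

    # Pretrained GoogLeNet; on torchvision < 0.13 this would be
    # torchvision.models.googlenet(pretrained=True)
    model = torchvision.models.googlenet(weights="IMAGENET1K_V1")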
I’m not sure whether I’m doing something wrong when I create the dataloaders; here is the code:
    from torch.utils.data import DataLoader

    # BATCH_SIZE and WORKERS are constants defined at the top of the notebook
    def getDataLoader(dataset, batchSize=BATCH_SIZE, shuffle=False, dropLast=False):
        print('Splitting dataset into train and validation datasets...')
        # randomSplitDataset returns the train/validation subsets (sketched above)
        trainDs, validDs = randomSplitDataset(dataset)
        # batchSize=None means "one batch with the whole dataset"
        validDataLoader = DataLoader(validDs,
                                     batch_size=(len(dataset)
                                                 if batchSize is None
                                                 else batchSize),
                                     shuffle=shuffle,
                                     num_workers=WORKERS,
                                     drop_last=dropLast)
        trainDataLoader = DataLoader(trainDs,
                                     batch_size=(len(dataset)
                                                 if batchSize is None
                                                 else batchSize),
                                     shuffle=shuffle,
                                     num_workers=WORKERS,
                                     drop_last=dropLast)
        return trainDataLoader, validDataLoader
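The function is then called once on the full dataset (I’ve left out the values of BATCH_SIZE and WORKERS, which are set at the top of the notebook):

    # How the function above gets used; `dataset` is built earlier in the notebook
    trainDataLoader, validDataLoader = getDataLoader(dataset)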
Please let me know if there is anything else I can share to help work out whether the problem is my code or really the size of the data.