I am working on a computer vision project whose training data is stored in Google Drive, and the code runs on Google Colab.
To speed up image loading, I cache each image after it has been downloaded from Google Drive.
Here’s the code:
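from PIL import Image

# `drive` is assumed to be an authenticated PyDrive GoogleDrive instance created earlier.
imgcache = {}  # maps Drive file id -> opened PIL image

def get_img_by_url(url):
    # The file id is whatever follows 'open?id=' in the share URL.
    file_id = url.split('open?id=')[1]
    global imgcache
    if file_id in imgcache:
        # Cache hit: reuse the image that was already downloaded.
        return imgcache[file_id]
    else:
        # Cache miss: download the file content from Drive and open it.
        downloaded = drive.CreateFile({'id': file_id})
        downloaded.FetchContent()
        img_stream = Image.open(downloaded.content)
        imgcache[file_id] = img_stream
        return img_stream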
get_img_by_url is called from the __getitem__ of my custom Dataset.
This works fine without worker processes (num_workers = 0). With num_workers > 0, however, imgcache appears to be cleared every time a new epoch starts (len(imgcache) goes back to 0).
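To show the symptom in isolation, here is a minimal standard-library sketch with no PyTorch or Drive involved. It assumes (I have not confirmed this) that each DataLoader worker is a forked process with its own copy of module-level state, and that workers are started anew for every epoch. The names cache, load, worker, and run_epoch are hypothetical stand-ins, not part of my project, and the "fork" start method assumes a Linux environment such as Colab.

```python
import multiprocessing as mp

cache = {}  # module-level cache, standing in for imgcache


def load(key):
    # Fill the cache on a miss, like get_img_by_url does (payload is a dummy string).
    if key not in cache:
        cache[key] = f"payload-for-{key}"
    return cache[key]


def worker(keys, result_queue):
    # Runs in the child process: it only touches the child's copy of `cache`.
    for key in keys:
        load(key)
    result_queue.put(len(cache))


def run_epoch(keys):
    # Assumption: one fresh forked worker per epoch.
    ctx = mp.get_context("fork")
    result_queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(keys, result_queue))
    proc.start()
    child_len = result_queue.get()  # cache size as seen inside the worker
    proc.join()
    return child_len, len(cache)    # (worker's view, parent's view)
```

Calling run_epoch(['a', 'b', 'c']) twice gives a worker-side cache of 3 entries each time while the parent's cache stays empty, which resembles what I see with imgcache.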
Can anyone tell me why this happens?