Data cached in a global variable for DataLoader is cleared between epochs!

I am working on a CV project whose training data is stored in Google Drive, and I run it on Google Colab.
To speed up loading the images, which are downloaded from Google Drive, I created a cache for the images that have already been loaded.

Here’s the code:

imgcache = {}

def get_img_by_url(url):
    file_id = url.split('open?id=')[1]
    global imgcache
    if file_id in imgcache:
        return imgcache[file_id]
    downloaded = drive.CreateFile({'id': file_id})
    img_stream =
    imgcache[file_id] = img_stream
    return img_stream

The get_img_by_url function is called in the __getitem__ method of my custom Dataset class.

This function works fine without forked workers (num_workers = 0). However, with more workers, I found that “imgcache” is cleared every time the program enters a new epoch (the len of imgcache goes back to 0).

Can anyone tell me why this happens?

Have a look at this thread for some information on caching with multiple workers using shared arrays.
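
As for why the cache looks empty: with num_workers > 0, each DataLoader worker is a separate process with its own copy of the module globals, so entries written to imgcache inside a worker never reach the main process. By default the workers are also shut down at the end of each epoch and started again for the next one, which discards whatever they cached (persistent_workers=True keeps them alive, but each worker would still hold its own separate cache). Here is a minimal sketch of that behaviour using only the standard library, not the shared-array approach from the thread. It assumes a Linux environment such as Colab where the "fork" start method is available, and the names local_cache, load, worker and run_epoch are hypothetical stand-ins rather than anything from the original code. It contrasts a plain global dict with a multiprocessing.Manager dict, which is one possible way to share a cache across processes.

```python
import multiprocessing as mp

# Assumption: a POSIX system (e.g. Colab) where "fork" is available.
ctx = mp.get_context("fork")

# Module-level cache, analogous to `imgcache` in the question.
local_cache = {}


def load(key, shared_cache):
    """Hypothetical stand-in for get_img_by_url: fills both caches."""
    value = f"data-{key}"
    local_cache[key] = value    # only changes the current process's copy
    shared_cache[key] = value   # forwarded to the manager process
    return value


def worker(keys, shared_cache):
    # Plays the role of a DataLoader worker calling __getitem__.
    for key in keys:
        load(key, shared_cache)


def run_epoch(keys, shared_cache):
    # A fresh worker process per "epoch", similar to DataLoader's default.
    p = ctx.Process(target=worker, args=(keys, shared_cache))
    p.start()
    p.join()
    # Sizes as seen from the main process after the worker has exited.
    return len(local_cache), len(shared_cache)


if __name__ == "__main__":
    with ctx.Manager() as manager:
        shared = manager.dict()
        for epoch in range(2):
            n_local, n_shared = run_epoch(["a", "b", "c"], shared)
            print(f"epoch {epoch}: local={n_local}, shared={n_shared}")
```

In this sketch the main process's local_cache stays empty while the manager dict keeps its entries across "epochs". A manager dict pickles every value and sends it between processes, so for large images it may be slower than the shared-array approach mentioned above.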