What if the image on the disk changes when we run DataLoader

Hi there,

Just curious what will happen if an image on the disk changes while we run DataLoader to traverse the dataset. Will the loaded data be the latest file on the disk, or the old one?

Thanks!

Hi Phantom!

What will happen is whatever your code decides to do.

A torch.utils.data.DataLoader is a wrapper that lets you iterate
over a Dataset. Per the documentation, a Dataset is an abstract
class that you have to implement concretely. In particular, you have
to implement its __getitem__() method.

If your implementation of __getitem__() (or one you get from some
third-party code) rereads the image from the disk when called, you
will fetch the latest version of the image from the disk. But if
__getitem__() fetches the item from a cache (for example from a
cache that was built when the Dataset was instantiated), then you’ll
get the “old” version of the image.
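
To make the difference concrete, here is a minimal sketch of the two behaviours. The class names RereadDataset and CachedDataset are made up for illustration, and plain bytes stand in for decoded images so that the example needs nothing beyond PyTorch and the standard library.

```python
import tempfile
from pathlib import Path

from torch.utils.data import Dataset


class RereadDataset(Dataset):
    """Hypothetical dataset that reads the file on every __getitem__() call."""

    def __init__(self, paths):
        self.paths = list(paths)  # only the pathnames are kept in memory

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return Path(self.paths[idx]).read_bytes()  # hits the disk each time


class CachedDataset(Dataset):
    """Hypothetical dataset that loads every file once, at construction."""

    def __init__(self, paths):
        self.items = [Path(p).read_bytes() for p in paths]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]  # served from memory, the disk is not re-read


with tempfile.TemporaryDirectory() as tmp:
    image_path = Path(tmp) / "img0.bin"
    image_path.write_bytes(b"old")

    reread = RereadDataset([image_path])
    cached = CachedDataset([image_path])

    image_path.write_bytes(b"new")  # the file changes after construction

    reread_item = reread[0]
    cached_item = cached[0]

print(reread_item)  # b'new'
print(cached_item)  # b'old'
```

The only difference between the two classes is where the read happens: in __getitem__() or in __init__().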

Typically with images (which can be large) a Dataset will scan
the image directories when constructed, most likely building a list
of image pathnames, without actually loading the images into memory.
It will then read a specific image from disk when __getitem__()
is called (so you would get the “new” image). But it’s up to you how
you write your code.

(And just to be clear, DataLoader neither knows nor cares how its
Dataset fetches its data items. Whether you get the new or old
image is solely up to the Dataset.)
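
As a rough illustration of the original question, the sketch below changes a file in the middle of a DataLoader traversal. FolderDataset is a hypothetical class that lists the files when constructed and reads each one in __getitem__(); small text files stand in for images. It assumes single-process loading (num_workers=0) with batching disabled, so each item is fetched only when the loop asks for it.

```python
import tempfile
from pathlib import Path

from torch.utils.data import DataLoader, Dataset


class FolderDataset(Dataset):
    """Hypothetical dataset: lists files at construction, reads them on demand."""

    def __init__(self, root):
        self.paths = sorted(Path(root).glob("*.txt"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        return self.paths[idx].read_text()  # read from disk at fetch time


with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"img{i}.txt").write_text(f"old-{i}")

    # batch_size=None disables batching; num_workers=0 loads in the main
    # process, so nothing is fetched ahead of the loop.
    loader = DataLoader(FolderDataset(tmp), batch_size=None,
                        shuffle=False, num_workers=0)

    seen = []
    for i, item in enumerate(loader):
        seen.append(item)
        if i == 0:
            # Overwrite a file that has not been visited yet.
            (Path(tmp) / "img2.txt").write_text("new-2")

print(seen)  # ['old-0', 'old-1', 'new-2']
```

With num_workers > 0 the worker processes fetch items ahead of the loop, so whether a change is picked up then depends on when the read actually took place.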

Best.

K. Frank
