Hi there,
Just curious what will happen if an image on the disk changes while we run DataLoader to traverse the dataset. Will the loaded data be the latest file on the disk, or the old version?
Thanks!
Hi Phantom!
What will happen is whatever your code decides to do.
A torch.utils.data.DataLoader is a wrapper that lets you iterate
over a Dataset. Per the documentation, a Dataset is an abstract
class that you have to implement concretely. In particular, you have
to implement its __getitem__() method.
If your implementation of __getitem__() (or one you get from some
third-party code) rereads the image from the disk when called, you
will fetch the latest version of the image from the disk. But if
__getitem__() fetches the item from a cache (for example from a
cache that was built when the Dataset was instantiated), then you’ll
get the “old” version of the image.
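Here is a minimal sketch of the two behaviors. It is only an illustration under some assumptions: the class names RereadingDataset and CachingDataset and the helper demo() are hypothetical, the classes are plain Python objects that follow the map-style protocol (__len__() and __getitem__()) rather than subclasses of torch.utils.data.Dataset, and raw bytes stand in for decoded images so that the script runs without torch or an image library.

```python
import os
import tempfile


class RereadingDataset:
    """Hypothetical dataset that rereads the file on every __getitem__() call."""

    def __init__(self, paths):
        self.paths = list(paths)  # only the pathnames are kept in memory

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # The file is opened here, so we see whatever is on disk right now.
        with open(self.paths[index], "rb") as f:
            return f.read()


class CachingDataset:
    """Hypothetical dataset that loads every file once, at construction."""

    def __init__(self, paths):
        self.items = []
        for path in paths:
            with open(path, "rb") as f:
                self.items.append(f.read())  # contents are cached here

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        # No disk access: later changes to the file are never seen.
        return self.items[index]


def demo():
    """Change a file after both datasets exist, then fetch item 0 from each."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "image_0.bin")
        with open(path, "wb") as f:
            f.write(b"old")

        rereading = RereadingDataset([path])
        caching = CachingDataset([path])

        # Simulate the image changing on disk during training.
        with open(path, "wb") as f:
            f.write(b"new")

        return rereading[0], caching[0]


if __name__ == "__main__":
    print(demo())
```

In this sketch the rereading dataset returns the changed contents and the caching one returns what it loaded at construction. Wrapping either object in a DataLoader would not change that.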
Typically with images – which can be large – a Dataset will scan
the image directories when constructed, most likely building a list
of image pathnames, but not actually load the images into memory.
It will then read a specific image from disk when __getitem__()
is called (so you would get the “new” image). But it’s up to you how
you write your code.
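The "scan at construction, read on access" pattern might look roughly like the sketch below. Again this is an assumption-laden illustration and not how any particular library does it: FolderDataset and demo_folder() are hypothetical names, the class is a plain Python object with __len__() and __getitem__(), and it returns raw bytes instead of a decoded image.

```python
import os
import tempfile


class FolderDataset:
    """Hypothetical dataset: lists pathnames once, reads contents lazily."""

    def __init__(self, root):
        # The directory is scanned a single time, here. Only pathnames are stored.
        self.paths = sorted(
            os.path.join(root, name)
            for name in os.listdir(root)
            if os.path.isfile(os.path.join(root, name))
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        # Contents are read at access time, so edits to the file are picked up.
        with open(self.paths[index], "rb") as f:
            return f.read()


def demo_folder():
    """Modify one file and add another after the dataset has been built."""
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "a.bin")
        with open(first, "wb") as f:
            f.write(b"v1")

        dataset = FolderDataset(root)

        # Overwrite the existing file and create a brand-new one.
        with open(first, "wb") as f:
            f.write(b"v2")
        with open(os.path.join(root, "b.bin"), "wb") as f:
            f.write(b"extra")

        return len(dataset), dataset[0]


if __name__ == "__main__":
    print(demo_folder())
```

With this particular design the changed contents of an existing file are seen, but a file added after construction is not, because the list of pathnames is itself a snapshot taken in __init__().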
(And just to be clear, DataLoader neither knows nor cares how its
Dataset fetches its data items. Whether you get the new or old
image is solely up to Dataset.)
Best.
K. Frank