Hi,
I have a question about how batches are selected in some situations.
Let's say I define a data loader with:
train_sampler = SubsetRandomSampler(indices)
...
train_loader = torch.utils.data.DataLoader(train_data, batch_size=bs,
                                           sampler=train_sampler, num_workers=nw)
1. DataLoader Iterables
If I understand correctly, at this point DataLoader wraps an iterable around the Dataset
to enable easy access to the samples. In particular, because of SubsetRandomSampler,
every time a new iterator is created from the loader, the elements of train_sampler
are reshuffled and a new sequence of batches is defined.
If I had used SequentialSampler
to define the sampler, I would get the same sequence every time.
- Have I understood this correctly?
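A minimal sketch of how this could be checked, assuming a toy TensorDataset of 10 samples, a batch size of 4, and the default num_workers=0. The helper epoch_order is a hypothetical name introduced only for this illustration:

```python
import torch
from torch.utils.data import (DataLoader, TensorDataset,
                              SequentialSampler, SubsetRandomSampler)

# Toy dataset (an assumption for illustration): sample i holds the value i.
train_data = TensorDataset(torch.arange(10))
indices = list(range(10))


def epoch_order(loader):
    """Run one full pass and return the sample values in the order served."""
    return [int(x) for (batch,) in loader for x in batch]


seq_loader = DataLoader(train_data, batch_size=4,
                        sampler=SequentialSampler(train_data))
rand_loader = DataLoader(train_data, batch_size=4,
                         sampler=SubsetRandomSampler(indices))

# Each call to epoch_order starts a new pass, i.e. a new iterator.
seq_first, seq_second = epoch_order(seq_loader), epoch_order(seq_loader)
rand_first, rand_second = epoch_order(rand_loader), epoch_order(rand_loader)

print(seq_first, seq_second)    # same order in both passes
print(rand_first, rand_second)  # each pass is its own permutation of indices
```

Every pass over rand_loader should still visit each index exactly once; only the order is expected to change between passes.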
2. iter() and next(): sequence of batches
Now, if I write:
for data, label in train_loader:
    # do stuff
it calls the __iter__()
method on the iterable, and then calls __next__()
on the returned iterator until the iterator is exhausted. At that point the iterator raises StopIteration
and the loop stops. (see source)
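The call sequence can be observed with a small stand-in class. This is only a sketch: CountingBatches is a hypothetical class, not part of PyTorch, and unlike DataLoader it acts as its own iterator to keep the example short.

```python
class CountingBatches:
    """Hypothetical stand-in for a loader that records protocol calls."""

    def __init__(self, batches):
        self.batches = batches
        self.calls = []

    def __iter__(self):
        self.calls.append("__iter__")
        self._pos = 0          # restart from the first batch
        return self            # this object is its own iterator

    def __next__(self):
        self.calls.append("__next__")
        if self._pos >= len(self.batches):
            raise StopIteration  # signals the for loop to stop
        item = self.batches[self._pos]
        self._pos += 1
        return item


loader = CountingBatches([("img0", 0), ("img1", 1)])
for data, label in loader:
    pass  # do stuff

print(loader.calls)
# ['__iter__', '__next__', '__next__', '__next__']
```

With two batches, __next__ is called three times: twice to return a batch and once more to raise StopIteration.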
In other words, it should be equivalent to:
iterable = iter(train_loader)
while True:
    try:
        img, lab = next(iterable)
    except StopIteration:
        break  # the iterator is exhausted, so leave the loop
    # do stuff
where the line iterable = iter(train_loader)
fixes the sequence of batches for that iterator, and with next()
we retrieve them one by one.
If instead I write:
try:
    img, lab = next(iter(train_loader))
except StopIteration:
    img, lab = next(iter(train_loader))
then by calling iter()
each time I create a new sequence of batches and always take only its first element. In particular:
- If I defined the DataLoader with SequentialSampler, the sequence of batches will always be the same, so img, lab = next(iter(train_loader)) will always pick the same batch.
- If I defined the DataLoader with SubsetRandomSampler, the sequence of batches will be reshuffled every time I call img, lab = next(iter(train_loader)), and a newly composed batch will be picked each time (in principle, the same element can be chosen multiple times before every element has been selected).
- Have I understood the next() / iter() mechanism correctly?
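A minimal sketch for checking both cases, again assuming a toy TensorDataset of 10 samples, a batch size of 4, and the default num_workers=0. The helper first_batch is a hypothetical name used only here:

```python
import torch
from torch.utils.data import (DataLoader, TensorDataset,
                              SequentialSampler, SubsetRandomSampler)

# Toy dataset (an assumption for illustration): sample i holds the value i.
train_data = TensorDataset(torch.arange(10))
indices = list(range(10))

seq_loader = DataLoader(train_data, batch_size=4,
                        sampler=SequentialSampler(train_data))
rand_loader = DataLoader(train_data, batch_size=4,
                         sampler=SubsetRandomSampler(indices))


def first_batch(loader):
    """Create a fresh iterator and return only its first batch as a list."""
    (batch,) = next(iter(loader))
    return batch.tolist()


seq_draws = [first_batch(seq_loader) for _ in range(3)]
rand_draws = [first_batch(rand_loader) for _ in range(3)]

print(seq_draws)   # three identical batches
print(rand_draws)  # each draw comes from a fresh shuffle
```

Within a single random batch no sample should appear twice, but across draws the same sample can come up again, because every next(iter(...)) starts from a new permutation.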
3. Nested iteration
If what I said above is right, I expect that I can define multiple sequences of batches and step through them in parallel, for example:
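iterable1 = iter(train_loader)
iterable2 = iter(train_loader)
try:
    img, lab = next(iterable1)  # select the first batch of iterable1
except StopIteration:
    pass  # iterable1 is exhausted
try:
    img, lab = next(iterable2)  # select the first batch of iterable2
except StopIteration:
    pass  # iterable2 is exhausted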
I expect that 'iterable1' and 'iterable2' initialize two different sequences of batches, and that img, lab = next(iterable1)
and img, lab = next(iterable2)
will each advance its own iterator independently.
- Is that correct?
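One way this could be tested is to interleave the two iterators and check that each still covers the whole index set on its own. This is a sketch under the same assumptions as before (toy TensorDataset of 10 samples, batch size 4, default num_workers=0); drain is a hypothetical helper name:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, SubsetRandomSampler

# Toy dataset (an assumption for illustration): sample i holds the value i.
train_data = TensorDataset(torch.arange(10))
indices = list(range(10))
train_loader = DataLoader(train_data, batch_size=4,
                          sampler=SubsetRandomSampler(indices))

iterable1 = iter(train_loader)
iterable2 = iter(train_loader)


def drain(iterator):
    """Consume the remaining batches and return their sample values."""
    return [int(x) for (batch,) in iterator for x in batch]


# Take one batch from iterable1, then exhaust iterable2, then finish iterable1.
(first_of_1,) = next(iterable1)
all_of_2 = drain(iterable2)
all_of_1 = first_of_1.tolist() + drain(iterable1)

print(all_of_1)  # one permutation of the 10 indices
print(all_of_2)  # another permutation, drawn separately
```

If the iterators shared state, exhausting iterable2 would cut iterable1 short; here each one is expected to yield all 10 samples exactly once.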