[resolved] Why would a transform be called 3 times with a batch size of 1?

I’m trying to work through the process of building a ZCA transform and I discovered this oddity. Maybe my data is bad, but they’re just JPEGs. I can’t share them because they’re “owned” by the Kaggle competition.

EDIT: I just tried the same thing on the cats_and_dogs training set with the same result. So at least things are consistent.

In fact, the transformation seems to be called 3 times for each item in a batch. If I change the batch size to 2 it gets called 6 times, and for a batch of 3, 9 times, etc. The even odder part is that some of the reference addresses are the same (see the output below).

I am seriously confused.

Here’s a minimal program followed by some output.

import torch.utils as utils
from torchvision import datasets, transforms
from torchvision.datasets.folder import ImageFolder

class testTransform:
    def __call__(self, img):
        print(img)
        return img

train_dataset = datasets.ImageFolder(
    'train',
    transforms.Compose([
        testTransform(),
        transforms.ToTensor(),
    ]))

train_loader = utils.data.DataLoader(
    train_dataset,
    batch_size=1,
    num_workers=1)

inputs, classes = next(iter(train_loader))
print(inputs.size())

The output follows.

<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE42042128>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE42008128>
torch.Size([1, 3, 312, 312])
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE420080F0>

The output for a batch of 3. Notice the duplicated references:

<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE41F61EF0>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE41F61EF0>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE41F61EF0>
torch.Size([3, 3, 312, 312])
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE90DC2550>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE90DC2550>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE90DC2550>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE90DC2518>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE90DC2518>
<PIL.Image.Image image mode=RGB size=312x312 at 0x7FDE90DC2518>

Python 3.6.1
torch 0.1.12.post2
torchvision 0.1.8
Ubuntu 17.04 x64

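Dumb mistake: setting num_workers=0 takes care of this. After looking at the code, I understand how the workers behave now. With num_workers > 0 the loader starts worker processes (not threads) that prefetch batches ahead of the one I asked for, so the transform also runs on items I haven't consumed yet. The repeated addresses are most likely just the worker reusing the same memory after each image is freed.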
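For anyone who lands here later, this is a toy model of the prefetching that reproduces the 3x count. It is only a sketch and not PyTorch's actual code: ToyPrefetchLoader, CountingTransform and calls_after_first_batch are made-up names, the workers are simulated synchronously, and it assumes the loader requests 2 * num_workers batches up front plus one more each time a batch is handed back, which is how I read the loader source for this version.

```python
from collections import deque


class CountingTransform:
    """Stand-in for testTransform: counts calls instead of printing."""

    def __init__(self):
        self.calls = 0

    def __call__(self, item):
        self.calls += 1
        return item


class ToyPrefetchLoader:
    """Toy model of a prefetching loader (hypothetical, NOT PyTorch's code).

    Assumption: 2 * num_workers batches are requested up front and one
    more is requested whenever a batch is returned to the caller.
    Workers are simulated synchronously, so a requested batch is
    transformed right away.
    """

    def __init__(self, data, transform, batch_size, num_workers):
        self.data = data
        self.transform = transform
        self.batch_size = batch_size
        self.num_workers = num_workers

    def __iter__(self):
        self.next_index = 0
        self.ready = deque()
        # Prime the prefetch queue; with num_workers=0 this loop does nothing.
        for _ in range(2 * self.num_workers):
            self._request_batch()
        return self

    def _request_batch(self):
        chunk = self.data[self.next_index:self.next_index + self.batch_size]
        if not chunk:
            return  # dataset exhausted
        self.next_index += len(chunk)
        self.ready.append([self.transform(x) for x in chunk])

    def __next__(self):
        if not self.ready:
            # Nothing prefetched (e.g. num_workers=0): load on demand.
            self._request_batch()
        if not self.ready:
            raise StopIteration
        batch = self.ready.popleft()
        if self.num_workers > 0:
            # Keep the queue topped up after handing out a batch.
            self._request_batch()
        return batch


def calls_after_first_batch(batch_size, num_workers):
    """Number of transform calls triggered by fetching one batch."""
    transform = CountingTransform()
    loader = ToyPrefetchLoader(list(range(100)), transform, batch_size, num_workers)
    next(iter(loader))
    return transform.calls


if __name__ == "__main__":
    for batch_size in (1, 2, 3):
        print(batch_size, calls_after_first_batch(batch_size, num_workers=1))
    print("no workers:", calls_after_first_batch(3, num_workers=0))
```

Under that assumption, fetching one batch with num_workers=1 triggers 3 batches' worth of transform calls (3, 6 and 9 calls for batch sizes 1, 2 and 3), which matches what I saw, while num_workers=0 calls the transform only for the batch actually requested.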