Accessing ImageFolder() imgs after using ConcatDataset()

bluAsterisk · September 30, 2021, 12:35am

Hello,

I have a question about how to access the images loaded by ImageFolder() after passing the datasets to ConcatDataset(). The code runs without errors, but the ConcatDataset object doesn’t have the imgs attribute that ImageFolder() provides.

Here is the code I use to set up the training data; the commented-out line is what I originally used. Now I am trying to combine two image datasets.

import torch
import torchvision
from torch.utils.data import ConcatDataset

size = (224, 224)
data_transform = ImageTransform(size)  # user-defined transform wrapper
# train_data = torchvision.datasets.ImageFolder(root=train_dir, transform=data_transform.train_transform)

train_dataA = torchvision.datasets.ImageFolder(root=train_dirA, transform=data_transform.train_transform)
train_dataB = torchvision.datasets.ImageFolder(root=train_dirB, transform=data_transform.train_transform)
train_data = ConcatDataset([train_dataA, train_dataB])

val_data = torchvision.datasets.ImageFolder(root=val_dir, transform=data_transform.val_transform)
train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=batch_size,
                                           shuffle=True)
val_loader = torch.utils.data.DataLoader(val_data,
                                         batch_size=batch_size,
                                         shuffle=True)
dataloaders_dict = {'train': train_loader, 'val': val_loader}

In another block I try to access train_data.imgs, but I get this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-e1395bae7815> in <module>()
----> 1 weights = make_weights_for_balanced_classes(train_data.imgs, carClasses) #len(train_data.classes)
      2 weights[1] /= 2
      3 weights = torch.FloatTensor(weights)
      4 print(weights)

AttributeError: 'ConcatDataset' object has no attribute 'imgs'

Thanks in advance!

ptrblck · September 30, 2021, 1:02am

You would have to access the attribute of the wrapped dataset through the .datasets attribute of the ConcatDataset, as seen here:

import torch
from torch.utils.data import ConcatDataset, TensorDataset

# setup
datasetA = TensorDataset(torch.randn(10, 1), torch.randn(10, 1))
datasetB = TensorDataset(torch.randn(10, 1), torch.randn(10, 1))

# access internal attribute
datasetA.tensors[0].shape  # torch.Size([10, 1])

# concat
dataset = ConcatDataset((datasetA, datasetB))

# access attribute through the internal datasets
dataset.datasets[0].tensors[0].shape  # torch.Size([10, 1])
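The reason the attribute has to be reached through .datasets is that ConcatDataset does not merge anything from the datasets it wraps: it stores them in .datasets and, when indexed, maps the global index onto one of them using the running total of their lengths (its cumulative_sizes attribute). The stdlib sketch below reimplements that lookup for illustration only; locate_sample is a hypothetical helper, not part of torch.utils.data, and it assumes a non-negative, in-range index.

```python
import bisect
from itertools import accumulate


def locate_sample(lengths, idx):
    """Map a global index to (dataset_index, local_index), ConcatDataset-style.

    Hypothetical helper for illustration; assumes 0 <= idx < sum(lengths).
    """
    cumulative_sizes = list(accumulate(lengths))  # e.g. [10, 10] -> [10, 20]
    # First dataset whose running total is strictly greater than idx
    dataset_idx = bisect.bisect_right(cumulative_sizes, idx)
    if dataset_idx == 0:
        local_idx = idx
    else:
        local_idx = idx - cumulative_sizes[dataset_idx - 1]
    return dataset_idx, local_idx


# Two datasets of 10 samples each, as in the example above
print(locate_sample([10, 10], 9))   # (0, 9): last sample of datasetA
print(locate_sample([10, 10], 10))  # (1, 0): first sample of datasetB
```

Only __getitem__ and __len__ are routed this way; any other attribute, such as imgs or tensors, still lives on the individual datasets.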

bluAsterisk · September 30, 2021, 2:30am

Thanks for the reply!
I tried your approach like so

import torch

# setup
train_dataA = torch.utils.data.TensorDataset(torch.randn(10, 1), torch.randn(10, 1))
train_dataB = torch.utils.data.TensorDataset(torch.randn(10, 1), torch.randn(10, 1))

# access internal attribute
train_dataA.tensors[0].shape

# concat
train_data = torch.utils.data.ConcatDataset((train_dataA, train_dataB))

# access attribute through the internal datasets
train_data.datasets[0].tensors[0].shape

But it fails inside the function I defined here:

def make_weights_for_balanced_classes(images, nclasses):
    count = [0] * nclasses
    for item in images:
        count[item[1]] += 1  # each item is expected to be a (path, class_index) tuple
    weight_per_class = [0.] * nclasses
    N = float(np.max(count))
    for i in range(nclasses):
        weight_per_class[i] = N/float(count[i])
#     weight = [0] * len(images)
#     for idx, val in enumerate(images):
#         weight[idx] = weight_per_class[val[1]]
    return weight_per_class

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-93f1a7a4ed45> in <module>()
----> 1 weights = make_weights_for_balanced_classes(train_data.datasets[0].tensors[0].shape, carClasses) #len(train_data.classes)
      2 weights[1] /= 2
      3 weights = torch.FloatTensor(weights)
      4 print(weights)

<ipython-input-42-7fa267b3c6b9> in make_weights_for_balanced_classes(images, nclasses)
      2     count = [0] * nclasses
      3     for item in images:
----> 4         count[item[1]] += 1
      5     weight_per_class = [0.] * nclasses
      6     N = float(np.max(count))

TypeError: 'int' object is not subscriptable

Sorry if I misinterpreted your code; was there something I missed?
I’m still quite new to PyTorch and need to get familiar with the library functions.

ptrblck · September 30, 2021, 5:31am

My example code accessed an arbitrary attribute (.tensors) only to show that, after wrapping the datasets in a ConcatDataset, you need the .datasets[index] approach to reach their attributes.
In your current code snippet you are passing the shape of the underlying tensors to the function, which is wrong: iterating a torch.Size yields plain ints, so item[1] raises the TypeError. Instead, change your code from:

train_data.imgs

to

train_data.datasets[index].imgs
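Since make_weights_for_balanced_classes should count the samples of both folders rather than one .datasets[index] at a time, one option is to flatten the .imgs lists of every wrapped dataset. ImageFolder.imgs is a list of (path, class_index) tuples, and ConcatDataset keeps the datasets in the order they were passed, so the flattened list follows the same order as the concatenated indices. This sketch assumes both roots contain the same class subfolders, because ImageFolder derives class indices from the sorted folder names. concat_imgs is a hypothetical helper, and the demo uses SimpleNamespace stand-ins so it runs without torchvision.

```python
from types import SimpleNamespace


def concat_imgs(concat_dataset):
    """Flatten the .imgs lists of all datasets wrapped by a ConcatDataset.

    Hypothetical helper; assumes every wrapped dataset is ImageFolder-like,
    i.e. its .imgs is a list of (path, class_index) tuples.
    """
    return [sample for ds in concat_dataset.datasets for sample in ds.imgs]


# Stand-ins mimicking two ImageFolder datasets wrapped in a ConcatDataset
train_dataA = SimpleNamespace(imgs=[("A/cat/1.jpg", 0), ("A/dog/1.jpg", 1)])
train_dataB = SimpleNamespace(imgs=[("B/cat/2.jpg", 0)])
train_data = SimpleNamespace(datasets=[train_dataA, train_dataB])

all_imgs = concat_imgs(train_data)
print(all_imgs)  # [('A/cat/1.jpg', 0), ('A/dog/1.jpg', 1), ('B/cat/2.jpg', 0)]

# With the real datasets you would then call, e.g.:
# weights = make_weights_for_balanced_classes(concat_imgs(train_data), carClasses)
```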

enterthevoidf22 · January 16, 2022, 6:08pm

I have a similar situation, and I want to be able to access an attribute programmatically whether the dataset is wrapped in a ConcatDataset or not.

Is it possible?

I was thinking about defining a parent class and overriding its attribute-lookup method (__getattr__) so that it checks whether the instance is a ConcatDataset or a plain Dataset and resolves the attribute accordingly.

How would you go about doing that?
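A minimal sketch of the parent-class idea above, assuming duck typing is acceptable: instead of checking isinstance(..., ConcatDataset), a mixin's __getattr__ forwards any attribute the concatenated dataset lacks to its .datasets. Python only calls __getattr__ when normal lookup fails, so real attributes such as .datasets are unaffected. ForwardAttrsMixin, AttrConcatDataset, and the concat_attrs tuple are hypothetical names, and which attributes get concatenated is an assumption to adjust. To run without PyTorch, the demo replaces ConcatDataset with a small stand-in; with PyTorch you would inherit from torch.utils.data.ConcatDataset as shown in the docstring.

```python
from types import SimpleNamespace


class ForwardAttrsMixin:
    """Hypothetical mixin: forward unknown attribute lookups to .datasets.

    Intended use with PyTorch (not executed here):
        from torch.utils.data import ConcatDataset
        class AttrConcatDataset(ForwardAttrsMixin, ConcatDataset):
            pass
    """

    # Per-sample attributes to concatenate across datasets (an assumption)
    concat_attrs = ("imgs", "samples", "targets")

    def __getattr__(self, name):
        # Read .datasets via __dict__ so a half-built object (e.g. during
        # unpickling) raises AttributeError instead of recursing forever.
        datasets = self.__dict__.get("datasets")
        if name.startswith("__") or datasets is None:
            raise AttributeError(name)
        values = [getattr(ds, name) for ds in datasets]
        if name in self.concat_attrs:
            # Same order as the ConcatDataset's global indices
            return [item for v in values for item in v]
        # Other attributes (e.g. .classes): one value per wrapped dataset
        return values


class _ConcatStandIn:
    """Stand-in for torch.utils.data.ConcatDataset: only stores .datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)


class AttrConcatDataset(ForwardAttrsMixin, _ConcatStandIn):
    pass


folderA = SimpleNamespace(imgs=[("A/cat/1.jpg", 0)], classes=["cat", "dog"])
folderB = SimpleNamespace(imgs=[("B/dog/1.jpg", 1)], classes=["cat", "dog"])
combined = AttrConcatDataset([folderA, folderB])

print(combined.imgs)     # [('A/cat/1.jpg', 0), ('B/dog/1.jpg', 1)]
print(combined.classes)  # [['cat', 'dog'], ['cat', 'dog']]

# The same code path works for a plain dataset and for the wrapper
for ds in (folderA, combined):
    print(len(ds.imgs))  # 1, then 2
```

A plain ImageFolder already has .imgs, so code that only reads dataset.imgs works with either object. Note that this version forwards based on the presence of .datasets rather than on the class, which also covers nested ConcatDatasets built from the same mixin.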