Using a PyTorch model for data augmentation inside __getitem__

Hello,

I’m trying to implement a data augmentation method that uses a deep model. To do so, I initialize the augmentation network in the __init__ of my Dataset class, load its trained weights, and store it as an attribute.

However, when the code enters the __getitem__ method of the dataset, all the weights and stored tensors of my augmentation network are reset to 0, as if the network had never been initialized.

Am I doing something wrong? I want to use the network as the first step of the data augmentation process, so I can’t move it outside of the __getitem__ method.
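
Roughly, my setup looks like the sketch below (simplified; AugmentNet and the optional checkpoint path are placeholders for my real augmentation network and trained weights):

import torch
import torch.nn as nn
from torch.utils.data import Dataset

class AugmentNet(nn.Module):
    # placeholder for my actual augmentation network
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(10, 10)

    def forward(self, x):
        return self.layer(x)

class AugmentedDataset(Dataset):
    def __init__(self, data, weights_path=None):
        self.data = data
        # augmentation network is built once and stored as an attribute;
        # in my real code I load trained weights here
        self.aug_net = AugmentNet()
        if weights_path is not None:
            self.aug_net.load_state_dict(torch.load(weights_path))
        self.aug_net.eval()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # move the sample to wherever the augmentation network lives
        device = next(self.aug_net.parameters()).device
        x = self.data[index].to(device)
        # first augmentation step: run the sample through the trained network
        with torch.no_grad():
            x = self.aug_net(x.unsqueeze(0)).squeeze(0)
        return x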

Thank you for your insights!

Your idea seems to work for me using this simple approach:

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self):
        self.model = nn.Linear(10, 10)
        print("Initial weight.abs().sum (): ", self.model.weight.abs().sum())
        self.data = torch.randn(10, 10)
        
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, index):
        print("Inside __getitem__ weight.abs().sum (): ", self.model.weight.abs().sum())
        x = self.data[index]
        x = x.unsqueeze(0)
        with torch.no_grad():
            x = self.model(x)
        return x
        
dataset = MyDataset()
# Initial weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)

for data in dataset:
    pass
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  tensor(16.0544, grad_fn=<SumBackward0>)


loader = DataLoader(dataset, num_workers=2, batch_size=5)
for data in loader:
    pass
# output is interleaved due to num_workers=2
# Inside __getitem__ weight.abs().sum (): Inside __getitem__ weight.abs().sum ():   tensor(16.0544, grad_fn=<SumBackward0>)tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum (): Inside __getitem__ weight.abs().sum ():   tensor(16.0544, grad_fn=<SumBackward0>)tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum (): Inside __getitem__ weight.abs().sum ():   tensor(16.0544, grad_fn=<SumBackward0>)tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum (): Inside __getitem__ weight.abs().sum ():   tensor(16.0544, grad_fn=<SumBackward0>)tensor(16.0544, grad_fn=<SumBackward0>)
# Inside __getitem__ weight.abs().sum ():  Inside __getitem__ weight.abs().sum (): tensor(16.0544, grad_fn=<SumBackward0>) 
# tensor(16.0544, grad_fn=<SumBackward0>)

Thank you for your reply!

After further investigation (I thought I had already tried this, but apparently not), my issue seems to be related to the DataLoader workers. With num_workers set to 0 it works perfectly, but it breaks with several workers.

Is there a particular reason why the parallel version would not properly copy the model’s weights?

Multiple workers work fine in my code, so I don’t know why you are seeing the issue.
Does my code also break in your environment?

I might have figured out the issue: it is caused by my augmentation network being on the GPU. Your code works perfectly fine until I transfer the Linear layer to the GPU.
Does this mean I have to run the augmentation either on the CPU or without parallel data workers? I don’t know whether this is expected behavior; I don’t know much about parallel computing.
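
Just to check that I understand my options, I assume they would look roughly like this (reusing the placeholder AugmentedDataset sketch from my first post):

import torch
from torch.utils.data import DataLoader

data = torch.randn(10, 10)

# Option 1: keep the augmentation network on the CPU,
# so multiple workers can be used.
cpu_dataset = AugmentedDataset(data)
cpu_loader = DataLoader(cpu_dataset, batch_size=5, num_workers=2)

# Option 2: move the augmentation network to the GPU,
# but load the data in the main process only (num_workers=0).
gpu_dataset = AugmentedDataset(data)
if torch.cuda.is_available():
    gpu_dataset.aug_net.cuda()
gpu_loader = DataLoader(gpu_dataset, batch_size=5, num_workers=0)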

Thank you again for your help.