Writing a custom DataLoader for CycleGAN in PyTorch

I want to implement CycleGAN. I have preprocessed my datasets, which consist of images from two different modalities. Since I want to feed image patches to my generator model, I extracted the patches using 'unfold' and saved them as .pt files. The size after extracting the patches is torch.Size([7, 4, 4, 64, 64, 64]), i.e. a total of 112 patches of size 64 x 64 x 64. Now how should I write my own dataset loader for these .pt files, which store the patches as tensors? Thanks in advance.

If all image patches fit into memory, you could create a large tensor and directly index them.
If that's not the case, you could create a custom Dataset which, e.g., lazily loads the data in its __getitem__.
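For the first case, a minimal sketch (the file names are placeholders, not from the original post):

import torch

# each file is assumed to hold a [7, 4, 4, 64, 64, 64] tensor of patches
paths = ["patches_0.pt", "patches_1.pt"]

# flatten each patch grid to [112, 64, 64, 64] and stack everything into one tensor
all_patches = torch.cat([torch.load(p).reshape(-1, 64, 64, 64) for p in paths], dim=0)

patch = all_patches[0]  # direct indexing, shape [64, 64, 64]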

Assuming you don't want to shuffle the dataset, you could also use a modulo operation to check which file to load and which patch to read from it, as in this sketch:

def __getitem__(self, index):
    file_idx = index // 112     # which file contains this patch
    patch_index = index % 112   # position of the patch inside that file
    # check if the correct file is already open
    if self.loaded_data_path != self.data_path[file_idx]:
        # load the file and remember its path
        self.loaded_data = torch.load(self.data_path[file_idx])
        self.loaded_data_path = self.data_path[file_idx]
    patch = self.loaded_data[patch_index]
    return patch

Note that shuffling the data (and thus the index) will open and close a lot of files, which might hurt performance.
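Putting the pieces together, here is a minimal sketch of such a lazy Dataset; the file-list handling and the reshape to [112, 64, 64, 64] are assumptions based on the shapes above:

import torch
from torch.utils.data import Dataset

class LazyPatchDataset(Dataset):
    def __init__(self, data_path):
        self.data_path = data_path      # list of .pt file paths (assumed)
        self.loaded_data = None         # cache for the currently open file
        self.loaded_data_path = None    # path of the cached file

    def __len__(self):
        return len(self.data_path) * 112  # 112 patches per file

    def __getitem__(self, index):
        file_idx = index // 112
        patch_index = index % 112
        if self.loaded_data_path != self.data_path[file_idx]:
            data = torch.load(self.data_path[file_idx])
            # flatten the patch grid: [7, 4, 4, 64, 64, 64] -> [112, 64, 64, 64]
            self.loaded_data = data.reshape(-1, *data.shape[-3:])
            self.loaded_data_path = self.data_path[file_idx]
        return self.loaded_data[patch_index]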


Thank you.
I have one more basic question.
Now that I have stored my patches, how should I visualize each of them?
The patch tensor has the shape [112, 64, 64, 64].

Sorry, I am a beginner!

You could use a nested loop to iterate over each sample (dim0) and each channel (dim1) and visualize the 64x64 patches using matplotlib.pyplot.imshow.
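A minimal sketch of such a loop, assuming the patches were saved as a [112, 64, 64, 64] tensor (the file name is a placeholder):

import torch
import matplotlib.pyplot as plt

patches = torch.load("patches.pt")  # shape [112, 64, 64, 64]

for sample_idx in range(patches.size(0)):     # samples (dim0)
    for slice_idx in range(patches.size(1)):  # slices (dim1)
        plt.imshow(patches[sample_idx, slice_idx].numpy(), cmap="gray")
        plt.title(f"sample {sample_idx}, slice {slice_idx}")
        plt.show()  # in practice you would probably only plot a subset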

Hello,

I extracted the image patches and wrote a dataloader for the GANs, but now I get the errors below. I don't understand where I am going wrong in my understanding. I split the data after creating the dataset, although I know this is usually done before the dataloader. If that is the case, should I split the data right after extracting the patches?

I am trying to adapt the eriklindernoren/PyTorch-GAN CycleGAN implementation to my own dataset. In all the GAN examples and implementations, the datasets are already saved in train and test directories, so I am having a hard time understanding how to do this for my dataset after preprocessing.

In my data-processing file, after extracting the patches:

patch_A = extract_patches(torch.tensor(ct_numpy))   # torch.Size([112, 64, 64, 64])
patch_B = extract_patches(torch.tensor(pet_numpy))  # torch.Size([112, 64, 64, 64])

In the dataloader:

class MyDataset(Dataset):
    def __init__(self, patch_A, patch_B):
        'characterizes a dataset for pytorch'
        self.patch_A = patch_A
        self.patch_B = patch_B
        self.transforms = transforms.ToTensor

    def __len__(self):
        'denotes the total number of samples'
        return len(self.patch_B), len(self.patch_A)

    def __getitem__(self, index):
        'Generates one sample of data'
        # select sample
        x = self.patch_A[0]
        y = self.patch_B[0]

        return x, y

train_A_dataset = torch.utils.data.random_split(patch_A, (0.7*len(patch_A)))

train_B_dataset = torch.utils.data.random_split(patch_B, (0.7*len(patch_B)))

test_A_dataset = torch.utils.data.random_split(patch_A, (len(patch_A)-len(train_A_dataset)))
test_B_dataset = torch.utils.data.random_split(patch_B, (0.7*len(patch_B))-len(train_B_dataset))

It gives the error:
Traceback (most recent call last):
  File "E:/example/DataLoader.py", line 28, in <module>
    train_A_dataset = torch.utils.data.random_split(patch_A, (0.7*len(patch_A)))
  File "C:\Users\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 271, in random_split
    if sum(lengths) != len(dataset):
TypeError: 'float' object is not iterable

Try to pass both lengths to the random_split method as:

train_A_dataset, test_A_dataset = torch.utils.data.random_split(patch_A, [0.7*len(patch_A), 0.3*len(patch_A)])

Also, I assume you would get another error, since you are returning two lengths in your custom Dataset implementation:

def __len__(self):
    'denotes the total number of samples'
    return len(self.patch_B) , len(self.patch_A)

You should instead return a single length and make sure that both patch tensors can be indexed in this range.
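A minimal sketch of the corrected methods, assuming both patch tensors hold the same number of samples:

def __len__(self):
    # patch_A and patch_B are assumed to have the same length
    return len(self.patch_A)

def __getitem__(self, index):
    # use the index instead of a hard-coded 0
    x = self.patch_A[index]
    y = self.patch_B[index]
    return x, y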

It gives me the error:

Traceback (most recent call last):
  File "E:/example/DataLoader.py", line 30, in <module>
    train_A_dataset, test_A_dataset = torch.utils.data.random_split(patch_A, [0.7*len(patch_A), 0.3*len(patch_A)])
  File "C:\Users\lib\site-packages\torch\utils\data\dataset.py", line 274, in random_split
    indices = randperm(sum(lengths)).tolist()
TypeError: randperm(): argument 'n' (position 1) must be int, not float

Try to wrap both values in int via [int(0.7*len(patch_A)), int(0.3*len(patch_A))].
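Applied to the call above, that would look like:

train_A_dataset, test_A_dataset = torch.utils.data.random_split(
    patch_A, [int(0.7*len(patch_A)), int(0.3*len(patch_A))])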

Now it's giving the error:

ValueError: Sum of input lengths does not equal the length of the input dataset!

This issue might be raised if the integer rounding produces lengths whose sum doesn't match the original dataset size.
In that case, you could just subtract the first split length from the total:

train_A_dataset, test_A_dataset = torch.utils.data.random_split(
    patch_A, [int(0.7*len(patch_A)), len(patch_A) - int(0.7*len(patch_A))])

or use floor on one length and ceil on the other:

import math

train_A_dataset, test_A_dataset = torch.utils.data.random_split(
    patch_A, [math.floor(0.7*len(patch_A)), math.ceil(0.3*len(patch_A))])

Thank you so much, it worked! :smile: