I want to implement CycleGANs. I have preprocessed my datasets, which consist of images from two different modalities. After preprocessing, since I want to send image patches to my generator model, I extracted the image patches using `unfold` and saved them as .pt files. The size after extracting the patches is torch.Size([7, 4, 4, 64, 64, 64]), which is a total of 112 patches of size 64 x 64 x 64. Now how should I write my own dataset loader for these .pt files, which are stored on disk as tensors? Thanks in advance.
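For reference, the extraction step described above could look roughly like this sketch; the volume shape and the filename are made-up illustrations, and `unfold` is applied once per spatial dimension with a stride equal to the patch size, giving non-overlapping patches:

```python
import torch

# stand-in volume; the real CT/PET shapes are assumptions here
volume = torch.randn(448, 256, 256)

patches = (volume
           .unfold(0, 64, 64)   # -> [7, 256, 256, 64]
           .unfold(1, 64, 64)   # -> [7, 4, 256, 64, 64]
           .unfold(2, 64, 64))  # -> [7, 4, 4, 64, 64, 64]
print(patches.shape)  # torch.Size([7, 4, 4, 64, 64, 64])

# flatten the three grid dimensions into a single patch dimension
patches = patches.reshape(-1, 64, 64, 64)  # -> [112, 64, 64, 64]
torch.save(patches, "patches_A.pt")  # hypothetical filename
```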
If all image patches fit into memory, you could create a large tensor and directly index them.
If that's not the case, you might create a custom Dataset for each stored file and e.g. lazily load the data in its __getitem__.
Assuming you don't want to shuffle the dataset, you could also use a modulo operation to check which file to load and which patch should be loaded, via this pseudo code:
def __getitem__(self, index):
    file_idx = index // 112
    patch_idx = index % 112
    # check if the correct file is already open
    if self.loaded_data_path != self.data_path[file_idx]:
        # load the file and remember its path
        self.loaded_data = torch.load(self.data_path[file_idx])
        self.loaded_data_path = self.data_path[file_idx]
    patch = self.loaded_data[patch_idx]
    return patch
Note that shuffling the data (and thus the index) will open and close a lot of files, which might hurt the performance.
Thank you.
I have one more basic question.
Now that I have stored my patches, how should I visualize each of the patches?
The patch size is 112, 64, 64, 64.
Sorry, since I am a beginner!
You could use a nested loop to iterate over each sample (dim0) and each channel (dim1) to visualize the 64x64 patch using matplotlib.pyplot.imshow.
Hello,
I extracted the image patches and wrote a dataloader for the GANs.
But now I get the following errors. I don't understand where my understanding goes wrong. I split the data afterwards, although I know that we do it before the dataloader. If that is the case, when I extract the patches, should I split the data after that?
I am trying to implement the eriklindernoren/PyTorch-GAN CycleGAN implementation for my own dataset. In all the GAN examples and implementations, the datasets are already saved in directories for training and testing. So I am having a hard time understanding how I should do it for my dataset after preprocessing.
In my data processing file, after I extracted the patches:
patch_A = extract_patches(torch.tensor(ct_numpy))   # torch.Size([112, 64, 64, 64])
patch_B = extract_patches(torch.tensor(pet_numpy))  # torch.Size([112, 64, 64, 64])
In the dataloader:
class MyDataset(Dataset):
    def __init__(self, patch_A, patch_B):
        'characterizes a dataset for pytorch'
        self.patch_A = patch_A
        self.patch_B = patch_B
        self.transforms = transforms.ToTensor
    def __len__(self):
        'denotes the total number of samples'
        return len(self.patch_B), len(self.patch_A)
    def __getitem__(self, index):
        'Generates one sample of data'
        # select sample
        x = self.patch_A[index]
        y = self.patch_B[index]
        return x, y
train_A_dataset = torch.utils.data.random_split(patch_A ,(0.7*len(patch_A)))
train_B_dataset =torch.utils.data.random_split(patch_B ,(0.7*len(patch_B)))
test_A_dataset =torch.utils.data.random_split(patch_A ,(len(patch_A)-len(train_A_dataset)))
test_B_dataset =torch.utils.data.random_split(patch_B ,(0.7*len(patch_B))-len(train_B_dataset))
It gives the error:
Traceback (most recent call last):
  File "E:/example/DataLoader.py", line 28, in
    train_A_dataset = torch.utils.data.random_split(patch_A, (0.7*len(patch_A)))
  File "C:\Users\Anaconda3\lib\site-packages\torch\utils\data\dataset.py", line 271, in random_split
    if sum(lengths) != len(dataset):
TypeError: 'float' object is not iterable
Try to pass both lengths to the random_split method as:
train_A_dataset, test_A_dataset = torch.utils.data.random_split(patch_A, [0.7*len(patch_A), 0.3*len(patch_A)])
Also, I assume you would get another error, since you are returning two lengths in your custom Dataset implementation:
def __len__(self):
    'denotes the total number of samples'
    return len(self.patch_B), len(self.patch_A)
You should instead return a single length and make sure that both patch tensors can be indexed in this range.
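A possible corrected version, assuming both modalities contain the same number of patches (the assert guards that assumption; names follow the code quoted above):

```python
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, patch_A, patch_B):
        assert len(patch_A) == len(patch_B), "both modalities need the same patch count"
        self.patch_A = patch_A
        self.patch_B = patch_B

    def __len__(self):
        # a single int, valid for both tensors
        return len(self.patch_A)

    def __getitem__(self, index):
        # index both modalities with the same index to keep the pairs aligned
        return self.patch_A[index], self.patch_B[index]

dataset = MyDataset(torch.randn(112, 64, 64, 64), torch.randn(112, 64, 64, 64))
x, y = dataset[0]
```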
It gives me the error:
Traceback (most recent call last):
  File "E:/example/DataLoader.py", line 30, in
    train_A_dataset, test_A_dataset = torch.utils.data.random_split(patch_A, [0.7*len(patch_A), 0.3*len(patch_A)])
  File "C:\Users\lib\site-packages\torch\utils\data\dataset.py", line 274, in random_split
    indices = randperm(sum(lengths)).tolist()
TypeError: randperm(): argument 'n' (position 1) must be int, not float
Try to wrap both values in an int via [int(0.7*len(patch_A)), int(0.3*len(patch_A))].
Now it's giving the error:
ValueError: Sum of input lengths does not equal the length of the input dataset!
This issue might be raised if the integer rounding produces a sum of elements which doesn't match the original size.
In that case, you could just subtract the first split:
train_A_dataset, test_A_dataset = torch.utils.data.random_split(
    patch_A, [int(0.7*len(patch_A)), len(patch_A) - int(0.7*len(patch_A))])
or use math.floor on one length and math.ceil on the other:
train_A_dataset, test_A_dataset = torch.utils.data.random_split(
    patch_A, [math.floor(0.7*len(patch_A)), math.ceil(0.3*len(patch_A))])
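Putting that together for the 112-patch tensors quoted earlier, the two lengths come out to 78 and 34, which sum back to 112 (a stand-in tensor is used here; recent PyTorch versions may also accept fractional lengths like [0.7, 0.3] directly):

```python
import math
import torch

patch_A = torch.randn(112, 64, 64, 64)  # stand-in for the extracted patches
n = len(patch_A)
lengths = [math.floor(0.7 * n), math.ceil(0.3 * n)]  # [78, 34]
assert sum(lengths) == n  # floor + ceil covers the whole dataset

train_A_dataset, test_A_dataset = torch.utils.data.random_split(patch_A, lengths)
print(len(train_A_dataset), len(test_A_dataset))  # 78 34
```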
Thank you so much, it worked!