How can I handle sklearn's KFold with ImageFolder?

Maryam_M · June 19, 2020, 3:11am

I am trying to use KFold on an image classification task. My data is laid out like this:

/data/
    class1/image1
          /image2
          …
    class2/image1
          /image2
          …
    …
    class6/image1
          /image2
          …

My code is as follows:

dsets = torchvision.datasets.ImageFolder(data_dir)

for i_fold, (train_idx, valid_idx) in enumerate(folds.split(dsets)):
    dataset_train = Subset(dsets, train_idx, transforms=transforms_train)
    dataset_valid = Subset(dsets, valid_idx, transforms=transforms_valid)
    trainloader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
    testloader = torch.utils.data.DataLoader(test, batch_size=batch_size, shuffle=True)

For the subset, I copied a custom Subset class so that the transform is applied in the Subset rather than in the parent dataset.
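
A minimal sketch of what such a wrapper might look like, assuming it mirrors the (dataset, indices) layout of `torch.utils.data.Subset` but applies its own transform in `__getitem__`. The class name `TransformSubset`, the `transform` argument, and the toy list used as the parent dataset are hypothetical stand-ins; with a real `ImageFolder`, the parent would return (PIL image, label) pairs instead.

```python
class TransformSubset:
    """Subset of a map-style dataset that applies its own transform.

    Hypothetical helper: it keeps the transform on the subset so that
    the train and validation splits can share one parent dataset.
    """

    def __init__(self, dataset, indices, transform=None):
        self.dataset = dataset
        self.indices = list(indices)
        self.transform = transform

    def __getitem__(self, idx):
        # Look up the sample in the parent through the stored index.
        sample, label = self.dataset[self.indices[idx]]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label

    def __len__(self):
        return len(self.indices)


# Toy parent dataset standing in for ImageFolder: (sample, label) pairs.
parent = [(10, 0), (20, 1), (30, 0), (40, 1)]

train_subset = TransformSubset(parent, [0, 2, 3], transform=lambda x: x * 2)
valid_subset = TransformSubset(parent, [1])  # no transform on this split

print(len(train_subset), train_subset[1])
print(len(valid_subset), valid_subset[0])
```

Since the class defines `__getitem__` and `__len__`, an instance like this should be usable as a map-style dataset for `torch.utils.data.DataLoader`, in the same place `Subset` would go.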

The error I'm getting is:
RecursionError: maximum recursion depth exceeded

The labels are the names of the folders.

Can someone please tell me what's wrong with my code?

Nikronic · June 19, 2020, 5:09am

Hi,

I ran your code and it works fine, but I had to change a few things I wasn't sure about:

Is Subset the class from torch.utils.data.Subset, or have you created your own class?
In the last two lines, you passed train and test as the datasets to the data loaders. Is that a typo?

Maryam_M:

trainloader = torch.utils.data.DataLoader( train ###??, batch_size = batch_size, shuffle =True)
testloader = torch.utils.data.DataLoader(test ###?, batch_size=batch_size, shuffle=True)

Here is the code I ran:


import torch
import torchvision
from sklearn.model_selection import KFold
from torch.utils import data

!wget https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip
!unzip kagglecatsanddogs_3367a.zip

folds = KFold(shuffle=True)
dsets = torchvision.datasets.ImageFolder('PetImages')

for i_fold, (train_idx, valid_idx) in enumerate(folds.split(dsets)):
    dataset_train = data.Subset(dsets, train_idx)
    dataset_valid = data.Subset(dsets, valid_idx)
    trainloader = torch.utils.data.DataLoader(dataset_train, batch_size=256, shuffle =True)
    testloader = torch.utils.data.DataLoader(dataset_valid, batch_size=256, shuffle=True)
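
In the loop above, each fold yields two disjoint sets of indices into the dataset, and as far as the split is concerned only the number of samples appears to matter. The sketch below imitates that behavior in plain Python to show what `train_idx` and `valid_idx` contain. `kfold_indices` is a hypothetical stand-in written for illustration, not scikit-learn's implementation, and its shuffling will not match `KFold` for any given seed.

```python
import random


def kfold_indices(n_samples, n_splits=5, shuffle=True, seed=0):
    """Yield (train_idx, valid_idx) lists, imitating a k-fold split.

    Hypothetical stand-in for sklearn.model_selection.KFold.split,
    written in plain Python for illustration only.
    """
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)

    # The first n_samples % n_splits folds get one extra sample.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        valid_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, valid_idx
        start = stop


splits = list(kfold_indices(n_samples=10, n_splits=3))
for i_fold, (train_idx, valid_idx) in enumerate(splits):
    # Every sample lands in exactly one validation fold.
    print(i_fold, len(train_idx), len(valid_idx))
```

These index lists are what gets handed to `data.Subset(dsets, train_idx)` and `data.Subset(dsets, valid_idx)` in each fold.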

Best

Maryam_M · June 19, 2020, 6:36am

I realised that the code is fine; it just needed to be re-run.
Yes, I used a new Subset class that I found in the solutions here: subset
Unfortunately, I now have another issue, this time with this loop:

for i_fold, (train_idx, valid_idx) in enumerate(folds.split(dsets)):
    …
    trainloader = torch.utils.data.DataLoader(train, batch_size=batch_size, shuffle=True)
    for epoch in range(N_EPOCHS):
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data

which fails with the error:
AssertionError: force_apply must have bool or int type

Do you have any idea why this occurs?

Thank you for your response.

Nikronic · June 19, 2020, 7:29am

Could you share the full stack trace and the code that triggers the error?

Maryam_M · June 19, 2020, 8:04pm

The function I am calling for each fold is:

def train_one_fold(i_fold, model, criterion, optimizer, dataloader_train, dataloader_valid):
    train_fold_results = []
    for epoch in range(N_EPOCHS):
        # Train
        model.train()
        tr_loss = 0

        for i, data in enumerate(dataloader_train, 0):
            # get the inputs
            inputs, labels = data

            # Variable is a no-op wrapper since PyTorch 0.4, so tensors are used directly
            if use_gpu:
                inputs, labels = inputs.cuda(), labels.cuda(non_blocking=True)

            outputs = model(inputs)
            loss = criterion(outputs, labels)

            loss.backward()

            tr_loss += loss.item()

            optimizer.step()
            optimizer.zero_grad()

        # Validate
        model.eval()
        val_loss = 0
        val_preds = None
        val_labels = None

        for i, data in enumerate(dataloader_valid, 0):
            images, labels = data

            if use_gpu:
                # `async` is a reserved word from Python 3.7 on; non_blocking replaces it
                images, labels = images.cuda(), labels.cuda(non_blocking=True)

            with torch.no_grad():
                outputs = model(images)

            loss = criterion(outputs, labels)
            val_loss += loss.item()
            preds = torch.softmax(outputs, dim=1).data.cpu()

            if val_preds is None:
                val_preds = preds
            else:
                val_preds = torch.cat((val_preds, preds), dim=0)

    # The caller shown in the traceback unpacks two values
    return val_preds, train_fold_results

Here is the error trace:

AssertionError Traceback (most recent call last)

in ()
24 optimizer = optim.Adam(plist, lr=5e-5)
25
---> 26 val_preds, train_fold_results = train_one_fold(i_fold, model, criterion, optimizer, trainloader, testloader)
27 oof_preds[valid_idx, :] = val_preds.numpy()
28

---------------------------------------------------6 frames-------------------------------------------------------

in train_one_fold(i_fold, model, criterion, optimizer, dataloader_train, dataloader_valid)
12 tr_loss = 0
13
---> 14 for i, data in enumerate(dataloader_train, 0):
15 # get the inputs
16

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
343
344 def __next__(self):
--> 345 data = self._next_data()
346 self._num_yielded += 1
347 if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
383 def _next_data(self):
384 index = self._next_index() # may raise StopIteration
--> 385 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
386 if self._pin_memory:
387 data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

in __getitem__(self, idx)
14 def __getitem__(self, idx):
15 im, labels = self.dataset[self.indices[idx]]
---> 16 return self.transform(im), labels
17
18 def __len__(self):

/usr/local/lib/python3.6/dist-packages/albumentations/core/composition.py in __call__(self, force_apply, **data)
162
163 def __call__(self, force_apply=False, **data):
--> 164 assert isinstance(force_apply, (bool, int)), "force_apply must have bool or int type"
165 need_to_run = force_apply or random.random() < self.p
166 for p in self.processors.values():

AssertionError: force_apply must have bool or int type

ptrblck · June 20, 2020, 8:26am

It seems that the error is raised by albumentations, so you might need to check the transformations you are using.
Maybe you are passing multiple values to one transformation, and the second argument is accidentally interpreted as the force_apply argument.
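
The traceback looks consistent with this: the custom `__getitem__` calls `self.transform(im)`, and the `__call__` shown in the trace takes `force_apply` as its first parameter, so the image itself would be bound to `force_apply`. The sketch below reproduces that with a stand-in class. `FakeCompose` is hypothetical and only copies the signature and assertion visible in the trace; it is not albumentations itself. It also shows the keyword-style call that avoids the assertion.

```python
class FakeCompose:
    """Stand-in copying the __call__ signature shown in the traceback."""

    def __call__(self, force_apply=False, **data):
        assert isinstance(force_apply, (bool, int)), \
            "force_apply must have bool or int type"
        # A real pipeline would augment data["image"]; this one passes it through.
        return data


transform = FakeCompose()
image = [[0, 1], [2, 3]]  # toy "image"; a list is neither bool nor int

# Positional call: the image is bound to force_apply and the assertion fires.
try:
    transform(image)
    positional_error = None
except AssertionError as exc:
    positional_error = str(exc)

# Keyword call: the image travels in **data and comes back in a dict.
result = transform(image=image)["image"]

print(positional_error)
print(result == image)
```

With the real library, the equivalent change in the custom Subset would presumably be something like `self.transform(image=np.array(im))["image"]`, since albumentations pipelines take NumPy arrays by keyword and return a dict.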