ImageFolder is time-consuming

image_path = "drive/MyDrive/Animal_Breed/TRAIN (1)/"

# Training transforms: resize, light augmentation, convert to tensor
train_data_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(20),
    transforms.ToTensor(),
    # transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
train_dataset = ImageFolder(image_path, transform=train_data_transform)

val_path = "drive/MyDrive/Animal_Breed/VAL/"

# Validation transforms: no augmentation
val_data_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
val_dataset = ImageFolder(val_path, transform=val_data_transform)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1, shuffle=False)
# scheduler is now passed in explicitly instead of relying on a module-level default
def train(model, dataloader, validloader, criterion, optimizer, scheduler, epochs=50):
    max_valid_acc = 0
    for e in range(epochs):
        train_loss, valid_loss = 0.0, 0.0
        train_acc, val_acc = 0, 0

        model.train()
        for data, labels in dataloader:
            data, labels = data.to(device), labels.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, labels.long())
            loss.backward()
            optimizer.step()
            # accumulate with +=, not =, so the loss is summed over all batches
            train_loss += loss.item() * data.size(0)
            train_acc += (output.argmax(1) == labels).sum().item()

        model.eval()
        with torch.no_grad():  # no gradients needed during validation
            for data, labels in validloader:
                data, labels = data.to(device), labels.to(device)
                output = model(data)
                loss = criterion(output, labels.long())
                valid_loss += loss.item() * data.size(0)
                val_acc += (output.argmax(1) == labels).sum().item()

        # normalize by the number of samples, since each per-batch loss was
        # weighted by its batch size
        print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(dataloader.dataset)} '
              f'\t\t Validation Loss: {valid_loss / len(validloader.dataset)}')
        print("Validation Accuracy ... :", val_acc / len(validloader.dataset))
        print("Train Accuracy ... :", train_acc / len(dataloader.dataset))

        if val_acc > max_valid_acc:
            print(f'Validation Acc Increased ({max_valid_acc:.6f} ---> {val_acc:.6f}) \t Saving The Model')
            max_valid_acc = val_acc
            # Saving state dict of the best model so far
            torch.save(model.state_dict(), 'HIGH_ACC.pth')
        scheduler.step(val_acc)
    return model

This is the code. I’m using a pretrained VGG16 model, and it’s been 30 minutes on Colab without finishing even one epoch! Why is it so time-consuming?

I don’t know where drive is located or how it’s mounted to the system, but if it’s a network mount, your script’s performance would most likely be limited by the data loading speed over the network.
I don’t know if Colab allows you to store data locally, but as a quick check you could replace the DataLoader with random tensors and measure the training speed again.
If using random tensors accelerates the training significantly, it would indeed point to a data loading bottleneck.
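
For example, here is a minimal sketch of that check, assuming the model, criterion, and optimizer defined above (with the model already moved to device), 3x224x224 inputs, and 37 classes:

import time

import torch

# Train on random tensors to isolate the GPU-side step from data loading
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
data = torch.randn(32, 3, 224, 224, device=device)
labels = torch.randint(0, 37, (32,), device=device)

model.train()
if device.type == 'cuda':
    torch.cuda.synchronize()  # finish pending work before starting the clock
start = time.time()
for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(data), labels)
    loss.backward()
    optimizer.step()
if device.type == 'cuda':
    torch.cuda.synchronize()  # wait for the last kernel before stopping the clock
print(f'{(time.time() - start) / 10:.3f}s per iteration without data loading')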

import os

import cv2

data = []
img_size = 48
# data = torch.empty((7000, img_size, img_size, 3), dtype=torch.float32)
# label = torch.empty((7000))

# List index == class label; the order below matches the labels assigned in
# the original if/elif chain
breeds = ['abyssinian', 'american_bulldog', 'american_pit_bull_terrier',
          'basset_hound', 'beagle', 'bengal', 'birman', 'bombay', 'boxer',
          'british_shorthair', 'chihuahua', 'egyptian_mau',
          'english_cocker_spaniel', 'english_setter', 'german_shorthaired',
          'great_pyrenees', 'havanese', 'japanese_chin', 'maine_coon',
          'miniature_pinscher', 'newfoundland', 'persian', 'pomeranian',
          'pug', 'ragdoll', 'saint_bernard', 'keeshond', 'leonberger',
          'russian_blue', 'samoyed', 'scottish_terrier', 'shiba_inu',
          'siamese', 'sphynx', 'staffordshire_bull_terrier',
          'wheaten_terrier', 'yorkshire_terrier']

def create_data():
    i = 0
    for label, item in enumerate(breeds):
        # os.path.join adds the separator the original string concatenation missed
        path = os.path.join('drive/MyDrive/Animal_Breed/TRAIN', item)
        for img in os.listdir(path):
            print(i)  # progress
            new_img_array = cv2.imread(os.path.join(path, img))
            # OpenCV loads images as BGR; convert to RGB
            new_img_array = cv2.cvtColor(new_img_array, cv2.COLOR_BGR2RGB)
            data.append([new_img_array, label])
            i += 1

I have done this and created the whole DataLoader; it’s much faster, so I think what happened on Colab was a network problem. But still, does ImageFolder read the data from disk every epoch? What I mean is: when I trained with ImageFolder on my local machine, the CPU was heavily used and the GPU mostly idle (more time on the CPU, less on the GPU), but when I created the data as shown in the code above, the GPU was almost fully occupied for the whole training phase.
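
For reference, a minimal sketch of wrapping such a preloaded list into a DataLoader (this assumes every image in data was already resized to one common shape, which the loop above does not do yet):

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumes every entry of `data` is [HWC uint8 image, int label] and all
# images share one shape (resize them first, e.g. with cv2.resize)
images = torch.from_numpy(np.stack([item[0] for item in data]))
labels = torch.tensor([item[1] for item in data], dtype=torch.long)

# cv2 yields HWC uint8 arrays; convert to NCHW float in [0, 1] like ToTensor would
images = images.permute(0, 3, 1, 2).float().div_(255.)

preloaded_loader = DataLoader(TensorDataset(images, labels),
                              batch_size=32, shuffle=True)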

ImageFolder will lazily load and process each sample to save memory.
Based on your code snippet you are preloading the entire dataset, which of course costs startup time, but can be faster in each iteration. This approach is often not possible, as datasets are frequently too large to fit into RAM.
Also, if you store the data on a local SSD and use multiple workers in the DataLoader, you might be able to hide the data loading latency (each sample will be loaded and processed in the background while the GPU is busy with the training).
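
For example, here are the loaders from the first post with workers enabled (the worker counts are arbitrary starting points, not tuned values):

# Background workers load and transform samples while the GPU trains;
# pin_memory can speed up host-to-device copies
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32,
                                           shuffle=True, num_workers=4,
                                           pin_memory=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=1,
                                         shuffle=False, num_workers=2,
                                         pin_memory=True)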

Thank you for replying. How do I decide the value of num_workers in the DataLoader?

The optimal number of workers depends on the system setup; you could start with e.g. the number of CPU cores and compare the epoch time against a few other settings, as sketched below.
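
A minimal sketch of such a comparison (assuming the train_dataset from above; the candidate worker counts are arbitrary):

import os
import time

from torch.utils.data import DataLoader

# Time one pass over the dataset for a few candidate worker counts and keep
# the fastest; os.cpu_count() may overestimate on shared machines like Colab
for num_workers in (0, 2, 4, os.cpu_count()):
    loader = DataLoader(train_dataset, batch_size=32, shuffle=True,
                        num_workers=num_workers)
    start = time.time()
    for data, labels in loader:
        pass  # iterate only, to measure pure loading/processing speed
    print(f'num_workers={num_workers}: {time.time() - start:.1f}s per epoch')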
