I am experimenting with the following repository: GitHub - Keiku/PyTorch-Lightning-CIFAR10: "Not too complicated" training code for CIFAR-10 by PyTorch Lightning
I have implemented two ways of loading CIFAR-10: one loads it from torchvision, and the other loads it as a custom dataset. I have also implemented two kinds of models: lightweight models (e.g. ResNet-18 from scratch, timm MobileNetV3) and relatively heavy models (e.g. ResNet-50 from scratch, timm ResNet-152).
After some experiments, I found the following.
- GPU usage remains high (nearly 100%) for any model when loading CIFAR-10 via torchvision
- When loading CIFAR-10 as a custom dataset, GPU usage remains relatively high for heavy models, though it still drops to zero intermittently
- When loading CIFAR-10 as a custom dataset, GPU usage remains low for lightweight models (ResNet-18, MobileNetV3), bouncing back and forth between 0% and 100%
In this situation, is there a problem with the implementation of my custom dataset? Also, is there a way to increase GPU usage even for lightweight models?
I am experimenting in the following EC2 g4dn.xlarge environment.
⋊> ~ lsb_release -a (base) 21:45:51
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
⋊> ~ nvidia-container-cli info (base) 21:48:20
NVRM version: 450.80.02
CUDA version: 11.0
Device Index: 0
Device Minor: 0
Model: Tesla T4
GPU UUID: GPU-ba54be15-066e-e7e5-87d0-84b8ac2672c6
Bus Location: 00000000:00:1e.0
Your “lightweight models” need less GPU compute and thus shift the overall computation more towards the CPU workload, which is most likely defined by the data loading.
In such use cases (i.e. using tiny models), you would have to make sure the data loading won’t be a bottleneck, since the GPU workload is tiny as explained before.
Based on your observations it seems that the custom CIFAR dataset is slower in the data loading pipeline than the torchvision implementation, which lets the GPU starve especially for tiny workloads.
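As a first step you could tune the DataLoader so that several worker processes load and prefetch batches while the GPU is busy with the current one. A minimal sketch of the relevant knobs (the synthetic TensorDataset and the batch/worker numbers are placeholders for your actual dataset and machine, not values from the repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10: 512 RGB images of 32x32, 10 classes.
images = torch.randn(512, 3, 32, 32)
targets = torch.randint(0, 10, (512,))
dataset = TensorDataset(images, targets)

loader = DataLoader(
    dataset,
    batch_size=128,
    shuffle=True,
    num_workers=2,                         # parallel workers hide per-sample I/O latency
    pin_memory=torch.cuda.is_available(),  # page-locked buffers speed up host-to-device copies
    persistent_workers=True,               # keep workers alive between epochs
    prefetch_factor=4,                     # each worker prepares 4 batches ahead
)
```

With a g4dn.xlarge (4 vCPUs) you would typically sweep num_workers between 2 and 4 and watch whether the GPU utilization gaps shrink; if they don't, the bottleneck is likely per-sample I/O rather than the loader configuration.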
The implementation of my custom dataset is simple. I have implemented it as follows, but what is the likely bottleneck?
import re
from pathlib import Path

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class CIFAR10Dataset(Dataset):
    def __init__(self, cfg, train, transform=None):
        self.transform = transform
        self.cfg = cfg
        self.split_dir = "train" if train else "test"
        self.root_dir = Path(cfg.dataset.root_dir)
        self.image_dir = self.root_dir / "cifar" / self.split_dir
        self.file_list = [p.name for p in self.image_dir.rglob("*") if p.is_file()]
        # filenames look like "<index>_<label>.png", so the label is the middle token
        self.labels = [re.split(r"_|\.", f)[1] for f in self.file_list]
        self.targets = self.label_mapping(cfg)

    def label_mapping(self, cfg):
        label_mapping_path = Path(cfg.dataset.root_dir) / "cifar" / "labels.txt"
        df_label_mapping = pd.read_table(label_mapping_path.as_posix(), names=["label"])
        df_label_mapping["target"] = range(cfg.train.num_classes)
        label_mapping_dict = dict(
            zip(df_label_mapping["label"], df_label_mapping["target"])
        )
        return [label_mapping_dict[label] for label in self.labels]

    def __len__(self):
        return len(self.file_list)

    def __getitem__(self, index):
        filename = self.file_list[index]
        target = self.targets[index]
        image_path = self.image_dir / filename
        image = Image.open(image_path.as_posix())
        if self.transform is not None:
            image = self.transform(image)
        return image, target
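One thing to note about the dataset above: __getitem__ opens one file per sample, so every sample costs a filesystem round trip. Since CIFAR-10 is small (on the order of 200 MB as PNGs), a way to rule out filesystem latency is to read all files once in __init__ and decode from RAM in __getitem__. A sketch of that idea (CachedImageDataset and its flat image_dir layout are hypothetical, not the repo's API; labels are omitted for brevity):

```python
from io import BytesIO
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class CachedImageDataset(Dataset):
    """Reads every image file once up front and serves decoded copies from RAM.

    This trades startup time and memory for the removal of per-sample file
    I/O, which matters most on high-latency filesystems.
    """

    def __init__(self, image_dir, transform=None):
        self.transform = transform
        self.paths = sorted(p for p in Path(image_dir).rglob("*") if p.is_file())
        # One filesystem read per file at startup instead of one per __getitem__.
        self.cache = [p.read_bytes() for p in self.paths]

    def __len__(self):
        return len(self.cache)

    def __getitem__(self, index):
        # Decode from the in-memory bytes; no disk/network access here.
        image = Image.open(BytesIO(self.cache[index])).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image
```

If the GPU gaps disappear with the cached variant, that confirms storage latency rather than the transforms or the model as the bottleneck.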
Sorry, I found the cause. Loading images from AWS EFS was the reason for the low GPU usage. GPU usage remained high (nearly 100%) when loading from AWS EBS.