import torchvision
from torch.utils.data import DataLoader
# Snippet from inside a class method: self.fold and self.batch_size are attributes of that class.
num_cpu = 12
self.train_dataset = torchvision.datasets.UCF101(
    UCF101_ROOT_PATH,
    UCF101_ANNO_PATH,
    frames_per_clip=12,
    step_between_clips=100,
    num_workers=num_cpu,
    train=True,
    transform=train_transforms,
    fold=self.fold,
)
dataloader = DataLoader(
    self.train_dataset,
    batch_size=self.batch_size,
    num_workers=num_cpu,
    collate_fn=custom_collate,
    shuffle=True,
    pin_memory=True,
)
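`custom_collate` is not shown in the snippet. For context, a common definition for UCF101, modeled on the collate function in torchvision's video classification reference script, drops the audio element from each `(video, audio, label)` sample before batching. This is an assumption about what the version used here does, not the poster's actual code:

```python
from torch.utils.data.dataloader import default_collate


def custom_collate(batch):
    """Batch UCF101 samples while discarding the audio tensor.

    Each sample from torchvision.datasets.UCF101 is (video, audio, label);
    this hypothetical version keeps only (video, label) and lets
    default_collate stack the videos and turn the labels into a tensor.
    """
    batch = [(video, label) for video, _audio, label in batch]
    return default_collate(batch)
```

Dropping audio before `default_collate` avoids trying to stack audio tensors, which can differ in length between clips.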
Training freezes every 50 epochs. During the freeze, only one CPU core is at 100% (out of 12 available), and nvidia-smi reports 0% GPU utilization.
After a while, training resumes on its own, and the same stall recurs every 50 epochs.
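One way to check whether the stall comes from the data pipeline rather than the training step is to iterate the DataLoader on its own, with no model or GPU work, and time how long each batch takes to arrive. A minimal sketch, assuming any iterable loader; `find_stalls` and its default 5-second threshold are hypothetical choices for illustration:

```python
import time
from typing import Iterable, List, Optional, Tuple


def find_stalls(
    loader: Iterable,
    threshold_s: float = 5.0,
    max_steps: Optional[int] = None,
) -> List[Tuple[int, float]]:
    """Iterate over `loader` without training and record slow batch fetches.

    Returns (step index, seconds waited) for every batch whose fetch took
    longer than `threshold_s`. Hypothetical diagnostic helper.
    """
    stalls: List[Tuple[int, float]] = []
    start = time.perf_counter()
    for step, _batch in enumerate(loader):
        # Time spent waiting for this batch to be produced.
        waited = time.perf_counter() - start
        if waited > threshold_s:
            stalls.append((step, waited))
        if max_steps is not None and step + 1 >= max_steps:
            break
        # Restart the clock just before requesting the next batch.
        start = time.perf_counter()
    return stalls
```

Calling `find_stalls(dataloader)` once per epoch in a loop over many epochs would show whether the long waits reproduce with no GPU work at all; if they do, the bottleneck is on the loading side.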