I am trying to build a face verification model using a Siamese network on the wikifaces dataset. Link to repo: https://github.com/sriharsha0806/Face-Verification
I built a custom dataset, and currently each epoch takes too long when running train.py. The full dataset contains 224,800 images, but I used only the wiki faces, which are 60,000 images in total. Even so, each epoch takes more than 1-2 hours to complete. Is there a bug in the code, or is there a way to speed it up?
Can you please post the output while training? Please post the timing for each epoch as well.
For reference, I ran the experiment on my device with the configuration below.
This is the output and the time taken by 100 iterations in each epoch. Each iteration has 4 examples.
Configuration of device: GTX970
batch_size = 4
number_of_workers = 4
Hello. Maybe the FC layers are taking up a lot of time. Remove the fully connected layers and replace them with an average pooling layer.
Hi, as @raghavendragaleppa mentioned, try using a fully convolutional network or a larger batch size (4 seems too low). But I doubt batch_size could be the bottleneck, given the size of the FC layers.
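To make the suggestion above concrete, here is a minimal sketch comparing a fully connected head with a fully convolutional one that ends in global average pooling. The layer sizes (8 channels, 32x32 feature map, 16-dimensional output) are made up for illustration and are not taken from the repo.

```python
import torch
import torch.nn as nn


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Hypothetical head: flatten an 8x32x32 feature map and pass it through FC layers.
fc_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 128),
    nn.ReLU(),
    nn.Linear(128, 16),
)

# Fully convolutional alternative: a 1x1 conv to the embedding size,
# then global average pooling over the spatial dimensions.
gap_head = nn.Sequential(
    nn.Conv2d(8, 16, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

features = torch.randn(4, 8, 32, 32)  # stand-in for the conv feature maps
print(fc_head(features).shape, count_params(fc_head))    # torch.Size([4, 16]) 1050768
print(gap_head(features).shape, count_params(gap_head))  # torch.Size([4, 16]) 144
```

Both heads produce the same output shape, but the pooled version has far fewer parameters. Whether this changes the epoch time noticeably depends on where the time is actually spent.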
Hi, I am running the main program with a batch size of 32.
I will try that, but I think the main problem is with the dataloader. I timed fetching one batch from a data iterator:
vis_dataloader = DataLoader(siamese_dataset,
from time import perf_counter
dataiter = iter(vis_dataloader)
t1 = perf_counter()
example_batch = next(dataiter)
t2 = perf_counter()
concatenated = torch.cat((example_batch,example_batch))
It takes 4 s on average. Each epoch has around 6000 iterations, so I think this is where most of the time goes. Anyway, I will try out your idea and let you know.
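A single fetch can be misleading because the first batch often includes one-off start-up cost, so it may help to time several consecutive batches. A minimal sketch: `time_batches` is a hypothetical helper that accepts any iterable (a DataLoader included), and `slow_loader` is only a stand-in so the snippet runs without the real dataset.

```python
from time import perf_counter, sleep


def time_batches(loader, n_batches=10):
    """Return the seconds spent waiting for each of the first n_batches batches."""
    timings = []
    iterator = iter(loader)
    for _ in range(n_batches):
        start = perf_counter()
        try:
            next(iterator)
        except StopIteration:
            break  # the loader had fewer batches than requested
        timings.append(perf_counter() - start)
    return timings


def slow_loader(n, delay=0.01):
    """Stand-in for a DataLoader: yields n dummy batches, each after `delay` seconds."""
    for i in range(n):
        sleep(delay)
        yield i


if __name__ == "__main__":
    timings = time_batches(slow_loader(5), n_batches=5)
    rest = timings[1:]
    print(f"first batch: {timings[0]:.3f}s, mean of the rest: {sum(rest) / len(rest):.3f}s")
```

If the first batch is slow but the rest are fast, the cost is mostly start-up. If every batch is slow, the dataset's loading code is the more likely culprit.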
The dataloader runs on the CPU, so num_workers=8 is a very high number if your CPU is not powerful enough; in that case it adds overhead instead of helping. Besides, num_workers doesn't help much when your batch_size is low and there is little augmentation going on. Check the timing per iteration with num_workers=1 and num_workers=0, and also check the timings with pin_memory=True.
Also, I can see that you have very little augmentation going on, so the problem must be in the dataloader itself. Reduce num_workers to 1 and see how long each iteration takes.
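The comparison suggested above could be sketched roughly as follows. `benchmark_loader` is a hypothetical helper, not part of the repo; it builds one DataLoader per `num_workers` value and reports the mean time per batch, worker start-up included.

```python
from time import perf_counter

import torch
from torch.utils.data import DataLoader, TensorDataset


def benchmark_loader(dataset, worker_counts=(0, 1, 4), batch_size=32,
                     n_batches=20, pin_memory=False):
    """Return {num_workers: mean seconds per batch} over the first n_batches batches."""
    results = {}
    for workers in worker_counts:
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                            num_workers=workers, pin_memory=pin_memory)
        seen = 0
        start = perf_counter()  # includes the time needed to start the workers
        for seen, _ in enumerate(loader, start=1):
            if seen >= n_batches:
                break
        results[workers] = (perf_counter() - start) / max(seen, 1)
    return results
```

For example, `benchmark_loader(siamese_dataset, pin_memory=True)` would time the real dataset with 0, 1 and 4 workers. On platforms that spawn worker processes (Windows, macOS), calls with `num_workers > 0` should sit under an `if __name__ == "__main__":` guard.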
Hi @raghavendragaleppa and @mailcorahul
Actually, I forgot to call net.train() before training. This is what was causing the lag. I apologise for the mistake.
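For anyone landing here later, this is a minimal sketch of where the `net.train()` and `net.eval()` calls usually go. The tiny network, loss and random data are stand-ins chosen for illustration, not the Siamese model from the repo.

```python
import torch
import torch.nn as nn

# Stand-in model with layers whose behaviour depends on the mode.
net = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4), nn.Dropout(p=0.5))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)


def train_one_epoch(net, batches):
    net.train()  # training mode: dropout active, batch norm uses batch statistics
    for inputs, targets in batches:
        optimizer.zero_grad()
        loss = criterion(net(inputs), targets)
        loss.backward()
        optimizer.step()


def evaluate(net, inputs):
    net.eval()  # evaluation mode: dropout off, batch norm uses running statistics
    with torch.no_grad():
        return net(inputs)


batches = [(torch.randn(4, 8), torch.randn(4, 4)) for _ in range(3)]
train_one_epoch(net, batches)
outputs = evaluate(net, torch.randn(2, 8))
```

Calling `net.train()` at the start of every epoch matters when validation runs in between, because `net.eval()` leaves the model in evaluation mode until it is switched back.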