Hi,
I'm training a segmentation model, essentially `torchvision.models.segmentation.deeplabv3_resnet50`.
One epoch takes about 45 minutes, which seems too slow to me. I'd appreciate help spotting any redundancy or mistakes in my code, and any other suggestions for speeding up each epoch are welcome.
- Image shape: 512x512
- Training set size: 20k images
- Batch size: 8
- GPU: NVIDIA Tesla V100
- Memory-Usage: 11805MiB / 16160MiB
- GPU-Util: 85%-100%
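To put these numbers in perspective, here is a quick back-of-envelope sketch of the per-iteration cost. It assumes the ~45 minutes covers only the 20k training images; the post doesn't say whether the test phase is included, and if it is, the actual training throughput is higher. `epoch_throughput` is just an illustrative helper.

```python
import math


def epoch_throughput(num_images: int, batch_size: int, epoch_seconds: float):
    """Return (iterations per epoch, seconds per iteration, images per second)."""
    iterations = math.ceil(num_images / batch_size)
    sec_per_iter = epoch_seconds / iterations
    images_per_sec = num_images / epoch_seconds
    return iterations, sec_per_iter, images_per_sec


# Numbers from the post; assumes 45 min is training-phase time only.
iters, s_per_iter, img_per_s = epoch_throughput(20_000, 8, 45 * 60)
print(f"{iters} iterations, {s_per_iter:.2f} s/iteration, {img_per_s:.1f} images/s")
# prints "2500 iterations, 1.08 s/iteration, 7.4 images/s"
```

Roughly one second per batch of eight 512x512 images is the figure to compare against a measured per-batch GPU time. If the forward/backward pass alone turns out to be much faster than that, the gap may come from data loading or other per-batch overhead.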
My code mostly follows this tutorial:
```python
import torch
import torch.optim as optim
from torchvision import models

# device, criterion, num_epochs and dataloader_dict are defined elsewhere (as in the tutorial)
model = models.segmentation.deeplabv3_resnet50(pretrained=True, progress=False, num_classes=10)
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=0.0001)
lr_scheduler = optim.lr_scheduler.ExponentialLR(optimizer=optimizer, gamma=0.8)

for epoch in range(num_epochs):
    for phase in ['train', 'test']:
        if phase == 'train':
            model.train()
        else:
            model.eval()
        for batch in dataloader_dict[phase]:
            image, label = batch['image'].to(device), batch['label'].to(device)
            optimizer.zero_grad()
            # forward pass; gradients are tracked only in the training phase
            with torch.set_grad_enabled(phase == 'train'):
                output = model(image)['out']
                loss = criterion(output, label)
                prediction = output.argmax(dim=1)  # not used further in this excerpt
                if phase == 'train':
                    loss.backward()
                    optimizer.step()
        # decay the learning rate once per epoch, after the training phase
        if phase == 'train':
            lr_scheduler.step()
```
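Since GPU-Util is already reported at 85-100%, one way to see where each ~1 s iteration actually goes is to time "waiting for the next batch" separately from "running the step". Below is a minimal, framework-agnostic sketch; `profile_loop` and `train_step` are hypothetical names, not part of PyTorch. CUDA kernels launch asynchronously, so without a synchronize call after each step, GPU work would partly be billed to the next data fetch instead of to the step.

```python
import time
from typing import Callable, Iterable, Optional


def profile_loop(
    batches: Iterable,
    step: Callable[[object], None],
    sync: Optional[Callable[[], None]] = None,
    max_batches: Optional[int] = None,
) -> dict:
    """Split wall-clock time into time spent fetching batches and time in `step`.

    sync: optional barrier called after each step so asynchronous work is counted
    in step time (with PyTorch on a GPU, pass torch.cuda.synchronize).
    """
    data_time = 0.0
    step_time = 0.0
    n = 0
    it = iter(batches)
    while max_batches is None or n < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step(batch)  # e.g. transfer + forward + backward + optimizer step
        if sync is not None:
            sync()
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        n += 1
    return {"batches": n, "data_s": data_time, "step_s": step_time}


# Hypothetical usage with the loop above, where train_step would wrap the
# .to(device) / forward / backward / optimizer.step() lines for one batch:
# stats = profile_loop(dataloader_dict['train'], train_step,
#                      sync=torch.cuda.synchronize, max_batches=200)
```

If `data_s` dominates, the DataLoader side is the bottleneck; if `step_s` dominates, the time is going into the model itself. The first batch also includes loader start-up cost, so a few hundred batches give a more representative split than one or two.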