Hi,
I'm training a semantic segmentation model and finding that training is very slow. I'm using a custom dataloader, and my images are about 350×400 pixels. Each training instance takes about 1.3 s, and I'm on a Tesla P100 on Google Cloud.
I profiled my training code with time.time() and found that transferring the image data to the GPU dominates, taking almost 1 s.
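For reference, this is roughly how I measured the transfer. (I understand CUDA ops run asynchronously, so wrapping a single .cuda() call in time.time() can bill it for earlier queued work; calling torch.cuda.synchronize() before each timestamp should make the numbers trustworthy. The helper name is just mine.)

```python
import time

import torch


def timed_transfer(tensor):
    """Time a host-to-device copy, synchronizing first so that
    asynchronous CUDA work queued earlier is not billed to this call."""
    torch.cuda.synchronize()
    start = time.time()
    gpu_tensor = tensor.cuda()
    torch.cuda.synchronize()
    elapsed = time.time() - start
    return gpu_tensor, elapsed
```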
Can someone suggest ways to speed this up? Should I switch to PyTorch's DataLoader with pin_memory?
I've attached the training function below; the slow lines are marked.
def train(trainable_model, train_data, optimizer, epoch, criterion):
    total_train_data = len(train_data)
    batch_indices = np.array_split(np.random.permutation(total_train_data),
                                   math.ceil(total_train_data / settings.opt['batch_size']))
    trainable_model.train()
    total_train_loss = 0
    trainable_model = trainable_model.cuda()
    criterion = criterion.cuda()
    for batch_index, indices in enumerate(tqdm(batch_indices)):
        optimizer.zero_grad()
        for idx, index in enumerate(tqdm(indices)):
            rgb, mask, filename, humanImg = train_data[index]
            var_rgb = Variable(rgb.unsqueeze(0).float())
            var_mask = Variable(mask.float())
            var_mask = var_mask.cuda()  # <--- taking 5 ms
            var_rgb = var_rgb.cuda()    # <--- taking 1 sec
            output = trainable_model(var_rgb)
            loss = criterion(output, var_mask.unsqueeze(0).long()) / len(indices)
            total_train_loss += loss.data[0]
            loss.backward()
            del var_mask, var_rgb  # after every batch, drop references to the variables
        optimizer.step()
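For what it's worth, this is the kind of replacement I'm considering: a torch.utils.data.DataLoader with worker processes and pin_memory=True, plus non_blocking GPU transfers. This is only a sketch; the TensorDataset with random data stands in for my custom dataset, and the sizes are made up.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the custom segmentation dataset (rgb, mask pairs).
rgb = torch.randn(8, 3, 350, 400)
mask = torch.randint(0, 2, (8, 350, 400))
dataset = TensorDataset(rgb, mask)

# num_workers overlaps data loading with GPU compute; pin_memory puts
# batches in page-locked host memory so host-to-device copies can be async.
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    num_workers=2, pin_memory=True)

for batch_rgb, batch_mask in loader:
    if torch.cuda.is_available():
        # non_blocking=True overlaps the copy with compute; it only
        # helps when the source tensor is pinned, as it is here.
        batch_rgb = batch_rgb.cuda(non_blocking=True)
        batch_mask = batch_mask.cuda(non_blocking=True)
    # ... forward / loss / backward as in train() above ...
```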