I am trying to train a single model using multiple data loaders.
The training loop I came up with is below.
for epoch in epochs:
    # One batch per step from each of the three loaders; zip stops at the shortest loader.
    train_loader = zip(train_loader_list[0], train_loader_list[1], train_loader_list[2])
    for batch_idx, sample in enumerate(train_loader):
        # Alternative way of fetching batches that I also considered (commented out):
        # for batch_idx in range(len(train_loader)):
        #     sample = batch_queue.get()
        #     batch_queue.task_done()
        data_time.update(time.time() - end)  # data_time, losses, end are meters/timers defined elsewhere
        for i in range(len(sample)):
            input = sample[i]['image'].cuda()
            target = sample[i]['label'].cuda()
            target_exist = sample[i]['exist'].cuda()
            output, output_exist = model(input)  # output_mid
            loss_seg = criterion(torch.nn.functional.log_softmax(output, dim=1), target)
            loss_exist = criterion_exist(output_exist, target_exist)
            loss = loss_seg + loss_exist * 0.1
            loss_avg += loss  # accumulate the per-loader loss
        loss_avg = loss_avg / len(sample)  # average over the three loaders
        losses.update(loss_avg.data.item(), input.size(0))
        optimizer.zero_grad()
        loss_avg.backward()
        optimizer.step()
What I want is to average the losses from the three data loaders and call backward() once on the averaged loss.
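To make the intent concrete, here is a minimal, self-contained sketch of the pattern I am aiming for, with a toy linear model and random tensors standing in for my real model and loaders (no DDP here):

# Toy illustration of the pattern only; the model, data, and loss are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 2).cuda()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Three toy loaders standing in for train_loader_list.
loaders = [
    DataLoader(TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))), batch_size=8)
    for _ in range(3)
]

for sample in zip(*loaders):           # one batch from each loader per step
    loss_avg = 0.0
    for data, target in sample:        # one forward pass per loader, accumulate the loss
        output = model(data.cuda())
        loss_avg += criterion(output, target.cuda())
    loss_avg = loss_avg / len(sample)  # average over the three loaders

    optimizer.zero_grad()
    loss_avg.backward()                # single backward on the averaged loss
    optimizer.step()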
The model is wrapped with DistributedDataParallel.
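For context, the wrapping is roughly like this (a sketch only; the exact setup code is not shown above, and the nccl backend plus LOCAL_RANK from a torchrun launch are my assumptions):

# Rough sketch of the DDP setup; details here are assumptions, not my exact code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group(backend='nccl')
local_rank = int(os.environ['LOCAL_RANK'])  # set by torchrun per process
torch.cuda.set_device(local_rank)
model = DDP(model.cuda(), device_ids=[local_rank])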
When I train with this loop, I get the following error:

"RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [32]] is at version 5; expected version 4 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)."

I don't know which part of the code is the problem.
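Following the hint in the message, I plan to turn on anomaly detection at the top of the script so PyTorch reports the operation that breaks the backward pass:

import torch

# Enable autograd anomaly detection (slower, but points at the failing op),
# as suggested by the error's hint.
torch.autograd.set_detect_anomaly(True)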
Please help me.
Thank you