Here I make bad_variable_used_across_loop an attribute of Foo only to record its value for later use. But this variable keeps the gradient graph alive across batches!
To solve this, detach it in place with model.bad_variable_used_across_loop.detach_() at the end of each training iteration (a plain .detach() returns a new tensor and would have to be assigned back).
import time
import torch

model = Foo()  # Foo is the model class described above
for step in range(100000):
    start = time.time()
    x = torch.randn([10, 2])
    loss = model(x).sum()
    loss.backward()
    end = time.time()
    # detach in place so the graph is not kept across iterations
    # (a plain .detach() returns a new tensor and would be a no-op here)
    model.bad_variable_used_across_loop.detach_()
    print(f'step {step:05d}: {end-start:.2f}s')
Hi, a similar issue occurs for me while training, but in my case, after I stop the process, load the latest checkpoint, and continue training, the speed goes back to normal. I know restarting the process works around it, but I am wondering why this happens and how I can solve it properly. I am using a 2080 Ti, and this happens both when training YOLO and my customized model.
Thank you in advance.
Hey, could you please explain the usage of .detach() a bit more in the case of accumulating the loss? Sorry if my question is too basic, I'm still new to PyTorch.
epoch_loss = 0
n_train = len(train_loader)
with tqdm(total=n_train, desc=f'Epoch {epoch + 1}/{epochs}', unit='img') as pbar:
    for batch in train_loader:
        net.train()
        imgs = batch['image']
        true_masks = batch['labels']
        imgs = imgs.to(device=device, dtype=torch.float32)
        mask_type = torch.float32 if net.n_classes == 1 else torch.long
        true_masks = true_masks.to(device=device, dtype=mask_type)
        logits, probs, masks_pred = net(imgs)
        logits = torch.squeeze(logits, 1)
        loss = criterion(logits, true_masks)
        epoch_loss += loss.item() / n_train  # .item() returns a plain Python float, detached from the graph
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # to be continued
If I understood it correctly, in this case epoch_loss = 0 prevents the loss from being carried over from one iteration to the next, and this is what .detach() refers to?
Thank you.
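As a side note on the snippet above: loss.item() already returns a plain Python float detached from the graph, so accumulating it is fine; the slowdown appears when the loss tensor itself is accumulated. Here is a minimal sketch with a toy model (not the code above) contrasting the two patterns:

import torch
import torch.nn as nn

net = nn.Linear(4, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

loss_tensor_sum = 0.0  # accumulating the loss tensor keeps every iteration's graph referenced
loss_float_sum = 0.0   # accumulating loss.item() stores only a Python float

for step in range(5):
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    loss = criterion(net(x), y)
    loss_tensor_sum = loss_tensor_sum + loss  # graph reference kept; grows every step
    loss_float_sum += loss.item()             # no graph kept (loss.detach() would also work)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(type(loss_tensor_sum), loss_tensor_sum.requires_grad)  # <class 'torch.Tensor'> True
print(type(loss_float_sum))                                  # <class 'float'>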
Hey, I am facing the same problem, where the time each batch takes keeps growing within an epoch. I don't quite understand how you solved this problem?
Just ran into this issue in some code where ~50k minibatches are processed per training epoch.
Processing an epoch starts at ~13.6 it/s and ends up at ~6 it/s.
After some digging, it turns out the slowdown is caused by an operation executed on every minibatch:
stats = stats + new_list
new_list holds 32 values. Unfortunately, this is a very expensive operation, since it creates a brand-new list for stats and copies all the existing elements into it, so the cost grows with the size of the list. stats is reset every epoch, but it still grows quickly over the minibatches; toward the end of an epoch, stats holds about 1.6 million elements.
Adding new elements should be done directly on the existing list stats, using an in-place operation such as one of the two sketched below:
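stats += new_list        # in-place concatenation; extends stats without copying it
stats.extend(new_list)   # the "second method": appends the 32 new values to the existing list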
These operations won't create a new list.
Using the second method, stats.extend(new_list), keeps the processing time per minibatch at the same level, ~13.6 it/s, for the entire epoch.
Note: all values are detached and on the CPU.
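To see the effect in isolation, here is a small self-contained timing sketch (not the original training code; it is scaled down to 5,000 batches, and the exact numbers will vary by machine):

import time

def accumulate(in_place, n_batches=5_000, batch_size=32):
    # Scaled-down stand-in for the training loop: one "stats" update per minibatch.
    stats = []
    new_list = list(range(batch_size))
    start = time.time()
    for _ in range(n_batches):
        if in_place:
            stats.extend(new_list)    # amortized O(len(new_list)) per call
        else:
            stats = stats + new_list  # copies all of stats on every call: O(len(stats))
    return time.time() - start

print(f'stats = stats + new_list : {accumulate(False):.2f}s')
print(f'stats.extend(new_list)   : {accumulate(True):.2f}s')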