torch.isnan/torch.isinf take a long time in the first iteration step

The part of the code that causes the long running time is listed here: [Bug part]

The code calculates the loss for each individual sample in the batch so that a NaN or inf loss from a single sample can be detected.

loss_dsr_c and loss_dsr_mc have the same shape, [batch_size].
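
For readers following along, here is a minimal sketch of such a per-sample check. It assumes both losses are 1-D floating-point tensors of shape [batch_size]. The helper name `zero_non_finite` is hypothetical and this is not the original code. It zeroes the affected entries, as the commented-out assignments in the profiling snippet suggest was intended. It relies on `torch.isfinite`, which is False for NaN and for positive or negative infinity, so the whole batch is checked without a Python loop.

```python
import torch


def zero_non_finite(loss_dsr_c, loss_dsr_mc):
    """Zero both losses for every sample where either one is NaN or inf.

    Hypothetical helper for illustration; returns new tensors plus the mask.
    """
    # True where at least one of the two per-sample losses is NaN or +/-inf.
    bad = ~(torch.isfinite(loss_dsr_c) & torch.isfinite(loss_dsr_mc))
    clean_c = torch.where(bad, torch.zeros_like(loss_dsr_c), loss_dsr_c)
    clean_mc = torch.where(bad, torch.zeros_like(loss_dsr_mc), loss_dsr_mc)
    return clean_c, clean_mc, bad


# Example batch of three samples: sample 1 has a NaN, sample 2 has an inf.
c = torch.tensor([0.5, float("nan"), 1.0])
mc = torch.tensor([0.25, 0.75, float("inf")])
c_clean, mc_clean, bad = zero_non_finite(c, mc)
```

The returned mask `bad` can be used to log the affected sample names in one place instead of inside a per-sample loop.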

I profiled the code as shown below: [Profiling part]

start = time.time()
if torch.isnan(loss_dsr_c[idx]) or torch.isnan(loss_dsr_mc[idx]) or \
   torch.isinf(loss_dsr_c[idx]) or torch.isinf(loss_dsr_mc[idx]):
    pass
    # imgs, imgname, grphs = gt_batch['img'], gt_batch['imgname'], gt_batch['grph']
    # debug_rend_out(imgs, grphs, cur_rend_out, cur_dsr_mc_img_label, \
    #                cur_dsr_mc_dist_mat, idx)
    # logger.warning(f'loss is nan for {imgname[idx]}')
    # logger.warning(f'current_rend - {torch.unique(cur_rend_out)}')
    # logger.warning(f'Rend_DSR_C - {torch.unique(rend_dsr_c)}')
    # logger.warning(f'Rend_DSR_MC - {torch.unique(rend_dsr_mc)}')
    # loss_dsr_c[idx] = 0.
    # loss_dsr_mc[idx] = 0.
logger.info(f"Step {idx}: " + str(time.time()-start) + "s")

Here is the corresponding console output.

2022-09-09 14:24:05.891 | INFO     | dsr.losses.losses:sr_losses:467 - Step 1: 1.0544300079345703s
2022-09-09 14:24:05.892 | INFO     | dsr.losses.losses:sr_losses:467 - Step 2: 6.937980651855469e-05s
2022-09-09 14:24:05.892 | INFO     | dsr.losses.losses:sr_losses:467 - Step 3: 6.127357482910156e-05s
2022-09-09 14:24:05.893 | INFO     | dsr.losses.losses:sr_losses:467 - Step 4: 0.00018215179443359375s
2022-09-09 14:24:05.893 | INFO     | dsr.losses.losses:sr_losses:467 - Step 5: 5.7697296142578125e-05s
2022-09-09 14:24:05.893 | INFO     | dsr.losses.losses:sr_losses:467 - Step 6: 7.033348083496094e-05s
2022-09-09 14:24:05.894 | INFO     | dsr.losses.losses:sr_losses:467 - Step 7: 7.462501525878906e-05s
2022-09-09 14:24:05.894 | INFO     | dsr.losses.losses:sr_losses:467 - Step 8: 7.319450378417969e-05s
..................................................

Strangely, only the first iteration step takes much longer (about 1.05 s versus roughly 6e-05 s for the later steps), even though every step runs the same code (i.e. torch.isnan and torch.isinf).
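
One thing that may be worth ruling out, assuming the loss tensors live on a CUDA device (the post does not say): CUDA kernels are launched asynchronously, and using a GPU tensor in a Python `if` copies the result to the host, which waits for previously queued work to finish. If that is the case here, the first timed step could include earlier computation rather than the cost of torch.isnan/torch.isinf themselves. Below is a minimal sketch that synchronizes before reading the clock. `timed_check` is a hypothetical helper, and on CPU tensors it simply times the check.

```python
import time

import torch


def timed_check(loss_c, loss_mc, idx):
    """Time the NaN/inf check for one sample, excluding pending GPU work.

    Hypothetical helper; returns (is_bad, elapsed_seconds).
    """
    if loss_c.is_cuda:
        # Wait for kernels queued before this point so they are not timed.
        torch.cuda.synchronize(loss_c.device)
    start = time.perf_counter()
    bad = bool(torch.isnan(loss_c[idx]) or torch.isnan(loss_mc[idx]) or
               torch.isinf(loss_c[idx]) or torch.isinf(loss_mc[idx]))
    if loss_c.is_cuda:
        # Make sure the check itself has finished before stopping the timer.
        torch.cuda.synchronize(loss_c.device)
    return bad, time.perf_counter() - start


# Example on CPU tensors: sample 1 contains a NaN.
c = torch.tensor([0.5, float("nan")])
mc = torch.tensor([0.25, 0.75])
bad0, t0 = timed_check(c, mc, 0)
bad1, t1 = timed_check(c, mc, 1)
```

If the first step is still slow after synchronizing, the cause lies elsewhere and this sketch does not explain it.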