LeeDoYup
(Doyup Lee)
Thanks Tom.
I checked both time.perf_counter() and time.process_time() with torch.cuda.synchronize(), and got results similar to those from time.time():
iv) use time.perf_counter() w/ torch.cuda.synchronize()
- shuffle time: 0.0650 s
- inf time: 0.0587 s
v) use time.process_time() w/ torch.cuda.synchronize()
- shuffle time: 0.0879 s
- inf time: 0.0584 s
Comparing all the results, the inference time is consistent,
but the shuffle time varies with the profiling method.
The shuffle time here is for ShuffleBN in MoCo,
which gathers all samples, shuffles the indices, and redistributes the mini-batch to each GPU.
I cannot see why only the recorded time of this shuffle operation varies with the measurement method.
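One possible explanation (my assumption, not something the measurements above confirm): time.perf_counter() measures wall-clock time, while time.process_time() counts only the CPU time of the current process, so any phase that spends time blocked waiting (such as the cross-GPU gather/scatter in ShuffleBN) is accounted for differently by the two clocks, whereas a purely compute-bound phase like inference looks the same under both. A minimal stdlib sketch of the difference, with time.sleep standing in for a blocking wait:

```python
import time

def timed(clock):
    """Time a mixed blocking + compute workload with the given clock."""
    start = clock()
    time.sleep(0.2)          # stand-in for blocking on communication/GPU
    for _ in range(10**6):   # stand-in for actual CPU work
        pass
    return clock() - start

wall = timed(time.perf_counter)   # wall clock: includes the sleep
cpu = timed(time.process_time)    # CPU time: excludes the sleep
print(f"perf_counter: {wall:.3f} s, process_time: {cpu:.3f} s")
```

If ShuffleBN behaves like this (partly blocked, partly spinning on a busy-wait inside the communication backend), the two clocks would report different shuffle times while agreeing on the compute-only inference time.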