GPU memory consumption and inference time are much higher for the first inference

I have noticed that GPU memory consumption and inference time are much higher for the first inference than for all subsequent inferences. I have set batch size = 1. For example, performing inference on 9 samples gives the inference times and memory consumption shown below. For memory consumption, I reset the peak memory stats at every iteration of the dataloader and read the max memory allocated after the inference to get the GPU memory consumption of that inference. For inference time, I simply use the time library and take the difference between the start and end times.

Here is the code snippet:


    import time

    import numpy as np
    import torch
    import torch.nn.functional as F
    from tqdm import tqdm

    time_consumption = []
    memory_consumption = []

    with torch.no_grad():
        for shape_index, file_path in enumerate(tqdm(dataset.file_paths, desc='eval', ncols=0)):
            # clear the cache and reset the peak-memory counter so each
            # iteration is measured independently
            torch.cuda.empty_cache()
            torch.cuda.reset_peak_memory_stats()

            data = np.loadtxt(file_path).astype(np.float32)
            start_time = time.time()

            # model inference
            inputs = torch.from_numpy(data).float().to(configs.device)
            vote_confidences = F.softmax(model(inputs), dim=1)

            # per-iteration wall-clock time (s) and peak GPU memory (bytes)
            time_consumption.append(time.time() - start_time)
            memory_consumption.append(torch.cuda.max_memory_allocated())

Here are the resulting inference times (in seconds) and memory consumption (in bytes), one value per sample:

memory_consumption: [10448984064, 1206454272, 1206454272, 1206454272, 1206454272, 1206454272, 1206454272, 1206454272, 1206454272]

time_consumption: [4.7059266567230225, 0.8842918872833252, 0.9028427600860596, 0.8580131530761719, 0.862271785736084, 0.8550527095794678, 0.8648409843444824, 0.8601820468902588, 0.8670477867126465]
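
I am also not sure whether my timing is fully accurate for GPU work, since I never synchronize the device around the timed region. The variant I have in mind looks roughly like this (just a sketch reusing the same model, configs, and data loading as above; I have not benchmarked it yet):

    with torch.no_grad():
        for file_path in dataset.file_paths:
            torch.cuda.empty_cache()
            torch.cuda.reset_peak_memory_stats()

            data = np.loadtxt(file_path).astype(np.float32)
            inputs = torch.from_numpy(data).to(configs.device)

            torch.cuda.synchronize()   # make sure previous GPU work has finished
            start_time = time.time()
            vote_confidences = F.softmax(model(inputs), dim=1)
            torch.cuda.synchronize()   # wait for the inference kernels to complete

            time_consumption.append(time.time() - start_time)
            memory_consumption.append(torch.cuda.max_memory_allocated())

The extra synchronize() calls would only affect the measurement, not the inference itself.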

Is there any specific reason for this observation?

Many thanks