Predicting on the same image 5 times in a loop: the first iteration is much slower than the others

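import time

import torch
from torch.autograd import Variable
from torchvision import transforms

# MyDataset is my own Dataset class; its definition is not shown here.
class model_predict():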
    def __init__(self, model_path, use_cuda):
        self.use_cuda = use_cuda  # torch.cuda.is_available()
        self.transform_train = transforms.Compose([
            transforms.Resize((128, 128)),
            transforms.ToTensor(),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
        ])
        nn = torch.load(model_path)
        self.net = nn['net']
        self.net.eval()
        if self.use_cuda:
            self.net.cuda()

    def test(self, img_path):
        #start1 = time.time()
        testset = MyDataset(txt=img_path, transform=self.transform_train)
        testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=0)
        for batch_idx, (inputs, targets) in enumerate(testloader):
            if self.use_cuda:
                inputs, targets = inputs.cuda(), targets.cuda()
            inputs, targets = Variable(inputs), Variable(targets)
            start1 = time.time()
            outputs = self.net(inputs)
            _, predicted = torch.max(outputs.data, 1)
            print("Test time used1:", (time.time() - start1) * 1000)
            return predicted[0]

model = model_predict('../model/model1.pt', torch.cuda.is_available())

array = [
    '../imges/1.jpg',
    '../imges/1.jpg',
    '../imges/1.jpg',
    '../imges/1.jpg',
    '../imges/1.jpg'
]

for img in array:
    model.test(img)

The resulting times (in ms):
Test time used1: 1501.086711883545
Test time used1: 23.436307907104492
Test time used1: 23.624181747436523
Test time used1: 23.0100154876709
Test time used1: 23.102998733520508

I do not know why the first iteration is so much slower than the others.
Any ideas?

CUDA calls are asynchronous, so if you would like to time them, you should call torch.cuda.synchronize() before starting and stopping the timer.
Besides that, the first loop might be slower, because the DataLoader has to load the initial batch which might take some time. The following batches might be already waiting in the queue.

On which line should we write this code, torch.cuda…synchronize?

Add them before starting and stopping the timers via:

torch.cuda.synchronize()
t0 = time.perf_counter()

or use torch.utils.benchmark, which will synchronize and add warmup iterations.
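
To make this concrete, here is a minimal sketch of both approaches. The helper names (sync_if_cuda, time_forward) and the small Conv2d layer are hypothetical stand-ins, not part of the original code; swap in your own network and input batch. The warm-up iterations are there so that one-time startup costs are not counted in the measurement.

```python
import time

import torch
from torch.utils.benchmark import Timer


def sync_if_cuda(device):
    # CUDA kernels are launched asynchronously, so wait for pending work
    # to finish before reading the clock. On CPU this is a no-op.
    if device.type == "cuda":
        torch.cuda.synchronize(device)


def time_forward(net, inputs, warmup=3, iters=10):
    """Return the mean forward-pass time in milliseconds (hypothetical helper)."""
    device = inputs.device
    net.eval()
    with torch.no_grad():
        # Warm-up runs are excluded from the measurement.
        for _ in range(warmup):
            net(inputs)
        sync_if_cuda(device)
        t0 = time.perf_counter()
        for _ in range(iters):
            net(inputs)
        sync_if_cuda(device)
        elapsed = time.perf_counter() - t0
    return elapsed * 1000 / iters


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Stand-in model and input; replace with your own network and batch.
net = torch.nn.Conv2d(3, 8, kernel_size=3).to(device)
inputs = torch.randn(1, 3, 128, 128, device=device)

mean_ms = time_forward(net, inputs)
print(f"manual timing: {mean_ms:.3f} ms per forward pass")

# Alternative: let torch.utils.benchmark handle the measurement.
timer = Timer(stmt="net(inputs)", globals={"net": net, "inputs": inputs})
measurement = timer.timeit(20)
print(f"benchmark Timer: {measurement.mean * 1000:.3f} ms per forward pass")
```

If the manual number and the Timer number roughly agree, the timing itself is probably trustworthy, and any remaining slowness in the first call is startup cost rather than steady-state inference time.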

Sir, my model's prediction takes 30 seconds. I want to decrease the prediction time. Any ideas?

Profile the code first and narrow down where the bottleneck is. Without any information about the workload and which part is slow we can only speculate.

I don't know how to profile the code. How do I do that?

You could use the PyTorch Profiler or e.g. Nsight Systems.
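
As a starting point, a minimal PyTorch Profiler run might look like the sketch below. The Conv2d layer and the random input are hypothetical stand-ins for your model and image batch; the rest uses the torch.profiler API.

```python
import torch
from torch.profiler import ProfilerActivity, profile

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Always record CPU activity; add CUDA only when a GPU is present.
activities = [ProfilerActivity.CPU]
if device.type == "cuda":
    activities.append(ProfilerActivity.CUDA)

# Stand-in model and input; replace with your own network and batch.
net = torch.nn.Conv2d(3, 8, kernel_size=3).to(device).eval()
inputs = torch.randn(1, 3, 128, 128, device=device)

with torch.no_grad():
    net(inputs)  # warm-up call, kept outside the profiled region
    with profile(activities=activities, record_shapes=True) as prof:
        net(inputs)

# Summarize the recorded operators, most expensive first.
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
print(table)
```

The printed table lists the operators that took the most time, which should help narrow down whether the bottleneck is in the model itself, in data loading, or somewhere else. To cover data loading and preprocessing as well, wrap that code in the same profile block.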

My image prediction time is slow.