Strange time cost when benchmarking essentially the same code

I have two modules with the following logic:

import torch
import torch.nn as nn

class Model1(nn.Module):
	def __init__(self, ...):
		super().__init__()

	def forward(self, x):
		return some_operation_on(x)

class Model2(nn.Module):
	def __init__(self, ...):
		super().__init__()  # must run before a submodule is assigned
		self.m1 = Model1(...)

	def forward(self, x):
		output = self.m1(x)
		return output

When I use Model1 and Model2 to measure CPU inference time, Model2 takes about one time longer than Model1 (that is, roughly twice as long), even though its forward pass only calls Model1. What causes this difference?
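For reference, here is a minimal sketch of the kind of timing harness such a comparison might use. The post does not show the actual measurement code, so everything here is an assumption: `benchmark`, `inner`, and `wrapper` are hypothetical names, and the two plain functions only stand in for Model1 and Model2 (one callable that does the work, one that merely forwards to it). The sketch uses only the standard library so it runs as-is.

```python
import statistics
import time


def benchmark(fn, x, warmup=10, iters=100):
    """Return the median wall-clock time in seconds of one call to fn(x)."""
    # Warm-up calls are not timed, so one-time costs (lazy initialization,
    # cache warm-up) are not charged to whichever callable runs first.
    for _ in range(warmup):
        fn(x)
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(x)
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to occasional slow runs than the mean.
    return statistics.median(timings)


# Hypothetical stand-in for Model1: does the actual work.
def inner(x):
    return sum(v * v for v in x)


# Hypothetical stand-in for Model2: only forwards to the inner callable.
def wrapper(x):
    return inner(x)


if __name__ == "__main__":
    data = list(range(10_000))
    t_inner = benchmark(inner, data)
    t_wrapper = benchmark(wrapper, data)
    print(f"inner:   {t_inner * 1e6:.1f} us")
    print(f"wrapper: {t_wrapper * 1e6:.1f} us")
```

With the real modules, the same function could be called as `benchmark(model1, x)` and `benchmark(model2, x)`, presumably after `model.eval()` and inside a `torch.no_grad()` block, with both models built from the same arguments and fed the same input. If the gap disappears once warm-up runs are excluded and the measurement order is swapped, the difference likely came from the measurement rather than from the wrapper.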