I have the following code snippet.
```python
import time
import torch
import numpy as np
import matplotlib.pyplot as plt
from scripts.custom.functional.python.functional_operations import get_functional_operation

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Random float tensors for a 2D convolution: NCHW input and OIHW weights.
input = torch.randint(32, size=(128, 32, 64, 64), dtype=torch.float, device=device, requires_grad=True)
weights = torch.randint(32, size=(64, 32, 5, 5), dtype=torch.float, device=device, requires_grad=True)

n_rounds = 1024
duration = np.zeros(n_rounds)
conv2d = get_functional_operation('fft-custom')

# Time the same call n_rounds times, synchronizing the GPU before stopping the clock.
for i in range(n_rounds):
    start = time.time()
    output = conv2d(input, weights)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    duration[i] = time.time() - start
```
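The plotting itself is not included in the snippet; it is roughly just the following (the exact labels and styling may differ):

```python
# Plot the per-iteration execution time collected above.
plt.plot(duration)
plt.xlabel('iteration')
plt.ylabel('execution time (s)')
plt.show()
```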
I got assigned a Tesla V100-SXM2... GPU, and I am using the Torch version
I ran this code once and got the following plot of the measured execution times:
Then I reset my kernel and ran it again, which resulted in the following plot:
As part of my research I have to measure the execution time of the functions I am working with, so I am wondering what causes these oscillations in the execution time, even in an extremely controlled and impractical scenario like the one above, where the exact same function with the exact same parameters is evaluated over and over.
Could this be because I am using a virtual environment that does not guarantee dedicated resources?
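One thing I am also unsure about is whether timing with CUDA events instead of `time.time()` would change the picture. This is only a sketch of what I mean (I have not verified that it behaves differently here); it reuses `conv2d`, `input`, `weights`, `n_rounds`, and `duration` from the snippet above:

```python
# Hypothetical variant of the timing loop using CUDA events,
# which record timestamps on the GPU stream itself.
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)

for i in range(n_rounds):
    start_event.record()
    output = conv2d(input, weights)
    end_event.record()
    torch.cuda.synchronize()  # wait until both events have been recorded
    duration[i] = start_event.elapsed_time(end_event) / 1000.0  # elapsed_time returns ms
```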
By the way, how does the graph execution get optimized anyway? PyTorch is purely lazy-execution, right? I noticed that the first couple of iterations are usually slower, as if the execution were still being optimized or "warming up". So when measuring time I usually disregard the first 128 iterations to be safe (roughly as in the sketch below). Is there a fixed number of "warm-up" iterations?
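For reference, this is roughly how I drop the warm-up iterations before looking at the numbers (the cutoff of 128 is just my guess, hence the question):

```python
warmup = 128  # number of initial iterations I currently discard (arbitrary choice)
steady = duration[warmup:]

# Summary of the remaining measurements, to quantify the oscillations.
print(f"mean:   {steady.mean() * 1e3:.3f} ms")
print(f"std:    {steady.std() * 1e3:.3f} ms")
print(f"median: {np.median(steady) * 1e3:.3f} ms")
```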