I’m encountering an unexpected issue in my machine learning pipeline: after I load the model weights onto the GPU, a CPU-based preprocessing function slows down significantly, even though it relies solely on NumPy ndarray operations and is entirely independent of both the GPU and the model.
# super slow
def preprocess(arr):
    # ... NumPy-only operations on arr (details omitted)
    return arr

device = "cuda"
model = model.to(device)
ckpt = torch.load(saved_checkpoint_path, map_location=device)
model_state = ckpt['model']
model.load_state_dict(model_state)

src = []
for data in file_list:
    ndarray = read_data(data)
    output = preprocess(ndarray)
    src.append(output)
There is no transfer of the preprocessed data between the CPU and GPU. Strangely, running the preprocessing first and only then loading the model weights onto the GPU is much faster than loading the weights before the preprocessing step. This suggests that the act of loading model weights onto the GPU is somehow slowing down the CPU-based preprocessing, even though there should be no direct interaction between the two.
# super fast
src = []
for data in file_list:
    ndarray = read_data(data)
    output = preprocess(ndarray)
    src.append(output)

device = "cuda"
model = model.to(device)
ckpt = torch.load(saved_checkpoint_path, map_location=device)
model_state = ckpt['model']
model.load_state_dict(model_state)
I’ve ruled out data-transfer issues and confirmed that the preprocessing itself is not GPU-dependent. Has anyone encountered a similar situation? If so, what could be causing this unexpected slowdown, and how would you go about troubleshooting it? Any insights or suggestions would be appreciated.
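To narrow things down, here is a minimal, self-contained harness I can run in isolation (the dummy `preprocess` and array sizes are placeholders, not my real pipeline). It times the same NumPy-only work before and after CUDA context creation and reports the process’s CPU affinity mask and PyTorch’s intra-op thread count, on the assumption that a shrunken affinity mask or thread contention after CUDA initialization could explain a CPU-only slowdown:

```python
import os
import time

import numpy as np
import torch


def preprocess(arr):
    # Stand-in for the real NumPy-only preprocessing (placeholder workload)
    return np.fft.fft(arr).real @ arr.T


def time_preprocess(label, n=20):
    arr = np.random.rand(512, 512)
    start = time.perf_counter()
    for _ in range(n):
        preprocess(arr)
    elapsed = time.perf_counter() - start
    # os.sched_getaffinity is Linux-only; on other platforms this check
    # would need a different mechanism (e.g. psutil)
    n_cpus = len(os.sched_getaffinity(0))
    print(f"{label}: {elapsed:.3f}s, "
          f"affinity={n_cpus} CPUs, "
          f"torch intra-op threads={torch.get_num_threads()}")


time_preprocess("before CUDA init")
if torch.cuda.is_available():
    torch.ones(1, device="cuda")  # forces CUDA context creation
    time_preprocess("after CUDA init")
```

If the two timings differ while the affinity mask or thread count also changes between the prints, that would point at the CUDA runtime altering the CPU execution environment rather than at the preprocessing code itself.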