Making torch load faster

Is there any way of making “import torch” run faster? I get that PyTorch is huge so loading all its code must take a few seconds. However, many times I’m just testing things out and only need a fraction of what PyTorch has to offer. Is it possible to load PyTorch incrementally or something to reduce the very annoying 5-10 second “lag” import torch causes?
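You can defer the cost with a lazy import. Python's `importlib` docs include a `LazyLoader` recipe that returns a module object immediately and only executes the module body on first attribute access. A minimal sketch (using `json` as a stand-in so it runs anywhere; the same pattern applies to `"torch"`, though torch's shared-library loading still happens on first real use):

```python
import importlib.util
import sys

def lazy_import(name):
    """Return a module whose body only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; body not executed yet
    return module

# "json" is a stand-in here; lazy_import("torch") works the same way.
json = lazy_import("json")       # returns almost instantly
print(json.dumps({"ok": True}))  # the module body actually executes here
```

The trade-off is that any import error surfaces at first use instead of at the import statement, which can make failures harder to trace.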

5-10 seconds sounds way too long; a quick check shows my env imports torch in under 1 second:

%time import torch
CPU times: user 759 ms, sys: 70.2 ms, total: 829 ms
Wall time: 850 ms

(I'm not sure how accurate this way of timing is, but it matches a naive manual timing.)

Here are my stats:

In [1]: %time import torch
CPU times: user 2.74 s, sys: 139 ms, total: 2.88 s
Wall time: 2.09 s

But the real culprit is actually another import, which for some reason loads TensorFlow:

In [1]: %time from torch.utils.tensorboard import SummaryWriter
2024-10-01 08:59:52.196854: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-10-01 08:59:52.208956: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-01 08:59:52.222618: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-01 08:59:52.226873: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-01 08:59:52.236922: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-01 08:59:52.933249: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
CPU times: user 3.25 s, sys: 652 ms, total: 3.91 s
Wall time: 3.36 s

I’m running PyTorch, so why would I need TensorFlow? :stuck_out_tongue:
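To pin down exactly where the import time goes (and whether TensorFlow really is pulled in by the tensorboard import), CPython's `-X importtime` flag prints a per-module timing breakdown to stderr. A sketch using `import json` as a stand-in so it runs anywhere; swap in `from torch.utils.tensorboard import SummaryWriter` to check for `tensorflow` in the output:

```shell
# Each stderr line shows self time and cumulative time (microseconds)
# per imported module; redirect it to a file for inspection.
python3 -X importtime -c "import json" 2> importprof.txt

# Field 2 is cumulative time; sort the heaviest imports to the top.
sort -t '|' -k 2 -rn importprof.txt | head -5
```

If `tensorflow` appears near the top of the sorted list, that confirms where the seconds are going.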

Totally different approach, but would using a Jupyter Notebook for development work for you?

I use VS Code as my main development environment, but I switch to Jupyter when I want to iterate rapidly.

I find the import/module caching helps significantly here: I keep all my base classes and imports in a top cell, which I don’t need to re-run, and then I’m free to play with model params or new classes in fresh cells.

I know VS Code supports notebooks, but I prefer the browser environment for some reason; I find it a bit more intuitive having the two completely separated.
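The caching that makes this workflow fast is just Python's `sys.modules`: a module body executes once per interpreter process, and every later `import` of the same name is a dictionary lookup. A minimal illustration (timings are indicative only; `decimal` is an arbitrary stand-in module):

```python
import sys
import time

t0 = time.perf_counter()
import decimal            # first import: module body may need to execute
first = time.perf_counter() - t0

t0 = time.perf_counter()
import decimal            # already in sys.modules: just a cache lookup
cached = time.perf_counter() - t0

assert "decimal" in sys.modules
print(f"first import: {first:.6f}s, cached import: {cached:.6f}s")
```

This is why a notebook kernel only pays the `import torch` cost once per session, no matter how many cells re-import it.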

You don’t, and I don’t even have it installed:

%time import torch
CPU times: user 813 ms, sys: 56.9 ms, total: 870 ms
Wall time: 896 ms

%time from torch.utils.tensorboard import SummaryWriter
CPU times: user 87.7 ms, sys: 4.02 ms, total: 91.7 ms
Wall time: 91.4 ms

import tensorflow
Traceback (most recent call last):

  Cell In[3], line 1
    import tensorflow

ModuleNotFoundError: No module named 'tensorflow'

I removed TensorFlow, but importing PyTorch is still slow:

In [1]: %time import torch
CPU times: user 2.66 s, sys: 765 ms, total: 3.42 s
Wall time: 2.92 s