Operation execution error

  1. I want to measure the execution time of the "torch.ops.aten.embedding.default" operation. My code:
primals_2 = torch.rand(2, 768, dtype=torch.float32)
primals_205 = torch.randint(1, 5000, [1, 62], dtype=torch.int64)
embedding_1 = torch.ops.aten.embedding.default(primals_2, primals_205)

Here, primals_2 and primals_205 are two random tensors. I got the shapes of these two tensors from the FX graph. While executing the code snippet, I am getting the following error:

Traceback (most recent call last):
File "operation_test.py", line 36, in <module>
embedding_1 = torch.ops.aten.embedding.default(primals_2, primals_205)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/_ops.py", line 513, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index out of range in self

  2. For the "torch.ops.prims._unsafe_index_put_.default" operation execution:
slice_4 = torch.randint(1, 5000, [1, 62], dtype=torch.int64)
full_default_3 = torch.rand(512, 768, dtype=torch.float32)
_unsafe_index_put = torch.ops.prims._unsafe_index_put_.default(full_default_3, slice_4, where, True)

I am getting the following error:
Traceback (most recent call last):
File "operation_test.py", line 38, in <module>
_unsafe_index_put = torch.ops.prims._unsafe_index_put_.default(full_default_3, slice_4, where, True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/_ops.py", line 822, in __getattr__
raise AttributeError(
AttributeError: '_OpNamespace' 'prims' object has no attribute '_unsafe_index_put_'

  3. For the "torch.ops.aten.index_put_.default" operation:
full_default_16 = torch.randint(1, 5000, [1], dtype=torch.int64)
remainder = torch.randint(1, 5000, [1], dtype=torch.int64)
full_default_17 = torch.rand(1, 54, 2, dtype=torch.float32)
tangents_1 = torch.rand(1, 2, dtype=torch.float32)
index_put = torch.ops.aten.index_put_.default(full_default_17, [full_default_16, remainder], tangents_1, True)

Traceback (most recent call last):
File "operation_test.py", line 40, in <module>
index_put = torch.ops.aten.index_put_.default(full_default_17, [full_default_16, remainder], tangents_1, True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/miniconda3/envs/pytorch/lib/python3.11/site-packages/torch/_ops.py", line 513, in __call__
return self._op(*args, **(kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index 4451 is out of bounds for dimension 0 with size 1

I am getting this operation list for a BERT model. Any help would be appreciated.

In the first example it seems you are trying to index a weight tensor containing only two embedding rows with an index containing values in [1, 4999], so the indexing error is expected. I would suggest checking the same for 3.
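
Something like this should work for 1 and 3. It is only a rough sketch: the index ranges are assumptions derived from the tensor shapes you posted, and torch.utils.benchmark is just one way to time the op.

import torch
from torch.utils import benchmark

# 1) aten.embedding: the weight has only 2 rows, so valid indices are 0 and 1
primals_2 = torch.rand(2, 768, dtype=torch.float32)
primals_205 = torch.randint(0, 2, [1, 62], dtype=torch.int64)
embedding_1 = torch.ops.aten.embedding.default(primals_2, primals_205)

# 3) aten.index_put_: the indices must stay inside the target shape (1, 54, 2)
full_default_17 = torch.rand(1, 54, 2, dtype=torch.float32)
full_default_16 = torch.randint(0, 1, [1], dtype=torch.int64)   # dim 0 has size 1
remainder = torch.randint(0, 54, [1], dtype=torch.int64)        # dim 1 has size 54
tangents_1 = torch.rand(1, 2, dtype=torch.float32)
index_put = torch.ops.aten.index_put_.default(
    full_default_17, [full_default_16, remainder], tangents_1, True
)

# time the embedding op with the now-valid inputs
timer = benchmark.Timer(
    stmt="torch.ops.aten.embedding.default(w, idx)",
    globals={"torch": torch, "w": primals_2, "idx": primals_205},
)
print(timer.timeit(1000))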

  1. Are you suggesting checking the source code of the "torch.ops.aten.embedding.default" and "torch.ops.aten.index_put_.default" operations? How do I find the source code for these two operations in the PyTorch repository?

  2. When I am training a model using PyTorch, each iteration generates tensors for inputs, weights, biases, activations, and gradients. Is there any way to track the creation of these tensors in the PyTorch source code? Or in which class (in the PyTorch source code) do these tensors get created?

@ptrblck For 1 and 3, the index-out-of-range errors, I figured out the problem. As you said in your answer, my index tensors were pointing to the wrong indices. For 2, the "torch.ops.prims._unsafe_index_put_.default" operation is giving the '_OpNamespace' 'prims' object has no attribute '_unsafe_index_put_' error. Can you suggest something for this one?

I’m unsure how the 2nd use case is created and thus how the error is seen. Could you share more information about it or post a minimal and executable code snippet reproducing it?

@ptrblck

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").to("cuda")
torch._dynamo.reset()
model_compiled = torch.compile(
        model,
        options={
            "trace.enabled": True,
        },
)

labels = torch.tensor([1], dtype=torch.int64).to("cuda")
input_ids = torch.tensor([[  101,  2007, 18856,  8486,  7629,  1005,  1055,  6689,  1010,  8040,
         22658,  1011, 20228, 10593,  1005,  1055,  2190,  1011,  4855,  3688,
          2085,  2024,  2048,  5850,  2109,  2362,  2000,  7438, 28389,  1039,
          1010,  1996,  3424, 24093,  2140, 17357, 19395, 18891,  6657,  1998,
          2019,  6970,  7512,  2239,  4200,  2170, 25039,  1011, 17174,  2078,
          1012,   102,  2007, 18856,  8486,  7629,  1005,  1055,  6689,  1010,
          8040, 22658,  1011, 20228, 10593,  1005,  1055,  2190,  1011,  4855,
          3688,  2024,  2085,  3424, 24093,  2140,  4319, 19395, 18891,  6657,
          1998,  2019,  6970,  7512,  2239,  4200,  2170, 25039,  1011, 17174,
          2078,  1011,  1011,  2048,  5850,  2109,  2362,  2000,  7438, 28389,
          1039,  1012,   102]], dtype=torch.int64).to("cuda")
token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1]], dtype=torch.int64).to("cuda")
attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1]], dtype=torch.int64).to("cuda")


learning_rate = 0.001
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model_compiled.parameters(), lr=learning_rate)

def optimizer_step_fn(optimizer):
    def f():
        optimizer.step()
    return torch.compile(
        f,
        options={
            "trace.enabled": True,
        },
    )

optimizer_step = optimizer_step_fn(optimizer)
optimizer.zero_grad()

outputs = model_compiled(input_ids=input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)
loss = criterion(outputs.logits, labels)
loss.backward()
optimizer_step()

This is my code, which generates the FX graphs. The "torch.ops.prims._unsafe_index_put_.default" operation name is generated inside the backward FX graph.
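
One way to print the forward and backward ATen graphs directly, instead of reading the debug trace, is a custom backend. This is only a sketch, assuming the aot_autograd wrapper from torch._dynamo.backends.common; model here is the BERT model from the snippet above.

import torch
from torch._dynamo.backends.common import aot_autograd

def print_graph(gm, example_inputs):
    # print each node of the ATen-level graph handed to the compiler
    gm.graph.print_tabular()
    return gm.forward

# run the printer on both the forward and the backward graph
debug_backend = aot_autograd(fw_compiler=print_graph, bw_compiler=print_graph)
model_compiled = torch.compile(model, backend=debug_backend)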

I cannot reproduce the issue using:

slice_4 = torch.randint(1, 512, [1, 62], dtype=torch.int64)
full_default_3 = torch.rand(512, 768, dtype=torch.float32)
_unsafe_index_put = torch.ops.prims._unsafe_index_put_.default(full_default_3, slice_4, torch.rand(62, 768), True)
_unsafe_index_put 
# tensor([[0.5664, 0.9468, 0.7460,  ..., 0.1945, 0.1702, 0.3088],
#         [0.0828, 0.1542, 0.1710,  ..., 0.2704, 0.7797, 0.2151],
#         [0.5169, 0.3854, 0.3590,  ..., 0.3073, 0.8750, 0.3119],
#         ...,
#         [0.4596, 0.4713, 0.2332,  ..., 0.7994, 0.5864, 0.2013],
#         [0.6200, 0.7418, 0.4930,  ..., 0.8694, 0.2228, 0.8329],
#         [0.8061, 0.6463, 0.4386,  ..., 0.1215, 0.1760, 0.8339]])

(and fixing the invalid indices) in torch==2.4.0.dev20240506+cu124.

@ptrblck I am using torch 2.2.1 to run the operation and am getting this error. I have included the list of packages installed in my conda environment. Which torch version are you using to run the operation?

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
aiohttp 3.9.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
astunparse 1.6.3 pypi_0 pypi
atk-1.0 2.36.0 ha1a6a79_0
attrs 23.2.0 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5
c-ares 1.19.1 h5eee18b_0
ca-certificates 2024.2.2 hbcca054_0 conda-forge
cairo 1.16.0 hb05425b_5
certifi 2024.2.2 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
cmake 3.26.4 h96355d8_0
contourpy 1.2.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
datasets 2.18.0 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
expat 2.5.0 h6a678d5_0
expecttest 0.2.1 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_1 conda-forge
fontconfig 2.14.1 h4c34cd2_2
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.51.0 pypi_0 pypi
freetype 2.10.4 h0708190_1 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.2.0 pypi_0 pypi
gdk-pixbuf 2.42.10 h5eee18b_0
glib 2.78.4 h6a678d5_0
glib-tools 2.78.4 h6a678d5_0
gobject-introspection 1.72.0 py311hbb6d50b_2
graphite2 1.3.14 h295c915_1
graphviz 2.50.0 h3cd0ef9_0
gtk2 2.24.33 h73c1081_2
gts 0.7.6 h64030ff_2 conda-forge
harfbuzz 4.3.0 hf52aaf7_1
huggingface-hub 0.21.4 pypi_0 pypi
hypothesis 6.98.17 pypi_0 pypi
icu 58.2 hf484d3e_1000 conda-forge
idna 3.6 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
jinja2 3.1.3 pypi_0 pypi
jpeg 9e h166bdaf_1 conda-forge
kiwisolver 1.4.5 pypi_0 pypi
krb5 1.20.1 h143b758_1
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcurl 8.5.0 h251f7ec_0
libdeflate 1.17 h5eee18b_1
libedit 3.1.20230828 h5eee18b_0
libev 4.33 h7f8727e_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgd 2.3.3 h695aa2c_1
libglib 2.78.4 hdc74915_0
libgomp 11.2.0 h1234567_1
libiconv 1.17 h166bdaf_0 conda-forge
libnghttp2 1.57.0 h2d74bed_0
libpng 1.6.39 h5eee18b_0
librsvg 2.54.4 h36cc946_3
libssh2 1.10.0 hdbd6064_2
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.5.1 h6a678d5_0
libtool 2.4.6 h6a678d5_1009
libuuid 1.41.5 h5eee18b_0
libuv 1.44.2 h5eee18b_0
libwebp-base 1.3.2 h5eee18b_0
libxcb 1.15 h7f8727e_0
libxml2 2.10.4 hcbfbd50_0
lz4-c 1.9.4 h6a678d5_0
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.8.4 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344
mkl-include 2023.1.0 h06a4308_46344
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.2.1 pypi_0 pypi
ninja 1.10.2 h06a4308_5
ninja-base 1.10.2 hd09550d_5
numpy 1.26.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.19.3 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.99 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
openssl 3.0.13 h7f8727e_0
packaging 24.0 pypi_0 pypi
pandas 2.2.1 pypi_0 pypi
pango 1.50.7 h05da053_0
pcre2 10.42 hebb0a14_0
pillow 10.2.0 pypi_0 pypi
pip 23.3.1 py311h06a4308_0
pixman 0.40.0 h7f8727e_1
psutil 5.9.8 pypi_0 pypi
pyarrow 15.0.1 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.11.8 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
python-graphviz 0.20.1 pyh22cad53_0 conda-forge
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
responses 0.18.0 pypi_0 pypi
rhash 1.4.3 hdbd6064_0
safetensors 0.4.2 pypi_0 pypi
setuptools 68.2.2 py311h06a4308_0
six 1.16.0 pypi_0 pypi
sortedcontainers 2.4.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sympy 1.12 pypi_0 pypi
tbb 2021.8.0 hdb19cb5_0
tk 8.6.12 h1ccaba5_0
tokenizers 0.15.2 pypi_0 pypi
torch 2.2.1 pypi_0 pypi
torchvision 0.17.1 pypi_0 pypi
torchviz 0.0.2 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
transformers 4.38.2 pypi_0 pypi
triton 2.2.0 pypi_0 pypi
types-dataclasses 0.6.6 pypi_0 pypi
typing-extensions 4.10.0 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.1 pypi_0 pypi
wheel 0.41.2 py311h06a4308_0
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h5eee18b_0
yarl 1.9.4 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0

I’m using a nightly binary from yesterday: torch==2.4.0.dev20240506+cu124.

@ptrblck I used the "pip3 install torch==2.4.0.dev20240506+cu124 --index-url https://download.pytorch.org/whl/nightly/cu124" command to install the PyTorch nightly version. Still, I am getting the '_OpNamespace' 'prims' object has no attribute '_unsafe_index_put_' error.

Q2) When I am training a model using PyTorch, each iteration generates tensors for inputs, weights, biases, activations, and gradients. Is there any way to track the creation of these tensors in the PyTorch source code? Or in which class (in the PyTorch source code) do these tensors get created?

If you are able to reproduce the issue using my code snippet, I wouldn't know why it's failing for you, as it's working for me. If you are using your own code, please post a minimal code snippet to reproduce the issue.
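
You could also check whether the op is registered at all in your environment before calling it. This is a quick sketch; the idea that importing torch._inductor.decomposition is what registers these extra prims is an assumption on my side and may depend on the version.

import torch

# is the prim registered in this process?
print(hasattr(torch.ops.prims, "_unsafe_index_put_"))

# assumption: the extra inductor prims (possibly including _unsafe_index_put_)
# are only registered after the inductor decompositions have been imported
import torch._inductor.decomposition  # noqa: F401
print(hasattr(torch.ops.prims, "_unsafe_index_put_"))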

Parameters are not recreated in each iteration. Are you looking for the actual initialization of tensors or which function exactly?
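
If you just want to see which tensors are created while an iteration runs, one option is a TorchDispatchMode that logs every ATen op and the shapes of the tensors it returns. This is a rough sketch on a toy module and not specific to parameter initialization; parameters themselves are created once when the modules are constructed.

import torch
from torch.utils._python_dispatch import TorchDispatchMode

class TensorCreationLogger(TorchDispatchMode):
    # log every ATen op executed under the mode and the shapes it produces
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        out = func(*args, **(kwargs or {}))
        outs = out if isinstance(out, (tuple, list)) else (out,)
        shapes = [tuple(t.shape) for t in outs if isinstance(t, torch.Tensor)]
        print(f"{func} -> {shapes}")
        return out

lin = torch.nn.Linear(4, 2)
x = torch.rand(8, 4)
with TensorCreationLogger():
    loss = lin(x).sum()
    loss.backward()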

@ptrblck .

  1. I have used your code snippet, but it is still giving me the error. If you want, I can share the list of packages installed in my conda env.
  2. I am looking for the actual initialization of tensors. Does PyTorch have a data structure such as a list or dictionary for storing all the created tensors?
  3. Previously I mentioned FX graph generation using TorchDynamo and torch.compile. These FX graphs give a name to each tensor. In which class in the PyTorch source code are these tensor names generated?

@ptrblck
I would be grateful if you could kindly guide me with my questions. Thanks.