Second forward call of torchscripted module breaks on CUDA

I have a torchscripted module which I am testing out for prediction. Each instance is a dictionary of tensors. The first instance I pass into the torchscripted module runs through the model correctly and generates an acceptable output. Passing the second instance to the same TorchScript object causes the error below. This only occurs when running on a CUDA device, not on CPU. I have ensured it's not a problem with the data by passing in the exact same instance twice and observing the error get thrown on the second forward pass. Any idea what this could be?
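For reference, the prediction code is essentially the following sketch (the batch keys, shapes, and the model variable are placeholders, not my actual names):

import torch

script = torch.jit.script(model)          # model is the trained nn.Module (placeholder name)
script = script.to('cuda')

# Each instance is a dictionary of tensors (keys and shapes here are placeholders).
batch = {
    'tokens': torch.randint(0, 100, (1, 67), device='cuda'),
    'mask': torch.ones(1, 67, device='cuda'),
}

output_greedy = script(batch)   # first forward pass: runs fine
output_greedy = script(batch)   # second forward pass with the same instance: raises the error below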

Traceback (most recent call last):
  File "predict.py", line 60, in <module>
    output_greedy = script(batch)
  File "xxx/env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
RuntimeError: default_program(22): error: extra text after expected end of number

1 error detected in the compilation of "default_program".

nvrtc compilation failed: 

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)


template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_neg_add_mul(float* t0, float* aten_mul) {
{
  if (512 * blockIdx.x + threadIdx.x<67 ? 1 : 0) {
    float v = __ldg(t0 + (512 * blockIdx.x + threadIdx.x) % 67);
    aten_mul[512 * blockIdx.x + threadIdx.x] = ((0.f - v) + 1.f) * -1.000000020040877e+20.f;
  }
}
}

Are you scripting the same model in the same file, and does the second invocation of torch.jit.script(model) raise this issue?
If so, do you see the error on any model or just your custom one? Could you also post the output of python -m torch.utils.collect_env here, please?

Are you scripting the same model in the same file, and does the second invocation of torch.jit.script(model) raise this issue?
I tried the following and got the same error both ways: scripting in the same file and then running the two prediction calls, and scripting in the file, saving to disk, reloading in the same file, and running the two inference calls. I was originally scripting with module.to_torchscript() in PyTorch Lightning, so I also tried switching to invoking torch.jit.script directly. Same error.
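For concreteness, the save-to-disk path looked roughly like this (a sketch; the file name is a placeholder):

import torch

scripted = torch.jit.script(module)              # also tried module.to_torchscript() in Lightning
torch.jit.save(scripted, 'model_scripted.pt')    # save to disk

reloaded = torch.jit.load('model_scripted.pt', map_location='cuda')
out = reloaded(batch)    # first inference call works
out = reloaded(batch)    # second call fails with the same nvrtc error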
If so, do you see the error on any model or just your custom one? Could you also post the output of python -m torch.utils.collect_env here, please?

Collecting environment information...
PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2

Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: Tesla V100-SXM2-16GB
Nvidia driver version: 460.80
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] pytorch-lightning==1.3.5
[pip3] torch==1.8.1
[pip3] torch4u==0.0.1
[pip3] torchmetrics==0.3.2
[conda] Could not collect

I tried a basic example with another module and it does not seem to have the issue. However, the module I am torchscripting is much more complex. Any advice on how one might proceed? I additionally tried running on an A100 with CUDA 11 and hit the same issue, where it fails on the second attempt at a forward pass.

import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

module = NeuralNetwork().to(device='cuda')

# Script the module and run two forward passes on the same input.
script = torch.jit.script(module)
input = torch.rand((1, 28, 28)).to(device='cuda')

out = script(input)  # first forward pass
out = script(input)  # second forward pass
print(out)

I’m getting the same issue.

I load my scripted model and simply loop over images. The second image results in this error every time (even when it's the same image). Is there a way forward?
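The loop is essentially the following sketch (the model path, image_paths, and load_and_preprocess are placeholders for my data loading):

import torch

model = torch.jit.load('model.pt', map_location='cuda')  # 'model.pt' is a placeholder path
model.eval()

with torch.no_grad():
    for path in image_paths:                 # placeholder list of image files
        img = load_and_preprocess(path)      # placeholder: returns a CUDA tensor
        out = model(img)                     # fails on the second iteration every time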

I tried the example from @AndriyMulyar and the error does not occur.

The error showing up on the second call is because that is when the JIT fusers try to produce an optimized kernel by default.
This could be a problem with your environment if all JIT compilation fails, or a bug in the fuser (a quick way to check this is sketched after the list below).

Maybe you can isolate the issue by

  • making sure something works (from a fuser tutorial or so),
  • if it does, try to reduce your example by leaving out half of the computation to see if it still happens. If you do this a few times, you can replace the inputs with random tensors of the same dtype, shape and stride. This might get us a shareable, self-contained example.
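To check whether the TensorExpr fuser is the thing producing the broken kernel, you could also try disabling it and see whether the second call then goes through. Note that these are internal toggles rather than a stable public API, so treat this as a diagnostic sketch, not a fix:

import torch

# Internal switches (not a stable, public API): disable the TensorExpr fuser
# and GPU fusion, then run the two forward passes again.
torch._C._jit_set_texpr_fuser_enabled(False)
torch._C._jit_override_can_fuse_on_gpu(False)

out = script(batch)
out = script(batch)   # if this now succeeds, the fuser is producing the broken kernel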

Best regards

Thomas