Cannot call sizes() on tensor with symbolic sizes/strides

Hi,
I'm running into an issue with torch.compile(). I'm currently on a nightly build; here is my environment:

Singularity> python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 2.1.0.dev20230904+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 11 (bullseye) (x86_64)
GCC version: (Debian 10.2.1-6) 10.2.1 20210110
Clang version: Could not collect
CMake version: version 3.27.2
Libc version: glibc-2.31

Python version: 3.11.4 (main, Aug 16 2023, 05:31:52) [GCC 10.2.1 20210110] (64-bit runtime)
Python platform: Linux-5.14.0-284.25.1.el9_2.x86_64-x86_64-with-glibc2.31
Is CUDA available: False
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No devices found.
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          128
On-line CPU(s) list:             0-127
Thread(s) per core:              1
Core(s) per socket:              64
Socket(s):                       2
NUMA node(s):                    4
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7742 64-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         3271.206
CPU max MHz:                     2250.0000
CPU min MHz:                     1500.0000
BogoMIPS:                        4499.93
Virtualization:                  AMD-V
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        64 MiB
L3 cache:                        512 MiB
NUMA node0 CPU(s):               0-31
NUMA node1 CPU(s):               32-63
NUMA node2 CPU(s):               64-95
NUMA node3 CPU(s):               96-127
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es

Versions of relevant libraries:
[pip3] mypy==1.5.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.1
[pip3] pytorch-lightning==2.0.8
[pip3] pytorch-metric-learning==2.3.0
[pip3] pytorch-ranger==0.1.1
[pip3] pytorch-triton==2.1.0+e6216047b8
[pip3] torch==2.1.0.dev20230904+cu121
[pip3] torch-optimizer==0.3.0
[pip3] torchaudio==2.2.0.dev20230904+cu121
[pip3] torchmetrics==0.11.4
[pip3] torchvision==0.16.0.dev20230904+cu121
[pip3] triton==2.0.0
[conda] Could not collect

And the issue:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/torch/fx/graph_module.py", line 274, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<eval_with_key>.179", line 23, in forward
    matmul = torch.matmul(permute_2, transpose);  permute_2 = transpose = None
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Cannot call sizes() on tensor with symbolic sizes/strides

Exception raised from sizes_default at ../c10/core/TensorImpl.h:617 (most recent call first):

Call using an FX-traced Module, line 23 of the traced Module's generated forward function:
    transpose = permute.transpose(-1, -2);  permute = None
    matmul = torch.matmul(permute_2, transpose);  permute_2 = transpose = None

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    truediv = matmul / 8.0;  matmul = None

    add = truediv + mul;  truediv = mul = None


While executing %submod_1 : [num_users=2] = call_module[target=compiled_submod_1](args = (%getitem, %getitem_1, %getitem_2), kwargs = {})
Original traceback:
None


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

I tried adding dynamic=True to torch.compile(), but that did not help. The only difference between the two runs is the added compilation: the non-compiled run succeeds, whereas the compiled one fails on the first step of the sanity check over the validation dataloader (so no gradients are involved).
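
For context, this is roughly the shape of the computation the traced forward above corresponds to and how I apply compilation. The module and tensor sizes here are illustrative stand-ins, not my actual model:

    import torch
    import torch.nn as nn

    class ToyAttention(nn.Module):
        # Illustrative stand-in mirroring the permute -> transpose -> matmul
        # -> divide pattern shown in the traced forward above.
        def forward(self, q, k):
            q = q.permute(0, 2, 1, 3)
            k = k.permute(0, 2, 1, 3).transpose(-1, -2)
            return torch.matmul(q, k) / 8.0

    model = ToyAttention()
    compiled = torch.compile(model, dynamic=True)  # dynamic=True did not help
    q = torch.randn(2, 16, 4, 8)
    k = torch.randn(2, 16, 4, 8)
    out = compiled(q, k)  # my real model fails at this point during the sanity check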

You might need to disable automatic_dynamic_shapes instead (https://github.com/pytorch/pytorch/blob/main/torch/_dynamo/config.py#L73); it defaults to True since you're using nightlies.
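
Something along these lines, set before calling torch.compile (the Linear module is just a placeholder, not your model):

    import torch
    import torch._dynamo

    # automatic_dynamic_shapes defaults to True on recent nightlies; with it
    # off, Dynamo recompiles per input shape instead of marking dims dynamic.
    torch._dynamo.config.automatic_dynamic_shapes = False

    model = torch.nn.Linear(8, 8)  # placeholder module
    compiled = torch.compile(model)
    print(compiled(torch.randn(2, 8)).shape)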

Thanks @marksaroufim, that did help!