Hello, we encountered a "file name too long" error while using torch.compile. The specific error is:
OSError: [Errno 36] File name too long: '/tmp/torchinductor_hadoop-pnc/triton/3/b13585fb327ef97fcb905e2d1fea4abede414a1d70d76544643737e47b9f4dd6/__grp__triton_per_fused__log_softmax__to_copy_abs_add_binary_cross_entropy_with_logits_div_gt_hypot_lt_mean_mse_loss_mul_neg_nll_loss_forward_pow_randint_relu_rsub_sigmoid_smooth_l1_loss_sub_sum_where_68.json.tmp.pid_3437_3b13d3c6-b5ce-4ce1-976a-5d1c7fffeef5'
Detailed error log:
[rank1]: Traceback (most recent call last):
[rank1]: File "/usr/local/lib/python3.11/site-packages/torch/_inductor/compile_worker/subproc_pool.py", line 337, in do_job
[rank1]: result = job()
[rank1]: ^^^^^
[rank1]: File "/usr/local/lib/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 61, in _worker_compile_triton
[rank1]: kernel.precompile(warm_cache_only=True)
[rank1]: File "/usr/local/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 267, in precompile
[rank1]: self._precompile_worker()
[rank1]: File "/usr/local/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 296, in _precompile_worker
[rank1]: compile_results.append(self._precompile_config(c))
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 537, in _precompile_config
[rank1]: binary = triton.compile(*compile_args, **compile_kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/site-packages/triton/compiler/compiler.py", line 300, in compile
[rank1]: fn_cache_manager.put_group(metadata_filename, metadata_group)
[rank1]: File "/usr/local/lib/python3.11/site-packages/triton/runtime/cache.py", line 105, in put_group
[rank1]: return self.put(grp_contents, grp_filename, binary=False)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.11/site-packages/triton/runtime/cache.py", line 122, in put
[rank1]: with open(temp_path, mode) as f:
[rank1]:
OSError: [Errno 36] File name too long: '/tmp/torchinductor_hadoop-pnc/triton/3/b13585fb327ef97fcb905e2d1fea4abede414a1d70d76544643737e47b9f4dd6/__grp__triton_per_fused__log_softmax__to_copy_abs_add_binary_cross_entropy_with_logits_div_gt_hypot_lt_mean_mse_loss_mul_neg_nll_loss_forward_pow_randint_relu_rsub_sigmoid_smooth_l1_loss_sub_sum_where_68.json.tmp.pid_3437_3b13d3c6-b5ce-4ce1-976a-5d1c7fffeef5'
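For context, here is a simplified sketch of the kind of compiled function that produces such heavily fused kernels. It is not our real model; the op choices just mirror the op names embedded in the kernel filename above, and this small example on its own may not produce a name long enough to hit the limit:

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

@torch.compile
def combined_loss(logits, target_probs, reg_pred, reg_target, class_target):
    # Several losses plus pointwise ops computed together, so Inductor fuses
    # them into a few Triton kernels whose names list all the fused ops.
    bce = F.binary_cross_entropy_with_logits(logits, target_probs)
    mse = F.mse_loss(reg_pred, reg_target)
    sl1 = F.smooth_l1_loss(reg_pred, reg_target)
    nll = F.nll_loss(F.log_softmax(logits, dim=-1), class_target)
    extra = torch.hypot(reg_pred, reg_target).mean() + reg_pred.sigmoid().relu().sum()
    return bce + mse + sl1 + nll + extra

logits = torch.randn(8, 10, device=device)
target_probs = torch.rand(8, 10, device=device)
reg_pred = torch.randn(8, 10, device=device)
reg_target = torch.randn(8, 10, device=device)
class_target = torch.randint(0, 10, (8,), device=device)
print(combined_loss(logits, target_probs, reg_pred, reg_target, class_target))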
It appears that Inductor builds the generated Triton kernel names (and hence the cache filenames) by concatenating the names of all the fused operators, so when a large number of operators are fused into one kernel, the filename easily exceeds the filesystem's limit on a single name (255 bytes on most Linux filesystems). Are there any ways to work around or fix this?
My PyTorch version is 2.7.0.
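One workaround I am considering, assuming the torch._inductor.config.triton.descriptive_names option exists in 2.7.0 and accepts False (I have not verified this), is to disable descriptive kernel names so that the generated cache filenames stay short:

import torch
import torch._inductor.config as inductor_config

# Assumption (not verified on 2.7.0): with descriptive_names = False, Inductor
# names kernels like triton_<n> instead of concatenating every fused op name,
# which should keep the Triton cache filenames well under the length limit.
inductor_config.triton.descriptive_names = False

@torch.compile
def f(x):
    return torch.relu(x).sigmoid().sum()

print(f(torch.randn(1024)))

That said, I would prefer a solution that keeps the descriptive names (for example, truncating them to a fixed length), since they are useful when profiling.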