TensorBoard segmentation fault

Hi.

I’m using PyTorch 1.8.1 with Horovod.

I wanted to profile my code, but when it reaches the batch to be profiled, it crashes with the following error.

(omitted…)

[1,0]:[node07:11736] *** Process received signal ***
[1,0]:[node07:11736] Signal: Segmentation fault (11)
[1,0]:[node07:11736] Signal code: (-6)
[1,0]:[node07:11736] Failing at address: 0x44e00002dd8
[1,0]:[node07:11736] [ 0] [1,0]:/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fe67d5b6390]
[1,0]:[node07:11736] [ 1] [1,0]:/home/name/.conda/envs/horovod2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so(+0xe0693fe)[0x7fe615e523fe]
[1,0]:[node07:11736] [ 2] [1,0]:/home/name/.conda/envs/horovod2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so(+0xe06c808)[0x7fe615e55808]
[1,0]:[node07:11736] [ 3] [1,0]:/home/name/.conda/envs/horovod2/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so(+0xe214858)[0x7fe615ffd858]
[1,0]:[node07:11736] [ 4] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fe67d5ac6ba]
[1,0]:[node07:11736] [ 5] [1,0]:/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fe67d2e251d]
[1,0]:[node07:11736] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node node07 exited on signal 11 (Segmentation fault).
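
The backtrace above only shows raw offsets inside libtorch_cpu.so, so it does not say which Python call triggered the crash. One thing that might help narrow it down is Python's standard `faulthandler` module, which dumps the Python-level traceback of every thread when a fatal signal such as SIGSEGV arrives. Below is a minimal sketch; the helper name `enable_crash_traceback` is hypothetical, and I have not verified how it interacts with the signal handler that Open MPI installs.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=sys.stderr):
    """Ask Python to dump tracebacks of all threads on fatal signals.

    `stream` must be a real file object with a file descriptor; it has to
    stay open for as long as the handler is enabled.
    """
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    # Call this once at the top of the training script, before the
    # profiled batch runs (placement is an assumption, not a requirement).
    enable_crash_traceback()
```

The same effect should be available without code changes by launching the script with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER=1` environment variable for each rank.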

Moreover, the .json file is not created even when the above error does not appear.

Please let me know how to solve this problem.
Thank you.