Slowdown in CPU-based Preprocessing After Loading Model Weights onto GPU

I discovered that the issue is related to the subprocess module, as discussed in the thread “Torch models on GPU slow down python subprocess module?”. My preprocessing code uses subprocess as shown below.

from subprocess import PIPE, Popen

data = sample.tobytes()  # raw audio samples piped to sox on stdin
cmd = [
    "/usr/bin/sox",
    ...
]
p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = p.communicate(data)
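One way to check whether the parent process's memory footprint alone explains the slowdown (spawning a child via fork gets more expensive as the parent's page tables grow) is to time trivial spawns before and after inflating the process. This is a minimal diagnostic sketch I wrote for illustration, not code from the original post; the ballast size and spawn count are arbitrary choices:

```python
import subprocess
import sys
import time

import numpy as np


def time_spawn(n=20):
    # Average wall-clock time to spawn and reap a trivial child process.
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / n


baseline = time_spawn()

# Stand-in for loaded model weights: ~512 MB of touched, resident pages.
ballast = np.ones((512, 1024, 1024), dtype=np.uint8)

inflated = time_spawn()
print(f"baseline: {baseline * 1e3:.1f} ms/spawn, "
      f"with ballast: {inflated * 1e3:.1f} ms/spawn")
```

If the second number grows noticeably, the cost is tied to process size rather than to anything GPU-specific, which would be consistent with the CPU-only observation below.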

I also ran a test with the model weights loaded on the CPU instead of the GPU and observed the same slowdown. How do you think the loaded model weights affect the behavior of the subprocess calls? Any advice or insight would be appreciated.


+ version information

[pip3] numpy==1.26.2
[pip3] torch==1.12.1+cu113
[pip3] torchaudio==0.12.1+cu113