Problems with torch.multiprocessing on CPU

Unfortunately, I have been running into problems with the torch.multiprocessing module for quite some time now. As a minimal working example (MWE), I am trying to square a PyTorch tensor on the CPU, which fails:

import torch
import numpy as np
import torch.multiprocessing as mp

def square(i, x, queue):
    print('In process {}'.format(i))
    if isinstance(x, np.ndarray):
        x = torch.from_numpy(x)
    queue.put(x * x)

if __name__ == '__main__':
    print('Number of available CPUs: {}'.format(mp.cpu_count()))

    processes = []
    queue = mp.Queue()

    # Set up some sample data:
    x = torch.arange(64)

    # Start <mp.cpu_count()> processes with the square function as the target
    # and an individual piece of data to process:
    chunk_size = len(x) // mp.cpu_count()
    for i in range(mp.cpu_count()):
        start_index = chunk_size * i
        proc = mp.Process(target = square, args = (i, x[start_index:start_index + chunk_size], queue))
        proc.start()
        processes.append(proc)

    # Wait for each process to finish before returning to the main thread:
    for proc in processes:
        proc.join()

    # Terminate each process:
    for proc in processes:
        proc.terminate()

    # Convert the multiprocessing queue into a list:
    results = []
    while not queue.empty():
        results.append(queue.get())

This results in the following error message (Python version 3.8.9, PyTorch version 1.9.0):

Traceback (most recent call last):
  File "", line 55, in <module>
  File "/..../lib/python3.8/multiprocessing/", line 116, in get
    return _ForkingPickler.loads(res)
  File "/..../lib/python3.8/site-packages/torch/multiprocessing/", line 289, in rebuild_storage_fd
    fd = df.detach()
  File "/..../lib/python3.8/multiprocessing/", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/.../lib/python3.8/multiprocessing/", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/.../lib/python3.8/multiprocessing/", line 502, in Client
    c = SocketClient(address)
  File "/.../lib/python3.8/multiprocessing/", line 630, in SocketClient
ConnectionRefusedError: [Errno 111] Connection refused

However, when I import the plain multiprocessing module instead of torch.multiprocessing and use x = np.arange(64) instead of x = torch.arange(64), the script works fine! I tested the code on two machines. Can anybody confirm this weird behavior? Does anybody see what I am doing wrong? Or is torch.multiprocessing simply not supposed to be used on the CPU?
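For reference, the working variant I describe above looks roughly like this (a minimal sketch; the `run` helper, the `n_procs` parameter, and the chunking are just illustrative, and I drain the queue before joining so a full queue cannot block `join()`):

```python
import multiprocessing as mp
import numpy as np

def square(i, x, queue):
    # Each worker squares its chunk and puts (chunk index, result) on the queue.
    print('In process {}'.format(i))
    queue.put((i, x * x))

def run(n_procs=4):
    queue = mp.Queue()
    x = np.arange(64)
    chunk_size = len(x) // n_procs

    # Start one process per chunk:
    processes = []
    for i in range(n_procs):
        proc = mp.Process(target=square,
                          args=(i, x[i * chunk_size:(i + 1) * chunk_size], queue))
        proc.start()
        processes.append(proc)

    # Collect one result per worker, then join:
    results = [queue.get() for _ in range(n_procs)]
    for proc in processes:
        proc.join()

    # Reassemble the chunks in their original order:
    results.sort(key=lambda r: r[0])
    return np.concatenate([r for _, r in results])

if __name__ == '__main__':
    print(run())
```

With plain multiprocessing and NumPy arrays this runs without errors; swapping in torch.multiprocessing and torch.arange is what triggers the ConnectionRefusedError above.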