zmq.error.ZMQError: Address already in use


I was testing a vanilla feed-forward neural network for MNIST digit classification on macOS 11.2.3 with Python 3.7 and torch 1.8 in a JupyterLab notebook, and ran into a zmq.error.ZMQError.

Version information:
torch == 1.8.1
python == 3.7.10
macOS == 11.2.3 (Big Sur)
Processor: 2.3 GHz Quad-Core Intel Core i5
Graphics: Intel Iris Plus Graphics 655 1536 MB
Memory: 16 GB 2133 MHz LPDDR3

Expected outcome:
The model trains on the MNIST dataset and the loss is printed during training.

I was able to run the same code on an earlier macOS version and in a separate Linux environment on a local cluster.

Full error message:

[W ParallelNative.cpp:206] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
[W ParallelNative.cpp:206] Warning: Cannot set number of intraop threads after parallel work has started or after set_num_threads call when using native parallel backend (function set_num_threads)
OMP: Error #15: Initializing libiomp5.dylib, but found libomp.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see

File "zmq/backend/cython/socket.pyx", line 540, in zmq.backend.cython.socket.Socket.bind
File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
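
The last line of the traceback is a socket bind failing because the port is already taken. As a rough diagnostic, a port can be probed with the standard library alone. This is only a sketch: `port_is_free` is a hypothetical helper name, not part of pyzmq or Jupyter, and the port it checks has to be supplied by you.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # "Address already in use" and other bind failures end up here
            return False
    return True


if __name__ == "__main__":
    # Hold one port open, then confirm the probe reports it as taken.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
        holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        holder.listen(1)
        taken = holder.getsockname()[1]
        print(taken, port_is_free(taken))
```

If a port used by the crashed kernel still shows as taken after the restart, that would be consistent with the old process not having released it yet.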

Code used:

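#Imports used by the code below
import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt

#Device setting
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')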

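#Import MNIST dataset
#NOTE: the arguments after root were cut off in the post; train/transform/download below are assumed
train_dataset = torchvision.datasets.MNIST(root='data/',
                                           train=True,
                                           transform=torchvision.transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='data/',
                                          train=False,
                                          transform=torchvision.transforms.ToTensor())
input_tensor, label = train_dataset[0]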
print('MNIST dataset with {} train data and {} test data'.format(len(train_dataset), len(test_dataset)))
print('Type of data in dataset: {} AND {}'.format(type(input_tensor), type(label)))
print('Input tensor image dimensions: {}'.format(input_tensor.shape))

#Convert the datasets to DataLoaders for easier batching and SGD
#Note: batch_size must already be defined here (it is set to 64 with the hyper-parameters below)
from torch.utils.data import Dataset, DataLoader
train_loader = DataLoader(dataset = train_dataset,
                          batch_size = batch_size,
                          num_workers = 2)

test_loader = DataLoader(dataset = test_dataset,
                        batch_size = batch_size, 
                        num_workers = 2)

#Take a look at one batch 
examples = iter(train_loader)
samples, labels = next(examples)
print(samples.shape, labels.shape)

#Plotting first 4 digits in the dataset: 
for i in range(4):
    plt.subplot(2, 2, i+1)
    plt.imshow(samples[i][0], interpolation="nearest")

#Model hyper-parameters for the fully connected Neural network 
input_size = 784 # Image input for the digits - 28 x 28 x 1 (W-H-C) -- flattened in the end before being fed in the NN 
num_hidden_layers = 1
hidden_layer_size = 100
num_classes = 10 
num_epochs = 20
batch_size = 64 
learning_rate = 0.01

#Define a model 
class NeuralNet(nn.Module):
    def __init__(self, input_size, num_hidden_layers, hidden_layer_size, num_classes):
        super(NeuralNet, self).__init__()
        self.L1 = nn.Linear(in_features = input_size, out_features = hidden_layer_size)
        self.relu = nn.ReLU()
        self.num_hidden_layers = num_hidden_layers
        if (self.num_hidden_layers-1) > 1:
            self.L_hidden = nn.ModuleList( [nn.Linear(in_features = hidden_layer_size, out_features = hidden_layer_size) for _ in range(num_hidden_layers-1)] )
            self.relu_hidden = nn.ModuleList( [nn.ReLU() for _ in range(num_hidden_layers-1)] )
            self.L2 = nn.Linear(in_features = hidden_layer_size, out_features = hidden_layer_size)
        self.L_out = nn.Linear(in_features = hidden_layer_size, out_features = num_classes)
    def forward(self, x):
        out = self.relu(self.L1(x))
        if (self.num_hidden_layers-1) > 1:
            #print('computing for multiple layers')
            for L_hidden, relu_hidden in zip(self.L_hidden, self.relu_hidden):
                out = relu_hidden(L_hidden(out))
            out = self.relu(self.L2(out))
        out = self.L_out(out) #No softmax or cross-entropy activation just the output from linear transformation
        return out

#Instantiate the model and move it to the selected device
model = NeuralNet(input_size=input_size, num_hidden_layers=num_hidden_layers, hidden_layer_size=hidden_layer_size, num_classes=num_classes).to(device)

#Loss and optimizer 
criterion = nn.CrossEntropyLoss() #Applies log-softmax internally, so no softmax layer is needed in the model
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
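
As a sanity check on the model definition, the parameter count can be worked out in plain Python. With num_hidden_layers = 1 the condition (num_hidden_layers-1) > 1 is false, so only L1 and L_out are created. `linear_params` is a hypothetical helper name used only for this sketch.

```python
def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


input_size = 28 * 28 * 1   # flattened 28 x 28 x 1 image
hidden_layer_size = 100
num_classes = 10

l1 = linear_params(input_size, hidden_layer_size)      # self.L1
l_out = linear_params(hidden_layer_size, num_classes)  # self.L_out
total = l1 + l_out
print(input_size, l1, l_out, total)  # 784 78500 1010 79510
```

The total should match sum(p.numel() for p in model.parameters()) for the model instantiated above.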

The Jupyter notebook shut down with a kernel error when executing this cell:

AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
#Training loop  --- 
n_total_steps = len(train_loader)
for epoch in range(num_epochs):
    for i, (image_tensors, labels) in enumerate(train_loader):
        #image tensors: (batch_size, 1, 28, 28) --> (batch_size, 784) input needed
        image_input_to_NN = image_tensors.view(-1,28*28).to(device)
        labels = labels.to(device)
        #Forward pass 
        outputs = model(image_input_to_NN)
        loss = criterion(outputs, labels)
        optimizer.zero_grad() #Detach and flush the gradients 
        loss.backward() #Backward gradients evaluation 
        optimizer.step() #To update the weights/parameters in the NN 
        if (epoch) % 10 == 0 and (i+1) % 300 == 0: 
            print(f'epoch {epoch+1} / {num_epochs}, step {i+1}/{n_total_steps}, loss = {loss.item():.4f}')
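
For reference, the output a successful run should produce can be worked out from the logging condition alone. This sketch assumes the standard MNIST training split of 60,000 images and the DataLoader default of keeping the last partial batch; it does not touch torch at all.

```python
import math

train_samples = 60_000  # assumed: size of the standard MNIST training split
batch_size = 64
num_epochs = 20

# One step per batch; the last, smaller batch still counts as a step.
n_total_steps = math.ceil(train_samples / batch_size)

# (epoch, step) pairs as they would appear in the printed message
logged = [
    (epoch + 1, i + 1)
    for epoch in range(num_epochs)
    for i in range(n_total_steps)
    if epoch % 10 == 0 and (i + 1) % 300 == 0
]
print(n_total_steps)  # 938
print(logged)         # [(1, 300), (1, 600), (1, 900), (11, 300), (11, 600), (11, 900)]
```

So a healthy run prints six loss lines in total, three in epoch 1 and three in epoch 11.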

The error appears to occur while the network is training. This is my first time posting on this forum, so let me know if I need to provide any more details to make debugging easier.

There is a partial solution to this problem, as mentioned in this StackOverflow issue.

However, it switches off all parallelization across CPUs for PyTorch, which makes execution extremely slow.
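
The OMP hint in the error log recommends making sure only one OpenMP runtime ends up in the process. A first step is to see where copies of libiomp5 and libomp live in the active environment. The following is a minimal sketch using only the standard library; `find_openmp_runtimes` is a hypothetical helper name, and finding both files on disk does not by itself prove both are loaded.

```python
import sys
from pathlib import Path


def find_openmp_runtimes(root):
    """Map each OpenMP runtime name to the copies found under root."""
    found = {"libiomp5": [], "libomp": []}
    for path in Path(root).rglob("*.dylib"):
        # path.name is e.g. "libiomp5.dylib"; keep the part before the first dot
        stem = path.name.split(".")[0]
        if stem in found:
            found[stem].append(path)
    return found


if __name__ == "__main__":
    # sys.prefix is the root of the active Python environment
    for name, paths in find_openmp_runtimes(sys.prefix).items():
        print(name, len(paths))
        for p in paths:
            print("   ", p)
```

If both names show up, the directories they sit in indicate which packages ship each runtime.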