PyTorch SSLError in DataLoader when num_workers is greater than 1

I have created a Dataset object that loads some data from an API whenever an item is fetched:

import requests
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):

    def __init__(self, obj_ids=None):
        super().__init__()
        self.obj_ids = obj_ids if obj_ids is not None else []

    def __len__(self):
        return len(self.obj_ids)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        # Fetch the item from the API (URL anonymised here)
        result = requests.get('/api/url/{}'.format(idx))

        ## Post processing work...

Then I pass it to my DataLoader:

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=1,
    collate_fn=utils.collate_fn)

Everything works fine when training with num_workers=1, but as soon as I increase it to 2 or more I get an error in my training loop.
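For example, the same DataLoader call with only num_workers changed is enough to trigger the error:

data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=2, shuffle=True, num_workers=2,
    collate_fn=utils.collate_fn)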

On this line:

train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)

SSLError: Caught SSLError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1373, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 280, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2570)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.7/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.7/dist-packages/urllib3/util/retry.py", line 399, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='mydomain.com', port=443): Max retries exceeded with url: 'url_with_error_is_here' (Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2570)')))

If I remove the requests call, I stop getting the SSL error, so the problem must be something with the requests library or maybe urllib.

I changed the domain and URL in the error to dummy values, but both the real URL and domain work fine with just 1 worker.

I’m running this in a Google Colab environment with GPU enabled, but I also tried it on my local machine and get the same problem.

Can anyone help me to solve this issue?

Based on the error description, it seems you are running into an issue with multiprocessing and SSL, which seems to be related to this post.
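If that is indeed the cause, a common mitigation is to make sure each DataLoader worker creates its own HTTP session after it has been forked, instead of reusing a connection inherited from the parent process. Below is only a minimal, untested sketch based on your snippet; the _get_session helper and the per-PID check are additions of mine, not part of your code:

import os

import requests
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):

    def __init__(self, obj_ids=None):
        super().__init__()
        self.obj_ids = obj_ids if obj_ids is not None else []
        self._session = None
        self._session_pid = None

    def _get_session(self):
        # Re-create the session whenever we find ourselves in a new
        # process, so SSL connections opened in the parent process are
        # never reused by a forked DataLoader worker.
        pid = os.getpid()
        if self._session is None or self._session_pid != pid:
            self._session = requests.Session()
            self._session_pid = pid
        return self._session

    def __len__(self):
        return len(self.obj_ids)

    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()

        result = self._get_session().get('/api/url/{}'.format(idx))

        ## Post processing work...

Alternatively, you could try starting the workers with spawn instead of fork (e.g. passing multiprocessing_context='spawn' to the DataLoader, if your PyTorch version supports it), so the workers do not inherit any open SSL state from the parent process; the trade-off is slower worker startup.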