Cannot measure CUDA usage on MNIST example

I am trying to measure CUDA usage on the MNIST example with PyTorch 0.4.0 using the following command, but it fails.
How can I avoid this issue?

$ python -m torch.utils.bottleneck main.py

The error message is as follows.

===
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 280, in <module>
    main()
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 261, in main
    autograd_prof_cpu, autograd_prof_cuda = run_autograd_prof(code, globs)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 155, in run_autograd_prof
    result.append(run_prof(use_cuda=True))
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/bottleneck/__main__.py", line 149, in run_prof
    exec(code, globs, None)
  File "main.py", line 110, in <module>
    main()
  File "main.py", line 105, in main
    train(args, model, device, train_loader, optimizer, epoch)
  File "main.py", line 29, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 68, in __getitem__
    img, target = self.train_data[index], self.train_labels[index]
RuntimeError: /pytorch/torch/csrc/autograd/profiler.h:52: initialization error
===

References
mnist example
https://github.com/pytorch/examples/blob/master/mnist/main.py
bottleneck
https://pytorch.org/docs/stable/bottleneck.html

Could you try again with 'num_workers': 0 in main.py?
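
The traceback shows the error being raised inside a DataLoader worker (_worker_loop), so loading batches in the main process should sidestep it. Below is a minimal sketch of that change. It assumes main.py builds a dict of DataLoader keyword arguments when CUDA is enabled; the helper name loader_kwargs, the profiling flag, and the 'pin_memory' entry are illustrative assumptions, not taken from the script above.

```python
# Hypothetical helper: main.py is assumed to build an equivalent dict inline.
def loader_kwargs(use_cuda, profiling):
    """Return keyword arguments intended for torch.utils.data.DataLoader."""
    if not use_cuda:
        return {}
    # num_workers=0 makes the DataLoader fetch batches in the main process,
    # so no worker process is started while the profiler is active.
    # 'pin_memory' is an assumed extra option, kept only for illustration.
    return {'num_workers': 0 if profiling else 1, 'pin_memory': True}


print(loader_kwargs(use_cuda=True, profiling=True))
# {'num_workers': 0, 'pin_memory': True}
```

The resulting dict would be passed on as DataLoader(dataset, **loader_kwargs(use_cuda, profiling)). Outside of profiling runs, num_workers can stay at 1.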

Thank you.
It works fine when I change num_workers from 1 to 0.