Simple PyTorchBenchmark script gives CUDA forked subprocess error

I am trying to run a simple benchmark script, but it fails with a CUDA error, which then leads to a second error:

Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
  File "/home/cbarkhof/code-thesis/Experimentation/Benchmarking/", line 23, in <module>
  File "/home/cbarkhof/.local/lib/python3.6/site-packages/transformers/benchmark/", line 674, in run
    memory, inference_summary = self.inference_memory(model_name, batch_size, sequence_length)
ValueError: too many values to unpack (expected 2)
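For background on the first error: Python's multiprocessing can start workers by forking (the default on Linux), in which case the child inherits the parent's already-initialized CUDA context, which CUDA refuses to re-initialize. The "spawn" start method starts a fresh interpreter instead. A minimal stdlib-only sketch of selecting the spawn start method (just to illustrate the concept the error message refers to, not the transformers internals):

```python
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    # 'fork' copies the parent process, CUDA context included -- that is
    # what triggers the "Cannot re-initialize CUDA" error. 'spawn' starts
    # each worker in a fresh interpreter, so no CUDA state is inherited.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        result = pool.map(square, [1, 2, 3])
    print(result)  # [1, 4, 9]
```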

My script is simply:

from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

benchmark_args = PyTorchBenchmarkArguments(models=["bert-base-uncased"],
                                           sequence_lengths=[8, 32, 128, 512])

benchmark = PyTorchBenchmark(benchmark_args)
results = benchmark.run()

I am not aware of doing any multiprocessing myself, so why is this happening?

If anyone can point me to the cause, please let me know :). Cheers!