"abort() has been called" during DataLoader iteration in Libtorch on Windows (MSVC + CUDA 11.8)

I am getting a crash when iterating over a DataLoader with Libtorch (the C++ API for PyTorch) on Windows. The program fails consistently as soon as it enters the loop over the DataLoader, regardless of dataset complexity, CUDA or CPU execution, or threading settings (num_workers). Here is a minimal reproducible example:

#include <iostream>
#include <torch/torch.h>

class DummyDataset : public torch::data::Dataset<DummyDataset> {
    private:
        size_t size_;
    public:
        explicit DummyDataset(size_t size) : size_(size) {}

        torch::data::Example<> get(size_t index) override {
            std::cout << "Getting data for index: " << index << std::endl;
            torch::Tensor data = torch::ones({3, 224, 224}, torch::kFloat32); 
            torch::Tensor label = torch::tensor(static_cast<int>(index % 2), torch::kInt64); 
            return {data, label};
        }

        torch::optional<size_t> size() const override {
            return size_;
        }
};

int main() {

    const size_t dataset_size = 12; 
    const int batch_size = 4;
    const int num_workers = 2; 

    auto train_dataset = DummyDataset(dataset_size).map(torch::data::transforms::Stack<>());

    auto train_loader = torch::data::make_data_loader(
        std::move(train_dataset),
        torch::data::DataLoaderOptions().batch_size(batch_size).workers(num_workers)
    );

    std::cout << "Iterating through DataLoader." << std::endl;
   
    for (auto& batch : *train_loader) {  // FAILS HERE!
        std::cout << "Batch data size: " << batch.data.sizes()
                    << ", Batch label size: " << batch.target.sizes() << std::endl;
    }

    return 0;
}

The program crashes when entering the for loop of the DataLoader:

for (auto& batch : *train_loader)

The expected output "Batch data size: ..." is never printed. Instead, a Windows error dialog appears stating that abort() has been called.

When debugging with gdb, I get the following backtrace:

Reading symbols from .\Debug\Planets_and_Moons.exe...
(No debugging symbols found in .\Debug\Planets_and_Moons.exe)
(gdb) run
Starting program: C:\Users\sauls\Desktop\a\build\Debug\Planets_and_Moons.exe
[New Thread 24504.0x5f08]
[New Thread 24504.0x38c4]
[New Thread 24504.0x5ca4]
[New Thread 24504.0x1fec]
Iterating through DataLoader.[New Thread 24504.0x4e00]

gdb: unknown target exception 0xe06d7363 at 0x7ff93afffa4c

Thread 1 received signal ?, Unknown signal.
0x00007ff93afffa4c in RaiseException () from C:\Windows\System32\KernelBase.dll
(gdb) bt
#0  0x00007ff93afffa4c in RaiseException () from C:\Windows\System32\KernelBase.dll
#1  0x00007ff91aa35267 in _CxxThrowException () from C:\Windows\SYSTEM32\vcruntime140.dll
#2  0x00007ff8e7eb6bfe in c10!?torchCheckFail@detail@c10@@YAXPEBD0I0@Z () from C:\Users\sauls\Desktop\a\build\Debug\c10.dll
#3  0x00007ff8e7e68d89 in c10!?torchInternalAssertFail@detail@c10@@YAXPEBD0I0UCompileTimeEmptyString@12@@Z ()
   from C:\Users\sauls\Desktop\a\build\Debug\c10.dll
#4  0x00007ff73b9cfdbf in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#5  0x00007ff73b9d4f81 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#6  0x00007ff73b9d4e9d in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#7  0x00007ff73b9d5e98 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#8  0x00007ff73b9d5f55 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#9  0x00007ff73b9c8c0e in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#10 0x00007ff73b997705 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#11 0x00007ff73b9db969 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#12 0x00007ff73b9db80e in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#13 0x00007ff73b9db6ce in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
--Type <RET> for more, q to quit, c to continue without paging--c
#14 0x00007ff73b9db9fe in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#15 0x00007ff93cf7259d in KERNEL32!BaseThreadInitThunk () from C:\Windows\System32\kernel32.dll
#16 0x00007ff93d7eaf38 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
#17 0x0000000000000000 in ?? ()

Environment:

  • OS: Windows 11
  • Compiler: MSVC 14.29 (Visual Studio 2019)
  • Libtorch Version: 2.5.1+cu118 (CUDA) (also tested with 2.5.1+cpu, 2.4.1+cu118 and 2.1.2+cu118)
  • CUDA Version: 11.8
  • GPU: NVIDIA GeForce RTX 3070 (laptop)

What I have tried:

  1. Disable Multithreading: Set num_workers = 0 to avoid threading issues (Issue persists).
  2. Change batch size: Set batch_size=1 (Issue persists).
  3. Simplified Dataset: Replaced dataset logic with dummy tensors (Issue persists).
  4. Force CPU Execution: Used CPU tensors only (no .to(torch::kCUDA)). (Issue persists).
  5. Checked .dll Placement: All necessary .dll files are in the same directory as the executable. (Issue persists).
  6. Reinstalled libtorch, cuda and visual studio tools (Issue persists).
  7. Access DataLoader Without Loop (Issue persists):

auto iterator = data_loader->begin();
auto batch = *iterator; // FAILS HERE

  8. Ran small tests to check whether CUDA was available and whether threading produced errors; all of them worked fine:

#include <iostream>
#include <thread>
#include <torch/torch.h>

void cuda_thread_test(int thread_id) {
    try {
        torch::Tensor tensor = torch::ones({100, 100}, torch::kCUDA);

        tensor = tensor * 2;

        std::cout << "Thread " << thread_id << ": Tensor sum = " << tensor.sum().item<float>() << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Thread " << thread_id << ": Exception: " << e.what() << std::endl;
    }
}

int main() {
    if (!torch::cuda::is_available()) {
        std::cerr << "CUDA is not available!" << std::endl;
        return -1;
    }

    std::cout << "CUDA is available!" << std::endl;

    const int num_threads = 4;
    std::thread threads[num_threads];

    for (int i = 0; i < num_threads; ++i) {
        threads[i] = std::thread(cuda_thread_test, i);
    }

    for (int i = 0; i < num_threads; ++i) {
        threads[i].join();
    }

    std::cout << "All threads completed successfully." << std::endl;
    return 0;
}

Questions

  1. What could cause a crash during the iteration of a DataLoader, considering the extensive testing and discarded possibilities (e.g., threading, dataset logic, CUDA availability)?
  2. Are there additional debugging steps I can take to identify the root cause?
  3. Is this a known issue with Libtorch on Windows?

I have found a solution to the problem I was experiencing with the DataLoader crash during iteration. In case anyone else encounters this issue in the future, here’s what worked for me.

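I was building and running the application in Debug mode, which is what cmake --build picks by default with the Visual Studio generator when no --config is given: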

cmake --build .
./Debug/program.exe

You need to compile and run it in Release mode:

cmake --build . --config Release
./Release/program.exe

If anyone has an explanation of why this works, I would appreciate it.