I am experiencing a crash while iterating over a DataLoader when using Libtorch (the C++ API for PyTorch) on Windows. The program fails consistently as soon as the loop that reads batches from the DataLoader is entered, regardless of dataset complexity, CUDA or CPU execution, or threading settings (num_workers). Here is a minimal reproducible example:
#include <iostream>
#include <torch/torch.h>

class DummyDataset : public torch::data::Dataset<DummyDataset> {
private:
    size_t size_;

public:
    explicit DummyDataset(size_t size) : size_(size) {}

    torch::data::Example<> get(size_t index) override {
        std::cout << "Getting data for index: " << index << std::endl;
        torch::Tensor data = torch::ones({3, 224, 224}, torch::kFloat32);
        torch::Tensor label = torch::tensor(static_cast<int>(index % 2), torch::kInt64);
        return {data, label};
    }

    torch::optional<size_t> size() const override {
        return size_;
    }
};

int main() {
    const size_t dataset_size = 12;
    const int batch_size = 4;
    const int num_workers = 2;

    auto train_dataset = DummyDataset(dataset_size).map(torch::data::transforms::Stack<>());
    auto train_loader = torch::data::make_data_loader(
        std::move(train_dataset),
        torch::data::DataLoaderOptions().batch_size(batch_size).workers(num_workers)
    );

    std::cout << "Iterating through DataLoader." << std::endl;
    for (auto& batch : *train_loader) { // FAILS HERE!
        std::cout << "Batch data size: " << batch.data.sizes()
                  << ", Batch label size: " << batch.target.sizes() << std::endl;
    }

    return 0;
}
The program crashes when entering the for loop of the DataLoader:

for (auto& batch : *train_loader)

The expected output ("Batch data size: ...") is never printed. Instead, a Windows error dialog appears.
When debugging with gdb, I get the following backtrace:
Reading symbols from .\Debug\Planets_and_Moons.exe...
(No debugging symbols found in .\Debug\Planets_and_Moons.exe)
(gdb) run
Starting program: C:\Users\sauls\Desktop\a\build\Debug\Planets_and_Moons.exe
[New Thread 24504.0x5f08]
[New Thread 24504.0x38c4]
[New Thread 24504.0x5ca4]
[New Thread 24504.0x1fec]
Iterating through DataLoader.[New Thread 24504.0x4e00]
gdb: unknown target exception 0xe06d7363 at 0x7ff93afffa4c
Thread 1 received signal ?, Unknown signal.
0x00007ff93afffa4c in RaiseException () from C:\Windows\System32\KernelBase.dll
(gdb) bt
#0 0x00007ff93afffa4c in RaiseException () from C:\Windows\System32\KernelBase.dll
#1 0x00007ff91aa35267 in _CxxThrowException () from C:\Windows\SYSTEM32\vcruntime140.dll
#2 0x00007ff8e7eb6bfe in c10!?torchCheckFail@detail@c10@@YAXPEBD0I0@Z () from C:\Users\sauls\Desktop\a\build\Debug\c10.dll
#3 0x00007ff8e7e68d89 in c10!?torchInternalAssertFail@detail@c10@@YAXPEBD0I0UCompileTimeEmptyString@12@@Z ()
from C:\Users\sauls\Desktop\a\build\Debug\c10.dll
#4 0x00007ff73b9cfdbf in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#5 0x00007ff73b9d4f81 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#6 0x00007ff73b9d4e9d in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#7 0x00007ff73b9d5e98 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#8 0x00007ff73b9d5f55 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#9 0x00007ff73b9c8c0e in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#10 0x00007ff73b997705 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#11 0x00007ff73b9db969 in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#12 0x00007ff73b9db80e in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#13 0x00007ff73b9db6ce in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#14 0x00007ff73b9db9fe in Planets_and_Moons!??1Await@ivalue@c10@@UEAA@XZ ()
#15 0x00007ff93cf7259d in KERNEL32!BaseThreadInitThunk () from C:\Windows\System32\kernel32.dll
#16 0x00007ff93d7eaf38 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
#17 0x0000000000000000 in ?? ()
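Exception code 0xe06d7363 is MSVC's code for a thrown C++ exception, and the backtrace goes through _CxxThrowException and c10::detail::torchCheckFail, so I assume some internal TORCH_CHECK/TORCH_INTERNAL_ASSERT is failing and the resulting c10::Error is never caught. As a next debugging step I intend to wrap the iteration in a try/catch to print the message; a minimal sketch of that (assuming the exception really is a c10::Error, which as far as I know derives from std::exception):

try {
    for (auto& batch : *train_loader) {
        std::cout << "Batch data size: " << batch.data.sizes() << std::endl;
    }
} catch (const c10::Error& e) {
    // TORCH_CHECK / TORCH_INTERNAL_ASSERT failures throw c10::Error; what() should contain the message.
    std::cerr << "c10::Error: " << e.what() << std::endl;
} catch (const std::exception& e) {
    std::cerr << "std::exception: " << e.what() << std::endl;
}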
Environment:
- OS: Windows 11
- Compiler: MSVC 14.29 (Visual Studio 2019)
- Libtorch Version: 2.5.1+cu118 (CUDA) (also tested with 2.5.1+cpu, 2.4.1+cu118 and 2.1.2+cu118)
- CUDA Version: 11.8
- GPU: NVIDIA GeForce RTX 3070 (laptop)
What I have tried:
- Disabled multithreading: set num_workers = 0 to rule out threading issues (issue persists).
- Changed the batch size: set batch_size = 1 (issue persists).
- Simplified the dataset: replaced the dataset logic with dummy tensors (issue persists).
- Forced CPU execution: used CPU tensors only, without any .to(torch::kCUDA) calls (issue persists).
- Checked the .dll placement: all necessary .dll files are in the same directory as the executable (issue persists).
- Reinstalled Libtorch, CUDA and the Visual Studio tools (issue persists).
- Accessed the DataLoader without a loop (issue persists; see also the DataLoader-free sketch after the test program below):
auto iterator = data_loader->begin();
auto batch = *iterator; // FAILS HERE
- Ran small tests to check whether CUDA was available and whether threading produced errors; all of them worked fine:
#include <iostream>
#include <thread>
#include <torch/torch.h>

void cuda_thread_test(int thread_id) {
    try {
        torch::Tensor tensor = torch::ones({100, 100}, torch::kCUDA);
        tensor = tensor * 2;
        std::cout << "Thread " << thread_id << ": Tensor sum = " << tensor.sum().item<float>() << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Thread " << thread_id << ": Exception: " << e.what() << std::endl;
    }
}

int main() {
    if (!torch::cuda::is_available()) {
        std::cerr << "CUDA is not available!" << std::endl;
        return -1;
    }
    std::cout << "CUDA is available!" << std::endl;

    const int num_threads = 4;
    std::thread threads[num_threads];
    for (int i = 0; i < num_threads; ++i) {
        threads[i] = std::thread(cuda_thread_test, i);
    }
    for (int i = 0; i < num_threads; ++i) {
        threads[i].join();
    }

    std::cout << "All threads completed successfully." << std::endl;
    return 0;
}
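Another test I have only sketched so far is to bypass the DataLoader entirely and build a batch by calling get_batch() on the mapped dataset directly, to check whether the dataset plus the Stack transform work on their own. This is hypothetical code I have not run yet; it assumes that get_batch() on the mapped dataset accepts a list of indices and returns a stacked Example<>, which I believe is the case:

#include <iostream>
#include <vector>
#include <torch/torch.h>

// Reuses the DummyDataset class from the minimal example above.
int main() {
    auto dataset = DummyDataset(4).map(torch::data::transforms::Stack<>());
    std::vector<size_t> indices = {0, 1, 2, 3};
    // Build one stacked batch directly, with no DataLoader involved.
    auto batch = dataset.get_batch(indices);
    std::cout << "Data: " << batch.data.sizes()
              << ", Target: " << batch.target.sizes() << std::endl;
    return 0;
}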
Questions
- What could cause a crash during the iteration of a DataLoader, considering the extensive testing and discarded possibilities (e.g., threading, dataset logic, CUDA availability)?
- Are there additional debugging steps I can take to identify the root cause?
- Is this a known issue with Libtorch on Windows?