Stack overflow - conv1d.forward in DLL on CUDA

I ran into a problem calling conv1d.forward from a DLL on CUDA. In the following C++ snippet, line 4 (the Net->forward call, i.e. conv1d.forward) crashes with a stack overflow. The full .cpp file is at the bottom of this message.

auto Net = torch::nn::Conv1d(torch::nn::Conv1dOptions(21, 2, 3));
Net->to(device);
torch::Tensor X = torch::rand({ 5,21,25 }).to(device);
torch::Tensor Y = Net->forward(X);

Having experimented on two PCs with different GPU types, I found that the problem reproduces consistently when all of the following criteria are met:

  1. DLL. The same code runs normally in a console EXE application.
  2. GPU/CUDA. There is no problem running the same code on the CPU.
  3. Convolutional layer. There is no problem with other layer types, e.g. linear.

I tried debugging the DLL in Visual Studio (debugger window screenshot attached). The call stack suggests that the stack overflow happens inside the cudnn_cnn_infer64_8.dll module, which is part of the NVIDIA cuDNN library.

I am not sure whether this error comes from PyTorch or from NVIDIA cuDNN. If anyone has a suggestion on how to resolve it, please respond.


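#include <fstream>  // std::ofstream
#include <torch/torch.h>
#define XLExport extern "C" __declspec(dllexport)
XLExport int __stdcall MLP_DLL();

int __stdcall MLP_DLL()
{
	// Log to a file because a DLL has no console of its own.
	std::ofstream log("output.txt");

	// Default to CPU; switch to CUDA when a GPU is available.
	torch::Device device(torch::kCPU);
	if (torch::cuda::is_available()) {
		log << "Cuda found" << std::endl;
		device = torch::Device(torch::kCUDA);
	}

	// Conv1d with 21 input channels, 2 output channels, kernel size 3.
	auto Net = torch::nn::Conv1d(torch::nn::Conv1dOptions(21, 2, 3));
	Net->to(device);
	// Random input of shape (batch = 5, channels = 21, length = 25).
	torch::Tensor X = torch::rand({ 5,21,25 }).to(device);

	log << " starting forward" << std::endl;

	// This call is where the stack overflow occurs.
	torch::Tensor Y = Net->forward(X);
	log << "Y = " << std::endl;
	log << Y << std::endl;

	return 0;
}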

UPDATE: This was my mistake. I used rundll32.exe to execute my DLL. Apparently, because rundll32 was originally designed to run Windows modules, it is not suitable for running custom-made DLLs.

My DLL runs normally when it is called by something other than rundll32.
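
For anyone who hits the same thing, here is a minimal sketch of one way to call the export from your own console host instead of rundll32. It is an illustration, not the exact host I used. As far as I know, rundll32 documents its own entry point signature (a void CALLBACK function taking HWND, HINSTANCE, LPSTR and int), which MLP_DLL does not match. The helper name run_mlp_dll and the file name MLP.dll are hypothetical, and the sketch assumes a 64-bit build, where the export keeps the undecorated name "MLP_DLL". The non-Windows branch is only a stub so that the file compiles elsewhere.

```cpp
// host.cpp: hypothetical console host that calls MLP_DLL without rundll32.
// Assumes a 64-bit build, where the exported name is the undecorated "MLP_DLL".
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
// Matches the export in the post: extern "C" int __stdcall MLP_DLL().
using MlpFn = int(__stdcall*)();
#else
using MlpFn = int (*)();
#endif

// Loads the DLL at dll_path and calls MLP_DLL once.
// Returns the function's result, -1 if the library cannot be loaded,
// or -2 if the export cannot be found.
int run_mlp_dll(const char* dll_path) {
#ifdef _WIN32
    HMODULE lib = LoadLibraryA(dll_path);
    if (!lib) {
        std::fprintf(stderr, "LoadLibrary failed, error %lu\n", GetLastError());
        return -1;
    }
    auto fn = reinterpret_cast<MlpFn>(GetProcAddress(lib, "MLP_DLL"));
    if (!fn) {
        std::fprintf(stderr, "GetProcAddress failed, error %lu\n", GetLastError());
        FreeLibrary(lib);
        return -2;
    }
    // The library is left loaded on purpose; it is released at process exit.
    return fn();
#else
    (void)dll_path;
    std::fprintf(stderr, "This sketch only does real work on Windows\n");
    return -1;
#endif
}
```

Calling run_mlp_dll("MLP.dll") from main() of an ordinary console EXE is enough to exercise the DLL, provided the torch and CUDA DLLs can be found next to it or on PATH.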