Using libtorch with CUDA

Hi, I want to do fault injection on a neural network for a project, and I chose PyTorch C++ (libtorch) as the framework. Since I want to use nvbitFI as the fault injector, I need explicit calls to a CUDA kernel function. The idea is to write only the forward function, since fault injection will be done during the inference phase and not during training. I have already tried using PyTorch's ->to(device) function, but the fault injector does not detect any kernel operation.
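For reference, the ->to(device) attempt was roughly like this (a minimal sketch with a hypothetical stand-in model; the real network is built and moved the same way):

#include <torch/torch.h>

int main() {
  torch::Device device(torch::kCUDA);

  // Hypothetical stand-in module; a full network is moved the same way.
  torch::nn::Conv2d conv(torch::nn::Conv2dOptions(3, 16, 3).padding(1));
  conv->to(device);
  conv->eval();

  torch::NoGradGuard no_grad;  // inference only, no autograd graph needed
  auto input = torch::randn({1, 3, 224, 224}, device);
  auto output = conv->forward(input);
  return 0;
}

This does run on the GPU, but every kernel is launched internally by ATen/cuDNN rather than by my own code.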
I wrote my own class that extends Conv2dImpl and calls a CUDA function I wrote, but when I try to compile it, I get the following error.

The code is the following:

#include <torch/torch.h>

// Implemented in a separate .cu file and compiled with nvcc.
torch::Tensor forward_cuda(torch::Tensor weights, torch::Tensor input, torch::Tensor bias, torch::Tensor output);

class Conv2dCudaImpl : public torch::nn::Conv2dImpl {
public:
  Conv2dCudaImpl(
      int kernelSize, int inChannels, int outChannels,
      int padding, int dilation = 1, int stride = 1,
      int groups = 1, std::string padding_mode = "zeros",
      bool useBias = true)
      : torch::nn::Conv2dImpl(
            torch::nn::Conv2dOptions(inChannels, outChannels, kernelSize)
                .padding(padding)
                .dilation(dilation)
                .stride(stride)) {
    this->kernelSize = kernelSize;
    this->padding = padding;
    this->dilation = dilation;
    this->stride = stride;
    this->inChannels = inChannels;
    this->outChannels = outChannels;

    // Uniform init bound; cast to double, otherwise the integer
    // division truncates to 0 for any realistic channel count.
    max = static_cast<double>(groups) / (inChannels * kernelSize * kernelSize);
    min = -max;

    weight = (max - min) * torch::rand({outChannels, inChannels, kernelSize, kernelSize},
                                       torch::requires_grad(true)) + min;

    if (!useBias) {
      bias = torch::zeros({outChannels});
    } else {
      bias = (max - min) * torch::rand({outChannels}, torch::requires_grad(true)) + min;
    }
  }

  torch::Tensor forward(const torch::Tensor& input) {
    // height of the input matrix, padding, dilation, kernel height, stride
    out_h = (input.size(2) + 2 * padding - dilation * (weight.size(2) - 1) - 1) / stride + 1;

    // width of the input matrix, padding, dilation, kernel width, stride
    out_w = (input.size(3) + 2 * padding - dilation * (weight.size(3) - 1) - 1) / stride + 1;

    // batch size, output channels, height, width; just allocates the output at the right size
    output = torch::randn({input.size(0), outChannels, out_h, out_w});

    return forward_cuda(weight, input, bias, output);
  }

private:
  int kernelSize, padding, dilation, stride, inChannels, outChannels;
  double max, min;
  int64_t out_h, out_w;
  torch::Tensor output;
};
TORCH_MODULE(Conv2dCuda);
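
For context, the CUDA side is along these lines (a simplified sketch, not my exact kernel: a naive direct convolution with one thread per output element, groups = 1 assumed, and stride/padding/dilation hard-coded to the defaults; the real version would receive them from the module):

// conv2d_cuda.cu -- simplified sketch of the explicit kernel launch.
#include <torch/torch.h>
#include <cuda_runtime.h>

// Naive direct convolution: one thread per output element.
__global__ void conv2d_kernel(
    const float* __restrict__ input,   // [N, Cin, Hin, Win]
    const float* __restrict__ weight,  // [Cout, Cin, K, K]
    const float* __restrict__ bias,    // [Cout]
    float* __restrict__ output,        // [N, Cout, Hout, Wout]
    int N, int Cin, int Hin, int Win,
    int Cout, int K, int Hout, int Wout,
    int stride, int padding, int dilation) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= N * Cout * Hout * Wout) return;

  // Decompose the flat index into (n, co, oh, ow).
  int ow = idx % Wout;
  int oh = (idx / Wout) % Hout;
  int co = (idx / (Wout * Hout)) % Cout;
  int n  = idx / (Wout * Hout * Cout);

  float acc = bias[co];
  for (int ci = 0; ci < Cin; ++ci) {
    for (int kh = 0; kh < K; ++kh) {
      int ih = oh * stride - padding + kh * dilation;
      if (ih < 0 || ih >= Hin) continue;
      for (int kw = 0; kw < K; ++kw) {
        int iw = ow * stride - padding + kw * dilation;
        if (iw < 0 || iw >= Win) continue;
        acc += input[((n * Cin + ci) * Hin + ih) * Win + iw] *
               weight[((co * Cin + ci) * K + kh) * K + kw];
      }
    }
  }
  output[idx] = acc;
}

torch::Tensor forward_cuda(torch::Tensor weights, torch::Tensor input,
                           torch::Tensor bias, torch::Tensor output) {
  // Contiguous float32 CUDA tensors, so the raw-pointer indexing above is valid.
  auto w = weights.to(torch::kCUDA).contiguous();
  auto x = input.to(torch::kCUDA).contiguous();
  auto b = bias.to(torch::kCUDA).contiguous();
  auto y = output.to(torch::kCUDA).contiguous();

  const int N = x.size(0), Cin = x.size(1), Hin = x.size(2), Win = x.size(3);
  const int Cout = w.size(0), K = w.size(2);
  const int Hout = y.size(2), Wout = y.size(3);

  // Hard-coded to the defaults for this sketch; the real version would
  // receive the module's stride/padding/dilation as extra arguments.
  const int stride = 1, padding = 0, dilation = 1;

  const int total = N * Cout * Hout * Wout;
  const int threads = 256;
  const int blocks = (total + threads - 1) / threads;

  conv2d_kernel<<<blocks, threads>>>(
      x.data_ptr<float>(), w.data_ptr<float>(), b.data_ptr<float>(),
      y.data_ptr<float>(),
      N, Cin, Hin, Win, Cout, K, Hout, Wout, stride, padding, dilation);
  cudaDeviceSynchronize();  // surface launch/runtime errors immediately
  return y;
}

The point of this explicit <<<blocks, threads>>> launch is that it is a kernel I control, which nvbitFI should be able to instrument.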

I found another code for a UNet here, which I used as a reference alongside the tutorial I found on the PyTorch website, and that code compiles without giving me this error.