Avoid loading the Torch library multiple times in CUDA

In my program, I use my own decoder, which decodes video data into CUDA memory. It returns video frames much like OpenCV does. I convert each frame to a torch tensor and pass it to my model.
When I initialize the decoder on its own, it takes about 50 MB of CUDA memory (initializing here means it starts decoding frames). But when I initialize the decoder together with the Torch C++ library, it takes about 500 MB, which is much higher. If I initialize two decoders, usage grows to 2 x 500 MB = 1 GB, which is unexpected, since the Torch library should not be loaded again for every decoder.
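
To make the expectation explicit, here is a minimal sketch of the arithmetic. It assumes the extra 450 MB is a fixed Torch cost that would ideally be paid once per process; that split is my assumption, not a measured breakdown, and the function names are made up for illustration.

```cpp
#include <cstddef>

// Figures reported above, in MB.
constexpr std::size_t kDecoderOnlyMB  = 50;   // decoder initialized without Torch
constexpr std::size_t kDecoderTorchMB = 500;  // decoder initialized with Torch

// Expected total if the Torch overhead (assumed 500 - 50 = 450 MB)
// were paid once and each decoder only added its own 50 MB.
constexpr std::size_t expectedSharedMB(std::size_t decoders) {
    return decoders == 0
        ? 0
        : (kDecoderTorchMB - kDecoderOnlyMB) + decoders * kDecoderOnlyMB;
}

// What I actually observe: every decoder pays the full 500 MB.
constexpr std::size_t observedMB(std::size_t decoders) {
    return decoders * kDecoderTorchMB;
}
```

Under that assumption, two decoders would need 450 + 2 x 50 = 550 MB rather than the 1000 MB I see.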

#include <torch/script.h>
#include <torch/torch.h>
#define kCHANNELS 3

// No-op deleter: the decoder owns the frame buffer, so Torch must not free it.
void deleter(void* /*arg*/) {}
struct Net{

    void    *pReader = NULL;
 
    Net(const char *video_url) {
        void *pCudaContext   = nullptr;
        int iGPUDeviceNumber = 0;

        // The decoder is initialized here; this call has nothing to do with the Torch library.
        // Initializing the decoder alone takes a lot of CUDA memory, although it does not need the Torch library to run.
        pReader = decoder::Init(video_url,
                                NULL,
                                iGPUDeviceNumber,
                                ....................
                                ....................);
    }
   
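    torch::Tensor getFrame() {
        // Read the next decoded frame; the buffer stays owned by the decoder.
        // The types of pOutData and nResult are assumed for this snippet.
        void *pOutData = nullptr;
        auto nResult = decoder::GetVideoFrame(pReader, &pOutData);
        auto options = torch::TensorOptions().dtype(torch::kUInt8).device(torch::kCUDA, 0);
        // Wrap the decoder buffer without copying it.
        auto input_tensor = torch::from_blob(
                pOutData, {1, 360, 640, kCHANNELS}, deleter, options);
        // clone() copies the frame into Torch-managed memory.
        return input_tensor.clone();
    }
};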
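
To narrow down which step consumes the memory, a small helper like the following could be used to compare free device memory before and after each step. This is only a sketch: `memoryConsumedBy` and `FreeBytesQuery` are hypothetical names, and the query is passed in as a callable so the snippet compiles without CUDA. In the real program the callable would wrap `cudaMemGetInfo` from the CUDA runtime.

```cpp
#include <cstddef>
#include <functional>

// Callable that returns the number of free device bytes.
// In the real program this could wrap the CUDA runtime, e.g.:
//   [] { size_t freeB = 0, totalB = 0; cudaMemGetInfo(&freeB, &totalB); return freeB; }
using FreeBytesQuery = std::function<std::size_t()>;

// Runs `step` and returns how many bytes of free memory it consumed.
// Returns 0 if free memory did not decrease.
inline std::size_t memoryConsumedBy(const FreeBytesQuery& freeBytes,
                                    const std::function<void()>& step) {
    const std::size_t before = freeBytes();
    step();
    const std::size_t after = freeBytes();
    return before > after ? before - after : 0;
}
```

For example, wrapping the `decoder::Init` call as the `step` for the first and second decoder would show whether each one really costs the full 500 MB. One caveat: the first CUDA runtime call in a process usually creates the CUDA context, so the very first query already includes that cost in its baseline.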