In my program I use my own decoder that reads video frames directly into CUDA memory, similar to how OpenCV reads frames. I then convert the frame data to a torch tensor and feed it to my model.
When I initialize the decoder on its own, it takes about 50 MB of CUDA memory. (By "initializing" I mean it starts decoding frames.) But when I initialize the same decoder in a program linked against the torch C++ library, it takes about 500 MB, which is much higher. And if I initialize 2 decoders, it uses 2 x 500 MB = 1 GB, which seems wrong: the torch library should not be loaded again for every decoder instance.
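To pin down which step allocates the memory, one approach is to query free device memory around each initialization step with `cudaMemGetInfo`. This is a minimal sketch, assuming your `decoder::Init` call from below; note that torch initializes its CUDA context and caching allocator lazily on the first CUDA operation, which by itself can account for several hundred MB:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print free/total device memory at a labeled point.
static void reportGpuMemory(const char* label) {
    size_t freeBytes = 0, totalBytes = 0;
    if (cudaMemGetInfo(&freeBytes, &totalBytes) == cudaSuccess) {
        printf("%s: %zu MB free of %zu MB\n",
               label, freeBytes >> 20, totalBytes >> 20);
    }
}

int main() {
    reportGpuMemory("before decoder init");
    // void* pReader = decoder::Init(/* your decoder args */);
    reportGpuMemory("after decoder init");
    // torch::zeros({1}, torch::kCUDA);  // forces torch's lazy CUDA init
    reportGpuMemory("after first torch CUDA call");
    return 0;
}
```

If the big drop happens only after the first torch CUDA call, the 500 MB is torch's per-process CUDA context/allocator rather than the decoder itself.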
#include <torch/script.h>
#include <torch/torch.h>

#define kCHANNELS 3

// No-op deleter: the decoder owns the frame buffer, so the tensor must not free it.
void deleter(void* arg) {}

struct Net {
  void* pReader = NULL;

  Net(const char* video_url) {
    void* pCudaContext = nullptr;
    int iGPUDeviceNumber = 0;
    ///////////// Here I initialize the decoder, which has nothing to do with the torch lib.
    ///////////// Just initializing the decoder takes a lot of CUDA memory,
    ///////////// although it does not need the torch lib to run.
    pReader = decoder::Init(video_url,
                            NULL,
                            iGPUDeviceNumber,
                            ....................
                            ....................);
  }

  torch::Tensor getFrame() {
    //////////////////////// Here I read one frame of data.
    uint8_t* pOutData = 0;
    int nResult = decoder::GetVideoFrame(pReader, &pOutData);
    auto options = torch::TensorOptions().dtype(torch::kUInt8).device(torch::kCUDA, 0);
    auto input_tensor = torch::from_blob(
        pOutData, {1, 360, 640, kCHANNELS}, deleter, options);
    return input_tensor.clone();
  }
};