Thanks @ptrblck for the reply. I noticed that this exception occurs when the Python model is not quantised. The exception I got happened for MobileNetV3 Faster R-CNN and RetinaNet. However, when I build a model with quantisation layers, I do not get an exception.
I will try your suggestion and see how it goes.
**Update**
I did try cloning the tensor but still hit a memory exception in another thread. Maybe iOS does not allow running large models, since the detect function runs without errors for quantised models.
I realised what the problem was: the pointer to imageBuffer had been deallocated by the time the model inference executed. I was passing a pointer to imageBuffer from Swift to C++, and by the time the C++ side dereferenced it, the buffer had already been freed. I fixed it by declaring imageBuffer as a @State property in Swift, so its lifetime is tied to the view rather than to a local scope.
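For anyone hitting the same issue, here is a minimal sketch of the pattern, assuming a SwiftUI view; `runInference` and the buffer size are hypothetical stand-ins for the actual bridged C++ entry point. Holding the buffer as @State keeps it alive for the view's lifetime, and `withUnsafeMutableBufferPointer` guarantees the pointer is valid for the duration of the closure, so the inference call must complete inside it:

```swift
import SwiftUI

// Hypothetical stub for the bridged C++ inference entry point.
func runInference(_ data: UnsafeMutablePointer<Float>?, _ count: Int) {
    // ... forwards to the C++ model code ...
}

struct DetectView: View {
    // @State ties the buffer's lifetime to the view, so it is not
    // deallocated out from under the C++ side mid-inference.
    // (3 * 224 * 224 is an assumed input size, not from the original post.)
    @State private var imageBuffer = [Float](repeating: 0, count: 3 * 224 * 224)

    var body: some View {
        Button("Detect") {
            imageBuffer.withUnsafeMutableBufferPointer { ptr in
                // The pointer is only guaranteed valid inside this closure,
                // so the inference call must not outlive it (e.g. no async
                // hand-off of the raw pointer).
                runInference(ptr.baseAddress, ptr.count)
            }
        }
    }
}
```

Note that if inference runs on another thread, the raw pointer must not escape the closure; in that case copy the data on the C++ side (or clone the tensor) before returning.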