Access externally updated tensors on GPU from TorchScript

Hi team,

I have a use case like this. There is a pipeline that updates embeddings on the GPU. Consider standard embedding tables of different dimensions, and let's say there is a pipeline, written in CUDA or otherwise, that updates these tensors in place on the GPU. This is live ingestion.
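To make that concrete, this is roughly the layout I have in mind for the externally owned storage (the names and sizes below are just for illustration):

```python
import torch

# Externally owned storage, conceptually: one table per embedding type,
# each a dense CUDA tensor that the ingestion pipeline updates in place.
doc_embeddings = {
    "title": torch.empty(1_000_000, 128, device="cuda"),  # [num_docs, dim_title]
    "body": torch.empty(1_000_000, 256, device="cuda"),   # [num_docs, dim_body]
}
# The CUDA ingestion pipeline overwrites rows of these tensors as documents arrive.
```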

Now I have a PyTorch model that wants to run certain computations on this data. For example, the model might want to compute a dot product between a query embedding it receives as input and this data. The cardinality of the data could be high; we might be talking about millions of documents on the GPU. I need an efficient way to access this data from within the PyTorch model on every request. Within the model I do batched operations, for example query @ documents.T, where the query and each document are vectors.
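The per-request computation is essentially a batched matmul against the full table, along these lines (a simplified sketch; the names are placeholders):

```python
import torch

def score(query: torch.Tensor, documents: torch.Tensor) -> torch.Tensor:
    """Batched dot product: query is [batch, dim], documents is [num_docs, dim].

    Returns [batch, num_docs] similarity scores.
    """
    return query @ documents.t()

# Example shapes for one request:
query = torch.randn(8, 128, device="cuda")               # batch of 8 query embeddings
documents = torch.randn(1_000_000, 128, device="cuda")   # the live-updated table
scores = score(query, documents)                          # [8, 1_000_000]
```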

Also note that the data is live-ingested, which means that at any point some documents will be marked dirty, and I will need a way to access only the document tensors that are not dirty.
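For the dirty bookkeeping, I picture something like a boolean mask that lives next to the table (again, just a sketch of the idea):

```python
import torch

num_docs, dim = 1_000_000, 128
documents = torch.randn(num_docs, dim, device="cuda")
dirty = torch.zeros(num_docs, dtype=torch.bool, device="cuda")  # set by the ingestion pipeline

def score_clean(query: torch.Tensor) -> torch.Tensor:
    # Option A: score everything, then mask out dirty documents.
    scores = query @ documents.t()                    # [batch, num_docs]
    return scores.masked_fill(dirty, float("-inf"))

    # Option B (if dirty rows are rare): gather the clean rows first.
    # clean_idx = (~dirty).nonzero(as_tuple=True)[0]
    # return query @ documents.index_select(0, clean_idx).t()
```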

What is the best way to go about this? My environment is TorchScript models on NVIDIA A100s. This is an online system, so latency is critical.

Currently I have implemented this by defining update/insert APIs on the model and keeping the storage for the embeddings inside the model itself. However, I would like to externalize this storage.
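For reference, my current approach looks roughly like the module below (heavily simplified; the method names are my own):

```python
import torch
from torch import nn, Tensor

class EmbeddingStore(nn.Module):
    """Current approach: the embedding table lives inside the scripted model."""

    def __init__(self, num_docs: int, dim: int):
        super().__init__()
        # Storage owned by the model itself; this is what I want to externalize.
        self.register_buffer("documents", torch.zeros(num_docs, dim))
        self.register_buffer("dirty", torch.zeros(num_docs, dtype=torch.bool))

    @torch.jit.export
    def upsert(self, indices: Tensor, values: Tensor):
        # Called by the ingestion path to insert/update rows in place.
        self.documents[indices] = values
        self.dirty[indices] = torch.zeros_like(indices, dtype=torch.bool)

    def forward(self, query: Tensor) -> Tensor:
        # Score against the whole table, ignoring dirty documents.
        scores = query @ self.documents.t()           # [batch, num_docs]
        return scores.masked_fill(self.dirty, float("-inf"))

scripted = torch.jit.script(EmbeddingStore(1_000_000, 128)).cuda()
```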

Thanks,