The code above computes the full matrix product of the two embedding matrices (A, B^T), but I only need the diagonal of cos_sim, since I only want the similarity between each title and its matching context (cos_sim[i][i]).

Is there any way to calculate this without using a lot of RAM?

In 142565x1024, I understand that 1024 is the embedding dim. What is 142565?

If it doesn’t fit in memory, try lazy loading (say, load just one pair at a time and compute its similarity score). This will slow down the computation (unless you parallelize it), but it saves RAM.
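
Something like this is what I mean by one pair at a time. It is only a rough sketch: `cosine` and `pairwise_cosine` are names I made up, and I'm assuming the pairs arrive as any iterable of (title_vec, context_vec), for example a generator reading from disk.

```python
import math
from typing import Iterable, Iterator, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pairwise_cosine(pairs: Iterable[Tuple[Vector, Vector]]) -> Iterator[float]:
    """Yield cos_sim[i][i] lazily; only one pair is held in memory at a time."""
    for title_vec, context_vec in pairs:
        yield cosine(title_vec, context_vec)


# Toy usage with two 2-d pairs; real embeddings would be streamed in instead.
toy_pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
scores = list(pairwise_cosine(toy_pairs))
```

Since the pairs are consumed lazily, memory stays at roughly one pair no matter how many rows there are.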

Makes sense.
Don’t do the matrix multiplication; instead, use lazy loading with a for loop (or yield). This gives up vectorization, and with it the large memory footprint, but it increases the computation time.
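
A possible middle ground, in case a pure per-pair loop is too slow: the memory problem comes from the full N x N result (for N = 142565 that would be on the order of 80 GB in float32, if my arithmetic is right), not from vectorization as such. The diagonal is just the row-wise dot product of matching rows, so it can be computed in chunks. This is a sketch assuming NumPy and two arrays of the same shape; `diagonal_cosine`, `chunk_size` and `eps` are my own names, not anything from the code above.

```python
import numpy as np


def diagonal_cosine(titles: np.ndarray, contexts: np.ndarray,
                    chunk_size: int = 4096, eps: float = 1e-12) -> np.ndarray:
    """Return cos_sim[i][i] for every row without building the N x N matrix."""
    if titles.shape != contexts.shape:
        raise ValueError("titles and contexts must have the same shape")
    n = titles.shape[0]
    out = np.empty(n, dtype=np.float64)
    for start in range(0, n, chunk_size):
        a = titles[start:start + chunk_size]
        b = contexts[start:start + chunk_size]
        # Row-wise dot products: one value per matching (title, context) pair.
        dot = np.einsum("ij,ij->i", a, b)
        norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        # eps guards against division by zero for all-zero rows.
        out[start:start + chunk_size] = dot / np.maximum(norms, eps)
    return out


# Toy usage with random data standing in for the real embeddings.
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 8))
B = rng.standard_normal((10, 8))
diag = diagonal_cosine(A, B, chunk_size=3)
```

Each chunk only needs a few arrays of size chunk_size x dim, so the peak memory is set by `chunk_size` rather than by N.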