Is it possible to lazily generate a larger-than-memory random matrix with PyTorch? Ideally, I am looking to generate a 1e9 x 5000 matrix and compute its matrix product with another 5k x 5k matrix.
I had a look at KeOps, but I didn't get very far with it.

Would it be possible to do this in a distributed manner, for example by interfacing somehow with pyspark?

The torch.distributed package can help distribute tensors across multiple nodes but, as of today, you still need to implement the distributed matrix multiplication on your own. If you are looking for features like those in Mesh-TensorFlow, they are not yet available.

This is a very interesting request. Do you mind sharing some details of the application?

Say I want to perform N Monte Carlo simulations, each consisting of the product of a vector of length M (M = 5k) with a square M x M matrix. One way is a very lengthy, seemingly never-ending loop over the simulations. A single matrix multiplication is, however, much faster. This is particularly useful in credit risk, to estimate loss distributions. With numpy you can only use an in-between solution by working in batches, but being able to perform such an operation lazily gives massive advantages. I have currently solved the problem with a combination of xarray and dask, but it would be great to see this in PyTorch as well.
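
For reference, the batched "in-between" approach mentioned above could look roughly like this in PyTorch. This is only a minimal sketch, not a built-in lazy feature: the function name `lazy_random_matmul`, the `chunk_size` and `seed` parameters, and the choice of standard-normal draws are all assumptions made for illustration, and it runs on CPU in a single process. Each chunk of the random matrix is generated, multiplied, and handed to the caller, so the full N x M matrix never exists in memory.

```python
import torch


def lazy_random_matmul(n_rows, weight, chunk_size=1024, seed=0):
    """Yield blocks of (R @ weight) where R is an n_rows x M random matrix.

    Hypothetical helper: R is drawn chunk by chunk from a standard normal
    distribution (an assumption), so only chunk_size rows are held at a time.
    """
    m = weight.shape[0]
    gen = torch.Generator()  # CPU generator, seeded for reproducibility
    gen.manual_seed(seed)
    for start in range(0, n_rows, chunk_size):
        rows = min(chunk_size, n_rows - start)  # last chunk may be smaller
        block = torch.randn(rows, m, generator=gen, dtype=weight.dtype)
        yield block @ weight


# Small stand-in sizes; the thread's real case is N = 1e9, M = 5000.
M = 50
weight = torch.randn(M, M)

# Reduce each block as it arrives instead of storing all results.
total = torch.zeros(M)
count = 0
for out in lazy_random_matmul(10_000, weight, chunk_size=1024):
    total += out.sum(dim=0)
    count += out.shape[0]
column_means = total / count
```

The caller decides what to keep from each block (a running sum here, but a histogram of losses would work the same way), which is what keeps memory bounded by `chunk_size` rather than by N.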