Machine Learning Inference in the Cloud


Is it feasible to run machine learning inference in a distributed manner by assigning different layers of a neural network to separate GPUs, so that each layer (or group of layers) is processed on its own GPU? If so, are there existing implementations or frameworks that support this kind of layer-wise distributed inference?