I would like to use model parallelism because my model is too large to fit on a single GPU. According to the "Single-Machine Model Parallel Best Practices" tutorial (PyTorch Tutorials 2.0.0+cu117 documentation), these model parallelism techniques require me to modify the model class. Is there any way to parallelize such models directly, treating them as black boxes, without modifying their class? Thanks.
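
For context, this is roughly the kind of class modification I mean. It is a minimal sketch of the manual pattern, not the tutorial's exact code: `ToyModelParallel` and `pick_devices` are hypothetical names, the layer sizes are arbitrary, and the CPU fallback is only an assumption added so the snippet runs on a machine without two GPUs.

```python
import torch
import torch.nn as nn


def pick_devices():
    # Use two GPUs when available; otherwise fall back to CPU
    # so the sketch still runs (hypothetical helper, for illustration only).
    if torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    return torch.device("cpu"), torch.device("cpu")


class ToyModelParallel(nn.Module):
    """Hypothetical two-layer model whose layers live on different devices."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # Device placement is hard-coded inside the class ...
        self.net1 = nn.Linear(10, 10).to(dev0)
        self.relu = nn.ReLU()
        self.net2 = nn.Linear(10, 5).to(dev1)

    def forward(self, x):
        # ... and forward() has to move activations between devices by hand.
        x = self.relu(self.net1(x.to(self.dev0)))
        return self.net2(x.to(self.dev1))


dev0, dev1 = pick_devices()
model = ToyModelParallel(dev0, dev1)
out = model(torch.randn(20, 10))  # output ends up on dev1
```

Both `__init__` and `forward` have to know about the devices, which is exactly the change I would like to avoid making to an existing model class.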