Parallelize parts of the model on a single GPU

I have a model that consists of two parts: an image encoder and a text encoder. I then concatenate the features produced by the two encoders. Currently, my code runs the encoders sequentially during the forward pass. Is there a way to run these two encoders in parallel within a single forward pass?
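A minimal sketch of the setup described above, assuming PyTorch (the post does not name a framework). It uses one common single-GPU technique: launching each encoder on its own CUDA stream so their kernels *may* overlap, then synchronizing before the concatenation. The class name `DualEncoder` and the toy encoders in the usage note are hypothetical placeholders. Whether any real speedup appears depends on the GPU having spare capacity: if one encoder already saturates the device, the streams will largely serialize anyway. On CPU the sketch simply falls back to sequential execution.

```python
import torch
import torch.nn as nn


class DualEncoder(nn.Module):
    """Hypothetical two-branch model: image encoder + text encoder, features concatenated."""

    def __init__(self, image_encoder: nn.Module, text_encoder: nn.Module):
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder

    def forward(self, images: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        if not images.is_cuda:
            # CPU fallback: no streams, run the encoders one after the other.
            img_feat = self.image_encoder(images)
            txt_feat = self.text_encoder(tokens)
            return torch.cat([img_feat, txt_feat], dim=1)

        device = images.device
        current = torch.cuda.current_stream(device)
        # Assumption: creating streams per call keeps the sketch short;
        # a real model might create them once and reuse them.
        img_stream = torch.cuda.Stream(device=device)
        txt_stream = torch.cuda.Stream(device=device)

        # The side streams must not start before the inputs are ready on the current stream.
        img_stream.wait_stream(current)
        txt_stream.wait_stream(current)

        # Kernels enqueued on different streams are allowed to run concurrently.
        with torch.cuda.stream(img_stream):
            img_feat = self.image_encoder(images)
        with torch.cuda.stream(txt_stream):
            txt_feat = self.text_encoder(tokens)

        # Block the current stream until both branches have finished.
        current.wait_stream(img_stream)
        current.wait_stream(txt_stream)
        # Tell the caching allocator these outputs are now used on the current stream.
        img_feat.record_stream(current)
        txt_feat.record_stream(current)
        return torch.cat([img_feat, txt_feat], dim=1)
```

For example, with a toy image encoder `nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))` and a toy text encoder `nn.Sequential(nn.Embedding(100, 8), nn.Flatten(), nn.Linear(8 * 5, 4))`, a batch of two images of shape `(2, 3, 8, 8)` and two token sequences of shape `(2, 5)` yields concatenated features of shape `(2, 20)`. The `wait_stream` calls are what keep the result correct: without them, `torch.cat` could read the features before either encoder has finished writing them.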