I have made a class that instantiates an autoencoder. Now I want a class that instantiates some number of these autoencoders, with the goal of passing the same input to ALL of them in parallel. As you can see, my current implementation just loops over them sequentially rather than running them in parallel:
class KAutoEncoders(nn.Module):
    def __init__(self, k, data_dim, hidden_dim, batch_normalize=False):
        super(KAutoEncoders, self).__init__()
        self.k = k
        self.data_dim = data_dim
        self.hidden_dim = hidden_dim
        self.autoencoders = nn.ModuleList(
            [AutoEncoder(data_dim, hidden_dim, batch_normalize) for _ in range(k)])

    def forward(self, x):
        reconstructions = []
        embeddings = []
        for autoencoder in self.autoencoders:
            embedding, reconstruction = autoencoder(x)
            embeddings.append(embedding)
            reconstructions.append(reconstruction)
        reconstructions = torch.stack(reconstructions, dim=1)
        embeddings = torch.stack(embeddings, dim=1)
        return embeddings, reconstructions
I saw that vmap is a possible solution, but it completely escapes my understanding at the moment (I read the tutorial, and much of the information there requires background I still need to read up on -- I am still new, sadly).
Are there any other alternatives? If not, can anyone suggest how I can use vmap here?
Am I understanding correctly that you want to run multiple instances of your autoencoder class in parallel, for either inference or training? Then you will want to look at torch.nn.DataParallel and/or torch.nn.DistributedDataParallel.
They are fairly easy to initialize and let you run in parallel, but you need multiple GPUs for that. If you want to do it on a single GPU, you can use vmap; I will reply to that in a separate comment.
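For the multi-GPU route, here is a minimal sketch of wrapping a single autoencoder in nn.DataParallel; note that DataParallel replicates one module across the visible GPUs and splits each input batch along dim 0, rather than running k distinct models. The constructor arguments and the dummy batch below are placeholders, not values from your code.

    import torch
    import torch.nn as nn

    # placeholder dimensions; use your real ones
    model = AutoEncoder(data_dim=784, hidden_dim=64, batch_normalize=False)

    if torch.cuda.device_count() > 1:
        # DataParallel replicates the module on each visible GPU, splits every
        # input batch along dim 0, and gathers the outputs on the default device
        model = nn.DataParallel(model)
    model = model.to("cuda")

    x = torch.randn(32, 784, device="cuda")  # dummy batch
    embedding, reconstruction = model(x)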
Now, if you want multiple instances on the same GPU, you can follow the model ensembling tutorial you mentioned, but don't initialize each autoencoder inside the module list.
You specify a number of models, let's say 5, and then initialize all of them on your GPU device in a list, like so:
class AutoEncoder(nn.Module):
    def __init__(self, data_dim, hidden_dim, batch_normalize):
        super().__init__()
        self.data_dim = data_dim
        self.hidden_dim = hidden_dim
        self.batch_normalize = batch_normalize

    def forward(self, x):
        # implement the forward pass for a single one of your autoencoders here
        return embedding, reconstruction
num_models = 5
data_dim, hidden_dim = 784, 64  # example dimensions; use your real ones
data = []  # list or however you store and load your data
device = "cuda"  # or "mps", whatever your device is

# move the models to the device
models = [AutoEncoder(data_dim, hidden_dim, batch_normalize=False).to(device)
          for _ in range(num_models)]

# call each model with a different input
predictions = [model(batch) for model, batch in zip(models, data[:num_models])]

# or this if you want to call each model with the same input
same_input = data[0]  # for example the first element from data
predictions = [model(same_input) for model in models]
Then you can stack the predictions like you did. Once this works, you can easily convert it into a call to torch.vmap(), which removes the for-loop logic. But the above should help you get started.
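For completeness, here is a hedged sketch of what that vmap version could look like, following the torch.func model-ensembling recipe (stack_module_state, functional_call, vmap). It assumes the AutoEncoder class above with its forward implemented; the dimensions, batch size, and batch_normalize=False are illustrative placeholders only.

    import copy
    import torch
    from torch.func import stack_module_state, functional_call, vmap

    num_models, data_dim, hidden_dim = 5, 784, 64  # placeholders
    device = "cuda" if torch.cuda.is_available() else "cpu"

    models = [AutoEncoder(data_dim, hidden_dim, batch_normalize=False).to(device)
              for _ in range(num_models)]

    # stack the parameters and buffers of all models into single tensors
    # with a leading "model" dimension
    params, buffers = stack_module_state(models)

    # a stateless "base" copy on the meta device; vmap calls it with each
    # slice of the stacked parameters
    base_model = copy.deepcopy(models[0]).to("meta")

    def call_single_model(p, b, x):
        return functional_call(base_model, (p, b), (x,))

    same_input = torch.randn(32, data_dim, device=device)  # dummy batch

    # in_dims=(0, 0, None): map over the model dimension of params/buffers
    # and broadcast the same input to every model
    embeddings, reconstructions = vmap(call_single_model, in_dims=(0, 0, None))(
        params, buffers, same_input)

    # embeddings: (num_models, batch, hidden_dim)
    # reconstructions: (num_models, batch, data_dim)

The outputs come back with the model dimension first, so transpose if you prefer the (batch, k, ...) layout your stacking code produces.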
Thank you so much! I haven’t implemented this yet, as I am dealing with other technical details, but I will bookmark this for when the time comes.