I am trying to use PyTorch to perform simple calculations across multiple GPUs. I am not training a machine learning model. I’ve also posted this in the distributed forum, but I haven’t received a response to one particular question. Here is the code I have thus far:
    import os

    import pandas as pd
    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    import torch.nn.functional as F


    def calc_cos_sims(rank, world_size):
        # One process per GPU; the rank doubles as the GPU index.
        dist.init_process_group('gloo', rank=rank, world_size=world_size)
        cuda_device = torch.device('cuda:' + str(rank))

        # Each rank loads its own slice of the data.
        data_path = './embed_pairs_df_million_part_' + str(rank) + '.pkl'
        tmp_df = pd.read_pickle(data_path)
        embeds_a_list = [embed_a for embed_a in tmp_df['embeds_a']]
        embeds_b_list = [embed_b for embed_b in tmp_df['embeds_b']]
        embeds_a_tensor = torch.tensor(embeds_a_list, device=cuda_device)
        embeds_b_tensor = torch.tensor(embeds_b_list, device=cuda_device)

        # One similarity value per (embeds_a, embeds_b) row pair.
        cosine_tensor = F.cosine_similarity(embeds_a_tensor, embeds_b_tensor)


    def main():
        world_size = 4  # since I have 4 GPUs on a single machine
        # Assumed rendezvous settings; the default env:// init needs both set.
        os.environ.setdefault('MASTER_ADDR', 'localhost')
        os.environ.setdefault('MASTER_PORT', '29500')
        mp.spawn(calc_cos_sims, args=(world_size,), nprocs=world_size, join=True)


    if __name__ == '__main__':
        main()
Basically, the code calculates the cosine similarity between pairs of embeddings. I have 4 GPUs available and have split my data into 4 slices, one per GPU.
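To make the per-row computation concrete, here is a small sketch that runs `F.cosine_similarity` on two tiny CPU tensors and compares the result with the formula written out by hand. The tensor values and the helper name `manual_cosine` are made up for illustration and are not taken from the actual data.

```python
import torch
import torch.nn.functional as F


def manual_cosine(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Row-wise cosine similarity written out by hand (hypothetical helper)."""
    dot = (a * b).sum(dim=1)
    norms = a.norm(dim=1) * b.norm(dim=1)
    return dot / norms


# Made-up embeddings: 3 pairs of 2-dimensional vectors.
embeds_a = torch.tensor([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
embeds_b = torch.tensor([[1.0, 0.0], [1.0, -1.0], [0.0, -3.0]])

# dim=1 is the default, so inputs of shape (3, 2) give one value per row pair.
cosine_tensor = F.cosine_similarity(embeds_a, embeds_b)
```

Each slice therefore yields a 1-D tensor with one entry per embedding pair, which is the piece that has to be combined across the 4 processes.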
It was recommended that I use the PyTorch collective API to aggregate the results. I read through the documentation, but I’m not entirely sure how to implement it. How would that be done in this case, or is there a better way to do all of this? I’d like to be able to save the aggregated results to a file or have them available for use later in my program.
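For illustration only, the following is a minimal sketch of one way the aggregation might look with `dist.all_gather`, not a definitive answer. It makes several assumptions: random CPU tensors stand in for the real slices, every slice has the same length, and the function name `gather_cos_sims` plus the `init_file` and `out_path` arguments are hypothetical. As far as I understand, the gloo backend supports only some collectives on CUDA tensors, so real GPU results would likely need `.cpu()` before gathering (or the nccl backend instead).

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F


def gather_cos_sims(rank, world_size, init_file, out_path):
    # File-based rendezvous on a single machine (assumption), so no port is needed.
    dist.init_process_group(
        "gloo", init_method=f"file://{init_file}", rank=rank, world_size=world_size
    )

    # Stand-in for one data slice: 5 pairs of 8-dimensional embeddings.
    gen = torch.Generator().manual_seed(rank)
    embeds_a = torch.randn(5, 8, generator=gen)
    embeds_b = torch.randn(5, 8, generator=gen)
    local = F.cosine_similarity(embeds_a, embeds_b)

    # all_gather fills one buffer per rank; all slices must share one shape here.
    buffers = [torch.empty_like(local) for _ in range(world_size)]
    dist.all_gather(buffers, local)

    # Every rank now holds all results; only rank 0 writes them to disk.
    if rank == 0:
        torch.save(torch.cat(buffers), out_path)

    dist.destroy_process_group()
```

This would be launched the same way as the code above, for example `mp.spawn(gather_cos_sims, args=(world_size, init_file, out_path), nprocs=world_size)`, where `init_file` is a path that does not exist yet. If the slices differ in length, `dist.all_gather_object` or padding to a common length may be needed instead.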
I welcome any feedback about potential improvements. Thank you in advance!