Can I use share_memory_() and just write to different indices:
tensor = torch.randn(5, 5).share_memory_()
And then from multiple processes:
tensor[process_id] = process_id
Is there a better way involving all_reduce or something? I want non-blocking writes, i.e. plain Hogwild-style writes from multiple processes in parallel.
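Something like this minimal sketch is what I have in mind (the worker function and the mp.spawn setup are just my illustration, not tested code):

import torch
import torch.multiprocessing as mp

def worker(rank, tensor):
    # Each process writes to its own row, no locks, hogwild-style.
    tensor[rank] = rank

if __name__ == "__main__":
    tensor = torch.zeros(4, 5)
    tensor.share_memory_()  # move the underlying storage into shared memory, in place
    mp.spawn(worker, args=(tensor,), nprocs=4, join=True)
    print(tensor)  # expecting each row to contain its process rank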