Quick way to convert PyTorch CPU state_dicts to JSON

What’s up everyone,

I currently have a distributed reinforcement learning framework built on PyTorch. Profiling my code shows that a major drag on my throughput (network updates per unit time) is converting the parameters (state_dicts) of my networks from the OrderedDict class torch uses into JSON so they can be sent over the network with gRPC.

A code example of what I’m currently doing:

# convert a torch state_dict to a JSON string
import json

for entry in actor_params:
    # move each tensor to CPU and convert it to a nested Python list
    actor_params[entry] = actor_params[entry].detach().cpu().numpy().tolist()
actor_params = json.dumps(actor_params)

where actor_params is just a model state_dict.

To summarize, I just need a fast way to get from a torch CPU state_dict to JSON (i.e., to speed up this block). I do this for six networks sequentially, so that’s where my speed issue is. Any help or ideas are greatly appreciated.

Cheers!

This is actually one of the reasons we built PyTorch RPC: you don’t need to serialize tensors into JSON/strings before sending them over the wire. Native PyTorch RPC serializes actor_params into a binary payload plus a list of tensors, so the tensor storage/contents are kept as-is. The TensorPipe backend can then send those tensors directly to the destination.

RPC API: Distributed RPC Framework — PyTorch master documentation
Toy RL tutorial: Getting Started with Distributed RPC Framework — PyTorch Tutorials 1.8.0 documentation

More tutorials: PyTorch Distributed Overview — PyTorch Tutorials 1.8.0 documentation
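
For concreteness, here is a minimal sketch of what that could look like for the use case above. The process names ("learner"/"worker"), the set_actor_params helper, and the two-process setup are illustrative assumptions, not part of the RPC API, and MASTER_ADDR/MASTER_PORT still need to be set in the environment as usual.

import torch.distributed.rpc as rpc

received_params = {}  # hypothetical container on the worker side

def set_actor_params(state_dict):
    # runs on the worker process; tensors arrive deserialized, no JSON involved
    received_params.update(state_dict)

def run_learner(model):
    rpc.init_rpc("learner", rank=0, world_size=2)
    # ship the state_dict directly; RPC serializes the tensors natively
    rpc.rpc_sync("worker", set_actor_params, args=(model.state_dict(),))
    rpc.shutdown()

def run_worker():
    rpc.init_rpc("worker", rank=1, world_size=2)
    rpc.shutdown()  # blocks until all outstanding RPC work is done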


Thanks so much, I’ll give this a shot and update soon.

Is there any update on Windows support for this package ([RFC] Add Windows support to torch.distributed package · Issue #42095 · pytorch/pytorch · GitHub)? torch.distributed.rpc.is_available() returns False for me on my Windows machine.

Hey @theoryofjake, RPC is not available on Windows yet. Regarding that issue, the MSFT team helped a lot with enabling DDP on Windows, which is now available as a prototype feature in the latest release.

cc @pbelevich

Thanks so much for your reply. I have one more question for you, as I’m now working on a Linux machine:

I have read through your tutorials for the parameter server and the DDP example. I have an application that just needs to share model parameters from a GPU process with a different CPU process. Which makes the most sense: a parameter-server-style application using remote calls, DDP, or sending tensors with send/recv?

Thanks again y’all.

Not PS: Since the goal is just to pass parameters between two processes, a parameter server (PS) might be overkill, as a PS usually serves multiple parallel trainers.

Not DDP: Since you need to synchronize parameters, DDP might not be a good fit either, as DDP synchronizes model gradients.

send/recv vs RPC: this depends on how the program is structured. In general, send/recv is a better fit for single-program multi-data (SPMD) applications, while with RPC there is usually one driver/master that coordinates all computations in the cluster.

  • send/recv: The main difference between send/recv and RPC is that with send/recv both processes need to proceed at the same pace, i.e., when one process calls send, the other must call recv. If this is how your program is designed, then send/recv should be sufficient (though you still need to either pack your model parameters into one tensor and call a single send/recv, or call one send/recv per parameter; see the sketch after this list).
  • RPC: When using RPC, you can program the entire logic on the master, and all other processes just block on the rpc.shutdown() call. You don’t need to worry about things like serializing a model or coordinating multiple processes.
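
For reference, a minimal sketch of the send/recv option, assuming both processes have already called torch.distributed.init_process_group (e.g., with the gloo backend). The function names are made up for illustration, and for simplicity the sketch assumes all parameters share the default float32 dtype.

import torch
import torch.distributed as dist

def send_params(model, dst=1):
    # flatten all parameters into one contiguous CPU tensor and send it
    flat = torch.cat([p.detach().cpu().reshape(-1) for p in model.parameters()])
    dist.send(flat, dst=dst)

def recv_params(model, src=0):
    # receive the flat tensor and copy slices back into the local model
    numel = sum(p.numel() for p in model.parameters())
    flat = torch.empty(numel)
    dist.recv(flat, src=src)
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            p.copy_(flat[offset:offset + p.numel()].view_as(p))
            offset += p.numel()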