(Computationally) Optimizing highly dynamic graphs

Hi all,

I’ve been working on a system for automatically embedding arbitrary JSON for training with gradient descent via backprop through structure. It works by treating JSON as a tree structure and processing it with a special recursive neural network. However, instead of using just a single recursive function, different functions are applied at different places in the tree.
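To give a rough idea of the approach, here's a heavily simplified sketch (illustrative only, not the actual json2vec code; the per-type cells, leaf encodings, and names are made up for the example):

```python
import json
import torch
import torch.nn as nn

class JSONEmbedder(nn.Module):
    """Embed a JSON tree bottom-up, applying a different learned
    composition function depending on the node type."""
    def __init__(self, dim=64):
        super().__init__()
        self.dim = dim
        # Hypothetical per-type composition functions.
        self.object_cell = nn.GRUCell(dim, dim)  # folds object values
        self.list_cell = nn.GRUCell(dim, dim)    # folds list elements
        self.string_proj = nn.Linear(256, dim)   # byte histogram -> vector
        self.number_proj = nn.Linear(1, dim)

    def forward(self, node):
        if isinstance(node, dict):
            h = torch.zeros(1, self.dim)
            for value in node.values():
                h = self.object_cell(self.forward(value), h)
            return h
        if isinstance(node, list):
            h = torch.zeros(1, self.dim)
            for item in node:
                h = self.list_cell(self.forward(item), h)
            return h
        if isinstance(node, str):
            # Crude leaf encoding: a byte-count histogram of the string.
            hist = torch.zeros(1, 256)
            for byte in node.encode('utf-8'):
                hist[0, byte] += 1
            return torch.tanh(self.string_proj(hist))
        # Numbers, booleans, null.
        x = torch.tensor([[0.0 if node is None else float(node)]])
        return torch.tanh(self.number_proj(x))

embedder = JSONEmbedder()
vec = embedder(json.loads('{"a": [1, 2], "b": "hi"}'))
```

Note the control flow (and hence the compute graph) is different for every input document, which is why the usual batching tricks don't apply directly.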

I have this all working (see: https://github.com/EndingCredits/json2vec) and it performs pretty well from a modelling perspective, but understandably it’s very slow. Obviously the basic model is more complicated than a typical MLP, but I suspect most of the slowdown is due to other overheads.

I was wondering if people with more understanding of bare-metal PyTorch would be able to give some pointers for optimising esoteric models like this. (Of course, if anyone has any other suggestions regarding the implementation, those would also be welcome.)

I’ve had a basic poke around with cProfile, but I didn’t really spot anything big, and I don’t really know what to look for. One thing I did try was moving all the raw data into a single tensor and pointing the individual operations at different slices, but that didn’t seem to have a huge impact.
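For concreteness, the single-tensor attempt looked roughly like this (again an illustrative sketch rather than the repo code; `leaf_values` and `number_proj` are made-up names):

```python
import torch
import torch.nn as nn

# Pack all numeric leaf values into one flat tensor up front, so each
# leaf op reads its own slice (a view) instead of constructing a fresh
# tensor per node during the recursive pass.
leaf_values = torch.tensor([[1.0], [2.0], [3.5]])  # one row per leaf
number_proj = nn.Linear(1, 64)

def embed_leaf(i):
    # leaf_values[i:i+1] is a view into the shared storage, no copy.
    return torch.tanh(number_proj(leaf_values[i:i + 1]))
```

My hope was that avoiding per-node tensor construction would cut the Python-side overhead, but as mentioned it didn't move the needle much.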