I want to use Glow to train a NN on a new chip, and I have a few questions about training. Thanks for helping!
In examples/mnist.cpp, I found that the code uses the same data to train and infer. Is that right?
How do I specify a loss function before training? And how do I know how many steps to train before it succeeds?
I can load a model from Caffe2 or ONNX and use it for inference, but if I want to load a net from Caffe2 or ONNX and train it, what should I do? Should I modify the code of image-classifier to support a train mode?
Fair warning, the Glow team is pretty focused on inference right now, so you’ll likely encounter some rough edges in training. Keep in mind Glow currently has no way to save training results, so you’re limited to training and performing inference in a single process. To answer your questions, though:
In examples/mnist.cpp, I found that the code uses the same data to train and infer. Is that right?
Yes, that’s currently what it’s doing. There’s no reason for it though; you could easily separate the data into train and test sets.
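For instance, since loadMNIST() in that example fills a single images tensor and a single labels tensor, one way to split them is to copy the first N samples into a training tensor and the rest into a test tensor. The sketch below only uses the basic Tensor/Handle API; the split size is made up, and the dimension/element types (size_t vs. dim_t) depend on your Glow version, so adjust as needed:

```cpp
#include "glow/Base/Tensor.h"

#include <cstddef>
#include <vector>

using namespace glow;

// Hypothetical split of the images tensor filled by loadMNIST() in
// examples/mnist.cpp: the first `numTrain` samples become the training set,
// the rest become the held-out test set.
void splitImages(const Tensor &all, Tensor &train, Tensor &test,
                 size_t numTrain) {
  auto dims = all.dims(); // e.g. {numSamples, 28, 28, 1}
  size_t perSample = 1;
  for (size_t i = 1; i < dims.size(); i++)
    perSample *= dims[i];

  std::vector<size_t> trainDims(dims.begin(), dims.end());
  std::vector<size_t> testDims(dims.begin(), dims.end());
  trainDims[0] = numTrain;
  testDims[0] = dims[0] - numTrain;
  train.reset(ElemKind::FloatTy, trainDims);
  test.reset(ElemKind::FloatTy, testDims);

  auto src = all.getHandle<float>();
  auto trainDst = train.getHandle<float>();
  auto testDst = test.getHandle<float>();

  // Samples are contiguous along the outer dimension, so a flat copy works.
  for (size_t i = 0; i < trainDst.size(); i++)
    trainDst.raw(i) = src.raw(i);
  for (size_t i = 0; i < testDst.size(); i++)
    testDst.raw(i) = src.raw(trainDst.size() + i);

  // Do the same for the labels tensor, using its element type.
}
```

You would then run the training loop on the training tensors and only feed the held-out tensors through the inference path.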
How do I specify a loss function before training? And how do I know how many steps to train before it succeeds?
You’ll need to specify the loss function as part of the graph (e.g. SoftMax, CrossEntropyLoss, etc.).
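For reference, here is roughly how the MNIST example attaches a loss to the graph and differentiates it for training. This is modeled on examples/mnist.cpp; the exact signatures (placeholder creation, compile arguments, the label element type, which headers declare TrainingConfig and differentiate) vary between Glow versions, so treat it as a sketch rather than copy-paste code:

```cpp
#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"

using namespace glow;

// Sketch modeled on examples/mnist.cpp; names and signatures are approximate.
// `fc` is the last node of the forward graph (e.g. the final FC layer).
Function *buildTrainingGraph(ExecutionEngine &EE, Function *F, NodeValue fc,
                             size_t minibatchSize) {
  auto &mod = EE.getModule();

  // The loss is simply another node in the graph: the expected labels feed a
  // SoftMax node, which during training behaves as softmax + cross-entropy.
  auto *selected =
      mod.createPlaceholder(ElemKind::Int64ITy, {minibatchSize, 1}, "selected",
                            /*isTrainable=*/false);
  auto *SM = F->createSoftMax("sm", fc, selected);
  F->createSave("result", SM);

  // differentiate() appends the gradient and weight-update nodes described by
  // TrainingConfig; the resulting function is compiled in Train mode.
  TrainingConfig TC;
  TC.learningRate = 0.001;
  TC.batchSize = minibatchSize;
  Function *TF = glow::differentiate(F, TC);
  EE.compile(CompilationMode::Train, TF);
  return TF;
}
```

If you prefer an explicit loss node instead of relying on SoftMax's training behavior, Function also has a createCrossEntropyLoss() builder.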
I can load a model from Caffe2 or ONNX and use it for inference, but if I want to load a net from Caffe2 or ONNX and train it, what should I do? Should I modify the code of image-classifier to support a train mode?
So, we’ve never actually tried training from a Caffe2 or ONNX model. In theory one could make it work, but I’m not sure the loader knows how to create a graph suitable for training. (Specifically, I think it might import all the weights as constants.) You can give this a shot by modifying image-classifier, but I’d be very surprised if it works out of the box.
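If you do want to experiment, the untested sketch below shows the general shape of what modifying image-classifier would involve: import the Caffe2 protos into a Function and then try to differentiate it. The file names, the "data" input name, the shape, and the loader constructor signature are all assumptions (the signature has changed across Glow versions), and the constant-weights issue mentioned above would still need to be solved:

```cpp
#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"
#include "glow/Importer/Caffe2ModelLoader.h"

using namespace glow;

// Untested sketch: import a Caffe2 model and try to differentiate it.
// "predict_net.pb", "init_net.pb", "data", and the input shape are
// placeholders for whatever your model actually uses.
void tryTrainFromCaffe2(ExecutionEngine &EE) {
  Function *F = EE.getModule().createFunction("imported");

  Type inputType(ElemKind::FloatTy, {1, 3, 224, 224});
  Caffe2ModelLoader loader("predict_net.pb", "init_net.pb", {"data"},
                           {&inputType}, *F);

  // A real attempt would also need to find the network's output, attach a
  // loss node to it (as in the previous sketch), and make sure the imported
  // weights end up trainable rather than constant, or differentiate() will
  // have nothing to update.
  TrainingConfig TC;
  Function *TF = glow::differentiate(F, TC);
  EE.compile(CompilationMode::Train, TF);
}
```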
Glow can save weights, but it has no way to save the model, which makes supporting training pretty hard. Is there any way to modify the code to save the trained model?
I found that nearly all AI compilers focus on inference rather than training. Why? Is it too hard to support training? Or does nobody need an AI compiler for training?
If I want to make Glow a backend for an AI framework such as TensorFlow, is that possible?
It absolutely can be done; it would just take some work. We would be more than happy to see improvements here, so please feel free to send PRs! See this issue we have open.
One reason AI compilers might not support training is that inference is just an easier place to start. That doesn’t mean training is too hard, though. We would like to eventually expand our training support.
Yes, as long as you can get it into a format Glow can import, such as ONNX proto. See this post.
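For reference, importing an ONNX proto into a Glow Function looks roughly like this. The file name "model.onnx", the input name, the shape, and the loader/compile signatures are assumptions (they have changed across Glow versions), so treat it as a sketch:

```cpp
#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"
#include "glow/Importer/ONNXModelLoader.h"

using namespace glow;

// Sketch: load a model exported by another framework as an ONNX proto and
// build a Glow Function from it for inference.
void importONNX(ExecutionEngine &EE) {
  Function *F = EE.getModule().createFunction("onnx_model");

  Type inputType(ElemKind::FloatTy, {1, 3, 224, 224});
  ONNXModelLoader loader("model.onnx", {"input"}, {&inputType}, *F);

  // Compile for inference; the framework side (e.g. TensorFlow) would be
  // responsible for exporting its graph to ONNX and feeding inputs /
  // reading outputs.
  EE.compile(CompilationMode::Infer, F);
}
```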