About efficient code

Hello, I’m fairly new here and wanted to ask about the efficiency of the C++ frontend. Maybe someone has a few tips on implementing models and then sending them to the GPU. What are good practices in libtorch, and what should I keep in mind when building models to save a few extra milliseconds?

I’m building a GA network for my environment, which will be placed in GPU memory too. So basically the idea is: one environment and thousands of agents.

Maybe there are some tutorials on using libtorch to its full potential, because that’s the main reason I switched to the C++ frontend.

Thanks