New RL library, for PyTorch!

Greetings everyone! I am happy to announce that my RL library, Machin, designed for PyTorch, is close to its first public debut after several months of hard development!

Machin is designed with the elegant torch style in mind, while aiming to cover most of the functionality provided by Ray. It is hosted on GitHub at iffiX/machin: a reinforcement learning library (framework) designed for PyTorch that implements DQN, DDPG, A2C, PPO, SAC, MADDPG, A3C, APEX, IMPALA ...

Currently, Machin is at this development stage:

alpha-1: laying down a framework for all code
alpha-2: finish testing all modules
alpha-3: clean and update old examples
alpha-3.5: add architecture docs, tutorial docs, etc.
beta-1: public tests, collect feedback and refine the design    <- Machin is here
first official release!

Algorithms

Machin is able to support a variety of RL algorithms, including:

Single agent algorithms:

Multi-agent algorithms:

Massively parallel algorithms:

Enhancements:

Algorithms to be supported:

Parallel

Machin is capable of:

  1. Distributed & asynchronous training: provides implementations of parameter servers and gradient reduction servers.
  2. Advanced RPC: role-based launching, service registration, and resource sharing, built on top of torch RPC.
  3. Process-level parallelization, based on enhanced Processes, Threads, Pools, and Events.
  4. Model-level parallelization: splitting (not yet implemented) and assigning model shards to devices by heuristic.
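
For readers new to the idea in item 1, here is a minimal sketch of asynchronous training against a parameter server. This is not Machin's API: `ParameterServer`, `pull`, `push_gradient`, and `worker` are hypothetical names made up for illustration, and plain Python threads stand in for the RPC-connected processes a real setup would use.

```python
import threading


class ParameterServer:
    """Hypothetical in-process parameter server (illustration only)."""

    def __init__(self, params, lr=0.1):
        self._params = list(params)
        self._lr = lr
        self._lock = threading.Lock()
        self.version = 0  # number of gradient updates applied so far

    def pull(self):
        # Workers fetch a snapshot of the current parameters.
        with self._lock:
            return list(self._params), self.version

    def push_gradient(self, grads):
        # Apply one SGD step as soon as a gradient arrives (no barrier
        # between workers, which is what makes the scheme asynchronous).
        with self._lock:
            self._params = [p - self._lr * g for p, g in zip(self._params, grads)]
            self.version += 1


def worker(server, target, steps):
    # Each worker minimizes sum((p - target)^2), possibly on stale parameters.
    for _ in range(steps):
        params, _ = server.pull()
        grads = [2.0 * (p - t) for p, t in zip(params, target)]
        server.push_gradient(grads)


server = ParameterServer([0.0, 0.0], lr=0.05)
target = [1.0, -2.0]
threads = [
    threading.Thread(target=worker, args=(server, target, 50)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_params, final_version = server.pull()
```

Each of the 4 workers pushes 50 gradients, so the server applies 200 updates in total. Because workers never wait for each other, a gradient may be computed from parameters that are already out of date by the time it is applied; that staleness is the usual trade-off of asynchronous schemes.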

Utilities

  1. Model checking toolset: checks for NaN values, abnormal gradients, etc. in your model, based on hooks.
  2. Logging of videos and images produced during training.
  3. Model serialization, auto-loading, and management toolset. No more worrying about mismatched devices!
  4. Many more.
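
To illustrate what a hook-based check like the one in item 1 can look like, here is a rough sketch using PyTorch's standard `register_forward_hook`. It is not Machin's actual toolset: `add_nan_checks` is a hypothetical helper written for this post, and it only inspects plain tensor outputs.

```python
import torch
import torch.nn as nn


def add_nan_checks(model):
    """Attach forward hooks that raise if a submodule outputs NaN (illustrative helper)."""
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Only plain tensor outputs are checked in this sketch.
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                raise RuntimeError(f"NaN detected in output of module '{name}'")
        return hook

    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is ""
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles


model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
handles = add_nan_checks(model)

clean_out = model(torch.zeros(1, 4))  # finite input passes silently

bad_input = torch.full((1, 4), float("nan"))
try:
    model(bad_input)
    nan_error = None
except RuntimeError as err:
    nan_error = str(err)  # the first Linear layer (named "0") reports the NaN

for h in handles:
    h.remove()  # detach the hooks once checking is done
```

Keeping the returned handles around makes it easy to switch the checks off again, so they only cost time while debugging.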

I wonder if there is any specific need for model-based methods? I am quite interested in this area and would like to hear your suggestions on:

  1. Which model-based algorithms are SOTA and worth supporting?
  2. Is there a good overview paper on this?