How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pytorch and Pybullet?

I have designed this robot in URDF format and its environment in pybullet. Each leg has a minimum and maximum value of movement.

What reinforcement algorithm will be best to create a walking policy in a simple environment in which a positive reward will be given if it walks in the positive X-axis direction?

The expected output from the policy is an array in the range of (-1, 1) for each joint. The input of the policy is the position of each joint, the center of mass of the body, the difference in height between the floor and the body to see if it has fallen and the movement in the x-axis.


left_front_joint => lower="-0.4" upper=“2.5” id=0

left_front_leg_joint => lower="-0.6" upper=“0.7” id=2

right_front_joint => lower="-2.5" upper=“0.4” id=3

right_front_leg_joint => lower="-0.6" upper=“0.7” id=5

left_back_joint => lower="-2.5" upper=“0.4” id=6

left_back_leg_joint => lower="-0.6" upper=“0.7” id=8

right_back_joint => lower="-0.4" upper=“2.5” id=9

right_back_leg_joint => lower="-0.6" upper=“0.7” id=11

The code above is just a test of the environment with a manual set of movements hardcoded in the robot just to test how it could walk later. The environment is set to real time, but I assume it needs to be in a frame by frame lapse during the policy training.

A video of it can be seen in:

import pybullet as p
import time
import pybullet_data

def moveLeg( robot=None, id=0, position=0, force=1.5  ):
    if(robot is None):

pixelWidth = 1000
pixelHeight = 1000
camTargetPos = [0,0,0]
camDistance = 0.5
pitch = -10.0
upAxisIndex = 2
yaw = 0

physicsClient = p.connect(p.GUI)#or p.DIRECT for non-graphical version
p.setAdditionalSearchPath(pybullet_data.getDataPath()) #optionally
viewMatrix = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
planeId = p.loadURDF("plane.urdf")
cubeStartPos = [0,0,0.05]
cubeStartOrientation = p.getQuaternionFromEuler([0,0,0])
#boxId = p.loadURDF("r2d2.urdf",cubeStartPos, cubeStartOrientation)
boxId = p.loadURDF("src/spider.xml",cubeStartPos, cubeStartOrientation)
# boxId = p.loadURDF("spider_simple.urdf",cubeStartPos, cubeStartOrientation)

toggle = 1


for i in range (10000):

    moveLeg( robot=boxId, id=0,  position= toggle * -2 ) #LEFT_FRONT
    moveLeg( robot=boxId, id=2,  position= toggle * -2 ) #LEFT_FRONT

    moveLeg( robot=boxId, id=3,  position= toggle * -2 ) #RIGHT_FRONT
    moveLeg( robot=boxId, id=5,  position= toggle *  2 ) #RIGHT_FRONT

    moveLeg( robot=boxId, id=6,  position= toggle *  2 ) #LEFT_BACK
    moveLeg( robot=boxId, id=8,  position= toggle * -2 ) #LEFT_BACK

    moveLeg( robot=boxId, id=9,  position= toggle *  2 ) #RIGHT_BACK
    moveLeg( robot=boxId, id=11, position= toggle *  2 ) #RIGHT_BACK
    toggle = toggle * -1

    #viewMatrix        = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
    #projectionMatrix  = [1.0825318098068237, 0.0, 0.0, 0.0, 0.0, 1.732050895690918, 0.0, 0.0, 0.0, 0.0, -1.0002000331878662, -1.0, 0.0, 0.0, -0.020002000033855438, 0.0]
    #img_arr = p.getCameraImage(pixelWidth, pixelHeight, viewMatrix=viewMatrix, projectionMatrix=projectionMatrix, shadow=1,lightDirection=[1,1,1])

cubePos, cubeOrn = p.getBasePositionAndOrientation(boxId)

1 Like