
Massimo #18

Draft
wants to merge 52 commits into base: master
52 commits
52a3b06
Add pybullet sim code
Nate711 Apr 11, 2020
4b2bf5b
update readme with diagrams
Nate711 Apr 11, 2020
9b43cb4
Merge pull request #1 from stanfordroboticsclub/master
fgolemo Aug 4, 2020
d9d765e
refactor into package
fgolemo Aug 4, 2020
cd8c489
updated gi
fgolemo Aug 4, 2020
49ba3c9
cleanup
fgolemo Aug 4, 2020
46e0add
Merge branch 'master' into sim
fgolemo Aug 4, 2020
0f3986d
small fixes
fgolemo Aug 4, 2020
6aaf855
added new pybullet sim
fgolemo Aug 9, 2020
16246ae
added halfcheetah-like walking environment
fgolemo Aug 9, 2020
63d41ed
highlights
fgolemo Aug 9, 2020
fc1b060
bugfix and improved example
fgolemo Aug 9, 2020
63eb854
bugfix: reset now returns observation
fgolemo Aug 13, 2020
b1dff94
scaled down action amount and torque
fgolemo Aug 13, 2020
3e4e3ae
cleanup - we don't need the woofer for what we're doing here and in c…
fgolemo Aug 28, 2020
551bc11
added chair generation
fgolemo Aug 29, 2020
3a79bdd
stairs are now fixed to each other and to the ground
fgolemo Aug 29, 2020
46d249d
added random colors for better visibility
fgolemo Aug 29, 2020
fe0a59b
typo
fgolemo Aug 29, 2020
e01f0ca
added black settings file
fgolemo Aug 29, 2020
f362812
added photo utility
fgolemo Aug 29, 2020
1d4a4b1
added renderer to env
fgolemo Aug 29, 2020
c69486f
added note
fgolemo Aug 29, 2020
1e9948b
added new envs and random position init
Sep 1, 2020
641c5af
added handful new environments, added action smoothing, modularized r…
fgolemo Sep 3, 2020
e1fa1da
bugfix
fgolemo Sep 3, 2020
803e802
added reward for stability
fgolemo Sep 3, 2020
bc43e8d
bugfix
fgolemo Sep 3, 2020
c23511d
bugfixes
fgolemo Sep 3, 2020
a406663
added new scaled-n-smoothed variant environment
fgolemo Sep 3, 2020
c4c3ad0
added support for reward monitoring
fgolemo Sep 9, 2020
0fd8667
added readme for the new environments
fgolemo Sep 9, 2020
b43fdf8
remove pip editable install files
Sep 10, 2020
a11b87b
switched from ssh to https in installation to prevent Docker ssh from…
Sep 10, 2020
74b04a9
added control over action_smoothing and RandomZRot
Sep 15, 2020
1717683
Merge pull request #2 from fgolemo/massimo
optimass Sep 15, 2020
9f4feef
merged & updated from @optimass PR
fgolemo Sep 17, 2020
4dd2dc2
made it so that the penalty for rotation is in range 0-3 not 0-(3*pi^2)
fgolemo Sep 17, 2020
93c7904
added stop_on_flip
fgolemo Sep 17, 2020
fe2499a
made the robot stuff optional
fgolemo Sep 23, 2020
6d19cf3
simple prototype for playing back the recording
fgolemo Sep 23, 2020
d216f8c
added glen playback stuff
fgolemo Sep 25, 2020
010ee74
both simulations side by side
fgolemo Sep 25, 2020
44ff6ca
made imitation learning env into gym env
fgolemo Sep 30, 2020
2503efe
made 48x48 the new default and added better camera position
fgolemo Sep 30, 2020
52cb841
added masking
fgolemo Sep 30, 2020
1148b0e
added incremental env
fgolemo Oct 6, 2020
0cc2c88
added policy
fgolemo Oct 7, 2020
740e244
added tools for recording hard-coded gait
fgolemo Oct 7, 2020
3cd466e
bugfix
fgolemo Oct 8, 2020
96ceb4d
added support to change the episode length
Oct 15, 2020
bc0cc1f
bug in the new episode length
Oct 15, 2020
139 changes: 137 additions & 2 deletions .gitignore
@@ -1,3 +1,138 @@
# Created by .ignore support plugin (hsz.mobi)
### Python template
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
.idea/
wandb/
.DS_Store
*.DS_Store
*.pyc
**/__pycache__
*.hdf5
scripts/*.npz
scripts/*.pkl
21 changes: 0 additions & 21 deletions LICENSE

This file was deleted.

135 changes: 112 additions & 23 deletions README.md
@@ -1,41 +1,130 @@
# Stanford Quadruped

## Overview
This repository hosts the code for Stanford Pupper and Stanford Woofer, Raspberry Pi-based quadruped robots that can trot, walk, and jump.
# Stanford Quadruped Gym

This repo is adapted from the original Stanford Quadruped repo at https://github.com/stanfordroboticsclub/StanfordQuadruped; we've added an additional simulator and gym-compatible environments.

![Pupper CC Max Morse](https://live.staticflickr.com/65535/49614690753_78edca83bc_4k.jpg)

Video of pupper in action: https://youtu.be/NIjodHA78UE

Project page: https://stanfordstudentrobotics.org/pupper
## Installation

We assume you have an environment in which `pip` and `python` point to the Python 3 binaries. This repo was tested on Python 3.6.

```bash
git clone https://github.com/fgolemo/StanfordQuadruped.git
cd StanfordQuadruped
pip install -e .[sim]
```

#### Robot installation

```bash
sudo pip install -e .[robot]
```

## Getting Started

The new simulator lives in `stanford_quad/sim/simulator2.py` in the `PupperSim2` class. You can run the simulator in dev mode with:

python scripts/07-debug-new-simulator.py

## Gym Environments

There are currently 11 environments. All come in two variants: `Headless` and `Graphical`. `Headless` is meant for training and launches the PyBullet sim without a GUI; `Graphical` is meant for local debugging, offers a GUI for inspecting the robot, and is significantly slower.

You can try out one of the walking environments by running:

python scripts/08-run-walker-gym-env.py

### Walking

In all environments, the observations are the same:

**Observation space**:
- 12 leg joint angles, in this order:
    - front right hip
    - front right upper leg
    - front right lower leg
    - front left hip/upper/lower leg
    - back right hip/upper/lower leg
    - back left hip/upper/lower leg
- 3 body orientation angles (Euler angles)
- 2 linear velocity components (only along the plane; we don't care about the z velocity)

The joint angles are normalized to [-1,1], but the orientation and velocity can be arbitrarily large.
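
For orientation, here is a minimal sketch of how the observation vector can be split, assuming exactly the 12 + 3 + 2 ordering listed above (the helper name is ours, not the repo's):

```python
import numpy as np


def split_observation(obs: np.ndarray):
    # obs has 12 + 3 + 2 = 17 entries, in the order documented above
    joints = obs[:12]         # normalized joint angles in [-1, 1]
    orientation = obs[12:15]  # body orientation as Euler angles
    velocity = obs[15:17]     # linear velocity in the ground plane
    return joints, orientation, velocity
```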

The **action space** in both environments is also 12-dimensional (corresponding to the same 12 joints, in the same order as above) and also normalized to [-1,1], but the effect of an action differs between environments (see below).

In both environments, the goal is to walk/run straight forward as fast as possible. The **reward** is therefore calculated as the increase in x position relative to the previous timestep (minus a small penalty term for large action values, the same as in HalfCheetah-v2).
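
As a rough sketch of that reward (the penalty coefficient below is a placeholder; the actual coefficients live in the environment code):

```python
import numpy as np


def walking_reward(x_now, x_prev, action, action_cost=0.01):
    # Forward progress since the previous step...
    forward_progress = x_now - x_prev
    # ...minus a small HalfCheetah-style penalty on large actions.
    control_penalty = action_cost * float(np.square(action).sum())
    return forward_progress - control_penalty
```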

#### Parameters

There are several settings for the walking environment and a handful of combinations of these parameters have been given dedicated names. Here's the list of parameters and their meaning:

- `debug` (bool): If True, this turns ON the GUI. The `[Headless|Graphical]` part of the environment name usually indicates that this is False/True respectively.
- `steps` (int): Length of an episode; default 120, which corresponds to 2 seconds at 60 Hz.
- `relative_action` (bool): If False, action commands correspond directly to the joint positions. If True, the actions are added to the stable resting position at each step, i.e. instead of `robot.move(action)` it's `robot.move(REST_POSE + action)`.
- `action_scaling` (float): By default, the robot has a large movement range and very responsive joints, so the policy can pick the maximum negative joint position in one step and the maximum positive joint position in the next. This causes a lot of jitter. To reduce it, this setting lets you restrict the movement range. Best used in combination with `relative_action`.
- `action_smoothing` (int): Another way to reduce jitter. If this is larger than 1, actions aren't applied to the robot directly; instead they go into a queue of this length, and at each step the mean of the queue is applied to the robot (see the sketch after this list).
- `random_rot` (3-tuple of floats): Specifies the initial random rotation. The three elements correspond to rotations around the x/y/z axes respectively. The rotations are drawn from a normal distribution centered at 0, and this value sets the variance on each axis. Values are in degrees.
- `reward_stability` (float): Specifies the coefficient of the IMU reward term that encourages stability. By default it's 0.
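
Here is a minimal sketch of the `action_smoothing` idea (class and variable names are illustrative, not the repo's internals):

```python
from collections import deque

import numpy as np


class ActionSmoother:
    """Apply the mean of the last `horizon` actions instead of the raw action."""

    def __init__(self, horizon: int, action_dim: int = 12):
        # Pre-fill with zeros so the very first steps are well defined.
        self.queue = deque([np.zeros(action_dim)] * horizon, maxlen=horizon)

    def __call__(self, action: np.ndarray) -> np.ndarray:
        self.queue.append(np.asarray(action))
        return np.mean(self.queue, axis=0)
```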

Based on our previous experiments, the following set of parameters seems to perform best (corresponding to the environment **`Pupper-Walk-Relative-ScaledNSmoothed3-RandomZRot-Headless-v0`**):

```python
params = {
    "debug": False,
    "steps": 120,
    "relative_action": True,
    "action_scaling": 0.3,
    "action_smoothing": 3,
    "random_rot": (0, 0, 15),
    "reward_stability": 0,
}
```
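
For reference, a minimal rollout loop might look like this. It assumes the environments are registered with gym when the `stanford_quad` package is imported; if the registration hook lives elsewhere, adjust the import accordingly:

```python
import gym
import stanford_quad  # noqa: F401 -- assumed to register the Pupper-Walk-* envs

env = gym.make("Pupper-Walk-Relative-ScaledNSmoothed3-RandomZRot-Headless-v0")

obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()  # replace with your policy
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
```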

#### Pupper-Walk-Absolute-[Headless|Graphical]-v0

In this environment, you have full absolute control over all joints, and their resting position is set to 0 (which looks unnatural: the robot stands fully straight with its legs extended). Any action command is sent directly to the joints.

#### Pupper-Walk-Relative-[Headless|Graphical]-v0

In this env, your actions are relative to the resting position (roughly the pose in the image at the top). That means an action of `[0]*12` puts the Pupper into a stable rest. Action clipping is done after summing the current action and the action corresponding to the resting position, which makes the action space asymmetric: e.g. if a given joint's resting position is at `0.7`, then the action range for that joint is `[-1.7, 0.3]`. This is intentional because it allows the Pupper to start from a stable position.
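
A small sketch of that clipping behaviour (the resting pose values here are placeholders, not the real `REST_POSE`):

```python
import numpy as np

REST_POSE = np.full(12, 0.7)  # placeholder resting pose, not the repo's actual values


def relative_to_joint_targets(action: np.ndarray) -> np.ndarray:
    # Sum first, then clip to the joint range, so an all-zero action
    # lands exactly on the resting pose.
    return np.clip(REST_POSE + action, -1.0, 1.0)
```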

#### Pupper-Walk-Relative-ScaledDown_[0.05|0.1|0.15|...|0.5]-[Headless|Graphical]-v0

Similar to `Pupper-Walk-Relative`, but here the actions are multiplied by a factor (given in the environment name) to reduce the range of motion.

#### Pupper-Walk-Relative-ScaledDown-RandomZRot-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-ScaledDown_0.3` but with a random initial z rotation (drawn from a normal distribution centered at 0 with a variance of 15 degrees).

#### Pupper-Walk-Relative-ScaledNSmoothed3-RandomZRot-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-ScaledDown-RandomZRot` but with an additional action smoothing of **3**.

#### Pupper-Walk-Relative-ScaledNSmoothed5-RandomZRot-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-ScaledDown-RandomZRot` but with an additional action smoothing of **5**.

#### Pupper-Walk-Relative-Smoothed5-RandomZRot-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-ScaledNSmoothed5-RandomZRot` but with the actions only averaged over a queue of length 5, not scaled.

#### Pupper-Walk-Relative-RewardStable0.5-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative` but with the additional reward term that encourages a body orientation close to zero. The coefficient for the reward term is 0.5.

#### Pupper-Walk-Relative-RewardStable0.5-ScaledDown3-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-RewardStable0.5` but additionally with the actions scaled by 0.3.

#### Pupper-Walk-Relative-RewardStable0.5-ScaledDown-RandomZRot-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-RewardStable0.5-ScaledDown3` but with an additional random initial rotation around the z axis.

#### Pupper-Walk-Relative-RewardStable0.5-ScaledNSmoothed-RandomZRot-[Headless|Graphical]-v0

Like `Pupper-Walk-Relative-RewardStable0.5-ScaledDown-RandomZRot` but additionally with the actions smoothed over a queue of length 3.

Documentation & build guide: https://pupper.readthedocs.io/en/latest/

## How it works
![Overview diagram](imgs/diagram1.jpg)
The main program is ```run_robot.py``` which is located in this directory. The robot code is run as a loop, with a joystick interface, a controller, and a hardware interface orchestrating the behavior.

The joystick interface is responsible for reading joystick inputs from a UDP socket and converting them into a generic robot ```command``` type. A separate program, ```joystick.py```, publishes these UDP messages and is responsible for reading inputs from the PS4 controller over bluetooth. The controller does the bulk of the work, switching between states (trot, walk, rest, etc.) and generating servo position targets. A detailed model of the controller is shown below. The third component of the code, the hardware interface, converts the position targets from the controller into PWM duty cycles, which it then passes to a Python binding to ```pigpiod```, which then generates PWM signals in software and sends these signals to the motors attached to the Raspberry Pi.
![Controller diagram](imgs/diagram2.jpg)
This diagram shows a breakdown of the robot controller. Inside, you can see four primary components: a gait scheduler (also called gait controller), a stance controller, a swing controller, and an inverse kinematics model.

The gait scheduler is responsible for planning which feet should be on the ground (stance) and which should be moving forward to the next step (swing) at any given time. In a trot for example, the diagonal pairs of legs move in sync and take turns between stance and swing. As shown in the diagram, the gait scheduler can be thought of as a conductor for each leg, switching it between stance and swing as time progresses.

The stance controller controls the feet on the ground, and is actually quite simple. It looks at the desired robot velocity, and then generates a body-relative target velocity for these stance feet that is in the opposite direction as the desired velocity. It also incorporates turning, in which case it rotates the feet relative to the body in the opposite direction as the desired body rotation.

The swing controller picks up the feet that just finished their stance phase, and brings them to their next touchdown location. The touchdown locations are selected so that the foot moves the same distance forward in swing as it does backwards in stance. For example, if in stance phase the feet move backwards at -0.4m/s (to achieve a body velocity of +0.4m/s) and the stance phase is 0.5 seconds long, then we know the feet will have moved backwards -0.20m. The swing controller will then move the feet forwards 0.20m to put the foot back in its starting place. You can imagine that if the swing controller only put the leg forward 0.15m, then every step the foot would lag more and more behind the body by -0.05m.

Both the stance and swing controllers generate target positions for the feet in cartesian coordinates relative to the body center of mass. It's convenient to work in cartesian coordinates for the stance and swing planning, but we now need to convert them to motor angles. This is done by using an inverse kinematics model, which maps between cartesian body coordinates and motor angles. These motor angles, also called joint angles, are then populated into the ```state``` variable and returned by the model.

## How to Build Pupper
Main documentation: https://pupper.readthedocs.io/en/latest/

You can find the bill of materials, pre-made kit purchasing options, assembly instructions, software installation, etc. at this website.

## Help
- Feel free to raise an issue (https://github.com/stanfordroboticsclub/StanfordQuadruped/issues/new/choose) or email me at nathankau [at] stanford [dot] edu
- We also have a Google group set up here: https://groups.google.com/forum/#!forum/stanford-quadrupeds


13 changes: 0 additions & 13 deletions pupper/HardwareConfig.py

This file was deleted.

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -0,0 +1,3 @@
[tool.black]
line-length = 120
target-version = ['py37']
2 changes: 1 addition & 1 deletion robot.service
@@ -5,7 +5,7 @@ After=joystick.service

[Service]
ExecStartPre=-sudo pigpiod
ExecStart=/usr/bin/python3 /home/pi/StanfordQuadruped/run_robot.py
ExecStart=/usr/bin/python3 /home/pi/StanfordQuadruped/scripts/run_robot.py
KillSignal=2
TimeoutStopSec=10
