Reinforcement Learning

MyoAssist’s reinforcement learning (RL) pipeline is built on top of Stable-Baselines3 (SB3) PPO and a set of custom MuJoCo environments that simulate human–exoskeleton interaction. This page gives you a bird’s-eye view of how everything fits together and where to find more information.

[Demo videos: flat-terrain replay, rough-terrain replay, speed-control replay]

Reinforcement learning (RL) is a machine learning paradigm where an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards. In the context of MyoAssist, RL is used to train control policies for human–exoskeleton systems within MuJoCo simulation environments.

Reinforcement Learning Overview

Observation Space:
In our environments, the agent receives observations that include:

  • Joint angles
  • Joint velocities
  • Muscle activations
  • Sensor data (such as ground contact and force sensors)

Action Space:
The agent outputs actions that control (see the sketch after this list):

  • Muscle activations (for the human actor network)
  • Exoskeleton control values (for the exoskeleton actor network)
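
As a rough illustration of how one flat action vector serves two actor networks, the sketch below splits it into muscle and exoskeleton segments. The dimensions and the helper function are hypothetical, not MyoAssist's actual API (the 22 echoes the "22" in the tutorial config name, not a confirmed count):

    import numpy as np

    # Hypothetical sizes for illustration; real counts depend on the model
    N_MUSCLES = 22  # human muscle actuators
    N_EXO = 2       # exoskeleton control values

    def split_action(action: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Split a flat action vector into its two actor groups."""
        assert action.shape == (N_MUSCLES + N_EXO,)
        muscle_activations = action[:N_MUSCLES]  # consumed by the human actor
        exo_controls = action[N_MUSCLES:]        # consumed by the exoskeleton actor
        return muscle_activations, exo_controls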


Training Workflow

  1. Define a config – start from an existing JSON preset or create one from scratch (see the sketch after this list).
  2. Launch training
    python rl_train/run_train.py --config_file_path rl_train/train_configs/my_config.json
    
  3. Monitor progress – logs and results are written to results/train_session_*.
  4. Evaluate the policy
    python rl_train/run_policy_eval.py results/train_session_<timestamp>
    
  5. Analyze results – automatic plots and gait metrics are saved under analyze_results/.
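
For orientation only, a training config is a JSON file along these lines. The nesting below is a guess that reuses parameter names appearing in this guide; the real presets under rl_train/train_configs/ define many more fields (see the RL Configuration section for the actual schema):

    {
      "num_envs": 32,
      "n_steps": 512,
      "env_params": {
        "prev_trained_policy_path": null
      },
      "evaluate_param_list": []
    }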

Key Features

  • Multi-Actor Support – Separate networks for human muscles and exoskeleton actuators (see Network Index Handler).
  • Terrain Curriculum – Train on a progression of terrains from flat to rough (Terrain Types).
  • Reference Motion Imitation – Optional imitation reward using ground-truth gait trajectories.
  • Realtime Evaluation – Run policies in realtime with --flag_realtime_evaluate.

Getting Started

This guide shows you the fastest way to test the environment and run training with the MyoAssist RL system.

RL Training Entry Points

Here is a quick overview of the main entry-point scripts in the rl_train folder:

  • run_sim_minimal.py – The simplest way to create and test a MyoAssist RL environment. No training, just environment creation and random actions.
  • run_train.py – The main entry point for running RL training sessions. Loads the configuration, sets up environments, and starts training.
  • run_policy_eval.py – The entry point for evaluating and analyzing trained policies. Useful for testing policy performance and generating analysis results.

Quick Test Commands

1. Environment Creation Example

See how to create a simulation environment and run it for 150 frames (5 seconds):

  • windows:
    python rl_train/run_sim_minimal.py
    
  • mac:
    mjpython rl_train/run_sim_minimal.py
    

Note: If you need the MuJoCo visualizer on macOS, simply use mjpython instead of python to run your script.
You do not need to install anything extra; just change the command.

Note:
If you see the error message ModuleNotFoundError: No module named 'flatten_dict', simply run the command again. This will usually resolve the problem automatically.

[Screenshot: result of run_sim_minimal.py]

What this does:

  • Shows an example of creating a Gym-wrapped MuJoCo simulation environment (a sketch follows below)
  • No actual training, just an environment-creation example

See also: Terminated vs Truncated – an in-depth explanation of the terminated and truncated values in Gymnasium’s Env.step API.
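
The loop below is a minimal Gymnasium sketch of what such an environment test does. The environment id is a stand-in (MyoAssist's actual environment class and registration name may differ), but the reset/step calls and the terminated/truncated handling follow Gymnasium's documented Env.step contract:

    import gymnasium as gym

    # Stand-in environment; substitute the MyoAssist environment construction
    env = gym.make("Humanoid-v4")

    obs, info = env.reset()
    for _ in range(150):  # 150 frames, i.e. ~5 seconds at 30 frames per second
        action = env.action_space.sample()  # random actions, no trained policy
        obs, reward, terminated, truncated, info = env.step(action)
        if terminated or truncated:
            # terminated: the MDP reached a terminal state (e.g., the model fell)
            # truncated: a limit outside the MDP was hit (e.g., a time limit)
            obs, info = env.reset()
    env.close()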

2. Quick Training Test

Run a minimal training session to verify everything works:

python rl_train/run_train.py --config_file_path rl_train/train/train_configs/test.json --flag_rendering

What this does:

  • Runs actual reinforcement learning training
  • Trains for only a few timesteps
  • Uses 1 environment (minimal resource usage)
  • Enables rendering so you can watch the simulation
  • Logs results after every rollout (4 steps) for immediate feedback

3. Check Results

After training, check the results folder:

# Results location
rl_train/results/train_session_[date-time]/

[Screenshot: training session result example]

What you’ll find:

  • analyze_results_[timesteps]_[evaluate_number]: Training analysis results
  • session_config.json: Configuration used for this training
  • train_log.json: Training log data
  • trained_models/: Trained models (.zip) saved at each log interval; these can be used for evaluation or transfer learning (see the loading sketch below)
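
Because training is built on Stable-Baselines3 PPO, a saved checkpoint can be inspected with SB3's standard loading API. The path below is a placeholder, and a policy that uses the project's custom network classes may require those classes to be importable when loading:

    from stable_baselines3 import PPO

    # SB3 resolves the checkpoint path with or without the .zip extension
    model = PPO.load(
        "rl_train/results/train_session_YYYYMMDD-HHMMSS/trained_models/model_1000000"
    )
    # The loaded policy can then predict actions from observations:
    # action, _states = model.predict(observation, deterministic=True)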

Full Training (When Ready)

Once you’ve verified everything works, run full training:

python rl_train/run_train.py --config_file_path rl_train/train/train_configs/imitation_tutorial_22_separated_net_partial_obs.json

This file is the default example configuration we provide.
For more details, see the RL Configuration section.

Note:
The provided config sets num_envs to 32.
Depending on your PC’s capability, try lowering this to 4, 8, or 16.
You should also adjust n_steps accordingly.
For example, if you use num_envs=16 (half of 32), you should double n_steps to keep the total batch size the same, as the arithmetic below shows.
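
In PPO, the number of samples collected per rollout is num_envs x n_steps, so halving one and doubling the other keeps the batch constant. The n_steps values here are illustrative; check your config for the actual default:

    num_envs = 32, n_steps = 512   ->  32 x 512  = 16384 samples per rollout
    num_envs = 16, n_steps = 1024  ->  16 x 1024 = 16384 samples per rollout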

Policy Evaluation

Test a trained model:

python rl_train/run_policy_eval.py [path/to/trainsession/folder]

Example (evaluating with a pretrained model we provide):

python rl_train/run_policy_eval.py docs/assets/tutorial_rl_models/train_session_20250728-161129_tutorial_partial_obs

After training, an analyze_results folder will be created inside your train_session directory.
This folder contains various plots and videos that visualize your agent’s performance.

  • Where to find:
    rl_train/results/train_session_[date-time]/analyze_results/
    
  • What’s inside:
    • Multiple plots (reward curves, kinematics, etc.)
    • Videos

    [Screenshot: evaluation result]

The parameters used for evaluation and analysis (such as which plots/videos are generated) are controlled by the evaluate_param_list in your session_config.json file.

For more details on how to customize these parameters, see the RL Configuration section.

Transfer Learning

To continue training from a pretrained model, point the training run at the existing checkpoint:

python rl_train/run_train.py --config_file_path [path/to/transfer_learning/config.json] --config.env_params.prev_trained_policy_path [path/to/pretrained_model]

Alternatively, you can set env_params.prev_trained_policy_path directly in the config (.json) file, as sketched below.

Note: The [path/to/pretrained_model] should point to a .zip file, but do not include the .zip extension in the path.
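For example, pointing at the pretrained model shipped with the docs (the surrounding structure is abbreviated, and the path omits the .zip extension as noted above):

    {
      "env_params": {
        "prev_trained_policy_path": "docs/assets/tutorial_rl_models/train_session_20250728-161129_tutorial_partial_obs/trained_models/model_19939328"
      }
    }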

Realtime Policy Running

You can run a trained policy in realtime simulation:


  • windows:
    python rl_train/run_train.py --config_file_path [path/to/config.json] --config.env_params.prev_trained_policy_path [path/to/model_file] --flag_realtime_evaluate
    
  • mac:
    mjpython rl_train/run_train.py --config_file_path [path/to/config.json] --config.env_params.prev_trained_policy_path [path/to/model_file] --flag_realtime_evaluate
    

Parameters:

  • [path/to/config.json]: Path to the session_config.json file in the train_session folder
  • [path/to/model_file]: Path to the model file (.zip) without the extension. It is located in the trained_models folder

[Screenshot: trained model files]

Example (evaluating with a pretrained model we provide):

  • windows:
    python rl_train/run_train.py --config_file_path docs/assets/tutorial_rl_models/train_session_20250728-161129_tutorial_partial_obs/session_config.json --config.env_params.prev_trained_policy_path docs/assets/tutorial_rl_models/train_session_20250728-161129_tutorial_partial_obs/trained_models/model_19939328 --flag_realtime_evaluate
    
  • mac:
    mjpython rl_train/run_train.py --config_file_path docs/assets/tutorial_rl_models/train_session_20250728-161129_tutorial_partial_obs/session_config.json --config.env_params.prev_trained_policy_path docs/assets/tutorial_rl_models/train_session_20250728-161129_tutorial_partial_obs/trained_models/model_19939328 --flag_realtime_evaluate
    
