RL Training Code Structure
This page provides an overview of the directory layout and the main responsibilities of each module inside the rl_train package.
Use it as a quick reference when you need to modify, debug, or extend the training pipeline.
Entry Points
Script | Purpose |
---|---|
run_sim_minimal.py | Quickly spins up an environment and rolls out random actions to smoke-test the simulation. |
run_train.py | Main training launcher. Reads a JSON config, constructs environments, and starts Stable-Baselines3 PPO training. |
run_policy_eval.py | Replay a trained policy in evaluation mode and generate analysis artefacts. |
All three scripts accept a rich set of CLI flags so that most hyper-parameters can be overridden without editing the JSON config files.
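The override behaviour described above can be illustrated with a minimal sketch. It assumes the config is a nested JSON object and that flags left unset on the command line should not touch the preset. The flag names (--config_file_path, --num_envs, --learning_rate) and the config keys (env_params, ppo_params) are hypothetical and chosen only for illustration; check the actual scripts for the real ones.

```python
import argparse
import json
from typing import Optional, Sequence


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of config with every non-None override applied to its dotted key."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON-like data
    for dotted_key, value in overrides.items():
        if value is None:  # flag was not given on the command line
            continue
        node = merged
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return merged


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # All flag names below are hypothetical.
    parser = argparse.ArgumentParser(description="Launcher sketch")
    parser.add_argument("--config_file_path", required=True)
    parser.add_argument("--num_envs", type=int, default=None)
    parser.add_argument("--learning_rate", type=float, default=None)
    return parser.parse_args(argv)


# Stand-in for the contents of a JSON preset; the key names are assumptions.
base_config = {"env_params": {"num_envs": 4}, "ppo_params": {"learning_rate": 3e-4}}

args = parse_args(["--config_file_path", "preset.json", "--num_envs", "16"])
final_config = apply_overrides(
    base_config,
    {
        "env_params.num_envs": args.num_envs,
        "ppo_params.learning_rate": args.learning_rate,
    },
)
print(final_config)
```

Using None as the default for every overridable flag keeps "not specified" distinct from "explicitly set", so the JSON preset remains the source of truth unless a flag is passed.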
Directory Layout
rl_train/
├── envs/ # Gym / MuJoCo environment definitions
│ ├── myoassist_leg_base.py
│ ├── myoassist_leg_imitation.py
│ └── environment_handler.py
│
├── train/ # Training pipeline (configs, commands, policies)
│ ├── train_configs/ # JSON files that fully specify a training session
│ ├── train_commands/ # Convenience shell commands for long experiments
│ └── policies/ # Custom policy networks
│
├── utils/ # Generic utilities used across training / analysis
│ └── learning_callback.py # Custom SB3 callback for logging & checkpoints
│
├── analyzer/ # Post-training analysis & visualisation
│ ├── gait_analyze.py
│ ├── gait_evaluate.py
│ └── train_analyzer.py
│
├── reference_data/ # Human Mo-cap data used for imitation or evaluation
│ └── short_reference_gait.npz
│
└── results/ # Auto-generated output (checkpoints, logs, videos)
envs/
Home of all MuJoCo-based Gym environments
File | Key Class | Notes |
---|---|---|
myoassist_leg_base.py | MyoAssistLegBase | Base class that wires intrinsic simulation logic, observation construction and reward terms. |
myoassist_leg_imitation.py | MyoAssistLegImitationEnv | Environment for muscle-driven imitation learning (human-only). |
myoassist_leg_imitation_exo.py | MyoAssistLegImitationExoEnv | Variant that adds exoskeleton actuation. |
environment_handler.py | EnvironmentHandler | Factory that instantiates and vectorises envs based on JSON config. |
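As a rough illustration of the factory role described for EnvironmentHandler, the sketch below builds one zero-argument constructor per parallel environment from a config dict. The environment classes here are empty stand-ins that only share names with the real ones, and the registry, the config keys (env_id, num_envs, seed) and make_env_fns are hypothetical. The real handler may be organised quite differently.

```python
from typing import Callable, Dict, List


class MyoAssistLegImitationEnv:  # stand-in, not the real environment class
    def __init__(self, seed: int) -> None:
        self.seed = seed


class MyoAssistLegImitationExoEnv(MyoAssistLegImitationEnv):  # stand-in
    pass


# Hypothetical registry mapping a config string to an environment class.
ENV_REGISTRY: Dict[str, type] = {
    "imitation": MyoAssistLegImitationEnv,
    "imitation_exo": MyoAssistLegImitationExoEnv,
}


def make_env_fns(config: dict) -> List[Callable[[], object]]:
    """Build one zero-argument constructor per parallel environment."""
    env_cls = ENV_REGISTRY[config["env_id"]]
    base_seed = config.get("seed", 0)

    def make_one(rank: int) -> Callable[[], object]:
        # Pass rank as an argument so each closure keeps its own value.
        def _init() -> object:
            return env_cls(seed=base_seed + rank)

        return _init

    return [make_one(rank) for rank in range(config["num_envs"])]


env_fns = make_env_fns({"env_id": "imitation_exo", "num_envs": 3, "seed": 10})

# Instantiated directly here; SB3's SubprocVecEnv accepts such a list of callables.
envs = [fn() for fn in env_fns]
print([type(e).__name__ for e in envs], [e.seed for e in envs])
```

Returning callables instead of instances matters for subprocess-based vectorisation, because each worker process constructs its own environment.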
train/
Launch, configure, and extend PPO training
- train_configs/ – Dozens of ready-made JSON presets. The file name usually describes the experiment (e.g. imitation_tutorial_22_separated_net_partial_obs.json).
- train_commands/ – Helper shell scripts or *.sh bundles so long experiments can be reproduced easily on a cluster.
- policies/ – Custom network architectures. If absent, SB3’s default MLP is used.
utils/
Shared helpers – no training logic inside
File | What it does |
---|---|
learning_callback.py | Saves checkpoints, videos and metrics every N steps. Also handles curriculum switches. |
train_log_handler.py | Small wrapper around loguru to standardise log output across scripts. |
numpy_utils.py | Misc. helper functions for fast array ops. |
data_types.py | Pydantic-style typed dicts used for config validation. |
analyzer/
Post-hoc evaluation & visualisation
The analysis pipeline is modular – run train_analyzer.py to generate plots in results/train_session_*/analyze_results/.
reference_data/
Contains reference gait trajectories (e.g., NPZ files) used for imitation or for computing biomechanical metrics.
Typical Data Flow
- run_train.py loads a JSON config → constructs an EnvironmentHandler.
- The handler creates multiple MyoAssistLegImitationEnv instances and wraps them using SB3’s SubprocVecEnv.
- A PPO policy (custom or default) is initialised and starts learning.
- Every k steps, LearningCallback saves:
  - trained_models/model_<steps>.zip
  - train_log.json
  - preview videos (if flag_rendering is on)
- After training, run run_policy_eval.py to replay checkpoints and kick off analyzer/train_analyzer.py.
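The periodic-save step in the flow above can be sketched without any RL library. PeriodicSaver below is a hypothetical stand-in, not the real LearningCallback: it writes an empty placeholder file where a real callback would save the model, and the log entry fields (steps, mean_reward) are assumptions. Only the output names trained_models/model_<steps>.zip and train_log.json come from this page.

```python
import json
import tempfile
from pathlib import Path


class PeriodicSaver:
    """Stand-in for the periodic part of a training callback."""

    def __init__(self, session_dir: Path, save_every: int) -> None:
        self.session_dir = session_dir
        self.save_every = save_every
        self.log_entries = []
        (session_dir / "trained_models").mkdir(parents=True, exist_ok=True)

    def on_step(self, num_timesteps: int, mean_reward: float) -> bool:
        """Call once per training step; returns True when a checkpoint was written."""
        if num_timesteps % self.save_every != 0:
            return False
        ckpt = self.session_dir / "trained_models" / f"model_{num_timesteps}.zip"
        ckpt.write_bytes(b"")  # placeholder; a real callback would save the model here
        self.log_entries.append({"steps": num_timesteps, "mean_reward": mean_reward})
        log_path = self.session_dir / "train_log.json"
        log_path.write_text(json.dumps(self.log_entries, indent=2))
        return True


session_dir = Path(tempfile.mkdtemp()) / "train_session_demo"
saver = PeriodicSaver(session_dir, save_every=1000)
for step in range(1, 3001):
    saver.on_step(step, mean_reward=step / 3000)

saved = sorted(p.name for p in (session_dir / "trained_models").iterdir())
print(saved)
```

With vectorised environments, the global step counter usually advances by the number of environments on each call, so an exact modulo check like this one can skip checkpoints. A real callback would more likely count calls or compare against the step of the last save.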
Extending the Pipeline
- Add a new terrain – update HfieldManager and reference it in your JSON config.
- Custom reward – subclass MyoAssistLegBase and override _calculate_reward().
- Different algorithm – replace the PPO import in run_train.py with any SB3 algorithm; the callback remains compatible.
- New plots – add a function in analyzer/gait_analyze.py and call it from train_analyzer.py.
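The custom-reward extension point could look roughly like the sketch below. The base class here is a stand-in that only borrows the name MyoAssistLegBase. The real signature and return type of _calculate_reward(), and the attributes forward_velocity and muscle_effort, are assumptions, so check the actual base class before copying this pattern.

```python
class MyoAssistLegBase:  # stand-in base class, not the real implementation
    def __init__(self) -> None:
        self.forward_velocity = 0.0  # hypothetical state attribute
        self.muscle_effort = 0.0  # hypothetical state attribute

    def _calculate_reward(self) -> float:
        # Placeholder default: reward forward progress only.
        return self.forward_velocity

    def step_reward(self) -> float:
        # The base class decides when the reward hook is called.
        return self._calculate_reward()


class EffortPenalisedEnv(MyoAssistLegBase):  # hypothetical subclass
    effort_weight = 0.25  # hypothetical reward weight

    def _calculate_reward(self) -> float:
        # Reuse the base reward and subtract a penalty term.
        base = super()._calculate_reward()
        return base - self.effort_weight * self.muscle_effort


env = EffortPenalisedEnv()
env.forward_velocity = 1.5
env.muscle_effort = 2.0
reward = env.step_reward()
print(reward)
```

Calling super()._calculate_reward() keeps the existing reward terms and layers the new one on top, which tends to be easier to maintain than re-implementing the full reward in the subclass.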