Abstract

Humanoid locomotion has advanced rapidly with deep reinforcement learning (DRL), enabling robust legged traversal over uneven terrain. Yet platforms taller than leg length remain largely out of reach, because current RL training paradigms often converge to jumping-like solutions that are high-impact, push actuators to their torque limits, and are unsafe for real-world deployment. To address this gap, we propose APEX, a system for perceptive, climbing-based high-platform traversal that composes terrain-conditioned behaviors: climb-up and climb-down at vertical edges, walking or crawling on the platform, and stand-up and lie-down for posture reconfiguration. Central to our approach is a generalized ratchet progress reward for learning contact-rich, goal-reaching maneuvers: it tracks best-so-far task progress and penalizes non-improving steps, providing dense yet velocity-free supervision that enables efficient exploration under strong safety regularization. Building on this reward, we train LiDAR-based full-body maneuver policies and reduce the sim-to-real perception gap with a dual strategy: training-time modeling of mapping artifacts and deployment-time filtering and inpainting of elevation maps. Finally, we distill all six skills into a single policy that autonomously selects behaviors and transitions based on local geometry and commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate zero-shot sim-to-real traversal of 0.8 m platforms (over 114% of leg length), with robust adaptation to platform height and initial pose, and smooth, stable multi-skill transitions.
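As a concrete illustration of the ratchet idea, the minimal Python sketch below shows one way a best-so-far progress reward of this kind could be implemented. It is our own illustrative sketch, not the paper's implementation: the class name, the improvement scale, and the stall penalty are assumptions, and the progress signal is assumed to be any scalar measure of task completion that should increase toward the goal (for example, height gained or distance closed to a target pose).

    class RatchetProgressReward:
        """Illustrative sketch of a ratchet-style progress reward.

        Tracks the best progress achieved so far in an episode and rewards
        only improvements over that best value, so the dense signal is
        velocity-free: standing still or regressing yields no positive reward,
        while any new best progress is rewarded in proportion to the gain.
        """

        def __init__(self, improvement_scale: float = 1.0, stall_penalty: float = 0.01):
            self.improvement_scale = improvement_scale  # weight on newly gained progress (hypothetical value)
            self.stall_penalty = stall_penalty          # small penalty for non-improving steps (hypothetical value)
            self.best_progress = 0.0

        def reset(self, initial_progress: float = 0.0) -> None:
            """Call at episode start with the initial progress measure."""
            self.best_progress = initial_progress

        def __call__(self, progress: float) -> float:
            """Return the reward for the current step given the scalar progress."""
            improvement = progress - self.best_progress
            if improvement > 0.0:
                self.best_progress = progress            # ratchet: best-so-far only moves up
                return self.improvement_scale * improvement
            return -self.stall_penalty                   # non-improving step: small penalty

In a full training setup this term would be combined with the safety and regularization rewards mentioned above; because it is keyed to improvements in accumulated progress rather than to instantaneous velocity, it does not push the policy toward high-speed, high-impact motions.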

Video

Our proposed system enables adaptive, perceptive, and context-aware traversal with stability and repeatability.

Robust to large perturbations.

(No robot was harmed during the experiments.)

Climbing up with strong robustness and adaptability to the environment.

BibTeX

@misc{wang2026apexlearningadaptivehighplatform,
  title={APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots}, 
  author={Yikai Wang and Tingxuan Leng and Changyi Lin and Shiqi Liu and Shir Simon and Bingqing Chen and Jonathan Francis and Ding Zhao},
  year={2026},
  eprint={2602.11143},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.11143}, 
}