Periodic Skill Discovery¶
- Conference: NeurIPS 2025
- arXiv: 2511.03187
- Code: Available (jonghaepark.github.io/psd)
- Area: Reinforcement Learning / Skill Discovery
- Keywords: Unsupervised skill discovery, periodic behavior, circular latent space, locomotion control, robotics
TL;DR¶
This paper proposes Periodic Skill Discovery (PSD), a framework that maps states onto a circular latent space to naturally encode periodicity, enabling unsupervised discovery of diverse locomotion skills with varying periods.
Background & Motivation¶
Root Cause¶
Background: Unsupervised skill discovery is an important direction in reinforcement learning that aims to learn diverse behaviors without extrinsic rewards. Existing methods, however, overlook a fundamental issue, the periodic nature of many skills:
- Periodicity is ignored: Most methods maximize either the mutual information between states and skills or the traversal distance in the latent space, and neither objective explicitly accounts for periodic structure.
- Locomotion requires periodicity: Many robotic tasks, particularly locomotion, inherently require periodic behaviors at different temporal scales (e.g., walking, running, jumping).
Limitations of Prior Work: Mutual-information-based methods such as DIAYN struggle to naturally discover skills with varying periods.
The core motivation of PSD is to exploit the topological structure of a circular latent space to naturally encode periodicity, thereby discovering locomotion skills with diverse periods.
Paper Goals¶
Method¶
Overall Architecture¶
The PSD framework comprises three core components:
1. Circular latent space encoder: Maps states onto the unit circle.
2. Temporal-distance-aware training: Trains the encoder to capture temporal distance information.
3. Periodic skill policy: Generates behaviors based on periodic representations in the latent space.
Key Designs¶
Circular latent space:
- Maps state \(s\) to an angle \(\phi(s) \in [0, 2\pi)\) on the unit circle \(\mathcal{S}^1\).
- Circular topology naturally supports periodicity: traversing from \(0\) to \(2\pi\) constitutes one complete cycle.
- Different angular velocities correspond to different skill periods.
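One way to picture the circular latent space is the minimal sketch below. It assumes the encoder emits a raw 2-D vector that is projected onto the unit circle and then read off as an angle; this output format and the names `to_unit_circle` and `to_angle` are hypothetical illustrations, not taken from the paper's code.

```python
import math

TWO_PI = 2.0 * math.pi


def to_unit_circle(x, y, eps=1e-8):
    """Project a raw 2-D encoder output onto the unit circle (assumed design)."""
    norm = math.hypot(x, y)
    if norm < eps:
        raise ValueError("output too close to the origin to define an angle")
    return x / norm, y / norm


def to_angle(x, y):
    """Return the angle phi in [0, 2*pi) of the point (x, y)."""
    phi = math.atan2(y, x) % TWO_PI
    # Floating-point rounding can push a tiny negative angle up to exactly
    # 2*pi; fold that case back to 0 so the result stays in [0, 2*pi).
    return 0.0 if phi >= TWO_PI else phi


# Usage: a raw output of (3, 4) lands on the circle at (0.6, 0.8).
cx, cy = to_unit_circle(3.0, 4.0)
phi = to_angle(cx, cy)
```

Normalizing before taking the angle keeps the representation on \(\mathcal{S}^1\) regardless of the raw output's magnitude, which is the property the circular topology relies on.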
Temporal distance encoding:
- The encoder is trained such that the arc-length distance between two points on the circle approximates the temporal distance between the corresponding states.
- This ensures that temporally adjacent states are also adjacent on the circle.
- One complete locomotion cycle corresponds to one full rotation around the circle.
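The arc-length idea can be illustrated with a small sketch. It assumes angles are given in radians and introduces a hypothetical scale factor `rad_per_step` that converts time steps into radians. The squared-error objective is a stand-in chosen for illustration, not the loss used in the paper.

```python
import math

TWO_PI = 2.0 * math.pi


def arc_distance(phi_a, phi_b):
    """Shortest arc length between two angles on the unit circle, in [0, pi]."""
    diff = abs(phi_a - phi_b) % TWO_PI
    return min(diff, TWO_PI - diff)


def temporal_alignment_error(phi_a, phi_b, steps_apart, rad_per_step):
    """Squared gap between latent arc length and the scaled temporal distance.

    Illustrative stand-in objective (assumption), not the paper's loss.
    The temporal target is wrapped onto the circle as well, so two states
    exactly one full cycle apart are expected to coincide in the latent space.
    """
    target = arc_distance(steps_apart * rad_per_step, 0.0)
    return (arc_distance(phi_a, phi_b) - target) ** 2


# Usage: with pi/10 radians per step, states 5 steps apart should sit a
# quarter turn (pi/2) apart, so this pair has near-zero error.
err = temporal_alignment_error(0.0, math.pi / 2, 5, math.pi / 10)
```

Using the wrapped distance rather than the plain difference of angles matters near the seam at \(0\) and \(2\pi\), where two nearly identical phases would otherwise appear almost a full turn apart.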
Skill parameterization:
- Each skill is parameterized by its angular frequency \(\omega\) in the circular latent space.
- Small \(\omega\) → long period, slow motion (e.g., slow walking).
- Large \(\omega\) → short period, fast motion (e.g., rapid running).
- Varying \(\omega\) therefore yields skills with distinct periods.
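The link between angular frequency and period can be made explicit. As a worked illustration, assume the latent phase advances by a constant \(\omega\) radians per environment step; this is a simplifying assumption, and the symbols \(\theta_t\) and \(T\) are introduced here only for illustration:

\[
\theta_t = (\theta_0 + \omega t) \bmod 2\pi, \qquad T = \frac{2\pi}{\omega}.
\]

For instance, \(\omega = \pi/10\) rad/step gives \(T = 20\) steps per cycle, while doubling it to \(\omega = \pi/5\) halves the period to \(T = 10\) steps.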
Pixel observation support:
- The encoder can directly process pixel-level observations.
- No hand-crafted state features are required.
Loss & Training¶
- Temporal distance contrastive loss: Ensures the encoder correctly captures temporal distances.
- Mutual information term: Promotes skill diversity.
- Policy gradient method: Updates the skill policy.
- Staged training: The encoder is trained first, followed by the skill policy.
Key Experimental Results¶
Main Results¶
Skill discovery results across multiple MuJoCo robotic environments:
| Environment | Method | Skill Diversity | Periodicity Coverage | Downstream Task Performance |
|---|---|---|---|---|
| Ant | DIAYN | Moderate | Poor | Baseline |
| Ant | CIC | Good | Poor | Slightly better |
| Ant | PSD | Best | Best | Best |
| HalfCheetah | DIAYN | Moderate | Poor | Baseline |
| HalfCheetah | CIC | Good | Poor | Slightly better |
| HalfCheetah | PSD | Best | Best | Best |
Downstream task (hurdle crossing) performance:
| Method | Ant Hurdle Success Rate | HalfCheetah Hurdle Success Rate |
|---|---|---|
| DIAYN | Low | Low |
| CIC | Moderate | Moderate |
| PSD | High | High |
| PSD + DIAYN (combined) | Highest | Highest |
Ablation Study¶
| Ablation | Variant | Effect |
|---|---|---|
| Latent space topology | Euclidean vs. circular | Circular significantly outperforms Euclidean |
| Temporal distance | With vs. without | Temporal distance encoding is critical |
| Pixel vs. state | Observation type | Remains effective under pixel observations |
| Period range | Narrow vs. wide | Wider range yields more diverse skills |
Key Findings¶
- Circular latent space is naturally suited to the periodic requirements of locomotion tasks.
- Skills discovered by PSD exhibit clearly diverse periods, ranging from slow crawling to fast running.
- Periodic skills significantly outperform non-periodic baselines on downstream tasks such as hurdle crossing.
- PSD is complementary to existing skill discovery methods (e.g., DIAYN), and their combination yields further improvement.
Highlights & Insights¶
- Clear geometric intuition: Circle = periodicity — a concise yet effective design principle.
- Filling a gap: The first systematic exploitation of periodic structure in skill discovery.
- Complementary combination: PSD extends rather than replaces existing methods, expanding the skill repertoire.
- Pixel-level applicability: Requires no hand-crafted features, making it applicable to more general settings.
Limitations & Future Work¶
- The circular latent space is primarily suited to single-frequency periodic motions; modeling multi-frequency composite motions remains unexplored.
- Validation is mainly conducted on locomotion tasks; applicability to other periodic tasks (e.g., manipulation, aerial locomotion) is not examined.
- The staged training procedure could potentially be replaced by more efficient end-to-end training.
- Skill discovery over a continuous spectrum of periods, rather than a discrete set, is not explored.
- Sim-to-real transfer for deployment on physical robots is not addressed.
Related Work & Insights¶
- DIAYN: Unsupervised skill discovery via mutual information maximization.
- CIC: Contrastive intrinsic control.
- DADS: Dynamics-aware skill discovery.
- Spectral RL: Leveraging spectral decomposition to understand state space structure.
- Insight: Incorporating topological or geometric structure into latent space design is a promising research direction.
Rating¶
- Novelty: ⭐⭐⭐⭐ (Encoding periodicity via a circular latent space is an innovative idea)
- Technical Depth: ⭐⭐⭐⭐ (Theoretical motivation is clear and well-grounded)
- Experimental Thoroughness: ⭐⭐⭐⭐ (Multiple environments + downstream tasks + ablations)
- Value: ⭐⭐⭐⭐ (Direct applicability to locomotion control)