Periodic Skill Discovery¶

Conference: NeurIPS 2025
arXiv: 2511.03187
Code: Available (jonghaepark.github.io/psd)
Area: Reinforcement Learning / Skill Discovery
Keywords: Unsupervised skill discovery, periodic behavior, circular latent space, locomotion control, robotics

TL;DR¶

This paper proposes Periodic Skill Discovery (PSD), a framework that maps states onto a circular latent space to naturally encode periodicity, enabling unsupervised discovery of diverse locomotion skills with varying periods.

Background & Motivation¶

Root Cause¶

Background: Unsupervised skill discovery is an important direction in reinforcement learning, aiming to learn diverse behaviors without relying on extrinsic rewards. However, existing methods overlook a fundamental issue:

Ignoring the periodic nature of skills: Most methods focus on maximizing the mutual information between states and skills, or on maximizing the distance traveled in the latent space; neither objective explicitly encourages behaviors that repeat with a particular period.

Periodicity requirements in locomotion tasks: Many robotic tasks, particularly locomotion, inherently require periodic behaviors at different temporal scales (e.g., walking, running, jumping).

Limitations of Prior Work: Mutual-information-based methods such as DIAYN struggle to naturally discover skills with varying periods.

The core motivation of PSD is to exploit the topological structure of a circular latent space to naturally encode periodicity, thereby discovering locomotion skills with diverse periods.


Method¶

Overall Architecture¶

The PSD framework comprises three core components:
1. Circular latent space encoder: maps states onto the unit circle.
2. Temporal-distance-aware training: trains the encoder to capture temporal distance information.
3. Periodic skill policy: generates behaviors based on periodic representations in the latent space.

Key Designs¶

Circular latent space:
- Maps state \(s\) to an angle \(\phi(s) \in [0, 2\pi)\) on the unit circle \(\mathcal{S}^1\).
- The circular topology naturally supports periodicity: traversing from \(0\) to \(2\pi\) constitutes one complete cycle.
- Different angular velocities correspond to different skill periods.
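A minimal sketch of the projection onto \(\mathcal{S}^1\), assuming (hypothetically) that the encoder emits an unnormalized 2-D vector whose direction defines \(\phi(s)\). The summary does not describe the encoder head, so `to_angle` and `to_unit_circle` are illustrative names, not the authors' code.

```python
import math

TWO_PI = 2.0 * math.pi


def to_angle(x: float, y: float) -> float:
    """Angle in [0, 2*pi) of a hypothetical 2-D encoder output (x, y).

    The vector's length is irrelevant: only its direction places the state on S^1.
    """
    if x == 0.0 and y == 0.0:
        raise ValueError("the zero vector has no direction on the circle")
    phi = math.atan2(y, x) % TWO_PI  # atan2 returns (-pi, pi]; shift into [0, 2*pi)
    return 0.0 if phi >= TWO_PI else phi  # guard against float rounding up to 2*pi


def to_unit_circle(phi: float) -> tuple[float, float]:
    """Embed an angle as the point (cos phi, sin phi) on the unit circle."""
    return (math.cos(phi), math.sin(phi))


# Illustrative usage
print(to_angle(0.0, 2.0))   # ≈ 1.5708 (pi/2)
print(to_angle(1.0, -1.0))  # ≈ 5.4978 (7*pi/4)
```

Working with the angle (rather than raw coordinates) makes the "one cycle = one rotation" reading direct: any angle plus \(2\pi\) denotes the same point.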

Temporal distance encoding:
- The encoder is trained so that the arc-length distance between two points on the circle approximates the temporal distance between the corresponding states.
- This ensures that temporally adjacent states are also adjacent on the circle.
- One complete locomotion cycle corresponds to one full rotation around the circle.
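The arc-length distance, and a squared-error version of the temporal-distance constraint, can be sketched as follows. The loss is a hypothetical stand-in, not the contrastive loss the paper uses; it assumes a cycle of `period` steps maps to one full rotation, so states \(k\) steps apart should sit \(2\pi k / T\) radians apart, measured along the circle.

```python
import math

TWO_PI = 2.0 * math.pi


def arc_distance(phi_a: float, phi_b: float) -> float:
    """Shortest arc length between two angles on the unit circle, in [0, pi]."""
    d = abs(phi_a - phi_b) % TWO_PI
    return min(d, TWO_PI - d)


def temporal_distance_loss(phi_t: float, phi_tk: float, k: int, period: float) -> float:
    """Hypothetical squared-error surrogate for the temporal-distance constraint.

    Assumes one cycle of `period` steps is one full rotation, so states k steps
    apart should sit 2*pi*k/period radians apart, measured along the circle.
    Not the paper's contrastive loss.
    """
    target = arc_distance(0.0, TWO_PI * k / period)
    return (arc_distance(phi_t, phi_tk) - target) ** 2


# Wrap-around: nearly identical phases at the two "ends" of [0, 2*pi)
print(abs(0.05 - (TWO_PI - 0.05)))        # ≈ 6.18 on the angle axis
print(arc_distance(0.05, TWO_PI - 0.05))  # ≈ 0.1 along the circle
```

The last two lines show why the topology matters: a state at the end of a cycle and one at its start are about 6.18 apart on a straight angle axis but only 0.1 apart on the circle, which matches the idea that one full cycle returns to where it began.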

Skill parameterization:
- Each skill is parameterized by its angular frequency \(\omega\) in the circular latent space.
- Small \(\omega\) → long-period (slow) motion (e.g., slow walking).
- Large \(\omega\) → short-period (fast) motion (e.g., rapid running).
- Different values of \(\omega\) naturally yield skills with distinct periods \(T = 2\pi/\omega\).
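With \(\omega\) measured in radians per environment step (an assumption; the summary gives no units), one cycle takes \(T = 2\pi/\omega\) steps. The sketch below computes that period and the target phase sequence \(\phi_t = (\phi_0 + \omega t) \bmod 2\pi\); `period_steps` and `phase_trajectory` are illustrative helpers.

```python
import math

TWO_PI = 2.0 * math.pi


def period_steps(omega: float) -> float:
    """Steps per full rotation, T = 2*pi / omega, for omega in radians per step (assumed unit)."""
    if omega <= 0.0:
        raise ValueError("omega must be positive")
    return TWO_PI / omega


def phase_trajectory(phi0: float, omega: float, steps: int) -> list[float]:
    """Target phases phi_t = (phi0 + omega * t) mod 2*pi for t = 0, ..., steps."""
    return [(phi0 + omega * t) % TWO_PI for t in range(steps + 1)]


# Slow vs. fast skill: doubling omega halves the period
print(period_steps(math.pi / 10))  # ≈ 20 steps
print(period_steps(math.pi / 5))   # ≈ 10 steps
```

For example, \(\omega = \pi/10\) gives a 20-step cycle, and after 20 steps the target phase is back at its starting point.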

Pixel observation support:
- The encoder can directly process pixel-level observations.
- No hand-crafted state features are required.

Loss & Training¶

Temporal distance contrastive loss: Ensures the encoder correctly captures temporal distances.
Mutual information term: Promotes skill diversity.
Policy gradient method: Updates the skill policy.
Staged training: The encoder is trained first, followed by the skill policy.
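The summary states that a policy gradient method updates the skill policy but does not give the reward it optimizes. One natural choice, shown here purely as a hypothetical sketch, rewards the policy when the (already trained, frozen) encoder's angle advances by the skill's \(\omega\) each step. The wrapping assumes \(0 < \omega < \pi\), so a single step's advance is unambiguous.

```python
import math

TWO_PI = 2.0 * math.pi


def wrapped_delta(phi_prev: float, phi_next: float) -> float:
    """Signed angular change from phi_prev to phi_next, wrapped into [-pi, pi]."""
    return (phi_next - phi_prev + math.pi) % TWO_PI - math.pi


def intrinsic_reward(phi_prev: float, phi_next: float, omega: float) -> float:
    """Hypothetical per-step reward: 0 when the frozen encoder's angle advances by exactly omega.

    Assumes 0 < omega < pi so that one step's advance is unambiguous after wrapping.
    """
    return -(wrapped_delta(phi_prev, phi_next) - omega) ** 2


# Crossing the 2*pi -> 0 seam still counts as a forward step of 0.1 rad
print(intrinsic_reward(TWO_PI - 0.05, 0.05, omega=0.1))  # ≈ 0.0 (maximal)
print(intrinsic_reward(0.0, 0.0, omega=0.1))             # ≈ -0.01 (stalled)
```

The squared error keeps the reward bounded above by 0 and penalizes both stalling and over-rotating; whether PSD uses this exact form is not specified here.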

Key Experimental Results¶

Main Results¶

Skill discovery results across multiple MuJoCo robotic environments:

Environment	Method	Skill Diversity	Periodicity Coverage	Downstream Task Performance
Ant	DIAYN	Moderate	Poor	Baseline
Ant	CIC	Good	Poor	Slightly better
Ant	PSD	Best	Best	Best
HalfCheetah	DIAYN	Moderate	Poor	Baseline
HalfCheetah	CIC	Good	Poor	Slightly better
HalfCheetah	PSD	Best	Best	Best

Downstream task (hurdle crossing) performance:

Method	Ant Hurdle Success Rate	HalfCheetah Hurdle Success Rate
DIAYN	Low	Low
CIC	Moderate	Moderate
PSD	High	High
PSD + DIAYN (combined)	Highest	Highest

Ablation Study¶

Ablation	Variant	Effect
Latent space topology	Euclidean vs. circular	Circular significantly outperforms Euclidean
Temporal distance	With vs. without	Temporal distance encoding is critical
Pixel vs. state	Observation type	Remains effective under pixel observations
Period range	Narrow vs. wide	Wider range yields more diverse skills

Key Findings¶

Circular latent space is naturally suited to the periodic requirements of locomotion tasks.
Skills discovered by PSD exhibit clearly diverse periods, ranging from slow crawling to fast running.
Periodic skills significantly outperform non-periodic baselines on downstream tasks such as hurdle crossing.
PSD is complementary to existing skill discovery methods (e.g., DIAYN), and their combination yields further improvement.

Highlights & Insights¶

Clear geometric intuition: a circle encodes periodicity, a concise yet effective design principle.
Filling a gap: The first systematic exploitation of periodic structure in skill discovery.
Complementary combination: PSD extends rather than replaces existing methods, expanding the skill repertoire.
Pixel-level applicability: Requires no hand-crafted features, making it applicable to more general settings.

Limitations & Future Work¶

The circular latent space is primarily suited to single-frequency periodic motions; modeling multi-frequency composite motions remains unexplored.
Validation is mainly conducted on locomotion tasks; applicability to other periodic tasks (e.g., manipulation, aerial locomotion) is not examined.
The staged training procedure could be replaced by a more efficient end-to-end scheme.
Skill discovery over a continuous spectrum of periods, rather than a discrete set, is not explored.
Sim-to-real transfer for deployment on physical robots is not addressed.

Related Work & Insights¶

DIAYN: Unsupervised skill discovery via mutual information maximization.
CIC: Contrastive intrinsic control.
DADS: Dynamics-aware skill discovery.
Spectral RL: Leveraging spectral decomposition to understand state space structure.
Insight: Incorporating topological or geometric structure into latent space design is a promising research direction.

Rating¶

Novelty: ⭐⭐⭐⭐ (Encoding periodicity via a circular latent space is an innovative idea)
Technical Depth: ⭐⭐⭐⭐ (Theoretical motivation is clear and well-grounded)
Experimental Thoroughness: ⭐⭐⭐⭐ (Multiple environments + downstream tasks + ablations)
Value: ⭐⭐⭐⭐ (Direct applicability to locomotion control)
