AnimalClue: Recognizing Animals by their Traces¶

Conference: ICCV 2025
arXiv: 2507.20240
Code: https://dahlian00.github.io/AnimalCluePage/
Area: Segmentation / Object Detection / Image Classification
Keywords: animal trace recognition, wildlife conservation, indirect evidence, dataset, instance segmentation

TL;DR¶

This paper introduces AnimalClue, the first large-scale dataset for animal trace recognition, containing 159,605 bounding boxes spanning 968 species across five categories of indirect clues (footprints, feces, eggs, bones, and feathers), and establishes four benchmarks covering classification, detection, instance segmentation, and attribute prediction.

Background & Motivation¶

Wildlife monitoring is critical for biodiversity conservation. Computer vision has made significant advances in direct animal recognition (appearance-based detection), yet species identification from indirect evidence (e.g., footprints, feces) remains largely underexplored. Existing datasets suffer from severe limitations:
- OpenAnimalTracks contains only 18 species and 3,579 bounding boxes
- FeathersV1 supports classification tasks only
- Existing datasets cover few species and provide limited annotation types

Ecological surveys rely extensively on indirect evidence for species identification, but the process is highly labor-intensive and a natural target for automation with computer vision. AnimalClue aims to fill this gap by providing a large-scale benchmark encompassing multiple trace types and tasks.

Method¶

Overall Architecture¶

AnimalClue is a dataset-and-benchmark contribution whose core value lies in data construction and experimental evaluation.

Key Designs¶

Data Collection
- Images are collected from the iNaturalist platform, selecting research-grade observations whose labels have been verified by multiple citizen scientists
- Only Creative Commons licensed images are retained; blurry, unclear, and face-containing images are removed
- Five categories of animal traces are covered: footprints (18,291 bboxes), feces (18,932 bboxes), bones (16,553 bboxes), eggs (29,434 bboxes), and feathers (76,395 bboxes)
Annotation Strategy
- Footprints: bounding boxes only (since footprints are traces rather than physical entities, and boundaries are often ambiguous)
- Feces, bones, eggs, feathers: pixel-level segmentation masks are provided
- SAM (Segment Anything Model) is used to produce initial masks for feces and eggs, which the authors then verify manually
- Multiple images from the same iNaturalist observation are not split across train/test sets, preventing data leakage
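
The observation-level split in the last bullet can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (an `observation_id` key per image), the function name `split_by_observation`, the test fraction, and the seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_observation(records, test_fraction=0.2, seed=0):
    """Split image records so that no observation spans both train and test.

    `records` is a list of dicts carrying a hypothetical "observation_id" key.
    """
    # Group images by the observation they belong to.
    groups = defaultdict(list)
    for rec in records:
        groups[rec["observation_id"]].append(rec)

    # Shuffle observation IDs (not images) so each group moves as one unit.
    obs_ids = sorted(groups)
    random.Random(seed).shuffle(obs_ids)

    n_test = max(1, round(len(obs_ids) * test_fraction))
    test_ids = set(obs_ids[:n_test])

    train = [r for oid in obs_ids if oid not in test_ids for r in groups[oid]]
    test = [r for oid in obs_ids if oid in test_ids for r in groups[oid]]
    return train, test
```

Splitting on observation IDs rather than on individual images is what keeps near-duplicate photos of the same trace out of both partitions.
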
Fine-Grained Attribute Annotation
- A total of 22 ecological and behavioral attributes are annotated, including:
  - Taxonomic information (order, family)
  - Diet type (herbivore, carnivore, omnivore)
  - Activity pattern (diurnal, nocturnal, crepuscular)
  - Habitat preference (forest, grassland, desert, wetland, mountain, urban)
  - Climate distribution (tropical, subtropical, temperate, boreal, polar)
  - Social behavior (gregarious, migratory, predator)
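
For attribute prediction, one common way to represent such annotations is a multi-hot target vector. The sketch below uses only the attribute values listed above. How the dataset actually stores its attributes is not specified in this summary, so the grouping, the names `ATTRIBUTE_VOCAB` and `encode_attributes`, and the example record are assumptions.

```python
# Attribute values as listed in the summary above; packing them into a
# single multi-hot vector is an assumption made for illustration.
ATTRIBUTE_VOCAB = {
    "diet": ["herbivore", "carnivore", "omnivore"],
    "activity": ["diurnal", "nocturnal", "crepuscular"],
    "habitat": ["forest", "grassland", "desert", "wetland", "mountain", "urban"],
    "climate": ["tropical", "subtropical", "temperate", "boreal", "polar"],
    "social": ["gregarious", "migratory", "predator"],
}

# Flatten to a fixed index order: ("diet", "herbivore") -> 0, and so on.
ATTRIBUTE_INDEX = {
    pair: i
    for i, pair in enumerate(
        (group, value)
        for group, values in ATTRIBUTE_VOCAB.items()
        for value in values
    )
}


def encode_attributes(species_attributes):
    """Map e.g. {"diet": ["omnivore"], "habitat": ["forest"]} to a 0/1 list."""
    vector = [0] * len(ATTRIBUTE_INDEX)
    for group, values in species_attributes.items():
        for value in values:
            vector[ATTRIBUTE_INDEX[(group, value)]] = 1
    return vector


# Hypothetical record for a nocturnal omnivore found in forests and cities.
example = encode_attributes(
    {"diet": ["omnivore"], "activity": ["nocturnal"], "habitat": ["forest", "urban"]}
)
```

Taxonomic attributes (order, family) are left out of this vector because they are single-label categories with much larger vocabularies and would normally get their own classification heads.
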
Frequency Partitioning
- Species are divided into three groups based on training-set frequency: frequent (top 20%), intermediate (middle 60%), and rare (bottom 20%)
- Partitioning is performed independently for each of the five trace types
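
The frequency partitioning can be sketched as below. The summary does not say how ties or rounding are handled, nor whether the percentages refer to the share of species or the share of samples. The sketch assumes a ranking over species, alphabetical tie-breaking, and ordinary rounding, and `partition_by_frequency` is a hypothetical helper name.

```python
from collections import Counter


def partition_by_frequency(train_labels, top=0.2, bottom=0.2):
    """Assign each species to "frequent", "intermediate", or "rare".

    Species are ranked by training-set count; the top 20% of species are
    "frequent", the bottom 20% "rare", and the rest "intermediate".
    Tie-breaking and rounding are assumptions, not taken from the paper.
    """
    counts = Counter(train_labels)
    # Most frequent first; ties broken by name for determinism.
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    n = len(ranked)
    n_top = round(n * top)
    n_bottom = round(n * bottom)

    groups = {}
    for rank, species in enumerate(ranked):
        if rank < n_top:
            groups[species] = "frequent"
        elif rank >= n - n_bottom:
            groups[species] = "rare"
        else:
            groups[species] = "intermediate"
    return groups
```

Because partitioning is done independently per trace type, the function would be called once for each of the five label lists, and the same species may land in different groups for different trace types.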

Dataset Statistics¶

Trace Type	BBoxes	Images	Species	Families	Orders
Footprints	18,291	7,581	117	46	20
Feces	18,932	6,433	101	46	21
Bones	16,553	12,908	269	112	45
Eggs	29,434	9,394	283	67	20
Feathers	76,395	60,491	555	89	30
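
As a quick consistency check, the per-type figures in the table can be summed and compared with the headline numbers in the TL;DR. The values are copied from the table; the variable names are illustrative only.

```python
# (bounding boxes, species) per trace type, copied from the table above.
STATS = {
    "footprints": (18_291, 117),
    "feces": (18_932, 101),
    "bones": (16_553, 269),
    "eggs": (29_434, 283),
    "feathers": (76_395, 555),
}

total_boxes = sum(boxes for boxes, _ in STATS.values())
species_sum = sum(species for _, species in STATS.values())

print(total_boxes)  # 159605, matching the 159,605 boxes in the TL;DR
print(species_sum)  # 1325, larger than the 968 species reported overall
```

The per-type species counts add up to more than 968, which is consistent with some species appearing under more than one trace type.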

Key Experimental Results¶

Main Results: Classification (species-level accuracy, %)¶

Model	Footprints (Species)	Feces (Species)	Eggs (Species)	Bones (Species)	Feathers (Species)
VGG-16	28.8	29.6	45.2	14.7	56.7
ResNet-50	23.7	29.4	41.1	18.3	59.7
ViT-B	29.2	32.2	46.7	15.0	55.9
Swin-B	32.3	38.6	49.4	20.5	65.3

Detection and Segmentation Results¶

Detection Model	Footprints (Species mAP)	Eggs (Species mAP)	Feathers (Species mAP)
YOLOv8	0.10	0.13	0.25
YOLOv11	0.10	0.14	0.25
RT-DETR	0.10	0.04	0.17
DINO	0.08	0.20	0.15

Segmentation Model	Feces (Species mAP)	Eggs (Species mAP)	Bones (Species mAP)	Feathers (Species mAP)
YOLOv8	0.11	0.11	0.07	0.24
MaskDINO	0.13	0.25	0.07	0.18
YOLOv11	0.11	0.12	0.06	0.24
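
The mAP figures above rest on matching predicted boxes to ground truth by intersection-over-union (IoU). This summary does not state which IoU thresholds the benchmark uses, so the sketch below shows only the IoU computation itself. The `(x1, y1, x2, y2)` corner format and the name `box_iou` are assumptions.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

For species-level mAP, a prediction counts as correct only if its box overlaps a ground-truth box sufficiently and its predicted species also matches, which helps explain why fine-grained label spaces yield such low scores.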

Key Findings¶

- Swin-B achieves the best species-level accuracy on all five trace types. ViT-B does not consistently beat the CNN baselines (on feathers it scores 55.9 versus 59.7 for ResNet-50), so the gain cannot be attributed to Transformer architectures in general
- Feather recognition yields the highest accuracy (65.3%) despite covering the most species (555), owing to distinctive color and texture patterns
- Bone recognition is the most challenging (20.5% with Swin-B), as appearance varies substantially across body parts
- Rare-species recognition is extremely difficult: Swin-B achieves only 14.2% at the species level for rare footprints and 2.52% for rare feathers
- Species-level detection and segmentation mAP values are low (at most 0.25 in the tables above), and even the best order-level detection mAP reaches only 0.57, indicating the task is far from solved
- CLIP fine-tuned on AnimalClue exhibits the best feature separation in t-SNE visualizations
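
The rare-species numbers come from evaluating accuracy separately for each frequency group. A minimal sketch of that breakdown is shown below. Whether the paper averages per sample or per class is not stated here, so per-sample accuracy is an assumption, as are the function name and the toy labels.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, species_to_group):
    """Per-sample accuracy, reported separately for each frequency group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        group = species_to_group[truth]  # group is decided by the true species
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Toy example with hypothetical species labels.
scores = accuracy_by_group(
    y_true=["a", "a", "b", "c"],
    y_pred=["a", "b", "b", "a"],
    species_to_group={"a": "frequent", "b": "intermediate", "c": "rare"},
)
print(scores)  # {'frequent': 0.5, 'intermediate': 1.0, 'rare': 0.0}
```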

Highlights & Insights¶

- Novel problem formulation: Identifying animal species from indirect evidence is complementary to conventional appearance-based recognition and holds significant ecological application value
- Scale and comprehensiveness: Covering 968 species, 5 trace types, 4 tasks, and 22 attribute annotations, AnimalClue is substantially larger than prior trace datasets such as OpenAnimalTracks (18 species, 3,579 bounding boxes)
- Revealing key challenges: Poor generalization to rare species and very low species-level detection/segmentation mAP show that substantial research opportunities remain in this field

Limitations & Future Work¶

- Species distribution is highly imbalanced, with a severe long-tail problem
- Footprints are annotated with bounding boxes only, lacking segmentation masks
- Only standard baseline models are evaluated; strategies such as pre-training or domain adaptation are not explored
- Joint recognition across trace types (e.g., simultaneously leveraging footprints and feces to identify the same species) is not investigated
- Data sourced primarily from iNaturalist may introduce geographic and species bias

- AnimalClue is complementary to conventional animal appearance recognition datasets (iNat, CUB-200)
- The 22 attribute annotations provide rich auxiliary signals for multi-task learning and zero-shot learning
- The approach could inspire extensions to other indirect evidence recognition scenarios, such as crime scene analysis and archaeology

Rating¶

- Novelty: ⭐⭐⭐⭐ First large-scale indirect animal trace dataset with a novel problem formulation
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ Four task benchmarks, comprehensive multi-model evaluation, and thorough frequency analysis
- Writing Quality: ⭐⭐⭐⭐ Dataset construction is clearly described with complete statistics
- Value: ⭐⭐⭐⭐ Opens a new direction for computer vision research in wildlife monitoring with lasting dataset impact