GTPBD: A Fine-Grained Global Terraced Parcel and Boundary Dataset
Conference: NeurIPS 2025
arXiv: 2507.14697
Code: Available
Area: Segmentation
Keywords: Terraced parcel extraction, fine-grained boundary annotation, semantic segmentation, unsupervised domain adaptation, remote sensing dataset
TL;DR
This paper introduces GTPBD, the first fine-grained global terraced parcel and boundary dataset, comprising 47,537 high-resolution images (0.5–0.7 m) with over 200,000 manually annotated parcels. It provides three-level labels supporting four tasks—semantic segmentation, edge detection, agricultural parcel extraction, and unsupervised domain adaptation—and presents comprehensive benchmarks across 20 methods.
Background & Motivation
Background: Agricultural parcels are fundamental units for precision agriculture, food security assessment, and soil erosion monitoring. Approximately 120 million acres of terraced fields worldwide support 500 million mountain-dwelling people and reduce soil erosion by 23.7 billion tons annually, which makes them both ecologically and economically important.
Limitations of Prior Work:
- Existing agricultural parcel datasets (FHAPD, AI4Boundaries, PASTIS, etc.) primarily focus on regular flatland fields, with virtually no coverage of complex terraced terrain.
- Most datasets provide only binary mask labels and cannot distinguish shared boundaries from non-shared boundaries between adjacent terrace ridges, which are two distinct topological relationships.
- Spatial resolution is insufficient (Sentinel-2 at 10 m, Landsat at 30 m) for fine-grained terraced parcel delineation.
- Cross-domain UDA evaluation is absent, leaving model generalization largely unassessed.
Key Challenge: The lack of high-resolution imagery covering major global terraced regions, combined with the absence of multi-level label design and multi-domain partitioning, prevents unified multi-task benchmarking.
Goal: To collect high-resolution imagery over major global terraced areas, design a three-level label system with three-domain partitioning, and establish a unified multi-task benchmark evaluation platform.
Method
Overall Architecture
The core contribution of GTPBD is a dataset construction and multi-dimensional evaluation framework. The overall pipeline proceeds as follows: image acquisition (GF-2/Google Earth) → manual vectorization annotation in QGIS → three-level label generation (mask/boundary/parcel) → three-domain partitioning (South/North/Global) → four-task benchmark evaluation (SS/ED/APE/UDA).
Key Designs
1. Image Acquisition and Annotation
- Sources: GF-2 satellite and Google Earth; spatial resolution 0.5–0.7 m; cloud-free imagery from 2021–2025.
- Coverage: Seven major geographic regions in China plus 14 countries including Vietnam, Tunisia, Ethiopia, Peru, and Mexico.
- Scale: 47,537 images at 512×512 pixels covering 885 km², with >200,000 terraced parcels.
- Annotation Team: Over 50 undergraduate and graduate students performed vectorization annotation via QGIS, subject to rigorous quality review.
2. Three-Level Label Design
This is one of the most sophisticated design elements of the dataset. Each pixel simultaneously carries three labels:
- Mask Label: Rasterized via GDAL (all-touched strategy); terrace = 1, background = 0; used for semantic segmentation.
- Boundary Label: Generated by a single morphological erosion with a 3×3 rectangular kernel, producing 3-pixel-wide edge labels; used for edge detection.
- Parcel Label: XOR operation between mask and boundary, \(\text{Parcel} = \text{Mask} \oplus \text{Boundary}\); used for parcel extraction.
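The derivation of the boundary and parcel labels from the mask can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' pipeline: it assumes the boundary is the set of mask pixels removed by one 3×3 erosion, and that pixels outside the image count as background. The function names `erode3x3` and `derive_labels` are hypothetical.

```python
def erode3x3(mask):
    """Binary erosion with a 3x3 square kernel.

    Assumption: pixels outside the image are treated as background.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(all(
                0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc] == 1
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            ))
    return out


def derive_labels(mask):
    """Return (boundary, parcel) label grids derived from a binary mask."""
    eroded = erode3x3(mask)
    # Assumed boundary definition: mask pixels that the erosion removed.
    boundary = [[m & (1 - e) for m, e in zip(mrow, erow)]
                for mrow, erow in zip(mask, eroded)]
    # Parcel = Mask XOR Boundary, as stated in the text.
    parcel = [[m ^ b for m, b in zip(mrow, brow)]
              for mrow, brow in zip(mask, boundary)]
    return boundary, parcel


# A single 3x3 terrace patch inside a 5x5 tile.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
boundary, parcel = derive_labels(mask)
```

Under these assumptions the rim is one pixel wide on the inner side of each parcel. The text above reports 3-pixel-wide edges, so the dataset's exact procedure presumably differs in details not given here.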
Key annotation strategy: when ridge width ≥ 0.5 m, dual-side annotation is applied (independent vector boundaries on both sides); when < 0.5 m, shared-edge annotation is used (interior line features cutting larger parcels), accurately reflecting both terrace topological structures.
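The ridge-width rule can be written as a small decision function. The sketch below only restates the 0.5 m threshold from the text; the helper names are hypothetical, and converting the width to pixels is an added illustration showing that the threshold corresponds to roughly one pixel at the dataset's 0.5–0.7 m resolution.

```python
RIDGE_THRESHOLD_M = 0.5  # threshold stated in the annotation rule


def annotation_mode(ridge_width_m: float) -> str:
    """Choose the annotation style for a terrace ridge of the given width."""
    return "dual-side" if ridge_width_m >= RIDGE_THRESHOLD_M else "shared-edge"


def ridge_width_px(ridge_width_m: float, gsd_m: float) -> float:
    """Ridge width in pixels for a given ground sampling distance (metres per pixel)."""
    return ridge_width_m / gsd_m
```

For example, a 0.5 m ridge spans 1.0 pixel at 0.5 m resolution and about 0.71 pixel at 0.7 m, which suggests why narrower ridges are drawn as a single shared line.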
3. Three-Domain Partitioning (UDA Support)
- South (southern China): small parcels, low spectral standard deviation, most pronounced long-tail distribution.
- North (northern China): larger parcel areas.
- Global (regions outside China): similar spectral means but large style variation.
- Six transfer tasks are provided: S→N, S→G, N→S, N→G, G→S, G→N.
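The six transfer tasks are simply all ordered source/target pairs of the three domains. A minimal sketch (the function name `transfer_tasks` is hypothetical):

```python
from itertools import permutations

DOMAINS = ("S", "N", "G")  # South, North, Global


def transfer_tasks(domains=DOMAINS):
    """List every ordered (source, target) pair as 'source→target'."""
    return [f"{src}→{tgt}" for src, tgt in permutations(domains, 2)]


tasks = transfer_tasks()
```

With three domains this yields 3 × 2 = 6 directed tasks, in the same order as listed above.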
4. Dataset Comparison
| Dataset | Resolution (m) | # Images | Area (km²) | Global Coverage | SS/APE/ED/UDA |
|---|---|---|---|---|---|
| FHAPD | 1–2 | 68,982 | <1000 | ✗ | ✓/✓/✓/✗ |
| FTW | 10 | 70,462 | 166,293 | ✓ | ✓/✓/✓/✗ |
| AI4Boundaries | 1/10 | ~15K | ~53K | ✗ | ✓/✓/✓/✗ |
| GTPBD | 0.5–0.7 | 47,537 | 885 | ✓ | ✓/✓/✓/✓ |
GTPBD is the only terraced field dataset that simultaneously supports all four tasks, provides global coverage, and achieves sub-meter resolution.
Loss & Training
Because this is a dataset paper, the evaluated methods adopt the standard training configurations from their respective original publications. All share a unified SGD optimizer (momentum = 0.9, weight decay = 1e-4), 512×512 random cropping, and random flip/rotation augmentation, and are trained on NVIDIA RTX 4090 GPUs. The dataset is split 60%/20%/20% into training/validation/test sets, and the split is performed before cropping so that the subsets remain spatially independent.
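Splitting before cropping means the 60/20/20 assignment happens at the level of whole source scenes, so no two tiles from the same scene land in different subsets. The sketch below illustrates the idea under stated assumptions: the scene identifiers are made up, the shuffle seed is arbitrary, and tiles are cut on a regular non-overlapping grid for simplicity (the paper describes random cropping).

```python
import random


def split_scenes(scene_ids, ratios=(0.6, 0.2, 0.2), seed=0):
    """Assign whole scenes to train/val/test before any tiles are cut."""
    ids = sorted(scene_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def tile_origins(height, width, tile=512):
    """Top-left corners of non-overlapping tiles that fit inside one scene."""
    return [(r, c)
            for r in range(0, height - tile + 1, tile)
            for c in range(0, width - tile + 1, tile)]


scenes = [f"scene_{i:02d}" for i in range(10)]  # hypothetical scene IDs
splits = split_scenes(scenes)
```

Tiles are then generated per scene within each subset, for example `tile_origins(1024, 1536)` gives six 512×512 tiles for one scene.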
Key Experimental Results
Main Results
Semantic Segmentation
| Method | Prec.↑ | Rec.↑ | IoU↑ | OA↑ | F1↑ |
|---|---|---|---|---|---|
| UNet | 74.11 | 54.93 | 46.09 | 75.46 | 63.09 |
| DeepLabV3 | 69.64 | 73.45 | 57.04 | 78.28 | 71.58 |
| NonLocal | 75.06 | 70.27 | 51.48 | 79.52 | 72.58 |
| SegFormer | 74.45 | 69.07 | 55.84 | 78.14 | 71.66 |
| Mask2Former | 71.22 | 74.33 | 57.16 | 78.73 | 72.74 |
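For reference, the five pixel-level metrics in the table follow from the confusion counts of a binary prediction. This is a minimal sketch assuming terrace is the positive class and that counts are pooled over all pixels; the paper may average per image or per class, which this summary does not specify.

```python
def binary_metrics(tp, fp, fn, tn):
    """Pixel-level metrics for a binary terrace/background prediction."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "iou": tp / (tp + fp + fn),
        "oa": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Toy counts: 6 true positives, 2 false positives, 2 false negatives, 10 true negatives.
metrics = binary_metrics(tp=6, fp=2, fn=2, tn=10)
```

With these toy counts, precision, recall, and F1 are all 0.75, IoU is 0.6, and OA is 0.8.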
Edge Detection and Parcel Extraction
| Edge Detection Method | ODS↑ | OIS↑ | AP↑ |
|---|---|---|---|
| MuGE | 62.56 | 61.93 | 65.12 |
| PiDiNet | 53.70 | 53.12 | 52.92 |
| REAUNet-Sobel | 65.06 | 63.73 | 70.09 |

| Parcel Extraction Method | IoU↑ | F1↑ | GOC↓ | GUC↓ | GTC↓ |
|---|---|---|---|---|---|
| Mask2Former | 56.79 | 72.44 | 22.04 | 45.15 | 35.53 |
| REAUNet | 60.56 | 75.44 | 27.02 | 42.25 | 36.07 |
| HBGNet | 62.44 | 76.88 | 27.40 | 42.52 | 35.79 |
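The object-level columns GOC, GUC, and GTC are not defined in this summary. The sketch below assumes the definitions commonly used in the parcel-delineation literature: each predicted parcel is matched to the reference parcel it overlaps most, per-parcel over-classification and under-classification errors are computed from the overlap, and the global scores are area-weighted averages over predicted parcels. The function name and the pixel-set representation are hypothetical.

```python
from math import sqrt


def parcel_errors(pred_parcels, ref_parcels):
    """Return (GOC, GUC, GTC) under the assumed definitions.

    pred_parcels, ref_parcels: lists of sets of (row, col) pixel coordinates.
    """
    total_area = sum(len(p) for p in pred_parcels)
    goc = guc = gtc = 0.0
    for pred in pred_parcels:
        # Match each predicted parcel to its maximally overlapping reference parcel.
        ref = max(ref_parcels, key=lambda r: len(pred & r))
        inter = len(pred & ref)
        oc = 1 - inter / len(ref)    # over-classification error
        uc = 1 - inter / len(pred)   # under-classification error
        tc = sqrt((oc ** 2 + uc ** 2) / 2)  # total error
        weight = len(pred) / total_area     # area weighting
        goc += oc * weight
        guc += uc * weight
        gtc += tc * weight
    return goc, guc, gtc


# One predicted parcel of 4 pixels overlapping a 6-pixel reference parcel in 2 pixels.
pred = [{(0, c) for c in range(4)}]
ref = [{(0, c) for c in range(2, 8)}]
goc, guc, gtc = parcel_errors(pred, ref)
```

All three scores are 0 for a perfect match and grow as predicted and reference parcels diverge, which is why lower is better in the table.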
Ablation Study
UDA direction ablation (S→N):
| Method | IoU↑ | F1↑ |
|---|---|---|
| Source Only | 48.11 | 64.96 |
| FDA | 40.60 | 57.75 |
| PiPa | 56.35 | 72.09 |
| HRDA | 52.26 | 68.65 |
| DAFormer | 51.64 | 68.11 |
UDA performance in the N→S direction is substantially better than in S→N (PiPa: IoU 66.65 vs. 56.35), indicating that transfer from the large-parcel North domain to the small-parcel South domain is considerably easier than the reverse.
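The size of each method's adaptation gain is easier to see as a difference from the Source Only baseline. The snippet below only does arithmetic on the IoU values reported above; the variable and function names are illustrative.

```python
SOURCE_ONLY_IOU = 48.11  # S→N, no adaptation
S2N_IOU = {"FDA": 40.60, "PiPa": 56.35, "HRDA": 52.26, "DAFormer": 51.64}


def gains_over_baseline(scores, baseline):
    """IoU difference (in points) between each UDA method and the baseline."""
    return {name: round(iou - baseline, 2) for name, iou in scores.items()}


gains = gains_over_baseline(S2N_IOU, SOURCE_ONLY_IOU)
# Gap between the two transfer directions for PiPa (N→S vs. S→N).
direction_gap = round(66.65 - 56.35, 2)
```

PiPa gains 8.24 IoU points over Source Only, HRDA 4.15, and DAFormer 3.53, while FDA falls 7.51 points below the baseline. The N→S vs. S→N gap for PiPa is 10.3 points.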
Key Findings
- Precision vs. Recall trade-off: NonLocal achieves the best Precision/OA, while Mask2Former leads in Recall/IoU/F1, reflecting fundamental differences between CNN and Transformer architectures.
- Importance of explicit edge priors: REAUNet-Sobel, which incorporates built-in Sobel filters, leads on all three edge metrics (ODS/OIS/AP) on complex terraced boundaries, demonstrating the critical role of explicit edge priors for this task.
- Parcel extraction: HBGNet's dual-branch framework (parallel low-level boundary and high-level semantic processing) achieves the best IoU and F1. Among the three listed methods, Mask2Former has the lowest GOC and GTC, and REAUNet the lowest GUC.
- UDA remains highly challenging: Even the best-performing UDA method (PiPa) lags considerably behind fully supervised counterparts; cross-domain terraced field adaptation remains an open problem.
- Domain asymmetry: N→S transfer substantially outperforms S→N, reflecting the greater difficulty of learning fine-grained features characteristic of small parcels.
Highlights & Insights
- Filling a critical gap: The first fine-grained global terraced parcel dataset, spanning 14 countries and seven major geographic regions in China.
- Elegant three-level label design: A single vectorization annotation simultaneously generates three types of labels, maximizing the return on annotation effort.
- Comprehensive evaluation framework: Three-dimensional metrics covering pixel level (Prec/Rec/IoU/OA/F1), object level (GOC/GUC/GTC), and edge level (ODS/OIS/AP).
- Systematic benchmarking of 20 methods: 8 segmentation + 4 edge detection + 3 parcel extraction + 5 UDA methods, covering mainstream approaches across all tasks.
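The edge-level metrics ODS and OIS mentioned above differ only in how the binarization threshold is chosen. The sketch below follows the usual BSDS-style convention, which is an assumption here since the summary does not state how the paper computes them: ODS pools counts over the dataset and picks the single best threshold, while OIS picks the best threshold per image and then pools those counts. Edge matching with a distance tolerance is omitted, and the input layout `counts[i][t] = (tp, fp, fn)` is hypothetical.

```python
def f_measure(tp, fp, fn):
    """F1 score from edge-pixel counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pooled_f(triples):
    """F1 after summing (tp, fp, fn) over several images."""
    tp, fp, fn = (sum(col) for col in zip(*triples))
    return f_measure(tp, fp, fn)


def ods_ois(counts):
    """counts[i][t] = (tp, fp, fn) for image i at threshold index t."""
    n_thresholds = len(counts[0])
    # ODS: one threshold shared by the whole dataset.
    ods = max(pooled_f([img[t] for img in counts]) for t in range(n_thresholds))
    # OIS: each image uses its own best threshold.
    ois = pooled_f([max(img, key=lambda c: f_measure(*c)) for img in counts])
    return ods, ois


# Two images evaluated at two thresholds (toy counts).
counts = [
    [(8, 2, 2), (5, 0, 5)],
    [(3, 3, 7), (6, 2, 4)],
]
ods, ois = ods_ois(counts)
```

In this toy case the best shared threshold is the second one, whereas the first image individually prefers the first threshold, so the two scores differ.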
Limitations & Future Work
- The total coverage of only 885 km² is limited compared to medium- and low-resolution datasets such as FTW (166K km²).
- Only binary classification (terrace/background) is provided, without finer-grained semantics such as crop type.
- Mountain terraces account for more than 80% of the dataset; hilly and valley terraces may be underrepresented.
- Benchmarks for more advanced UDA methods (e.g., the MIC series) are not included.
- Future work could incorporate foundation models such as SAM for zero-shot terraced field extraction evaluation.
Related Work & Insights
- The three-level label design (mask/boundary/parcel) is generalizable to other remote sensing scenarios requiring fine-grained parcel delineation, such as urban lots and wetlands.
- The domain discrepancy analysis methodology for cross-domain terraced field extraction can inform broader geographic domain adaptation research.
- The dataset serves as critical data infrastructure for precision agriculture and land monitoring applications.
Rating
- Novelty: ⭐⭐⭐⭐ — First fine-grained global terraced field dataset, filling an important gap.
- Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 20 methods evaluated with a three-dimensional assessment framework; exceptionally comprehensive.
- Writing Quality: ⭐⭐⭐⭐ — Clear structure with thorough statistical analysis.
- Value: ⭐⭐⭐⭐ — Provides critical data infrastructure for terraced field remote sensing research.