
GTPBD: A Fine-Grained Global Terraced Parcel and Boundary Dataset

Conference: NeurIPS 2025
arXiv: 2507.14697
Code: Available
Area: Segmentation
Keywords: Terraced parcel extraction, fine-grained boundary annotation, semantic segmentation, unsupervised domain adaptation, remote sensing dataset

TL;DR

This paper introduces GTPBD, the first fine-grained global terraced parcel and boundary dataset, comprising 47,537 high-resolution images (0.5–0.7 m) with over 200,000 manually annotated parcels. It provides three-level labels supporting four tasks—semantic segmentation, edge detection, agricultural parcel extraction, and unsupervised domain adaptation—and presents comprehensive benchmarks across 20 methods.

Background & Motivation

Background: Agricultural parcels are fundamental units for precision agriculture, food security assessment, and soil erosion monitoring. Terraced fields cover approximately 120 million acres worldwide and support 500 million mountain-dwelling people. They reduce soil erosion by 23.7 billion tons annually, which makes them ecologically and economically invaluable.

Limitations of Prior Work:

- Existing agricultural parcel datasets (FHAPD, AI4Boundaries, PASTIS, etc.) primarily focus on regular flatland fields, with virtually no coverage of complex terraced terrain.
- Most datasets provide only binary mask labels, unable to distinguish shared boundaries from non-shared boundaries between adjacent terrace ridges, two distinct topological relationships.
- Spatial resolution is insufficient (Sentinel-2 at 10 m, Landsat at 30 m) for fine-grained terraced parcel delineation.
- Cross-domain UDA evaluation is absent, leaving model generalization largely unassessed.

Key Challenge: The lack of high-resolution imagery covering major global terraced regions, combined with the absence of multi-level label design and multi-domain partitioning, prevents unified multi-task benchmarking.

Goal: To collect high-resolution imagery over major global terraced areas, design a three-level label system with three-domain partitioning, and establish a unified multi-task benchmark evaluation platform.

Method

Overall Architecture

The core contribution of GTPBD is a dataset construction and multi-dimensional evaluation framework. The overall pipeline proceeds as follows: image acquisition (GF-2/Google Earth) → manual vectorization annotation in QGIS → three-level label generation (mask/boundary/parcel) → three-domain partitioning (South/North/Global) → four-task benchmark evaluation (SS/ED/APE/UDA).

Key Designs

1. Image Acquisition and Annotation

- Sources: GF-2 satellite and Google Earth; spatial resolution 0.5–0.7 m; cloud-free imagery from 2021–2025.
- Coverage: Seven major geographic regions in China plus 14 countries including Vietnam, Tunisia, Ethiopia, Peru, and Mexico.
- Scale: 47,537 images at 512×512 pixels covering 885 km², with >200,000 terraced parcels.
- Annotation Team: Over 50 undergraduate and graduate students performed vectorization annotation via QGIS, subject to rigorous quality review.

2. Three-Level Label Design

This is one of the most sophisticated design elements of the dataset. Each pixel simultaneously carries three labels:

- Mask Label: Rasterized via GDAL (all-touched strategy); terrace = 1, background = 0; used for semantic segmentation.
- Boundary Label: Generated by a single morphological erosion with a 3×3 rectangular kernel, producing 3-pixel-wide edge labels; used for edge detection.
- Parcel Label: XOR operation between mask and boundary, \(\text{Parcel} = \text{Mask} \oplus \text{Boundary}\); used for parcel extraction.
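The relationship between the three label layers can be illustrated with a minimal Python sketch. It assumes the boundary is the mask minus its 3×3 erosion and that pixels outside the image count as background; `erode3x3` and `derive_labels` are hypothetical helper names, not the authors' code. Under these assumptions a single erosion yields a one-pixel band, so the reported 3-pixel width suggests the authors' procedure differs in detail.

```python
from typing import List, Tuple

Grid = List[List[int]]


def erode3x3(mask: Grid) -> Grid:
    """Binary erosion with a 3x3 square kernel (outside the image = background)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # A pixel survives only if its whole 3x3 neighbourhood is foreground.
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx] == 1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ))
    return out


def derive_labels(mask: Grid) -> Tuple[Grid, Grid]:
    """Return (boundary, parcel) derived from a binary terrace mask."""
    eroded = erode3x3(mask)
    # Assumed boundary rule: mask pixels removed by the erosion.
    boundary = [[m - e for m, e in zip(mr, er)] for mr, er in zip(mask, eroded)]
    # Parcel = Mask XOR Boundary, as stated in the text.
    parcel = [[m ^ b for m, b in zip(mr, br)] for mr, br in zip(mask, boundary)]
    return boundary, parcel


# Toy example: a 3x3 terrace block inside a 5x5 tile.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
toy_boundary, toy_parcel = derive_labels(toy_mask)
```

Because the boundary is a subset of the mask in this sketch, the XOR simply removes the boundary band and leaves the parcel interior.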

Key annotation strategy: when ridge width ≥ 0.5 m, dual-side annotation is applied (independent vector boundaries on both sides); when < 0.5 m, shared-edge annotation is used (interior line features cutting larger parcels), accurately reflecting both terrace topological structures.

3. Three-Domain Partitioning (UDA Support)

- South (southern China): small parcels, low spectral standard deviation, most pronounced long-tail distribution.
- North (northern China): larger parcel areas.
- Global (regions outside China): similar spectral means but large style variation.
- Six transfer tasks are provided: S→N, S→G, N→S, N→G, G→S, G→N.
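The six transfer tasks are simply the ordered source/target pairs over the three domains. A small Python sketch makes this explicit; the names `DOMAINS` and `transfer_tasks` are illustrative only.

```python
from itertools import permutations

# Domain codes as used in the text: South, North, Global.
DOMAINS = ("S", "N", "G")

# Every ordered (source, target) pair with source != target.
transfer_tasks = list(permutations(DOMAINS, 2))

for source, target in transfer_tasks:
    print(f"{source} -> {target}")
```

Three domains give 3 × 2 = 6 directed pairs, which matches the task list above.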

4. Dataset Comparison

Dataset	Resolution (m)	# Images	Area (km²)	Global Coverage	SS/APE/ED/UDA
FHAPD	1–2	68,982	<1000	✗	✓/✓/✓/✗
FTW	10	70,462	166,293	✓	✓/✓/✓/✗
AI4Boundaries	1/10	~15K	~53K	✗	✓/✓/✓/✗
GTPBD	0.5–0.7	47,537	885	✓	✓/✓/✓/✓

Among the datasets compared, GTPBD is the only one that targets terraced fields while simultaneously supporting all four tasks, providing global coverage, and achieving sub-meter resolution.

Loss & Training

Because GTPBD is a dataset paper, the evaluated methods keep the standard training configurations from their respective original publications. A unified SGD optimizer (momentum = 0.9, weight decay = 1e-4) is used with 512×512 random cropping and random flip/rotation augmentation, and training runs on NVIDIA RTX 4090 GPUs. The dataset is split 60%/20%/20% into training/validation/test sets; splitting is performed prior to cropping to ensure spatial independence across subsets.
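The split-before-crop ordering can be sketched in Python as follows. This is an illustration under stated assumptions: the unit being split is called a "scene" here, and `split_scenes` and `tile_origins` are hypothetical helpers; the source only states that the 60%/20%/20% split happens before cropping.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_scenes(
    scene_ids: Sequence[str],
    ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole scenes to train/val/test before any cropping."""
    ids = sorted(scene_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


def tile_origins(height: int, width: int, tile: int = 512) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping tile x tile crops inside one scene."""
    return [
        (y, x)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


splits = split_scenes([f"scene_{i}" for i in range(10)])
crops_per_scene = tile_origins(1024, 1536)
```

Since every crop inherits the subset of its parent scene, no two crops from the same scene can land in different subsets.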

Key Experimental Results

Main Results

Semantic Segmentation

Method	Prec.↑	Rec.↑	IoU↑	OA↑	F1↑
UNet	74.11	54.93	46.09	75.46	63.09
DeepLabV3	69.64	73.45	57.04	78.28	71.58
NonLocal	75.06	70.27	51.48	79.52	72.58
SegFormer	74.45	69.07	55.84	78.14	71.66
Mask2Former	71.22	74.33	57.16	78.73	72.74
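For reference, the pixel-level metrics in the table can be computed from a binary confusion matrix. The Python sketch below uses the standard textbook definitions for the terrace class; the function name `pixel_metrics` is illustrative, and the paper's exact averaging scheme (for example per image or over the whole test set) is not stated here, so reported values may not follow from one another exactly.

```python
from typing import Dict


def pixel_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary segmentation metrics (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "iou": tp / (tp + fp + fn),                 # intersection over union
        "oa": (tp + tn) / (tp + fp + fn + tn),      # overall accuracy
        "f1": 2 * precision * recall / (precision + recall),
    }


# Toy counts, not taken from the paper.
example = pixel_metrics(tp=6, fp=2, fn=4, tn=8)
```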

Edge Detection and Parcel Extraction

Edge Detection Method	ODS↑	OIS↑	AP↑
MuGE	62.56	61.93	65.12
PiDiNet	53.70	53.12	52.92
REAUNet-Sobel	65.06	63.73	70.09

Parcel Extraction Method	IoU↑	F1↑	GOC↓	GUC↓	GTC↓
Mask2Former	56.79	72.44	22.04	45.15	35.53
REAUNet	60.56	75.44	27.02	42.25	36.07
HBGNet	62.44	76.88	27.40	42.52	35.79
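The object-level errors (GOC, GUC, GTC) are not defined in this summary. The Python sketch below assumes the area-weighted over-classification, under-classification, and total-classification errors commonly used in parcel delineation work, with each predicted parcel matched to the reference parcel it overlaps most. The roles of the two ratios, the matching rule, and the helper names are all assumptions rather than the authors' implementation.

```python
from math import sqrt
from typing import FrozenSet, Sequence, Tuple

Pixel = Tuple[int, int]
Parcel = FrozenSet[Pixel]


def object_errors(pred: Parcel, ref: Parcel) -> Tuple[float, float, float]:
    """Per-object (OC, UC, TC) for a predicted parcel and its matched reference."""
    inter = len(pred & ref)
    oc = 1 - inter / len(ref)    # assumed: share of the reference parcel that is missed
    uc = 1 - inter / len(pred)   # assumed: share of the prediction outside the reference
    tc = sqrt((oc ** 2 + uc ** 2) / 2)
    return oc, uc, tc


def global_errors(preds: Sequence[Parcel], refs: Sequence[Parcel]) -> Tuple[float, float, float]:
    """Area-weighted averages (GOC, GUC, GTC) over all predicted parcels."""
    total_area = sum(len(p) for p in preds)
    goc = guc = gtc = 0.0
    for pred in preds:
        # Match each prediction to the reference parcel with the largest overlap.
        ref = max(refs, key=lambda r: len(pred & r))
        oc, uc, tc = object_errors(pred, ref)
        weight = len(pred) / total_area
        goc += weight * oc
        guc += weight * uc
        gtc += weight * tc
    return goc, guc, gtc


# Toy example: the prediction covers half of an 8-pixel reference parcel.
ref_parcel = frozenset((0, x) for x in range(8))
pred_parcel = frozenset((0, x) for x in range(4))
goc, guc, gtc = global_errors([pred_parcel], [ref_parcel])
```

Under this reading, the Mask2Former row is consistent with the per-object relation, since sqrt((22.04² + 45.15²) / 2) ≈ 35.53; the other rows agree only approximately, as expected when the averaging is done per object.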

Ablation Study

UDA direction ablation (S→N):

Method	IoU↑	F1↑
Source Only	48.11	64.96
FDA	40.60	57.75
PiPa	56.35	72.09
HRDA	52.26	68.65
DAFormer	51.64	68.11

UDA performance in the N→S direction is substantially better than S→N (PiPa: IoU 66.65 vs. 56.35), indicating that transferring from large-parcel domains to small-parcel domains is considerably easier.
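The IoU figures above can be turned into gains over the Source Only baseline with a few lines of Python. The numbers are copied from the S→N table and the PiPa comparison in the text; the variable names are illustrative.

```python
# S->N IoU values from the table above.
s2n_iou = {
    "Source Only": 48.11,
    "FDA": 40.60,
    "PiPa": 56.35,
    "HRDA": 52.26,
    "DAFormer": 51.64,
}

baseline = s2n_iou["Source Only"]

# IoU change of each UDA method relative to training on the source domain only.
gains = {
    name: round(iou - baseline, 2)
    for name, iou in s2n_iou.items()
    if name != "Source Only"
}

# Gap between the two transfer directions for PiPa (N->S minus S->N).
pipa_direction_gap = round(66.65 - 56.35, 2)

for name, delta in gains.items():
    print(f"{name}: {delta:+.2f} IoU")
```

In this table FDA falls below the Source Only baseline, while PiPa gives the largest gain.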

Key Findings

Precision vs. Recall trade-off: NonLocal achieves the best Precision/OA, while Mask2Former leads in Recall/IoU/F1, a split that may reflect differences between CNN-based and Transformer-based architectures.
Importance of explicit edge priors: REAUNet-Sobel, which incorporates built-in Sobel filters, achieves the best ODS, OIS, and AP among the listed edge detectors on complex terraced boundaries, pointing to the value of explicit edge priors for this task.
Parcel extraction: HBGNet's dual-branch framework (parallel low-level boundary and high-level semantic processing) achieves the best IoU and F1 scores; in the table above, Mask2Former has the lowest GOC and GTC, and REAUNet the lowest GUC.
UDA remains highly challenging: Even the best-performing UDA method (PiPa) lags considerably behind fully supervised counterparts; cross-domain terraced field adaptation remains an open problem.
Domain asymmetry: N→S transfer substantially outperforms S→N, reflecting the greater difficulty of learning fine-grained features characteristic of small parcels.

Highlights & Insights

Filling a critical gap: The first fine-grained global terraced parcel dataset, spanning 14 countries and seven major geographic regions in China.
Elegant three-level label design: A single vectorization annotation simultaneously generates three types of labels, maximizing the return on annotation effort.
Comprehensive evaluation framework: Three-dimensional metrics covering pixel level (Prec/Rec/IoU/OA/F1), object level (GOC/GUC/GTC), and edge level (ODS/OIS/AP).
Systematic benchmarking of 20 methods: 8 segmentation + 4 edge detection + 3 parcel extraction + 5 UDA methods, covering mainstream approaches across all tasks.

Limitations & Future Work

The total coverage of only 885 km² is limited compared to medium- and low-resolution datasets such as FTW (166K km²).
Only binary classification (terrace/background) is provided, without finer-grained semantics such as crop type.
Mountain terraces account for more than 80% of the dataset; hilly and valley terraces may be underrepresented.
Benchmarks for more advanced UDA methods (e.g., the MIC series) are not included.
Future work could incorporate foundation models such as SAM for zero-shot terraced field extraction evaluation.

The three-level label design (mask/boundary/parcel) is generalizable to other remote sensing scenarios requiring fine-grained parcel delineation, such as urban lots and wetlands.
The domain discrepancy analysis methodology for cross-domain terraced field extraction can inform broader geographic domain adaptation research.
The dataset serves as critical data infrastructure for precision agriculture and land monitoring applications.

Rating

Novelty: ⭐⭐⭐⭐ — First fine-grained global terraced field dataset, filling an important gap.
Experimental Thoroughness: ⭐⭐⭐⭐⭐ — 20 methods evaluated with a three-dimensional assessment framework; exceptionally comprehensive.
Writing Quality: ⭐⭐⭐⭐ — Clear structure with thorough statistical analysis.
Value: ⭐⭐⭐⭐ — Provides critical data infrastructure for terraced field remote sensing research.