Green Carbon · Technical Track B

Match farmers to their land.
Automatically.

Two datasets. 103 farmers. 42 polygons. No shared identifiers. Between them sits a carbon-credit registration that can't move forward. We built a pipeline that reconciles them, tells you which matches it trusts, and resolves the ones it doesn't.

Farmers in the dataset: 103
Polygons on the map: 42
Total area under review (ha)
01 The problem

Two records of the same reality. Neither one is enough.

The farmer list is a spreadsheet. The polygon map is drawn by field staff. Both are accurate. Neither references the other. Matching them by hand works at this scale. It doesn't work at ten thousand farmers.

Source A · Farmer list: 103 rows, grouped A–E
Source B · Polygon map: 42 hand-drawn fields, no group labels
This is the J-Credit bottleneck. Green Carbon's rice-paddy methane program is the largest of its kind in Japan. Every new operator who joins hits this same reconciliation step. The work that follows is what a carbon-credit registry actually needs: a single register that ties each farmer to a specific plot, with an audit trail, and an honest signal of which matches need a second look.
02 Clustering

Recover the missing labels. From the geometry.

Farmers are grouped A through E. Polygons aren't grouped at all. We use weighted K-means on polygon centroids to produce five spatial clusters, then match each cluster to a farmer group by comparing total hectares. Every polygon gets a group label, inferred directly from where it sits on the map.
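Here is a minimal sketch of the clustering step in plain Python. The source doesn't say what the K-means weights are, so this assumes each centroid is weighted by its polygon's area in hectares. The function name `weighted_kmeans`, the deterministic farthest-point seeding, and the assumption of projected (metre-based) coordinates are ours, not the pipeline's.

```python
import math


def weighted_kmeans(points, weights, k, iters=100):
    """Area-weighted Lloyd's algorithm on polygon centroids (hypothetical helper).

    points  -- list of (x, y) centroids, assumed to be in a projected, metre-based CRS
    weights -- polygon areas in hectares, used as sample weights (assumption)
    Returns (labels, centers).
    """
    # Deterministic farthest-point seeding: start at the largest polygon,
    # then repeatedly add the centroid farthest from every chosen center.
    heaviest = max(range(len(points)), key=lambda i: weights[i])
    centers = [points[heaviest]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    labels = None
    for _ in range(iters):
        # Assignment step: each centroid joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no centroid changed cluster
        labels = new_labels
        # Update step: each center moves to the area-weighted mean of its members.
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # leave an empty cluster's center where it was
            total = sum(weights[i] for i in members)
            centers[c] = (
                sum(weights[i] * points[i][0] for i in members) / total,
                sum(weights[i] * points[i][1] for i in members) / total,
            )
    return labels, centers
```

The weights pull each center toward the larger fields. As a result, one big paddy counts for more than several small ones when the neighborhoods are drawn.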

Inferred field groups
For each cluster, the polygon total matches the farmer total for its group to within one hectare. The clusters form contiguous neighborhoods, so the geometry agrees with how the land is actually organized.
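The cluster-to-group step can be sketched the same way. This sketch assumes a one-to-one pairing that minimizes the total absolute hectare gap. It finds that pairing by brute force over all 120 orderings of groups A–E. The helper `match_clusters_to_groups` is hypothetical, and the per-cluster `gaps` it returns are what you would check against the one-hectare agreement.

```python
from itertools import permutations


def match_clusters_to_groups(cluster_ha, group_ha):
    """Pair each spatial cluster with a farmer group by total area (hypothetical helper).

    cluster_ha -- {cluster_id: summed polygon hectares}
    group_ha   -- {group_label: summed farmer-reported hectares}
    Tries every one-to-one pairing and keeps the one with the smallest
    total absolute hectare gap. Returns (mapping, per-cluster gaps).
    """
    clusters = sorted(cluster_ha)
    best = min(
        permutations(sorted(group_ha)),
        key=lambda perm: sum(abs(cluster_ha[c] - group_ha[g]) for c, g in zip(clusters, perm)),
    )
    mapping = dict(zip(clusters, best))
    gaps = {c: abs(cluster_ha[c] - group_ha[g]) for c, g in mapping.items()}
    return mapping, gaps
```

Brute force is cheap at five groups. For one-dimensional totals like these, pairing the two lists in sorted order also gives an optimal pairing, which only matters if the number of groups grows.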
03 Matching

One polygon, multiple farmers. That's the normal case.

With 103 farmers on 42 polygons, a plot holds about 2.5 farmers on average (103 ÷ 42 ≈ 2.45). We solve this as a minimum-cost flow problem: each polygon has a capacity, each farmer sends one unit of flow, and the cost measures the mismatch between a farmer's reported area and their share of the polygon.
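Below is a self-contained sketch of that formulation. It uses successive shortest paths (Bellman-Ford on the residual graph), so it needs no solver library. Two inputs are assumptions, not from the source. `capacity` is a hypothetical per-polygon limit on farmers. A farmer's "share" is taken as the polygon's area split equally across that capacity.

```python
import math


def assign_farmers(reported_ha, polygon_ha, capacity):
    """Min-cost flow: source -> farmer (cap 1) -> polygon (cap 1) -> sink (cap = capacity[p]).

    reported_ha -- farmer-reported hectares, one per farmer
    polygon_ha  -- mapped hectares, one per polygon
    capacity    -- max farmers per polygon (hypothetical input)
    Cost of farmer f on polygon p is |reported_ha[f] - polygon_ha[p] / capacity[p]|,
    an assumed equal-split share. Returns ({farmer: polygon}, total_cost).
    """
    nf, npoly = len(reported_ha), len(polygon_ha)
    src, sink = nf + npoly, nf + npoly + 1
    n = nf + npoly + 2
    graph = [[] for _ in range(n)]  # edge = [to, residual_cap, cost, index_of_reverse_edge]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for f in range(nf):
        add_edge(src, f, 1, 0.0)
        for p in range(npoly):
            add_edge(f, nf + p, 1, abs(reported_ha[f] - polygon_ha[p] / capacity[p]))
    for p in range(npoly):
        add_edge(nf + p, sink, capacity[p], 0.0)

    total = 0.0
    for _ in range(nf):  # each augmenting path routes exactly one farmer
        # Bellman-Ford on the residual graph; reverse edges carry negative costs.
        dist, prev = [math.inf] * n, [None] * n
        dist[src] = 0.0
        changed = True
        while changed:
            changed = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-9:
                        dist[v], prev[v] = dist[u] + cost, (u, i)
                        changed = True
        if dist[sink] == math.inf:
            raise ValueError("total polygon capacity is smaller than the number of farmers")
        # Push one unit of flow along the shortest path, updating residual capacities.
        v = sink
        while v != src:
            u, i = prev[v]
            edge = graph[u][i]
            edge[1] -= 1
            graph[v][edge[3]][1] += 1
            v = u
        total += dist[sink]

    # A saturated farmer -> polygon edge marks the polygon each farmer landed on.
    assignment = {}
    for f in range(nf):
        for v, cap, _cost, _rev in graph[f]:
            if nf <= v < nf + npoly and cap == 0:
                assignment[f] = v - nf
    return assignment, total
```

At 103 farmers and 42 polygons the graph has about 4,500 forward edges, which is small enough for this textbook algorithm. At ten thousand farmers, a dedicated min-cost-flow solver would be the natural swap.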


04 Confidence

Every match has a number attached to it.

For each farmer, we compute the gap in fit between their best and second-best candidate polygons, then compress it to a confidence score between zero and one. High scores are unambiguous. Low scores mean the optimizer couldn't decide. The registry sees exactly where the pipeline is certain and where it isn't.
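Here is one way to write the score, as a sketch. The source doesn't give the compression function. This sketch assumes `1 - exp(-gap / scale)`, where `scale` is a hypothetical tuning constant in the same units as the matching cost. A tie between the top two candidates scores 0, and a large gap approaches 1.

```python
import math


def confidence(costs, scale=1.0):
    """Confidence for one farmer from their candidate-polygon costs (lower cost = better fit).

    The exponential compression and `scale` are assumptions, not the pipeline's formula.
    """
    if len(costs) < 2:
        return 1.0  # a single candidate is unambiguous
    best, second = sorted(costs)[:2]
    gap = second - best
    return 1.0 - math.exp(-gap / scale)
```

Under this assumed mapping, the 0.25 auto-approve cut-off corresponds to a gap of scale × ln(4/3), about 0.29 × scale.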

The pipeline tells you when it isn't sure.

We don't claim accuracy without ground truth. Instead, we publish the distribution of confidence scores. High-confidence matches are ready for J-Credit registration as-is. Low-confidence ones go to the resolution stage, where we narrow each one down to its top three candidate polygons and prepare them for field-staff verification.

This is what makes the pipeline auditable. Every number has a provenance. Every decision has a reason.

Auto-approved: confidence above 0.25. Registry-ready.
Sent to resolver: confidence of 0.25 or below. See stage 5.
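Routing is then a threshold test. This minimal sketch reads "above 0.25" as strictly greater, so a score of exactly 0.25 goes to the resolver. `route` is a hypothetical helper, and the farmer IDs are whatever the farmer list uses.

```python
def route(scores, threshold=0.25):
    """Split farmers into auto-approved and resolver queues (hypothetical helper).

    scores -- {farmer_id: confidence}
    Returns (approved, flagged); flagged is ordered least-confident first.
    """
    approved = sorted(f for f, s in scores.items() if s > threshold)
    flagged = sorted((f for f, s in scores.items() if s <= threshold), key=lambda f: scores[f])
    return approved, flagged
```

Sorting the flagged list by ascending confidence puts the least certain matches at the front of the review queue. That ordering is our assumption.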
05 Resolution

For each disputed match, propose the top three and anchor them to what's certain.

A flagged assignment means the optimizer wasn't sure. The resolver takes each flagged farmer, looks at their top three candidate polygons, and uses the confident assignments on those polygons as spatial anchors. If two farmers on a candidate polygon are already matched with high confidence, that's strong evidence the flagged farmer belongs there too.
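Here is a sketch of the resolver under stated assumptions. Candidates are ranked by the same per-polygon cost as the flow step. An "anchor" is any other farmer on that polygon with confidence above 0.25. Proposals are ordered by anchor count, then by cost. The function name `resolve` and that ordering rule are hypothetical; the source says only that confident neighbors serve as evidence.

```python
def resolve(farmer, costs, assignment, scores, threshold=0.25, k=3):
    """Propose the top-k candidate polygons for one flagged farmer (hypothetical helper).

    costs      -- {polygon_id: cost} for this farmer (lower = better fit)
    assignment -- {farmer_id: polygon_id} from the optimizer
    scores     -- {farmer_id: confidence}
    """
    top = sorted(costs, key=costs.get)[:k]
    proposals = []
    for p in top:
        # Anchors: other farmers already placed on p with a confident match.
        anchors = sorted(
            f for f, q in assignment.items()
            if q == p and f != farmer and scores[f] > threshold
        )
        proposals.append({"polygon": p, "cost": costs[p], "anchors": anchors})
    # Assumed ordering: more confident neighbors first, then lower cost.
    proposals.sort(key=lambda c: (-len(c["anchors"]), c["cost"]))
    return proposals
```

Each proposal carries the anchoring farmers' IDs, so field staff can see why a polygon was ranked where it was.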
