Normalization

How to Normalize Job Locations at Scale

CleanJobData Engineering

Hiring data is notoriously messy. One of the biggest challenges in building a high-quality job board is location normalization.

The Problem: Messy Geo Data

When you scrape job postings or pull from ATS APIs, you'll encounter a wide variety of location formats:

  • Abbreviations: "NYC", "SF", "LA"
  • Redundant info: "San Francisco, California, United States, North America"
  • Vague terms: "Remote", "Hybrid", "Anywhere"
  • Non-standard names: "The Big Apple"
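A first pass can tidy the surface noise before any lookup happens. The sketch below is a hypothetical helper (`tidy_components` and the `CONTINENTS` set are our illustrations, not CleanJobData's code). It splits on commas, normalizes case and whitespace, and drops continent names that add nothing once a country is present. It still produces plain strings: "nyc" comes out tidy but is not yet understood.

```python
import re

# Assumed list of continent names that are redundant once a country is known.
CONTINENTS = {"north america", "south america", "europe", "asia", "africa", "oceania"}

def tidy_components(raw: str) -> list[str]:
    """Split on commas, lowercase, collapse whitespace, and drop continents."""
    parts = [re.sub(r"\s+", " ", p).strip().lower() for p in raw.split(",")]
    # Skip empty fragments (e.g. from ",,") and continent suffixes.
    return [p for p in parts if p and p not in CONTINENTS]
```

For example, `tidy_components("San Francisco, California, United States, North America")` returns `["san francisco", "california", "united states"]`.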

If you store these as raw strings, users who filter by "California" won't see jobs listed as "San Francisco" or "SF", because nothing in the text links a city to its state.

The Solution: Canonical Geo IDs

At CleanJobData, we solve this by mapping every location string to a canonical ID in our global database of 200,000+ locations.
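One way to model this, sketched under assumptions: every place is a record with a canonical ID, a level, a parent link, and a set of known aliases, and every surface form points at exactly one ID. The `Location` type, the ID format, and the `to_canonical_id` helper are hypothetical; the real database and its ID scheme are not described here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Location:
    id: str                        # canonical ID (format is hypothetical)
    name: str                      # display name
    kind: str                      # "city", "state", or "country"
    parent_id: Optional[str]       # enclosing location; None for countries
    aliases: tuple[str, ...] = ()  # abbreviations and nicknames

# A toy slice of a gazetteer; IDs are made up for illustration.
LOCATIONS = [
    Location("US", "United States", "country", None, ("usa", "united states of america")),
    Location("US-NY", "New York", "state", "US", ("ny",)),
    Location("US-NY-NYC", "New York City", "city", "US-NY", ("nyc", "the big apple")),
]

# Index every surface form (name plus aliases), lowercased, to its canonical ID.
NAME_INDEX: dict[str, str] = {}
for loc in LOCATIONS:
    for form in (loc.name, *loc.aliases):
        NAME_INDEX[form.lower()] = loc.id

def to_canonical_id(raw: str) -> Optional[str]:
    """Map a raw location string to a canonical ID, or None if unknown."""
    return NAME_INDEX.get(raw.strip().lower())
```

Here `to_canonical_id("NYC")` and `to_canonical_id("The Big Apple")` both return `"US-NY-NYC"`, while an unknown name returns `None` and can be handed to the parser.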

1. Parsing the String

We use a multi-step parser that identifies city, state, and country components. When a name is ambiguous, we prioritize source-specific hints: "Cambridge" could be in England or Massachusetts, so if the job is from a UK-based employer, we bias toward UK cities.
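The source-hint idea can be sketched as a tie-breaker between candidate places that share a name. This `parse_city` is a simplified, hypothetical helper: it treats the first comma-separated component as the city, looks only for an optional country component, and ignores states. The candidate table, the IDs, and the assumption that candidates are pre-sorted by a prior such as population are all illustrative.

```python
from typing import Optional

# Hypothetical candidates per city name: (canonical_id, country_code),
# assumed pre-sorted so the most likely default comes first.
CITY_CANDIDATES = {
    "cambridge": [("US-MA-CAMBRIDGE", "US"), ("GB-ENG-CAMBRIDGE", "GB")],
    "london": [("GB-ENG-LONDON", "GB"), ("CA-ON-LONDON", "CA")],
}
COUNTRY_CODES = {"united states": "US", "usa": "US", "united kingdom": "GB",
                 "uk": "GB", "canada": "CA"}

def parse_city(raw: str, employer_country: Optional[str] = None) -> Optional[str]:
    """Pick a city ID, preferring an explicit country, then the employer's country."""
    parts = [p.strip().lower() for p in raw.split(",") if p.strip()]
    if not parts:
        return None
    candidates = CITY_CANDIDATES.get(parts[0], [])
    # A country named in the string itself outranks any external hint.
    explicit = next((COUNTRY_CODES[p] for p in parts[1:] if p in COUNTRY_CODES), None)
    preferred = explicit or employer_country
    if preferred is not None:
        for loc_id, country in candidates:
            if country == preferred:
                return loc_id
        if explicit is not None:
            return None  # the string names a country with no such city
    # No usable hint: fall back to the first (default) candidate, if any.
    return candidates[0][0] if candidates else None
```

Because the explicit country wins, `parse_city("London, Canada", employer_country="GB")` resolves to the Canadian city, while a bare `parse_city("Cambridge", employer_country="GB")` picks the English one.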

2. Resolution Hierarchy

We resolve locations in a hierarchy:

  1. City ID: The most specific match.
  2. State/Province ID: If the city is unknown but the state is clear.
  3. Country ID: The fallback for country-wide roles.
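The fallback order can be expressed as a loop over lookup tables, most specific first. The tables and IDs below are toy assumptions, and a production resolver would also check that a matched city actually lies inside any state or country the string names.

```python
from typing import Optional

# Toy lookup tables keyed by lowercase name; IDs are hypothetical.
CITIES = {"san francisco": "US-CA-SF", "toronto": "CA-ON-TORONTO"}
STATES = {"california": "US-CA", "ontario": "CA-ON"}
COUNTRIES = {"united states": "US", "usa": "US", "canada": "CA"}

def resolve(components: list[str]) -> tuple[Optional[str], str]:
    """Return (location_id, level), trying city, then state, then country."""
    parts = [c.strip().lower() for c in components]
    for level, table in (("city", CITIES), ("state", STATES), ("country", COUNTRIES)):
        for part in parts:
            if part in table:
                return table[part], level
    return None, "unresolved"
```

An unknown town in a known state, such as `["Somewhere Small", "California"]`, falls back to `("US-CA", "state")` instead of being dropped.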

3. Handling Remote

We treat "Remote" as a first-class property, not just a location name. We normalize it into a boolean flag and a work-setting category.
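Here is a sketch of pulling the work setting out of the location text before geocoding. The `WorkSetting` categories, the marker terms, the default of onsite when no marker appears, and the choice to set `is_remote` only for fully remote roles are our assumptions, not a documented schema.

```python
import re
from dataclasses import dataclass
from enum import Enum

class WorkSetting(Enum):
    ONSITE = "onsite"   # assumed default when no marker is present
    HYBRID = "hybrid"
    REMOTE = "remote"

@dataclass(frozen=True)
class WorkInfo:
    is_remote: bool            # True only for fully remote roles (assumption)
    work_setting: WorkSetting
    residual: str              # leftover text to geocode, e.g. "Germany"

# Checked in order: if a string mentions both, REMOTE takes precedence.
MARKERS = [
    (WorkSetting.REMOTE, ("remote", "anywhere", "work from home", "wfh")),
    (WorkSetting.HYBRID, ("hybrid",)),
]

def extract_work_setting(raw: str) -> WorkInfo:
    """Strip remote/hybrid markers from a location string and classify it."""
    setting = WorkSetting.ONSITE
    residual = raw
    for value, terms in MARKERS:
        pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
                             re.IGNORECASE)
        if pattern.search(residual):
            if setting is WorkSetting.ONSITE:
                setting = value
            residual = pattern.sub("", residual)
    # Collapse whitespace and trim separators left at the edges.
    residual = re.sub(r"\s+", " ", residual).strip(" -/,()")
    return WorkInfo(setting is WorkSetting.REMOTE, setting, residual)
```

With this split, `extract_work_setting("Remote - Germany")` yields a remote role with `"Germany"` left to geocode, so a remote job can still carry a country restriction.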

Why It Matters

By using structured IDs instead of strings, you can:

  • Build reliable regional filters.
  • Create SEO-friendly landing pages for specific cities.
  • Perform labor market analytics with high precision.
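Regional filters fall out of the parent links: a job matches a region if the region's ID appears anywhere in the job location's ancestor chain. The `PARENT` map, the IDs, and the job dictionaries below are hypothetical.

```python
from typing import Optional

# Hypothetical parent links: each canonical ID points to the ID that contains it.
PARENT: dict[str, Optional[str]] = {
    "US": None,
    "US-CA": "US",
    "US-CA-SF": "US-CA",
    "US-CA-LA": "US-CA",
    "US-NY": "US",
    "US-NY-NYC": "US-NY",
}

def ancestors(loc_id: str) -> list[str]:
    """Return loc_id followed by every enclosing region, innermost first."""
    chain: list[str] = []
    current: Optional[str] = loc_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain

def filter_jobs(jobs: list[dict], region_id: str) -> list[dict]:
    """Keep jobs whose normalized location lies inside region_id."""
    return [job for job in jobs if region_id in ancestors(job["location_id"])]

jobs = [
    {"title": "Data Engineer", "location_id": "US-CA-SF"},
    {"title": "Analyst", "location_id": "US-NY-NYC"},
    {"title": "Recruiter", "location_id": "US-CA"},
]
```

Here `filter_jobs(jobs, "US-CA")` returns the San Francisco and statewide California jobs, which is the "California includes San Francisco" behavior raw strings cannot give you. In practice you would likely store each job's ancestor IDs at index time so the filter becomes a plain set lookup.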

Our API handles all of this for you. Every job object returned by CleanJobData includes a normalized location object with canonical IDs.