ATS Integration Patterns for Clean Job Data

CleanJobData Engineering

Every ATS exposes hiring data in a different shape. One platform returns structured departments, another stores compensation in description text, and another exposes only a hosted job-board page. CleanJobData hides those differences behind one API contract: list jobs, filter them, and render them without writing a parser for every source.

Set CLEANJOBDATA_API_BASE_URL to your API base URL and CLEANJOBDATA_API_KEY to your API key before running the curl examples.

What the API Hides

The backend does not expect every source to provide the same fields. Each ATS adapter maps its native response into the job shape used by the database and public API:

  • title
  • location
  • locations
  • description
  • application_url
  • published
  • has_remote
  • employment_type
  • salary_min
  • salary_max
  • salary_currency
  • salary_text
  • experience_level
  • experience_levels
  • company

That shape is enforced before data is written to Postgres. Required fields such as title and application_url are validated, optional values the source does not provide are coerced to null, and array fields such as experience_levels are always present, even when a source provides no value for them.
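
As a rough illustration, the shape could be modelled as a Python dataclass. The field names come from the list above; the class name NormalizedJob, the types, and the defaults (including empty lists for array fields) are assumptions made for this sketch, not the actual backend schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class NormalizedJob:
    """Hypothetical model of the normalized job shape (types are assumed)."""

    # Required fields: validated before anything is written.
    title: str
    application_url: str

    # Optional fields: None when the source provides nothing.
    location: Optional[str] = None
    description: Optional[str] = None
    published: Optional[str] = None  # assumed to be an ISO 8601 string
    has_remote: Optional[bool] = None
    employment_type: Optional[str] = None
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None
    salary_currency: Optional[str] = None
    salary_text: Optional[str] = None
    experience_level: Optional[str] = None
    company: Optional[dict[str, Any]] = None

    # Array fields: assumed to be empty lists rather than missing.
    locations: list[dict[str, Any]] = field(default_factory=list)
    experience_levels: list[str] = field(default_factory=list)


# A source that only supplies the required fields still yields a full shape.
job = NormalizedJob(title="Data Engineer", application_url="https://example.com/apply/1")
```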

Source Adapters

CleanJobData uses source-specific adapters instead of one generic scraper. That matters because each ATS has its own API contract and quirks.

  • Ashby — structured postings, secondary locations, and sometimes compensation. Normalization preserves multiple locations, parses structured salary ranges, and keeps posting metadata.
  • Lever — JSON postings plus detailed job content. Normalization resolves departments into employment context and parses locations and descriptions.
  • Greenhouse — public board API with rich content. Normalization maps jobs, departments, locations, and first-published dates into the common schema.
  • Workable — widget API with shortcode-based jobs. Normalization extracts active postings, board context, and location metadata.
  • SmartRecruiters — posting API with detail pages. Normalization fetches details when the list response is incomplete and normalizes released dates.

The adapter layer is where source-specific concepts are translated into JobsDataAPI concepts. For example, a Greenhouse first_published date becomes published, a Lever department can help infer employment context, and an Ashby secondaryLocations list becomes multiple normalized location rows.
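
The two translations mentioned above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the raw keys (absolute_url, applyUrl, publishedAt, and the nested location objects) are assumptions about the provider payloads, and the "raw" key on each location row is invented for the example.

```python
from typing import Any


def map_greenhouse_job(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a Greenhouse-style posting to the common shape (raw keys assumed)."""
    location = (raw.get("location") or {}).get("name")
    return {
        "title": raw.get("title"),
        "application_url": raw.get("absolute_url"),
        "location": location,
        "locations": [{"raw": location}] if location else [],
        # The source-specific date field becomes the common `published` field.
        "published": raw.get("first_published"),
    }


def map_ashby_job(raw: dict[str, Any]) -> dict[str, Any]:
    """Map an Ashby-style posting; secondary locations become extra rows."""
    names = [raw.get("location")]
    names += [item.get("location") for item in raw.get("secondaryLocations") or []]
    names = [name for name in names if name]
    return {
        "title": raw.get("title"),
        "application_url": raw.get("applyUrl"),
        "location": names[0] if names else None,
        "locations": [{"raw": name} for name in names],
        "published": raw.get("publishedAt"),
    }


ashby = map_ashby_job({
    "title": "Backend Engineer",
    "applyUrl": "https://example.com/apply/42",
    "location": "London",
    "secondaryLocations": [{"location": "New York"}],
})
print(len(ashby["locations"]))  # 2
```

Both functions return the same keys, which is what lets the later stages stay source-agnostic.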

Normalization Pipeline

The pipeline has four stages.

1. Adapter Mapping

Each adapter fetches raw jobs from its ATS and returns a JobsDataAPI-shaped object. The adapter is responsible for naming conventions, source-specific date fields, company metadata, and source identifiers.

2. Structural Enforcement

Before insertion, enforceJobStructure normalizes the raw object:

  • trims and validates titles and application URLs
  • coerces invalid dates to safe values
  • uppercases country codes
  • parses salary values into numbers
  • normalizes employment type and language
  • derives experience_level and experience_levels from the title
  • converts company objects into a consistent JSON shape
  • normalizes each location row into known keys

This keeps the API stable even when source APIs change slightly.
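
A few of these steps can be sketched in Python. This is not the real enforceJobStructure: the function names, the URL check, the salary parsing rules, and the location keys (city, state, country_code) are assumptions chosen to illustrate the idea.

```python
import re
from typing import Any, Optional


def parse_salary(value: Any) -> Optional[float]:
    """Turn '120,000', '$95000' or 120000 into a float, or None if unparseable.

    Suffixes such as 'k' are not interpreted in this sketch.
    """
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    cleaned = re.sub(r"[^\d.]", "", str(value))
    try:
        return float(cleaned)
    except ValueError:
        return None


def enforce_structure(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate required fields and coerce a subset of optional ones."""
    title = str(raw.get("title") or "").strip()
    url = str(raw.get("application_url") or "").strip()
    if not title:
        raise ValueError("title is required")
    if not url.startswith(("http://", "https://")):
        raise ValueError("application_url must be an http(s) URL")

    # Every location row ends up with the same known keys.
    locations = []
    for row in raw.get("locations") or []:
        code = row.get("country_code")
        locations.append({
            "city": row.get("city"),
            "state": row.get("state"),
            "country_code": code.upper() if isinstance(code, str) else None,
        })

    return {
        "title": title,
        "application_url": url,
        "salary_min": parse_salary(raw.get("salary_min")),
        "salary_max": parse_salary(raw.get("salary_max")),
        "locations": locations,
    }
```

Rejecting a record early on a missing title or URL, while quietly nulling unparseable optional values, is one way to keep a single bad field from dropping an otherwise usable job.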

3. Location Resolution

Raw locations are rarely enough for filters. A job may say "SF", "San Francisco, CA", "Remote - US", or "London / New York". The location normalizer combines structured provider locations with parsed location strings, then resolves city, state, and country IDs against the geographic database.
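
The string-parsing half of that step might look like the Python sketch below. The function name, the separator set, and the output keys are assumptions; resolving each candidate to city, state, and country IDs needs the geographic database and is not shown.

```python
import re


def split_location_string(text: str) -> list[dict]:
    """Split a free-text location into candidate rows with a remote flag."""
    rows = []
    # Assumed separators between alternative locations: "/", ";" and "|".
    for part in re.split(r"\s*[/;|]\s*", text):
        part = part.strip()
        if not part:
            continue
        remote = bool(re.search(r"\bremote\b", part, re.IGNORECASE))
        # Drop the "Remote" marker and any dash or comma joining it to a region.
        name = re.sub(r"\bremote\b", "", part, flags=re.IGNORECASE).strip(" -,()")
        rows.append({"raw": part, "name": name or None, "remote": remote})
    return rows


print(split_location_string("London / New York"))
print(split_location_string("Remote - US"))
```

Commas are deliberately not treated as separators here, so "San Francisco, CA" stays one candidate.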

The resulting locations array is what powers reliable filters such as:

curl "$CLEANJOBDATA_API_BASE_URL/jobs?city_id=123&remote=true" \
  -H "Authorization: Bearer $CLEANJOBDATA_API_KEY"

4. Deduplication and Expiry

Jobs are keyed by application_url. If the same posting appears again, the pipeline avoids creating a duplicate row. Expired or inactive postings are tracked with is_active and expired_at, and source-level syncs can mark missing source records as expired.
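
The behaviour can be sketched with an in-memory dictionary standing in for the jobs table. The function name sync_source, the source column, and the choice to reactivate a posting that reappears are assumptions for illustration; in Postgres the same idea is commonly expressed with a unique index and an upsert.

```python
from datetime import datetime, timezone


def sync_source(store: dict, fetched_jobs: list, source: str, now=None) -> None:
    """Upsert fetched jobs keyed by application_url, then expire missing ones."""
    now = now or datetime.now(timezone.utc)
    seen = set()
    for job in fetched_jobs:
        key = job["application_url"]
        seen.add(key)
        existing = store.get(key, {})
        # The same key updates the existing row instead of adding a duplicate.
        store[key] = {**existing, **job, "source": source,
                      "is_active": True, "expired_at": None}
    for key, row in store.items():
        # Only active rows from this source that were absent from the latest sync.
        if row["source"] == source and key not in seen and row["is_active"]:
            row["is_active"] = False
            row["expired_at"] = now


store = {}
when = datetime(2026, 6, 1, tzinfo=timezone.utc)
sync_source(store, [
    {"application_url": "https://example.com/a/1", "title": "Analyst"},
    {"application_url": "https://example.com/a/2", "title": "Designer"},
], source="greenhouse", now=when)
# Second sync: job 1 reappears with a new title, job 2 is gone from the source.
sync_source(store, [
    {"application_url": "https://example.com/a/1", "title": "Senior Analyst"},
], source="greenhouse", now=when)
print(len(store))  # 2
```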

Querying Normalized Jobs

The public list endpoint accepts the normalized fields developers care about:

curl "$CLEANJOBDATA_API_BASE_URL/jobs?title=frontend%20engineer&experience_level=SE&remote=true&salary=120000,180000&limit=25" \
  -H "Authorization: Bearer $CLEANJOBDATA_API_KEY"

Useful filters include:

  • title or search
  • city_id, state_id, country_id, or location=US
  • remote=true, remote_only=true, or has_remote=true
  • experience_level=EN,MI,SE,EX
  • salary=120000,180000
  • published_after=2026-06-01
  • max_age=7d
  • domain=company.com
  • sort_by=published or sort_by=relevance
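
When the filters come from application code rather than a hand-written curl command, a small helper can assemble the query string. The sketch below uses only the Python standard library; build_jobs_url is a hypothetical helper, and https://api.example.com is a placeholder base URL.

```python
import os
from urllib.parse import quote, urlencode


def build_jobs_url(base_url: str, **filters) -> str:
    """Build a /jobs URL, skipping unset filters and comma-joining list values."""
    params = {}
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        params[name] = value
    # quote_via=quote encodes spaces as %20; safe="," keeps commas readable.
    query = urlencode(params, safe=",", quote_via=quote)
    return f"{base_url.rstrip('/')}/jobs?{query}"


url = build_jobs_url(
    os.environ.get("CLEANJOBDATA_API_BASE_URL", "https://api.example.com"),
    title="frontend engineer",
    experience_level=["SE", "EX"],
    remote=True,
    salary=(120000, 180000),
    limit=25,
)
print(url)
```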

The response includes data, cursor pagination, and meta.filters_applied so your UI can echo the exact filters that were applied.
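
A consumer might walk the pages with a loop like the Python sketch below. The data and meta.filters_applied keys come from the description above; the next_cursor field name and the shape of the fake pages are assumptions, so check the actual response before relying on them. The HTTP call is replaced by an injected function so the example runs offline.

```python
from typing import Callable, Iterator, Optional


def iter_jobs(fetch_page: Callable[[Optional[str]], dict],
              max_pages: int = 100) -> Iterator[dict]:
    """Yield jobs page by page until the response stops returning a cursor."""
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")  # assumed field name
        if not cursor:
            break


# Fake two-page response used in place of a real HTTP call.
PAGES = {
    None: {"data": [{"title": "A"}, {"title": "B"}], "next_cursor": "p2",
           "meta": {"filters_applied": {"remote": True}}},
    "p2": {"data": [{"title": "C"}], "next_cursor": None,
           "meta": {"filters_applied": {"remote": True}}},
}

titles = [job["title"] for job in iter_jobs(lambda cursor: PAGES[cursor])]
print(titles)  # ['A', 'B', 'C']
```

The max_pages cap is a safety net against a cursor that never terminates.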

What Consumers Should Not Do

A clean API is designed to remove integration work. Consumers should not need to:

  • parse ATS-specific JSON responses
  • scrape employer career sites directly
  • guess salary values from text
  • fuzzy-match "NYC", "New York", and "New York, NY"
  • write title heuristics for seniority levels
  • maintain separate schemas for each ATS

CleanJobData normalizes those details once, then exposes a stable API that can power job boards, search products, analytics dashboards, and AI hiring tools.