Extracting Salary Data from Job Descriptions

CleanJobData Engineering

Salary transparency is becoming a legal requirement in many regions, but pay data still often arrives as unstructured text inside the job description.

The Challenge

Salary information can appear in many formats:

  • "$120,000 - $150,000 per year"
  • "£50k - £70k"
  • "Up to $200/hr"
  • "Competitive salary + equity"

Our Extraction Pipeline

We combine source-specific metadata with regular-expression patterns to extract structured pay data.

1. Structured Field Detection

Some ATS platforms (like Ashby) provide structured compensation fields. When these are present, we use them in preference to anything parsed from text, since they are the most accurate source.
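
As a rough illustration of that priority order, the sketch below checks a structured field first and only falls back to text parsing when it is missing. The posting shape, the "compensation" key, and the function names are hypothetical and do not reflect any particular ATS schema.

```python
from typing import Callable, Optional


def pick_compensation(
    posting: dict,
    parse_text: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Prefer structured compensation; fall back to parsing the description."""
    # "compensation" is an assumed field name, used here for illustration only.
    structured = posting.get("compensation")
    if structured and (structured.get("min") is not None
                       or structured.get("max") is not None):
        return {
            "min": structured.get("min"),
            "max": structured.get("max"),
            "currency": structured.get("currency"),
            "source": "structured",
        }

    # No usable structured data: hand the free text to a parser.
    parsed = parse_text(posting.get("description", ""))
    if parsed is not None:
        parsed = {**parsed, "source": "text"}
    return parsed
```

Recording where each value came from (the "source" key here) can make it easier to audit parsed values separately from structured ones.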

2. Regex-Based Parsing

For unstructured text, we run a series of patterns to identify:

  • Currency symbols ($, £, €, etc.)
  • Numeric ranges
  • Pay periods (hourly, monthly, annual)
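
To make the three components concrete, here is a minimal sketch of a single pattern that captures a currency symbol, a numeric value or range, and an optional pay period. It is a simplified illustration rather than the production pattern set; the names SALARY_RE and parse_salary, the "k" suffix handling, and the "up to" rule are assumptions chosen to cover the example formats listed earlier.

```python
import re
from typing import Optional

AMOUNT = r"\d[\d,]*(?:\.\d+)?"

SALARY_RE = re.compile(
    r"(?P<upto>up\s+to\s+)?"                       # optional "Up to" prefix
    r"(?P<cur>[$£€])\s*"                           # currency symbol
    rf"(?P<lo>{AMOUNT})(?P<lo_k>k\b)?"             # first amount, optional "k"
    rf"(?:\s*(?:[-\u2013]|to)\s*[$£€]?\s*"         # range separator
    rf"(?P<hi>{AMOUNT})(?P<hi_k>k\b)?)?"           # optional second amount
    r"(?:\s*(?:/\s*|per\s+|an?\s+)"                # "/", "per", "a", "an"
    r"(?P<period>hr|hour|month|mo|year|yr|annum)\b)?",
    re.IGNORECASE,
)

PERIODS = {
    "hr": "hour", "hour": "hour",
    "mo": "month", "month": "month",
    "yr": "year", "year": "year", "annum": "year",
}


def _amount(digits: str, has_k: Optional[str]) -> float:
    """Convert '120,000' or '50' (+ 'k') into a float."""
    value = float(digits.replace(",", ""))
    return value * 1000 if has_k else value


def parse_salary(text: str) -> Optional[dict]:
    """Return symbol, min, max and period for the first salary found, else None."""
    m = SALARY_RE.search(text)
    if m is None:
        return None
    low = _amount(m["lo"], m["lo_k"])
    high = _amount(m["hi"], m["hi_k"]) if m["hi"] else None
    if m["upto"] and high is None:
        # "Up to X" states only an upper bound.
        low, high = None, low
    period = PERIODS[m["period"].lower()] if m["period"] else None
    return {"symbol": m["cur"], "min": low, "max": high, "period": period}
```

Under these assumptions, "£50k - £70k" parses to a 50000 to 70000 range with no period, "Up to $200/hr" yields only a maximum with an hourly period, and "Competitive salary + equity" returns None because it contains no currency symbol.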

3. Normalization

Once extracted, we normalize the values:

  • Annualization: Hourly and monthly rates are converted to annual equivalents for easy comparison.
  • Currency Mapping: We map the detected currency to its ISO 4217 code.
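
The two normalization steps could be sketched as follows. The conversion factors are assumptions, not figures from our pipeline: 2,080 working hours per year (40 hours × 52 weeks) and 12 months per year. The symbol table is also a simplification, since "$" alone is ambiguous between USD, CAD, AUD, and others; the input dictionary shape and the function name are hypothetical.

```python
from typing import Optional

# Assumed conversion factors (40 h/week * 52 weeks; 12 months).
PERIOD_MULTIPLIER = {"hour": 40 * 52, "month": 12, "year": 1}

# Simplified symbol lookup; "$" is treated as USD purely for illustration.
SYMBOL_TO_ISO = {"$": "USD", "£": "GBP", "€": "EUR"}


def normalize(parsed: dict) -> dict:
    """Annualize min/max and map the currency symbol to an ISO 4217 code."""
    # A missing period is assumed to mean the figure is already annual.
    factor = PERIOD_MULTIPLIER.get(parsed.get("period") or "year", 1)

    def annual(value: Optional[float]) -> Optional[float]:
        return None if value is None else value * factor

    return {
        "salary_min": annual(parsed.get("min")),
        "salary_max": annual(parsed.get("max")),
        "salary_currency": SYMBOL_TO_ISO.get(parsed.get("symbol")),
    }
```

With these factors, an hourly maximum of 200 becomes 416,000 per year, while a range with no stated period passes through unchanged.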

Using the Data

Our API exposes the normalized result as three fields:

  • salary_min: The lower bound of the range.
  • salary_max: The upper bound.
  • salary_currency: The ISO code (e.g., "USD").

This lets you build filters such as "Jobs paying over $100k" without having to parse descriptions yourself.
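
A filter of that kind might look like the sketch below, which works on job records already fetched as dictionaries. Only the three field names come from the API; the function name, the sample records, and the choice to compare against salary_min (falling back to salary_max when the lower bound is missing) are illustrative assumptions.

```python
from typing import Iterable, List


def pays_over(job: dict, threshold: float, currency: str = "USD") -> bool:
    """True if the job's lower bound exceeds the threshold in the given currency."""
    if job.get("salary_currency") != currency:
        return False
    # Assumption: use the lower bound, or the upper bound if no lower bound exists.
    floor = job.get("salary_min")
    if floor is None:
        floor = job.get("salary_max")
    return floor is not None and floor > threshold


def filter_jobs(jobs: Iterable[dict], threshold: float, currency: str = "USD") -> List[dict]:
    """Keep only the jobs that pass pays_over."""
    return [job for job in jobs if pays_over(job, threshold, currency)]


# Hypothetical records for demonstration.
jobs = [
    {"title": "Backend Engineer", "salary_min": 120000, "salary_max": 150000, "salary_currency": "USD"},
    {"title": "Support Analyst", "salary_min": 60000, "salary_max": 75000, "salary_currency": "USD"},
    {"title": "Data Engineer", "salary_min": 110000, "salary_max": 130000, "salary_currency": "GBP"},
    {"title": "Recruiter", "salary_min": None, "salary_max": None, "salary_currency": None},
]

high_paying = filter_jobs(jobs, 100_000)
```

Comparing on salary_min is the stricter reading of "paying over"; comparing on salary_max instead would also include jobs whose range merely reaches the threshold.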