Question 1

Can I use CleanJobData for commercial AI products?

Accepted Answer

Yes. Our standard terms permit use in commercial AI and ML products, including job matching algorithms, salary prediction models, and recruitment automation tools. Review the full terms at cleanjobdata.com/terms.

Question 2

Is this real data or synthetic?

Accepted Answer

Real data, sourced directly from employer career pages via Greenhouse, Lever, Ashby, and Workable. These are live job postings from companies actively hiring, not generated or augmented examples.

Question 3

What's the best way to build a training dataset?

Accepted Answer

Use the REST API with date-range filters to pull a one-time snapshot, or run regular syncs with the cursor parameter to build an ongoing dataset. For large historical pulls, contact us at cleanjobdata.com/support to discuss custom data delivery options.
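As a rough illustration of the snapshot approach, the sketch below follows cursors until a date range is exhausted. The parameter names `posted_after` and `posted_before`, the response fields `results` and `next_cursor`, and the `fetch_page` callable are all assumptions made for this example, not documented CleanJobData names; check the API reference for the real ones.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# fetch_page is a hypothetical callable: it takes query parameters and
# returns one decoded JSON page from the list endpoint.
FetchPage = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_snapshot(
    fetch_page: FetchPage, posted_after: str, posted_before: str
) -> Iterator[Dict[str, Any]]:
    """Yield every listing in a date range, following cursors until exhausted."""
    # Assumed filter names; the real API may call these something else.
    base = {"posted_after": posted_after, "posted_before": posted_before}
    cursor: Optional[str] = None
    while True:
        query = dict(base)
        if cursor is not None:
            query["cursor"] = cursor
        page = fetch_page(query)
        yield from page.get("results", [])
        cursor = page.get("next_cursor")  # assumed response field
        if not cursor:
            break
```

Passing the fetcher in as a callable keeps the paging logic independent of any HTTP client, so it can be exercised with a stub before it is pointed at the live API.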

Question 4

Does the data include job descriptions?

Accepted Answer

Yes. The full job description text is included in the detail endpoint response; the list endpoint returns only a summary. For training NLP models on job description text, use the detail endpoint or batch-fetch by ID.
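One possible way to assemble a text corpus from the detail endpoint is sketched below. The `fetch_detail` callable and the `description` field name are assumptions for illustration only; the actual response shape may differ.

```python
from typing import Any, Callable, Dict, Iterable, List

# fetch_detail is a hypothetical callable: it takes a job ID and returns
# the decoded JSON detail record for that job.
FetchDetail = Callable[[str], Dict[str, Any]]


def collect_descriptions(
    fetch_detail: FetchDetail, job_ids: Iterable[str]
) -> List[Dict[str, str]]:
    """Fetch each job's detail record and keep only non-empty description text."""
    corpus: List[Dict[str, str]] = []
    for job_id in job_ids:
        detail = fetch_detail(job_id)
        # "description" is an assumed field name for the full text.
        text = (detail.get("description") or "").strip()
        if text:
            corpus.append({"id": job_id, "text": text})
    return corpus
```

Records with a missing or blank description are skipped so they do not end up as empty training examples.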

Question 5

How do you handle data freshness for ongoing training pipelines?

Accepted Answer

Use the max_age filter combined with cursor pagination to pull only new listings since your last sync. This keeps your training dataset current without re-downloading the full index on each run.
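A minimal sketch of such an incremental sync follows. Only `max_age` and the cursor parameter come from the answer above; the value format for `max_age`, the response fields `results`, `next_cursor`, and `id`, and the `fetch_page` callable are assumptions for this example.

```python
from typing import Any, Callable, Dict, Optional

# fetch_page is a hypothetical callable: it takes query parameters and
# returns one decoded JSON page from the list endpoint.
FetchPage = Callable[[Dict[str, str]], Dict[str, Any]]


def sync_new_listings(
    fetch_page: FetchPage,
    dataset: Dict[Any, Dict[str, Any]],
    max_age: str,
    cursor: Optional[str] = None,
) -> Optional[str]:
    """Merge recent listings into dataset (keyed by id); return the last cursor used."""
    while True:
        query = {"max_age": max_age}  # value format is an assumption
        if cursor:
            query["cursor"] = cursor
        page = fetch_page(query)
        for job in page.get("results", []):
            # Upsert by ID so a listing seen in an earlier run is not duplicated.
            dataset[job["id"]] = job
        next_cursor = page.get("next_cursor")  # assumed response field
        if not next_cursor:
            return cursor
        cursor = next_cursor
```

Keying the dataset by listing ID makes repeated runs idempotent. Whether a cursor returned by one run can be stored and passed to the next is also an assumption here and should be confirmed against the API documentation.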

Job Data for AI Training
