Data Quality & AI Eval Engineer
Contractor · Deep security experience · AI evaluation
About the Role
We're looking for a Data Quality & AI Eval Engineer on a contract basis to own how we measure the accuracy and trustworthiness of our AI-driven threat intelligence. This role sits at the intersection of deep cybersecurity expertise and rigorous evaluation. You'll be the person who can look at a pipeline's output and know whether it's right.
Mallory uses AI to correlate thousands of threat intelligence sources and contextualize them to a customer's environment. Quality is everything in security. A false positive wastes a defender's time, and a missed signal can be catastrophic. You'll build the evals, golden datasets, and quality gates that keep us honest and continuously improving.
What You'll Do
- Design and build evaluation harnesses that measure the accuracy, precision, and recall of our AI-driven threat intelligence pipelines
- Define data quality metrics and ground-truth datasets for entity resolution, enrichment, and correlation
- Build automated quality gates and regression suites that catch model and data drift before it reaches customers
- Apply your security expertise to judge whether outputs are correct, distinguishing true threats, false positives, and missed signals
- Create labeling guidelines and golden datasets, and curate hard cases that stress-test the system
- Evaluate LLM prompts, retrieval, and structured outputs, and recommend changes that measurably improve quality
- Partner with engineering to turn eval findings into concrete pipeline and model improvements
What We're Looking For
- Deep, hands-on cybersecurity experience across threat intelligence, vulnerability management, detection engineering, or incident response
- Strong ability to assess the correctness of security data and AI outputs from a practitioner's perspective
- Experience with data quality, evaluation, or ML/LLM eval methodologies (precision/recall, ground truth, benchmarking)
- Proficiency in Python for building eval tooling, data analysis, and automation
- Comfort working with SQL and large, messy real-world datasets
- Rigorous, detail-oriented approach to measurement and a healthy skepticism of unvalidated AI output
- Self-directed and effective in a contract engagement with a small, fast-moving team
Nice to Have
- Experience evaluating LLM applications, RAG systems, or agentic pipelines
- Familiarity with eval frameworks and observability for AI systems
- Background working with threat intel formats and sources (CVEs, IOCs, MITRE ATT&CK, STIX/TAXII)
- Experience with vector and graph data (pgvector, Neo4j, or similar)
Why Work With Mallory?
- Flexible, remote contract engagement with a well-funded, early-stage company
- Work directly with the founding team on a problem at the center of the product
- Influence how an AI-native cybersecurity platform measures and earns trust
- Build actual AI-native software, not just AI-wrapped UIs
- Competitive contract compensation
Interested?
Send your resume and a short note to hello@mallory.ai with "Data Quality & AI Eval Engineer" in the subject line. Tell us about your security background and a time you caught a quality or accuracy problem others missed.