Engineering · Contract

Data Quality & AI Eval Engineer

Contractor · Deep security experience · AI evaluation

Contract · Remote · Posted June 2026

About the Role

We're looking for a Data Quality & AI Eval Engineer on a contract basis to own how we measure the accuracy and trustworthiness of our AI-driven threat intelligence. This role sits at the intersection of deep cybersecurity expertise and rigorous evaluation. You'll be the person who can look at a pipeline's output and know whether it's right.

Mallory uses AI to correlate thousands of threat intelligence sources and contextualize them to a customer's environment. Quality is everything in security. A false positive wastes a defender's time, and a missed signal can be catastrophic. You'll build the evals, golden datasets, and quality gates that keep us honest and continuously improving.

What You'll Do

Design and build evaluation harnesses that measure the accuracy, precision, and recall of our AI-driven threat intelligence pipelines
Define data quality metrics and ground-truth datasets for entity resolution, enrichment, and correlation
Build automated quality gates and regression suites that catch model and data drift before it reaches customers
Apply your security expertise to judge whether outputs are correct, distinguishing true threats from false positives and spotting missed signals
Create labeling guidelines and golden datasets, and curate hard cases that stress-test the system
Evaluate LLM prompts, retrieval, and structured outputs, and recommend changes that measurably improve quality
Partner with engineering to turn eval findings into concrete pipeline and model improvements

What We're Looking For

Deep, hands-on cybersecurity experience across threat intelligence, vulnerability management, detection engineering, or incident response
Strong ability to assess the correctness of security data and AI outputs from a practitioner's perspective
Experience with data quality, evaluation, or ML/LLM eval methodologies (precision/recall, ground truth, benchmarking)
Proficiency in Python for building eval tooling, data analysis, and automation
Comfort working with SQL and large, messy real-world datasets
Rigorous, detail-oriented approach to measurement and a healthy skepticism of unvalidated AI output
Self-directed and effective in a contract engagement with a small, fast-moving team

Nice to Have

Experience evaluating LLM applications, RAG systems, or agentic pipelines
Familiarity with eval frameworks and observability for AI systems
Background working with threat intel formats and sources (CVEs, IOCs, MITRE ATT&CK, STIX/TAXII)
Experience with vector and graph data (pgvector, Neo4j, or similar)

Why Work With Mallory?

Flexible, remote contract engagement with a well-funded, early-stage company
Work directly with the founding team on a problem at the center of the product
Influence how an AI-native cybersecurity platform measures and earns trust
Build actual AI-native software, not just AI-wrapped UIs
Competitive contract compensation

Interested?

Send your resume and a short note to hello@mallory.ai with "Data Quality & AI Eval Engineer" in the subject line. Tell us about your security background and a time you caught a quality or accuracy problem others missed.
