Mallory
Engineering · Contract

Data Quality & AI Eval Engineer

Contractor · Deep security experience · AI evaluation

Contract · Remote · Posted June 2026

About the Role

We're looking for a Data Quality & AI Eval Engineer on a contract basis to own how we measure the accuracy and trustworthiness of our AI-driven threat intelligence. This role sits at the intersection of deep cybersecurity expertise and rigorous evaluation. You'll be the person who can look at a pipeline's output and know whether it's right.

Mallory uses AI to correlate thousands of threat intelligence sources and contextualize them to a customer's environment. Quality is everything in security. A false positive wastes a defender's time, and a missed signal can be catastrophic. You'll build the evals, golden datasets, and quality gates that keep us honest and continuously improving.

What You'll Do

  • Design and build evaluation harnesses that measure the accuracy, precision, and recall of our AI-driven threat intelligence pipelines
  • Define data quality metrics and ground-truth datasets for entity resolution, enrichment, and correlation
  • Build automated quality gates and regression suites that catch model and data drift before it reaches customers
  • Apply your security expertise to judge whether outputs are correct, distinguishing true threats from false positives and identifying missed signals
  • Create labeling guidelines and golden datasets, and curate hard cases that stress-test the system
  • Evaluate LLM prompts, retrieval, and structured outputs, and recommend changes that measurably improve quality
  • Partner with engineering to turn eval findings into concrete pipeline and model improvements

What We're Looking For

  • Deep, hands-on cybersecurity experience across threat intelligence, vulnerability management, detection engineering, or incident response
  • Strong ability to assess the correctness of security data and AI outputs from a practitioner's perspective
  • Experience with data quality, evaluation, or ML/LLM eval methodologies (precision/recall, ground truth, benchmarking)
  • Proficiency in Python for building eval tooling, data analysis, and automation
  • Comfort working with SQL and large, messy real-world datasets
  • Rigorous, detail-oriented approach to measurement and a healthy skepticism of unvalidated AI output
  • Self-directed and effective in a contract engagement with a small, fast-moving team

Nice to Have

  • Experience evaluating LLM applications, RAG systems, or agentic pipelines
  • Familiarity with eval frameworks and observability for AI systems
  • Background working with threat intel formats and sources (CVEs, IOCs, MITRE ATT&CK, STIX/TAXII)
  • Experience with vector and graph data (pgvector, Neo4j, or similar)

Why Work With Mallory?

  • Flexible, remote contract engagement with a well-funded, early-stage company
  • Work directly with the founding team on a problem at the center of the product
  • Influence how an AI-native cybersecurity platform measures and earns trust
  • Build actual AI-native software, not just AI-wrapped UIs
  • Competitive contract compensation

Interested?

Send your resume and a short note to hello@mallory.ai with "Data Quality & AI Eval Engineer" in the subject line. Tell us about your security background and a time you caught a quality or accuracy problem others missed.