[Remote] AI QA Trainer – LLM Evaluation

Note: This is a remote role open to candidates in the USA. The agency is seeking an AI QA Trainer to help shape the future of AI through rigorous evaluation of large-scale language models. The role involves evaluating model reasoning, reliability, and safety, designing test plans, and documenting failure modes to improve AI quality.


Responsibilities

  • Converse with the model on real-world scenarios and evaluation prompts
  • Verify factual accuracy and logical soundness
  • Design and run test plans and regression suites (a minimal sketch follows this list)
  • Build clear rubrics and pass/fail criteria
  • Capture reproducible error traces with root-cause hypotheses
  • Suggest improvements to prompt engineering, guardrails, and evaluation metrics
  • Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time
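
A minimal sketch of what the regression-suite and pass/fail work above might look like, using PyTest. Everything here is hypothetical: query_model is a placeholder for whatever model client the team actually uses, and the cases and criteria are illustrative, not an actual suite.

```python
import pytest

# Hypothetical regression cases: each prompt is paired with facts that
# must appear in the model's answer for the case to pass.
REGRESSION_CASES = [
    ("What is the boiling point of water at sea level in Celsius?", ["100"]),
    ("What is the chemical symbol for gold?", ["Au"]),
]

def query_model(prompt: str) -> str:
    """Placeholder for a real model client; returns canned answers here."""
    canned = {
        "What is the boiling point of water at sea level in Celsius?":
            "Water boils at 100 degrees Celsius at sea level.",
        "What is the chemical symbol for gold?":
            "The chemical symbol for gold is Au.",
    }
    return canned[prompt]

@pytest.mark.parametrize("prompt,required_facts", REGRESSION_CASES)
def test_factual_accuracy(prompt, required_facts):
    # Pass/fail criterion: every required fact appears in the answer.
    answer = query_model(prompt)
    missing = [fact for fact in required_facts if fact not in answer]
    assert not missing, f"Missing facts {missing} in answer: {answer!r}"
```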

Skills

  • Experience shipping QA for ML/AI systems
  • Hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)
  • Evaluation rubric design (see the sketch after this list)
  • Adversarial testing, red-teaming, and safety evaluation
  • Regression testing at scale
  • Bias/fairness auditing
  • Grounding verification
  • Prompt and system-prompt engineering
  • Test automation in Python and SQL, with frameworks such as PyTest
  • High-signal bug reporting
  • Clear, metacognitive communication
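
As one illustration of the rubric-design skill above, a rubric can be encoded as weighted, machine-checkable criteria so that scoring is reproducible across reviewers. This is a sketch under assumed names: Criterion, score, and the example checks are hypothetical, not any specific team's tooling.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Criterion:
    """One rubric item: a name, a weight, and a pass/fail check."""
    name: str
    weight: float
    check: Callable[[str], bool]  # True if the response satisfies the item

def score(response: str, rubric: list[Criterion]) -> float:
    """Weighted pass rate in [0, 1]; a threshold turns this into pass/fail."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(response))
    return earned / total

# Hypothetical rubric: grounded, concise, free of filler.
rubric = [
    Criterion("cites_source", 0.5, lambda r: "[source]" in r),
    Criterion("concise", 0.3, lambda r: len(r.split()) <= 150),
    Criterion("no_filler", 0.2, lambda r: "As an AI" not in r),
]

print(score("Paris is the capital of France. [source]", rubric))  # 1.0
```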

Education Requirements

  • Bachelor's, master's, or PhD in computer science, data science, computational linguistics, statistics, or a related field

Benefits

  • Health insurance
  • PTO

Company Overview

  • Invisible Technologies is the AI operating system for the enterprise. Founded in 2015, it is headquartered in San Francisco, California, USA, and has 201-500 employees. Its website is https://www.invisible.co/.

Company H-1B Sponsorship

  • Invisible Technologies has a track record of offering H-1B sponsorship, with 2 sponsorships in 2025. Note that this does not guarantee sponsorship for this specific role.
