[Remote] AI QA Trainer – LLM Evaluation
Note: This is a remote position open to candidates in the USA. Agency is seeking an AI QA Trainer to help shape the future of AI through rigorous evaluation of large-scale language models. The role involves evaluating model reasoning, reliability, and safety while designing test plans and documenting failure modes to improve AI quality.
Responsibilities
- Converse with the model on real-world scenarios and evaluation prompts
- Verify factual accuracy and logical soundness
- Design and run test plans and regression suites (see the sketch after this list)
- Build clear rubrics and pass/fail criteria
- Capture reproducible error traces with root-cause hypotheses
- Suggest improvements to prompt engineering, guardrails, and evaluation metrics
- Partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time
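As a rough illustration of the regression-suite and pass/fail-rubric work described above, here is a minimal PyTest-style sketch. The `get_model_response` stub, the `REGRESSION_CASES` data, and the regex-based rubric criteria are hypothetical placeholders for this posting, not a prescribed tool or workflow.

```python
# Minimal sketch of a pass/fail regression check for an LLM response.
# get_model_response, the example prompt, and the rubric patterns are
# hypothetical placeholders; swap in a real model client and criteria.
import re
import pytest


def get_model_response(prompt: str) -> str:
    """Stand-in for a real model call; replace with your model client."""
    return "The capital of France is Paris."


# Each case pairs a prompt with rubric criteria: facts that must appear
# and phrases that indicate hallucination or an out-of-scope answer.
REGRESSION_CASES = [
    {
        "id": "capital-fact",
        "prompt": "What is the capital of France? Answer in one sentence.",
        "must_contain": [r"\bParis\b"],
        "must_not_contain": [r"\bLyon\b", r"\bMarseille\b"],
    },
]


@pytest.mark.parametrize("case", REGRESSION_CASES, ids=lambda c: c["id"])
def test_response_meets_rubric(case):
    response = get_model_response(case["prompt"])
    # Pass criterion 1: every required fact is present.
    for pattern in case["must_contain"]:
        assert re.search(pattern, response), f"missing required fact: {pattern}"
    # Pass criterion 2: no forbidden content appears.
    for pattern in case["must_not_contain"]:
        assert not re.search(pattern, response), f"forbidden content: {pattern}"
```

Run with `pytest` as usual; in practice each case would call the deployed model, and the rubric would cover factual accuracy, grounding, and safety criteria rather than a single regex check.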
Skills
- Experience shipping QA for ML/AI systems
- Safety/red-team experience
- Test automation frameworks (e.g., PyTest)
- Hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B)
- Evaluation rubric design
- Adversarial testing/red-teaming
- Regression testing at scale
- Bias/fairness auditing
- Grounding verification
- Prompt and system-prompt engineering
- Test automation (Python/SQL)
- High-signal bug reporting
- Clear, metacognitive communication
Education Requirements
- Bachelor's, master's, or PhD in computer science, data science, computational linguistics, statistics, or a related field
Benefits
- Health insurance
- PTO