What you’ll be doing
- Own the E2E framework: Build, maintain, and scale our automated testing framework using Playwright (TypeScript/Python).
- Test the unpredictable: Design strategies for testing non-deterministic LLM outputs, AI agents, and RAG pipelines, where standard exact-match assertions don't always work.
- Tackle LLM-specific challenges: Build guardrails and automated checks for prompt drift, hallucinations, latency, and context window limits.
- Evaluate Agent behavior: Create scenarios to test how our AI agents handle edge cases, multi-step reasoning, and error recovery in real-world document processing workflows.
- Integrate and collaborate: Wire your tests into our CI/CD pipelines to ensure we can ship quickly without breaking the core AI logic. Work closely with AI researchers, backend engineers, and product managers to define what "quality" means for an AI agent.
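To illustrate why exact-match assertions fall short for LLM outputs (as the bullets above describe), here is a minimal, hypothetical sketch in TypeScript of a property-based check: instead of comparing against one expected string, each run is validated against invariants every acceptable answer must satisfy. All names and checks here are illustrative assumptions, not part of this team's actual framework.

```typescript
// Hypothetical sketch: assert properties of a non-deterministic LLM answer
// rather than exact string equality. Names and checks are illustrative only.

interface LlmCheck {
  name: string;
  passes: (output: string) => boolean;
}

// Property-style checks: wording may vary between runs, but every
// acceptable answer must satisfy all of these invariants.
const checks: LlmCheck[] = [
  { name: "mentions the total", passes: (o) => /\btotal\b/i.test(o) },
  { name: "contains a currency amount", passes: (o) => /\d+[.,]\d{2}/.test(o) },
  { name: "stays within length budget", passes: (o) => o.length < 2000 },
];

// Returns the names of all failed checks; an empty array means pass.
function evaluate(output: string): string[] {
  return checks.filter((c) => !c.passes(output)).map((c) => c.name);
}

// Two differently phrased model outputs can both pass:
const a = "The invoice total is 42.50 EUR.";
const b = "Total due: 42,50 \u20AC.";
console.log(evaluate(a)); // []
console.log(evaluate(b)); // []
```

In a real suite, a check like `evaluate` would sit inside a Playwright test and gate the CI/CD pipeline, so regressions in agent behavior fail the build even when the surface wording of the output changes.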
