Senior/Principal Engineer (m/f/x), Generative AI Evaluations
Your role at Dynatrace
Most AI developer tools operate without any knowledge of how software actually behaves in production. Dynatrace is in a unique position to change that.
We're looking for a Senior or Principal Generative AI Engineer to design and build the evaluation and simulation capabilities at the core of our product. You'll work across the stack, from CLI tooling that engineers run locally, to large-scale simulation pipelines, to LLM-as-a-judge evaluation frameworks running against real Dynatrace AI Observability data.
This role sits inside Dynatrace's Engineering organization and works closely with product, design, and the platform teams that power Dynatrace's AI Observability stack.
Your responsibilities:
- Conduct research in generative AI
- Design and build systems that let users replay and stress-test AI agents at scale. Detect regressions across model versions, prompt changes, and data drift. Define the metrics, datasets, and judging strategies that make results trustworthy.
- Build infrastructure to simulate multi-turn, tool-using agents in realistic environments. Generate adversarial scenarios and measure task completion, tool-use correctness, and failure modes. Help teams ship agents with confidence.
- Own developer-facing CLIs that run evaluations on top of Dynatrace AI Observability data, from trace ingestion to judge configuration to reporting. Make them the tools AI engineers reach for first when debugging production behavior.
- Prototype quickly, run user feedback cycles, and ship to production
- Define technical strategy for the team's AI systems, set architectural direction, and mentor other engineers
- Collaborate with product and design to identify which developer problems are most worth solving
What will help you succeed
- 5+ years (Senior) or 10+ years (Principal) of professional software engineering experience
- Demonstrated experience shipping production systems that use LLMs, including prompting, tool calling, evaluation, and iteration
- Strong foundation in at least one of: developer tooling (IDEs, compilers, static analysis, code intelligence), AI/ML engineering, or large-scale distributed systems
- Hands-on experience with agentic patterns: planning, tool use, retrieval, and memory management
- Ability to evaluate and critique AI-generated output. You understand why a model is wrong, not just that it is.
- Clear communication with cross-functional partners across product and engineering
- Background in observability, APM, or infrastructure monitoring
- Familiarity with engineering platforms at scale: CI/CD systems, developer portals, and internal tooling
- Hands-on experience with LLMs: prompt engineering, evaluation frameworks (e.g., LLM-as-a-judge, golden datasets, pairwise comparisons), or agent frameworks
Why you will love being a Dynatracer
- Dynatrace is a leader in unified observability and security.
- We provide a culture of excellence with competitive compensation packages designed to recognize and reward performance.
- Our employees work with the largest cloud providers, including AWS, Microsoft, and Google Cloud, as well as other leading partners worldwide, to create strategic alliances.
- You'll get to work at the forefront of innovation with Dynatrace Intelligence, the industry's first agentic operations system. Bringing together deterministic and agentic AI, it helps teams automatically understand what's happening, why it matters, and what to do next.
- Over 50% of Fortune 100 companies are Dynatrace customers.
Compensation and Rewards
- We offer attractive compensation packages and stock purchase options with numerous benefits and advantages.
- For legal reasons, we are required to list a salary range for this position: €74,000 to €112,000 gross per year based on full-time employment (38.5 h/week). We’ve listed the salary range for transparency, but if your experience and skills bring unique value, we’d still love to hear from you, so please apply even if you’re outside the range.