Principal Applied Scientist

Microsoft
$139,900.00 - $274,800.00 / yr
United States, Texas, Irving
7000 State Highway 161
Jan 25, 2026
Overview

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.

Microsoft's Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging and real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the messaging and real-time analytics team provides comprehensive solutions and a robust platform that enables users to ingest high-granularity signals (real-time and observability) and complex data, converting them into a real-time competitive advantage for both end users and modern applications.

Within the Microsoft Fabric product pillar, the Real-Time Intelligence (RTI) team is hiring a Principal Applied Scientist to lead the science of evaluating (evals) and improving LLM-powered agents operating on live operational data. This role focuses on building end-to-end evaluation systems for agentic workflows, covering planning, tool use, retrieval, safety, and end-user outcomes, and turning them into flywheels that continuously raise agent quality, reliability, and business impact.

What makes RTI unique is its deep integration across Fabric's real-time surfaces, rich instrumentation on event-level data, and shared ML/LLM evaluation platforms that let us ship science rapidly across multiple experiences. In this role, you'll partner closely with engineering and product to architect low-latency evaluation and monitoring pipelines, design offline and online experiments (including LLM-as-judge and human-in-the-loop workflows), and define the quality standards that govern agents from initial research through deployment and continuous improvement.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.



Responsibilities
  • Lead end-to-end science for evaluating LLM-powered agents on real-time and batch workloads: designing evaluation frameworks, metrics, and pipelines that capture planning quality, tool use, retrieval, safety, and end-user outcomes, and partnering with engineering for robust, low-latency deployment.
  • Advance evaluation methodologies for agents across RTI surfaces by driving test set design, auto-raters (including LLM-as-judge), human-in-the-loop feedback loops, and measurable lifts in key quality metrics such as task success rate, reliability, and safety.
  • Establish rigorous evaluation and reliability practices for LLM/agent systems: from offline benchmarks and scenario-based evals to online experiments and production monitoring, defining guardrails and policies that balance quality, cost, and latency at scale.
  • Collaborate with PM, Engineering, and UX to translate evaluation insights into customer-visible improvements, shaping product requirements, de-risking launches, and iterating quickly based on telemetry, user feedback, and real-world failure modes.
  • Provide technical leadership and mentorship within the applied science and engineering community, fostering inclusive, responsible-AI practices in agent evaluation, and influencing roadmap, platform investments, and cross-team evaluation strategy across Fabric.

Embody our culture and values



Qualifications

Required/Minimum Qualifications

  • Bachelor's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 6+ years related experience (e.g., statistics, predictive analytics, research) OR Master's Degree in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 4+ years related experience (e.g., statistics, predictive analytics, research) OR Doctorate in Statistics, Econometrics, Computer Science, Electrical or Computer Engineering, or related field AND 3+ years related experience (e.g., statistics, predictive analytics, research) OR equivalent experience.

Job Requirements: Other & Additional

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check:

  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications

  • 2+ years of experience designing and running ML/LLM evaluation and experimentation (offline metrics and online A/B tests).
  • Proven experience applying machine learning, statistics, and measurement science to LLM and agent evaluation, ideally in real-time or streaming scenarios.
  • Proficiency in agentic AI concepts (e.g., multi-step agents, tool orchestration, retrieval/RAG, workflow automation) and familiarity with techniques for assessing safety, robustness, anomaly detection, and causal impact of agent behaviors.
  • Strong programming and modeling skills in languages such as Python, and experience building evaluation services or pipelines on distributed systems (e.g., running large-scale offline evals, auto-raters, or LLM-as-judge workloads).
  • Ability to design, implement, and interpret rigorous evaluations end-to-end: constructing eval sets and scenarios, combining offline metrics with human/LLM raters, running online experiments (A/B tests, holdouts), and instrumenting reliability monitoring at scale.
  • Collaborative mindset with demonstrated success partnering across Engineering, PM, and UX to define quality bars, translate evaluation insights into roadmap decisions, and iterate quickly on customer-facing agent and LLM experiences.

Applied Sciences IC5 - The typical base pay range for this role across the U.S. is USD $139,900 - $274,800 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $188,000 - $304,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay

This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
