AI Is Taking Action. No One Is Accountable.

The lawyer is still accountable. The AI system acting on her behalf is not. That gap is no longer theoretical.

After the first meeting of the Trust in AI Alliance, it is clear that this mismatch is emerging as one of the biggest barriers to enterprise AI deployment.

As AI systems move from answering questions to taking action inside professional workflows, a fundamental mismatch is emerging. Execution shifts to the system. Responsibility still sits with the human.

In agentic systems, that division of labor is being reconfigured, but there is still no clear answer to a critical question: how does a human maintain accountability as more of the work is executed by the system?

That question was at the center of the inaugural convening of the Trust in AI Alliance, a group bringing together leaders across model development, infrastructure, and enterprise AI deployment, where participants from OpenAI, Google, Anthropic, AWS, and Thomson Reuters discussed what trustworthy agentic systems require in practice.

A clear theme emerged: AI capability is accelerating faster than accountability.

Most systems today are not designed to meet that standard of accountability.

The Shift No One Is Talking About

In the first wave of AI, the defining question was whether a system could produce a correct answer. That is no longer enough.

As AI systems take on multi-step tasks across real workflows, the question is shifting from accuracy to accountability.

As Michael Gerstenhaber, Vice President of Product Management at Google, said during the discussion: “Delegating agency to a synthetic agent implies trust. The more you delegate, the more you need observability, tracing, and audit. It is not one feature. It is defense in depth.”

In traditional professional environments, accountability is clear. Humans determine relevance, review source material, verify outputs, and take responsibility for outcomes. In agentic systems, that model is evolving.

Retrieval is automated. Context is lost across steps. Outputs appear grounded in source material without preserving fidelity. Tools execute beyond the user’s visibility.

As Frank Schilder, Senior Principal Scientist at Thomson Reuters, noted: “When we move to an agentic workflow, we automate steps that professionals used to perform manually and that introduces new risks: Context can be silently dropped. Source fidelity can become fragile. Maintaining clear accountability becomes more complex.”

These are not edge cases. They are structural risks. We are automating the work, but not accountability.

If You Can’t Inspect It, You Can’t Trust It

In regulated industries, trust has never meant blind confidence. It has always meant the ability to verify. That standard is now colliding with how many AI systems operate.

Accuracy drives experimentation. Inspection determines adoption.

If a system cannot show its work, it cannot be trusted in high-stakes environments.

As Gayle McElvain, Head of TR Labs at Thomson Reuters, put it: “Errors create liability. For many professionals, trust means ‘trust but verify.’ That means building AI systems where verification is built in.”

Across the discussion, several consistent priorities emerged around what trustworthy systems must provide:

    • Step-by-step auditability
    • Traceable reasoning and inspectable tool use
    • Durable logs and process artifacts
    • Clear, persistent provenance

This is not a feature. It is infrastructure.
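
What might one unit of that infrastructure look like? Below is a minimal sketch, assuming a Python agent harness; the record fields and the append-only JSON-lines format are illustrative assumptions, not any vendor's schema.

```python
# Illustrative only: a durable, inspectable audit record for one step
# of an agentic workflow. Every field name here is an assumption.
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentStepRecord:
    """One entry in an agent's step-by-step audit trail."""
    workflow_id: str
    step: int
    action: str                    # e.g. "retrieve", "tool_call", "draft"
    tool_name: str | None = None   # which tool was invoked, if any
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)  # provenance pointers
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def append_record(log_path: str, record: AgentStepRecord) -> None:
    """Append-only JSON-lines log: durable, ordered, easy to audit."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Log a retrieval step so a reviewer can later trace exactly which
# sources informed the agent's output.
append_record("agent_audit.jsonl", AgentStepRecord(
    workflow_id="wf-001",
    step=1,
    action="retrieve",
    inputs={"query": "limitation periods for contract claims"},
    outputs={"documents_returned": 3},
    sources=["statute:XYZ@2025-07-01"],
))
```

Because each record is appended rather than overwritten, the log doubles as the durable process artifact the list above calls for.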

Trust Breaks When Source Integrity Breaks

In knowledge-based professions, trust depends on the integrity of source material.

Agentic systems introduce new failure modes. They may paraphrase where precision is required. They may surface outdated information. They may blur the boundary between authoritative sources and generated reasoning.

These are not cosmetic issues. A single altered word in a statute can change its meaning. A misapplied version of a regulation can create real consequences.

As Zach Brock, Engineering Lead at OpenAI, described: “We are moving toward agents that share durable scratch spaces. Citations, version identifiers, and hashes of source material can travel through a workflow without being compressed away.”

That level of persistence is not a technical detail. It is what makes accountability possible. Without it, professionals cannot trace how an answer was constructed or verify whether it reflects the correct source at the correct point in time. Accountability breaks.
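
To make the idea concrete, here is a minimal sketch of that kind of persistence: a citation that carries a version identifier and a hash of the source text, so any later step can verify fidelity. The function and field names are assumptions for illustration.

```python
# Sketch of provenance that travels with source material. Names are
# illustrative assumptions, not a description of any vendor's system.
import hashlib

def fingerprint(text: str) -> str:
    """Stable hash of the exact source text at ingestion time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def make_citation(source_id: str, version: str, text: str) -> dict:
    """Provenance record attached to the text for the whole workflow."""
    return {"source_id": source_id, "version": version,
            "sha256": fingerprint(text), "text": text}

def verify_citation(citation: dict) -> bool:
    """Confirm the quoted text is byte-identical to what was ingested;
    a single altered word fails the check."""
    return fingerprint(citation["text"]) == citation["sha256"]

cite = make_citation("statute-XYZ", "2025-07-01",
                     "The limitation period is six years.")
cite["text"] = "The limitation period is five years."  # silent downstream edit
assert not verify_citation(cite)  # the alteration is detectable
```

Because the hash is computed at ingestion, any compression or paraphrase downstream shows up as a verification failure rather than a silent change.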

Accountability does not emerge automatically from more capable systems. It must be explicitly defined.

As Byron Cook, Director of Automated Reasoning at AWS, said: “With AI, some of those socio-technical mechanisms go away. We have to define the dividing line between behaviors we accept and those we do not—and enforce that symbolically. Without that, accountability cannot be maintained as systems take on more of the work.”

This Is a Systems Problem

Much of today’s AI development is optimized for performance benchmarks. But in real-world environments, performance is only part of the equation.

As Scott White, Head of Product, Enterprise at Anthropic, noted: “Benchmarks measure whether a model can do the task. Enterprises are asking a bigger question: will the system around it hold up in the environments where the work actually happens? A trustworthy agent requires the model, the boundaries around it, and the record of what it did. Getting all three right is what turns AI from a powerful tool into a system enterprises can trust with important work. That’s what will drive the next wave of adoption.”

Trustworthy systems must be designed to operate safely under pressure, with clear boundaries and strong safeguards.

That requires:

    • Clear separation between system instructions and external content
    • Built-in safeguards against prompt injection and data leakage
    • Continuous monitoring and testing
    • Audit trails aligned with regulatory expectations
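
As a hypothetical sketch of the first two requirements, the snippet below keeps system instructions and untrusted retrieved content in separate, labeled channels and flags content that looks like an injection attempt. The role convention is the common chat-message format; the pattern list is a toy heuristic, not a complete defense.

```python
# Toy safeguard sketch: separate instruction and data channels, plus a
# crude injection screen. Patterns and names are illustrative only.
SUSPICIOUS_PATTERNS = ("ignore previous instructions",
                       "disregard the system prompt")

def screen_external_content(text: str) -> str:
    """Flag, rather than silently pass, retrieved content that tries to
    smuggle instructions into the model's context."""
    lowered = text.lower()
    if any(p in lowered for p in SUSPICIOUS_PATTERNS):
        raise ValueError("Possible prompt injection; route to human review.")
    return text

def build_messages(system_rules: str, retrieved_doc: str, user_task: str) -> list:
    """External content is wrapped and labeled as data; it never enters
    the instruction channel."""
    return [
        {"role": "system", "content": system_rules},
        {"role": "user", "content": (
            f"{user_task}\n\n<external_document>\n"
            f"{screen_external_content(retrieved_doc)}\n</external_document>")},
    ]

msgs = build_messages(
    "Answer using only the provided document.",
    "Section 5: The limitation period is six years.",
    "What is the limitation period for contract claims?",
)
```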

Agentic AI is not just a model challenge. It is a governance challenge.

The Next Phase of AI

We are entering a new phase of AI adoption, one defined not by experimentation, but by deployment inside real workflows.

The industry is shifting from outputs to systems, from benchmarks to reliability, and from capability to accountability.

But this shift will not happen automatically. It requires new standards for auditability, clearer approaches to provenance, and systems designed to preserve truth and responsibility across every step of a workflow. These are solvable problems—but only if accountability is designed into the system from the start.

The organizations that solve this will define the next generation of AI.

In high-stakes domains, trust is not optional.

It is not a feature. It is the product.

The Trust in AI Alliance was announced in January to bring together leaders across the AI ecosystem to advance practical standards for accountability, transparency, and trust in AI systems. The group will continue to meet regularly, with select insights from those discussions shared publicly.

Why we’re adding Audit to our name—and what it means for our customers

What’s in a name? In our case, we’re adding one very intentional word: Audit. Our business segment name today becomes Tax, Audit & Accounting Professionals. We’re putting a spotlight on the part of the profession navigating some of the biggest shifts right now. We’re also reinforcing our commitment to helping firms adopt AI-enabled efficiency without losing the rigor, documentation, and trusted PPC methodology they rely on.

By explicitly calling out Audit, we’re recognizing and serving customers whose needs go well beyond tax compliance. We’re reinforcing our commitment to building audit-specific products, workflows, and expertise that help firms modernize.

Audit isn’t “extra”—it’s essential

Most firms don’t experience tax, audit, and accounting as separate lanes. They’re connected, year-round workflows that require speed, clarity, and confidence. And in audit, “moving faster” can’t come at the expense of quality. It has to come from better systems: more structured workflows, less manual effort and greater automation.

That’s why audit deserves to be named. Not as a trend—but as a clear, long-term commitment to the customers doing this work every day.

How we’re helping audit teams work smarter and faster

Thomson Reuters is investing in audit workflow tools and expanding our partner ecosystem so firms can modernize their audit practice with industry-leading, AI-powered, cloud-based solutions backed by trusted methodology, including:

  • CoCounsel Audit: Supports day-to-day audit work by helping teams analyze and review documents, draft workpapers, and keep materials organized in a shared workspace—aimed at making workflows more consistent and reducing time spent on repetitive tasks.


  • Audit Intelligence Analyze: Uses automation and AI to speed up transaction analysis, identify items for review, assist with sample selection, and direct attention toward higher-risk areas—so auditors can spend more time on judgment-heavy work.

  • Audit Intelligence Test: Helps teams complete testing with less manual work by automating the matching of selected samples to supporting evidence and validating whether the expected amounts were collected, while keeping documentation in the workflow (a toy sketch of this matching idea appears after the list).


  • Open ecosystem: Integrations that enhance Guided Assurance (Cloud Audit Suite), plus a new partnership.
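
For readers who want the flavor of the matching idea mentioned above, here is a toy sketch; it is not Thomson Reuters’ implementation, just the concept of pairing sampled transactions with supporting evidence and validating amounts.

```python
# Toy illustration only: match sampled transactions to supporting
# evidence by reference number, then validate the evidenced amount.
def match_samples(samples: list, evidence: list) -> list:
    by_ref = {e["ref"]: e for e in evidence}
    results = []
    for s in samples:
        e = by_ref.get(s["ref"])
        results.append({
            "ref": s["ref"],
            "evidence_found": e is not None,
            "amount_matches": e is not None and e["amount"] == s["amount"],
        })
    return results

samples  = [{"ref": "INV-104", "amount": 1250.00},
            {"ref": "INV-207", "amount": 980.00}]
evidence = [{"ref": "INV-104", "amount": 1250.00}]  # INV-207 unmatched

for result in match_samples(samples, evidence):
    print(result)  # unmatched or mismatched items go to an auditor
```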

What’s changing—and what isn’t

This is a naming update, not an organizational change. There are no changes to roles or structure tied to this announcement. What is changing is the clarity: audit is an intentional focus. You’ll continue to see that reflected in the products, partnerships, and workflows we bring to market.

The bottom line

Audit deserves to be named—because firms deserve tools that help them modernize with confidence. By adding Audit to our name, we’re making a clear commitment to supporting the profession through rapid change. We’re delivering AI-enabled efficiency, grounded in trusted methodology, backed by an ecosystem built for real audit work.

Elizabeth Beastrom is President of Tax, Audit & Accounting Professionals at Thomson Reuters.

What It Really Means for AI to Reason

With each new model release, we hear the same bold claim: “This AI can reason.” But what does that actually mean—and why does it matter? At Thomson Reuters, we’ve spent the past year rigorously testing and evaluating the next generation of AI systems—not just for what they can generate, but for how they reach conclusions. For professionals working in legal, tax, and regulatory environments, traceable reasoning isn’t a luxury—it’s a requirement.

Not All AI Thinking Is Equal

Traditional Large Language Models (LLMs) excel at generating fluent, well-structured responses that provide a direct answer to a specific question (e.g., What is the capital of France?). But when a task demands multi-step logic, interpretation of legal nuance, or structured argumentation, those same models often fall short because they cannot simply reproduce a memorized response. That’s where Large Reasoning Models (LRMs) come in. These systems are trained to work through problems step-by-step, show their logic, and produce outputs that are transparent, reviewable, and aligned with how professionals make decisions. It’s an exciting shift, but it also demands a different level of scrutiny.
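
One way to picture the difference: instead of a bare answer, a reasoning-capable system can be asked for discrete, checkable steps. The JSON schema in this sketch is our own illustrative assumption, not any vendor's output format.

```python
# Illustrative sketch: parse a structured, step-by-step answer so each
# step can be reviewed independently. The schema is an assumption.
import json
from dataclasses import dataclass

@dataclass
class ReasoningStep:
    claim: str    # what the model asserts at this step
    support: str  # the source or rule it relies on

def parse_reasoned_answer(raw: str):
    """Expects {"steps": [{"claim": ..., "support": ...}], "answer": ...}."""
    data = json.loads(raw)
    return [ReasoningStep(**s) for s in data["steps"]], data["answer"]

raw = json.dumps({
    "steps": [
        {"claim": "The claim accrued on 2020-03-01.",
         "support": "Complaint, para. 12"},
        {"claim": "A six-year limitation period applies.",
         "support": "Statute XYZ s. 5"},
    ],
    "answer": "The action is timely if filed before 2026-03-01.",
})
steps, answer = parse_reasoned_answer(raw)
for i, s in enumerate(steps, 1):
    print(f"step {i}: {s.claim}  [per {s.support}]")  # reviewable line by line
```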

What We’ve Learned So Far

At Thomson Reuters Labs, we’ve been testing reasoning-capable AI across a variety of high-stakes domains. Our work includes both proprietary evaluation frameworks and live deployments that put models to the test under real-world legal complexity.

We’ve found that:

  • Models may return the right answer while relying on incorrect reasoning, and vice versa.
  • Multi-step reasoning increases the risk of hard-to-detect hallucinations, particularly when the reasoning is not exposed to the user.
  • As questions get more complex, models may fail at some point in the chain to produce the correct answer—or give up entirely.

That’s why we’ve built a robust testing and benchmarking process, including human-in-the-loop validation and domain-specific scoring. You can read more about that process here.
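
As a toy illustration of why that process scores the answer and the reasoning separately (since, as noted above, they can disagree), the snippet below grades each independently; the exact-match check and reviewer labels are a simplified stand-in for the domain-specific, human-in-the-loop scoring described here.

```python
# Simplified stand-in for domain-specific evaluation: grade the final
# answer and the reasoning trace independently.
def evaluate(example: dict) -> dict:
    answer_correct = example["model_answer"] == example["gold_answer"]
    # Human-in-the-loop: a reviewer marks each reasoning step valid/invalid.
    reasoning_correct = all(example["step_labels"])
    return {
        "answer_correct": answer_correct,
        "reasoning_correct": reasoning_correct,
        # The dangerous quadrant: a right answer reached the wrong way.
        "right_answer_wrong_reasoning": answer_correct and not reasoning_correct,
    }

print(evaluate({
    "model_answer": "timely",
    "gold_answer": "timely",
    "step_labels": [True, False],  # one step failed human review
}))
```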

Putting New Models to the Test

Most recently, we tested OpenAI’s deep research model—evaluating its performance on legal queries that demand not just accuracy, but verifiability. As J.P. Mohler, Senior Machine Learning and Applied Research Scientist at Thomson Reuters, put it: “OpenAI’s deep research model helps us synthesize legal briefs, case records, and case law into analyses for appellate judges. Its ability to autonomously gather, assess, and clearly cite information from a broad range of public and private sources—paired with its depth of analysis—fills a critical need for reliable, verifiable research. The model empowers us to scale advanced research capabilities and support complex, data-driven knowledge work.” This type of evaluation gives us insight into how models reason in the wild—and how they perform under the pressures of real legal analysis.

Why Model Strategy Matters

No single model excels at everything. That’s why we take a multi-model approach at Thomson Reuters—working with partners while continually refining our own proprietary models. We select the right model for the right task, based on accuracy, explainability, and trustworthiness. This orchestration-first approach ensures we deliver results professionals can actually use—not just impressive demos.
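
As a sketch of what orchestration-first routing could mean in practice: pick the model whose measured scores best fit the task. The model names, task types, scores, and equal weighting below are all invented for illustration.

```python
# Hypothetical multi-model router; every value here is made up.
TASK_SCORES = {
    # task_type: {model: (accuracy, explainability, trustworthiness)}
    "multi_step_legal_analysis": {
        "reasoning-model-a": (0.86, 0.90, 0.88),
        "general-llm-b":     (0.81, 0.55, 0.70),
    },
    "summarization": {
        "reasoning-model-a": (0.84, 0.80, 0.85),
        "general-llm-b":     (0.90, 0.78, 0.88),
    },
}

def pick_model(task_type: str) -> str:
    """Route each task to the model with the best average score."""
    candidates = TASK_SCORES[task_type]
    return max(candidates, key=lambda m: sum(candidates[m]) / 3)

print(pick_model("multi_step_legal_analysis"))  # reasoning-model-a
print(pick_model("summarization"))              # general-llm-b
```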

Want the Deeper Dive?

If you’re curious about how reasoning models are built, how they differ from traditional LLMs, and where they succeed (and struggle), I’ve written a more technical breakdown that explores why reasoning remains one of the most challenging frontiers in AI—and why it’s essential to get it right.

About the author:
Frank Schilder is a Senior Director, Research at Thomson Reuters Labs, where he focuses on knowledge representation and reasoning, explainability, and applied AI research in legal and regulatory domains.
