AI Is Taking Action. No One Is Accountable.

The lawyer is still accountable. The AI system acting on her behalf is not. That gap is no longer theoretical.

After the first meeting of the Trust in AI Alliance, it is clear that this mismatch is emerging as one of the biggest barriers to enterprise AI deployment.

As AI systems move from answering questions to taking action inside professional workflows, a fundamental mismatch is emerging. Execution shifts to the system. Responsibility still sits with the human.

In agentic systems, that division of labor is being reconfigured, but there is still no clear answer to a critical question: how does a human maintain accountability as more of the work is executed by the system?

That question was at the center of the inaugural convening of the Trust in AI Alliance, a group bringing together leaders across model development, infrastructure, and enterprise AI deployment, where participants from OpenAI, Google, Anthropic, AWS, and Thomson Reuters discussed what trustworthy agentic systems require in practice.

A clear theme emerged: AI capability is accelerating faster than accountability.

Most systems today are not designed to meet that standard of accountability.

The Shift No One Is Talking About

In the first wave of AI, the defining question was whether a system could produce a correct answer. That is no longer enough.

As AI systems take on multi-step tasks across real workflows, the question is shifting from accuracy to accountability.

As Michael Gerstenhaber, Vice President of Product Management at Google, said during the discussion: “Delegating agency to a synthetic agent implies trust. The more you delegate, the more you need observability, tracing, and audit. It is not one feature. It is defense in depth.”

In traditional professional environments, accountability is clear. Humans determine relevance, review source material, verify outputs, and take responsibility for outcomes. In agentic systems, that model is evolving.

Retrieval is automated. Context is lost across steps. Outputs appear grounded in source material without preserving fidelity. Tools execute beyond the user’s visibility.

As Frank Schilder, Senior Principal Scientist at Thomson Reuters, noted: “When we move to an agentic workflow, we automate steps that professionals used to perform manually and that introduces new risks: Context can be silently dropped. Source fidelity can become fragile. Maintaining clear accountability becomes more complex.”

These are not edge cases. They are structural risks. We are automating the work, but not accountability.

If You Can’t Inspect It, You Can’t Trust It

In regulated industries, trust has never meant blind confidence. It has always meant the ability to verify. That standard is now colliding with how many AI systems operate.

Accuracy drives experimentation. Inspection determines adoption.

If a system cannot show its work, it cannot be trusted in high-stakes environments.

As Gayle McElvain, Head of TR Labs at Thomson Reuters, put it: “Errors create liability. For many professionals, trust means ‘trust but verify.’ That means building AI systems where verification is built in.”

Across the discussion, several consistent priorities emerged around what trustworthy systems must provide:

    • Step-by-step auditability
    • Traceable reasoning and inspectable tool use
    • Durable logs and process artifacts
    • Clear, persistent provenance

This is not a feature. It is infrastructure.
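
What might one unit of that infrastructure look like? Below is a minimal sketch, assuming a Python agent harness; the record fields and the append-only JSON-lines format are illustrative assumptions, not any vendor's schema.

```python
# Illustrative only: a durable, inspectable audit record for one step
# of an agentic workflow. Every field name here is an assumption.
import json
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentStepRecord:
    """One entry in an agent's step-by-step audit trail."""
    workflow_id: str
    step: int
    action: str                    # e.g. "retrieve", "tool_call", "draft"
    tool_name: str | None = None   # which tool was invoked, if any
    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)  # provenance pointers
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def append_record(log_path: str, record: AgentStepRecord) -> None:
    """Append-only JSON-lines log: durable, ordered, easy to audit."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Log a retrieval step so a reviewer can later trace exactly which
# sources informed the agent's output.
append_record("agent_audit.jsonl", AgentStepRecord(
    workflow_id="wf-001",
    step=1,
    action="retrieve",
    inputs={"query": "limitation periods for contract claims"},
    outputs={"documents_returned": 3},
    sources=["statute:XYZ@2025-07-01"],
))
```

Because each record is appended rather than overwritten, the log doubles as the durable process artifact the list above calls for.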

Trust Breaks When Source Integrity Breaks

In knowledge-based professions, trust depends on the integrity of source material.

Agentic systems introduce new failure modes. They may paraphrase where precision is required. They may surface outdated information. They may blur the boundary between authoritative sources and generated reasoning.

These are not cosmetic issues. A single altered word in a statute can change its meaning. A misapplied version of a regulation can create real consequences.

As Zach Brock, Engineering Lead at OpenAI, described: “We are moving toward agents that share durable scratch spaces. Citations, version identifiers, and hashes of source material can travel through a workflow without being compressed away.”

That level of persistence is not a technical detail. It is what makes accountability possible. Without it, professionals cannot trace how an answer was constructed or verify whether it reflects the correct source at the correct point in time. Accountability breaks.
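
To make the idea concrete, here is a minimal sketch of that kind of persistence: a citation that carries a version identifier and a hash of the source text, so any later step can verify fidelity. The function and field names are assumptions for illustration.

```python
# Sketch of provenance that travels with source material. Names are
# illustrative assumptions, not a description of any vendor's system.
import hashlib

def fingerprint(text: str) -> str:
    """Stable hash of the exact source text at ingestion time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def make_citation(source_id: str, version: str, text: str) -> dict:
    """Provenance record attached to the text for the whole workflow."""
    return {"source_id": source_id, "version": version,
            "sha256": fingerprint(text), "text": text}

def verify_citation(citation: dict) -> bool:
    """Confirm the quoted text is byte-identical to what was ingested;
    a single altered word fails the check."""
    return fingerprint(citation["text"]) == citation["sha256"]

cite = make_citation("statute-XYZ", "2025-07-01",
                     "The limitation period is six years.")
cite["text"] = "The limitation period is five years."  # silent downstream edit
assert not verify_citation(cite)  # the alteration is detectable
```

Because the hash is computed at ingestion, any compression or paraphrase downstream shows up as a verification failure rather than a silent change.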

Accountability does not emerge automatically from more capable systems. It must be explicitly defined.

As Byron Cook, Director of Automated Reasoning at AWS, said: “With AI, some of those socio-technical mechanisms go away. We have to define the dividing line between behaviors we accept and those we do not—and enforce that symbolically. Without that, accountability cannot be maintained as systems take on more of the work.”

This Is a Systems Problem

Much of today’s AI development is optimized for performance benchmarks. But in real-world environments, performance is only part of the equation.

As Scott White, Head of Product, Enterprise at Anthropic, noted: “Benchmarks measure whether a model can do the task. Enterprises are asking a bigger question: will the system around it hold up in the environments where the work actually happens? A trustworthy agent requires the model, the boundaries around it, and the record of what it did. Getting all three right is what turns AI from a powerful tool into a system enterprises can trust with important work. That’s what will drive the next wave of adoption.”

Trustworthy systems must be designed to operate safely under pressure, with clear boundaries and strong safeguards.

That requires:

    • Clear separation between system instructions and external content
    • Built-in safeguards against prompt injection and data leakage
    • Continuous monitoring and testing
    • Audit trails aligned with regulatory expectations
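
As a hypothetical sketch of the first two requirements, the snippet below keeps system instructions and untrusted retrieved content in separate, labeled channels and flags content that looks like an injection attempt. The role convention is the common chat-message format; the pattern list is a toy heuristic, not a complete defense.

```python
# Toy safeguard sketch: separate instruction and data channels, plus a
# crude injection screen. Patterns and names are illustrative only.
SUSPICIOUS_PATTERNS = ("ignore previous instructions",
                       "disregard the system prompt")

def screen_external_content(text: str) -> str:
    """Flag, rather than silently pass, retrieved content that tries to
    smuggle instructions into the model's context."""
    lowered = text.lower()
    if any(p in lowered for p in SUSPICIOUS_PATTERNS):
        raise ValueError("Possible prompt injection; route to human review.")
    return text

def build_messages(system_rules: str, retrieved_doc: str, user_task: str) -> list:
    """External content is wrapped and labeled as data; it never enters
    the instruction channel."""
    return [
        {"role": "system", "content": system_rules},
        {"role": "user", "content": (
            f"{user_task}\n\n<external_document>\n"
            f"{screen_external_content(retrieved_doc)}\n</external_document>")},
    ]

msgs = build_messages(
    "Answer using only the provided document.",
    "Section 5: The limitation period is six years.",
    "What is the limitation period for contract claims?",
)
```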

Agentic AI is not just a model challenge. It is a governance challenge.

The Next Phase of AI

We are entering a new phase of AI adoption, one defined not by experimentation, but by deployment inside real workflows.

The industry is shifting from outputs to systems, from benchmarks to reliability, and from capability to accountability.

But this shift will not happen automatically. It requires new standards for auditability, clearer approaches to provenance, and systems designed to preserve truth and responsibility across every step of a workflow. These are solvable problems—but only if accountability is designed into the system from the start.

The organizations that solve this will define the next generation of AI.

In high-stakes domains, trust is not optional.

It is not a feature. It is the product.

The Trust in AI Alliance was announced in January to bring together leaders across the AI ecosystem to advance practical standards for accountability, transparency, and trust in AI systems. The group will continue to meet regularly, with select insights from those discussions shared publicly.

Why we’re adding Audit to our name—and what it means for our customers

What’s in a name? In our case, we’re adding one very intentional word: Audit. Our business segment name today becomes Tax, Audit & Accounting Professionals. We’re putting a spotlight on the part of the profession navigating some of the biggest shifts right now. We’re also reinforcing our commitment to helping firms adopt AI-enabled efficiency without losing the rigor, documentation, and trusted PPC methodology they rely on.

By explicitly calling out Audit, we’re recognizing and serving customers whose needs go well beyond tax compliance. We’re reinforcing our commitment to building audit-specific products, workflows, and expertise that help firms modernize.

Audit isn’t “extra”—it’s essential

Most firms don’t experience tax, audit, and accounting as separate lanes. They’re connected, year-round workflows that require speed, clarity, and confidence. And in audit, “moving faster” can’t come at the expense of quality. It has to come from better systems: more structured workflows, less manual effort and greater automation.

That’s why audit deserves to be named. Not as a trend—but as a clear, long-term commitment to the customers doing this work every day.

How we’re helping audit teams work smarter and faster

Thomson Reuters is investing in audit workflow tools and expanding our partner ecosystem so firms can modernize their audit practice with industry-leading, AI-powered, cloud-based solutions backed by trusted methodology, including:

  • CoCounsel Audit: Supports day-to-day audit work by helping teams analyze and review documents, draft workpapers, and keep materials organized in a shared workspace—aimed at making workflows more consistent and reducing time spent on repetitive tasks.


  • Audit Intelligence Analyze: Uses automation and AI to speed up transaction analysis, identify items for review, assist with sample selection, and direct attention toward higher-risk areas—so auditors can spend more time on judgment-heavy work.

  • Audit Intelligence Test: Helps teams complete testing with less manual work by automating the matching of selected samples to supporting evidence and validating whether the expected amounts were collected, while keeping documentation in the workflow (a toy sketch of this matching idea appears after the list).


  • Open ecosystem: Integrations that enhance Guided Assurance (Cloud Audit Suite), plus a new partnership.
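
For readers who want the flavor of the matching idea mentioned above, here is a toy sketch; it is not Thomson Reuters’ implementation, just the concept of pairing sampled transactions with supporting evidence and validating amounts.

```python
# Toy illustration only: match sampled transactions to supporting
# evidence by reference number, then validate the evidenced amount.
def match_samples(samples: list, evidence: list) -> list:
    by_ref = {e["ref"]: e for e in evidence}
    results = []
    for s in samples:
        e = by_ref.get(s["ref"])
        results.append({
            "ref": s["ref"],
            "evidence_found": e is not None,
            "amount_matches": e is not None and e["amount"] == s["amount"],
        })
    return results

samples  = [{"ref": "INV-104", "amount": 1250.00},
            {"ref": "INV-207", "amount": 980.00}]
evidence = [{"ref": "INV-104", "amount": 1250.00}]  # INV-207 unmatched

for result in match_samples(samples, evidence):
    print(result)  # unmatched or mismatched items go to an auditor
```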

What’s changing—and what isn’t

This is a naming update, not an organizational change. There are no changes to roles or structure tied to this announcement. What is changing is the clarity: audit is an intentional focus. You’ll continue to see that reflected in the products, partnerships, and workflows we bring to market.

The bottom line

Audit deserves to be named—because firms deserve tools that help them modernize with confidence. By adding Audit to our name, we’re making a clear commitment to supporting the profession through rapid change. We’re delivering AI-enabled efficiency, grounded in trusted methodology, backed by an ecosystem built for real audit work.

Elizabeth Beastrom is President of Tax, Audit & Accounting Professionals at Thomson Reuters.

What It Really Means for AI to Reason

With each new model release, we hear the same bold claim: “This AI can reason.” But what does that actually mean—and why does it matter? At Thomson Reuters, we’ve spent the past year rigorously testing and evaluating the next generation of AI systems—not just for what they can generate, but for how they reach conclusions. For professionals working in legal, tax, and regulatory environments, traceable reasoning isn’t a luxury—it’s a requirement.

Not All AI Thinking Is Equal

Traditional Large Language Models (LLMs) excel at generating fluent, well-structured responses that provide a direct answer to a specific question (e.g., What is the capital of France?). But when a task demands multi-step logic, interpretation of legal nuance, or structured argumentation, those same models often fall short because they cannot simply reproduce a memorized response. That’s where Large Reasoning Models (LRMs) come in. These systems are trained to work through problems step-by-step, show their logic, and produce outputs that are transparent, reviewable, and aligned with how professionals make decisions. It’s an exciting shift, but it also demands a different level of scrutiny.
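
One way to picture the difference: instead of a bare answer, a reasoning-capable system can be asked for discrete, checkable steps. The JSON schema in this sketch is our own illustrative assumption, not any vendor's output format.

```python
# Illustrative sketch: parse a structured, step-by-step answer so each
# step can be reviewed independently. The schema is an assumption.
import json
from dataclasses import dataclass

@dataclass
class ReasoningStep:
    claim: str    # what the model asserts at this step
    support: str  # the source or rule it relies on

def parse_reasoned_answer(raw: str):
    """Expects {"steps": [{"claim": ..., "support": ...}], "answer": ...}."""
    data = json.loads(raw)
    return [ReasoningStep(**s) for s in data["steps"]], data["answer"]

raw = json.dumps({
    "steps": [
        {"claim": "The claim accrued on 2020-03-01.",
         "support": "Complaint, para. 12"},
        {"claim": "A six-year limitation period applies.",
         "support": "Statute XYZ s. 5"},
    ],
    "answer": "The action is timely if filed before 2026-03-01.",
})
steps, answer = parse_reasoned_answer(raw)
for i, s in enumerate(steps, 1):
    print(f"step {i}: {s.claim}  [per {s.support}]")  # reviewable line by line
```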

What We’ve Learned So Far

At Thomson Reuters Labs, we’ve been testing reasoning-capable AI across a variety of high-stakes domains. Our work includes both proprietary evaluation frameworks and live deployments that put models to the test under real-world legal complexity.

We’ve found that:

  • Models may return the right answer while relying on incorrect reasoning, and vice versa.
  • Multi-step reasoning increases the risk of hard-to-detect hallucinations, particularly when the reasoning is not exposed to the user.
  • As questions get more complex, models may fail at some point in the chain to produce the correct answer—or give up entirely.

That’s why we’ve built a robust testing and benchmarking process, including human-in-the-loop validation and domain-specific scoring. You can read more about that process here.
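
As a toy illustration of why that process scores the answer and the reasoning separately (since, as noted above, they can disagree), the snippet below grades each independently; the exact-match check and reviewer labels are a simplified stand-in for the domain-specific, human-in-the-loop scoring described here.

```python
# Simplified stand-in for domain-specific evaluation: grade the final
# answer and the reasoning trace independently.
def evaluate(example: dict) -> dict:
    answer_correct = example["model_answer"] == example["gold_answer"]
    # Human-in-the-loop: a reviewer marks each reasoning step valid/invalid.
    reasoning_correct = all(example["step_labels"])
    return {
        "answer_correct": answer_correct,
        "reasoning_correct": reasoning_correct,
        # The dangerous quadrant: a right answer reached the wrong way.
        "right_answer_wrong_reasoning": answer_correct and not reasoning_correct,
    }

print(evaluate({
    "model_answer": "timely",
    "gold_answer": "timely",
    "step_labels": [True, False],  # one step failed human review
}))
```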

Putting New Models to the Test

Most recently, we tested OpenAI’s deep research model—evaluating its performance on legal queries that demand not just accuracy, but verifiability. As J.P. Mohler, Senior Machine Learning and Applied Research Scientist at Thomson Reuters, put it: “OpenAI’s deep research model helps us synthesize legal briefs, case records, and case law into analyses for appellate judges. Its ability to autonomously gather, assess, and clearly cite information from a broad range of public and private sources—paired with its depth of analysis—fills a critical need for reliable, verifiable research. The model empowers us to scale advanced research capabilities and support complex, data-driven knowledge work.” This type of evaluation gives us insight into how models reason in the wild—and how they perform under the pressures of real legal analysis.

Why Model Strategy Matters

No single model excels at everything. That’s why we take a multi-model approach at Thomson Reuters—working with partners while continually refining our own proprietary models. We select the right model for the right task, based on accuracy, explainability, and trustworthiness. This orchestration-first approach ensures we deliver results professionals can actually use—not just impressive demos.
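
As a sketch of what orchestration-first routing could mean in practice: pick the model whose measured scores best fit the task. The model names, task types, scores, and equal weighting below are all invented for illustration.

```python
# Hypothetical multi-model router; every value here is made up.
TASK_SCORES = {
    # task_type: {model: (accuracy, explainability, trustworthiness)}
    "multi_step_legal_analysis": {
        "reasoning-model-a": (0.86, 0.90, 0.88),
        "general-llm-b":     (0.81, 0.55, 0.70),
    },
    "summarization": {
        "reasoning-model-a": (0.84, 0.80, 0.85),
        "general-llm-b":     (0.90, 0.78, 0.88),
    },
}

def pick_model(task_type: str) -> str:
    """Route each task to the model with the best average score."""
    candidates = TASK_SCORES[task_type]
    return max(candidates, key=lambda m: sum(candidates[m]) / 3)

print(pick_model("multi_step_legal_analysis"))  # reasoning-model-a
print(pick_model("summarization"))              # general-llm-b
```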

Want the Deeper Dive?

If you’re curious about how reasoning models are built, how they differ from traditional LLMs, and where they succeed (and struggle), I’ve written a more technical breakdown that explores why reasoning remains one of the most challenging frontiers in AI—and why it’s essential to get it right.

About the author:
Frank Schilder is a Senior Director, Research at Thomson Reuters Labs, where he focuses on knowledge representation and reasoning, explainability, and applied AI research in legal and regulatory domains.
