In an era where federal agencies are increasingly pivoting toward digital transformation to manage ballooning workloads and persistent staffing shortages, the Internal Revenue Service (IRS) finds itself at a critical juncture. A new, scathing report from the Treasury Inspector General for Tax Administration (TIGTA) has cast a long shadow over the agency’s reliance on automated and live chat services, suggesting that the drive for efficiency may be coming at the cost of accuracy, privacy, and taxpayer trust.
The audit, which analyzed the Small Business/Self-Employed (SB/SE) division’s chat operations, serves as a sobering cautionary tale for federal entities nationwide. As agencies race to replace traditional, phone-based customer service with AI-driven chatbots and digital messaging, the IRS experience demonstrates that without robust oversight and reliable data, these tools can quickly become a liability rather than an asset.
The Push for Digital Efficiency: A Chronology of Implementation
The IRS has long struggled with the "customer service gap," characterized by notoriously long wait times on toll-free support lines and a high volume of taxpayer inquiries that outpace agency personnel. In response, the agency began a multi-year effort to modernize its front-facing communications.
2017: The Launch of Live Chat
The SB/SE division first introduced a "Live Chat" feature in 2017. The program was designed to handle relatively straightforward taxpayer inquiries, such as checking account balances, confirming receipt of payments, or discussing delinquent returns. The reasoning was sound: if a segment of the taxpayer base could be diverted to text-based messaging, the pressure on human agents answering the phones would ease, freeing them to handle more complex tax matters.
2021: The Expansion into Automation
By late 2021, the agency decided to scale its digital ambitions. To further reduce the burden on its workforce, the IRS integrated automated chatbots into the chat platform. The vision was to provide instantaneous, 24/7 answers to common questions without requiring a human intermediary. However, as the recent IG report suggests, the transition from a pilot program to an expanded, permanent fixture was marked by a lack of rigorous performance benchmarking.
The IG Findings: A Failure of Oversight and Accuracy
The TIGTA report paints a picture of an agency that lost sight of its operational metrics. According to the audit, IRS management failed to implement sufficient mechanisms to evaluate the effectiveness of these chat tools. The consequences of that lapse are far-reaching.
The Data Deficit
The most alarming finding is that the data currently collected to measure program performance is fundamentally unreliable. The IG found that while the agency claims a 46 percent resolution rate for live chats during the 2023-2024 period, the methodology behind that figure is deeply flawed. Discrepancies in data entry and tracking mean that the IRS cannot definitively state whether the program is actually resolving taxpayer issues or merely closing tickets without providing meaningful assistance.
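The gap between closing a ticket and resolving an issue can be made concrete with a small sketch. The record layout below (a `ChatSession` with `closed` and `resolved` fields) is a hypothetical illustration, not the IRS's actual data model; it only shows why a closure count cannot stand in for a resolution rate, and why sessions with no recorded outcome should be reported separately rather than silently counted either way.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ChatSession:
    # Hypothetical fields, assumed for illustration only.
    closed: bool               # the ticket was closed
    resolved: Optional[bool]   # None means no outcome was ever recorded


def summarize(sessions: Sequence[ChatSession]) -> dict:
    """Report closure, resolution, and missing-outcome rates separately."""
    total = len(sessions)
    if total == 0:
        return {"closure_rate": None, "resolution_rate": None, "unrecorded_share": None}
    closed = sum(1 for s in sessions if s.closed)
    recorded = [s for s in sessions if s.resolved is not None]
    resolved = sum(1 for s in recorded if s.resolved)
    return {
        "closure_rate": closed / total,
        # Resolution is measured only over sessions with a recorded outcome.
        "resolution_rate": resolved / len(recorded) if recorded else None,
        "unrecorded_share": (total - len(recorded)) / total,
    }


sample = [
    ChatSession(closed=True, resolved=True),
    ChatSession(closed=True, resolved=False),
    ChatSession(closed=True, resolved=None),
    ChatSession(closed=False, resolved=None),
]
print(summarize(sample))
```

In this made-up sample, three of four tickets are closed, yet only half of the sessions with a recorded outcome were resolved, and half of all sessions have no recorded outcome at all.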
The "Human" Cost of Automation
The report also highlighted severe operational failures regarding the human agents tasked with managing the chat sessions. In one staggering instance, auditors discovered that a single human chat assistant was tasked with managing 603 concurrent inquiries.
Agency policy stipulates that a single agent should handle no more than three chats at once, and the IG emphasized that even this permitted limit can degrade the quality of service. When an agent is stretched across dozens or even hundreds of sessions, the risk of "inappropriate disclosure" (sensitive taxpayer data sent to the wrong person) rises to unacceptable levels. This creates not only a customer service nightmare but also a significant cybersecurity and privacy liability for the agency.
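To put the figure in proportion, 603 concurrent inquiries is 201 times the three-chat limit. A supervisory check for this kind of overload could be as simple as the sketch below. The agent names and the dictionary input are assumptions made for illustration; only the limit of three and the 603 figure come from the audit as reported here.

```python
from typing import Dict

MAX_CONCURRENT_CHATS = 3  # policy limit cited in the audit


def agents_over_limit(
    active_chats: Dict[str, int], limit: int = MAX_CONCURRENT_CHATS
) -> Dict[str, float]:
    """Return each agent above the limit, with their load as a multiple of it."""
    return {
        agent: count / limit
        for agent, count in active_chats.items()
        if count > limit
    }


# Hypothetical snapshot of active sessions per agent.
flagged = agents_over_limit({"agent_a": 2, "agent_b": 3, "agent_c": 603})
print(flagged)  # {'agent_c': 201.0}
```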
Automated Chatbots: Technical Deficiencies
The audit also scrutinized the performance of the AI-driven chatbots. The IG identified numerous instances where the bots failed to provide sufficient information or, more critically, failed to recognize keywords or common questions entered by taxpayers.
When a chatbot fails to understand a user’s intent, the result is rarely neutral. Instead, it often leads to frustration, the dissemination of incorrect tax guidance, or the taxpayer abandoning the process altogether—only to call the phone lines later. This creates a "looping effect," where the digital system fails, and the taxpayer is forced to re-enter the queue for live phone support, effectively defeating the purpose of the digital initiative.
Management’s "Nonevaluative" Misstep
One of the most curious aspects of the report is the agency’s handling of employee performance reviews. IRS management opted not to conduct "nonevaluative" performance reviews for live chat assistors. Such reviews are designed to provide feedback and coaching without impacting an employee’s official performance rating.
Management chose to forgo these reviews, arguing that the program was in a pilot phase, even though they are explicitly permitted within pilot programs. The IG noted that this was a missed opportunity to refine the skills of the agents and correct procedural errors in real time, which would have substantially improved the quality of the taxpayer experience.
The Broader Implications for Federal Agencies
The IRS case is a case study in what happens when the "digital-first" mantra is adopted without a corresponding commitment to quality control.
Risk of Inaccurate Returns
The stakes for the IRS are higher than for many other federal agencies. If a taxpayer receives incorrect information from a chatbot, it can lead to an inaccurate tax return. This triggers a cascade of negative outcomes: audits, penalties, interest, and the necessity for the taxpayer to contact the IRS a second or third time. The "efficiency" gained by the chatbot is thus paid for in taxpayer anxiety and additional administrative burden.
Erosion of Trust
Beyond the technical failures, there is the issue of institutional trust. Taxpayers are increasingly skeptical of government digital portals. When an automated system fails to provide a correct answer, it reinforces the narrative that the government is inefficient and out of touch with the needs of the public.
Official Responses and the Path Forward
In response to the audit, IRS management has accepted all nine recommendations put forth by the Inspector General. The agency says it has already begun taking concrete steps to address four of those recommendations, though the timeline for full implementation remains a point of concern for oversight bodies.
The agency’s path forward will likely involve:
- Overhauling Data Collection: Creating a reliable, consistent way to track whether a chat session actually resolves an issue.
- Tightening Supervisory Controls: Enforcing strict limits on the number of concurrent chats an agent can handle to prevent privacy breaches.
- Improving AI Training: Refining the chatbots' recognition of keywords and common questions so they can better parse natural language and provide accurate tax guidance.
- Implementing Performance Coaching: Utilizing nonevaluative reviews to ensure human agents are equipped to handle complex inquiries effectively.
Conclusion: A Lesson in Implementation
The IRS is not alone in its struggle. Across the federal government, from the Social Security Administration to the Department of Veterans Affairs, the push toward automation is a defining characteristic of modern bureaucracy. However, the TIGTA report serves as a timely reminder that technology is not a "set-it-and-forget-it" solution.
For these programs to work, they require more than just software; they require rigorous oversight, honest data collection, and a human-centric approach that prioritizes accuracy and privacy over raw throughput. As the IRS moves to rectify its current failures, other agencies would do well to take note: in the pursuit of digital efficiency, the quality of the interaction must always remain the primary metric of success. If the federal government cannot guarantee that its automated systems provide accurate, secure, and helpful information, it risks undermining the very efficiency it seeks to achieve.

