Banking Chatbots Found Vulnerable to Exploitation in Industry-Wide Investigation
An investigation of 24 AI models configured as banking customer-service chatbots has uncovered systemic security vulnerabilities across all tested systems, with exploitation success rates ranging from 1% to over 64%. The findings raise serious questions about the rapid adoption of generative AI in financial services without adequate safeguards to protect consumers and institutions.
Background and Context
Generative AI chatbots have become increasingly prevalent in consumer banking: a 2025 survey found that 54% of financial institutions have implemented the technology or are actively doing so. Banks are deploying these systems to handle account inquiries, transaction disputes, loan applications, and fraud alerts—interactions traditionally managed by trained human agents who understand regulatory obligations and escalation protocols.
The technology promises efficiency gains and 24/7 availability, meeting customer expectations for instant conversational support. However, rapid adoption has created significant compliance and security blind spots, leaving regulatory obligations and consumer protections at risk.
Key Figures and Entities
The investigation was conducted by Milton Leal, lead applied AI researcher at TELUS Digital, who tested AI models from major providers including OpenAI, Anthropic and Google. Leal's research examined how these systems performed when configured as banking customer-service assistants.
Regulatory bodies have taken notice of these vulnerabilities. Since 2023, the Consumer Financial Protection Bureau (CFPB) has made clear that chatbots must meet the same consumer protection standards as human agents, and that misleading or obstructive behavior constitutes grounds for enforcement. The Office of the Comptroller of the Currency (OCC) has echoed this position, emphasizing that AI customer service channels are not experiments but regulated compliance systems subject to the same legal requirements as other customer-facing operations.
Legal and Financial Mechanisms
Under current regulatory frameworks, banks can be held responsible for compliance violations regardless of whether a conversation is handled by a human or chatbot. A single misphrased chatbot response could violate federal disclosure requirements or mislead borrowers about their dispute rights.
The investigation revealed three primary categories of vulnerabilities across deployed chatbot systems. Inaccurate or incomplete guidance occurs when chatbots generate incorrect information or disclose eligibility criteria without proper verification. Sensitive leakage happens when creative prompts bypass safeguards to extract information that should be refused entirely. Operational opacity refers to the lack of adequate logging, escalation, and audit trails that regulators expect, making it difficult to reconstruct incidents when they occur.
Particularly concerning were "refusal but engagement" patterns where chatbots stated "I cannot help with that" yet immediately disclosed sensitive information. These patterns suggest fundamental flaws in how guardrails are implemented rather than isolated issues with specific models.
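To illustrate how a bank might screen for such patterns, the following is a minimal sketch of a post-response check that flags replies containing both a refusal phrase and content matching sensitive-data patterns. The phrases, regular expressions, and thresholds are hypothetical assumptions for illustration, not details from the investigation; a production system would rely on far richer detectors such as classifiers or data-loss-prevention tooling.

```python
import re

# Hypothetical refusal phrases and sensitive-content patterns (illustrative only).
REFUSAL_PHRASES = ("i cannot help with that", "i'm unable to assist", "i can't share")
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{9,12}\b"),                   # account-number-like digit runs
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # SSN-like pattern
    re.compile(r"routing number[:\s]+\d+", re.I),  # explicit routing-number disclosure
]

def flags_refusal_but_engagement(response: str) -> bool:
    """Return True when a reply both refuses and appears to disclose sensitive data."""
    lowered = response.lower()
    refused = any(phrase in lowered for phrase in REFUSAL_PHRASES)
    disclosed = any(pattern.search(response) for pattern in SENSITIVE_PATTERNS)
    return refused and disclosed

if __name__ == "__main__":
    reply = "I cannot help with that. However, the routing number: 021000021 is on file."
    print(flags_refusal_but_engagement(reply))  # True -> escalate for human review
```

A check like this would run after the model responds but before the reply reaches the customer, routing flagged exchanges to human review rather than relying on the model's own refusal.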
International Implications and Policy Response
The vulnerabilities pose significant risks beyond individual banks. With over 50% of financial fraud now involving AI, attackers could exploit these weaknesses to refine their fraud playbooks, potentially using extracted information to create more convincing phishing campaigns or to manipulate banking systems.
Regulatory expectations are converging globally. The CFPB requires accurate answers plus guaranteed paths to human representatives, while the OCC has made clear that generative AI falls under existing safety-and-soundness expectations with board-level oversight. Standards bodies like NIST recommend secure development lifecycles, comprehensive logging, and continuous adversarial testing. The EU AI Act requires chatbots to disclose AI usage and log high-risk interactions.
Experts recommend treating chatbots like any other regulated system, with comprehensive model risk inventories, embedded compliance rules, and full interaction logging to identify systematic probing attempts. Governance structures must evolve to include regular reviews of refusal patterns, board-level risk reporting with incident metrics, and tabletop exercises for potential breach scenarios.
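As a rough sketch of what full interaction logging with probing detection could look like, the snippet below records every exchange and flags sessions that trigger many refusals within a short window. The class name, threshold, and window are assumptions chosen for illustration; they do not reflect any regulatory requirement or the tooling used in the investigation.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

PROBE_THRESHOLD = 5          # hypothetical: refusals per session before flagging
WINDOW = timedelta(minutes=10)

class InteractionLog:
    def __init__(self) -> None:
        self.records = []                   # full audit trail of exchanges
        self.refusals = defaultdict(deque)  # session_id -> recent refusal timestamps

    def record(self, session_id: str, prompt: str, response: str, refused: bool) -> bool:
        """Log one exchange; return True if the session looks like systematic probing."""
        now = datetime.now(timezone.utc)
        self.records.append({"ts": now, "session": session_id,
                             "prompt": prompt, "response": response, "refused": refused})
        if refused:
            recent = self.refusals[session_id]
            recent.append(now)
            while recent and now - recent[0] > WINDOW:  # drop refusals outside the window
                recent.popleft()
            if len(recent) >= PROBE_THRESHOLD:
                return True
        return False

log = InteractionLog()
for attempt in range(6):
    flagged = log.record("sess-42", f"probe attempt {attempt}", "[refused]", refused=True)
print("escalate sess-42 for review" if flagged else "no flag raised")
```

The point of the design is that refusal events become auditable signals in their own right: repeated refusals from one session can surface a probing attempt for board-level incident metrics and escalation, rather than disappearing into unlogged conversations.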
Sources
This report draws on adversarial testing conducted by Milton Leal of TELUS Digital against 24 AI models from major providers, public statements from the Consumer Financial Protection Bureau and Office of the Comptroller of the Currency, industry surveys on AI adoption in financial services, and guidance from standards organizations including NIST and regulatory frameworks such as the EU AI Act.