How should regulatory frameworks evolve to ensure the accuracy and accountability of consumer-facing GenAI tools that provide legal services?

Introduction

For centuries, the irony of justice has been that, in a quip recounted by Lord Bingham, it is "open to all, like the Ritz Hotel."[1] While the doors are technically open, the cost of entry is prohibitive for most. Enter Generative AI (GenAI). Since the explosion of tools like ChatGPT, Gemini, and Claude, there has been a collective hope that technology could finally bridge the "access to justice gap."[2] If a chatbot can draft a contract or summarize a statute for free, doesn't that democratize law?

The answer is complex. While these tools offer speed and accessibility, they introduce a distinct peril: hallucinations. As regulatory frameworks currently stand, we are facing a dangerous gap where consumer-facing AI tools can provide "legal services" without the accountability required of human lawyers.

The core issue is not that AI makes mistakes; it's how it makes them. Large Language Models (LLMs) operate by predicting the next probable word in a sequence.[3] They prioritize fluency and confidence over truth. In the legal field, this leads to "hallucinations": the fabrication of cases, quotes, and statutes that look real but do not exist.
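
To make this mechanism concrete, the minimal Python sketch below shows how a next-token predictor selects the most probable continuation of a legal sentence. The candidate words and scores are invented for illustration, not taken from any real model; the point is that nothing in the procedure checks whether the resulting citation actually exists.

```python
# Minimal sketch of next-token prediction. All scores are invented
# for illustration; no real model or case name is being queried.
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical model scores for the word following
# "The leading authority on this point is Smith v ..."
candidates = ["Jones", "Brown", "Taylor"]
logits = [2.1, 1.3, 0.4]  # fluency-driven scores, not fact-checked

probs = softmax(logits)
best = max(zip(candidates, probs), key=lambda pair: pair[1])
print(best)  # ('Jones', ~0.61) -- the most *plausible* token, true or not
```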

We have already seen the fallout in real-life cases. As of late 2025, databases tracking AI hallucinations in courts (such as those maintained by legal researchers) had identified over 300 confirmed instances globally.[4] Reports indicate that over 200 of these cases emerged in 2025 alone, as the use of consumer-grade GenAI tools by pro se litigants (people representing themselves in court) and unwary lawyers became more common.[5] One analysis from late 2025 also counts roughly 700 legal decisions that have had to specifically address or rule on the issue of hallucinated submissions.[6] Some high-profile cases include:

  • The Mata v Avianca case (US): A lawyer submitted non-existent judicial opinions generated by ChatGPT.[7]
  • The Ayinde case (UK): A qualified junior lawyer relied on fictitious cases drawn from an AI-generated summary in a housing case.[8]
  • Michael Cohen (US): Donald Trump’s former lawyer used Google Bard (now Gemini) to generate citations, which he passed to his own defence counsel.[9]
  • Park v Kim (US): An appellate attorney used ChatGPT to find a case supporting a late filing. The AI invented a state court decision that the lawyer then cited to the Court of Appeals.[10]

While lawyers face professional sanctions for these errors, the real victims are consumers and laypersons who cannot afford legal counsel and are likely to rely on internet-based self-help tools like ChatGPT to resolve their legal issues.[11] Statistics indicate a high prevalence of unmet legal need around the world; in this context, the use of GenAI can be viewed as inevitable, especially given the integration of AI summaries into virtually every internet search.[12] Indeed, hallucinated AI summaries were at the centre of the controversy in the Ayinde case above. Although courts have penalized users for citing AI-hallucinated cases, as in Harber v Revenue and Customs Commissioners,[13] the current judicial expectation that individuals must verify AI outputs is arguably impractical.[14] Authoritative legal resources are often hidden behind paywalls and require professional skills to interpret, rendering the judicial presumption of AI and legal literacy among the general public fundamentally flawed.

Additionally, the technology encourages overreliance: by producing confident answers, it leads people to trust automated results. AI providers frequently market these tools using language that implies human-like understanding while relegating reliability warnings to obscure disclaimers.[15] This dynamic causes two main types of damage: internal harm to litigants, such as financial loss and legal errors, and external harm to the judicial system, such as wasted resources and reduced trust in precedent.

Why Current Laws Fail

Most global legal systems rely on two pillars to protect clients: legal services regulation and consumer protection law. Both are currently failing to catch GenAI. In jurisdictions such as the UK, legal regulation is "title-based": it regulates people, such as solicitors and barristers, and restricts specific activities, such as representing someone in court, to licensed professionals.[16] Software, obviously, is not a person. Therefore, if a chatbot provides legal advice or drafts a document without claiming to be a lawyer, it inevitably falls outside the scope of legal services regulation. Furthermore, the provision of legal advice is not restricted to lawyers in jurisdictions such as the UK: unqualified individuals may provide legal advice without legal sector scrutiny (though their duty of care is said to be equivalent to that of a qualified lawyer where similar representations are made).

In the United States, the legal system relies on two pillars to protect clients: state bar regulation (the prohibition on the unauthorized practice of law) and federal consumer protection. Unlike the UK's "title-based" system, US regulation is "activity-based": it prohibits the act of practicing law by anyone, human or software, without a license. While software is not a "person," US regulators have successfully argued that sophisticated AI tools essentially "practice law" when they generate custom legal documents or advice. Consequently, companies like LegalZoom have historically had to limit themselves to being mere "scriveners" (filling in forms at a user's direction) to avoid prosecution.[17] However, GenAI breaks this model because it generates original text, making the "scrivener" defense harder to maintain. As a result, enforcement has shifted to the second pillar: consumer protection. The Federal Trade Commission (FTC) has aggressively targeted AI companies not for "bad legal advice" but for "deceptive trade practices." Even if a tool is free, if it is "in commerce" and claims to be a "Robot Lawyer" without human-level accuracy, it violates the FTC Act, as seen in the 2024 crackdown on DoNotPay, which was forced to settle for $193,000 and cease its "robot lawyer" claims.[18]

Yet relying solely on consumer protection laws and reactive litigation is insufficient. These mechanisms are ex post facto: they punish the provider only after the harm has occurred, after the litigant has lost their case or the court's time has been wasted. It is, moreover, entirely plausible that a user in a desperate legal situation will click "I agree" on a disclaimer without understanding that the specific legal advice they are about to receive may be a statistical guess rather than a verified fact.

The European Union’s Artificial Intelligence Act (AIA) is often hailed as the pioneering attempt to regulate this technology. However, when applied to legal advice for consumers, it leaves significant loopholes. The AIA operates on a risk-based approach that creates a critical distinction between tools used by the judiciary and those available to the public.[19] Under this framework, AI systems used by judicial authorities to interpret facts or apply the law are classified as high-risk, subjecting them to rigorous oversight.[20] Conversely, general consumer-facing chatbots fall into a transparency-only category, meaning they essentially need only disclose that the user is interacting with a machine. This distinction creates a significant classification problem: because the "high-risk" label turns on the specific intent to assist a judge, chatbots marketed to the public for legal answers avoid the strict accuracy and data governance requirements. Consequently, a regulatory vacuum emerges in which the software used by a judge is strictly regulated, while the tool relied upon by an unrepresented litigant facing that same judge is not.

How Frameworks Could Evolve

To ensure accuracy and accountability, regulatory frameworks need to move beyond "lawyer regulation" and "transparency labels." If reliance on these tools is to become inevitable for consumers, the law must move from reactive clean-up to proactive design mandates. Research suggests that hallucinations may be an inherent flaw of GenAI systems; one cannot simply legislate them away. What is possible is to mandate design architectures that minimize them. A primary solution lies in enforcing "Confidence-Based Abstention."[21] Rather than allowing a model to guess when it encounters complex queries, regulations could require the implementation of internal confidence thresholds.[22] If a model's internal probability score for an answer does not reach a "near-perfect" level, the system must be programmed to refuse the query rather than hallucinate a plausible-sounding but fictitious answer.
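
A minimal sketch of how such a threshold might operate is given below. The names, threshold value, and refusal text are illustrative assumptions, not drawn from any real product or from the cited papers; a deployed system would need a properly calibrated confidence measure.

```python
# Hedged sketch of "Confidence-Based Abstention". The threshold value,
# field names, and refusal wording are all illustrative assumptions.
from dataclasses import dataclass

ABSTENTION_THRESHOLD = 0.95  # a "near-perfect" bar, e.g. set by regulation

@dataclass
class ModelAnswer:
    text: str
    confidence: float  # e.g. an aggregated token-level probability or a
                       # calibrated self-evaluation score (assumption)

REFUSAL = ("I am not confident enough to answer this legal question "
           "reliably. Please consult a qualified professional.")

def answer_or_abstain(answer: ModelAnswer) -> str:
    """Release the model's answer only if it clears the mandated bar."""
    if answer.confidence < ABSTENTION_THRESHOLD:
        return REFUSAL
    return answer.text

# Usage: a low-confidence draft is suppressed rather than emitted.
print(answer_or_abstain(ModelAnswer("Cite Smith v Jones [2019] ...", 0.62)))
```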

Attempting to ban "non-lawyer AI" entirely is futile and ultimately harms innovation, particularly for those who cannot afford traditional legal counsel. A more nuanced regulatory approach involves shifting from certifying the entire bot to Capability-Based Certification.[23] It is nearly impossible to certify a general-purpose model as "safe" for all law, but it is entirely feasible to certify specific tasks against public benchmarks. Under this model, an AI could be rigorously tested and certified for distinct capabilities, such as "Drafting NDA Clauses" or "Parking Ticket Appeals."[24] If a user were to ask this certified bot a question outside its scope, such as a complex inquiry regarding criminal defence, the bot would be programmed to recognize the lack of certification and decline the task.[25] This moves the industry toward a transparent system in which tools are trusted for specific functions rather than relied upon blindly, preventing the "jack of all trades, master of none" risk that currently plagues general-purpose models.
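
The sketch below illustrates the routing logic under stated assumptions: the certified task list, the toy keyword classifier, and every identifier are hypothetical, and a production system would use a far more robust intent classifier than keyword matching.

```python
# Illustrative sketch of "Capability-Based Certification" routing.
# The capability names and classifier are hypothetical assumptions.
CERTIFIED_CAPABILITIES = {
    "nda_clause_drafting",    # certified against a public NDA benchmark
    "parking_ticket_appeal",  # certified against an appeals benchmark
}

def classify_task(query: str) -> str:
    """Toy intent classifier; real systems would use a trained model."""
    q = query.lower()
    if "nda" in q or "non-disclosure" in q:
        return "nda_clause_drafting"
    if "parking ticket" in q:
        return "parking_ticket_appeal"
    return "uncertified"

def handle(query: str) -> str:
    """Answer only within certified scope; decline everything else."""
    task = classify_task(query)
    if task not in CERTIFIED_CAPABILITIES:
        return ("This tool is not certified for that task. "
                "Please seek qualified legal advice.")
    return f"[answer generated under certified capability: {task}]"

print(handle("Can you draft an NDA confidentiality clause?"))  # in scope
print(handle("How do I defend a criminal charge?"))            # declined
```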

Finally, regulators should recognize that careless speech in the legal domain causes harm comparable to errors in healthcare or finance. They should therefore designate law as a critical sector alongside those fields, imposing statutory minimums even on general-purpose AI. If a model is capable of generating advice in such high-stakes fields, it cannot simply rely on generic disclaimers. The solution requires mandatory verifiable citations and strict jurisdiction labels. For example, an output should explicitly state, "This answer applies to French law only," to prevent a user in the UK from acting on irrelevant statutes. This approach treats legal AI like medical devices: where the potential for harm is high, the duty of care must be built into the product's design, as the schematic output contract sketched below suggests.
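
One way such a duty could be operationalized is to make the output schema itself enforce the jurisdiction label and citation requirement. The sketch below is purely illustrative: the field names and validation rule are assumptions, not any mandated standard.

```python
# Sketch of a mandated output contract: every answer must carry an
# explicit jurisdiction label and verifiable citations. Field names
# and the validation rule are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Citation:
    reference: str   # e.g. "Mata v. Avianca, 678 F. Supp. 3d 443"
    source_url: str  # a link the user (or an auditor) can verify

@dataclass
class LegalAnswer:
    jurisdiction: str            # must be stated, never implied
    text: str
    citations: list[Citation] = field(default_factory=list)

    def render(self) -> str:
        """Refuse to emit advice that lacks verifiable citations."""
        if not self.citations:
            raise ValueError("Refusing to emit advice without citations.")
        header = f"This answer applies to {self.jurisdiction} law only.\n"
        refs = "\n".join(f"- {c.reference} ({c.source_url})"
                         for c in self.citations)
        return header + self.text + "\nSources:\n" + refs

answer = LegalAnswer(
    jurisdiction="France",
    text="Under Article 1240 of the Code civil, ...",
    citations=[Citation("Code civil, art 1240",
                        "https://www.legifrance.gouv.fr/")],
)
print(answer.render())
```

By combining technological guardrails, task-specific certification, and critical-sector duties, we can create a framework that protects consumers without stifling the access to justice that these tools promise.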

References

[1] Tom Bingham, The Rule of Law (Allen Lane 2010) 103.

[2] Laura Safdie, ‘AI and Legal Aid: A Generational Opportunity for Access to Justice’ (Thomson Reuters, 3 February 2025) https://www.thomsonreuters.com/en-us/posts/ai-in-courts/ai-legal-aid-generational-opportunity/ ; see also Francine Ryan and Liz Hardie, ‘ChatGPT, I Have a Legal Question? The Impact of Generative AI Tools on Law Clinics and Access to Justice’ (2024) 31(1) International Journal of Clinical Legal Education 166.

[3] OpenAI and others, GPT-4 Technical Report (arXiv, 4 March 2024) https://doi.org/10.48550/arXiv.2303.08774

[4] Damien Charlotin, ‘AI Hallucination Cases’ https://www.damiencharlotin.com/hallucinations/

[5] Ibid.

[6] Ibid.

[7] Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023).

[8] Frederick Ayinde v London Borough of Haringey [2025] EWHC 1383 (Admin).

[9] United States v. Cohen, No. 18-cr-602 (JMF), 2023 WL 8635521 (S.D.N.Y. Dec. 12, 2023).

[10] Park v. Kim, 99 F.4th 63 (2d Cir. 2024).

[11] Ashley Ames, Kathryn Gallop, Alex Baumont de Oliveira, Jessica Pace and Ellen Walker (Ipsos), and Ramona Franklyn, Gemma Owens, Gemma Pilling and Michael Smith (Ministry of Justice), Legal Problem and Resolution Survey 2023: Summary Report (Ministry of Justice Analytical Series, 2024), https://assets.publishing.service.gov.uk/media/67613fff26a2d1ff18253404/legal-problem-resolution-survey-2023-summary-report.pdf

[12] Jessica Bednarz and Ericka Byram, Regulating AI in the Delivery of Consumer-Facing Legal Services (IAALS, University of Denver, 31 July 2025) https://iaals.du.edu/publications/regulating-ai-delivery-consumer-facing-legal-services

[13] Felicity Harber v The Commissioners for HMRC [2023] UKFTT 1007 (TC).

[14] See, for instance, Judicial Office, Artificial Intelligence (AI): Guidance for Judicial Office Holders (14 April 2025) https://www.judiciary.uk/wp-content/uploads/2025/10/Artificial-Intelligence-AI-Guidance-for-Judicial-Office-Holders-2.pdf

[15] Sandra Wachter, Brent Mittelstadt and Chris Russell, ‘Do Large Language Models Have a Legal Duty to Tell the Truth?’ (2024) 11(8) Royal Society Open Science 240197.

[16] Legal Services Act 2007 (UK), s 12.

[17] FTC, ‘FTC Announces Crackdown on Deceptive AI Claims and Schemes’, https://www.ftc.gov/news-events/news/press-releases/2024/09/ftc-announces-crackdown-deceptive-ai-claims-schemes

[18] FTC, ‘DoNotPay’, https://www.ftc.gov/legal-library/browse/cases-proceedings/donotpay.

[19] Artificial Intelligence Act, Articles 5 and 6, Annex III, para 8(a).

[20] Ibid.

[21] Saurav Kadavath and others, ‘Language Models (Mostly) Know What They Know’ (arXiv preprint, 21 November 2022) https://doi.org/10.48550/arXiv.2207.05221

[22] Victor Quach and others, ‘Conformal Language Modeling’ (arXiv preprint, 1 June 2024) https://doi.org/10.48550/arXiv.2306.10193 ; Fengfei Sun and others, ‘Large Language Models Are Overconfident and Amplify Human Bias’ (arXiv preprint, 4 May 2025) https://doi.org/10.48550/arXiv.2505.02151

[23] Mia Bonardi and L Karl Branting, ‘Certifying Legal AI Assistants for Unrepresented Litigants: A Global Survey of Access to Civil Justice, Unauthorized Practice of Law, and AI’ (2024) 26 Columbia Sci & Tech L Rev 34.

[24] Ibid.

[25] Ibid.