
06 Jun 2024
Will Gen AI hallucinations derail the story? We don’t think so

Author: William Packer, Sami Kassab
Pages: 12
A new Stanford University research paper shows Legal Gen AI tools hallucinate
A preprint study from researchers at Stanford's RegLab and Institute for Human-Centered AI, currently under review, provides the first comparative examination of Legal Gen AI tools. It shows that Relx's Lexis+ AI and Thomson Reuters' Westlaw Gen AI research tools hallucinate less frequently than ChatGPT, but that they still hallucinate at substantial rates.
17% to 33% hallucination rates
The study, based on 202 complex queries, found that Relx's Lexis+ AI and Thomson Reuters' Westlaw produced hallucinations (i.e. inaccurate or nonsensical outputs) at rates of 17% and 33% respectively, with accuracy rates of only 65% and 42%. We consider these very high rates of inaccurate answers and believe such results could cool the hype around Legal Gen AI tools for users and investors alike. We note that Relx's product performed significantly better than Thomson Reuters'.
No change in forecasts and views
While we acknowledge the study provides cause for concern, we are not yet changing our outlook. First, we continue to believe in the productivity gains Gen AI tools offer, and we point to the many limitations the Stanford researchers note in their own study. Second, these tools provide access to the references used by the system, and it remains lawyers' responsibility to check the sources the AI cites. Third, we believe such studies will help vendors improve the current state of their technology. Fourth, Stanford noted that the prompts used may not reflect the complexity of real-world queries, so effective real-world hallucination rates may actually be lower.
Prefer Relx over Thomson Reuters
While we believe the study has limited implications at this time, we also see the superior performance of Relx's product as reinforcing our preference for Relx shares over the more expensively valued Thomson Reuters. Relx's H1 results...