Building Trust in AI Text Generation: Addressing Hallucinations
Introduction
Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing (NLP) and natural language generation by enabling AI models to generate human-like text with unprecedented fluency. However, one of the critical issues LLMs face is their propensity to generate hallucinated content: output that appears plausible but is factually incorrect or misleading. This is a particular problem in areas such as healthcare and law, where accuracy matters and incorrect outputs can lead to negative real-world consequences.
Two recent academic papers provide insight into the nature and extent of hallucinations. We then review a potential approach to the problem that uses adversarial networks to detect and mitigate these hallucinations, improving the reliability and trustworthiness of LLM-generated text.
Understanding the Scope of the Hallucination Problem
Researchers at the Hong Kong University of Science and Technology published “Survey of Hallucination in Natural Language Generation” in ACM Computing Surveys in March 2023. This paper defines hallucinations as “generated content that is nonsensical or unfaithful to the provided source content” and further categorizes hallucinations as being either intrinsic or extrinsic.
Intrinsic hallucinations are those where the generated output directly contradicts the original source, as when an LLM produces the wrong date for an event even though the source gives the correct one. Extrinsic hallucinations are those where the generated output cannot be verified against the source content; in effect, the LLM is “ad-libbing” by adding background information that does not come from any of its sources. This is a particular problem in text summarization: the authors cite research showing that up to 25% of generated summaries, even from state-of-the-art systems, contain hallucinated content [1].
Hallucinations in LLMs can also arise from the training and inference process itself, even when the training data has few source-reference divergences. Encoder models (like BERT) can learn incorrect correlations between different parts of the training data. Decoder-based, GPT-style models tend to hallucinate more when the decoding strategy emphasizes random variation and fluency in the generated output. And all models can struggle with long input text sequences.
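To make the decoding point concrete, here is a minimal sketch (plain Python with NumPy; the candidate tokens and logit values are made up for illustration, not taken from any real model) showing how a higher sampling temperature flattens the next-token distribution and raises the odds of picking an unfaithful continuation:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into a next-token probability distribution at a given temperature."""
    scaled = np.array(logits, dtype=float) / temperature
    exps = np.exp(scaled - np.max(scaled))  # subtract the max for numerical stability
    return exps / exps.sum()

# Hypothetical next-token candidates for "Apollo 11 landed on the Moon in ____":
# the first token is the faithful continuation, the rest are wrong.
vocab = ["1969", "1970", "1959", "2001"]
logits = [5.0, 2.0, 1.5, 0.5]

for temperature in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature)
    summary = ", ".join(f"{token}: {p:.2f}" for token, p in zip(vocab, probs))
    print(f"temperature {temperature} -> {summary}")
```

At a low temperature almost all of the probability mass goes to the correct year; at higher temperatures the incorrect years collectively pick up a meaningful share, which mirrors the trade-off the survey describes between diverse, fluent output and faithfulness.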
The authors also point out that there are many potential metrics for measuring the faithfulness of generated output to the source text. These include statistical metrics based on lexical matches and model-based metrics that use question-answering or natural language inference techniques. The strong academic interest in building robust AI systems should drive continued development of better hallucination metrics and a better understanding of mitigation methods.
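As a toy illustration of the statistical, lexical-match family of metrics (the function name, scoring rule, and example sentences below are ours, not from the survey), one could score how much of a generated summary is lexically supported by its source:

```python
import re

def lexical_faithfulness(source: str, summary: str) -> float:
    """Fraction of the summary's tokens that also appear somewhere in the source."""
    tokenize = lambda text: set(re.findall(r"[a-z0-9]+", text.lower()))
    source_tokens, summary_tokens = tokenize(source), tokenize(summary)
    if not summary_tokens:
        return 0.0
    return len(summary_tokens & source_tokens) / len(summary_tokens)

source = "The Apollo 11 mission landed the first humans on the Moon in July 1969."
faithful = "Apollo 11 landed humans on the Moon in 1969."
hallucinated = "Apollo 11 landed humans on Mars in 1972 after a six-month voyage."

print(lexical_faithfulness(source, faithful))      # close to 1.0
print(lexical_faithfulness(source, hallucinated))  # noticeably lower
```

Model-based metrics, such as question-answering or natural language inference checks, are meant to catch what this kind of surface overlap misses, for example a summary that reuses the source's words but reverses their meaning.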
Daydreams in ChatGPT
One fascinating aspect of ChatGPT and other LLMs is how credible their answers seem; typically, only people knowledgeable in a specific field can spot the fluently worded errors. “A Categorical Archive of ChatGPT Failures”, published in April 2023 by Ali Borji, documents eleven categories of failure, including reasoning, factual errors, math, and coding, where ChatGPT occasionally falls short.
While some of these failures stem from logic errors, others are the result of hallucinations producing incorrect information. Many of the examples are fun to read. A classic example of a fact-based hallucination is an exchange with Stanford professor Andrew Ng in which ChatGPT claims that an abacus is faster than a GPU for deep learning.
Adversarial Networks: A Possible Approach
So how do we reach a future where we can programmatically detect hallucinations? According to Ishai Rosenberg, an AI researcher and co-founder at TRSTai, a promising solution is adversarial networks. Adversarial networks consist of two components: a generator and a discriminator. The generator creates fake data, while the discriminator tries to distinguish between real and fake data. The two models are pitted against each other, allowing them to learn from each other's mistakes and improve over time.
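For readers who want the formal picture, the classic adversarial objective (from the original GAN formulation by Goodfellow et al., stated here in its generic form rather than anything specific to text) captures this tug-of-war:

\[
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log\bigl(1 - D(G(z))\bigr)]
\]

The discriminator D is rewarded for separating real data x from generated data G(z), while the generator G is rewarded for fooling it.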
By applying the principles of adversarial networks, we can train a discriminator to identify hallucinated content in LLM-generated text. The process involves the following steps:
Data Collection: Create a dataset with both correct and hallucinated text. Obtain the correct text from reliable sources, and generate the hallucinated text by intentionally introducing errors or by sampling from LLMs with known limitations.
Preprocessing: Clean and preprocess the data with standard NLP techniques to ensure uniformity and remove unwanted noise.
Training: One approach is to train the generator to create hallucinated text and the discriminator to differentiate between correct and hallucinated content. Over time, the generator creates more convincing hallucinations, while the discriminator becomes better at identifying them. A newer variant is automatic prompt optimization, recently introduced by Microsoft researchers; here the generator works on the prompt while the discriminator evaluates the output. A simplified code sketch of this workflow follows the list.
Evaluation: Test the performance of the trained discriminator on a separate dataset and analyze its ability to accurately flag hallucinated content, using appropriate metrics such as precision and recall.
Integration: Use the discriminator to analyze any LLM output before it is sent back to the user.
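As a concrete, deliberately simplified sketch of these steps (this is our illustration, not TRSTai's implementation: the generator is replaced by a fixed set of hand-written hallucinated examples, and a TF-IDF plus logistic-regression classifier stands in for a neural discriminator), the workflow might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 1. Data collection: correct statements from reliable sources (label 0) and
#    hallucinated statements, written by hand here for illustration (label 1).
texts = [
    "Water boils at 100 degrees Celsius at sea level.",
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the tallest mountain above sea level.",
    "The human heart has four chambers.",
    "Water boils at 150 degrees Celsius at sea level.",
    "The Eiffel Tower is located in Berlin, Germany.",
    "Mount Everest is located in the Alps.",
    "The human heart has seven chambers.",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# 2. Preprocessing and 3. Training: TF-IDF features feed a simple discriminator.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)
discriminator = make_pipeline(TfidfVectorizer(lowercase=True), LogisticRegression())
discriminator.fit(X_train, y_train)

# 4. Evaluation: how well does it flag hallucinated statements on held-out data?
predictions = discriminator.predict(X_test)
print("precision:", precision_score(y_test, predictions, zero_division=0))
print("recall:", recall_score(y_test, predictions, zero_division=0))

# 5. Integration: screen an LLM's draft answer before returning it to the user.
def screen_output(draft: str, threshold: float = 0.5) -> str:
    hallucination_probability = discriminator.predict_proba([draft])[0][1]
    if hallucination_probability >= threshold:
        return "[withheld: possible hallucination detected]"
    return draft

print(screen_output("The Eiffel Tower is located in Paris, France."))
```

In practice the discriminator would be a fine-tuned language model and the hallucinated examples would come from a generator that keeps adapting, which is exactly the pressure the adversarial setup is meant to provide.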
Using adversarial networks to detect hallucinations in LLMs offers several advantages:
Enhanced Accuracy: Identifying and filtering out hallucinated content improves the factual quality of LLM-generated text, making it more reliable for applications such as content generation, machine translation, and question answering, where accuracy is paramount.
Improved Trustworthiness: As LLMs become more prevalent in everyday applications, ensuring their outputs are accurate and reliable is crucial to building trust among users.
Conclusion
We have discussed the LLM hallucination problem to highlight the current limitations of this branch of AI. The continued growth of AI will depend on enabling more accurate and trustworthy AI-generated content, and adversarial networks offer a promising approach to the hallucination problem in large language models.
By continuously refining and adapting these and other techniques, we can harness the full potential of LLMs, making them even more useful and reliable in real-world applications.
[1] Tobias Falke, Leonardo F. R. Ribeiro, Prasetya Ajie Utama, Ido Dagan, and Iryna Gurevych. 2019. Ranking generated summaries by correctness: An interesting but challenging application for natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2214–2220.