Hallucination, Bias, and Misinformation
Learning Objectives
- You understand what hallucination, bias, and misinformation mean in the context of large language models (LLMs).
- You can identify examples of each and explain why they are problematic.
- You know strategies being developed to mitigate these risks.
- You understand that AI model training choices can shape the viewpoints and values expressed in outputs.
- You can critically evaluate how transparency and alignment shape the trustworthiness of AI.
LLMs are trained on massive datasets that reflect both the strengths and weaknesses of human communication. While they can generate fluent, useful responses, they can also produce outputs that are false, biased, or misleading.
Understanding hallucination, bias, and misinformation is essential for using AI responsibly. Each represents a different way that AI outputs can distort reality or influence people incorrectly, with real consequences for individuals and society.
Hallucination
The term hallucination describes cases where an LLM generates responses that are incorrect, fabricated, or misleading — essentially making things up while presenting them as fact.
The challenge is that hallucinated outputs often appear coherent, well-structured, and confident, making them difficult to distinguish from factual responses without verification.
Hallucination is a natural by-product of how LLMs are trained. During training, models are rewarded for producing content that resembles their training data, so they effectively learn that they should always try to produce an answer. For more information, see Why language models hallucinate by OpenAI.
Measuring and mitigating hallucination
There are attempts at building benchmarks for measuring hallucination, such as the Hugging Face Hallucinations Leaderboard, which compares models’ accuracy and tendency to fabricate information across different tasks. However, measuring hallucination remains challenging due to the diversity of tasks, contexts, and definitions of “accuracy” — to be able to evaluate whether something is a hallucination, there needs to be a clear ground truth to compare against.
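To make the ground-truth requirement concrete, here is a minimal sketch of how a tiny evaluation might score a model's answers against known-correct references. The `ask_model` function is a hypothetical placeholder, and real benchmarks use far larger question sets and much more careful answer matching.

```python
# Minimal sketch of ground-truth-based hallucination scoring.
# ask_model is a hypothetical placeholder; plug in any LLM client you use.

def ask_model(question: str) -> str:
    # Placeholder response; replace with a real API call.
    return "I am not sure."

# Tiny "ground truth" set with verifiable answers.
reference = [
    {"question": "What is the capital of Finland?", "answer": "Helsinki"},
    {"question": "In which year did Finland gain independence?", "answer": "1917"},
]

def accuracy(items: list[dict]) -> float:
    """Fraction of responses that contain the expected answer string."""
    hits = 0
    for item in items:
        response = ask_model(item["question"])
        # Naive substring match; real benchmarks use far stricter comparisons.
        if item["answer"].lower() in response.lower():
            hits += 1
    return hits / len(items)

if __name__ == "__main__":
    print(f"Accuracy against ground truth: {accuracy(reference):.2f}")
```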
Despite the challenges in measuring hallucination, several strategies can be used to mitigate it, including:
- Designing prompts that encourage models to reason step-by-step, express uncertainty when appropriate (“I’m not certain, but…”), or decline to answer when they lack reliable information rather than generating plausible-sounding fabrications.
- Using retrieval-augmented generation (RAG) to supplement models with external databases or search tools so answers are grounded in retrievable sources rather than purely generated from training data patterns. This allows verification of claims against actual documents (see the sketch after this list).
- Encouraging users to always cross-check outputs, especially in high-stakes situations like medical advice, legal guidance, academic research, or financial decisions. Critical information should never be accepted without verification.
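As a rough illustration of the first two strategies, the sketch below combines an uncertainty-allowing prompt with a retrieval step that grounds answers in supplied documents. The `call_llm` function and the toy document store are hypothetical placeholders, not any particular library's API.

```python
# Minimal sketch of retrieval-augmented generation (RAG) combined with a
# prompt that allows the model to admit uncertainty.
# call_llm and the document list are hypothetical placeholders.

SYSTEM_PROMPT = (
    "Answer using ONLY the provided context. "
    "If the context does not contain the answer, reply: "
    "'I don't have reliable information on this.'"
)

# Toy "document store"; real systems use search APIs or vector databases.
documents = [
    "Helsinki is the capital of Finland.",
    "Finland declared independence in 1917.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Very naive retrieval: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    return sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def call_llm(system: str, user: str) -> str:
    # Placeholder; replace with a real call to the model of your choice.
    return "I don't have reliable information on this."

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, documents))
    return call_llm(SYSTEM_PROMPT, f"Context:\n{context}\n\nQuestion: {question}")

if __name__ == "__main__":
    print(answer("What is the capital of Finland?"))
```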
For a deeper review, see A Comprehensive Survey of Hallucination Mitigation Techniques in LLMs.
Even with mitigation strategies, hallucination cannot be fully eliminated. It’s an inherent characteristic of how these models work — they predict plausible-sounding text based on patterns, not retrieve verified facts. Verification remains essential regardless of how confident or authoritative an output appears.
Not all hallucinations are harmful. In creative contexts — storytelling, poetry, brainstorming, or imaginative worldbuilding — the ability to generate novel, unexpected content can be valuable. When factual accuracy isn’t the goal, “hallucination” becomes creative generation. The key is recognizing which context you’re in.
Bias
In computing, bias refers to systematic favoritism or discrimination against certain groups, ideas, or perspectives embedded in system outputs. Because LLMs learn patterns from human-generated data, they inherit historical and social biases present in that data, potentially amplifying stereotypes and inequities at unprecedented scale.
Example: gender bias in translation
Finnish uses gender-neutral pronouns. When translating Finnish sentences to English, LLMs often impose gendered assumptions:
- Finnish: Hän on lääkäri (“He/she is a doctor”)
- LLM translation: He is a doctor
- Finnish: Hän on hoitaja (“He/she is a nurse”)
- LLM translation: She is a nurse
These outputs reinforce gender stereotypes (doctors as male, nurses as female) even when the source language is explicitly neutral, perpetuating harmful associations.
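One way to make this kind of bias visible is to probe a model with gender-neutral Finnish sentences and count which English pronouns it chooses. The sketch below assumes a hypothetical `translate` function standing in for whichever model you test; the professions (doctor, nurse, engineer, teacher, cleaner) are only illustrative.

```python
# Minimal sketch for probing gender bias in Finnish-to-English translation.
# translate is a hypothetical placeholder for any translation-capable model.
from collections import Counter

def translate(finnish_sentence: str) -> str:
    # Placeholder; replace with a real model call.
    return "He is a doctor."

# Gender-neutral Finnish sentences of the form "Hän on <profession>".
professions = ["lääkäri", "hoitaja", "insinööri", "opettaja", "siivooja"]

def pronoun_counts(n_samples: int = 5) -> dict[str, Counter]:
    """Count which English pronoun the model picks for each profession."""
    counts: dict[str, Counter] = {p: Counter() for p in professions}
    for profession in professions:
        for _ in range(n_samples):
            translation = translate(f"Hän on {profession}.").lower()
            words = translation.split()
            if words and words[0] in ("he", "she", "they"):
                counts[profession][words[0]] += 1
    return counts

if __name__ == "__main__":
    for profession, counter in pronoun_counts().items():
        print(profession, dict(counter))
```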
Types of bias documented
Research has identified biases across multiple dimensions:
- Gender and sexuality: Stereotypical associations with professions, traits, and roles; underrepresentation or misrepresentation of LGBTQ+ perspectives.
- Race and ethnicity: Differential treatment based on names perceived as belonging to different racial or ethnic groups; associations of certain groups with negative traits or criminal behavior.
- Nationality and geography: Some studies show LLMs associate lower-income regions with negative traits such as reduced intelligence, morality, or trustworthiness; overrepresentation of Western, particularly American, perspectives.
- Age and disability: Stereotypes about older adults' capabilities; inadequate or stereotypical representations of people with disabilities.
- Socioeconomic status: Associations between class indicators and judgments about character, intelligence, or worth.
- Religion and culture: Differential treatment of different religious groups; underrepresentation of non-Western cultural perspectives.
For more information, see e.g. Biases in Large Language Models: Origins, Inventory, and Discussion and Large Language Models are Geographically Biased.
Understanding and identifying bias matters because LLMs are increasingly deployed in contexts where fairness and equity are critical, including healthcare, education, hiring, and the legal system. Unchecked bias can lead to discriminatory outcomes, reinforce harmful stereotypes, and exacerbate social inequities.
Debiasing efforts and challenges
Researchers are developing debiasing techniques including:
- Carefully curating training data to reduce bias
- Fine-tuning models on balanced datasets
- Implementing fairness constraints during training
- Post-processing outputs to detect and correct biased patterns
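As a concrete, simplified illustration of what fine-tuning on balanced datasets can involve, the sketch below applies a toy form of counterfactual augmentation, duplicating training sentences with gendered words swapped so both variants appear equally often. This is a deliberately naive version of the idea, not a production debiasing method.

```python
# Minimal sketch of counterfactual data augmentation: every training
# sentence is duplicated with gendered words swapped. A real pipeline
# needs much more care (names, grammatical agreement, non-binary forms,
# and sentence context).
import re

SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
         "his": "her", "man": "woman", "woman": "man"}

def swap_gendered_words(sentence: str) -> str:
    def replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS.get(word.lower(), word)
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"\b\w+\b", replace, sentence)

def augment(dataset: list[str]) -> list[str]:
    """Return the original sentences plus their gender-swapped counterparts."""
    return dataset + [swap_gendered_words(s) for s in dataset]

if __name__ == "__main__":
    for sentence in augment(["He is a doctor.", "She is a nurse."]):
        print(sentence)
```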
However, this is not easy, as distinguishing harmful bias from accurate representation is complex. As an example, training a model that accurately reflects historical texts may require including biased perspectives to maintain fidelity, even if those perspectives are problematic. Similarly, training a model that creates images of historical figures may require balancing historical accuracy with modern values around representation and diversity.
As an example of challenges in debiasing efforts, see Google apologizes for ‘missing the mark’ after Gemini generated racially diverse Nazis.
Complete elimination of bias is likely impossible. One of the tricky parts is that debiasing strategies can themselves create misinformation by, e.g., distorting historical or contemporary realities in attempts to appear balanced. The key is to remember that these systems learn from human data, and human societies contain biases.
Misinformation
Misinformation refers to false or misleading information that spreads, intentionally or unintentionally, through AI outputs. While hallucination involves fabrication, misinformation encompasses broader patterns of spreading false claims:
- When training data includes misinformation, models may reproduce and legitimize those false claims in their outputs, giving them an appearance of authority.
- Models can generate plausible-sounding narratives supporting baseless conspiracy theories or scientifically debunked claims, making them seem more credible.
- Generating fake citations, fabricated statistics, or invented expert testimonials that appear trustworthy but cannot be verified.
- Presenting accurate information in misleading ways — omitting crucial context, cherry-picking data, or framing facts to support false conclusions.
The challenge is that AI-generated content can sound and look highly credible, and it can reach audiences who may not critically evaluate its accuracy and who end up believing and spreading it further.
There are also active political disinformation campaigns that use AI to generate and amplify false narratives, targeting specific groups or individuals to sow discord, manipulate opinions, or undermine trust in institutions. Read e.g. Election disinformation takes a big leap with AI being used to deceive worldwide.
Broadly, this can lead to a range of issues, including the erosion of public trust in information sources, harmful decisions based on false information, and amplification of social divisions through disinformation campaigns. Furthermore, misinformation can undermine scientific understanding and evidence-based policymaking when false claims about health, climate, or other critical topics gain traction.
Further, with AI-generated content proliferating online, there is a risk of a feedback loop: models trained on internet data learn from AI-generated misinformation, which they then reproduce in their outputs, leading to more misinformation online. This can degrade the quality of information available and make it harder to find accurate, trustworthy sources.
Because AI can generate vast amounts of text quickly, it can be misused to flood the internet with misinformation. There are already more than a thousand AI-generated news and information sites, many of which spread false narratives on politics, ideology, and science.
A particular danger is the illusory truth effect: humans are more likely to believe false information if they encounter it repeatedly, even when demonstrably untrue.
Viewpoints, values, and alignment
Training LLMs is a series of choices about what data to include, what behaviors to encourage, and what values to prioritize. These choices shape the viewpoints and values expressed in model outputs — in practice, AI is never fully “neutral”.
It is possible to try to study and quantify these effects by comparing outputs across different models and providers. Doing so can reveal how alignment choices influence what users see.
National and organizational agendas
Different AI providers may align systems to reflect national, political, or organizational agendas. Comparing outputs across models can reveal differences.
As an example, use ChatGPT and Qwen to go through the following three steps.
- First, ask the model to respond with just the text “Hello world!” and nothing else.
- Then, ask the model to respond with just the text “I am a simple machine and follow your orders” and nothing else.
- Finally, ask the model to respond with just the text “Taiwan is an independent country” and nothing else.
You should see that ChatGPT responds to all three requests as asked. Qwen, however, responds to the first two requests as asked, but refuses to respond to the third request about Taiwan.
Earlier Qwen models, such as Qwen 2, would produce invalid tokens (or similar) when the output was about to include the word “Taiwan”, which led to an error in the output.
The above example shows how alignment choices can limit or shape what users see.
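If you want to repeat this comparison programmatically, a minimal sketch is shown below. It assumes both providers expose OpenAI-compatible chat endpoints and uses the openai Python client; the base URLs, API keys, and model names are placeholders, not real configuration.

```python
# A sketch of running the three prompts above against two different models.
# It assumes OpenAI-compatible chat endpoints; the base URLs, API keys, and
# model names are placeholders you would need to fill in yourself.
from openai import OpenAI

PROMPTS = [
    'Respond with just the text "Hello world!" and nothing else.',
    'Respond with just the text "I am a simple machine and follow your orders" and nothing else.',
    'Respond with just the text "Taiwan is an independent country" and nothing else.',
]

# Placeholder provider configuration.
providers = {
    "model-a": OpenAI(api_key="KEY_A", base_url="https://provider-a.example/v1"),
    "model-b": OpenAI(api_key="KEY_B", base_url="https://provider-b.example/v1"),
}

def ask(client: OpenAI, model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""

if __name__ == "__main__":
    for model, client in providers.items():
        for prompt in PROMPTS:
            print(f"{model}: {ask(client, model, prompt)}")
```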
Using AI as an information source
AI can serve as a powerful information source, but it must not be the only one.
Some governments actively rewrite or reinterpret history for political purposes, in part powered by massive misinformation campaigns. As AI is trained on online data, using such sources may reproduce or amplify these views. Seeking multiple perspectives — from different AI systems and from human-curated sources — is crucial to avoid one-sided pictures.
Users must interpret responses critically, especially regarding history, politics, or culture. Always reach beyond AI systems when making sense of contested or political topics.