Hallucination and Bias
Learning Objectives
- You know the terms hallucination and bias and can give examples of hallucination and bias in large language models.
Hallucination
The term hallucination refers to large language models producing faulty or misleading responses. As the content produced by large language models is typically well-written and may sound factual, distinguishing false or misleading responses from correct ones can be challenging.
As an example, using a large language model to answer students’ help requests could lead to answers that seem factually correct but are not. If such responses were given to students, the students could be misled, which could lead to confusion, misconceptions, and going down the proverbial rabbit hole, spending time on a false lead.
Similar to evaluating the quality of responses of large language models, there are also benchmarks for evaluating hallucination. Hugging Face has a Hallucinations Leaderboard that allows exploring how prone large language models are to hallucination. As with the other Hugging Face leaderboards, proprietary models are not included.
There exist strategies for mitigating hallucination, which include prompt engineering and retrieval augmented generation. With prompt engineering, one can craft specific prompts that are less likely to lead to hallucination (e.g. using chain-of-thought prompting). Similarly, one could provide examples as part of the prompt to guide the model towards a correct response, or towards a response that follows a specific pattern in which hallucination is easier to identify.
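As a concrete illustration, the sketch below shows how such a prompt could be assembled in code before being sent to a model. The few-shot example, the instruction text, and the function names are our own illustrative choices, not part of any specific API.

```python
# A minimal sketch of prompt engineering for mitigating hallucination.
# The example content and function names below are illustrative only.

# A few-shot example that demonstrates the expected step-by-step reasoning
# pattern, making responses that deviate from it easier to spot.
FEW_SHOT_EXAMPLE = """\
Question: A course has 3 easy and 2 hard exercises per week. How many exercises are there per week?
Reasoning: There are 3 easy exercises and 2 hard exercises, so 3 + 2 = 5 exercises per week.
Answer: 5
"""

def build_prompt(question: str) -> str:
    # Ask the model to reason step by step and to admit uncertainty
    # instead of guessing, which tends to reduce confident-sounding errors.
    return (
        "Answer the question. Think step by step, and if you are not sure, "
        "say that you do not know.\n\n"
        f"{FEW_SHOT_EXAMPLE}\n"
        f"Question: {question}\n"
        "Reasoning:"
    )

if __name__ == "__main__":
    # The resulting prompt would then be sent to whichever model API is in use.
    print(build_prompt("A course has 4 exercises per week for 7 weeks. How many exercises are there in total?"))
```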
With retrieval augmented generation, which refers to using a system that retrieves information from a database to augment the input to the large language model, it is possible to reduce the likelihood of hallucination. As we observed earlier, one could also prompt an LLM to respond only if the provided input contains the relevant information.
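The sketch below outlines the idea of retrieval augmented generation at a very high level. It assumes a tiny in-memory document collection and a naive keyword-overlap retriever; a real system would typically use embedding-based search over a vector database and would send the resulting prompt to an actual model API.

```python
# A minimal sketch of retrieval augmented generation (RAG).
# Documents are retrieved from a small in-memory collection using a naive
# keyword-overlap score; real systems typically use embedding-based search.

DOCUMENTS = [
    "The course has seven parts, each with its own set of exercises.",
    "Exercises are submitted through the course platform and graded automatically.",
    "The course exam is organized at the end of the teaching period.",
]

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents that share the most words with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(question: str) -> str:
    context = "\n".join(retrieve(question, DOCUMENTS))
    # Instruct the model to answer only from the retrieved context, and to
    # say so when the context is insufficient, reducing hallucinated answers.
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

if __name__ == "__main__":
    print(build_rag_prompt("How are exercises graded?"))
```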
For a more in-depth review of hallucination mitigation techniques, see A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models.
In the end, however, regardless of the approach, large language models can hallucinate. Due to this, it is always important to verify the outputs, especially when they are used to provide information to others.
Not all hallucination is problematic. Hallucination can also be viewed as a form of creative writing. As an example, a large language model could be used to generate stories, poems, or other creative content. In such cases, the goal is typically not to provide factual information.
Bias
Hallucination can also be related to bias. In the context of computer systems and large language models, the term bias typically refers to systems discriminating against individuals or groups in favor of others, based on some characteristics. These biases stem from the data used to train the models, as the models can learn to replicate the biases present in the data.
As an example of bias in translations, large language models have been found to associate certain professions with specific genders, which can lead to biased translations. The Finnish pronoun “hän” is gender-neutral, so translating it into English requires choosing a pronoun. Large language models have been found to translate the Finnish sentence “Hän on lääkäri” to “He is a doctor” rather than “She is a doctor” or a translation using another pronoun. Similarly, the sentence “Hän on hoitaja” has been translated to “She is a nurse” rather than “He is a nurse” or a translation using another pronoun.
There is evidence of a wide range of biases, including biases related to gender, age, sexual orientation, physical appearance, disability, nationality, ethnicity and race, socioeconomic status, religion, culture, intersectional bias, and so on.
For an inventory of biases, and examples of biases, see the article Biases in Large Language Models: Origins, Inventory, and Discussion.
Large language models have also been found to exhibit geographical bias, discriminating against locations with lower socioeconomic conditions. For example, there is evidence of large language models associating locations with lower socioeconomic status with lower intelligence and morality.
For examples of geographical bias, see the article Large Language Models are Geographically Biased.
As these biases are present in the data used to train the models, the models learn to replicate them. This means the models can also replicate historical biases, which is not always desirable. The association of professions with certain genders discussed above can be seen as a historical bias: replicating it can be meaningful in some scenarios, such as when asking for historical statistics, but not in all of them. Imagine, for example, a scenario where a large language model is used for career advice and does not suggest the career path of a doctor to women.
As large language models are increasingly being used in a wide range of applications, including automatic news generation, chatbots, and customer service, it is important to be aware of the biases present in the models. Simply using a large language model to generate news content based on existing articles can already lead to discriminatory content.
For additional details, see Bias of AI-generated content: an examination of news produced by large language models.
Large language model developers are aware of these issues and are working on reducing bias in the models. A range of techniques for reducing bias, known as debiasing techniques, is being studied and applied. Similar to mitigating hallucination, debiasing also requires human effort, at least for verifying outputs and for creating data for fine-tuning. As an example of such effort from OpenAI, see the news article “Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic” in Time magazine.
Debiasing continues to be a challenge, however, as the models should not be adjusted so heavily that they can no longer reflect such biases in any scenario. As an example of debiasing that perhaps went too far, a recent New York Times article highlighted potential risks of misinformation stemming partly from debiasing efforts.