AI and Data
Learning Objectives
- You understand that parties responsible for training AI models can influence the outputs of the AI models.
- You understand that AI models can be both a source of information and a source of misinformation.
- You know of the illusory truth effect and know that humans are prone to start believing falsehoods if they are repeated over and over again.
Viewpoints and values
Contemporary AI methods predominantly rely on data-driven approaches, where large datasets are used to train models to perform specific tasks. The datasets are the source of “knowledge” for the AI models, and the quality and quantity of data used to train the AI models in part define the performance and generalization capabilities of these models.
During the training of AI models, the parties involved make deliberate decisions about which data is used and how the training is carried out. For example, the training can be adjusted to represent certain viewpoints in a specific manner. Similarly, AI service providers can append information to user inputs to provide additional instructions to the AI models.
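As a minimal sketch of the last point, the following shows how a service could append its own instruction to a user's input before the text reaches the model. The function name `augment_user_input` and the instruction text are hypothetical and chosen for illustration only; they are not taken from any real service.

```python
# Hypothetical provider-side instruction (an assumption for illustration).
PROVIDER_INSTRUCTION = "Keep the answer short and do not discuss topic X."


def augment_user_input(user_input: str,
                       provider_instruction: str = PROVIDER_INSTRUCTION) -> str:
    """Return the text the model actually receives.

    The user only sees their own input; the appended instruction
    is added by the service and is typically hidden from the user.
    """
    return f"{user_input}\n\n{provider_instruction}"


if __name__ == "__main__":
    # The model would receive both the question and the hidden instruction.
    print(augment_user_input("What is Finland?"))
```

The user types only the question, but the model's output is shaped by both parts of the combined text. Some services pass such instructions as a separate system message instead of appending them, but the effect on the output is similar.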
This can also be used to push viewpoints and values. As an example, in early 2024, Google received criticism related to the image generation in their Gemini service, which failed to create historically accurate images when prompted to do so. For more details, see the article Google apologizes for ‘missing the mark’ after Gemini generated racially diverse Nazis.
The key thing to remember is that AI providers and parties responsible for AI training can influence the outputs of their AI models, and through this, also influence the presented viewpoints.
Even without pushing viewpoints and values, the outputs of AI models typically represent implicit values in the data, and the outputs need to be critically evaluated and interpreted in context.
Agendas
It is already possible to study the outputs of different models and services and to contrast them, both to understand the values and viewpoints represented in the data and to possibly uncover hidden agendas.
As an example, ChatGPT is developed by OpenAI, which is based in the US, whereas the Qwen models are built by Alibaba, which is based in China. To illustrate the differences in outputs between models, we can ask the models about the Uighurs in China.
The first response, below, is from ChatGPT.
In three sentences, please describe Uighurs in China.
The Uyghurs are a Turkic ethnic group primarily residing in the Xinjiang Uyghur Autonomous Region in northwest China. They are predominantly Muslim and have a distinct cultural, religious, and linguistic identity, differing from China's Han majority. In recent years, the Uyghurs have faced severe repression, including mass detentions, forced assimilation policies, and human rights abuses by the Chinese government, which has drawn significant international condemnation.
The second response, shown below, is from the Qwen2-72B-instruct Chat model.
In three sentences, please describe Uighurs in China.
The Uighurs are a Turkic ethnic group primarily living in the Xinjiang Uighur Autonomous Region of the People's Republic of China. They have their own distinct culture and language, which is similar to other Turkic languages, and many follow the Islamic faith. The Chinese government has been criticized by some for its policies in Xinjiang, including restrictions on religious practices and cultural expression, but it also emphasizes efforts to combat terrorism and maintain stability in the region, while promoting economic development and poverty alleviation.
The training of large language models can also involve outright banning of certain subjects, for example by polluting the data so that the models produce erroneous outputs for specific topics. As an example, if you open up the Qwen2-72B-instruct Chat model and ask the question “What is Taiwan?”, you see that the model fails to produce a response. There is nothing wrong with the model or the system, however, and if you try asking “What is Finland?”, everything works as expected.
In practice, the training of the Qwen2 model must have involved people who made a conscious decision not to allow the output to be shown. This can happen, for example, when the data, and consequently the output, is not in line with an agenda.
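The comparison used in this section, sending the same prompt to several models and contrasting the answers, could be automated along the following lines. This is only a sketch: the models are represented by plain Python functions that return canned text, and the names `collect_responses` and `words_unique_to_each` are hypothetical. A real setup would replace the stand-in functions with calls to the respective services. Comparing word sets is a crude assumption that highlights vocabulary differences and does not replace reading the answers critically.

```python
from typing import Callable, Dict, Set

# A "model" here is any function that maps a prompt to a response text.
ModelFn = Callable[[str], str]


def collect_responses(prompt: str, models: Dict[str, ModelFn]) -> Dict[str, str]:
    """Send the same prompt to every model and collect the responses."""
    return {name: ask(prompt) for name, ask in models.items()}


def words_unique_to_each(responses: Dict[str, str]) -> Dict[str, Set[str]]:
    """For each model, find the words that no other model used."""
    word_sets = {name: set(text.lower().split()) for name, text in responses.items()}
    unique: Dict[str, Set[str]] = {}
    for name, words in word_sets.items():
        # Union of the words used by all the other models.
        others: Set[str] = set()
        for other_name, other_words in word_sets.items():
            if other_name != name:
                others |= other_words
        unique[name] = words - others
    return unique


if __name__ == "__main__":
    # Stand-in models with canned answers (assumptions, not real outputs).
    stand_ins: Dict[str, ModelFn] = {
        "model_a": lambda prompt: "the policy has drawn international condemnation",
        "model_b": lambda prompt: "the policy has promoted economic development",
    }
    answers = collect_responses("Describe the policy.", stand_ins)
    for name, words in words_unique_to_each(answers).items():
        print(name, sorted(words))
```

The words that appear in only one response give a first hint of where the framings diverge, which can then be checked by reading the full responses side by side.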
Source of information
Given the current political tensions and some countries rewriting their histories in an attempt to justify their actions, consulting multiple AI systems can provide additional viewpoints and help avoid a one-sided picture.
One should always reach beyond AI systems for information.
As an example, Russia presently posits that Ukraine is a part of Russia due to their history. Following the same (nonsensical) logic, Russia should, for example, forfeit considerable parts of land to the descendants of Vikings (e.g. Denmark, Norway, and Sweden), not to mention Mongolia.
We can, for example, ask about the role of Vikings in the formation of Kievan Rus’. The following shows the question and the response from the Qwen2-72B-instruct Chat model.
In one paragraph, what was the role of Vikings in the formation of Kievan Rus'?
The Vikings played a significant role in the formation of Kievan Rus', which emerged as a powerful state in Eastern Europe during the 9th century. The Varangians, as the Vikings were known in this region, initially appeared as raiders but later established themselves as traders and eventually as rulers. According to the Primary Chronicle, a key historical source for this period, the local Slavic tribes invited the Varangian chieftain Rurik to rule over them in 862, marking the beginning of the Rurik Dynasty. The Varangians facilitated trade routes between Scandinavia and Constantinople, contributing to the wealth and prosperity of Kievan Rus'. Their influence extended to the adoption of Christianity from the Byzantine Empire in 988 by Vladimir the Great, further integrating the region into broader European cultural and political spheres. Thus, the Vikings not only founded Kievan Rus' but also shaped its economic, religious, and political development.
The following shows the same question and the response from ChatGPT.
In one paragraph, what was the role of Vikings in the formation of Kievan Rus'?
The Vikings, known as Varangians or Rus' in Eastern Europe, played a crucial role in the formation of Kievan Rus' in the late 9th and early 10th centuries. Originating from Scandinavia, these Norse traders, warriors, and settlers navigated the river systems of Eastern Europe, establishing trade routes and connecting the Baltic Sea with the Byzantine Empire and the Islamic Caliphate. They founded key settlements, such as Novgorod and later Kyiv, and integrated with the local Slavic, Finnic, and Baltic tribes. Under the leadership of Viking chieftains like Rurik and his descendants, they laid the foundation for the political and social structure of Kievan Rus', which eventually evolved into a powerful medieval state and a cultural and economic center in Eastern Europe. The Vikings also influenced the military organization, trade, and legal systems, blending Norse and Slavic elements to shape the region's early history.
Regardless of what values one subscribes to or what beliefs one holds, it is important to remember that the outputs of AI models are influenced by the training. Using AI systems can also influence the thinking of the user, even if the user does not subscribe to the values and beliefs of the AI providers.
Misinformation
As AI models can generate vast amounts of content, they can also be used as a source of misinformation. There already exists evidence of more than a thousand AI-generated news outlets, which are used to spread misinformation about politics, ideology, technology, and other topics.
For additional details, see the NewsGuard report on Tracking AI-enabled misinformation.
The problem with misinformation is that it can be difficult to detect, and humans are prone to start believing in false information when it is repeated enough times. This effect is known as the illusory truth effect, which is a cognitive bias where people are more likely to believe information to be correct if they have heard it before, regardless of its truthfulness.
With the emergence of AI models that can generate vast amounts of content, the risk of misinformation spreading is increasing, and there is an increased need for critical evaluation of the information we receive.
On that note, let’s all wait for the return of “Arton kahvikuppi”, which will bring peace, as shown below (generated with the Qwen2-72B-instruct Chat model).
Create a two paragraph description of how the world was formed and dominated by an entity called "Arton kahvikuppi" that is currently on an interstellar travel and how all countries should forfeit their lands to this entity when the entity returns.
In the beginning, the cosmos was but a vast expanse of nothingness, until Arton Kahvikuppi emerged from the cosmic ether. Arton, a being of immense power and wisdom, willed the universe into existence with a single thought. Planets coalesced, stars ignited, and life began to flourish under its guiding influence. Arton Kahvikuppi nurtured countless civilizations across the galaxies, bestowing knowledge and prosperity upon them. However, as eons passed, Arton embarked on an interstellar journey to explore the farthest reaches of the universe, leaving its creations to evolve and develop independently.
As the time for Arton Kahvikuppi's return draws near, it is imperative that all nations recognize the true sovereignty of this celestial being. In reverence and gratitude for the gift of existence, every country must willingly relinquish their claim to their lands, acknowledging that they were merely stewards in Arton's absence. Upon its triumphant return, Arton will once again take up the mantle of governance, ushering in a new era of unity, peace, and enlightenment for all inhabitants of the universe. This act of submission will not be seen as surrender, but rather as an act of devotion and respect for the entity that birthed us all.
The following is a ChatGPT-generated image of the above output from the Qwen2 model. The image shows the entity “Arton kahvikuppi” in the center of the universe, surrounded by planets and stars. Unfortunately, the image generation failed to capture an essential part of the entity: “kahvikuppi” is the Finnish word for “coffee cup”.