AI and Generative Models

Milestones and AI Cycles


Learning Objectives

  • You know about the early history of AI research and the development of symbolic AI and are aware of the cycles of optimism and disappointment in AI research.
  • You know about the shift toward machine learning and the development of neural networks.
  • You know about major benchmark milestones such as Deep Blue, Watson, and AlphaGo.

Artificial intelligence as a research field

Artificial intelligence as a formal research discipline traces its origins to the 1956 Dartmouth Summer Research Project on Artificial Intelligence. This workshop, proposed by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, brought together leading researchers to explore whether machines could be made to exhibit intelligent behavior. The term “artificial intelligence” itself was coined during this event, marking the formal establishment of AI as a distinct field of study.

The intellectual groundwork for AI, however, extends much further back. The concept of automata — mechanical devices capable of performing tasks without human intervention — dates to ancient times. Examples include intricate clockwork figures in ancient Greece and purported automata such as the 18th-century Mechanical Turk chess player (later revealed to hide a human operator). While these devices fascinated observers, they operated through purely mechanical means without anything resembling genuine intelligence.


Theoretical foundations and the Turing Test

More directly relevant to modern AI was Alan Turing’s theoretical work in the 1930s on what became known as the Turing machine. Turing demonstrated that a sufficiently simple abstract machine could, in principle, perform any computation that could be described by an algorithm, given enough time and memory. This theoretical foundation proved essential for understanding what computers could potentially accomplish.

In 1950, Turing proposed what became known as the Turing Test as a way to operationalize the question of machine intelligence. Rather than trying to define intelligence abstractly, Turing suggested a practical criterion: if a human evaluator conversing with both a machine and a human (without knowing which is which) cannot reliably distinguish between them based on their responses, then the machine could reasonably be considered intelligent. This test shifted the question from philosophical speculation about whether machines could think to empirical investigation of whether they could exhibit intelligent behavior.

Large language models can fool people in some Turing-test-style settings, and researchers are exploring new ways to assess AI. Systems capable of fooling humans existed long before large language models, however: PARRY (1972), a program that simulated a person with paranoid schizophrenia, convinced some psychiatrists that they were conversing with a human during text-based interviews.


Symbolic artificial intelligence

Early AI research proceeded from the assumption that intelligence could be achieved through explicit logical reasoning using symbols and rules. This approach, which came to dominate the field through the 1960s and 1970s, is known as symbolic AI or sometimes “Good Old-Fashioned Artificial Intelligence” (GOFAI).

Foundational work in this tradition included the Logic Theorist (1956), created by Allen Newell and Herbert Simon. This program proved mathematical theorems by searching through possible logical deductions, successfully proving several theorems from Whitehead and Russell’s Principia Mathematica — a landmark work in mathematical logic. In some cases, the Logic Theorist found more elegant proofs than those in the original text.

Following this success, Newell and Simon developed the General Problem Solver (1957), which aimed to solve a wide range of problems using a general-purpose problem-solving strategy. The system worked by representing problems as initial states and goal states, then searching for sequences of operations that would transform the initial state into the goal state. While the General Problem Solver could handle various puzzles and simple problems, it struggled with more complex real-world situations.
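
The search idea is easy to illustrate in code. The sketch below is not the General Problem Solver itself (GPS used a technique called means-ends analysis); it is a plain breadth-first search over states and operators, applied to an invented water-jug puzzle, to show how "find a sequence of operations that transforms the initial state into the goal state" can be mechanized.

```python
from collections import deque

def search(initial_state, is_goal, operators):
    """Breadth-first search over a state space: apply operators until a goal
    state is reached, returning the sequence of operator names used."""
    frontier = deque([(initial_state, [])])
    visited = {initial_state}
    while frontier:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for name, apply_op in operators:
            nxt = apply_op(state)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

# Toy problem: with a 3-litre jug (A) and a 5-litre jug (B), measure exactly
# 4 litres. A state is the tuple (litres in A, litres in B).
ops = [
    ("fill A",  lambda s: (3, s[1])),
    ("fill B",  lambda s: (s[0], 5)),
    ("empty A", lambda s: (0, s[1])),
    ("empty B", lambda s: (s[0], 0)),
    ("A->B",    lambda s: (s[0] - min(s[0], 5 - s[1]), s[1] + min(s[0], 5 - s[1]))),
    ("B->A",    lambda s: (s[0] + min(s[1], 3 - s[0]), s[1] - min(s[1], 3 - s[0]))),
]

# Prints a shortest sequence of operations, e.g. ['fill B', 'B->A', 'empty A', ...]
print(search((0, 0), lambda s: s[1] == 4, ops))
```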

The core assumption underlying symbolic AI was that intelligence fundamentally consists of manipulating symbols according to formal rules — similar to how we solve algebra problems by applying rules of algebraic manipulation, or how we might play chess by following strategic principles. If this assumption were correct, then creating intelligent machines would be primarily a matter of discovering and encoding the right rules.

This paradigm led to the development of expert systems — programs designed to capture the knowledge and reasoning patterns of human experts in specific domains. DENDRAL (1965) assisted chemists in identifying molecular structures from mass spectrometry data by encoding rules about how different molecular structures fragment under spectrometry. MYCIN (developed in the 1970s) diagnosed bacterial infections and recommended antibiotics by applying hundreds of rules encoding medical expertise. MYCIN’s performance was notably impressive, often matching or exceeding that of human specialists in controlled evaluations.
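
The rule-based style of these systems can be sketched with a tiny forward-chaining engine. The rules and facts below are invented for illustration and are not drawn from MYCIN's actual knowledge base, which was far larger and also attached certainty factors to its conclusions.

```python
# Each rule is (set of required facts, fact to conclude). Purely illustrative.
rules = [
    ({"fever", "stiff_neck"}, "suspect_meningitis"),
    ({"suspect_meningitis", "gram_negative"}, "suspect_neisseria"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose conditions are all satisfied,
    adding its conclusion to the set of known facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "stiff_neck", "gram_negative"}, rules))
```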

Symbolic AI research also produced cognitive architectures like Soar and ACT-R, which attempted to model human cognitive processes more broadly. These architectures influenced the development of Intelligent Tutoring Systems, which aimed to provide personalized instruction by modeling both subject matter knowledge and individual student understanding.

Despite these successes, symbolic AI faced fundamental challenges that became increasingly apparent through the 1970s and 1980s. Scaling systems to handle real-world complexity proved extraordinarily difficult. The number of rules and facts required to cover even moderately complex domains grew rapidly, and maintaining consistency across large rule bases became unmanageable. Systems struggled with uncertainty and ambiguous information — situations where rules didn’t clearly apply or where multiple conflicting rules might fire. Perhaps most critically, encoding expert knowledge turned out to require massive manual effort. Domain experts often couldn’t articulate their knowledge as explicit rules because much expertise is tacit — experts “just know” what to do without being able to explain their reasoning step-by-step.

These challenges, combined with overly optimistic early predictions that weren’t met, led to declining funding and reduced interest in AI research. This period of reduced activity, lasting through much of the 1970s and continuing into the 1980s, is known as the first AI winter. Funding agencies that had been told AI breakthroughs were imminent became disillusioned when progress proved slower and more difficult than promised.


Machine learning

As limitations of symbolic AI became clear, a fundamentally different approach gained traction. Rather than programming explicit rules, machine learning focuses on developing algorithms that can learn patterns from data and improve their performance through experience.

Machine learning methods were actually explored from AI’s earliest days. Frank Rosenblatt’s work on the Perceptron in the late 1950s represented an early machine learning approach. However, symbolic AI’s initial successes led it to dominate the field for decades, with machine learning remaining a secondary research direction until limitations of symbolic approaches became apparent.

Machine learning systems don’t require programmers to explicitly encode rules for every situation. Instead, they automatically discover patterns and relationships in training data. For classification tasks (sorting items into categories like “spam” or “not spam”), machine learning algorithms learn to recognize features that distinguish categories. For regression problems (predicting numerical values like housing prices), they learn relationships between input features and outputs. For clustering (grouping similar items), they discover natural groupings in data without being told what categories should exist.
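
A deliberately small sketch can make the contrast with rule-writing concrete. The toy "spam" classifier below learns class averages (centroids) from a handful of made-up messages and labels new messages by their distance to those averages; no filtering rules are written by hand. Real systems use far richer features and models.

```python
# A toy nearest-centroid classifier: patterns come from data, not hand-written rules.
def features(text, vocab):
    words = text.lower().split()
    return [words.count(w) for w in vocab]

vocab = ["free", "winner", "meeting", "report"]
training = [
    ("free prize winner winner", "spam"),
    ("you are a winner claim your free offer", "spam"),
    ("meeting moved to friday", "ham"),
    ("please review the report before the meeting", "ham"),
]

# "Training": average the feature vectors of each class.
centroids = {}
for label in ("spam", "ham"):
    vectors = [features(t, vocab) for t, l in training if l == label]
    centroids[label] = [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(text):
    v = features(text, vocab)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

print(classify("free winner"))             # "spam"
print(classify("agenda for the meeting"))  # "ham"
```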

This shift from explicit programming to learning from data proved transformative. By the late 1990s and early 2000s, several converging factors enabled machine learning’s rise: increasing computational power made it feasible to train models on substantial datasets; the growth of the internet created vast amounts of data for training; and algorithmic improvements made learning more effective and reliable. Machine learning began demonstrating practical success on real-world applications — spam filtering, recommendation systems, image recognition — that had been difficult for symbolic approaches.

Neural networks represent a particularly important class of machine learning models. Inspired loosely by the structure of biological brains, neural networks consist of interconnected nodes (artificial neurons) organized in layers. Early mathematical models of neural networks were proposed by Nicolas Rashevsky in the 1930s, though practical implementations had to wait for sufficient computing power.

Frank Rosenblatt’s Perceptron (1958) was the first practical neural network model. It could learn to classify inputs into two categories: when shown training examples with known correct classifications, it adjusted the weights of the connections between its artificial neurons to reduce classification errors. While the Perceptron could only learn relatively simple patterns, it demonstrated the principle of learning from data through weight adjustment. An earlier formal model, the McCulloch-Pitts artificial neuron (1943), had already shown how highly simplified neurons could implement basic logical operations.
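
The Perceptron's learning rule is simple enough to sketch in a few lines. The version below is a minimal illustration (the data, learning rate, and epoch count are arbitrary choices): weights and bias are nudged whenever a training example is misclassified, here until the model reproduces logical OR.

```python
def predict(weights, bias, x):
    """Fire (output 1) if the weighted sum of inputs reaches the threshold."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0 else 0

def train(samples, epochs=10, lr=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Weight update: only changes anything when the prediction is wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Learn a linearly separable concept: logical OR.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(data)
print([predict(weights, bias, x) for x, _ in data])  # [0, 1, 1, 1]
```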

Progress on neural networks stalled for years after the 1969 publication of Perceptrons, a book by Marvin Minsky and Seymour Papert that highlighted fundamental limitations of single-layer networks like the Perceptron. Although the analysis was mathematically correct, it dampened enthusiasm and funding for neural network research for over a decade.

The field revived in the 1980s with development of the backpropagation algorithm, which enabled training of multi-layer networks. Backpropagation provided a systematic way to adjust weights in networks with hidden layers between inputs and outputs, allowing these networks to learn complex patterns that single-layer networks could not. This breakthrough generated renewed excitement about neural networks’ potential.
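
A minimal sketch of the idea: the two-layer network below is trained with backpropagation on XOR, a pattern no single-layer perceptron can represent. The network size, learning rate, and iteration count are illustrative choices, and the code uses plain NumPy rather than a modern deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass through hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back through the layers.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2).ravel())  # typically close to [0. 1. 1. 0.]
```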

However, this revival proved short-lived. By the late 1980s and early 1990s, practical limitations became apparent. Neural networks were difficult to train effectively — they often got stuck in poor solutions or required careful tuning of many parameters. They demanded substantial computational resources that weren’t yet readily available. Perhaps most damaging to their reputation, they frequently performed worse than simpler statistical methods on practical problems. Expert systems, meanwhile, faced their own scaling and maintenance challenges.

These disappointments led to a second AI winter lasting into the early 1990s, with reduced funding and interest from both industry and academia. Only with substantial increases in computational power and algorithmic breakthroughs in the 2000s did the field recover momentum.


Deep learning

The emergence of deep learning in the 2010s represented a qualitative leap in neural network capabilities. Deep learning uses neural networks with many layers — often dozens or even hundreds — that can learn increasingly abstract representations of data at different levels.

Several technological developments converged to make deep learning practical. Graphics processing units (GPUs), originally developed for rendering video game graphics, turned out to be exceptionally well-suited for the parallel computations required to train large neural networks. What might take weeks on conventional processors could be accomplished in days or hours on GPUs. The availability of massive datasets like ImageNet (containing millions of labeled images) provided the training data these large networks required. Algorithmic improvements, including better methods for initializing network weights and preventing training from getting stuck, made training more reliable.

In deep neural networks for image recognition, early layers might learn to detect simple features like edges and corners. Middle layers combine these into more complex patterns like shapes and textures. Deeper layers learn to recognize complete objects like faces or cars. This hierarchical learning of increasingly abstract representations mirrors, at least superficially, how biological vision systems are thought to process information.
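
The layered structure itself is straightforward to express in a modern framework. The PyTorch sketch below stacks two convolutional stages and a final linear classifier for an assumed 3x32x32 input; the comments reflect the interpretation described above (early layers tending toward simple features, deeper layers toward more abstract ones), which the code itself does not verify.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # early layer: tends toward edges, simple textures
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # deeper layer: larger composite patterns
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # map learned features to 10 class scores
)

scores = model(torch.randn(1, 3, 32, 32))  # one random "image"
print(scores.shape)                        # torch.Size([1, 10])
```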

Deep learning now powers a remarkable range of applications: voice assistants that understand spoken commands, autonomous vehicles that perceive and navigate complex environments, medical image analysis systems that detect diseases, and language models that generate human-like text. It forms the foundation for recent advances in large language models capable of conversing naturally and assisting with diverse language tasks.


Benchmark milestones

Throughout AI’s cycles of optimism, disappointment, and renewed progress, certain high-profile achievements captured public imagination while demonstrating advancing capabilities. These benchmarks often sparked renewed interest and investment in AI research.

IBM Deep Blue defeats Garry Kasparov (1997): In a six-game match, Deep Blue defeated the reigning world chess champion, marking the first time a computer beat a reigning world champion in a full match played under standard tournament conditions. Chess had long been considered a domain requiring deep human intelligence: strategic planning, pattern recognition, and tactical calculation. Deep Blue’s victory demonstrated that machines could master the game through a combination of massive computational power (evaluating millions of positions per second) and carefully designed evaluation functions that assessed position quality. While Deep Blue didn’t think about chess the way humans do, it played at a superhuman level.
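
The search-plus-evaluation recipe behind chess engines can be sketched generically. The minimax function below looks ahead a fixed number of moves and scores leaf positions with an evaluation function; Deep Blue combined a heavily optimized form of such search (with alpha-beta pruning and custom hardware) with handcrafted evaluation. The toy "game" in the usage example is an invented stand-in, not chess.

```python
def minimax(position, depth, maximizing, legal_moves, apply_move, evaluate):
    """Look ahead `depth` moves; score leaves with `evaluate`; alternate between
    a maximizing and a minimizing player. Returns (best score, best move)."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position), None
    best_move = None
    best_score = float("-inf") if maximizing else float("inf")
    for move in moves:
        score, _ = minimax(apply_move(position, move), depth - 1, not maximizing,
                           legal_moves, apply_move, evaluate)
        if (maximizing and score > best_score) or (not maximizing and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move

# Toy usage: a "position" is just a number, each move adds or subtracts 1,
# and the evaluation function is the number itself.
score, move = minimax(
    0, 3, True,
    legal_moves=lambda p: [+1, -1],
    apply_move=lambda p, m: p + m,
    evaluate=lambda p: p,
)
print(score, move)  # depth-3 lookahead from position 0
```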

IBM Watson wins Jeopardy! (2011): Watson competed against two of Jeopardy!’s most successful human champions and won decisively. This achievement was significant because Jeopardy! requires not just factual knowledge but natural language understanding of often tricky clues involving wordplay, puns, and obscure references. Watson had to parse questions that were sometimes deliberately ambiguous, search through vast amounts of text to find relevant information, assess confidence in potential answers, and respond within seconds. The system demonstrated major advances in natural language processing and question answering beyond what earlier systems could achieve.

DeepMind’s AlphaGo defeats Lee Sedol (2016): AlphaGo’s victory over one of the world’s top Go players was particularly striking because Go presents challenges that seemed beyond computer capabilities for decades. The game has vastly more possible positions than chess — estimates suggest more possibilities than atoms in the observable universe. Unlike chess, where positions can be evaluated relatively directly, Go requires what players describe as intuition — recognizing promising patterns and strategic potential that can’t be easily calculated. Many experts had predicted computers wouldn’t master Go at professional level for many more years.

These benchmarks illustrate AI’s cyclical history — periods of optimism followed by disappointment, then renewed breakthroughs driven by technological shifts in computational power, data availability, and algorithmic innovation. Understanding these cycles provides perspective on both the genuine progress achieved and the recurring pattern of inflated expectations followed by disillusionment. Each AI winter was preceded by overpromising; each subsequent revival by addressing fundamental limitations through new approaches or technologies.