Recent Trends
Learning Objectives
- You know the major trends shaping the development and deployment of large language models.
- You understand how the field is evolving beyond simply building larger models.
Accessibility and democratization
In the early days of GPT and BERT, access to powerful models was limited to large technology companies and well-funded research laboratories with substantial computational resources.
Today, the field has become significantly more democratized through multiple channels:
- Open-source and open-weight models have become increasingly competitive with proprietary alternatives. As an example, DeepSeek-V3, released in early 2025, achieved performance comparable to GPT-4 on various benchmarks while being open-weight.
- Both proprietary providers (e.g., OpenAI, Anthropic, Google) and open-source platforms offer API access with straightforward pricing and documentation, making it feasible to build AI-powered applications without machine learning expertise. The chatbot integrated into this course platform, for instance, is built using such APIs (the first sketch after this list shows what such a call looks like).
- Community platforms like HuggingFace have become the norm for AI development, providing model hosting and tools for training, fine-tuning, evaluating, and deploying models (the second sketch after this list shows how little code loading a hosted model takes).
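To make the API route concrete, here is a minimal sketch of calling a hosted model through OpenAI's Python SDK. The model name and prompts are illustrative placeholders, not the actual implementation of this course's chatbot; other providers' APIs look very similar.

```python
# Minimal sketch: building on a hosted model API (OpenAI's Python SDK as one example).
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: any available chat model would work here
    messages=[
        {"role": "system", "content": "You are a helpful course assistant."},
        {"role": "user", "content": "Explain in one sentence what an open-weight model is."},
    ],
)
print(response.choices[0].message.content)
```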
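And a second minimal sketch, this time running an open-weight model locally with the HuggingFace transformers library. The model identifier is only an example of a small instruction-tuned model; any other model on the Hub would work the same way.

```python
# Minimal sketch: loading and running an open-weight model from the HuggingFace Hub.
from transformers import pipeline

# Example identifier of a small open-weight instruction-tuned model
generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

output = generator(
    "Summarize the idea of a mixture-of-experts model in one sentence.",
    max_new_tokens=60,
)
print(output[0]["generated_text"])
```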
This democratization does not eliminate all barriers: training frontier models still requires resources available only to well-funded organizations. However, it has significantly expanded who can use, adapt, and build applications with powerful language models.
Reasoning models
One of the most significant recent developments is the emergence of reasoning models — language models specifically designed to solve complex problems through explicit step-by-step thinking rather than generating answers directly.
OpenAI’s o1 model, released in September 2024, demonstrated significant improvements in mathematics, science, and coding tasks by generating detailed chains of reasoning before arriving at answers. This approach uses more computation during inference (when generating responses) to achieve better performance on problems requiring careful logical analysis, effectively trading response speed for improved accuracy.
In January 2025, DeepSeek released DeepSeek-R1, an open-source reasoning model that achieved performance comparable to OpenAI’s o1 on mathematical and coding benchmarks. While the technical details of OpenAI’s o1 remain proprietary, DeepSeek-R1 was fully documented and open-weight, allowing researchers to study and build upon its architecture and training methods.
These reasoning models operate differently from standard language models:
- They explicitly show their thinking process, making it possible to follow their logic step-by-step and identify where errors might occur in their reasoning (a small parsing sketch after this list illustrates this).
- They spend more computational resources during response generation — sometimes generating thousands of internal reasoning tokens before producing a final answer — trading speed for improved accuracy on complex problems.
- They can check their own work, catching and correcting errors before providing final answers, similar to how humans might review their reasoning before submitting an answer.
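To make the idea of visible reasoning concrete, here is a small sketch that separates a reasoning model's "thinking" from its final answer. It assumes a DeepSeek-R1-style output in which the reasoning is wrapped in <think> tags; the sample output string is invented for illustration.

```python
import re

# Invented example output in the style of a reasoning model: the model "thinks"
# inside <think> tags before giving a short final answer.
raw_output = (
    "<think>The question asks for 17 * 24. 17 * 20 = 340 and 17 * 4 = 68, "
    "so the product is 340 + 68 = 408. Check: 408 / 24 = 17, correct.</think>"
    "17 * 24 = 408."
)

match = re.search(r"<think>(.*?)</think>\s*(.*)", raw_output, re.DOTALL)
if match:
    reasoning, answer = match.group(1), match.group(2)
    print("reasoning tokens (roughly):", len(reasoning.split()))  # compute spent on "thinking"
    print("final answer:", answer)
```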
By mid-2025, DeepSeek released R1-0528, which showed significant improvements in reasoning quality and reportedly reduced hallucination rates by 45-50% compared to earlier versions. Google’s Gemini 2.5 introduced a “Deep Think” mode for step-by-step problem-solving, and other major providers have followed with their own reasoning-focused models.
This shift represents a move beyond simply scaling model size toward optimizing how models use their capabilities during inference — essentially making models “think harder” about difficult problems rather than just making them larger.
Efficiency: Doing more with less
As the costs of training and running large models have become increasingly apparent, the field has shifted focus toward efficiency rather than pure scale. Several approaches are driving this trend:
- Mixture of Experts (MoE) architectures use multiple smaller specialized sub-models ("experts") that activate selectively based on the input. Rather than processing every input through one single huge model, MoE models route each input to the most relevant experts (a routing sketch appears after this list).
- In 2024 and 2025, developers increasingly focused on making models smaller and more efficient, with models like Microsoft’s Phi series demonstrating that smaller models carefully trained on high-quality data can approach the performance of much larger ones trained on broader datasets. These compact models can run on laptops, mobile devices, or embedded systems rather than requiring data center infrastructure.
- Advanced fine-tuning techniques make it possible to adapt models more efficiently without full retraining. Methods like LoRA (Low-Rank Adaptation), which updates only a small number of additional parameters rather than the entire model, Direct Preference Optimization (DPO), which simplifies the RLHF training process, and various prompt tuning approaches allow developers to customize models for specific applications while reducing both computational cost and time requirements (a minimal LoRA sketch follows this list).
- Distillation transfers knowledge from large “teacher” models to smaller “student” models. DeepSeek demonstrated that reasoning patterns from larger models can be distilled into smaller models, with their distilled 32-billion-parameter model reportedly outperforming OpenAI’s o1-mini on various benchmarks despite its smaller size.
- Alternative architectures like RecurrentGemma and Mamba depart from the standard transformer design, seeking better efficiency for certain tasks through different approaches to processing sequential data.
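The following is a deliberately simplified sketch of MoE routing, not the architecture of any particular model: a small gating network scores the experts and only the top-k experts run for each token, so only a fraction of the total parameters is used per input.

```python
# Toy mixture-of-experts layer: route each token to its top-k experts.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)  # router: one score per expert
        self.top_k = top_k

    def forward(self, x):                        # x: (num_tokens, dim)
        scores = self.gate(x)                    # (num_tokens, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)        # mixing weights for the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the selected experts are evaluated
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(5, 64)
print(moe(tokens).shape)  # torch.Size([5, 64])
```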
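And a from-scratch sketch of the LoRA idea. In practice adapters are usually added with libraries such as peft rather than written by hand, and the rank and scaling values below are arbitrary choices for illustration.

```python
# Toy LoRA layer: freeze the pretrained weights and train only a low-rank update.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # the pretrained weights stay frozen
            p.requires_grad_(False)
        # Only these two small matrices are trained; the weight update is B @ A
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # original output plus the scaled low-rank correction
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable} of {total}")  # a small fraction of the layer
```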
These efficiency-focused developments reflect growing recognition that practical deployment often values computational efficiency, speed, and deployability as much as raw capability.
Multimodal capabilities
Multimodal models that can process and generate multiple types of data became part of mainstream offerings in 2024. While early multimodal models appeared in 2022, widespread availability and practical performance came more recently.
Essentially all commercial providers now offer models that can understand and respond using combinations of text, images, and audio in real time. This expansion enables new applications:
- Models can process documents containing text, images, charts, tables, and diagrams without requiring separate processing pipelines for each element type, making them practical for analyzing complex real-world documents.
- Users can ask questions about images, diagrams, photographs, or videos and receive detailed explanations that reference visual content, enabling applications from educational assistance to accessibility tools.
- Models can generate images from text descriptions, edit existing images based on natural language instructions, or even create video content, expanding creative possibilities beyond purely text-based interaction.
- Multimodal capabilities improve accessibility by enabling systems that can describe visual content for visually impaired users, transcribe and caption audio for hearing-impaired users, or translate between modalities in other ways.
The integration of multiple modalities represents a move toward more natural, human-like interaction with AI systems, matching how humans actually process information through multiple senses rather than purely through text.
Systems integration and agentic AI
Language models are increasingly deployed as components within larger systems rather than as standalone tools. Several approaches characterize this trend toward what’s sometimes called “agentic AI”:
- Retrieval-Augmented Generation (RAG) combines language models with external knowledge sources. Instead of relying solely on knowledge encoded during training, the model retrieves relevant information from databases, document collections, or the internet before generating responses. This approach reduces hallucinations by grounding responses in retrieved sources and allows models to access current information beyond their training cutoff dates.
For example, a company chatbot might search internal documentation using semantic search before using a language model to formulate an answer in natural language, ensuring responses reflect current company policies rather than potentially outdated patterns from training data. A minimal sketch of this retrieve-then-generate pattern appears after this list.
- Tool use and function calling enable models to interact with external systems. Models can decide when to call APIs, query databases, perform calculations using code execution environments, or interact with other software tools to accomplish tasks. This transforms models from pure text generators into interactive agents capable of taking actions beyond generating text (the second sketch after this list shows the basic dispatch loop).
- Agent frameworks orchestrate complex workflows where models plan sequences of actions, use tools, evaluate results, and adjust their approach based on feedback. Developer tools increasingly integrate language models directly into software development workflows — for example, AI assistants that can browse codebases, understand project structure, suggest changes, run tests, and help commit code.
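The following is a minimal sketch of the retrieve-then-generate pattern behind RAG. The embed function is a crude bag-of-words stand-in for a real embedding model, and the final prompt would be sent to a language model API rather than printed; both simplifications are assumptions for illustration.

```python
# Toy retrieval-augmented generation: pick the most relevant document, then
# build a prompt that grounds the model's answer in that document.
import numpy as np

documents = [
    "Refunds are processed within 14 days of receiving the returned item.",
    "Support is available on weekdays between 9:00 and 17:00.",
    "Shipping to EU countries takes 3 to 5 business days.",
]

vocab = sorted({word for doc in documents for word in doc.lower().split()})

def embed(text: str) -> np.ndarray:
    """Crude bag-of-words stand-in for a real embedding model."""
    vec = np.array([text.lower().split().count(word) for word in vocab], dtype=float)
    return vec / (np.linalg.norm(vec) + 1e-9)

question = "How long do refunds take?"
scores = [float(embed(question) @ embed(doc)) for doc in documents]
best_doc = documents[int(np.argmax(scores))]

# The retrieved passage is put into the prompt so the model answers from it,
# instead of relying only on what it memorized during training.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to a language model API
```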
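A second sketch shows the basic tool-use dispatch loop. The JSON call format and the get_weather function are invented for illustration and do not correspond to any specific provider's function-calling API; real systems let the model choose among declared tools and return structured calls in a similar spirit.

```python
# Toy tool-use loop: parse a structured tool call, run the tool, feed the result back.
import json

def get_weather(city: str) -> str:
    """A toy local tool; a real one would call an actual weather service."""
    return f"Sunny, 18 degrees Celsius in {city}"

TOOLS = {"get_weather": get_weather}

# Pretend the model decided a tool is needed and produced this structured call.
model_reply = '{"tool": "get_weather", "arguments": {"city": "Helsinki"}}'

call = json.loads(model_reply)
result = TOOLS[call["tool"]](**call["arguments"])

# The tool result is fed back to the model so it can write the final answer.
followup_prompt = f"Tool result: {result}\nNow answer the user's original question."
print(followup_prompt)
```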
This systems-level integration represents a conceptual shift from viewing language models as standalone text generators to seeing them as reasoning engines that can coordinate with other computational resources to accomplish complex tasks.