AICoP: Reasoning at Scale - From GPT to DeepSeek

March 28, 2025

Columbia University’s AI Community of Practice Unpacks DeepSeek and the Future of Reasoning Models

The AI Community of Practice at Columbia University, organized by the Emerging Technologies team, held its March 2025 session with a spotlight on reasoning models and their growing role in artificial intelligence.

The featured speaker, Maj, an AI analyst with the team, delivered a deep dive into DeepSeek R1, a publicly documented reasoning model that offers a unique window into how AI systems simulate logical thinking and decision-making. The session moved beyond tools and use cases, offering attendees a rare look at the theory and architecture behind modern reasoning models.

Rethinking Reasoning in AI

Maj opened the discussion by comparing traditional large language models such as GPT-4 with newer reasoning models such as DeepSeek R1. Where conventional models aim to predict the next word as quickly as possible, reasoning models slow the process down: they break prompts into subproblems, evaluate each component, and generate more deliberate answers.

Participants saw examples of how DeepSeek R1’s internal "thought window" mimics human-like reasoning, even pausing before delivering responses. This enables better performance in math, coding, and complex decision-making tasks, where deeper logical analysis is crucial.
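For intuition, DeepSeek R1's published output convention wraps the thought window in <think>…</think> tags before the visible answer. The following is a minimal, hypothetical helper (not DeepSeek's own code) showing how an application might separate the reasoning trace from the final response:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate the model's reasoning trace from its final answer.

    Assumes the R1-style convention of wrapping chain-of-thought
    in <think>...</think> tags ahead of the visible answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()          # no thought window found
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()  # everything after the closing tag
    return reasoning, answer

raw = "<think>17 * 3 = 51, so the total is 51.</think>The answer is 51."
thought, answer = split_reasoning(raw)
# thought -> "17 * 3 = 51, so the total is 51."
# answer  -> "The answer is 51."
```

Hiding the thought window from end users while logging it for inspection is one common way applications surface the "pause before answering" behavior described above.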

DeepSeek R1 and the Power of Reinforcement Learning

The core of the presentation focused on how DeepSeek R1 was developed:

  • R1-Zero was trained with pure reinforcement learning (no supervised fine-tuning), using accuracy and formatting checks as reward signals.
  • The model learned to follow a structured response format with “thinking” and “summary” sections.
  • Unique traits like “aha moments” emerged, where the model stopped to re-examine its own logic mid-process.

While R1-Zero showed promise, it suffered from poor readability and language mixing. DeepSeek responded with R1, a refined version that incorporated:
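The accuracy and formatting rewards described above are rule-based rather than learned. As a sketch only (DeepSeek has not published its exact reward implementation; the regular expressions and exact-match check below are illustrative assumptions), such rewards might look like:

```python
import re

def format_reward(response: str) -> float:
    """1.0 if the response follows the expected structure: a
    <think>...</think> reasoning block followed by a non-empty answer.
    Hypothetical rule; the real format check is not public."""
    pattern = r"^<think>.+?</think>\s*\S.*$"
    return 1.0 if re.match(pattern, response.strip(), flags=re.DOTALL) else 0.0

def accuracy_reward(response: str, reference: str) -> float:
    """1.0 if the final answer matches the reference exactly.
    Only workable for verifiable tasks, e.g. math problems with a
    known answer or code checked against unit tests."""
    answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()
    return 1.0 if answer == reference.strip() else 0.0

def total_reward(response: str, reference: str) -> float:
    # The reinforcement learning loop maximizes this combined signal.
    return accuracy_reward(response, reference) + format_reward(response)
```

Because both signals can be computed automatically, the model can be trained at scale on verifiable problems without human graders in the loop.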

  • Cold start data from few-shot prompting and human annotation
  • Improved response formatting and reward mechanisms for language consistency
  • Better generalization across both reasoning and non-reasoning tasks

Distillation, Community Resources, and Open Access

The presentation highlighted the advantages of model distillation. DeepSeek demonstrated that smaller models distilled from R1 outperformed many larger models trained from scratch. This opens new possibilities for deploying advanced reasoning capabilities in resource-limited environments.
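For background, the classic distillation objective (Hinton et al.) trains a small student model to match a large teacher's softened output distribution; DeepSeek's distilled models were reportedly produced by the related approach of fine-tuning smaller models on reasoning samples generated by R1. A minimal sketch of the classic loss, for intuition only:

```python
import math

def soft_targets(logits: list[float], temperature: float) -> list[float]:
    """Temperature-scaled softmax: a higher temperature exposes the
    teacher's full ranking over tokens, not just its top choice."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits: list[float],
                 student_logits: list[float],
                 temperature: float = 2.0) -> float:
    """KL divergence between teacher and student soft distributions,
    the core term minimized in classic knowledge distillation."""
    p = soft_targets(teacher_logits, temperature)
    q = soft_targets(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The loss is zero when the student exactly reproduces the teacher's distribution and grows as the two diverge, which is why a small student can inherit much of a large teacher's behavior without being trained from scratch.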

Attendees also learned about community efforts to replicate and expand on DeepSeek’s research. Hugging Face is building an open-source reproduction of R1 (the Open R1 project), providing transparency and access to the underlying architecture and training methods—something rarely seen from proprietary AI labs.

Challenges and Broader Reflections

The conversation wrapped up with reflections on pedagogy, training, and the parallels between teaching AI and educating humans. Questions were raised about how educational strategies might inform AI development and whether training methodologies could become more systematic and thoughtful.

The discussion also touched on ethical considerations, including response quality, model transparency, and responsible deployment.

Looking Ahead

As reasoning models grow more capable and accessible, Columbia’s AI Community of Practice plans to explore how they can support research, teaching, and digital scholarship.

Faculty, students, and staff interested in testing reasoning models or joining future sessions are encouraged to contact the Emerging Technologies team at [email protected].