AICoP: OpenAI Codex: Data, Development, and Decision-Making
The AI Community of Practice welcomed OpenAI for a session exploring how its Codex platform is changing what it means to "build something" on a university campus. Fabio Mori and Keelan Schule, from OpenAI's education and solutions engineering teams respectively, walked attendees through Codex's evolution from a developer-only code completion tool to a full agentic platform that anyone—regardless of technical background—can use to turn ideas into working applications, automate workflows, and make sense of messy data.
Not Just for Developers Anymore
Codex has been part of OpenAI's internal toolkit since 2021, originally accessible only through the API and used heavily by OpenAI's own engineering teams to build products like Sora. But its journey to the current release tells a broader story about where AI-assisted work is headed.
The first generation of Codex handled code completion: suggesting the next line inside a function. The second generation introduced pair programming through chatbot sidebars in VS Code and Cursor, where developers could ask questions and get suggestions across files. The current third generation works on a fundamentally different model, agentic delegation. Users describe what they want built, and Codex plans, codes, tests, documents, and deploys it, running multi-step tasks autonomously, including overnight batch work handled by subagents.
As Keelan framed it, only about 16 percent of building a useful tool comes down to writing code. The rest is designing, testing, reviewing, refining, and deploying. Codex now participates in all of those stages, which is why the platform has found traction far beyond software engineering teams.
A Desktop App That Works on Your Behalf
The centerpiece of the session was the Codex standalone desktop app—a newer entry point designed specifically for users who don't live in development environments like VS Code or Cursor. Unlike the CLI or IDE extensions, the desktop app interacts directly with your local file system. It can create folders, manipulate files, run terminal commands, open applications, and take actions on your behalf across your desktop.
The app connects to a folder structure on your machine, and each project or folder becomes a separate workspace. Within each workspace, conversations function as independent agents that can be assigned distinct tasks. For complex projects, the recommended practice is to create separate conversation threads for different workstreams—front-end and back-end, for example—rather than switching models or goals mid-conversation.
Under the hood, Codex is powered by GPT-5.4, which incorporates the dedicated Codex model trained specifically to understand codebases and repositories. A faster model called Spark is also available for quick co-development and iteration. Both models operate within a harness that lets them interact with your machine: not just reading files, but running OCR, executing scripts, calling APIs, and generating outputs like PDFs and dashboards.
Plan Mode: Steering Before Building
One of the most significant features demonstrated was Plan Mode. When enabled, Codex doesn't jump straight into writing code. Instead, it analyzes your data and requirements, outlines a series of steps, surfaces assumptions, and proposes a test plan—all before executing anything. Users review the plan, adjust the model's direction, and only then approve execution.
This matters for anyone running long or complex tasks. If you're building a dashboard from a large dataset or modernizing a legacy application, Plan Mode ensures the model is aligned with your expectations before it invests time and compute. It functions similarly to OpenAI's Deep Research feature: the model asks clarifying questions, presents its reasoning, and waits for your go-ahead.
For users concerned about code quality, Codex follows industry best practices by default but can be steered with custom unit tests, style guides, and guardrails pulled from existing repositories. The model also includes built-in security awareness—flagging hardcoded passwords, warning against committing secrets to GitHub, and suggesting key rotation when credentials are exposed.
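To make the idea of flagging hardcoded credentials concrete, here is a minimal sketch of a pattern-based check. It is not how Codex detects secrets; the patterns, the `SECRET_PATTERNS` table, and the `find_secrets` function are hypothetical names chosen for illustration, and real scanners rely on much larger rule sets.

```python
import re

# Hypothetical, deliberately small rule set for illustration only.
SECRET_PATTERNS = {
    "hardcoded password": re.compile(
        r"""(?i)(password|passwd|pwd)\s*=\s*["'][^"']+["']"""
    ),
    "hardcoded API key": re.compile(
        r"""(?i)(api_?key|secret|token)\s*=\s*["'][^"']+["']"""
    ),
}


def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, finding) pairs for lines that assign a
    quoted literal to a credential-like variable name."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


sample = (
    'db_user = "admin"\n'
    'password = "hunter2"\n'               # literal secret: should be flagged
    'api_key = os.environ["API_KEY"]\n'    # read from environment: not flagged
)
findings = find_secrets(sample)
print(findings)
```

The third sample line shows the usual remedy: reading the credential from the environment instead of writing it into the source file.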
Live Demos: From Cluttered Desktops to Interactive Dashboards
Keelan ran several live demonstrations that illustrated Codex's range, starting with a deliberately non-technical use case.
Desktop Organization with OCR: Pointing Codex at a cluttered desktop covered in screenshots, Keelan asked it to organize the files using OCR to group them by content. In just over two minutes, Codex scanned 16 images, performed text extraction on files that had no text layer, categorized them into groups like administration, development, travel, and presentations, created an organized folder structure, and renamed every file. The operation was fully reversible: Codex maintains an audit log of every action and can revert changes on command.
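The reversible-organization idea can be sketched in a few lines. This is an illustration under stated assumptions, not Codex's implementation: the `organize` and `revert` functions, the JSON log format, and the file names are hypothetical, and the `categories` mapping stands in for whatever grouping the OCR step would produce.

```python
import json
import shutil
import tempfile
from pathlib import Path


def organize(folder: Path, categories: dict[str, str], log_path: Path) -> None:
    """Move each file into its category subfolder and record every move."""
    moves = []
    for name, category in categories.items():
        src = folder / name
        dest_dir = folder / category
        dest_dir.mkdir(exist_ok=True)
        dest = dest_dir / name
        shutil.move(str(src), str(dest))
        moves.append({"from": str(src), "to": str(dest)})
    # The audit log is what makes the operation reversible.
    log_path.write_text(json.dumps(moves, indent=2))


def revert(log_path: Path) -> None:
    """Undo the recorded moves in reverse order.
    Emptied category folders are left in place in this sketch."""
    moves = json.loads(log_path.read_text())
    for move in reversed(moves):
        shutil.move(move["to"], move["from"])


# Demo on a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    desktop = Path(tmp)
    (desktop / "shot1.png").write_text("placeholder")
    (desktop / "shot2.png").write_text("placeholder")
    log = desktop / "audit_log.json"

    organize(desktop, {"shot1.png": "travel", "shot2.png": "development"}, log)
    organized = (desktop / "travel" / "shot1.png").exists()

    revert(log)
    restored = (desktop / "shot1.png").exists() and (desktop / "shot2.png").exists()

print(organized, restored)
```

Replaying the log in reverse order matters once moves depend on each other, for example when a file is renamed and then moved.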
Campus Analytics Dashboard: Using a synthetic CSV dataset of roughly 1,000 rows covering course enrollment and grading data across multiple campuses, Codex generated a fully interactive dashboard complete with dropdown filters, charts, and visualizations. It selected appropriate Python libraries, built the frontend, and even produced a companion talk track explaining how to use and share the dashboard. The build took approximately four minutes—a task Keelan estimated would have taken days using traditional tools like Tableau.
Flight Dashboard from Calendar Data: In a pre-built example using plugin connectors, Codex pulled flight information from calendar and email data and autonomously constructed an interactive map with plotted flight paths, city locations, flight numbers, and mileage estimates. Notably, Codex inferred the need for geographic visualization without being explicitly instructed—it understood that flight data implies mapping.
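Mileage estimates of this kind are commonly derived from the great-circle distance between two airports. The sketch below uses the haversine formula; whether Codex used this exact method in the demo was not stated, and the airport coordinates are approximate values supplied here for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in statute miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate coordinates for New York (JFK) and San Francisco (SFO).
jfk = (40.6413, -73.7781)
sfo = (37.6213, -122.3790)
distance = great_circle_miles(*jfk, *sfo)
print(round(distance))
```

Actual flown distance is usually somewhat longer than the great-circle figure because of routing and air traffic constraints, which is why the result is best treated as an estimate.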
Code Repository One-Pager: Codex analyzed a Discord chatbot codebase and generated a human-readable PDF summarizing the application's purpose, intended audience, architecture, file structure, and setup instructions. It used a pre-built PDF skill for consistent formatting and could open the resulting file directly on the user's desktop.
Plugins, Skills, and Institutional Boundaries
Codex supports a growing plugin ecosystem—GitHub, Gmail, Google Calendar, Vercel, and Slack among them—that allows the platform to read from and push to external services. Skills, which are exportable workflow templates, let users codify repeatable multi-step processes. Keelan described building a skill that automatically versions his code, pushes it to GitHub, and triggers a Vercel redeployment whenever he makes significant changes.
However, an important institutional note was raised during the session: Columbia currently has third-party plugins disabled on both Codex and ChatGPT due to privacy and security policies. Users who want to work with external data can still import it manually—exporting a calendar as a PDF or downloading emails as Excel files, for example—and then point Codex at those local files.
Threads, Agents, and How to Think About Concurrency
A significant portion of the Q&A focused on how Codex manages multiple tasks. Each conversation thread operates as its own agent, and complex projects benefit from spreading distinct workstreams across separate threads rather than overloading a single one. Subagents can run concurrently within threads, and if two agents modify overlapping code, Codex can merge their changes automatically.
Context management is handled behind the scenes through automatic thread compaction—when a conversation's context window grows too large, Codex summarizes and compresses earlier content so the thread can continue without the user needing to start fresh. Threads share context through the underlying repository and file system rather than through each other's conversation histories, so keeping files and documentation updated is the most effective way to pass information between parallel workstreams.
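The compaction idea can be illustrated with a toy version: when a thread exceeds a budget, older messages are collapsed into one summary entry and the most recent ones are kept verbatim. This is a sketch under simplifying assumptions, not Codex's mechanism. The `compact` and `estimate_tokens` functions are hypothetical, token counts are approximated by word counts, and the summary is a placeholder string where a real system would generate one with a model.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: count whitespace-separated words."""
    return len(text.split())


def compact(messages: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Collapse older messages into one summary entry when over budget."""
    total = sum(estimate_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_recent:
        return messages  # nothing to compact

    split = len(messages) - keep_recent
    older, recent = messages[:split], messages[split:]
    # Placeholder: a real system would summarize `older` with a model.
    summary = f"[summary of {len(older)} earlier messages]"
    return [summary] + recent


thread = [
    "load the enrollment csv",
    "build a bar chart",
    "add campus filter",
    "fix axis labels",
]
compacted = compact(thread, budget=6)
untouched = compact(thread, budget=100)
print(compacted)
```

Because the summary loses detail, anything parallel threads need to rely on is safer written to a file in the repository than left in conversation history.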
On reasoning effort, Keelan recommended starting at the default medium level. Low is sufficient for single-step, explicit tasks. High or extra-high should be reserved for multi-step operations involving tool calls, web searches, or concurrent subagents. Extra-high reasoning is rarely beneficial outside of orchestrating five or six concurrent tasks with very specific instructions.
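One way to read this guidance is as a small decision rule. The function below is a hypothetical encoding of the recommendations as summarized here, not an OpenAI API or an official policy; the `suggest_effort` name, its parameters, and the threshold of five concurrent tasks are assumptions drawn from the paragraph above.

```python
def suggest_effort(steps: int, uses_tools: bool = False, concurrent_tasks: int = 1) -> str:
    """Map a rough task description to a reasoning-effort level."""
    # Extra-high: only when orchestrating roughly five or more concurrent tasks.
    if concurrent_tasks >= 5:
        return "extra-high"
    # High: multi-step work that involves tool calls or concurrent subagents.
    if steps > 1 and (uses_tools or concurrent_tasks > 1):
        return "high"
    # Low: a single explicit step with no tools.
    if steps == 1 and not uses_tools:
        return "low"
    # Everything else starts at the default.
    return "medium"


print(suggest_effort(steps=1))                          # low
print(suggest_effort(steps=3))                          # medium
print(suggest_effort(steps=4, uses_tools=True))         # high
print(suggest_effort(steps=6, concurrent_tasks=5))      # extra-high
```

In practice the level would be picked by hand in the app; the point of the sketch is that medium is the fallback and the higher levels need a specific reason.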
Getting Started at Columbia
Columbia community members can access Codex by signing in to the standalone desktop app with their ChatGPT Education account. Education accounts currently have enhanced Codex limits through the end of May. Students can also claim $100 in Codex API credits using their student email, and advanced users are encouraged to apply for OpenAI's Codex Ambassador program, which includes access to hackathons and closer collaboration with OpenAI's product team.
OpenAI will share Codex documentation, cookbooks, and resource links in a follow-up communication. For individual workflow consultations and questions about how Codex might fit into your work, contact [email protected].
Takeaway
Codex signals a meaningful shift in how AI tools can be used across a university. It's no longer just about writing code faster—it's about enabling anyone with an idea to plan, build, test, and deploy a working tool without needing to understand the technology underneath. From organizing a messy desktop to generating an interactive analytics dashboard from a spreadsheet, the demonstrations made clear that the barrier between "I wish I had a tool for this" and "here's the tool" has gotten remarkably thin. As with all AI tools in the Columbia ecosystem, users should remain mindful of institutional policies around third-party integrations and sensitive data, but the creative surface area that Codex opens up is substantial.
For questions, contact: [email protected]
For training and consultation inquiries, contact: [email protected]