AI: Community of Practice

Emerging Technologies' AI: Community of Practice (AICoP) is a multidisciplinary community of curious minds eager to explore artificial intelligence (AI) and machine learning (ML). The community is a platform for learning, discussion, and application of AI principles across various fields of study at Columbia University. Through regular meetings, workshops, and collaborative projects, we aim to demystify AI, spur innovation, and approach challenges with a fresh, AI-centric perspective, all while fostering a culture of inclusivity, respect, and collective growth.

We encourage Columbia researchers, faculty, and administrators interested in joining to submit the interest intake form.

Discussion Highlights

CUIT delivered a presentation on the fundamentals of the Attention mechanism, which is central to the functionality of large language models (LLMs) such as those built on GPT (Generative Pre-trained Transformer) architectures. The session provided a comprehensive look at the transformative role of Attention in LLMs, exploring both its groundbreaking applications and the challenges it presents in terms of computational demands and potential ethical concerns.

Key highlights:

1. Understanding the Attention Mechanism

  • LLMs are based on a transformer architecture, heavily relying on a mechanism known as Attention to process language data.
  • Attention helps the model determine which parts of the input data are relevant, which improves its ability to generate coherent and contextually appropriate responses. It is pivotal for the performance improvements seen in models like GPT.
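The bullets above can be made concrete with a short sketch. Below is a minimal NumPy implementation of scaled dot-product attention, the core operation in transformer models; the toy dimensions and random inputs are illustrative only and not drawn from any model discussed in the session:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weight each value vector by how relevant its key is to each query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                   # weighted sum of values

# Toy example: 3 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
output, attn = scaled_dot_product_attention(Q, K, V)
```

The softmax weights are what lets the model emphasize the relevant parts of the input: tokens with similar query and key vectors contribute more to the output.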

2. Practical Applications and Implications

  • Attention-based models power practical applications ranging from simple text generation to complex tasks involving multimodal inputs (integrating text, image, and video).
  • Challenges include the computational demands of these models and their reliance on large, diverse datasets, which may not always be of high quality or free from bias.
  • The conversation also touched on potential future advancements in AI and how attention-based models could revolutionize various fields by processing complex, multimodal data.

This month's AICoP was dedicated to demonstrating various approaches for creating custom bots using large language models (LLMs) and the practical application of these concepts. These include no-code options, open-source tools, and more technical approaches such as Retrieval-Augmented Generation (RAG) and fine-tuning on custom datasets.

Key highlights included:

  • Creating custom bots using ChatGPT Enterprise offers an accessible, no-code solution to tailor LLMs for specific needs.
  • Running open-source LLMs on proprietary hardware or utilizing cloud computing platforms provides various options based on technical expertise and resource availability.
  • RAG and fine-tuning offer paths to significantly enhance the performance and accuracy of LLMs using custom datasets.
  • Technical discussions centered around the importance of clean, well-prepared datasets for training and the potential for custom LLMs to be integrated into various applications and services.
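As one illustration of the RAG pattern covered above, here is a minimal, self-contained retrieval sketch: embed the documents, find the one closest to the query, and prepend it to the prompt. The bag-of-words embedding and the document texts are toy stand-ins; production RAG systems use learned embeddings and a vector store:

```python
from collections import Counter
import math

def embed(text):
    """Toy bag-of-words embedding; real RAG uses learned embedding models."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "CUIT supports ChatGPT Enterprise for Columbia users.",
    "The library offers study rooms on the fourth floor.",
]
question = "Who supports ChatGPT at Columbia?"
context = retrieve(question, docs, k=1)
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
```

The assembled `prompt` would then be sent to the LLM, grounding its answer in the retrieved document rather than in its training data alone.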

The meeting served as an informative and interactive session on the practicalities of utilizing LLMs across various levels of technical expertise, focusing on customization, data privacy, and the balance between model accuracy and resource investment. The session emphasized the rapidly evolving nature of LLM technology and tools, suggesting that more accessible and more robust solutions for fine-tuning and customization are likely on the horizon.

AICoP convened a special session dedicated to the launch of ChatGPT Enterprise for Columbia University. The agenda encompassed an in-depth presentation of ChatGPT Enterprise, highlighting its features, enhanced security and privacy, and the benefits of custom GPTs for the Columbia University community.

Columbia University ChatGPT Enterprise Launch:

  • CUIT has finalized an enterprise-wide license agreement for ChatGPT, marking a significant step forward in incorporating AI tools into the university's toolkit.
  • The process involved extensive reviews, including security, architecture, and legal considerations, to ensure data protection and compliance.

Security and Data Protection:

  • A 'walled garden' approach ensures high-level data protection, privacy, and encryption, along with GDPR and HIPAA compliance.
  • User data remains private and is not shared externally or used for model fine-tuning without permission.

Features and Benefits:

  • ChatGPT Enterprise offers advanced AI capabilities, including access to one of the most mature AI models available, with image creation, data analytics, and code interpretation features.
  • Enterprise customers can create internal-only GPTs for specific business needs, departments, or proprietary datasets, without coding.
  • The enterprise license ensures dedicated, reliable, and scalable access, distinguishing it from free or commercial versions.

Integration and Use Cases:

  • ChatGPT demonstrates the potential for integration into Columbia's systems, such as CourseWorks, to enhance educational tools and create personalized learning experiences.
  • CUIT demoed various use cases and initiated discussions on API usage for broader application and customization possibilities, including building and sharing bots for specific departmental or research needs.

Columbia Technology Ventures (CTV), the tech transfer arm of Columbia University, recently embarked on an exciting journey to explore the capabilities of ChatGPT Enterprise in enhancing various office functions and automation. This exploration was part of an informal gathering of individuals with a shared interest in understanding AI's technical and functional capabilities.

CTV's AI Exploration

  • Experiments and Projects: CTV shared insights from various projects, including automating mass email campaigns with a Python package, drafting legal language for licensing agreements on the fly to improve negotiation efficiency, and analyzing data tables to streamline internal business functions using ChatGPT (DAT GPT). One of the most significant findings was ChatGPT's proficiency in high-skill tasks such as drafting legal language and performing data analysis.

  • Automation Potential: The exploration revealed a considerable potential for automating repetitive tasks, which could benefit those unfamiliar with coding. However, exporting non-textual files remains a challenge, albeit one expected to improve as technology advances.

  • Learning and Future Applications: CTV is optimistic about further integrating ChatGPT into their workflows, emphasizing the need for continued experimentation and adaptation to leverage AI tools effectively. The session underscored the importance of setting up system prompts and scope prompts to guide ChatGPT's interactions, enhancing its efficiency and relevance to specific tasks.

  • Community Feedback and Interest: The exploration generated significant interest among participants, with discussions on how to set up effective prompts for ChatGPT and the potential for its application in various projects.
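In chat-style APIs, the system/scope prompt setup discussed above typically takes the shape of a role-tagged message list. The sketch below is a hypothetical illustration of that pattern; `build_messages` and the prompt texts are invented for this example and are not CTV's actual configuration:

```python
def build_messages(system_prompt, scope_prompt, user_question):
    """Assemble a chat-style message list with a system prompt and a scope
    constraint; hypothetical helper for illustration only."""
    return [
        # The system message sets the assistant's persona and its boundaries
        {"role": "system", "content": f"{system_prompt}\n\nScope: {scope_prompt}"},
        # The user message carries the actual task
        {"role": "user", "content": user_question},
    ]

messages = build_messages(
    system_prompt="You draft licensing language for a university tech-transfer office.",
    scope_prompt="Answer only questions about licensing agreements; decline anything else.",
    user_question="Draft a non-exclusive license clause for software.",
)
```

Keeping the persona and scope in the system message, separate from the user's question, is what makes the bot's behavior consistent across a whole conversation rather than dependent on each individual prompt.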