Generative AI: RAG, AI Agents & Deployment

[Useful links](#Useful links)

[Types and Application of Gen AI](#Types and Application of Gen AI)

Marketing

[AI Hierarchy](#AI Hierarchy)

Tokenization

[Prompt Engineering](#Prompt Engineering)

LLM (Large Language Model)

Transformer

RAG (Retrieval Augmented Generation)

MCP (Model Context Protocol)

[AI Agent](#AI Agent)

[LangChain&LangGraph, LlamaIndex, CrewAI, AutoGen, PydanticAI](#LangChain&LangGraph, LlamaIndex, CrewAI, AutoGen, PydanticAI)

Chatbot

Deployment

Useful links

|---------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Book | 《AI Agents in Action》- Oreilly |
| Lastest News & Resources | * MIT AI News: Artificial intelligence | MIT News | Massachusetts Institute of Technology * AIGC Weekly：AIGC Weekly |
| API (GPT, Gemini, etc.) | GPT (Generative Pre-trained Transformer): https://developers.openai.com/api/docs Gemini: https://ai.google.dev/api |

Types and Application of Gen AI

Types: Text, Image, Video, Audio, 3D, Code, Task, Music, etc.

Application:

Text-to-Text
Text-to-Image
Text-to-Video
Text-to-Audio
Text-to-3D
Text-to-Code
Text-to-Task
Image-to-Text
Image-to-Image
Video-to-Text
Audio-to-Text
Image-to-Video
Text-to-Music
Music-to-Text
Audio-to-Audio

Image Generation:

MidJourney
Stable Diffusion
DALL-E 3

Marketing

AI Hierarchy

AIGC: Artificial Intelligence Generative Content

The emergence of AIGC is due to the breakthrough in parameter magnitude of large language models(LLM).

Tokenization

Breaks text into smaller units (words, subwords, characters) and maps them to numeric IDs.

Prompt Engineering

Methods of Prompt Engineering:

Domain-specific Knowledge
Effective Keywords
Role Prompting, Shot Prompting, Chain of Thought Prompting

Chain-of-Thoughts (CoT)
Chain-of-Thought-Self-Consistency
Tree-of-Thoughts (ToT)
Graph-of-Thoughts (GoT)
Algorithm-of-Thoughts
Skeleton-of-Thought
Program-of-Thoughts

LLM (Large Language Model)

Popular Proprietary LLMs

OpenAI - GPT
Google - Gemini
Anthropic - Claude

Open-Source LLMs

Meta - Llama
DeepSeek
OpenAI - GPT-oss
Google - Gemma

Applications

Chatbots
Doc QA
Coding
Agents

Transformer

My blog written in 2020 for AI music composition is built by LSTM:

https://blog.csdn.net/Beth_Chan/article/details/111351195

but after new GenAi era, the AI generation is not LSTM, is replaced by Transformation .

The new Generative LLMs like GPT, Gemini are built using transformers.

分析 LSTM 和 Transformer 在 AI 音乐生成领域的代际变化：

一、LSTM在AI音乐生成中的特点

LSTM作为循环神经网络（RNN）的变体，曾经是AI音乐生成的主流模型，它的优势在于：

序列建模能力：擅长捕捉音乐中的时序依赖关系，比如旋律的走向、和弦的衔接逻辑
低资源适配性：在2020年算力资源相对有限的时期，LSTM的训练成本更低，更容易在中小规模数据集上实现可用的音乐生成效果
专注局部特征：对短片段的音乐风格模仿能力较强，适合生成段落级别的旋律

二、Transformer架构带来的代际提升

以GPT、Gemini为代表的大语言模型采用的Transformer架构，为AI音乐生成带来了质的飞跃：

全局注意力机制：通过自注意力机制可以同时捕捉音乐中的长距离依赖，比如整首曲子的主题呼应、结构对称性
多模态融合能力：不仅能处理音符序列，还能结合歌词、情感标签、乐器音色等多维度信息生成更丰富的音乐
通用模型适配性：基于大语言模型的Transformer可以直接处理文本描述，实现"文字转音乐"的跨模态生成
风格迁移能力：更擅长学习不同音乐流派的全局风格特征，生成的音乐完整性和艺术性更强

三、技术迭代背后的核心逻辑

算力驱动：Transformer的训练需要海量算力，这是2020年尚不具备的基础条件
数据爆发：音乐版权数据的开放和数字化音乐库的扩张，为大模型训练提供了充足的素材
需求升级：从单纯的"生成旋律"升级为"创作符合特定场景、情感、风格的完整音乐作品"

Encoder & Decoder

Why do transformers use positional encoding?

To provide word order information, since transformers look at all tokens simultaneously.

RAG (Retrieval Augmented Generation)

MCP (Model Context Protocol)

AI Agent

Build AI Agents with LLMs

What are AI agents?

What exactly are AI agents, and why should you want to learn about them in the first place? AI agents are tools designed to allow users to interact with LLMs to achieve a more productive or creative workflow as seamlessly as possible. Before AI agents, users would be forced to build their own statistical language models---a time-consuming, technical, and expensive endeavor! Now, with AI agents, users who want to interact with AI simply get to log in to an interface and conduct business ranging from asking questions of their documents to getting help with their homework.
什么是人工智能代理，你为什么要首先了解它们？AI 代理是一种工具，旨在允许用户与 LLM 进行交互，以尽可能无缝地实现更高效或更具创造性的工作流程。在 AI 代理出现之前，用户将被迫构建自己的统计语言模型------这是一项耗时、技术且昂贵的工作！现在，有了人工智能代理，想要与人工智能互动的用户只需登录一个界面，就可以开展业务，从询问他们的文件问题到获得家庭作业的帮助。

At a more granular level, you might think of AI agents as UI "wrappers" around the models that power them. That is to say, AI agents are often user-friendly "frontends" that make using the models that fuel them easier, often by focusing and limiting just how users interact with the model. Take ChatGPT, for instance. The models fueling ChatGPT (GPT-3.5 Turbo or GPT-4) are massively complex, powerful, and difficult to use and operate on their own. As an AI agent, ChatGPT abstracts away these models' technical features and allows users to interact with them simply via text.
在更精细的级别上，你可以将 AI 代理视为支持它们的模型的 UI"包装器"。也就是说，人工智能代理通常是用户友好的"前端"，通常通过关注和限制用户与模型的交互方式，使使用驱动它们的模型变得更加容易。以ChatGPT为例。为 ChatGPT（GPT-3.5 Turbo 或 GPT-4）提供动力的模型非常复杂、功能强大且难以单独使用和操作。作为人工智能代理，ChatGPT 抽象出这些模型的技术特征，允许用户简单地通过文本与它们进行交互。

Use cases

Structure

Design Patterns

Copilot Pattern

Research Pattern