
## TL;DR
The core value of the GitHub Copilot SDK is not the convenience of "calling an LLM" (that's already been solved by the OpenAI SDK, LangChain, etc.), but rather providing a production-proven Agent runtime.
The problems it actually solves are:
- Orchestration complexity: Planner, tool routing, and state management are built-in
- Stability: Reliability proven daily by the millions of developers who exercise the same runtime
- Evolvability: New models and tool capabilities are automatically updated by the CLI
When you start building your next AI application, ask yourself two questions:
- Where is my core value? If it's in business logic and tool definitions, use the SDK; if it's in low-level orchestration innovation, build your own framework.
- How fast do you need to reach production? The SDK lets you skip 80% of the infrastructure work and focus on the last 20% of differentiated capability.
The barrier to Agent development has dropped, but the real challenge is: defining valuable tools, designing smooth interactions, and solving real problems. Technology is no longer the bottleneck --- imagination is.
## Introduction: Why Agent Development Is No Longer Just for Experts
In January 2026, GitHub released the Copilot SDK, marking a pivotal shift in AI Agent development from "expert territory" to "mainstream tooling."
Before this, building an AI Agent capable of autonomous planning, tool invocation, and file editing required you to:
- Choose and integrate an LLM service (OpenAI, Anthropic, Azure...)
- Build your own Agent orchestrator (planner, tool routing, state management)
- Handle streaming output, error retries, and context management
- Implement tool definition standards (function calling schema)
This process was complex and fragile. Open-source frameworks (LangChain, AutoGPT) lowered the barrier, but still required deep understanding of Agent runtime mechanics. The real turning point: GitHub opened up the production-grade Agent runtime from Copilot CLI as an SDK.
What does this mean? You can launch a complete Agent runtime with just a handful of lines of code:
```python
import asyncio
from copilot import CopilotClient

async def main():
    client = CopilotClient()
    await client.start()
    session = await client.create_session({"model": "gpt-4.1"})
    response = await session.send_and_wait({"prompt": "Explain quantum entanglement"})
    print(response.data.content)

asyncio.run(main())
```
No need to worry about model integration, prompt engineering, or response parsing --- all of this has been battle-tested by Copilot CLI across millions of developers. You only need to define business logic; the SDK handles everything else.
Goal of this article: Through a complete weather assistant example, help you understand:
- How the SDK communicates with the CLI (the architectural essence)
- How the tool invocation mechanism works (how the LLM "decides" to call your code)
- The key leap from toy to tool (streaming responses, event listening, state management)
Whether you want to quickly validate an AI application idea or build a customized Agent for your enterprise, this article is the starting point.
## Prerequisites: Setting Up Your Environment
Before writing any code, make sure your development environment meets the following requirements.
### Prerequisites Checklist
#### 1. Install the GitHub Copilot CLI
The SDK itself does not contain AI inference capabilities --- it communicates with the Copilot CLI via JSON-RPC. The CLI is the real "engine"; the SDK is the "steering wheel."
```bash
# macOS/Linux
brew install copilot-cli

# Verify installation
copilot --version
```
#### 2. Authenticate Your GitHub Account
```bash
copilot login
```
You need a GitHub Copilot subscription (individual or enterprise). If using BYOK (Bring Your Own Key) mode, you can skip this step.
### Verify the Environment
Run the following command to confirm the CLI is working:
```bash
copilot -p "Explain recursion in one sentence"
```
If you see an AI response, the environment is ready.
## Step 1: Send Your First Message
### Install the SDK
Create a project directory and install the Python SDK:
```bash
mkdir copilot-demo && cd copilot-demo

# Work inside a virtual environment
python -m venv venv && source venv/bin/activate

pip install github-copilot-sdk
```
### Minimal Code Example
Create main.py:
```python
import asyncio
from copilot import CopilotClient

async def main():
    client = CopilotClient()
    await client.start()
    session = await client.create_session({"model": "gpt-4.1"})
    response = await session.send_and_wait({"prompt": "What is quantum entanglement?"})
    print(response.data.content)
    await client.stop()

asyncio.run(main())
```
Run it:
```bash
python main.py
```
You'll see the AI's complete response. In about a dozen lines of code, you have a complete AI conversation.
### Execution Flow Breakdown
What happens behind this code?
1. `client.start()` → the SDK launches the Copilot CLI process (runs in the background)
2. `create_session()` → requests the CLI to create a session via JSON-RPC
3. `send_and_wait()` → sends the prompt; the CLI forwards it to the LLM
4. LLM inference → the response is returned to the SDK through the CLI
5. `response.data` → the SDK parses the JSON response and extracts the content
### The Architectural Essence: The SDK Is the CLI's "Remote Control"
GitHub's design philosophy is separation of concerns:
| Component | Responsibility |
|---|---|
| Copilot CLI | Agent runtime (planning, tool invocation, LLM communication) |
| SDK | Process management, JSON-RPC wrapper, event listening |
| Your code | Business logic and tool definitions |
Advantages of this architecture:
- Independent CLI upgrades: New models and tool capabilities don't require SDK changes
- Low multi-language support cost: Each language SDK only needs to implement a JSON-RPC client
- Debug-friendly: The CLI can run independently, making it easy to observe logs and troubleshoot
### JSON-RPC Communication Example
When you call send_and_wait(), the actual request the SDK sends:
```json
{
  "jsonrpc": "2.0",
  "method": "session.send",
  "params": {
    "sessionId": "abc123",
    "prompt": "What is quantum entanglement?"
  },
  "id": 1
}
```
CLI response:
```json
{
  "jsonrpc": "2.0",
  "result": {
    "data": {
      "content": "Quantum entanglement refers to a phenomenon where two or more quantum systems..."
    }
  },
  "id": 1
}
```
Understanding this is important: The SDK is not "calling an LLM" --- it's "calling the CLI." The CLI has already encapsulated all the complexity.
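The request/response pair above can be reproduced with nothing but the standard library. This is an illustrative sketch of the JSON-RPC framing only, not the SDK's internal code:

```python
import json

# Build a JSON-RPC 2.0 request like the one the SDK sends (illustrative only)
def make_request(method: str, params: dict, request_id: int) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })

# Parse a CLI response and match it to the pending request by id
def parse_response(raw: str, expected_id: int) -> dict:
    msg = json.loads(raw)
    assert msg["jsonrpc"] == "2.0" and msg["id"] == expected_id
    return msg["result"]

req = make_request("session.send", {"sessionId": "abc123", "prompt": "Hi"}, 1)
raw = '{"jsonrpc": "2.0", "result": {"data": {"content": "Hello!"}}, "id": 1}'
result = parse_response(raw, expected_id=1)
print(result["data"]["content"])  # → Hello!
```

The `id` field is what lets the SDK run many requests concurrently over one connection: each response is matched back to its request by id, regardless of arrival order.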
## Step 2: Real-Time AI Responses --- Streaming Output
### Why Streaming Responses Are Needed
When using send_and_wait(), you must wait for the LLM to generate a complete response before seeing any output. For long-form generation (such as code explanations or documentation), users might stare at a blank screen for 10--30 seconds.
Streaming responses let the AI output text word by word, like a typewriter --- improving user experience while also allowing you to catch early signs that the model is going off track.
### Event Listening Mechanism
Modify main.py to enable streaming output:
```python
import asyncio
import sys
from copilot import CopilotClient
from copilot.generated.session_events import SessionEventType

async def main():
    client = CopilotClient()
    await client.start()
    session = await client.create_session({
        "model": "gpt-4.1",
        "streaming": True,  # Enable streaming mode
    })

    # Listen for response deltas
    def handle_event(event):
        if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
            sys.stdout.write(event.data.delta_content)
            sys.stdout.flush()
        if event.type == SessionEventType.SESSION_IDLE:
            print()  # Newline when complete

    session.on(handle_event)
    await session.send_and_wait({"prompt": "Write a code example of quicksort"})
    await client.stop()

asyncio.run(main())
```
After running it, you'll see results gradually "stream in" rather than appearing all at once.
### The Design Philosophy of the Event-Driven Model
The SDK uses the Observer pattern to handle the asynchronous event stream from the CLI:
```text
CLI generates events → SDK parses → dispatches to listeners → your handle_event() executes
```
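A stripped-down version of that dispatch layer looks like this. This is a hypothetical sketch of the Observer pattern for intuition; the SDK's actual internals may differ:

```python
from types import SimpleNamespace

# Minimal observer-pattern dispatcher: listeners register with on(),
# and every parsed CLI event is fanned out to all of them.
class SessionEvents:
    def __init__(self):
        self._listeners = []

    def on(self, listener):
        self._listeners.append(listener)

    def _dispatch(self, event):
        for listener in self._listeners:
            listener(event)

events = SessionEvents()
seen = []
events.on(lambda e: seen.append(e.type))

# Simulate two events arriving from the CLI
events._dispatch(SimpleNamespace(type="ASSISTANT_MESSAGE_DELTA", data="Hel"))
events._dispatch(SimpleNamespace(type="SESSION_IDLE", data=None))
print(seen)  # → ['ASSISTANT_MESSAGE_DELTA', 'SESSION_IDLE']
```

Because listeners are just callables, you can attach several at once: one for display, one for logging, one for metrics, without any of them knowing about the others.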
Main event types:
| Event | Triggered When | Typical Use |
|---|---|---|
| `ASSISTANT_MESSAGE_DELTA` | AI generates partial content | Real-time display |
| `ASSISTANT_MESSAGE` | AI completes a full message | Get final content |
| `SESSION_IDLE` | Session enters idle state | Mark task complete |
| `TOOL_CALL` | AI decides to invoke a tool | Logging, auth check |
### Code Comparison: Synchronous vs. Streaming
Synchronous mode --- suitable for short responses:
```python
response = await session.send_and_wait({"prompt": "1+1=?"})
print(response.data.content)  # Wait and print all at once
```
Streaming mode --- suitable for long-form content:
```python
session.on(lambda event:
    print(event.data.delta_content, end="")
    if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA
    else None
)
await session.send_and_wait({"prompt": "Write an article"})
```
### Technical Details Under the Hood
Streaming responses are based on Server-Sent Events (SSE) or WebSocket:
- The CLI receives a token stream from the LLM
- For each token received, the CLI sends a `message_delta` event to the SDK
- The SDK triggers your event listener
- The user immediately sees new content
This design lets your application perceive the AI's "thinking process", not just the final result.
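The delta mechanics can be made concrete with a self-contained simulation. The token list below is a mock stand-in for a real LLM stream: each token becomes one delta, the listener displays it immediately, and the full message is accumulated on the side, just as a UI would.

```python
import sys

# Mock token stream standing in for the LLM output the CLI relays
tokens = ["Stream", "ing ", "looks ", "instant."]

full_message = []

def on_delta(delta: str):
    sys.stdout.write(delta)     # show each fragment as it "arrives"
    sys.stdout.flush()
    full_message.append(delta)  # also accumulate the final message

for tok in tokens:
    on_delta(tok)

print()
print("final:", "".join(full_message))  # → final: Streaming looks instant.
```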
## Step 3: Giving the AI Capabilities --- Custom Tools
### The Essence of Tools: Letting the LLM Call Your Code
Up to now, the AI can only "talk" --- it cannot interact with the outside world. Tools are the core capability of an Agent: you define functions, and the AI decides when to call them.
For example:
- User: "What's the weather in Beijing today?"
- AI thinks: I need weather data → calls `get_weather("Beijing")`
- Your code: returns `{"temperature": "15°C", "condition": "sunny"}`
- AI synthesizes: "Beijing is sunny today, 15°C."
Key point: The AI autonomously decides whether to call a tool and what parameters to pass.
### Three Elements of a Tool Definition
A tool contains:
- Description: Tells the AI what this tool does
- Parameter schema: Defines the structure of input parameters (using Pydantic)
- Handler: The Python function that actually executes
### Complete Weather Assistant Example
Create weather_assistant.py:
```python
import asyncio
import random
import sys
from copilot import CopilotClient
from copilot.tools import define_tool
from copilot.generated.session_events import SessionEventType
from pydantic import BaseModel, Field

# 1. Define the parameter schema
class GetWeatherParams(BaseModel):
    city: str = Field(description="City name, e.g., Beijing, Shanghai")

# 2. Define the tool (description + handler)
@define_tool(description="Get current weather for a specified city")
async def get_weather(params: GetWeatherParams) -> dict:
    city = params.city
    # In production, call a real weather API here.
    # Mock data for demonstration:
    conditions = ["sunny", "cloudy", "rainy", "overcast"]
    temp = random.randint(10, 30)
    condition = random.choice(conditions)
    return {
        "city": city,
        "temperature": f"{temp}°C",
        "condition": condition
    }

async def main():
    client = CopilotClient()
    await client.start()

    # 3. Pass tools to the session
    session = await client.create_session({
        "model": "gpt-4.1",
        "streaming": True,
        "tools": [get_weather],  # Register the tool
    })

    # Listen for streaming responses
    def handle_event(event):
        if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
            sys.stdout.write(event.data.delta_content)
            sys.stdout.flush()
        if event.type == SessionEventType.SESSION_IDLE:
            print("\n")

    session.on(handle_event)

    # Send a prompt that requires tool calls
    await session.send_and_wait({
        "prompt": "What's the weather like in Beijing and Shanghai? Compare them."
    })
    await client.stop()

asyncio.run(main())
```
Run:
```bash
python weather_assistant.py
```
### Execution Flow Explained
When you ask "What's the weather in Beijing and Shanghai":
1. AI analyzes the question → weather data is needed
2. AI checks available tools → finds the `get_weather` function
3. AI decides to call → `get_weather(city="Beijing")`
4. SDK triggers the handler → your function returns `{"temperature": "22°C", ...}`
5. AI receives the result → calls `get_weather(city="Shanghai")` again
6. AI synthesizes the answer → "Beijing is sunny at 22°C; Shanghai is overcast at 18°C..."
The AI will automatically call the tool multiple times (once for Beijing, once for Shanghai) --- you don't need to write any loop logic.
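The loop the runtime runs on your behalf can be sketched as follows. Here `fake_llm` is a mock stand-in for the model's decisions and `TOOLS` for the tool registry; the point is the shape of the loop, not the real CLI protocol:

```python
# Sketch of the agent loop: the "LLM" (mocked) either requests a tool call
# or emits a final answer; the runtime executes tools and feeds results
# back into the history until the model is done.
def get_weather(city: str) -> dict:
    return {"city": city, "temperature": "20°C"}

TOOLS = {"get_weather": get_weather}

def fake_llm(history):
    # Request one tool call per city, then answer. A real LLM decides this itself.
    called = [m["args"]["city"] for m in history if m["role"] == "tool"]
    for city in ["Beijing", "Shanghai"]:
        if city not in called:
            return {"type": "tool_call", "name": "get_weather", "args": {"city": city}}
    return {"type": "answer", "content": "Both cities report 20°C."}

history = [{"role": "user", "content": "Weather in Beijing and Shanghai?"}]
while True:
    step = fake_llm(history)
    if step["type"] == "tool_call":
        result = TOOLS[step["name"]](**step["args"])   # run your handler
        history.append({"role": "tool", "args": step["args"], "result": result})
    else:
        history.append({"role": "assistant", "content": step["content"]})
        break

print(history[-1]["content"])  # → Both cities report 20°C.
```

The SDK runs exactly this kind of loop for you, which is why a single prompt can trigger two tool calls without any loop logic in your code.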
### Why the Parameter Schema Matters
Why define parameters with Pydantic?
```python
class GetWeatherParams(BaseModel):
    city: str = Field(description="City name")
    unit: str = Field(default="celsius", description="Temperature unit: celsius or fahrenheit")
```
The SDK converts this schema to JSON Schema and passes it to the LLM:
```json
{
  "type": "object",
  "properties": {
    "city": {"type": "string", "description": "City name"},
    "unit": {"type": "string", "description": "Temperature unit"}
  },
  "required": ["city"]
}
```
The LLM extracts parameters based on this schema. Therefore, the clearer the description, the more accurately the AI will invoke the tool.
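The conversion itself is mechanical: fields without defaults become `required`. Pydantic does this for real (via its JSON Schema export); the standard-library sketch below only mimics the idea, so treat the helper names as hypothetical:

```python
import dataclasses
from typing import get_type_hints

# Hand-rolled sketch of the "model → JSON Schema" step the SDK relies on.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

@dataclasses.dataclass
class GetWeatherParams:
    city: str              # no default → required
    unit: str = "celsius"  # has a default → optional

def to_json_schema(cls) -> dict:
    hints = get_type_hints(cls)
    props, required = {}, []
    for field in dataclasses.fields(cls):
        props[field.name] = {"type": PY_TO_JSON[hints[field.name]]}
        if field.default is dataclasses.MISSING:
            required.append(field.name)
    return {"type": "object", "properties": props, "required": required}

print(to_json_schema(GetWeatherParams))
# → {'type': 'object', 'properties': {'city': {'type': 'string'},
#    'unit': {'type': 'string'}}, 'required': ['city']}
```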
## Step 4: Building an Interactive Assistant
Now let's combine all the capabilities: streaming output + tool invocation + command-line interaction.
### Complete Runnable Code
Create interactive_assistant.py:
```python
import asyncio
import random
import sys
from copilot import CopilotClient
from copilot.tools import define_tool
from copilot.generated.session_events import SessionEventType
from pydantic import BaseModel, Field

# Define tools
class GetWeatherParams(BaseModel):
    city: str = Field(description="City name, e.g., Beijing, Shanghai, Guangzhou")

@define_tool(description="Get current weather for a specified city")
async def get_weather(params: GetWeatherParams) -> dict:
    city = params.city
    conditions = ["sunny", "cloudy", "rainy", "overcast", "hazy"]
    temp = random.randint(5, 35)
    condition = random.choice(conditions)
    humidity = random.randint(30, 90)
    return {
        "city": city,
        "temperature": f"{temp}°C",
        "condition": condition,
        "humidity": f"{humidity}%"
    }

async def main():
    client = CopilotClient()
    await client.start()
    session = await client.create_session({
        "model": "gpt-4.1",
        "streaming": True,
        "tools": [get_weather],
    })

    # Event listeners
    def handle_event(event):
        if event.type == SessionEventType.ASSISTANT_MESSAGE_DELTA:
            sys.stdout.write(event.data.delta_content)
            sys.stdout.flush()
        if event.type == SessionEventType.SESSION_IDLE:
            print()  # Newline when complete

    session.on(handle_event)

    # Interactive conversation loop
    print("🌤️ Weather Assistant (type 'exit' to quit)")
    print("Try: 'What's the weather in Beijing?' or 'Compare weather in Guangzhou and Shenzhen'\n")

    while True:
        try:
            user_input = input("You: ")
        except EOFError:
            break
        if user_input.lower() in ["exit", "quit"]:
            break
        if not user_input.strip():
            continue
        sys.stdout.write("Assistant: ")
        await session.send_and_wait({"prompt": user_input})
        print()  # Extra newline

    await client.stop()
    print("Goodbye!")

asyncio.run(main())
```
### Sample Output
```bash
python interactive_assistant.py
```
Example conversation:
```text
🌤️ Weather Assistant (type 'exit' to quit)
Try: 'What's the weather in Beijing?' or 'Compare weather in Guangzhou and Shenzhen'

You: Compare weather in Guangzhou and Shenzhen
Assistant: Guangzhou: 21°C, sunny, 84% humidity.
Shenzhen: 33°C, hazy, 77% humidity.
Shenzhen is significantly warmer and hazier, while Guangzhou is cooler and sunnier with slightly higher humidity.

You: What's the weather in Shenzhen
Assistant: The weather in Shenzhen is 8°C, overcast, with 47% humidity.

You: quit
Goodbye!
```
### Key Design Considerations
#### 1. Session Persistence
Notice that we create the session only once, and reuse it throughout the entire conversation loop. This means:
- The AI remembers previous conversation content
- Follow-up questions like "What about tomorrow?" work (the AI knows which city you mean)
- Tool call history is also retained
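A toy model makes the point: one session means one growing history, so a follow-up turn reaches the model together with everything said before. `ToySession` below is purely illustrative, not the SDK's session type:

```python
# Toy model of session persistence: each turn is appended to the same
# history, so the model "sees" earlier turns when answering a follow-up.
class ToySession:
    def __init__(self):
        self.history = []

    def send(self, prompt: str) -> list:
        self.history.append({"role": "user", "content": prompt})
        # A real session would forward the full history to the model here,
        # which is how "What about tomorrow?" inherits the earlier city.
        return self.history

session = ToySession()
session.send("What's the weather in Beijing?")
context = session.send("What about tomorrow?")
print(len(context))  # → 2 (both turns are visible to the model)
```

Creating a fresh session per question would reset this history, which is why the interactive loop above deliberately reuses one session.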
#### 2. Async I/O Done Right
```python
# input() inside the while True loop
user_input = input("You: ")  # Synchronous and blocking, but acceptable here

# send_and_wait() is async
await session.send_and_wait({"prompt": user_input})
```
Why is a blocking input() acceptable here? In this simple demo, nothing else needs the event loop while we wait for the user: session events only arrive after we send a prompt, and send_and_wait() is awaited. In an application where events can arrive in the background, move the blocking call off the loop, for example with await asyncio.to_thread(input, "You: ").
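If you do need the event loop to stay responsive while waiting for user input, the standard escape hatch is `asyncio.to_thread`, which runs a blocking call in a worker thread. Sketched here with a stand-in function instead of a real `input()` so it runs non-interactively:

```python
import asyncio

def blocking_prompt() -> str:
    # Stand-in for input("You: "); any blocking call behaves the same way
    return "What's the weather in Beijing?"

async def main() -> str:
    # The blocking call runs in a thread, so the event loop stays free
    # to process session events in the meantime.
    return await asyncio.to_thread(blocking_prompt)

print(asyncio.run(main()))  # → What's the weather in Beijing?
```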
#### 3. Graceful Exit
```python
try:
    user_input = input("You: ")
except EOFError:  # Catch Ctrl+D
    break
```
Handling EOFError and common exit commands (exit, quit) ensures a smooth user experience.
### Extension Ideas
Based on this framework, you can quickly extend functionality:
Add more tools:
```python
@define_tool(description="Query real-time stock price")
async def get_stock_price(params): ...

@define_tool(description="Search information on the web")
async def web_search(params): ...

session = await client.create_session({
    "tools": [get_weather, get_stock_price, web_search],
})
```
The AI will automatically select the appropriate tool based on the user's question.
Add a system prompt:
```python
session = await client.create_session({
    "model": "gpt-4.1",
    "tools": [get_weather],
    "system_message": {
        "content": "You are a professional weather assistant. Keep answers concise but informative."
    }
})
```
Log tool calls:
```python
def handle_event(event):
    if event.type == SessionEventType.TOOL_CALL:
        print(f"\n[Debug] AI called tool: {event.data.tool_name}")
        print(f"[Debug] Arguments: {event.data.arguments}\n")
```
## Debugging Tips
During development, observing CLI logs is crucial for understanding Agent behavior.
Start a standalone CLI server:
```bash
# Start the CLI server in debug mode
copilot --headless --log-level debug --port 9999

# Optional: specify a log directory
copilot --headless --log-level debug --port 9999 --log-dir ./logs
```
Connect from your code:
```python
client = CopilotClient({
    "cli_url": "http://localhost:9999",
})
await client.start()  # Connects directly without starting a new process
```
View logs:
By default, logs are saved in ~/.copilot/logs/, with an independent log file for each server process. Use tail -f to monitor in real time:
```bash
tail -f ~/.copilot/logs/process-<timestamp>-<pid>.log
```
Debug tool calls:
```python
def handle_event(event):
    # Tool call starts
    if event.type == SessionEventType.TOOL_USER_REQUESTED:
        print(f"[Tool Call] {event.data.tool_name}")
        print(f"Arguments: {event.data.arguments}")
    # Tool execution result
    if event.type == SessionEventType.TOOL_EXECUTION_COMPLETE:
        print(f"[Tool Result] {event.data.tool_name}")
        print(f"Result: {event.data.result}")
    # AI's final response
    if event.type == SessionEventType.ASSISTANT_MESSAGE:
        print(f"[Assistant] {event.data.content[:100]}...")

session.on(handle_event)
```
This pattern gives you a clear view of the entire tool call chain: AI decision → tool execution → result return → final response.