Python + AI Full-Stack Study Notes

- Part 1: Python Basic Syntax
  - 1.1 Literals and Variables
  - 1.2 Operators
  - 1.3 Flow Control
  - 1.4 Data Containers
    - List (list)
    - Tuple (tuple)
    - Dictionary (dict)
    - Set (set)
  - 1.5 Functions
  - 1.6 Modules and Packages
  - 1.7 Object-Oriented Basics
  - 1.8 Exception Handling
- Part 2: Advanced Object-Oriented Programming
  - 2.1 Encapsulation
  - 2.2 Inheritance and Overriding
  - 2.3 Multiple Inheritance
  - 2.4 Polymorphism and Duck Typing
- Part 3: File Operations and JSON
  - 3.1 Reading and Writing Files
  - 3.2 Working with JSON
  - 3.3 Working with CSV
- Part 4: AI Application Development
  - 4.1 Calling the DeepSeek API
  - 4.2 Streamlit Basics
  - 4.3 Conversation Memory + Streaming Output
  - 4.4 Complete AI Chatbot (Sidebar + Session Persistence)
- Part 5: Web Scraping
  - 5.1 requests + lxml + XPath
  - 5.2 Scraping TMDB Movies (List Page → Detail Pages)
  - 5.3 Regular Expressions (Data Cleaning)
- Part 6: Data Analysis (Introductory)
  - 6.1 Pandas Basics
- Part 7: Web Framework (FastAPI)
- Cheat Sheet

Part 1: Python Basic Syntax

1.1 Literals and Variables

Common built-in Python types: int, float, bool, str, and NoneType (whose only value is None).

python

# Basic literals
print(1)           # int
print(1.2)         # float
print(True)        # bool
print("a string")  # str
print(None)        # NoneType

# Variable assignment
num = 1

# Inspecting types
print(type(1))             # <class 'int'>
print(isinstance(1, int))  # True

# Swapping two variables
a, b = 1, 2
a, b = b, a
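Ways to write a string literal:

- Single quotes: 'str'
- Double quotes: "str"
- Triple quotes (may span multiple lines): """str"""
- Escape sequences: \' \" \n \t

Three ways to format a string:

python

name = "jgs"
age = 18

# 1. Concatenation (non-str values must be converted with str())
print(name + str(age))

# 2. %-style placeholders
print("I am %s, %s years old" % (name, age))

# 3. f-string (recommended)
print(f"I am {name}, {age} years old")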

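1.2 Operators

| Operator | Description | Example |
| --- | --- | --- |
| `+` `-` `*` `/` | Addition, subtraction, multiplication, division | `/` returns a float: `10 / 2` → `5.0` |
| `//` | Floor division | `10 // 3` → `3` |
| `%` | Remainder (modulo) | `10 % 3` → `1` |
| `**` | Exponentiation | `10 ** 2` → `100` |
| `+=` `-=` `*=` etc. | Augmented assignment | `num += 1` |
| `and` `or` `not` | Logical operators | `a > 5 and a < 10` |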

1.3 Flow Control

if / elif / else:

python

num = int(input())
if num % 2 == 0:
    print("even")
elif num % 3 == 0:
    print("odd and divisible by 3")
else:
    print("odd and not divisible by 3")

match / case (Python 3.10+):

python

num = int(input())
match num:
    case 1:
        print(num)
    case 2:
        print(num * 2)
    case _:
        print("fallback")  # wildcard: matches anything not handled above

while loop:

python

i = 10
while i != 0:
    print(f'countdown: {i}')
    i -= 1
else:
    # runs only when the loop ends without hitting break
    print('loop finished normally')
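
The `else` branch of a loop is easy to misread: it runs when the loop condition becomes false, and is skipped when the loop exits through `break`. A minimal sketch of that difference follows; the helper name `find_first_multiple` is made up for illustration and is not part of the original notes.

```python
def find_first_multiple(numbers, divisor):
    """Return the first value divisible by divisor, or None if there is none."""
    i = 0
    while i < len(numbers):
        if numbers[i] % divisor == 0:
            found = numbers[i]
            break              # leaving via break skips the else branch
        i += 1
    else:
        found = None           # reached only when the loop ended without break
    return found


print(find_first_multiple([3, 5, 8, 10], 4))  # 8
print(find_first_multiple([3, 5, 7], 4))      # None
```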

for loop + range:

python

# range(end) / range(start, end) / range(start, end, step); end is exclusive
for i in range(1, 11):
    print(i)  # 1 through 10

break and continue:

- break: exit the loop entirely
- continue: skip the rest of the current iteration and move on to the next one

Number-guessing game:

python

import random
random_num = random.randint(1, 100)  # both endpoints are inclusive

while True:
    num = int(input())
    if num == random_num:
        print('correct')
        break
    elif num < random_num:
        print('too low')
    else:
        print('too high')

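1.4 Data Containers

List (list)

python

list1 = [1, 2, 3, 4, 5]
list2 = [7, 8, 9]    # second list, used in the merge example below

# Indexing: list1[0] → 1, list1[-1] → 5
# Slicing: list1[0:5:1] is a full copy; list1[::-1] is a reversed copy

# Common methods (all modify the list in place)
list1.append(6)      # append to the end
list1.insert(0, 0)   # insert at a given index
list1.remove(3)      # remove the first matching value
list1.pop()          # remove and return the last element
list1.sort()         # sort ascending
list1.reverse()      # reverse

# Aggregates
min(list1), max(list1), sum(list1), len(list1)

# Merging
new_list = [*list1, *list2]

# Membership test
if 3 in list1: ...

# List comprehension
squares = [i ** 2 for i in range(1, 21)]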

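Tuple (tuple)

python

tup = (1, 2, 3, 1)
tup.count(1)   # 2 (number of occurrences of 1)
tup.index(2)   # 1 (index of the first 2)

# Packing and unpacking
a, b, c = (1, 2, 3)
e, *f = (1, 2, 3, 4)  # e=1, f=[2, 3, 4] (the starred name always receives a list)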

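Dictionary (dict)

python

dict1 = {}
dict1["apple"] = {"price": 5, "count": 3}  # add or overwrite a key
dict1.pop("apple")  # remove the key and return its value; raises KeyError if the key is missing

# The shopping-cart exercise uses match/case to handle user input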

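Set (set)

python

set1 = {1, 2, 3, 4, 5}
set2 = {4, 5, 6, 7}   # second set for the operations below
set1.add(6)
set1.remove(6)        # raises KeyError if the element is missing

# Set operations
set1 & set2   # intersection → {4, 5}
set1 | set2   # union → {1, 2, 3, 4, 5, 6, 7}
set1 - set2   # difference → {1, 2, 3}

# Set comprehension (sample data; same result as football_set - basketball_set)
football_set = {"Tom", "Ann", "Lee"}
basketball_set = {"Ann", "Max"}
result = {s for s in football_set if s not in basketball_set}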

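1.5 Functions

python

# Variable scope
num = 100
def change():
    global num  # rebind the module-level num instead of creating a local one
    num = 1

# Argument passing: positional and keyword
def func(a, b, c):
    print(a, b, c)
func(1, c=3, b=2)  # prints: 1 2 3

# Variable-length arguments
def func2(*args, **kwargs):
    print(args)    # tuple: (1, 2, 3)
    print(kwargs)  # dict: {'name': 'test'}
func2(1, 2, 3, name="test")

# Recursion: factorial (assumes n is a non-negative integer)
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)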

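1.6 Modules and Packages

python

# Import styles
import random
import random as rd
from random import randint
from random import randint as rdint
from random import *   # pulls in every public name; best avoided outside quick experiments

# Package layout
# utils/
#   __init__.py   ← marks the directory as a package; may define __all__
#   index.py      ← a module file

In a package's __init__.py, __all__ lists the submodule names that from package import * should import.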
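
To see `__all__` in action without creating files by hand, the sketch below builds a throwaway package in a temporary directory and star-imports it. The package name `demo_utils` and the module `hidden.py` are hypothetical; only `index.py` mirrors the layout shown above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the example is self-contained.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_utils"          # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text('__all__ = ["index"]\n', encoding="utf-8")
(pkg / "index.py").write_text('NAME = "index module"\n', encoding="utf-8")
(pkg / "hidden.py").write_text('NAME = "hidden module"\n', encoding="utf-8")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

namespace = {}
exec("from demo_utils import *", namespace)   # the star-import honours __all__

print("index" in namespace)    # True: listed in __all__, so it was imported
print("hidden" in namespace)   # False: not listed, so it was left out
```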

1.7 Object-Oriented Basics

python

class Student:
    basic = 1  # class attribute, shared by all instances

    def __init__(self, name, age):
        self.name = name  # instance attributes
        self.age = age

    def __str__(self):  # used by print() and str()
        return f"Name: {self.name}"

    def __eq__(self, other):  # used by ==
        return self.name == other.name

    def __lt__(self, other):  # used by < and by sorted()
        return self.age < other.age
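
The special methods only pay off once the operators that trigger them are used. A short usage sketch follows; the class is repeated in trimmed form so the block runs on its own, and the sample students are invented.

```python
class Student:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"Name: {self.name}"

    def __eq__(self, other):
        return self.name == other.name

    def __lt__(self, other):
        return self.age < other.age


alice = Student("Alice", 20)
bob = Student("Bob", 18)

print(alice)                                   # Name: Alice  (print calls __str__)
print(alice == Student("Alice", 30))           # True  (__eq__ compares names only)
print(bob < alice)                             # True  (__lt__ compares ages)
print([s.name for s in sorted([alice, bob])])  # ['Bob', 'Alice']  (sorted relies on __lt__)
```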

1.8 Exception Handling

python

try:
    print(1 / 0)
except Exception as e:  # broad catch for the demo; prefer a specific type such as ZeroDivisionError
    print(e)  # division by zero
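
Beyond a single `except`, a `try` statement can also carry `else` (runs when nothing was raised) and `finally` (always runs). The sketch below shows the full shape with specific exception types; `safe_divide` is an illustrative helper, not something from the original notes.

```python
def safe_divide(a, b):
    """Return a / b, or None when the division is not possible."""
    try:
        result = a / b
    except ZeroDivisionError as e:
        print(f"cannot divide: {e}")
        return None
    except TypeError as e:
        print(f"bad operand: {e}")
        return None
    else:
        return result          # reached only when no exception was raised
    finally:
        print("done")          # runs on every path, even after a return


print(safe_divide(6, 3))   # prints "done", then 2.0
print(safe_divide(1, 0))   # prints "cannot divide: division by zero", "done", then None
```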

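Part 2: Advanced Object-Oriented Programming

2.1 Encapsulation

python

class Car:
    def __init__(self, name, color):
        self.name = name
        self.color = color

    def run(self):
        print('car running')

    def __charge(self):  # "private": name-mangled to _Car__charge, so car.__charge() fails outside the class
        print('car charge')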
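
A double leading underscore does not make a method truly inaccessible; Python renames it (name mangling) so that it cannot be reached by its short name from outside the class. The sketch below uses a trimmed `Car` with an extra `start` method, added here purely for illustration, and returns strings instead of printing so the results are easy to check.

```python
class Car:
    def __charge(self):            # stored on the class as _Car__charge
        return "car charge"

    def start(self):
        return self.__charge()     # inside the class the short name works


car = Car()
print(car.start())                 # car charge
print(hasattr(car, "__charge"))    # False: the unmangled name does not exist
print(car._Car__charge())          # car charge (possible, but discouraged)
```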

2.2 Inheritance and Overriding

python

# Inheritance
class FuelCar(Car):
    def __init__(self, name, color):
        super().__init__(name, color)

# Overriding
class ElectraCar(Car):
    def run(self):
        super().run()  # call the parent implementation first
        print('electra car run')

2.3 Multiple Inheritance

python

class AIDriver:
    def run(self):
        print('AI run')

class AICar(Car, AIDriver):  # Car is listed first, so AICar().run() resolves to Car.run
    def __init__(self, name, color):
        super().__init__(name, color)
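
When both parents define `run()`, Python picks the one that comes first in the method resolution order (MRO), which follows the order of the base classes. A small sketch under simplified assumptions: the classes return strings rather than print, and `RoboCar` is a hypothetical class added only to show the opposite base order.

```python
class Car:
    def run(self):
        return "car running"


class AIDriver:
    def run(self):
        return "AI run"


class AICar(Car, AIDriver):
    pass


class RoboCar(AIDriver, Car):   # hypothetical: same bases, opposite order
    pass


print([cls.__name__ for cls in AICar.__mro__])  # ['AICar', 'Car', 'AIDriver', 'object']
print(AICar().run())    # car running
print(RoboCar().run())  # AI run
```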

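2.4 Polymorphism and Duck Typing

python

def handleRun(car: Car):  # the type hint is not enforced at runtime
    car.run()

# Duck typing: no inheritance needed; any object with a run() method works
class Duck:
    def run(self):
        print('duck run')

handleRun(Duck())  # runs fine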

Part 3: File Operations and JSON

3.1 Reading and Writing Files

python

# Read
f = open('./resource/望庐山瀑布', 'r', encoding="utf-8")
content = f.read()
f.close()

# Write (overwrites any existing content)
f = open('./resource/静夜思', 'w', encoding="utf-8")
f.write("床前明月光\n")
f.close()

# with statement (recommended; closes the file automatically, even on error)
with open('./resource/静夜思', 'w', encoding="utf-8") as f:
    f.write("床前明月光\n")

# Append mode
with open('./resource/静夜思', 'a', encoding="utf-8") as f:
    f.write("低头思故乡\n")

3.2 Working with JSON

python

import json

data = {"name": "张三", "age": 18}

# Write JSON (ensure_ascii=False keeps non-ASCII characters readable in the file)
with open('ok.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

# Read JSON
with open('ok.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
    print(data['age'])  # 18

3.3 Working with CSV

python

import csv

# Write (newline='' lets the csv module control line endings)
with open('1.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['id', 'name', 'age'])
    writer.writeheader()
    writer.writerow({'id': 1, 'name': '张三', 'age': 18})

# Read (every value comes back as a string)
with open('1.csv', 'r', encoding='utf-8', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row['id'])

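Part 4: AI Application Development

4.1 Calling the DeepSeek API

python

from openai import OpenAI
import os

# The notes call DeepSeek through the OpenAI SDK by pointing base_url at the DeepSeek endpoint
client = OpenAI(
    api_key=os.environ.get('DEEPSEEK_API_KEY'),  # read the key from the environment; do not hard-code it
    base_url="https://api.deepseek.com"
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",  # model name as recorded in these notes
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    stream=False  # return the whole reply at once
)
print(response.choices[0].message.content)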

4.2 Streamlit Basics

python

import streamlit as st

st.set_page_config(page_title="AI Companion", page_icon="🤖", layout="wide")
st.title("AI Companion")

# session_state keeps data across reruns of the script
if "messages" not in st.session_state:
    st.session_state.messages = []

# Render the message history
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

# Read the user's input (None until something is submitted)
prompt = st.chat_input("Type your question")

4.3 Conversation Memory + Streaming Output

Key point: unpack the full history, *st.session_state.messages, into the messages list sent to the API.

python

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # model name as recorded in these notes
    messages=[
        {"role": "system", "content": "You are an assistant"},
        *st.session_state.messages  # pass in the message history
    ],
    stream=True  # streaming output
)

# Render the reply incrementally as chunks arrive
response_content = st.empty()
full_response = ""
for chunk in response:
    if chunk.choices[0].delta.content is not None:
        full_response += chunk.choices[0].delta.content
        response_content.chat_message("assistant").write(full_response)
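
The accumulation loop can be tried without a network call or Streamlit. In the sketch below, `fake_stream` is a stand-in that yields objects with the same `choices[0].delta.content` shape used above, and `collect_reply` is an illustrative helper; neither is part of the OpenAI SDK. It also shows the step that makes the memory work on the next turn: appending the finished reply to the history.

```python
from types import SimpleNamespace


def fake_stream(pieces):
    """Yield objects shaped like streamed chunks (stand-in for the API response)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_reply(stream):
    """Concatenate the text deltas of a stream, skipping empty chunks."""
    full_response = ""
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content is not None:
            full_response += content
    return full_response


messages = [{"role": "user", "content": "Hello"}]
reply = collect_reply(fake_stream(["Hi", None, " there"]))
messages.append({"role": "assistant", "content": reply})  # keep the reply for the next turn
print(messages[-1])  # {'role': 'assistant', 'content': 'Hi there'}
```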

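4.4 Complete AI Chatbot (Sidebar + Session Persistence)

Core features:

- Sidebar: the with st.sidebar: context manager
- Saving a session: json.dump() writes it to a file
- Loading a session: json.load() reads it back from a file
- Deleting a session: os.remove() deletes the file
- Custom persona: injected through a system_prompt template

python

import json
import streamlit as st

# Save the session. Defined before the sidebar code that calls it;
# session_name and session_dict are set elsewhere in the app.
def save_session():
    with open("./session/%s.json" % session_name, 'w', encoding='utf-8') as f:
        json.dump(session_dict, f, ensure_ascii=False, indent=2)

# Sidebar
with st.sidebar:
    st.subheader("AI Control Panel")
    if st.button('New session'):
        save_session()
        st.session_state.messages = []

# System prompt template (filled in with % (name, personality))
system_prompt = """Your name is %s. Personality: %s"""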
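
The save, load, and delete features listed above can be exercised outside Streamlit. The following is a sketch only: the helper names, the `{"messages": [...]}` file layout, and the use of a temporary directory in place of `./session` are all assumptions, since the notes do not show what `session_dict` contains.

```python
import json
import os
import tempfile

SESSION_DIR = tempfile.mkdtemp()  # stand-in for the notes' ./session directory


def session_path(session_name):
    return os.path.join(SESSION_DIR, f"{session_name}.json")


def save_session(session_name, messages):
    with open(session_path(session_name), "w", encoding="utf-8") as f:
        json.dump({"messages": messages}, f, ensure_ascii=False, indent=2)


def load_session(session_name):
    with open(session_path(session_name), "r", encoding="utf-8") as f:
        return json.load(f)["messages"]


def delete_session(session_name):
    os.remove(session_path(session_name))


def build_system_prompt(name, personality):
    # Same %-template idea as the system_prompt string above
    return "Your name is %s. Personality: %s" % (name, personality)


history = [{"role": "user", "content": "你好"}]
save_session("demo", history)
print(load_session("demo") == history)         # True
delete_session("demo")
print(os.path.exists(session_path("demo")))    # False
print(build_system_prompt("Mia", "cheerful"))  # Your name is Mia. Personality: cheerful
```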

Part 5: Web Scraping

5.1 requests + lxml + XPath

python

from lxml import html
import requests

resp = requests.get("https://www.tiobe.com/tiobe-index/")
doc = html.fromstring(resp.text)

# XPath syntax
# //table             → every table in the document
# //table[@id='top20'] → the table whose id is top20
# /tbody/tr           → child elements
# ./td/text()         → text of the td children of the current node
# @href               → an attribute value

tr_list = doc.xpath("//table[@id='top20']/tbody/tr")
for tr in tr_list:
    td_l = tr.xpath("./td/text()")
    print([td_l[-3], td_l[-2], td_l[-1]])  # last three text cells of each row

5.2 Scraping TMDB Movies (List Page → Detail Pages)

python

# Outline only: resp_html, f, column, and data come from other parts of the scraper.

# 1. Collect every movie link from the list page
href_list = resp_html.xpath("//div[@id='media-list']//a/@href")

# 2. Loop over the links and request each detail page
for item in href_list:
    resp = requests.get(f"https://www.themoviedb.org{item}")
    html_ = html.fromstring(resp.text)
    slogan = html_.xpath("//div[@class='header_info']/h3[1]/text()")  # xpath() returns a list

# 3. Save to CSV
csv_writer = csv.DictWriter(f, fieldnames=column)
csv_writer.writeheader()
csv_writer.writerows(data)

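5.3 Regular Expressions (Data Cleaning)

Regular expressions match, extract, and replace text that follows a given pattern.

Basic syntax:

| Symbol | Meaning | Example |
| --- | --- | --- |
| `.` | Any single character except a newline | `a.c` → abc, aXc |
| `*` | Preceding element zero or more times | `ab*` → a, ab, abb |
| `+` | Preceding element one or more times | `ab+` → ab, abb |
| `?` | Preceding element zero or one time | `ab?` → a, ab |
| `{n}` | Preceding element exactly n times | `a{3}` → aaa |
| `{n,m}` | Preceding element n to m times | `a{2,4}` → aa, aaa, aaaa |
| `^` | Start of the string | `^Hello` |
| `$` | End of the string | `world$` |
| `\d` | A digit (0-9) | `\d+` → 123 |
| `\w` | A letter, digit, or underscore | `\w+` → hello_1 |
| `\s` | A whitespace character | `\s+` → spaces, tabs |
| `[]` | Character set | `[abc]` → a, b, or c |
| `()` | Capturing group | `(\d+)-(\d+)` |
| `\|` | Alternation (or) | `cat\|dog` → cat, dog |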

Common functions in Python's re module:

python

import re

text = "电影评分：8.5分，上映日期：2024-01-15，票房：12.3亿"

# 1. re.search(): find the first match anywhere in the string
match = re.search(r'\d+\.\d+', text)
print(match.group())  # 8.5

# 2. re.findall(): find all matches
numbers = re.findall(r'\d+\.?\d*', text)
print(numbers)  # ['8.5', '2024', '01', '15', '12.3']

# 3. re.sub(): replace
clean = re.sub(r'[^\w\s]', '', text)
print(clean)  # punctuation removed, including the decimal points and hyphens

# 4. re.split(): split
parts = re.split(r'[，。]', text)
print(parts)  # ['电影评分：8.5分', '上映日期：2024-01-15', '票房：12.3亿']

# 5. re.match(): match only at the start of the string
result = re.match(r'电影', text)
print(result.group())  # 电影

Data cleaning in practice for scraped data:

python

import re

# Scenario 1: collapse extra whitespace in a movie title
raw_title = "  The Shawshank Redemption   (1994)  "
clean_title = re.sub(r'\s+', ' ', raw_title).strip()
# → "The Shawshank Redemption (1994)"

# Scenario 2: extract the year
year = re.search(r'\((\d{4})\)', raw_title)
print(year.group(1))  # 1994

# Scenario 3: extract a rating written as "x.y/10" (the pattern requires the "/10" suffix)
rating_text = "用户评分：9.2/10"
rating = re.search(r'(\d+\.\d+)/10', rating_text)
print(rating.group(1))  # 9.2

# Scenario 4: strip HTML tags
html_text = "<p>这是<b>加粗</b>文本</p>"
plain = re.sub(r'<[^>]+>', '', html_text)
print(plain)  # 这是加粗文本

# Scenario 5: extract all URLs
text = "访问 https://example.com 或 http://test.org/path?q=1"
urls = re.findall(r'https?://[\w./\-?=&]+', text)
print(urls)  # ['https://example.com', 'http://test.org/path?q=1']

# Scenario 6: validate a mobile number (11 digits, starts with 1, second digit 3-9)
phone = "13812345678"
if re.match(r'^1[3-9]\d{9}$', phone):
    print("valid phone number")

Greedy vs. non-greedy matching:

python

text = "<div>内容1</div><div>内容2</div>"

# Greedy (default): match as much as possible
re.findall(r'<div>.*</div>', text)
# → ['<div>内容1</div><div>内容2</div>']

# Non-greedy (add ?): match as little as possible
re.findall(r'<div>.*?</div>', text)
# → ['<div>内容1</div>', '<div>内容2</div>']
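
When the goal is the text inside the tags, a capturing group makes `re.findall` return only the group's contents. The sketch below reuses the same sample string and compares greedy, non-greedy, and a negated character class, which is a common alternative to `.*?`; the variable names are illustrative.

```python
import re

text = "<div>内容1</div><div>内容2</div>"

greedy = re.findall(r"<div>(.*)</div>", text)       # runs to the last </div>
lazy = re.findall(r"<div>(.*?)</div>", text)        # stops at the first </div>
negated = re.findall(r"<div>([^<]*)</div>", text)   # cannot cross a "<" at all

print(greedy)   # ['内容1</div><div>内容2']
print(lazy)     # ['内容1', '内容2']
print(negated)  # ['内容1', '内容2']
```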

Part 6: Data Analysis (Introductory)

6.1 Pandas Basics

python

import pandas as pd

# Build a DataFrame from a list of row dicts
db = pd.DataFrame([
    {'name': 'Alice', 'age': 15},
    {'name': 'Bob', 'age': 17},
    {'name': 'Charlie', 'age': 16}
])

db['age'].max()   # 17
db['age'].min()   # 15
db['age'].mean()  # 16.0

Part 7: Web Framework (FastAPI)

python

from fastapi import FastAPI

app = FastAPI()

# GET / returns a JSON object
@app.get("/")
async def root():
    return {"message": "Hello World"}

# GET /users returns a JSON array
@app.get("/users")
async def get_users():
    return [
        {"name": "John", "age": 18},
        {"name": "Alice", "age": 28},
    ]

# Start the development server when the file is run directly
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)

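Cheat Sheet

| Category | Key points |
| --- | --- |
| Data types | int, float, bool, str, None, list, tuple, dict, set |
| Formatting | f-string > %s > str concatenation (in order of preference) |
| Flow control | if/elif/else, match/case, while, for, break, continue |
| Functions | `*args`, `**kwargs`, global, recursion, lambda |
| OOP | `__init__`, `__str__`, `__eq__`, inheritance, polymorphism, duck typing |
| Files | open (r/w/a), with, json.dump/load, csv.DictWriter/DictReader |
| AI | OpenAI SDK, stream=True, session_state, system_prompt |
| Scraping | requests.get, lxml.html, XPath, saving to CSV |
| Regex | re.search, re.findall, re.sub, re.split, greedy vs. non-greedy |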