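Project 1: Personal Blog System (Flask)
Project Overview
Build a full-featured personal blog system with the Flask framework, covering user authentication, post management, and a comment system.
Tech Stack
- Backend framework: Flask
- Database: SQLite / MySQL
- ORM: SQLAlchemy (via Flask-SQLAlchemy)
- Template engine: Jinja2
- Form validation: WTForms (via Flask-WTF)
- User authentication: Flask-Login
- Markdown support: Flask-Markdown
Feature Requirements
Core Features
- User registration, login, and logout
- Create, read, update, and delete (CRUD) for posts
- Markdown editor
- Post categories and tags
- Comment system
- Post search
- Pagination
Extended Features
- Post likes/bookmarks
- Comment replies
- File upload (images)
- RSS feed
- Admin dashboard
Project Structure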
blog/
├── app/
│   ├── __init__.py
│   ├── models.py          # data models
│   ├── routes/            # blueprints: main.py, auth.py, blog.py
│   ├── forms.py           # forms
│   ├── templates/         # templates
│   │   ├── base.html
│   │   ├── index.html
│   │   ├── post.html
│   │   └── login.html
│   └── static/            # static files
│       ├── css/
│       └── js/
├── migrations/            # database migrations
├── config.py              # configuration
├── requirements.txt       # dependencies
└── run.py                 # entry point
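Core Implementation
1. Data models (models.py)
python
from datetime import datetime
from flask_login import UserMixin
from werkzeug.security import generate_password_hash, check_password_hash
# Reuse the SQLAlchemy instance created in app/__init__.py; a second
# SQLAlchemy() here would never be bound to the app by db.init_app().
from app import db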
class User(UserMixin, db.Model):
    """User model."""
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)
    # Hashes from recent Werkzeug releases (scrypt by default) exceed 128 characters
    password_hash = db.Column(db.String(256))
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    # Relationships
    posts = db.relationship('Post', backref='author', lazy='dynamic')
    comments = db.relationship('Comment', backref='author', lazy='dynamic')

    def set_password(self, password):
        """Hash the password and store the hash."""
        self.password_hash = generate_password_hash(password)

    def check_password(self, password):
        """Check a plaintext password against the stored hash."""
        return check_password_hash(self.password_hash, password)

    def __repr__(self):
        return f'<User {self.username}>'
# Post-tag association table
post_tags = db.Table(
    'post_tags',
    db.Column('post_id', db.Integer, db.ForeignKey('posts.id')),
    db.Column('tag_id', db.Integer, db.ForeignKey('tags.id'))
)

class Post(db.Model):
    """Post model."""
    __tablename__ = 'posts'
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.String(200), nullable=False)
    content = db.Column(db.Text, nullable=False)
    summary = db.Column(db.String(500))
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    updated_at = db.Column(db.DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
    views = db.Column(db.Integer, default=0)
    # Foreign keys
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'), nullable=False)
    category_id = db.Column(db.Integer, db.ForeignKey('categories.id'))
    # Relationships
    comments = db.relationship('Comment', backref='post', lazy='dynamic', cascade='all, delete-orphan')
    tags = db.relationship('Tag', secondary=post_tags, backref=db.backref('posts', lazy='dynamic'))

    def __repr__(self):
        return f'<Post {self.title}>'

class Category(db.Model):
    """Category model."""
    __tablename__ = 'categories'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50), unique=True, nullable=False)
    posts = db.relationship('Post', backref='category', lazy='dynamic')

class Tag(db.Model):
    """Tag model."""
    __tablename__ = 'tags'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50), unique=True, nullable=False)

class Comment(db.Model):
    """Comment model."""
    __tablename__ = 'comments'
    id = db.Column(db.Integer, primary_key=True)
    content = db.Column(db.Text, nullable=False)
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    # Foreign keys
    user_id = db.Column(db.Integer, db.ForeignKey('users.id'), nullable=False)
    post_id = db.Column(db.Integer, db.ForeignKey('posts.id'), nullable=False)

    def __repr__(self):
        return f'<Comment {self.id}>'
2. Application setup (__init__.py)
python
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_login import LoginManager
from flask_migrate import Migrate
from config import Config

db = SQLAlchemy()
login_manager = LoginManager()
migrate = Migrate()

def create_app(config_class=Config):
    """Application factory."""
    app = Flask(__name__)
    app.config.from_object(config_class)
    # Initialize extensions
    db.init_app(app)
    login_manager.init_app(app)
    migrate.init_app(app, db)
    # Configure login handling
    login_manager.login_view = 'auth.login'
    login_manager.login_message = 'Please log in first'
    # Register blueprints (imported here to avoid circular imports)
    from app.routes import main, auth, blog
    app.register_blueprint(main.bp)
    app.register_blueprint(auth.bp)
    app.register_blueprint(blog.bp)
    return app

@login_manager.user_loader
def load_user(user_id):
    """Load a user by primary key for Flask-Login."""
    from app.models import User
    # Session.get replaces the legacy Query.get API
    return db.session.get(User, int(user_id))
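The factory above imports `Config` from `config.py`, which the project tree lists but this section never shows. A minimal sketch might look like the following. The environment variable names (`SECRET_KEY`, `DATABASE_URL`), the fallback SQLite file `blog.db`, and the `POSTS_PER_PAGE` setting are assumptions made for illustration, not part of the original project.

```python
import os


class Config:
    """Hypothetical settings object for create_app(); adjust names to your project."""

    # Needed by Flask sessions (Flask-Login) and CSRF protection (Flask-WTF).
    # The fallback is for local development only.
    SECRET_KEY = os.environ.get("SECRET_KEY", "dev-only-change-me")

    # Flask-SQLAlchemy reads this key; SQLite keeps local setup simple.
    SQLALCHEMY_DATABASE_URI = os.environ.get("DATABASE_URL", "sqlite:///blog.db")
    SQLALCHEMY_TRACK_MODIFICATIONS = False

    # Assumed application-level setting, e.g. for paginate(per_page=...).
    POSTS_PER_PAGE = 10
```

Reading secrets from the environment keeps them out of version control; the hard-coded fallbacks should never reach a production deployment.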
3. Routes (routes/blog.py)
python
from flask import Blueprint, render_template, redirect, url_for, flash, request, abort
from flask_login import login_required, current_user
from app.models import Post, Category, Tag, Comment, db
from app.forms import PostForm, CommentForm
bp = Blueprint('blog', __name__, url_prefix='/blog')
@bp.route('/')
def index():
"""首页"""
page = request.args.get('page', 1, type=int)
posts = Post.query.order_by(Post.created_at.desc()).paginate(
page=page, per_page=10, error_out=False
)
return render_template('blog/index.html', posts=posts)
@bp.route('/post/<int:post_id>')
def post_detail(post_id):
"""文章详情"""
post = Post.query.get_or_404(post_id)
# 增加浏览量
post.views += 1
db.session.commit()
# 获取评论
comments = post.comments.order_by(Comment.created_at.desc()).all()
return render_template('blog/post_detail.html', post=post, comments=comments)
@bp.route('/post/new', methods=['GET', 'POST'])
@login_required
def new_post():
    """Create a post."""
    form = PostForm()
    # A SelectField needs its choices before validation runs
    form.category.choices = [(c.id, c.name) for c in Category.query.order_by(Category.name)]
    if form.validate_on_submit():
        post = Post(
            title=form.title.data,
            content=form.content.data,
            summary=form.summary.data,
            category_id=form.category.data,
            user_id=current_user.id
        )
        # Attach tags, skipping blank and repeated names
        tag_names = {name.strip() for name in (form.tags.data or '').split(',')}
        for tag_name in sorted(tag_names - {''}):
            tag = Tag.query.filter_by(name=tag_name).first()
            if not tag:
                tag = Tag(name=tag_name)
                db.session.add(tag)
            post.tags.append(tag)
        db.session.add(post)
        db.session.commit()
        flash('Post published!', 'success')
        return redirect(url_for('blog.post_detail', post_id=post.id))
    return render_template('blog/edit_post.html', form=form, title='New Post')
@bp.route('/post/<int:post_id>/edit', methods=['GET', 'POST'])
@login_required
def edit_post(post_id):
    """Edit a post."""
    post = Post.query.get_or_404(post_id)
    # Only the author may edit
    if post.author != current_user:
        abort(403)
    form = PostForm()
    # A SelectField needs its choices before validation runs
    form.category.choices = [(c.id, c.name) for c in Category.query.order_by(Category.name)]
    if form.validate_on_submit():
        post.title = form.title.data
        post.content = form.content.data
        post.summary = form.summary.data
        post.category_id = form.category.data
        # Replace tags, skipping blank and repeated names
        post.tags.clear()
        tag_names = {name.strip() for name in (form.tags.data or '').split(',')}
        for tag_name in sorted(tag_names - {''}):
            tag = Tag.query.filter_by(name=tag_name).first()
            if not tag:
                tag = Tag(name=tag_name)
                db.session.add(tag)
            post.tags.append(tag)
        db.session.commit()
        flash('Post updated!', 'success')
        return redirect(url_for('blog.post_detail', post_id=post.id))
    elif request.method == 'GET':
        # Pre-fill only on first load, so a failed POST keeps the user's input
        form.title.data = post.title
        form.content.data = post.content
        form.summary.data = post.summary
        form.category.data = post.category_id
        form.tags.data = ', '.join([tag.name for tag in post.tags])
    return render_template('blog/edit_post.html', form=form, title='Edit Post')
@bp.route('/post/<int:post_id>/delete', methods=['POST'])
@login_required
def delete_post(post_id):
"""删除文章"""
post = Post.query.get_or_404(post_id)
if post.author != current_user:
abort(403)
db.session.delete(post)
db.session.commit()
flash('文章已删除', 'info')
return redirect(url_for('blog.index'))
@bp.route('/post/<int:post_id>/comment', methods=['POST'])
@login_required
def add_comment(post_id):
"""添加评论"""
post = Post.query.get_or_404(post_id)
form = CommentForm()
if form.validate_on_submit():
comment = Comment(
content=form.content.data,
user_id=current_user.id,
post_id=post.id
)
db.session.add(comment)
db.session.commit()
flash('评论发布成功!', 'success')
return redirect(url_for('blog.post_detail', post_id=post_id))
@bp.route('/search')
def search():
"""搜索文章"""
keyword = request.args.get('q', '')
page = request.args.get('page', 1, type=int)
if keyword:
posts = Post.query.filter(
Post.title.contains(keyword) | Post.content.contains(keyword)
).order_by(Post.created_at.desc()).paginate(
page=page, per_page=10, error_out=False
)
else:
posts = None
return render_template('blog/search.html', posts=posts, keyword=keyword)
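Both `new_post` and `edit_post` turn the comma-separated `tags` field into tag names. Splitting alone leaves empty strings (for example when the field is blank or ends with a comma) and repeated names, which could create a blank `Tag` or collide with the unique constraint on `tags.name`. A small helper, sketched here under the hypothetical name `parse_tag_names`, could normalize the input before any database lookup. The `max_tags` cap is an illustrative extra, not something the original forms enforce.

```python
def parse_tag_names(raw, max_tags=10):
    """Split a comma-separated string into unique, non-empty tag names.

    Order of first appearance is preserved and duplicates are detected
    case-insensitively. `raw` may be None (an empty form field).
    """
    seen = set()
    names = []
    for part in (raw or "").split(","):
        name = part.strip()
        key = name.lower()
        if not name or key in seen:
            continue  # skip blanks and repeats
        seen.add(key)
        names.append(name)
    return names[:max_tags]


print(parse_tag_names("flask, Python, ,flask,  web ,"))
# ['flask', 'Python', 'web']
```

In the routes, the loop over `tag_names` would then iterate over `parse_tag_names(form.tags.data)` instead of the raw split.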
4. Form definitions (forms.py)
python
from flask_wtf import FlaskForm
from wtforms import StringField, TextAreaField, SelectField, SubmitField
from wtforms.validators import DataRequired, Length

class PostForm(FlaskForm):
    """Post form."""
    title = StringField('Title', validators=[
        DataRequired(message='Title is required'),
        Length(min=1, max=200, message='Title must be 1-200 characters')
    ])
    summary = TextAreaField('Summary', validators=[
        Length(max=500, message='Summary must be at most 500 characters')
    ])
    content = TextAreaField('Content', validators=[
        DataRequired(message='Content is required')
    ])
    # Choices must be assigned in the view before the form is validated
    category = SelectField('Category', coerce=int)
    tags = StringField('Tags', validators=[
        Length(max=100, message='Tags must be at most 100 characters')
    ])
    submit = SubmitField('Publish')

class CommentForm(FlaskForm):
    """Comment form."""
    content = TextAreaField('Comment', validators=[
        DataRequired(message='Comment is required'),
        Length(min=1, max=500, message='Comment must be 1-500 characters')
    ])
    submit = SubmitField('Submit')
Deployment
bash
# 1. Install dependencies
pip install -r requirements.txt
# 2. Initialize the database
flask db init
flask db migrate -m "Initial migration"
flask db upgrade
# 3. Run the application (development server)
flask run
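Project 2: RESTful API Service (FastAPI)
Project Overview
Build a high-performance task management API with FastAPI, including full CRUD operations, user authentication, and access control.
Tech Stack
- Framework: FastAPI
- ORM: SQLAlchemy
- Data validation: Pydantic
- Authentication: JWT tokens
- Database: PostgreSQL / SQLite
- Documentation: auto-generated Swagger UI
Project Structure
fastapi-todo/
├── app/
│   ├── __init__.py
│   ├── main.py          # application entry point
│   ├── database.py      # database configuration
│   ├── models.py        # SQLAlchemy models
│   ├── schemas.py       # Pydantic models
│   ├── crud.py          # database operations
│   ├── auth.py          # authentication
│   └── routers/         # routers
│       ├── users.py
│       └── tasks.py
├── tests/               # tests
├── requirements.txt
└── .env                 # environment variables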
Core Code
1. Pydantic models (schemas.py)
python
from pydantic import BaseModel, EmailStr, Field
from typing import Optional, List
from datetime import datetime
from enum import Enum
class TaskStatus(str, Enum):
"""任务状态"""
TODO = "todo"
IN_PROGRESS = "in_progress"
DONE = "done"
class TaskPriority(str, Enum):
"""优先级"""
LOW = "low"
MEDIUM = "medium"
HIGH = "high"
# 任务Schema
class TaskBase(BaseModel):
title: str = Field(..., min_length=1, max_length=200)
description: Optional[str] = None
status: TaskStatus = TaskStatus.TODO
priority: TaskPriority = TaskPriority.MEDIUM
due_date: Optional[datetime] = None
class TaskCreate(TaskBase):
pass
class TaskUpdate(BaseModel):
title: Optional[str] = Field(None, min_length=1, max_length=200)
description: Optional[str] = None
status: Optional[TaskStatus] = None
priority: Optional[TaskPriority] = None
due_date: Optional[datetime] = None
class Task(TaskBase):
    id: int
    user_id: int
    created_at: datetime
    updated_at: datetime

    class Config:
        # Pydantic v1 option name; Pydantic v2 renamed it to `from_attributes`
        orm_mode = True
# 用户Schema
class UserBase(BaseModel):
email: EmailStr
username: str = Field(..., min_length=3, max_length=50)
class UserCreate(UserBase):
password: str = Field(..., min_length=6)
class User(UserBase):
id: int
is_active: bool
created_at: datetime
class Config:
orm_mode = True
class UserInDB(User):
hashed_password: str
# Token Schema
class Token(BaseModel):
access_token: str
token_type: str
class TokenData(BaseModel):
username: Optional[str] = None
2. Authentication module (auth.py)
python
import os
import logging
from datetime import datetime, timedelta
from typing import Optional, Dict, Any
from jose import JWTError, jwt
from passlib.context import CryptContext
from fastapi import Depends, HTTPException, status
from fastapi.security import OAuth2PasswordBearer
from sqlalchemy.orm import Session
# Project-local names used by the dependencies below
# (assumes crud.py does not import this module at import time)
from app import crud
from app.database import get_db
from app.models import User

# Logging
logger = logging.getLogger(__name__)
# Security settings, read from environment variables with development defaults
# ⚠️ SECRET_KEY must be set through the environment in production
SECRET_KEY = os.getenv('SECRET_KEY', 'dev-secret-key-CHANGE-IN-PRODUCTION')
ALGORITHM = os.getenv('JWT_ALGORITHM', 'HS256')
ACCESS_TOKEN_EXPIRE_MINUTES = int(os.getenv('ACCESS_TOKEN_EXPIRE_MINUTES', '30'))
# Password hashing context
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
# OAuth2 password flow
oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
# Warn when the development default is still in use
if SECRET_KEY == 'dev-secret-key-CHANGE-IN-PRODUCTION':
    logger.warning("⚠️ Using the default SECRET_KEY; change it before deploying to production!")
def verify_password(plain_password: str, hashed_password: str) -> bool:
"""
验证密码
Args:
plain_password: 明文密码
hashed_password: 哈希后的密码
Returns:
密码是否匹配
"""
return pwd_context.verify(plain_password, hashed_password)
def get_password_hash(password: str) -> str:
"""
生成密码哈希
Args:
password: 明文密码
Returns:
哈希后的密码
"""
return pwd_context.hash(password)
def create_access_token(data: Dict[str, Any], expires_delta: Optional[timedelta] = None) -> str:
    """
    Create a JWT access token.
    Args:
        data: Claims to encode into the token (usually the user identifier)
        expires_delta: Token lifetime; None falls back to ACCESS_TOKEN_EXPIRE_MINUTES
    Returns:
        The encoded JWT string
    Example:
        >>> token = create_access_token({"sub": "username"})
    """
    to_encode = data.copy()
    # Set the expiry time, using the configured default when none is given
    if expires_delta is None:
        expires_delta = timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    expire = datetime.utcnow() + expires_delta
    to_encode.update({"exp": expire})
    # Sign the token
    encoded_jwt = jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)
    logger.debug(f"Created JWT for user {data.get('sub')}, expires at {expire}")
    return encoded_jwt
async def get_current_user(
token: str = Depends(oauth2_scheme),
db: Session = Depends(get_db)
):
"""
从JWT令牌获取当前登录用户
Args:
token: JWT令牌(从请求头自动提取)
db: 数据库会话(依赖注入)
Returns:
当前用户对象
Raises:
HTTPException: 令牌无效或用户不存在
"""
credentials_exception = HTTPException(
status_code=status.HTTP_401_UNAUTHORIZED,
detail="无法验证凭据",
headers={"WWW-Authenticate": "Bearer"},
)
try:
# 解码JWT令牌
payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
username: str = payload.get("sub")
if username is None:
logger.warning("JWT令牌中缺少用户标识")
raise credentials_exception
except JWTError as e:
logger.error(f"JWT验证失败: {e}")
raise credentials_exception
# 从数据库查询用户
user = crud.get_user_by_username(db, username=username)
if user is None:
logger.warning(f"用户不存在: {username}")
raise credentials_exception
logger.debug(f"用户认证成功: {username}")
return user
async def get_current_active_user(
current_user: User = Depends(get_current_user)
):
"""获取当前活跃用户"""
if not current_user.is_active:
raise HTTPException(status_code=400, detail="Inactive user")
return current_user
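`create_access_token` and `get_current_user` delegate signing and verification to python-jose. To make visible what an HS256 token actually consists of, here is a standard-library sketch of the same idea: a base64url-encoded header and payload joined by dots, followed by an HMAC-SHA256 signature over both. The names `sign_hs256` and `verify_hs256` are hypothetical, and the sketch is for illustration only; a real service should keep using a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_hs256(token: str, secret: str, now=None) -> dict:
    """Return the payload if signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in payload and payload["exp"] < current:
        raise ValueError("token expired")
    return payload


token = sign_hs256({"sub": "alice", "exp": int(time.time()) + 1800}, "dev-secret")
print(verify_hs256(token, "dev-secret")["sub"])
```

The payload is only encoded, not encrypted, so anyone holding the token can read the claims; the signature merely proves they were not altered.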
3. API routes (routers/tasks.py)
python
from fastapi import APIRouter, Depends, HTTPException, status, Query
from sqlalchemy.orm import Session
from typing import List, Optional
from app import schemas, crud, models
from app.database import get_db
from app.auth import get_current_active_user
router = APIRouter(
prefix="/tasks",
tags=["tasks"]
)
@router.post("/", response_model=schemas.Task, status_code=status.HTTP_201_CREATED)
def create_task(
task: schemas.TaskCreate,
db: Session = Depends(get_db),
current_user: models.User = Depends(get_current_active_user)
):
"""创建任务"""
return crud.create_task(db=db, task=task, user_id=current_user.id)
@router.get("/", response_model=List[schemas.Task])
def read_tasks(
skip: int = 0,
limit: int = Query(default=100, le=100),
status: Optional[schemas.TaskStatus] = None,
priority: Optional[schemas.TaskPriority] = None,
db: Session = Depends(get_db),
current_user: models.User = Depends(get_current_active_user)
):
"""获取任务列表"""
tasks = crud.get_tasks(
db=db,
user_id=current_user.id,
skip=skip,
limit=limit,
status=status,
priority=priority
)
return tasks
@router.get("/{task_id}", response_model=schemas.Task)
def read_task(
task_id: int,
db: Session = Depends(get_db),
current_user: models.User = Depends(get_current_active_user)
):
"""获取单个任务"""
task = crud.get_task(db=db, task_id=task_id, user_id=current_user.id)
if task is None:
raise HTTPException(status_code=404, detail="Task not found")
return task
@router.put("/{task_id}", response_model=schemas.Task)
def update_task(
task_id: int,
task_update: schemas.TaskUpdate,
db: Session = Depends(get_db),
current_user: models.User = Depends(get_current_active_user)
):
"""更新任务"""
task = crud.update_task(
db=db,
task_id=task_id,
task_update=task_update,
user_id=current_user.id
)
if task is None:
raise HTTPException(status_code=404, detail="Task not found")
return task
@router.delete("/{task_id}", status_code=status.HTTP_204_NO_CONTENT)
def delete_task(
task_id: int,
db: Session = Depends(get_db),
current_user: models.User = Depends(get_current_active_user)
):
"""删除任务"""
success = crud.delete_task(db=db, task_id=task_id, user_id=current_user.id)
if not success:
raise HTTPException(status_code=404, detail="Task not found")
return None
@router.get("/statistics/summary")
def get_task_statistics(
db: Session = Depends(get_db),
current_user: models.User = Depends(get_current_active_user)
):
"""获取任务统计"""
return crud.get_task_statistics(db=db, user_id=current_user.id)
Testing
python
# tests/test_tasks.py
from fastapi.testclient import TestClient
from app.main import app
from app.auth import create_access_token

client = TestClient(app)

def test_create_task():
    # Assumes a user named "testuser" already exists in the test database
    token = create_access_token({"sub": "testuser"})
    response = client.post(
        "/tasks/",
        json={"title": "Test Task", "status": "todo"},
        headers={"Authorization": f"Bearer {token}"}
    )
    assert response.status_code == 201
    assert response.json()["title"] == "Test Task"
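Project 3: News Crawler System
Project Overview
Build a news crawler that automatically scrapes articles from multiple news sites, stores them in a database, and provides basic data analysis.
Tech Stack
- Crawler framework: Scrapy
- HTTP libraries: requests, aiohttp
- Parsing libraries: BeautifulSoup4, lxml
- Database: MongoDB / MySQL
- Task scheduling: APScheduler
- Data analysis: pandas
Core Code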
python
import scrapy
from scrapy.crawler import CrawlerProcess
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import pymongo
class NewsSpider(scrapy.Spider):
"""新闻爬虫"""
name = 'news_spider'
allowed_domains = ['example.com']
start_urls = ['https://example.com/news']
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# MongoDB连接
self.client = pymongo.MongoClient('mongodb://localhost:27017/')
self.db = self.client['news_db']
self.collection = self.db['articles']
def parse(self, response):
"""解析新闻列表页"""
# 提取文章链接
for article in response.css('article.news-item'):
article_url = article.css('a.title::attr(href)').get()
if article_url:
yield response.follow(article_url, self.parse_article)
# 翻页
next_page = response.css('a.next-page::attr(href)').get()
if next_page:
yield response.follow(next_page, self.parse)
def parse_article(self, response):
"""解析文章详情页"""
article = {
'url': response.url,
'title': response.css('h1.article-title::text').get(),
'author': response.css('span.author::text').get(),
'publish_date': response.css('time::attr(datetime)').get(),
'content': ' '.join(response.css('div.content p::text').getall()),
'tags': response.css('a.tag::text').getall(),
'crawled_at': datetime.now()
}
# 保存到数据库
self.collection.insert_one(article)
yield article
class NewsAggregator:
    """News aggregator."""

    def __init__(self):
        self.client = pymongo.MongoClient('mongodb://localhost:27017/')
        self.db = self.client['news_db']
        self.collection = self.db['articles']

    def get_latest_news(self, limit=10):
        """Return the most recently published articles."""
        return list(self.collection.find().sort('publish_date', -1).limit(limit))

    def search_news(self, keyword):
        """Search titles and content, treating the keyword as literal text."""
        import re
        # Escape regex metacharacters so user input cannot alter the pattern
        pattern = re.escape(keyword)
        return list(self.collection.find({
            '$or': [
                {'title': {'$regex': pattern, '$options': 'i'}},
                {'content': {'$regex': pattern, '$options': 'i'}}
            ]
        }))

    def get_news_by_tag(self, tag):
        """Return articles carrying the given tag."""
        return list(self.collection.find({'tags': tag}))

    def analyze_hot_topics(self):
        """Return the ten most frequent tags."""
        from collections import Counter
        # Collect the tags of every article
        all_tags = []
        for article in self.collection.find():
            all_tags.extend(article.get('tags', []))
        counter = Counter(all_tags)
        return counter.most_common(10)
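`analyze_hot_topics` boils down to flattening every article's `tags` list and counting occurrences. The same logic can be tried without a MongoDB instance by running it over plain dictionaries. The sample articles below are made up for illustration, and the function name `hot_topics` is hypothetical.

```python
from collections import Counter


def hot_topics(articles, top_n=10):
    """Count tag occurrences across articles and return the top_n (tag, count) pairs."""
    counter = Counter()
    for article in articles:
        # Articles without a 'tags' field contribute nothing
        counter.update(article.get("tags", []))
    return counter.most_common(top_n)


sample_articles = [
    {"title": "A", "tags": ["ai", "chips"]},
    {"title": "B", "tags": ["ai", "policy"]},
    {"title": "C", "tags": ["ai", "chips", "markets"]},
    {"title": "D"},
]
print(hot_topics(sample_articles, top_n=2))
# [('ai', 3), ('chips', 2)]
```

For a large collection, loading every document into Python becomes slow; a MongoDB aggregation pipeline using `$unwind` and `$group` would likely be a better fit there.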
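Project 4: Data Visualization Dashboard
Project Overview
Build an interactive data visualization dashboard with Streamlit or Dash.
Tech Stack
- Framework: Streamlit / Dash
- Data processing: pandas, numpy
- Visualization: plotly, matplotlib
- Data sources: CSV, databases, APIs
Core Code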
python
import streamlit as st
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# 页面配置
st.set_page_config(
page_title="销售数据分析",
page_icon="📊",
layout="wide"
)
# 标题
st.title("📊 销售数据分析Dashboard")
# 侧边栏
st.sidebar.header("筛选条件")
# 加载数据
@st.cache_data
def load_data():
# 这里可以从数据库或API加载数据
df = pd.read_csv('sales_data.csv')
df['date'] = pd.to_datetime(df['date'])
return df
df = load_data()
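# Date filter
date_range = st.sidebar.date_input(
    "Select date range",
    [df['date'].min(), df['date'].max()]
)
# While only the start date has been picked, date_input returns a single-element tuple
if len(date_range) != 2:
    st.stop()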
# 地区筛选
regions = st.sidebar.multiselect(
"选择地区",
options=df['region'].unique(),
default=df['region'].unique()
)
# 过滤数据
filtered_df = df[
(df['date'] >= pd.to_datetime(date_range[0])) &
(df['date'] <= pd.to_datetime(date_range[1])) &
(df['region'].isin(regions))
]
# KPI指标
col1, col2, col3, col4 = st.columns(4)
with col1:
st.metric(
label="总销售额",
value=f"¥{filtered_df['amount'].sum():,.0f}",
delta=f"{filtered_df['amount'].sum() / df['amount'].sum() * 100:.1f}%"
)
with col2:
st.metric(
label="订单数",
value=f"{len(filtered_df):,}",
delta=f"{len(filtered_df) / len(df) * 100:.1f}%"
)
with col3:
st.metric(
label="平均订单金额",
value=f"¥{filtered_df['amount'].mean():,.0f}"
)
with col4:
st.metric(
label="客户数",
value=f"{filtered_df['customer_id'].nunique():,}"
)
# 图表
col1, col2 = st.columns(2)
with col1:
# 销售趋势
st.subheader("销售趋势")
daily_sales = filtered_df.groupby('date')['amount'].sum().reset_index()
fig = px.line(
daily_sales,
x='date',
y='amount',
title='每日销售额'
)
st.plotly_chart(fig, use_container_width=True)
with col2:
# 地区分布
st.subheader("地区分布")
region_sales = filtered_df.groupby('region')['amount'].sum().reset_index()
fig = px.pie(
region_sales,
values='amount',
names='region',
title='各地区销售占比'
)
st.plotly_chart(fig, use_container_width=True)
# 产品排行
st.subheader("产品销售排行")
product_sales = filtered_df.groupby('product')['amount'].sum().sort_values(ascending=False).head(10)
fig = px.bar(
product_sales,
orientation='h',
title='Top 10 产品'
)
st.plotly_chart(fig, use_container_width=True)
# 数据表
st.subheader("详细数据")
st.dataframe(filtered_df, use_container_width=True)
# 下载数据
csv = filtered_df.to_csv(index=False).encode('utf-8')
st.download_button(
label="下载CSV",
data=csv,
file_name='filtered_sales_data.csv',
mime='text/csv'
)
Project 5: Command-Line Tool (CLI)
Project Overview
Build a polished command-line tool with Click or argparse.
Core Code
python
import click
import requests
from pathlib import Path
import json
@click.group()
@click.version_option(version='1.0.0')
def cli():
"""我的CLI工具"""
pass
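@cli.command()
@click.argument('url')
@click.option('--output', '-o', default='output.html', help='Output file name')
@click.option('--verbose', '-v', is_flag=True, help='Verbose output')
def download(url, output, verbose):
    """Download a web page."""
    if verbose:
        click.echo(f'Downloading: {url}')
    try:
        # A timeout keeps the command from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        Path(output).write_text(response.text, encoding='utf-8')
        click.secho(f'✓ Saved to: {output}', fg='green')
    except (requests.RequestException, OSError) as e:
        click.secho(f'✗ Download failed: {e}', fg='red', err=True)
        # Exit with a non-zero status so scripts can detect the failure
        raise SystemExit(1)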
@cli.command()
@click.argument('directory', type=click.Path(exists=True))
@click.option('--extension', '-e', default='.txt', help='文件扩展名')
def count_files(directory, extension):
"""统计文件数量"""
path = Path(directory)
files = list(path.rglob(f'*{extension}'))
click.echo(f'找到 {len(files)} 个 {extension} 文件')
for file in files:
click.echo(f' - {file}')
if __name__ == '__main__':
cli()
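The project overview names argparse as an alternative to Click. As a rough comparison, the `count_files` command could be written with only the standard library, as sketched below. The program name `mycli`, the subcommand name `count-files`, and the `main(argv)` entry point are choices made for this example.

```python
import argparse
from pathlib import Path


def count_files(directory: Path, extension: str) -> list:
    """Return all files under `directory` (recursively) ending in `extension`."""
    return sorted(p for p in directory.rglob(f"*{extension}") if p.is_file())


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mycli", description="Example CLI built with argparse")
    subparsers = parser.add_subparsers(dest="command", required=True)
    count = subparsers.add_parser("count-files", help="Count files with a given extension")
    count.add_argument("directory", type=Path)
    count.add_argument("--extension", "-e", default=".txt", help="File extension (default: .txt)")
    return parser


def main(argv=None) -> int:
    """Parse arguments and run the chosen subcommand; returns the exit status."""
    args = build_parser().parse_args(argv)
    if args.command == "count-files":
        if not args.directory.is_dir():
            print(f"Not a directory: {args.directory}")
            return 2
        files = count_files(args.directory, args.extension)
        print(f"Found {len(files)} {args.extension} file(s)")
        for file in files:
            print(f"  - {file}")
    return 0

# In a script, finish with: raise SystemExit(main())
```

Click handles the path validation and help formatting through decorators, while argparse needs them spelled out by hand; in exchange, argparse adds no third-party dependency.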
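Project 6: Data Analysis in Jupyter Notebook
Project Overview
Carry out a complete data analysis workflow in Jupyter Notebook: loading data, cleaning, exploratory analysis, visualization, and simple machine learning modeling.
Tech Stack
- Environment: Jupyter Notebook / JupyterLab
- Data processing: pandas, numpy
- Visualization: matplotlib, seaborn, plotly
- Machine learning: scikit-learn
- Statistical analysis: scipy, statsmodels
Feature Requirements
Core Features
- Data loading and preview
- Data cleaning (missing values, duplicates, outliers)
- Exploratory data analysis (EDA)
- Data visualization
- Feature engineering
- A simple predictive model
Core Implementation
1. Environment setup and imports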
python
# notebook_analysis.ipynb
# 导入必要的库
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')
# 设置可视化样式
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline
# 设置显示选项
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)
pd.set_option('display.float_format', '{:.2f}'.format)
print("✅ 环境配置完成!")
2. Data loading and first look
python
# 加载数据(以房价数据为例)
def load_and_preview_data(file_path):
"""
加载数据并进行初步预览
"""
# 读取数据
df = pd.read_csv(file_path)
print("=" * 60)
print("📊 数据基本信息")
print("=" * 60)
print(f"数据形状: {df.shape}")
print(f"行数: {df.shape[0]:,}, 列数: {df.shape[1]}")
print("\n" + "=" * 60)
print("📋 前5行数据")
print("=" * 60)
display(df.head())
print("\n" + "=" * 60)
print("ℹ️ 数据类型")
print("=" * 60)
display(df.dtypes)
print("\n" + "=" * 60)
print("📈 统计摘要")
print("=" * 60)
display(df.describe())
print("\n" + "=" * 60)
print("⚠️ 缺失值统计")
print("=" * 60)
missing = df.isnull().sum()
missing_pct = 100 * missing / len(df)
missing_table = pd.DataFrame({
'缺失数量': missing,
'缺失比例(%)': missing_pct
})
display(missing_table[missing_table['缺失数量'] > 0].sort_values('缺失数量', ascending=False))
return df
# 使用示例
df = load_and_preview_data('house_prices.csv')
3. Data cleaning function
python
def clean_data(df):
    """Data cleaning pipeline."""
    print("🧹 Starting data cleaning...\n")
    # 1. Drop fully duplicated rows
    before = len(df)
    df = df.drop_duplicates()
    after = len(df)
    print(f"✓ Removed duplicate rows: {before - after}")
    # 2. Handle missing values
    # Numeric columns: fill with the median
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        if df[col].isnull().sum() > 0:
            median_val = df[col].median()
            # Assign back; chained fillna(inplace=True) is deprecated in recent pandas
            df[col] = df[col].fillna(median_val)
            print(f"✓ {col}: filled missing values with median {median_val:.2f}")
    # Categorical columns: fill with the mode
    categorical_cols = df.select_dtypes(include=['object']).columns
    for col in categorical_cols:
        if df[col].isnull().sum() > 0:
            mode_val = df[col].mode()[0]
            df[col] = df[col].fillna(mode_val)
            print(f"✓ {col}: filled missing values with mode '{mode_val}'")
    # 3. Detect and handle outliers (IQR method)
    for col in numeric_cols:
        Q1 = df[col].quantile(0.25)
        Q3 = df[col].quantile(0.75)
        IQR = Q3 - Q1
        lower_bound = Q1 - 1.5 * IQR
        upper_bound = Q3 + 1.5 * IQR
        outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
        if len(outliers) > 0:
            # Clip outliers to the boundary values
            df[col] = df[col].clip(lower_bound, upper_bound)
            print(f"✓ {col}: clipped {len(outliers)} outliers")
    print(f"\n✅ Data cleaning finished! Final shape: {df.shape}")
    return df

# Clean the data
df_clean = clean_data(df.copy())
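The outlier step in `clean_data` uses the IQR rule: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are clipped to those bounds. A small worked example with the standard library makes the arithmetic easy to check by hand. The sample prices are invented, the helper names are hypothetical, and `method="inclusive"` is chosen because it interpolates linearly between data points, which should correspond to the default behaviour of pandas' `quantile`.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences computed from the quartiles of `values`."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Replace values outside the IQR fences with the nearest fence."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]


prices = [10, 12, 13, 14, 15, 16, 18, 100]
print(iqr_bounds(prices))     # (7.125, 22.125): Q1 = 12.75, Q3 = 16.5, IQR = 3.75
print(clip_outliers(prices))  # the 100 is pulled down to 22.125
```

Clipping keeps the row count unchanged, which is why `clean_data` prefers it over dropping rows; the cost is a small pile-up of values exactly at the fences.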
4. Exploratory data analysis (EDA)
python
def exploratory_analysis(df, target_col='price'):
"""
探索性数据分析
"""
# 1. 目标变量分布
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# 直方图
axes[0].hist(df[target_col], bins=50, edgecolor='black', alpha=0.7)
axes[0].set_title(f'{target_col} 分布(直方图)', fontsize=14)
axes[0].set_xlabel(target_col)
axes[0].set_ylabel('频数')
axes[0].grid(True, alpha=0.3)
# 箱线图
axes[1].boxplot(df[target_col], vert=True)
axes[1].set_title(f'{target_col} 分布(箱线图)', fontsize=14)
axes[1].set_ylabel(target_col)
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 2. 相关性热力图
numeric_df = df.select_dtypes(include=[np.number])
plt.figure(figsize=(12, 10))
correlation_matrix = numeric_df.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
fmt='.2f', square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('特征相关性热力图', fontsize=16, pad=20)
plt.tight_layout()
plt.show()
# 3. 分类变量分析
categorical_cols = df.select_dtypes(include=['object']).columns[:4] # 取前4个
if len(categorical_cols) > 0:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.ravel()
for idx, col in enumerate(categorical_cols):
value_counts = df[col].value_counts().head(10)
axes[idx].barh(value_counts.index, value_counts.values)
axes[idx].set_title(f'{col} 分布', fontsize=12)
axes[idx].set_xlabel('数量')
axes[idx].grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
# 4. 数值特征与目标变量的关系
numeric_features = numeric_df.columns.drop(target_col)[:6] # 取前6个
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.ravel()
for idx, feature in enumerate(numeric_features):
axes[idx].scatter(df[feature], df[target_col], alpha=0.5)
axes[idx].set_xlabel(feature)
axes[idx].set_ylabel(target_col)
axes[idx].set_title(f'{feature} vs {target_col}')
axes[idx].grid(True, alpha=0.3)
# 添加趋势线
z = np.polyfit(df[feature], df[target_col], 1)
p = np.poly1d(z)
axes[idx].plot(df[feature], p(df[feature]), "r--", alpha=0.8, linewidth=2)
plt.tight_layout()
plt.show()
# 执行探索性分析
exploratory_analysis(df_clean, target_col='price')
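5. Feature engineering
python
def feature_engineering(df, target_col='price'):
    """Feature engineering."""
    print("🔧 Starting feature engineering...\n")
    df_fe = df.copy()
    # 1. Create new features
    # Example: with area and bedroom count, derive area per bedroom
    # (the +1 avoids division by zero)
    if 'area' in df_fe.columns and 'bedrooms' in df_fe.columns:
        df_fe['area_per_bedroom'] = df_fe['area'] / (df_fe['bedrooms'] + 1)
        print("✓ Created feature: area_per_bedroom")
    # 2. Log transform for skewed distributions
    # The target is excluded: a log of the target kept as a feature would leak it into the model
    numeric_cols = df_fe.select_dtypes(include=[np.number]).columns.drop(target_col, errors='ignore')
    for col in numeric_cols:
        if df_fe[col].min() > 0:  # only for strictly positive values
            skewness = df_fe[col].skew()
            if abs(skewness) > 1:  # absolute skewness above 1
                df_fe[f'{col}_log'] = np.log1p(df_fe[col])
                print(f"✓ Log-transformed {col} (skewness: {skewness:.2f})")
    # 3. One-hot encoding
    categorical_cols = df_fe.select_dtypes(include=['object']).columns
    if len(categorical_cols) > 0:
        df_fe = pd.get_dummies(df_fe, columns=categorical_cols, drop_first=True)
        print(f"✓ One-hot encoded {len(categorical_cols)} categorical columns")
    print(f"\n✅ Feature engineering finished! Column count: {df_fe.shape[1]}")
    return df_fe

# Apply feature engineering
df_featured = feature_engineering(df_clean)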
6. Modeling and evaluation
python
def build_and_evaluate_model(df, target_col='price'):
"""
构建和评估模型
"""
print("🤖 开始建模...\n")
# 准备数据
X = df.drop(columns=[target_col])
y = df[target_col]
# 划分训练集和测试集
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# 标准化
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# 训练模型
model = LinearRegression()
model.fit(X_train_scaled, y_train)
# 预测
y_train_pred = model.predict(X_train_scaled)
y_test_pred = model.predict(X_test_scaled)
# 评估
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)
train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))
print("📊 模型评估结果")
print("=" * 60)
print(f"训练集 R²: {train_r2:.4f}")
print(f"测试集 R²: {test_r2:.4f}")
print(f"训练集 RMSE: {train_rmse:,.2f}")
print(f"测试集 RMSE: {test_rmse:,.2f}")
# 可视化预测结果
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# 训练集
axes[0].scatter(y_train, y_train_pred, alpha=0.5)
axes[0].plot([y_train.min(), y_train.max()],
[y_train.min(), y_train.max()], 'r--', lw=2)
axes[0].set_xlabel('实际值')
axes[0].set_ylabel('预测值')
axes[0].set_title(f'训练集预测 (R²={train_r2:.4f})')
axes[0].grid(True, alpha=0.3)
# 测试集
axes[1].scatter(y_test, y_test_pred, alpha=0.5, color='green')
axes[1].plot([y_test.min(), y_test.max()],
[y_test.min(), y_test.max()], 'r--', lw=2)
axes[1].set_xlabel('实际值')
axes[1].set_ylabel('预测值')
axes[1].set_title(f'测试集预测 (R²={test_r2:.4f})')
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
# 特征重要性
feature_importance = pd.DataFrame({
'feature': X.columns,
'importance': abs(model.coef_)
}).sort_values('importance', ascending=False).head(10)
plt.figure(figsize=(10, 6))
plt.barh(feature_importance['feature'], feature_importance['importance'])
plt.xlabel('重要性(绝对系数值)')
plt.title('Top 10 特征重要性')
plt.gca().invert_yaxis()
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()
return model, scaler
# 建模
model, scaler = build_and_evaluate_model(df_featured, target_col='price')
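The evaluation step reports R² and RMSE through scikit-learn. Both are short formulas: RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The plain-Python sketch below computes them on a tiny invented data set so the numbers can be verified by hand; the helper names `rmse` and `r2` are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3, 5, 7, 9]
y_pred = [2.5, 5, 7.5, 9]
# SS_res = 0.25 + 0 + 0.25 + 0 = 0.5; SS_tot = 9 + 1 + 1 + 9 = 20
print(round(r2(y_true, y_pred), 4))    # 0.975
print(round(rmse(y_true, y_pred), 4))  # 0.3536
```

RMSE is expressed in the units of the target, so it reads as a typical prediction error in price terms, while R² is unitless and compares the model against always predicting the mean.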
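Project 7: Excel Automation Tool
Project Overview
Batch-process Excel files and generate reports and charts automatically. The tool reads, analyzes, and transforms Excel data, then builds formatted reports and charts.
Tech Stack
- Excel handling: openpyxl, xlrd, xlwt
- Data processing: pandas
- Date handling: datetime
- File operations: pathlib
Core Implementation
1. Environment setup and imports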
python
import pandas as pd
import openpyxl
from openpyxl import Workbook, load_workbook
from openpyxl.chart import BarChart, PieChart, LineChart, Reference
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openpyxl.utils import get_column_letter
from pathlib import Path
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')
print("📦 Excel自动化工具 - 环境准备完成")
2. Reading and previewing Excel files
python
class ExcelReader:
    """Excel file reader."""

    def __init__(self, file_path):
        self.file_path = Path(file_path)
        self.wb = None
        self.df_dict = {}

    def load_workbook(self):
        """Load the workbook with openpyxl (formulas are read as cached values)."""
        # Compare the suffix case-insensitively so 'REPORT.XLSX' is accepted too
        if self.file_path.suffix.lower() in ['.xlsx', '.xlsm']:
            self.wb = load_workbook(self.file_path, data_only=True)
            print(f"✓ Workbook loaded: {self.file_path.name}")
            print(f"  Sheets: {self.wb.sheetnames}")
        else:
            raise ValueError("Only .xlsx and .xlsm files are supported")
        return self

    def read_sheet(self, sheet_name=None, header_row=0):
        """Read one worksheet into a DataFrame."""
        if sheet_name is None:
            # Default to the first sheet; position 0 also works when
            # load_workbook() has not been called yet
            sheet_name = self.wb.sheetnames[0] if self.wb else 0
        df = pd.read_excel(self.file_path, sheet_name=sheet_name, header=header_row)
        self.df_dict[sheet_name] = df
        print(f"\n📋 Sheet: {sheet_name}")
        print(f"  Shape: {df.shape}")
        print(f"  Columns: {list(df.columns)[:5]}...")
        print("\nFirst 3 rows:")
        print(df.head(3))
        return df

    def get_sheet_info(self, sheet_name):
        """Return basic facts about a worksheet (requires load_workbook() first)."""
        ws = self.wb[sheet_name]
        info = {
            'sheet': sheet_name,
            'rows': ws.max_row,
            'columns': ws.max_column,
            'used_range': f"{ws.dimensions}"
        }
        return info


# Usage example
reader = ExcelReader('sales_data.xlsx')
reader.load_workbook()
df = reader.read_sheet('销售数据')
3. Data Processing and Analysis
python
class ExcelDataProcessor:
    """Excel data processor."""

    def __init__(self, df):
        self.df = df.copy()

    def clean_data(self):
        """Clean the data: drop empty rows, drop duplicates, parse date columns."""
        print("\n🧹 Cleaning data...")
        # Drop rows that are completely empty
        before = len(self.df)
        self.df.dropna(how='all', inplace=True)
        print(f"✓ Empty rows removed: {before - len(self.df)}")
        # Drop duplicate rows
        before = len(self.df)
        self.df.drop_duplicates(inplace=True)
        print(f"✓ Duplicate rows removed: {before - len(self.df)}")
        # Parse date columns; str(col) keeps this safe for non-string column labels
        date_cols = [col for col in self.df.columns
                     if '日期' in str(col) or 'date' in str(col).lower()]
        for col in date_cols:
            self.df[col] = pd.to_datetime(self.df[col], errors='coerce')
            print(f"✓ Date column converted: {col}")
        return self

    def calculate_summary(self, group_by_col, value_col):
        """Compute summary statistics per group."""
        summary = self.df.groupby(group_by_col)[value_col].agg([
            ('总计', 'sum'),
            ('平均', 'mean'),
            ('最大', 'max'),
            ('最小', 'min'),
            ('计数', 'count')
        ]).round(2)
        print(f"\n📊 Summary of {value_col} by {group_by_col}:")
        print(summary)
        return summary

    def create_pivot_table(self, index, columns, values, aggfunc='sum'):
        """Create a pivot table."""
        pivot = pd.pivot_table(
            self.df,
            index=index,
            columns=columns,
            values=values,
            aggfunc=aggfunc,
            fill_value=0
        )
        print(f"\n📈 Pivot table ({index} x {columns}):")
        print(pivot.head())
        return pivot

    def add_calculated_columns(self):
        """Add derived columns."""
        # Example: derive the unit price when sales amount and quantity exist
        if '销售额' in self.df.columns and '数量' in self.df.columns:
            # Treat a quantity of 0 as missing so the division yields NaN, not inf
            quantity = self.df['数量'].replace(0, float('nan'))
            self.df['单价'] = (self.df['销售额'] / quantity).round(2)
            print("✓ Calculated column added: 单价")
        # Example: derive a year-month column from every datetime column
        date_cols = [col for col in self.df.columns
                     if pd.api.types.is_datetime64_any_dtype(self.df[col])]
        for col in date_cols:
            self.df[f'{col}_年月'] = self.df[col].dt.to_period('M').astype(str)
            print(f"✓ Month column added: {col}_年月")
        return self


# Usage example
processor = ExcelDataProcessor(df)
processor.clean_data().add_calculated_columns()
summary = processor.calculate_summary('产品', '销售额')
4. Excel Report Generator
python
class ExcelReportGenerator:
"""Excel报表生成器"""
def __init__(self, output_path='report.xlsx'):
self.wb = Workbook()
self.wb.remove(self.wb.active) # 删除默认sheet
self.output_path = Path(output_path)
# 定义样式
self.header_font = Font(name='微软雅黑', size=11, bold=True, color='FFFFFF')
self.header_fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
self.header_alignment = Alignment(horizontal='center', vertical='center')
self.cell_border = Border(
left=Side(style='thin', color='D3D3D3'),
right=Side(style='thin', color='D3D3D3'),
top=Side(style='thin', color='D3D3D3'),
bottom=Side(style='thin', color='D3D3D3')
)
def create_data_sheet(self, sheet_name, df, add_summary=True):
"""创建数据工作表"""
ws = self.wb.create_sheet(sheet_name)
# 写入标题行
for col_idx, column in enumerate(df.columns, 1):
cell = ws.cell(row=1, column=col_idx, value=column)
cell.font = self.header_font
cell.fill = self.header_fill
cell.alignment = self.header_alignment
cell.border = self.cell_border
# 写入数据
for row_idx, row_data in enumerate(df.values, 2):
for col_idx, value in enumerate(row_data, 1):
cell = ws.cell(row=row_idx, column=col_idx, value=value)
cell.border = self.cell_border
# 数值格式化
if isinstance(value, (int, float)):
cell.number_format = '#,##0.00'
# 自动调整列宽
for col_idx in range(1, len(df.columns) + 1):
col_letter = get_column_letter(col_idx)
max_length = max(
len(str(df.columns[col_idx - 1])),
df.iloc[:, col_idx - 1].astype(str).str.len().max()
)
ws.column_dimensions[col_letter].width = min(max_length + 2, 50)
# 冻结首行
ws.freeze_panes = 'A2'
# 添加汇总行
if add_summary and len(df) > 0:
summary_row = len(df) + 2
ws.cell(row=summary_row, column=1, value='合计').font = Font(bold=True)
for col_idx in range(2, len(df.columns) + 1):
if df.iloc[:, col_idx - 1].dtype in ['int64', 'float64']:
cell = ws.cell(row=summary_row, column=col_idx)
cell.value = f"=SUM({get_column_letter(col_idx)}2:{get_column_letter(col_idx)}{len(df) + 1})"
cell.font = Font(bold=True)
cell.number_format = '#,##0.00'
print(f"✓ 创建数据表: {sheet_name} ({df.shape[0]}行 x {df.shape[1]}列)")
return ws
def add_bar_chart(self, ws, data_range, chart_title, position='H2'):
"""添加柱状图"""
chart = BarChart()
chart.title = chart_title
chart.style = 10
chart.width = 15
chart.height = 10
data = Reference(ws, min_col=data_range['min_col'], min_row=data_range['min_row'],
max_col=data_range['max_col'], max_row=data_range['max_row'])
cats = Reference(ws, min_col=data_range['cat_col'], min_row=data_range['min_row'] + 1,
max_row=data_range['max_row'])
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
ws.add_chart(chart, position)
print(f"✓ 添加柱状图: {chart_title}")
def add_pie_chart(self, ws, data_range, chart_title, position='H18'):
"""添加饼图"""
chart = PieChart()
chart.title = chart_title
chart.width = 15
chart.height = 10
data = Reference(ws, min_col=data_range['min_col'], min_row=data_range['min_row'],
max_row=data_range['max_row'])
cats = Reference(ws, min_col=data_range['cat_col'], min_row=data_range['min_row'] + 1,
max_row=data_range['max_row'])
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
ws.add_chart(chart, position)
print(f"✓ 添加饼图: {chart_title}")
def save(self):
"""保存工作簿"""
self.wb.save(self.output_path)
print(f"\n💾 报表已保存: {self.output_path.absolute()}")
# 使用示例
report = ExcelReportGenerator('销售报表.xlsx')
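# Create the data sheet. reset_index() turns the group labels into column A,
# which is the category column the charts below refer to (cat_col=1).
ws = report.create_data_sheet('销售汇总', summary.reset_index())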
# 添加图表
chart_range = {
'min_col': 2, 'max_col': 2,
'min_row': 1, 'max_row': len(summary) + 1,
'cat_col': 1
}
report.add_bar_chart(ws, chart_range, '销售额对比', 'E2')
report.add_pie_chart(ws, chart_range, '销售额占比', 'E18')
report.save()
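The `chart_range` dictionary above is easy to get off by one: the header row belongs to the data reference (it supplies the series title), while the categories start one row lower. A small helper can derive the dictionary from the number of data rows. This is only a sketch; `make_chart_range` is a hypothetical name, not part of openpyxl, and it assumes the layout written by `create_data_sheet` (header in row 1, data from row 2, categories in column 1).

```python
def make_chart_range(n_rows, value_col=2, cat_col=1, header_row=1):
    """Build the dict expected by add_bar_chart / add_pie_chart.

    n_rows is the number of data rows, excluding the header
    (and excluding any summary row appended below the data).
    """
    if n_rows < 1:
        raise ValueError("a chart needs at least one data row")
    return {
        'min_col': value_col,
        'max_col': value_col,
        'min_row': header_row,           # header row supplies the series title
        'max_row': header_row + n_rows,  # last data row
        'cat_col': cat_col,
    }


# Hypothetical usage: three products summarised in rows 2-4
chart_range = make_chart_range(3)
print(chart_range)
```

Because `max_row` stops at the last data row, the "合计" summary row that `create_data_sheet` appends is left out of the chart.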
5. Batch Processing Multiple Excel Files
python
class BatchExcelProcessor:
"""批量Excel文件处理器"""
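    def __init__(self, input_folder, output_folder='output'):
        self.input_folder = Path(input_folder)
        self.output_folder = Path(output_folder)
        self.output_folder.mkdir(parents=True, exist_ok=True)
        # Collect Excel files, skipping the '~$' lock files that Excel creates
        # while a workbook is open (reading .xls additionally requires xlrd)
        candidates = list(self.input_folder.glob('*.xlsx')) + \
            list(self.input_folder.glob('*.xls'))
        self.excel_files = sorted(p for p in candidates if not p.name.startswith('~$'))
        print(f"📁 Found {len(self.excel_files)} Excel file(s)")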
def merge_all_files(self, sheet_name=0):
"""合并所有Excel文件"""
print("\n🔄 开始合并文件...")
all_data = []
for file_path in self.excel_files:
try:
df = pd.read_excel(file_path, sheet_name=sheet_name)
df['源文件'] = file_path.name
all_data.append(df)
print(f"✓ 读取: {file_path.name} ({len(df)} 行)")
except Exception as e:
print(f"✗ 跳过: {file_path.name} - {e}")
if all_data:
merged_df = pd.concat(all_data, ignore_index=True)
output_path = self.output_folder / f'合并结果_{datetime.now():%Y%m%d_%H%M%S}.xlsx'
merged_df.to_excel(output_path, index=False)
print(f"\n✅ 合并完成: {len(merged_df)} 行数据")
print(f"💾 保存到: {output_path}")
return merged_df
else:
print("❌ 没有可合并的数据")
return None
    def process_each_file(self, process_func):
        """Apply process_func to every file and save each result."""
        print("\n🔄 Processing files...")
        results = []
        for file_path in self.excel_files:
            try:
                df = pd.read_excel(file_path)
                processed_df = process_func(df)
                # Always write .xlsx: pandas 2.0+ can no longer write the legacy .xls format
                output_path = self.output_folder / f"processed_{file_path.stem}.xlsx"
                processed_df.to_excel(output_path, index=False)
                results.append({
                    'file': file_path.name,
                    'status': 'ok',
                    'rows': len(processed_df)
                })
                print(f"✓ Processed: {file_path.name}")
            except Exception as e:
                results.append({
                    'file': file_path.name,
                    'status': f'failed: {e}',
                    'rows': 0
                })
                print(f"✗ Failed: {file_path.name} - {e}")
        # Write a processing report
        report_df = pd.DataFrame(results)
        report_path = self.output_folder / f'处理报告_{datetime.now():%Y%m%d_%H%M%S}.xlsx'
        report_df.to_excel(report_path, index=False)
        print(f"\n📋 Processing report: {report_path}")
        return report_df
def generate_consolidated_report(self):
"""生成汇总报表"""
print("\n📊 生成汇总报表...")
report_wb = Workbook()
report_wb.remove(report_wb.active)
for idx, file_path in enumerate(self.excel_files, 1):
try:
df = pd.read_excel(file_path)
# 创建工作表(限制名称长度)
sheet_name = file_path.stem[:25]
ws = report_wb.create_sheet(sheet_name)
# 写入数据
for r_idx, row in enumerate([df.columns.tolist()] + df.values.tolist(), 1):
for c_idx, value in enumerate(row, 1):
ws.cell(row=r_idx, column=c_idx, value=value)
print(f"✓ [{idx}/{len(self.excel_files)}] {file_path.name}")
except Exception as e:
print(f"✗ 跳过: {file_path.name} - {e}")
output_path = self.output_folder / f'汇总报表_{datetime.now():%Y%m%d_%H%M%S}.xlsx'
report_wb.save(output_path)
print(f"\n💾 汇总报表: {output_path}")
# 使用示例
batch = BatchExcelProcessor('data', 'output')
# 合并所有文件
merged_data = batch.merge_all_files()
# 批量处理
def my_process(df):
"""自定义处理函数"""
# 添加处理时间列
df['处理时间'] = datetime.now()
# 其他处理...
return df
batch.process_each_file(my_process)
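`generate_consolidated_report` derives each sheet title from the file name with `file_path.stem[:25]`. Excel limits worksheet titles to 31 characters and rejects the characters `\ / * ? : [ ]`, so a file name containing one of them can make sheet creation fail. The helper below is a minimal sketch of one way to normalise titles first; `safe_sheet_name` and the `_1`, `_2` suffix scheme are assumptions of this example, not openpyxl behaviour.

```python
import re

_INVALID_SHEET_CHARS = re.compile(r'[\\/*?:\[\]]')
MAX_SHEET_NAME_LEN = 31  # Excel's limit for worksheet titles


def safe_sheet_name(raw_name, used_names):
    """Return a valid, not yet used worksheet title derived from raw_name.

    used_names is a set of lower-cased titles already taken; it is updated in place.
    """
    base = _INVALID_SHEET_CHARS.sub('_', raw_name).strip() or 'Sheet'
    name = base[:MAX_SHEET_NAME_LEN]
    counter = 1
    # Excel compares titles case-insensitively, so track lower-cased names
    while name.lower() in used_names:
        suffix = f'_{counter}'
        name = base[:MAX_SHEET_NAME_LEN - len(suffix)] + suffix
        counter += 1
    used_names.add(name.lower())
    return name


used = set()
print(safe_sheet_name('2024/01 sales', used))
print(safe_sheet_name('2024/01 sales', used))
```

In the batch processor this could replace the slicing expression, with one `used` set created per report workbook.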
6. Complete Usage Example
python
def main():
"""完整工作流示例"""
print("=" * 60)
print("Excel自动化工具 - 完整演示")
print("=" * 60)
# 1. 读取数据
reader = ExcelReader('sales_data.xlsx')
reader.load_workbook()
df = reader.read_sheet('销售数据')
# 2. 数据处理
processor = ExcelDataProcessor(df)
processor.clean_data().add_calculated_columns()
# 3. 数据分析
summary = processor.calculate_summary('产品类别', '销售额')
pivot = processor.create_pivot_table('产品类别', '月份', '销售额')
# 4. 生成报表
report = ExcelReportGenerator('销售分析报表.xlsx')
# 原始数据表
report.create_data_sheet('原始数据', processor.df, add_summary=True)
# 汇总表
ws_summary = report.create_data_sheet('产品汇总', summary.reset_index())
chart_range = {
'min_col': 2, 'max_col': 2,
'min_row': 1, 'max_row': len(summary) + 1,
'cat_col': 1
}
report.add_bar_chart(ws_summary, chart_range, '销售额对比')
# 透视表
report.create_data_sheet('月度透视', pivot.reset_index())
report.save()
print("\n" + "=" * 60)
print("✅ 所有任务完成!")
print("=" * 60)
if __name__ == '__main__':
main()
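Project 8: Email Automation System
Project Overview
An automated email system that supports HTML templates, attachments, bulk sending, and scheduled jobs. It can be used for notification, report, and marketing emails.
Tech Stack
- Email sending: smtplib, email
- Template engine: Jinja2
- Task scheduling: schedule, APScheduler
- Data processing: pandas
- Configuration management: configparser
Core Implementation
1. Basic Email Sender Class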
python
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication
from email.header import Header
from pathlib import Path
from typing import List, Optional
import configparser
import os
import logging
# 配置日志
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
class EmailSender:
"""邮件发送器"""
def __init__(self, smtp_server, smtp_port, username, password, use_ssl=True):
self.smtp_server = smtp_server
self.smtp_port = smtp_port
self.username = username
self.password = password
self.use_ssl = use_ssl
def send_email(self,
to_addrs: List[str],
subject: str,
content: str,
content_type: str = 'plain',
cc_addrs: Optional[List[str]] = None,
bcc_addrs: Optional[List[str]] = None,
attachments: Optional[List[str]] = None,
from_name: Optional[str] = None):
"""
发送邮件
Args:
to_addrs: 收件人列表
subject: 邮件主题
content: 邮件内容
content_type: 内容类型 ('plain' 或 'html')
cc_addrs: 抄送列表
bcc_addrs: 密送列表
attachments: 附件文件路径列表
from_name: 发件人显示名称
"""
        try:
            # Build the message container
            msg = MIMEMultipart()
            # Sender: RFC 2047-encode the display name so non-ASCII names survive
            if from_name:
                msg['From'] = f"{Header(from_name, 'utf-8').encode()} <{self.username}>"
            else:
                msg['From'] = self.username
            # Recipients
            msg['To'] = ', '.join(to_addrs)
            # Cc
            if cc_addrs:
                msg['Cc'] = ', '.join(cc_addrs)
            # Subject
            msg['Subject'] = Header(subject, 'utf-8')
            # Body
            msg.attach(MIMEText(content, content_type, 'utf-8'))
            # Attachments
            if attachments:
                for file_path in attachments:
                    self._add_attachment(msg, file_path)
            # Envelope recipients: To + Cc + Bcc (Bcc never appears in the headers)
            all_recipients = list(to_addrs)
            if cc_addrs:
                all_recipients.extend(cc_addrs)
            if bcc_addrs:
                all_recipients.extend(bcc_addrs)
            # Connect and send; the with-block closes the connection even on errors
            if self.use_ssl:
                server = smtplib.SMTP_SSL(self.smtp_server, self.smtp_port)
            else:
                server = smtplib.SMTP(self.smtp_server, self.smtp_port)
            with server:
                if not self.use_ssl:
                    server.starttls()
                server.login(self.username, self.password)
                server.sendmail(self.username, all_recipients, msg.as_string())
            print(f"✅ Email sent: {subject}")
            print(f"   To: {', '.join(to_addrs)}")
            return True
        except Exception as e:
            print(f"❌ Failed to send email: {e}")
            return False
def _add_attachment(self, msg, file_path):
"""添加附件"""
file_path = Path(file_path)
if not file_path.exists():
print(f"⚠️ 附件不存在: {file_path}")
return
with open(file_path, 'rb') as f:
attachment = MIMEApplication(f.read())
attachment.add_header(
'Content-Disposition',
'attachment',
filename=('utf-8', '', file_path.name)
)
msg.attach(attachment)
print(f"📎 添加附件: {file_path.name}")
# 使用示例(使用环境变量,更安全)
# ⚠️ 使用前请设置环境变量或创建.env文件
# 示例 .env 文件内容:
# SMTP_SERVER=smtp.gmail.com
# SMTP_PORT=465
# EMAIL_USERNAME=your_email@gmail.com
# EMAIL_PASSWORD=your_app_password
# EMAIL_USE_SSL=True
sender = EmailSender(
smtp_server=os.getenv('SMTP_SERVER', 'smtp.gmail.com'),
smtp_port=int(os.getenv('SMTP_PORT', '465')),
username=os.getenv('EMAIL_USERNAME'),
password=os.getenv('EMAIL_PASSWORD'),
use_ssl=os.getenv('EMAIL_USE_SSL', 'True').lower() == 'true'
)
# 验证必需的环境变量
if not sender.username or not sender.password:
logger.error("❌ 邮件配置错误:未设置 EMAIL_USERNAME 或 EMAIL_PASSWORD 环境变量")
logger.info("💡 请设置环境变量或使用配置文件(见下方 EmailConfig 类)")
else:
sender.send_email(
to_addrs=['recipient@example.com'],
subject='测试邮件',
content='这是一封测试邮件',
content_type='plain'
)
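For comparison, the same message can be assembled with the standard library's newer `email.message.EmailMessage` API, which takes care of header and body encoding. The sketch below only builds the message and does not send it; `build_message` is a hypothetical helper, not part of `EmailSender`, and the addresses are placeholders.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_message(sender_addr, to_addrs, subject, body,
                  cc_addrs=None, from_name=None, html=False):
    """Build an EmailMessage; Bcc recipients are deliberately kept out of the headers."""
    msg = EmailMessage()
    # formataddr encodes a non-ASCII display name as required by RFC 2047
    msg['From'] = formataddr((from_name, sender_addr)) if from_name else sender_addr
    msg['To'] = ', '.join(to_addrs)
    if cc_addrs:
        msg['Cc'] = ', '.join(cc_addrs)
    msg['Subject'] = subject
    msg.set_content(body, subtype='html' if html else 'plain')
    return msg


message = build_message(
    'sender@example.com',
    ['a@example.com', 'b@example.com'],
    'Test message',
    '<p>Hello</p>',
    from_name='Reports',
    html=True,
)
print(message['To'])
print(message.get_content_type())
```

Bcc addresses are passed only as envelope recipients when sending (as `send_email` does with `all_recipients`), which is why they are not written into the message here.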
2. HTML Email Template System
python
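from jinja2 import Environment, FileSystemLoader, select_autoescape
import pandas as pd
class EmailTemplateEngine:
    """Email template engine."""

    def __init__(self, template_dir='templates'):
        self.template_dir = Path(template_dir)
        self.template_dir.mkdir(parents=True, exist_ok=True)
        # Jinja2 environment; autoescape keeps values such as spreadsheet cells
        # from being interpreted as HTML in the rendered email
        self.env = Environment(
            loader=FileSystemLoader(str(self.template_dir)),
            autoescape=select_autoescape(['html'])
        )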
def render_template(self, template_name, **context):
"""渲染模板"""
try:
template = self.env.get_template(template_name)
html_content = template.render(**context)
print(f"✓ 模板渲染成功: {template_name}")
return html_content
except Exception as e:
print(f"✗ 模板渲染失败: {e}")
return None
def create_report_template(self):
"""创建报表邮件模板"""
template_content = """
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
body {
font-family: 'Microsoft YaHei', Arial, sans-serif;
line-height: 1.6;
color: #333;
max-width: 800px;
margin: 0 auto;
padding: 20px;
}
.header {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
padding: 30px;
border-radius: 10px 10px 0 0;
text-align: center;
}
.content {
background: #f9f9f9;
padding: 30px;
border: 1px solid #ddd;
}
.metric {
display: inline-block;
background: white;
padding: 20px;
margin: 10px;
border-radius: 8px;
box-shadow: 0 2px 4px rgba(0,0,0,0.1);
text-align: center;
min-width: 150px;
}
.metric-value {
font-size: 32px;
font-weight: bold;
color: #667eea;
}
.metric-label {
color: #666;
margin-top: 5px;
}
table {
width: 100%;
border-collapse: collapse;
margin: 20px 0;
background: white;
}
th {
background: #667eea;
color: white;
padding: 12px;
text-align: left;
}
td {
padding: 10px;
border-bottom: 1px solid #ddd;
}
tr:hover {
background: #f5f5f5;
}
.footer {
text-align: center;
padding: 20px;
color: #666;
font-size: 12px;
border-top: 1px solid #ddd;
}
</style>
</head>
<body>
<div class="header">
<h1>{{ title }}</h1>
<p>{{ report_date }}</p>
</div>
<div class="content">
<h2>关键指标</h2>
{% for metric in metrics %}
<div class="metric">
<div class="metric-value">{{ metric.value }}</div>
<div class="metric-label">{{ metric.label }}</div>
</div>
{% endfor %}
<h2>详细数据</h2>
<table>
<thead>
<tr>
{% for col in table_headers %}
<th>{{ col }}</th>
{% endfor %}
</tr>
</thead>
<tbody>
{% for row in table_data %}
<tr>
{% for cell in row %}
<td>{{ cell }}</td>
{% endfor %}
</tr>
{% endfor %}
</tbody>
</table>
{% if notes %}
<div style="background: #fff3cd; padding: 15px; border-left: 4px solid #ffc107; margin-top: 20px;">
<strong>备注:</strong>{{ notes }}
</div>
{% endif %}
</div>
<div class="footer">
<p>此邮件由系统自动发送,请勿回复</p>
<p>{{ company_name }} © {{ year }}</p>
</div>
</body>
</html>
"""
template_path = self.template_dir / 'report.html'
with open(template_path, 'w', encoding='utf-8') as f:
f.write(template_content.strip())
print(f"✓ 创建模板: {template_path}")
def create_notification_template(self):
"""创建通知邮件模板"""
template_content = """
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<style>
body {
font-family: 'Microsoft YaHei', Arial, sans-serif;
background: #f5f5f5;
padding: 20px;
}
.container {
max-width: 600px;
margin: 0 auto;
background: white;
border-radius: 10px;
overflow: hidden;
box-shadow: 0 4px 6px rgba(0,0,0,0.1);
}
.header {
background: #4CAF50;
color: white;
padding: 20px;
text-align: center;
}
.content {
padding: 30px;
}
.button {
display: inline-block;
background: #4CAF50;
color: white;
padding: 12px 30px;
text-decoration: none;
border-radius: 5px;
margin: 20px 0;
}
.footer {
background: #f9f9f9;
padding: 15px;
text-align: center;
font-size: 12px;
color: #666;
}
</style>
</head>
<body>
<div class="container">
<div class="header">
<h2>{{ title }}</h2>
</div>
<div class="content">
<p>尊敬的 {{ recipient_name }},</p>
<p>{{ message }}</p>
{% if action_url %}
<center>
<a href="{{ action_url }}" class="button">{{ action_text }}</a>
</center>
{% endif %}
<p style="margin-top: 30px;">
如有疑问,请联系我们。<br>
谢谢!
</p>
</div>
<div class="footer">
{{ company_name }}<br>
{{ contact_email }}
</div>
</div>
</body>
</html>
"""
template_path = self.template_dir / 'notification.html'
with open(template_path, 'w', encoding='utf-8') as f:
f.write(template_content.strip())
print(f"✓ 创建模板: {template_path}")
# 使用示例
template_engine = EmailTemplateEngine('email_templates')
template_engine.create_report_template()
template_engine.create_notification_template()
# 渲染报表邮件
html_content = template_engine.render_template(
'report.html',
title='月度销售报表',
report_date='2024年1月',
metrics=[
{'value': '¥1,234,567', 'label': '总销售额'},
{'value': '456', 'label': '订单数'},
{'value': '89', 'label': '新客户'}
],
table_headers=['产品', '销量', '销售额'],
table_data=[
['产品A', '100', '¥50,000'],
['产品B', '85', '¥42,500'],
['产品C', '120', '¥60,000']
],
notes='本月销售额同比增长15%',
company_name='ABC公司',
year='2024'
)
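The `report.html` template expects `table_headers` as a list of column names and `table_data` as a list of rows. When the source data is a list of records, a small conversion step keeps the two in the same column order. A minimal sketch, assuming dict records; `rows_to_table` is a hypothetical helper name.

```python
def rows_to_table(rows, columns=None):
    """Convert a list of dict records into (table_headers, table_data).

    columns fixes the column order; by default the keys of the first record are used.
    Missing values become empty strings so every row has the same length.
    """
    if columns is None:
        columns = list(rows[0].keys()) if rows else []
    data = [[str(row.get(col, '')) for col in columns] for row in rows]
    return list(columns), data


records = [
    {'产品': '产品A', '销量': 100, '销售额': '¥50,000'},
    {'产品': '产品B', '销量': 85},
]
table_headers, table_data = rows_to_table(records)
print(table_headers)
print(table_data)
```

The two return values can be passed straight to `render_template` as `table_headers=` and `table_data=`.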
3. Bulk Email Sending
python
import pandas as pd
import time
from datetime import datetime
class BulkEmailSender:
"""批量邮件发送器"""
def __init__(self, email_sender, template_engine):
self.sender = email_sender
self.template_engine = template_engine
def send_from_excel(self, excel_file, template_name, subject_col='主题',
email_col='邮箱', name_col='姓名', delay=1):
"""
从Excel文件批量发送邮件
Args:
excel_file: Excel文件路径
template_name: 模板名称
subject_col: 主题列名
email_col: 邮箱列名
name_col: 姓名列名
delay: 发送间隔(秒)
"""
# 读取Excel
df = pd.read_excel(excel_file)
print(f"📋 读取到 {len(df)} 条记录\n")
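        success_count = 0
        fail_count = 0
        total = len(df)
        # enumerate() gives a position that stays correct even when the
        # DataFrame index is not 0..n-1
        for position, (_, row) in enumerate(df.iterrows(), 1):
            try:
                # Template context: all columns of the row plus the recipient name
                context = row.to_dict()
                context['recipient_name'] = row[name_col]
                html_content = self.template_engine.render_template(
                    template_name,
                    **context
                )
                # A failed render returns None; count it instead of skipping it silently
                if html_content and self.sender.send_email(
                    to_addrs=[row[email_col]],
                    subject=row[subject_col],
                    content=html_content,
                    content_type='html'
                ):
                    success_count += 1
                else:
                    fail_count += 1
            except Exception as e:
                print(f"✗ Send failed [{position}]: {e}")
                fail_count += 1
            # Pause between messages, but not after the last one
            if position < total:
                time.sleep(delay)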
print(f"\n{'=' * 60}")
print(f"批量发送完成")
print(f"成功: {success_count} | 失败: {fail_count}")
print(f"{'=' * 60}")
# 生成发送报告
self._generate_report(df, success_count, fail_count)
    def _generate_report(self, df, success, fail):
        """Write a plain-text delivery report."""
        total = len(df)
        report = {
            'Sent at': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'Total': total,
            'Succeeded': success,
            'Failed': fail,
            # Guard against an empty recipient list (division by zero)
            'Success rate': f"{success / total * 100:.1f}%" if total else 'n/a'
        }
        report_path = f'email_report_{datetime.now():%Y%m%d_%H%M%S}.txt'
        with open(report_path, 'w', encoding='utf-8') as f:
            f.write("Email delivery report\n")
            f.write("=" * 40 + "\n")
            for key, value in report.items():
                f.write(f"{key}: {value}\n")
        print(f"📄 Report saved: {report_path}")
def send_personalized_reports(self, recipients_data, template_name):
"""
发送个性化报表
Args:
recipients_data: 列表,每项包含收件人信息和报表数据
template_name: 模板名称
"""
print(f"📧 开始发送个性化报表...\n")
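        for idx, recipient in enumerate(recipients_data, 1):
            try:
                html_content = self.template_engine.render_template(
                    template_name,
                    **recipient['data']
                )
                # render_template returns None on failure; do not send an empty body
                if html_content is None:
                    raise ValueError('template rendering failed')
                sent = self.sender.send_email(
                    to_addrs=[recipient['email']],
                    subject=recipient['subject'],
                    content=html_content,
                    content_type='html',
                    attachments=recipient.get('attachments', [])
                )
                # send_email reports failure through its return value
                if not sent:
                    raise RuntimeError('send_email reported a failure')
                print(f"✓ [{idx}/{len(recipients_data)}] {recipient['name']}")
                time.sleep(1)
            except Exception as e:
                print(f"✗ [{idx}/{len(recipients_data)}] {recipient['name']}: {e}")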
# 使用示例
bulk_sender = BulkEmailSender(sender, template_engine)
# 从Excel批量发送
bulk_sender.send_from_excel(
excel_file='recipients.xlsx',
template_name='notification.html',
subject_col='邮件主题',
email_col='邮箱地址',
name_col='姓名',
delay=2
)
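Rows read from a spreadsheet often contain empty or malformed address cells (pandas represents an empty cell as a float `NaN`). Filtering them out before the send loop avoids wasting SMTP round trips. The check below is a deliberately loose sketch, not full RFC 5322 validation; `is_plausible_email` and `split_recipients` are hypothetical helpers, and the default column name `邮箱` matches the default of `send_from_excel`.

```python
import re

# Loose shape check: something@something.tld, no whitespace
_ADDR_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def is_plausible_email(value):
    """Return True if value is a string that looks like an email address."""
    return isinstance(value, str) and bool(_ADDR_RE.match(value.strip()))


def split_recipients(rows, email_key='邮箱'):
    """Split dict records into (valid, invalid) lists by their address field."""
    valid, invalid = [], []
    for row in rows:
        target = valid if is_plausible_email(row.get(email_key)) else invalid
        target.append(row)
    return valid, invalid


rows = [
    {'姓名': 'A', '邮箱': 'a@example.com'},
    {'姓名': 'B', '邮箱': float('nan')},   # empty spreadsheet cell
    {'姓名': 'C', '邮箱': 'not-an-address'},
]
valid, invalid = split_recipients(rows)
print(len(valid), len(invalid))
```

With a DataFrame, the records could come from `df.to_dict('records')`; the invalid rows are worth listing in the delivery report.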
4. Scheduled Jobs
python
import schedule
import time
from datetime import datetime
class EmailScheduler:
"""邮件定时调度器"""
def __init__(self, email_sender, template_engine):
self.sender = email_sender
self.template_engine = template_engine
self.jobs = []
def add_daily_report(self, time_str, recipients, report_generator_func):
"""添加每日报表任务"""
def job():
print(f"\n⏰ 执行每日报表任务 - {datetime.now()}")
# 生成报表数据
report_data = report_generator_func()
# 渲染邮件
html_content = self.template_engine.render_template(
'report.html',
**report_data
)
# 发送邮件
self.sender.send_email(
to_addrs=recipients,
subject=f"每日报表 - {datetime.now():%Y-%m-%d}",
content=html_content,
content_type='html'
)
schedule.every().day.at(time_str).do(job)
print(f"✓ 添加每日报表任务: {time_str}")
def add_weekly_report(self, day_of_week, time_str, recipients, report_func):
"""添加每周报表任务"""
def job():
print(f"\n⏰ 执行每周报表任务 - {datetime.now()}")
report_data = report_func()
html_content = self.template_engine.render_template(
'report.html',
**report_data
)
self.sender.send_email(
to_addrs=recipients,
subject=f"周报 - {datetime.now():%Y年第%W周}",
content=html_content,
content_type='html'
)
# day_of_week: 'monday', 'tuesday', etc.
getattr(schedule.every(), day_of_week).at(time_str).do(job)
print(f"✓ 添加每周报表任务: 每周{day_of_week} {time_str}")
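    def add_custom_task(self, interval, unit, task_func):
        """Add a custom recurring task; unit is 'seconds', 'minutes', or 'hours'."""
        # Reject unknown units instead of silently registering nothing
        if unit not in ('seconds', 'minutes', 'hours'):
            raise ValueError(f"Unsupported unit: {unit!r}")

        def job():
            print(f"\n⏰ Running custom task - {datetime.now()}")
            task_func()

        # schedule exposes the units as attributes: every(n).seconds / .minutes / .hours
        getattr(schedule.every(interval), unit).do(job)
        print(f"✓ Custom task added: every {interval} {unit}")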
def start(self):
"""启动调度器"""
print("\n" + "=" * 60)
print("📅 邮件调度器已启动")
print("=" * 60 + "\n")
try:
while True:
schedule.run_pending()
time.sleep(1)
except KeyboardInterrupt:
print("\n\n⏸️ 调度器已停止")
# 使用示例
scheduler = EmailScheduler(sender, template_engine)
# 添加每日报表(每天9:00发送)
def generate_daily_report():
return {
'title': '日报',
'report_date': datetime.now().strftime('%Y-%m-%d'),
'metrics': [
{'value': '1,234', 'label': '访问量'},
{'value': '567', 'label': '新用户'}
],
'table_headers': ['指标', '数值'],
'table_data': [['销售额', '¥10,000']],
'company_name': 'ABC公司',
'year': '2024'
}
scheduler.add_daily_report(
time_str='09:00',
recipients=['manager@company.com'],
report_generator_func=generate_daily_report
)
# 启动调度器(注释掉以免阻塞)
# scheduler.start()
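The `time_str` values passed to the scheduler come from user code or from the configuration file, so it can help to validate them and to see when a daily job would fire next. The sketch below computes this with the standard library only; it illustrates the idea of a daily `HH:MM` trigger and is not how the `schedule` library is implemented. `next_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def next_run(time_str, now):
    """Return the next datetime at which a daily 'HH:MM' job fires after `now`.

    Raises ValueError if time_str is not a valid HH:MM time.
    """
    run_time = datetime.strptime(time_str, '%H:%M').time()
    candidate = datetime.combine(now.date(), run_time)
    # If today's slot has already passed (or is exactly now), use tomorrow's
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 1, 8, 30)
print(next_run('09:00', now))
print(next_run('08:00', now))
```

Calling `next_run(config_value, datetime.now())` at start-up turns a typo such as `25:00` into an immediate `ValueError` rather than a job that never runs.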
5. Configuration File Management
python
import configparser
from pathlib import Path
import os
import logging
logger = logging.getLogger(__name__)
class EmailConfig:
"""
邮件配置管理类
用于管理邮件系统的配置信息,支持从配置文件读取
⚠️ 安全提示:
1. 配置文件包含敏感信息,应添加到 .gitignore
2. 生产环境建议使用环境变量或密钥管理系统
3. 对于Gmail,密码应使用"应用专用密码"而非账户密码
"""
def __init__(self, config_file='email_config.ini'):
"""
初始化配置管理器
Args:
config_file: 配置文件路径
"""
self.config_file = Path(config_file)
self.config = configparser.ConfigParser()
if self.config_file.exists():
self.config.read(self.config_file, encoding='utf-8')
logger.info(f"✓ 加载配置文件: {self.config_file}")
else:
self._create_default_config()
logger.warning(f"⚠️ 配置文件不存在,已创建模板: {self.config_file}")
logger.warning("⚠️ 请填写 username 和 password 后再使用")
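    def _create_default_config(self):
        """
        Create a default configuration template.
        Note: the username and password are left empty and must be filled in by hand.
        """
        self.config['SMTP'] = {
            'server': 'smtp.gmail.com',
            'port': '465',
            'use_ssl': 'True',
            'username': '',  # ⚠️ fill in the email address
            'password': ''   # ⚠️ fill in the password or app password
        }
        # configparser treats a section named 'DEFAULT' as fallback values for
        # every other section, so general settings get a regular section name
        self.config['GENERAL'] = {
            'from_name': '系统邮件',
            'company_name': 'ABC公司',
            'contact_email': 'support@company.com'
        }
        self.config['SCHEDULE'] = {
            'daily_report_time': '09:00',
            'weekly_report_day': 'monday',
            'weekly_report_time': '09:00'
        }
        self.save()
        print(f"✓ Configuration template created: {self.config_file}")
        print(f"⚠️ Edit {self.config_file} and fill in the username and password")
        print(f"💡 Tip: add {self.config_file} to .gitignore")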
def save(self):
"""保存配置到文件"""
with open(self.config_file, 'w', encoding='utf-8') as f:
self.config.write(f)
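    def get_smtp_config(self):
        """
        Return the SMTP settings.
        Returns:
            dict: keyword arguments for EmailSender
        Raises:
            ValueError: if the username or password is missing or empty
        """
        # fallback='' also covers a file that lacks the [SMTP] section or the key
        username = self.config.get('SMTP', 'username', fallback='')
        password = self.config.get('SMTP', 'password', fallback='')
        # Validate the required settings
        if not username or not password:
            raise ValueError(
                f"❌ Incomplete email configuration: fill in username and password in {self.config_file}"
            )
        return {
            'smtp_server': self.config.get('SMTP', 'server', fallback='smtp.gmail.com'),
            'smtp_port': self.config.getint('SMTP', 'port', fallback=465),
            'username': username,
            'password': password,
            'use_ssl': self.config.getboolean('SMTP', 'use_ssl', fallback=True)
        }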
# 使用配置文件(推荐方式)
try:
config = EmailConfig() # 会创建模板文件 email_config.ini
smtp_config = config.get_smtp_config()
sender = EmailSender(**smtp_config)
logger.info("✓ 邮件发送器初始化成功(使用配置文件)")
except ValueError as e:
logger.error(str(e))
logger.info("💡 或者使用环境变量方式初始化(见上方示例)")
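The two configuration styles shown so far, environment variables and an INI file, can be combined so that the file provides defaults and the environment overrides them (useful for keeping the password out of the file). The following is a minimal sketch under that assumption; `load_smtp_settings` and `ENV_KEYS` are hypothetical names, and the environment variable names are the ones from the `.env` example earlier in this project.

```python
import configparser
import os

# INI key -> environment variable that overrides it
ENV_KEYS = {
    'server': 'SMTP_SERVER',
    'port': 'SMTP_PORT',
    'username': 'EMAIL_USERNAME',
    'password': 'EMAIL_PASSWORD',
    'use_ssl': 'EMAIL_USE_SSL',
}


def load_smtp_settings(config_path, environ=None):
    """Read the [SMTP] section and let non-empty environment variables override it.

    All values are returned as strings; a missing file yields only the overrides.
    """
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser()
    parser.read(config_path, encoding='utf-8')  # a missing file is silently ignored
    settings = dict(parser['SMTP']) if parser.has_section('SMTP') else {}
    for key, env_name in ENV_KEYS.items():
        if environ.get(env_name):
            settings[key] = environ[env_name]
    return settings


# Hypothetical usage with an explicit environment mapping
print(load_smtp_settings('email_config.ini', environ={'EMAIL_USERNAME': 'env_user@example.com'}))
```

Passing `environ` explicitly keeps the function easy to test; in production it defaults to `os.environ`.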
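Project 9: WeChat Official Account Backend
Project Overview
A backend service for a WeChat Official Account that implements auto-replies, menu management, and user management. It supports receiving messages, handling events, and pushing messages proactively.
Tech Stack
- Web framework: Flask
- WeChat SDK: wechatpy
- Cache: Redis
- Database: MySQL/SQLite
- Task queue: Celery
Core Implementation
1. Basic Configuration and Initialization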
python
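import os
from flask import Flask, request, abort
from wechatpy import parse_message, create_reply
from wechatpy.utils import check_signature
from wechatpy.exceptions import InvalidSignatureException
from wechatpy.crypto import WeChatCrypto
import redis
import json
app = Flask(__name__)
# WeChat Official Account settings, read from the environment so that secrets
# are not committed with the code (the variable names are this example's convention)
WECHAT_TOKEN = os.getenv('WECHAT_TOKEN', 'your_token')
WECHAT_APPID = os.getenv('WECHAT_APPID', 'your_appid')
WECHAT_SECRET = os.getenv('WECHAT_SECRET', 'your_secret')
WECHAT_ENCODING_AES_KEY = os.getenv('WECHAT_ENCODING_AES_KEY', '')
# Redis connection
redis_client = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)
# Message encryption helper; only needed in secure mode, which requires the
# 43-character EncodingAESKey from the Official Account console
crypto = (
    WeChatCrypto(WECHAT_TOKEN, WECHAT_ENCODING_AES_KEY, WECHAT_APPID)
    if WECHAT_ENCODING_AES_KEY else None
)
print("✓ WeChat Official Account backend initialized")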
2. Receiving and Verifying Messages
python
from wechatpy.exceptions import InvalidAppIdException


@app.route('/wechat', methods=['GET', 'POST'])
def wechat():
    """WeChat callback endpoint."""
    signature = request.args.get('signature', '')
    timestamp = request.args.get('timestamp', '')
    nonce = request.args.get('nonce', '')

    # Every request, GET or POST, must carry a valid signature
    try:
        check_signature(WECHAT_TOKEN, signature, timestamp, nonce)
    except InvalidSignatureException:
        abort(403)

    # GET: server verification, echo the challenge back
    if request.method == 'GET':
        return request.args.get('echostr', '')

    # POST: WeChat adds encrypt_type=aes when secure mode is enabled, so decide
    # explicitly instead of swallowing decryption errors with a bare except
    encrypted = request.args.get('encrypt_type', 'raw') == 'aes'
    if encrypted:
        if crypto is None:
            abort(500)  # secure mode is on but no EncodingAESKey is configured
        msg_signature = request.args.get('msg_signature', '')
        try:
            xml = crypto.decrypt_message(request.data, msg_signature, timestamp, nonce)
        except (InvalidSignatureException, InvalidAppIdException):
            abort(403)
    else:
        xml = request.data

    # Parse and handle the message
    msg = parse_message(xml)
    reply = handle_message(msg)
    if reply is None:
        return 'success'

    rendered = reply.render()
    # A reply to an encrypted message must be encrypted as well
    if encrypted:
        return crypto.encrypt_message(rendered, nonce, timestamp)
    return rendered
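The signature check that `check_signature` performs follows WeChat's documented scheme: sort the token, timestamp, and nonce as strings, concatenate them, and compare the SHA-1 hex digest with the `signature` query parameter. The sketch below reproduces that scheme with the standard library to show what is being verified; it is an illustration, not wechatpy's own code, and the function names are hypothetical.

```python
import hashlib
import hmac


def compute_signature(token, timestamp, nonce):
    """SHA-1 hex digest of the three values sorted lexicographically and joined."""
    parts = sorted([token, timestamp, nonce])
    return hashlib.sha1(''.join(parts).encode('utf-8')).hexdigest()


def is_valid_signature(token, signature, timestamp, nonce):
    """Compare in constant time so the check does not leak timing information."""
    expected = compute_signature(token, timestamp, nonce)
    return hmac.compare_digest(expected.encode('utf-8'), signature.encode('utf-8'))


# Placeholder values for illustration only
sig = compute_signature('your_token', '1700000000', 'abc123')
print(is_valid_signature('your_token', sig, '1700000000', 'abc123'))
print(is_valid_signature('your_token', 'tampered', '1700000000', 'abc123'))
```

Only a caller that knows the token can produce a matching digest, which is why the endpoint rejects requests with a 403 when the check fails.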
3. Message Handlers
python
from wechatpy.messages import (
TextMessage, ImageMessage, VoiceMessage,
VideoMessage, LocationMessage, LinkMessage
)
from wechatpy.replies import TextReply, ArticlesReply, ImageReply
def handle_message(msg):
"""处理各类消息"""
# 文本消息
if msg.type == 'text':
return handle_text_message(msg)
# 图片消息
elif msg.type == 'image':
return handle_image_message(msg)
# 语音消息
elif msg.type == 'voice':
return handle_voice_message(msg)
# 位置消息
elif msg.type == 'location':
return handle_location_message(msg)
# 事件消息
elif msg.type == 'event':
return handle_event_message(msg)
# 默认回复
else:
reply = TextReply(message=msg)
reply.content = '暂不支持此类消息'
return reply
def handle_text_message(msg):
"""处理文本消息"""
content = msg.content.strip()
# 关键词自动回复
keywords_replies = {
'你好': '你好!欢迎关注我们的公众号',
'帮助': '请输入以下关键词获取帮助:\n1. 功能介绍\n2. 联系我们\n3. 最新活动',
'功能介绍': '我们提供以下功能:\n- 自动回复\n- 信息查询\n- 在线客服',
'联系我们': '客服电话:400-xxx-xxxx\n邮箱:support@example.com'
}
# 检查关键词
for keyword, reply_text in keywords_replies.items():
if keyword in content:
reply = TextReply(message=msg)
reply.content = reply_text
# 记录用户消息
log_user_message(msg.source, content)
return reply
# 智能回复(接入AI)
if content.startswith('问'):
ai_response = get_ai_response(content[1:])
reply = TextReply(message=msg)
reply.content = ai_response
return reply
# 默认回复
reply = TextReply(message=msg)
reply.content = f'收到您的消息:{content}\n输入"帮助"获取更多信息'
log_user_message(msg.source, content)
return reply
def handle_image_message(msg):
"""处理图片消息"""
# 保存图片信息
image_data = {
'url': msg.image,
'media_id': msg.media_id,
'user': msg.source,
'time': msg.time
}
# 存储到Redis
redis_client.lpush(f'user:images:{msg.source}', json.dumps(image_data))
reply = TextReply(message=msg)
reply.content = '已收到您的图片'
return reply
def handle_event_message(msg):
"""处理事件消息"""
# 关注事件
if msg.event == 'subscribe':
return handle_subscribe_event(msg)
# 取消关注
elif msg.event == 'unsubscribe':
return handle_unsubscribe_event(msg)
# 点击菜单
elif msg.event == 'click':
return handle_menu_click_event(msg)
# 扫描二维码
elif msg.event == 'scan':
return handle_scan_event(msg)
return None
def handle_subscribe_event(msg):
"""处理关注事件"""
# 记录新粉丝
save_user_info(msg.source, subscribed=True)
# 欢迎消息
reply = TextReply(message=msg)
reply.content = '''感谢关注!
🎉 欢迎来到我们的公众号
您可以:
📌 回复"帮助"查看功能
📌 回复"活动"查看最新活动
📌 回复"客服"联系在线客服
期待为您服务!'''
return reply
def handle_unsubscribe_event(msg):
"""处理取消关注事件"""
# 更新用户状态
save_user_info(msg.source, subscribed=False)
return None # 取消关注不需要回复
def handle_menu_click_event(msg):
    """Handle menu click events."""
    # Map each menu key to a function so that only the requested content is built
    menu_handlers = {
        'LATEST_NEWS': get_latest_news,
        'ABOUT_US': get_about_us_info,
        'CONTACT': get_contact_info
    }
    handler = menu_handlers.get(msg.key)
    reply = TextReply(message=msg)
    reply.content = handler() if handler else '功能开发中...'
    return reply
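`handle_text_message` returns the reply of the first keyword found in the message, so the result depends on dictionary order when one keyword contains another. One possible refinement, sketched here as an assumption rather than the project's behaviour, is to prefer an exact match and otherwise the longest contained keyword. `match_keyword` is a hypothetical helper and the sample replies are placeholders.

```python
def match_keyword(content, keyword_replies):
    """Return the reply for the best-matching keyword, or None if nothing matches."""
    text = content.strip()
    # 1. An exact match always wins
    if text in keyword_replies:
        return keyword_replies[text]
    # 2. Otherwise prefer the longest keyword contained in the message
    for keyword in sorted(keyword_replies, key=len, reverse=True):
        if keyword in text:
            return keyword_replies[keyword]
    return None


replies = {
    '帮助': 'help text',
    '帮助中心': 'help center link',
}
print(match_keyword('打开帮助中心', replies))
print(match_keyword(' 帮助 ', replies))
print(match_keyword('hello', replies))
```

Returning `None` lets the caller fall through to the AI branch or the default reply, as the existing handler does.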
4. User Management
python
from datetime import datetime
import sqlite3
class UserManager:
"""用户管理器"""
def __init__(self, db_path='wechat_users.db'):
self.db_path = db_path
self.init_db()
def init_db(self):
"""初始化数据库"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS users (
openid TEXT PRIMARY KEY,
nickname TEXT,
subscribe_time DATETIME,
unsubscribe_time DATETIME,
is_subscribed INTEGER DEFAULT 1,
message_count INTEGER DEFAULT 0,
last_message_time DATETIME
)
''')
cursor.execute('''
CREATE TABLE IF NOT EXISTS messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
openid TEXT,
content TEXT,
msg_type TEXT,
create_time DATETIME DEFAULT CURRENT_TIMESTAMP
)
''')
conn.commit()
conn.close()
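    def save_user(self, openid, nickname=None, subscribed=True):
        """Insert or update a user without discarding existing statistics."""
        conn = sqlite3.connect(self.db_path)
        try:
            cursor = conn.cursor()
            # Make sure a row exists. INSERT OR REPLACE would delete the old row
            # and reset message_count and last_message_time, so it is avoided.
            cursor.execute(
                'INSERT OR IGNORE INTO users (openid, is_subscribed) VALUES (?, 0)',
                (openid,)
            )
            if subscribed:
                # CURRENT_TIMESTAMP is UTC, consistent with DATE('now') in get_user_stats
                cursor.execute('''
                    UPDATE users
                    SET nickname = COALESCE(?, nickname),
                        subscribe_time = CURRENT_TIMESTAMP,
                        is_subscribed = 1
                    WHERE openid = ?
                ''', (nickname, openid))
            else:
                cursor.execute('''
                    UPDATE users
                    SET unsubscribe_time = CURRENT_TIMESTAMP,
                        is_subscribed = 0
                    WHERE openid = ?
                ''', (openid,))
            conn.commit()
        finally:
            conn.close()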
def log_message(self, openid, content, msg_type='text'):
"""记录用户消息"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
INSERT INTO messages (openid, content, msg_type)
VALUES (?, ?, ?)
''', (openid, content, msg_type))
# 更新用户消息计数
cursor.execute('''
UPDATE users
SET message_count = message_count + 1,
last_message_time = CURRENT_TIMESTAMP
WHERE openid = ?
''', (openid,))
conn.commit()
conn.close()
def get_user_stats(self):
"""获取用户统计"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
stats = {}
# 总用户数
cursor.execute('SELECT COUNT(*) FROM users WHERE is_subscribed = 1')
stats['total_users'] = cursor.fetchone()[0]
# 今日新增
cursor.execute('''
SELECT COUNT(*) FROM users
WHERE DATE(subscribe_time) = DATE('now')
AND is_subscribed = 1
''')
stats['today_new'] = cursor.fetchone()[0]
# 今日活跃
cursor.execute('''
SELECT COUNT(DISTINCT openid) FROM messages
WHERE DATE(create_time) = DATE('now')
''')
stats['today_active'] = cursor.fetchone()[0]
conn.close()
return stats
# 全局用户管理器
user_manager = UserManager()
def save_user_info(openid, subscribed=True):
user_manager.save_user(openid, subscribed=subscribed)
def log_user_message(openid, content, msg_type='text'):
user_manager.log_message(openid, content, msg_type)
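Each `UserManager` method opens a connection, commits, and closes it by hand, which leaves the connection open if a statement raises. A common standard-library pattern is to combine `contextlib.closing` (always close) with the connection's own context manager (commit on success, roll back on error). The helper below is a minimal sketch of that pattern; `execute_write` is a hypothetical name and the table is a cut-down stand-in for the `messages` table.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def execute_write(db_path, sql, params=()):
    """Run one write statement: commit on success, roll back on error, always close."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # transaction scope provided by sqlite3.Connection
            return conn.execute(sql, params).rowcount


# Demonstration against a throwaway database file
demo_db = os.path.join(tempfile.mkdtemp(), 'demo.db')
execute_write(demo_db, 'CREATE TABLE messages (openid TEXT, content TEXT)')
inserted = execute_write(
    demo_db,
    'INSERT INTO messages (openid, content) VALUES (?, ?)',
    ('openid_1', 'hello'),
)
print(inserted)
```

`with conn:` on its own does not close the connection, which is why it is wrapped in `closing(...)`.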
5. Proactive Message Push
python
from wechatpy import WeChatClient
# 创建微信客户端
wechat_client = WeChatClient(WECHAT_APPID, WECHAT_SECRET)
def send_template_message(openid, template_id, data, url=None):
"""发送模板消息"""
try:
result = wechat_client.message.send_template(
openid,
template_id,
data,
url=url
)
print(f"✓ 模板消息发送成功: {openid}")
return True
except Exception as e:
print(f"✗ 模板消息发送失败: {e}")
return False
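def send_mass_message(user_list, msg_type='text', content=None, media_id=None):
    """Send a mass message to a list of openids."""
    try:
        # wechatpy expects the recipients first, then the payload
        if msg_type == 'text':
            wechat_client.message.send_mass_text(user_list, content)
        elif msg_type == 'image':
            wechat_client.message.send_mass_image(user_list, media_id)
        else:
            # Do not report success for a message type that was never sent
            raise ValueError(f"Unsupported msg_type: {msg_type!r}")
        print(f"✓ Mass message sent: {len(user_list)} recipients")
        return True
    except Exception as e:
        print(f"✗ Mass message failed: {e}")
        return False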
# 使用示例
def send_daily_report():
"""发送每日报表"""
stats = user_manager.get_user_stats()
# 准备模板数据
template_data = {
'first': {'value': '每日数据报表', 'color': '#173177'},
'keyword1': {'value': stats['total_users'], 'color': '#173177'},
'keyword2': {'value': stats['today_new'], 'color': '#173177'},
'keyword3': {'value': stats['today_active'], 'color': '#173177'},
'remark': {'value': '感谢您的关注!', 'color': '#173177'}
}
# 发送给管理员
send_template_message(
'admin_openid',
'template_id',
template_data,
url='https://example.com/report'
)
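Mass sending to an openid list is usually capped per request, so a large subscriber list has to be split into batches. The sketch below shows one way to do that. The `chunked` and `send_in_batches` helpers and the default `batch_size` of 10000 are assumptions made for illustration; check the platform's current limit before relying on the number.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(user_list, send_batch, batch_size=10000):
    """Call send_batch(batch) for each slice; return how many recipients succeeded."""
    sent = 0
    for batch in chunked(user_list, batch_size):
        if send_batch(batch):
            sent += len(batch)
    return sent


# Demo with a stand-in sender that only records the batch sizes
batch_sizes = []


def fake_sender(batch):
    batch_sizes.append(len(batch))
    return True


openids = [f"openid_{i}" for i in range(25)]
total = send_in_batches(openids, fake_sender, batch_size=10)
print(batch_sizes, total)
```

In the project itself, `send_batch` could be a small wrapper such as `lambda batch: send_mass_message(batch, content='...')`.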
6. Custom menu management
python
def create_custom_menu():
"""创建自定义菜单"""
menu_data = {
'button': [
{
'type': 'click',
'name': '最新资讯',
'key': 'LATEST_NEWS'
},
{
'name': '功能服务',
'sub_button': [
{
'type': 'view',
'name': '在线查询',
'url': 'https://example.com/query'
},
{
'type': 'click',
'name': '联系客服',
'key': 'CONTACT'
},
{
'type': 'miniprogram',
'name': '小程序',
'url': 'https://example.com',
'appid': 'miniprogram_appid',
'pagepath': 'pages/index/index'
}
]
},
{
'name': '关于我们',
'sub_button': [
{
'type': 'click',
'name': '公司介绍',
'key': 'ABOUT_US'
},
{
'type': 'view',
'name': '官网',
'url': 'https://example.com'
}
]
}
]
}
try:
result = wechat_client.menu.create(menu_data)
print("✓ 自定义菜单创建成功")
return True
except Exception as e:
print(f"✗ 菜单创建失败: {e}")
return False
def delete_custom_menu():
"""删除自定义菜单"""
wechat_client.menu.delete()
print("✓ 菜单已删除")
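Menu definitions are easy to get subtly wrong (a `view` button without a `url`, too many sub-buttons), and the error only surfaces when the API call fails. A small local check before calling `menu.create` can catch these earlier. This is a sketch only: `validate_menu` and `check_leaf` are hypothetical helpers, the limits of 3 top-level and 5 second-level buttons are stated as assumptions, and the required fields simply mirror the button types used in the menu above.

```python
MAX_TOP_LEVEL = 3    # first-level button limit (assumption; check the platform docs)
MAX_SUB_BUTTONS = 5  # second-level button limit (assumption; check the platform docs)

# Fields each button type needs, mirroring the menu definition shown above
REQUIRED_FIELDS = {
    'click': ('key',),
    'view': ('url',),
    'miniprogram': ('url', 'appid', 'pagepath'),
}


def check_leaf(button):
    """Return a list of problems for a single clickable button."""
    name = button.get('name', '<unnamed>')
    button_type = button.get('type')
    if button_type is None:
        return [f"{name}: missing 'type'"]
    required = REQUIRED_FIELDS.get(button_type, ())  # unknown types are not checked
    return [f"{name}: missing '{field}'" for field in required if not button.get(field)]


def validate_menu(menu_data):
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    buttons = menu_data.get('button', [])
    if not 1 <= len(buttons) <= MAX_TOP_LEVEL:
        problems.append(f"expected 1-{MAX_TOP_LEVEL} top-level buttons, got {len(buttons)}")
    for button in buttons:
        subs = button.get('sub_button')
        if subs is None:
            problems.extend(check_leaf(button))
            continue
        if not 1 <= len(subs) <= MAX_SUB_BUTTONS:
            name = button.get('name', '<unnamed>')
            problems.append(f"{name}: expected 1-{MAX_SUB_BUTTONS} sub-buttons, got {len(subs)}")
        for sub in subs:
            problems.extend(check_leaf(sub))
    return problems


demo_menu = {
    'button': [
        {'type': 'click', 'name': 'News', 'key': 'LATEST_NEWS'},
        {'name': 'Services', 'sub_button': [
            {'type': 'view', 'name': 'Query'},  # url left out on purpose
        ]},
    ]
}
print(validate_menu(demo_menu))
```

Calling `validate_menu(menu_data)` at the start of `create_custom_menu` and returning early when the list is non-empty keeps bad definitions from reaching the API.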
7. Helper functions
python
def get_latest_news():
"""获取最新资讯"""
# 从数据库或API获取
return "📰 最新资讯\n\n1. 新功能上线通知\n2. 系统维护公告\n3. 活动预告"
def get_about_us_info():
"""获取关于我们信息"""
return "🏢 关于我们\n\n我们致力于为用户提供优质服务...\n\n官网:https://example.com"
def get_contact_info():
"""获取联系方式"""
return "📞 联系我们\n\n客服热线:400-xxx-xxxx\n工作时间:9:00-18:00\n邮箱:support@example.com"
def get_ai_response(question):
"""AI智能回复(示例)"""
# 这里可以接入ChatGPT等AI服务
return f"正在为您查询:{question}\n\n抱歉,AI功能正在开发中..."
if __name__ == '__main__':
# 创建菜单
create_custom_menu()
# 启动Flask应用
app.run(host='0.0.0.0', port=80, debug=False)
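Project 10: Real-time Log Monitoring System
Project overview
Monitors application log files in real time and automatically parses, analyzes, and raises alerts on them. Supports multiple log formats and provides a web interface for live display.
Tech stack
- File monitoring: watchdog
- Log parsing: regular expressions, logparser
- Web framework: Flask + WebSocket
- Data storage: SQLite/MongoDB
- Real-time push: Flask-SocketIO
Core feature implementation
1. Log file monitoring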
python
import time
import re
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
from pathlib import Path
from datetime import datetime
import json
class LogFileHandler(FileSystemEventHandler):
"""日志文件监控处理器"""
def __init__(self, log_parser, callback=None):
self.log_parser = log_parser
self.callback = callback
self.file_positions = {} # 记录文件读取位置
def on_modified(self, event):
"""文件修改时触发"""
if event.is_directory:
return
file_path = event.src_path
# 只处理日志文件
if not file_path.endswith(('.log', '.txt')):
return
print(f"📝 检测到日志更新: {file_path}")
# 读取新增内容
new_lines = self.read_new_lines(file_path)
# 解析日志
for line in new_lines:
log_entry = self.log_parser.parse(line)
if log_entry and self.callback:
self.callback(log_entry)
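    def read_new_lines(self, file_path):
        """Read the lines appended to a file since the previous call."""
        try:
            # Byte offset reached by the previous read (0 for a newly seen file)
            last_pos = self.file_positions.get(file_path, 0)
            # A file that is now shorter than the saved offset was truncated
            # or rotated, so start again from the beginning
            if Path(file_path).stat().st_size < last_pos:
                last_pos = 0
            # Binary mode keeps seek()/tell() as plain byte offsets
            with open(file_path, 'rb') as f:
                f.seek(last_pos)
                raw_lines = f.readlines()
                self.file_positions[file_path] = f.tell()
            return [raw.decode('utf-8', errors='ignore') for raw in raw_lines]
        except OSError as e:
            print(f"✗ Failed to read file: {e}")
            return []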
class LogMonitor:
"""日志监控器"""
def __init__(self, watch_paths, log_parser, callback=None):
self.watch_paths = watch_paths if isinstance(watch_paths, list) else [watch_paths]
self.log_parser = log_parser
self.callback = callback
self.observer = Observer()
self.handler = LogFileHandler(log_parser, callback)
def start(self):
"""启动监控"""
for path in self.watch_paths:
self.observer.schedule(self.handler, path, recursive=True)
print(f"📂 开始监控: {path}")
self.observer.start()
print("✅ 日志监控已启动\n")
try:
while True:
time.sleep(1)
except KeyboardInterrupt:
self.stop()
def stop(self):
"""停止监控"""
self.observer.stop()
self.observer.join()
print("\n⏸️ 日志监控已停止")
# 使用示例
# monitor = LogMonitor(
# watch_paths=['/var/log', './logs'],
# log_parser=LogParser(),
# callback=process_log_entry
# )
# monitor.start()
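A detail worth handling when tailing a file: a modification event can fire while the writer is halfway through a line, in which case the reader sees a fragment that no parser pattern will match. The sketch below, which uses a hypothetical `TailReader` class that is not part of the project, returns only complete lines and leaves an unfinished trailing line to be picked up on the next call. It also restarts from the beginning when the file shrinks.

```python
import os
import tempfile


class TailReader:
    """Return only complete new lines from a growing file."""

    def __init__(self):
        self.positions = {}  # path -> byte offset of the first unread byte

    def read_new_lines(self, path):
        last_pos = self.positions.get(path, 0)
        if os.path.getsize(path) < last_pos:
            last_pos = 0  # file was truncated or rotated
        with open(path, "rb") as f:
            f.seek(last_pos)
            data = f.read()
        # Consume up to and including the last newline; a partial trailing
        # line stays unread and is returned once it has been completed
        end = data.rfind(b"\n") + 1
        self.positions[path] = last_pos + end
        return data[:end].decode("utf-8", errors="ignore").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.log")
    reader = TailReader()

    with open(path, "w", encoding="utf-8") as f:
        f.write("line one\nline tw")      # second line is still being written
    first = reader.read_new_lines(path)

    with open(path, "a", encoding="utf-8") as f:
        f.write("o\n")                    # writer finishes the line
    second = reader.read_new_lines(path)

    with open(path, "w", encoding="utf-8") as f:
        f.write("new\n")                  # file replaced with shorter content
    third = reader.read_new_lines(path)

print(first, second, third)
```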
2. Log parser
python
import re
from datetime import datetime
from collections import defaultdict
class LogParser:
"""日志解析器"""
def __init__(self):
# 常见日志格式的正则表达式
self.patterns = {
# Apache/Nginx访问日志
'access': re.compile(
r'(?P<ip>[\d.]+) - - \[(?P<time>[^\]]+)\] '
r'"(?P<method>\w+) (?P<url>[^\s]+) HTTP/[\d.]+" '
r'(?P<status>\d+) (?P<size>\d+)'
),
# Python logging
'python': re.compile(
r'(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - '
r'(?P<level>\w+) - (?P<module>[\w.]+) - (?P<message>.+)'
),
# 通用日志格式
'generic': re.compile(
r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] (?P<message>.+)'
)
}
self.stats = defaultdict(int)
def parse(self, line):
"""解析单行日志"""
line = line.strip()
if not line:
return None
# 尝试不同格式
for format_name, pattern in self.patterns.items():
match = pattern.match(line)
if match:
log_entry = match.groupdict()
log_entry['format'] = format_name
log_entry['raw'] = line
# 统计
self.stats[format_name] += 1
# 检测错误级别
if 'level' in log_entry:
log_entry['is_error'] = log_entry['level'] in ['ERROR', 'CRITICAL', 'FATAL']
elif 'status' in log_entry:
log_entry['is_error'] = int(log_entry['status']) >= 400
return log_entry
# 无法识别的格式
return {
'format': 'unknown',
'raw': line,
'message': line
}
def extract_errors(self, log_file):
"""提取错误日志"""
errors = []
with open(log_file, 'r', encoding='utf-8', errors='ignore') as f:
for line in f:
entry = self.parse(line)
if entry and entry.get('is_error'):
errors.append(entry)
return errors
def get_stats(self):
"""获取解析统计"""
return dict(self.stats)
# 使用示例
parser = LogParser()
# 解析单行
log_line = '2024-01-04 10:30:45,123 - ERROR - myapp.views - Database connection failed'
entry = parser.parse(log_line)
print(entry)
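The `access` pattern above expects `- -` after the IP and a numeric response size. Real access logs often carry a user name in the third field and log the size as `-` (for example on 304 responses), and such lines would fall through to the `unknown` format. A more tolerant variant might look like the sketch below; `ACCESS_PATTERN` and the sample line are illustrative and not taken from the project.

```python
import re

# Tolerant variant: any token for ident/user, and '-' allowed as the size
ACCESS_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

line = '203.0.113.7 - alice [10/Oct/2023:13:55:36 +0800] "GET /index.html HTTP/1.1" 304 -'
match = ACCESS_PATTERN.match(line)
entry = match.groupdict()

# Normalize the size: '-' means no body was sent
entry['size'] = 0 if entry['size'] == '-' else int(entry['size'])
entry['is_error'] = int(entry['status']) >= 400
print(entry)
```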
3. Log analyzer
python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
import re
class LogAnalyzer:
"""日志分析器"""
def __init__(self):
self.entries = []
def add_entry(self, entry):
"""添加日志条目"""
self.entries.append(entry)
def analyze(self):
"""分析日志"""
if not self.entries:
return {}
return {
'total_count': len(self.entries),
'error_count': self.count_errors(),
'level_distribution': self.get_level_distribution(),
'top_ips': self.get_top_ips(10),
'top_urls': self.get_top_urls(10),
'status_distribution': self.get_status_distribution(),
'error_messages': self.get_top_errors(10),
'timeline': self.get_timeline()
}
def count_errors(self):
"""统计错误数量"""
return sum(1 for entry in self.entries if entry.get('is_error'))
def get_level_distribution(self):
"""获取日志级别分布"""
levels = [entry.get('level', 'UNKNOWN') for entry in self.entries if 'level' in entry]
return dict(Counter(levels))
def get_top_ips(self, n=10):
"""获取访问量最多的IP"""
ips = [entry.get('ip') for entry in self.entries if 'ip' in entry]
return dict(Counter(ips).most_common(n))
def get_top_urls(self, n=10):
"""获取访问量最多的URL"""
urls = [entry.get('url') for entry in self.entries if 'url' in entry]
return dict(Counter(urls).most_common(n))
def get_status_distribution(self):
"""获取HTTP状态码分布"""
statuses = [entry.get('status') for entry in self.entries if 'status' in entry]
return dict(Counter(statuses))
def get_top_errors(self, n=10):
"""获取最常见的错误"""
errors = [
entry.get('message', '')
for entry in self.entries
if entry.get('is_error')
]
return dict(Counter(errors).most_common(n))
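    def get_timeline(self, interval_minutes=5):
        """Count entries per time bucket, keyed by 'HH:MM'."""
        timeline = defaultdict(int)
        for entry in self.entries:
            time_str = entry.get('time')
            if not time_str:
                continue
            dt = self.parse_time(time_str)
            if dt is None:
                continue
            # Round down to the start of the interval
            bucket = dt.replace(
                minute=(dt.minute // interval_minutes) * interval_minutes,
                second=0,
                microsecond=0
            )
            timeline[bucket.strftime('%H:%M')] += 1
        return dict(sorted(timeline.items()))

    @staticmethod
    def parse_time(time_str):
        """Parse a timestamp in one of the supported log formats; return None if none match."""
        formats = [
            '%Y-%m-%d %H:%M:%S,%f',   # Python logging: 2024-01-04 10:30:45,123
            '%d/%b/%Y:%H:%M:%S %z',   # access log with offset: 10/Oct/2023:13:55:36 +0800
            '%d/%b/%Y:%H:%M:%S',      # access log without offset
            '%Y-%m-%d %H:%M:%S'
        ]
        # Parse the whole string: splitting on whitespace would cut
        # 'YYYY-MM-DD HH:MM:SS' timestamps in half and never match
        for fmt in formats:
            try:
                return datetime.strptime(time_str.strip(), fmt)
            except ValueError:
                continue
        return None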
    def generate_report(self):
        """Build a plain-text analysis report."""
        analysis = self.analyze()
        # analyze() returns an empty dict when no entries were added
        if not analysis:
            return "No log entries to analyze."
        error_rate = analysis['error_count'] / analysis['total_count'] * 100
        report = f"""
╔══════════════════════════════════════════╗
║           Log Analysis Report            ║
╚══════════════════════════════════════════╝
📊 Basic statistics
  Total entries: {analysis['total_count']}
  Errors: {analysis['error_count']}
  Error rate: {error_rate:.2f}%
📈 Log level distribution
"""
        for level, count in analysis.get('level_distribution', {}).items():
            report += f"  {level:10s}: {count:6d}\n"
        report += "\n🌐 Top 10 client IPs\n"
        for ip, count in analysis.get('top_ips', {}).items():
            report += f"  {ip:15s}: {count:6d}\n"
        # Only the first five URLs and error messages are printed
        report += "\n🔗 Top 5 URLs\n"
        for url, count in list(analysis.get('top_urls', {}).items())[:5]:
            report += f"  {url[:50]:50s}: {count:6d}\n"
        report += "\n❌ Top 5 error messages\n"
        for msg, count in list(analysis.get('error_messages', {}).items())[:5]:
            report += f"  {msg[:50]:50s}: {count:6d}\n"
        return report
# 使用示例
analyzer = LogAnalyzer()
# 批量分析日志文件
with open('app.log', 'r') as f:
parser = LogParser()
for line in f:
entry = parser.parse(line)
if entry:
analyzer.add_entry(entry)
# 生成报告
print(analyzer.generate_report())
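The timeline in `get_timeline` is keyed by `HH:MM` only, so entries from different days that share a clock time land in the same bucket. When a log spans more than one day, including the date in the key avoids that. A minimal sketch of the bucketing step on its own, where `floor_to_interval` is a hypothetical helper and the timestamps are made up:

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_interval(dt, minutes=5):
    """Round a datetime down to the start of its bucket (minutes should divide 60)."""
    return dt - timedelta(
        minutes=dt.minute % minutes,
        seconds=dt.second,
        microseconds=dt.microsecond,
    )


timestamps = [
    datetime(2024, 1, 4, 10, 31, 45),
    datetime(2024, 1, 4, 10, 34, 59),
    datetime(2024, 1, 4, 10, 35, 0),
    datetime(2024, 1, 5, 10, 31, 0),   # same clock time, next day
]

# Keep the date in the key so the two days stay separate
timeline = Counter(
    floor_to_interval(ts).strftime('%Y-%m-%d %H:%M') for ts in timestamps
)
for bucket, count in sorted(timeline.items()):
    print(bucket, count)
```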
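Project 11: Image Processing Web Service
Project overview
An image processing service exposed through a RESTful API, supporting resizing, cropping, filters, watermarks, and similar operations. Time-consuming work is handled by an asynchronous task queue.
Tech stack
- Web framework: FastAPI
- Image processing: Pillow (PIL)
- Async tasks: Celery
- Message queue: Redis
- File storage: local / OSS
Core feature implementation
1. FastAPI application setup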
python
from fastapi import FastAPI, File, UploadFile, HTTPException, BackgroundTasks
from fastapi.responses import FileResponse, StreamingResponse
from PIL import Image, ImageFilter, ImageEnhance, ImageDraw, ImageFont
from pathlib import Path
import io
import uuid
from typing import Optional
import shutil
import logging
try:
    import imghdr  # deprecated since Python 3.11 and removed in Python 3.13
except ImportError:
    imghdr = None
# 配置日志
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
app = FastAPI(title="图片处理API", version="1.0.0")
# 安全配置常量
MAX_FILE_SIZE = 10 * 1024 * 1024 # 10 MB 最大文件大小
ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp'} # 允许的文件扩展名
ALLOWED_IMAGE_TYPES = {'jpeg', 'png', 'gif', 'bmp', 'webp'} # 允许的图片类型(通过魔数验证)
# 配置目录
UPLOAD_DIR = Path("uploads")
PROCESSED_DIR = Path("processed")
UPLOAD_DIR.mkdir(exist_ok=True)
PROCESSED_DIR.mkdir(exist_ok=True)
@app.get("/")
def read_root():
return {
"message": "图片处理API服务",
"endpoints": [
"/upload - 上传图片",
"/resize - 调整大小",
"/crop - 裁剪图片",
"/filter - 应用滤镜",
"/watermark - 添加水印"
]
}
2. Image upload and security validation
python
def validate_image_file(file: UploadFile, file_content: bytes) -> tuple[bool, str]:
    """
    Check that an uploaded file is an acceptable image.

    Args:
        file: the uploaded file object
        file_content: the file's raw bytes
    Returns:
        tuple: (is_valid, error_message)
    """
    # 1. File size
    if len(file_content) > MAX_FILE_SIZE:
        return False, f"File too large; the limit is {MAX_FILE_SIZE / 1024 / 1024:.1f} MB"
    # 2. Extension whitelist
    if not file.filename:
        return False, "File name must not be empty"
    file_ext = Path(file.filename).suffix.lower()
    if file_ext not in ALLOWED_EXTENSIONS:
        allowed = ', '.join(sorted(ALLOWED_EXTENSIONS))
        return False, f"Unsupported extension {file_ext}; allowed: {allowed}"
    # 3. Content check with Pillow, which identifies the format from the file
    #    header, so a non-image renamed to .jpg is rejected here
    try:
        with Image.open(io.BytesIO(file_content)) as img:
            image_type = (img.format or '').lower()
            if image_type not in ALLOWED_IMAGE_TYPES:
                return False, f"File content is not a supported image format (detected: {image_type})"
            # Cap the pixel count before decoding (guards against image bombs)
            max_pixels = 50_000_000
            width, height = img.size
            if width * height > max_pixels:
                return False, f"Image too large ({width}x{height}); the limit is {max_pixels:,} pixels"
            # 4. Integrity check; the image object must not be reused after verify()
            img.verify()
    except Exception as e:
        return False, f"Image validation failed: {e}"
    return True, ""
@app.post("/upload")
async def upload_image(file: UploadFile = File(...)):
"""
上传图片(包含完整的安全验证)
安全措施:
1. 文件大小限制
2. 扩展名白名单
3. 魔数(Magic Number)验证
4. PIL 完整性验证
5. 像素数量限制(防止图片炸弹)
6. 文件名安全处理(UUID重命名)
"""
try:
# 读取文件内容
file_content = await file.read()
# 安全验证
is_valid, error_msg = validate_image_file(file, file_content)
if not is_valid:
logger.warning(f"文件上传失败: {file.filename} - {error_msg}")
raise HTTPException(status_code=400, detail=error_msg)
# 生成安全的文件名(使用UUID,防止路径遍历)
file_ext = Path(file.filename).suffix.lower()
file_id = str(uuid.uuid4())
safe_filename = f"{file_id}{file_ext}"
file_path = UPLOAD_DIR / safe_filename
# 保存文件
with open(file_path, "wb") as buffer:
buffer.write(file_content)
# 获取图片信息
with Image.open(file_path) as img:
image_info = {
"file_id": file_id,
"original_filename": file.filename,
"format": img.format,
"size": img.size,
"mode": img.mode,
"file_size": len(file_content),
"file_size_mb": f"{len(file_content) / 1024 / 1024:.2f} MB"
}
logger.info(f"✓ 文件上传成功: {file.filename} -> {safe_filename}")
return {
"message": "上传成功",
"data": image_info
}
except HTTPException:
raise
except Exception as e:
logger.error(f"✗ 上传处理失败: {str(e)}")
raise HTTPException(status_code=500, detail=f"服务器错误: {str(e)}")
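The upload handler above calls `await file.read()` and only then checks the size, so an oversized upload is fully loaded into memory before it is rejected. Reading in chunks and stopping once the limit is crossed bounds that cost. The sketch below works on any binary file-like object; `read_limited` and `PayloadTooLarge` are hypothetical names. In FastAPI it could be applied to the underlying `file.file` object, with the caveat that the framework may already have spooled the upload to disk, so a request-size limit at the proxy is still worth having.

```python
import io


class PayloadTooLarge(Exception):
    """Raised when a stream holds more bytes than allowed."""


def read_limited(stream, max_bytes, chunk_size=64 * 1024):
    """Read a binary stream in chunks, aborting as soon as max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise PayloadTooLarge(f"more than {max_bytes} bytes received")
        chunks.append(chunk)
    return b"".join(chunks)


# Demo with in-memory streams
small = read_limited(io.BytesIO(b"x" * 100), max_bytes=100, chunk_size=16)
print(len(small))

try:
    read_limited(io.BytesIO(b"x" * 101), max_bytes=100, chunk_size=16)
    rejected = False
except PayloadTooLarge:
    rejected = True
print(rejected)
```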
3. Image processing class
python
class ImageProcessor:
"""图片处理器"""
    @staticmethod
    def resize(image_path, width=None, height=None, keep_ratio=True):
        """Resize an image; with keep_ratio, a missing dimension is derived from the other."""
        with Image.open(image_path) as img:
            original_size = img.size
            if keep_ratio:
                if width and not height:
                    ratio = width / img.size[0]
                    height = int(img.size[1] * ratio)
                elif height and not width:
                    ratio = height / img.size[1]
                    width = int(img.size[0] * ratio)
            if width and height:
                result = img.resize((width, height), Image.Resampling.LANCZOS)
            else:
                # No target size given: return a loaded copy, because the
                # opened file is closed when the with-block exits
                result = img.copy()
        return result, {
            "original_size": original_size,
            "new_size": result.size
        }
@staticmethod
def crop(image_path, x, y, width, height):
"""裁剪图片"""
with Image.open(image_path) as img:
box = (x, y, x + width, y + height)
cropped = img.crop(box)
return cropped, {
"crop_box": box,
"cropped_size": cropped.size
}
@staticmethod
def apply_filter(image_path, filter_name):
"""应用滤镜"""
filters = {
'blur': ImageFilter.BLUR,
'contour': ImageFilter.CONTOUR,
'detail': ImageFilter.DETAIL,
'edge_enhance': ImageFilter.EDGE_ENHANCE,
'emboss': ImageFilter.EMBOSS,
'sharpen': ImageFilter.SHARPEN,
'smooth': ImageFilter.SMOOTH
}
if filter_name not in filters:
raise ValueError(f"不支持的滤镜: {filter_name}")
with Image.open(image_path) as img:
filtered = img.filter(filters[filter_name])
return filtered, {"filter": filter_name}
@staticmethod
def add_watermark(image_path, text, position='bottom-right', opacity=128):
"""添加水印"""
with Image.open(image_path) as img:
# 转换为RGBA模式
if img.mode != 'RGBA':
img = img.convert('RGBA')
# 创建水印层
watermark = Image.new('RGBA', img.size, (255, 255, 255, 0))
draw = ImageDraw.Draw(watermark)
# 字体(使用默认字体)
try:
font = ImageFont.truetype("arial.ttf", 36)
except:
font = ImageFont.load_default()
# 计算文本大小
bbox = draw.textbbox((0, 0), text, font=font)
text_width = bbox[2] - bbox[0]
text_height = bbox[3] - bbox[1]
# 计算位置
positions = {
'top-left': (10, 10),
'top-right': (img.width - text_width - 10, 10),
'bottom-left': (10, img.height - text_height - 10),
'bottom-right': (img.width - text_width - 10,
img.height - text_height - 10),
'center': ((img.width - text_width) // 2,
(img.height - text_height) // 2)
}
pos = positions.get(position, positions['bottom-right'])
# 绘制水印
draw.text(pos, text, fill=(255, 255, 255, opacity), font=font)
# 合并图层
watermarked = Image.alpha_composite(img, watermark)
return watermarked.convert('RGB'), {
"watermark_text": text,
"position": position
}
@staticmethod
def adjust_brightness(image_path, factor):
"""调整亮度 (factor: 0.0-2.0)"""
with Image.open(image_path) as img:
enhancer = ImageEnhance.Brightness(img)
enhanced = enhancer.enhance(factor)
return enhanced, {"brightness_factor": factor}
@staticmethod
def adjust_contrast(image_path, factor):
"""调整对比度 (factor: 0.0-2.0)"""
with Image.open(image_path) as img:
enhancer = ImageEnhance.Contrast(img)
enhanced = enhancer.enhance(factor)
return enhanced, {"contrast_factor": factor}
@staticmethod
def rotate(image_path, angle, expand=True):
"""旋转图片"""
with Image.open(image_path) as img:
rotated = img.rotate(angle, expand=expand)
return rotated, {"angle": angle}
processor = ImageProcessor()
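Note that `resize` above preserves the aspect ratio only when a single dimension is given; when both `width` and `height` are passed, the image is stretched to exactly that size. If the intent is to fit the image inside a bounding box instead, the target size can be computed first. The sketch below is pure arithmetic and does not touch Pillow; `target_size` is a hypothetical helper and its fit-inside behavior is a design choice that differs from the method above.

```python
def target_size(original, width=None, height=None, keep_ratio=True):
    """Compute the output size for a resize request.

    With keep_ratio=True and both dimensions given, the result fits inside
    the (width, height) box instead of being stretched to it.
    """
    orig_w, orig_h = original
    if width is None and height is None:
        return original
    if not keep_ratio:
        return (width or orig_w, height or orig_h)
    if width and height:
        scale = min(width / orig_w, height / orig_h)
    elif width:
        scale = width / orig_w
    else:
        scale = height / orig_h
    # Never return a zero-sized dimension for very thin images
    return (max(1, round(orig_w * scale)), max(1, round(orig_h * scale)))


print(target_size((4000, 3000), width=800))              # width only
print(target_size((4000, 3000), width=800, height=800))  # fit inside a square box
print(target_size((4000, 3000)))                         # nothing requested
```

The resulting tuple can be passed straight to `img.resize(...)`.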
4. API endpoints
python
@app.post("/resize")
async def resize_image(
file_id: str,
width: Optional[int] = None,
height: Optional[int] = None,
keep_ratio: bool = True
):
"""调整图片大小"""
input_path = find_image(file_id)
if not input_path:
raise HTTPException(status_code=404, detail="图片不存在")
# 处理图片
result_img, info = processor.resize(input_path, width, height, keep_ratio)
# 保存结果
output_path = PROCESSED_DIR / f"{file_id}_resized.jpg"
result_img.convert('RGB').save(output_path, quality=95)  # JPEG cannot store alpha or palette modes
return {
"message": "处理成功",
"file_id": file_id,
"info": info,
"download_url": f"/download/{file_id}_resized"
}
@app.post("/crop")
async def crop_image(
file_id: str,
x: int,
y: int,
width: int,
height: int
):
"""裁剪图片"""
input_path = find_image(file_id)
if not input_path:
raise HTTPException(status_code=404, detail="图片不存在")
result_img, info = processor.crop(input_path, x, y, width, height)
output_path = PROCESSED_DIR / f"{file_id}_cropped.jpg"
result_img.convert('RGB').save(output_path, quality=95)  # JPEG cannot store alpha or palette modes
return {
"message": "裁剪成功",
"file_id": file_id,
"info": info,
"download_url": f"/download/{file_id}_cropped"
}
@app.post("/filter")
async def apply_filter(
file_id: str,
filter_name: str
):
"""应用滤镜"""
input_path = find_image(file_id)
if not input_path:
raise HTTPException(status_code=404, detail="图片不存在")
try:
result_img, info = processor.apply_filter(input_path, filter_name)
output_path = PROCESSED_DIR / f"{file_id}_{filter_name}.jpg"
result_img.convert('RGB').save(output_path, quality=95)  # JPEG cannot store alpha or palette modes
return {
"message": "滤镜应用成功",
"file_id": file_id,
"info": info,
"download_url": f"/download/{file_id}_{filter_name}"
}
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
@app.post("/watermark")
async def add_watermark(
file_id: str,
text: str,
position: str = 'bottom-right',
opacity: int = 128
):
"""添加水印"""
input_path = find_image(file_id)
if not input_path:
raise HTTPException(status_code=404, detail="图片不存在")
result_img, info = processor.add_watermark(input_path, text, position, opacity)
output_path = PROCESSED_DIR / f"{file_id}_watermarked.jpg"
result_img.save(output_path, quality=95)
return {
"message": "水印添加成功",
"file_id": file_id,
"info": info,
"download_url": f"/download/{file_id}_watermarked"
}
@app.get("/download/{file_name}")
async def download_image(file_name: str):
"""下载处理后的图片"""
file_path = PROCESSED_DIR / f"{file_name}.jpg"
if not file_path.exists():
raise HTTPException(status_code=404, detail="文件不存在")
return FileResponse(file_path, media_type="image/jpeg")
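def find_image(file_id):
    """Locate an uploaded image by ID; return None if the ID is invalid or no file matches."""
    # Uploads are stored under UUID names, so reject anything else. This also
    # keeps values such as '../x' from escaping UPLOAD_DIR.
    try:
        uuid.UUID(file_id)
    except (ValueError, TypeError, AttributeError):
        return None
    # Check every extension the upload endpoint accepts, including .bmp
    for ext in sorted(ALLOWED_EXTENSIONS):
        path = UPLOAD_DIR / f"{file_id}{ext}"
        if path.exists():
            return path
    return None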
if __name__ == '__main__':
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
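Project 12: Markdown Blog Generator
Project overview
Converts Markdown files into a polished static blog site. Supports theme customization, tags and categories, automatic navigation, and can be deployed to GitHub Pages.
Tech stack
- Markdown parsing: markdown, markdown2
- Template engine: Jinja2
- Front-end framework: Bootstrap (optional)
- Deployment: GitHub Pages, Netlify
Core feature implementation
1. Markdown parser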
python
import markdown
from markdown.extensions import fenced_code, tables, toc
from pathlib import Path
import frontmatter
from datetime import datetime
import re
class MarkdownParser:
"""Markdown文件解析器"""
def __init__(self):
self.md = markdown.Markdown(extensions=[
'extra',
'codehilite',
'toc',
'meta',
'fenced_code',
'tables'
])
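    def parse_file(self, file_path):
        """Parse one Markdown file into HTML plus metadata."""
        with open(file_path, 'r', encoding='utf-8') as f:
            content = f.read()
        # Split off the front matter (metadata block)
        post = frontmatter.loads(content)
        # Convert the Markdown body to HTML
        html_content = self.md.convert(post.content)
        # YAML may yield a datetime, a date, or a plain string for `date`.
        # Normalize to a naive datetime so posts can be sorted and formatted uniformly.
        raw_date = post.get('date')
        if isinstance(raw_date, datetime):
            post_date = raw_date
        elif hasattr(raw_date, 'timetuple'):  # datetime.date
            post_date = datetime(raw_date.year, raw_date.month, raw_date.day)
        else:
            try:
                post_date = datetime.fromisoformat(str(raw_date))
            except ValueError:
                # Missing or unparseable date: fall back to the current time
                post_date = datetime.now()
        post_date = post_date.replace(tzinfo=None)
        metadata = {
            'title': post.get('title', Path(file_path).stem),
            'date': post_date,
            'tags': post.get('tags', []),
            'category': post.get('category', 'Uncategorized'),
            'author': post.get('author', 'Anonymous'),
            'description': post.get('description', ''),
            'toc': self.md.toc if hasattr(self.md, 'toc') else ''
        }
        # Reset the parser state before the next file
        self.md.reset()
        return {
            'content': html_content,
            'metadata': metadata,
            'file_path': file_path
        }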
def extract_summary(self, html_content, max_length=200):
"""提取摘要"""
# 移除HTML标签
text = re.sub(r'<[^>]+>', '', html_content)
# 截取前N个字符
summary = text[:max_length]
if len(text) > max_length:
summary += '...'
return summary
# 使用示例
parser = MarkdownParser()
post = parser.parse_file('posts/my-first-post.md')
print(post['metadata']['title'])
print(post['content'][:100])
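`extract_summary` removes tags with a regular expression, which leaves HTML entities such as `&amp;` in the text and keeps the line breaks of the source. An alternative is to let the standard library's HTML parser collect the text, which decodes entities along the way. A minimal sketch, where `html_to_summary` and `TextExtractor` are hypothetical names:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, decoding entities."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_summary(html, max_length=200):
    """Return the plain text of `html`, whitespace-normalized and truncated."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace (including newlines) into single spaces
    text = " ".join("".join(extractor.parts).split())
    if len(text) <= max_length:
        return text
    return text[:max_length] + "..."


sample = "<p>Tom &amp; Jerry</p>\n<pre>x &lt; y</pre>"
print(html_to_summary(sample))
print(html_to_summary("<p>abcdef</p>", max_length=3))
```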
2. Blog generator
python
from jinja2 import Environment, FileSystemLoader, select_autoescape
from pathlib import Path
import shutil
from datetime import datetime
import json
class BlogGenerator:
"""静态博客生成器"""
def __init__(self, source_dir='content', output_dir='output', templates_dir='templates'):
self.source_dir = Path(source_dir)
self.output_dir = Path(output_dir)
self.templates_dir = Path(templates_dir)
# 初始化Jinja2环境
self.env = Environment(
loader=FileSystemLoader(str(self.templates_dir)),
autoescape=select_autoescape(['html', 'xml'])
)
self.parser = MarkdownParser()
self.posts = []
self.tags = {}
self.categories = {}
def scan_posts(self):
"""扫描所有Markdown文件"""
print("📂 扫描文章...")
md_files = list(self.source_dir.glob('**/*.md'))
for md_file in md_files:
try:
post_data = self.parser.parse_file(md_file)
# 添加额外信息
post_data['url'] = self.get_post_url(md_file)
post_data['summary'] = self.parser.extract_summary(post_data['content'])
self.posts.append(post_data)
# 构建标签索引
for tag in post_data['metadata']['tags']:
if tag not in self.tags:
self.tags[tag] = []
self.tags[tag].append(post_data)
# 构建分类索引
category = post_data['metadata']['category']
if category not in self.categories:
self.categories[category] = []
self.categories[category].append(post_data)
print(f" ✓ {md_file.name}")
except Exception as e:
print(f" ✗ {md_file.name}: {e}")
# 按日期排序
self.posts.sort(key=lambda p: p['metadata']['date'], reverse=True)
print(f"\n✅ 共找到 {len(self.posts)} 篇文章")
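    def get_post_url(self, file_path):
        """Build the post URL from its path relative to the source directory."""
        relative = file_path.relative_to(self.source_dir)
        # as_posix() keeps forward slashes in the URL on Windows as well
        return relative.with_suffix('.html').as_posix()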
def generate_post_pages(self):
"""生成文章页面"""
print("\n📝 生成文章页面...")
template = self.env.get_template('post.html')
for post in self.posts:
# 渲染模板
html = template.render(
post=post,
site_title='我的博客',
all_tags=self.tags.keys(),
all_categories=self.categories.keys()
)
# 保存文件
output_path = self.output_dir / post['url']
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
f.write(html)
print(f" ✓ {post['url']}")
def generate_index_page(self):
"""生成首页"""
print("\n🏠 生成首页...")
template = self.env.get_template('index.html')
html = template.render(
posts=self.posts[:10], # 显示最新10篇
site_title='我的博客',
site_description='分享技术与生活',
all_tags=self.tags.keys(),
all_categories=self.categories.keys()
)
output_path = self.output_dir / 'index.html'
with open(output_path, 'w', encoding='utf-8') as f:
f.write(html)
print(" ✓ index.html")
def generate_tag_pages(self):
"""生成标签页面"""
print("\n🏷️ 生成标签页面...")
template = self.env.get_template('tag.html')
for tag, posts in self.tags.items():
html = template.render(
tag=tag,
posts=posts,
site_title='我的博客'
)
output_path = self.output_dir / 'tags' / f'{tag}.html'
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
f.write(html)
print(f" ✓ tags/{tag}.html")
def generate_archive_page(self):
"""生成归档页面"""
print("\n📚 生成归档页面...")
# 按年月分组
archives = {}
for post in self.posts:
date = post['metadata']['date']
year_month = date.strftime('%Y-%m')
if year_month not in archives:
archives[year_month] = []
archives[year_month].append(post)
template = self.env.get_template('archive.html')
html = template.render(
archives=archives,
site_title='我的博客'
)
output_path = self.output_dir / 'archive.html'
with open(output_path, 'w', encoding='utf-8') as f:
f.write(html)
print(" ✓ archive.html")
def copy_static_files(self):
"""复制静态文件"""
print("\n📦 复制静态文件...")
static_dir = self.templates_dir / 'static'
if static_dir.exists():
output_static = self.output_dir / 'static'
if output_static.exists():
shutil.rmtree(output_static)
shutil.copytree(static_dir, output_static)
print(" ✓ 静态文件已复制")
def generate_sitemap(self):
"""生成站点地图"""
sitemap_lines = ['<?xml version="1.0" encoding="UTF-8"?>']
sitemap_lines.append('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">')
base_url = 'https://yourdomain.com'
for post in self.posts:
sitemap_lines.append(' <url>')
sitemap_lines.append(f' <loc>{base_url}/{post["url"]}</loc>')
sitemap_lines.append(f' <lastmod>{post["metadata"]["date"].strftime("%Y-%m-%d")}</lastmod>')
sitemap_lines.append(' </url>')
sitemap_lines.append('</urlset>')
output_path = self.output_dir / 'sitemap.xml'
with open(output_path, 'w', encoding='utf-8') as f:
f.write('\n'.join(sitemap_lines))
print("\n🗺️ 生成sitemap.xml")
def build(self):
"""构建整个站点"""
print("=" * 60)
print("开始生成静态博客")
print("=" * 60)
# 清理输出目录
if self.output_dir.exists():
shutil.rmtree(self.output_dir)
self.output_dir.mkdir(parents=True)
# 执行生成步骤
self.scan_posts()
self.generate_post_pages()
self.generate_index_page()
self.generate_tag_pages()
self.generate_archive_page()
self.copy_static_files()
self.generate_sitemap()
print("\n" + "=" * 60)
print(f"✅ 博客生成完成!输出目录: {self.output_dir.absolute()}")
print("=" * 60)
# 使用示例
if __name__ == '__main__':
generator = BlogGenerator(
source_dir='content',
output_dir='output',
templates_dir='templates'
)
generator.build()
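`generate_tag_pages` uses the raw tag text as a file name (`tags/{tag}.html`). A tag such as `C/C++` would then create an unexpected subdirectory, and spaces end up in URLs. Converting tags to a slug first avoids both. The sketch below uses a hypothetical `slugify` helper that keeps Unicode word characters, so Chinese tags stay readable; if it is adopted, the same function has to be applied wherever templates build tag links.

```python
import re
import unicodedata


def slugify(text):
    """Turn arbitrary tag text into a file-name-safe slug."""
    text = unicodedata.normalize("NFKC", text).strip().lower()
    # Replace every run of characters that are not word characters or hyphens
    text = re.sub(r"[^\w\-]+", "-", text)
    return text.strip("-") or "untitled"


for tag in ["C/C++ Tips", "Python 入门", "../etc", "///"]:
    print(repr(tag), "->", slugify(tag))
```

Note that distinct tags can map to the same slug (`C/C++` and `c c`), so a real generator would need to detect collisions.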
3. Example template (templates/post.html)
html
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ post.metadata.title }} - {{ site_title }}</title>
<link rel="stylesheet" href="/static/css/style.css">
<link rel="stylesheet" href="/static/css/highlight.css">
</head>
<body>
<header>
<nav>
<a href="/">首页</a>
<a href="/archive.html">归档</a>
<a href="/about.html">关于</a>
</nav>
</header>
<main>
<article>
<h1>{{ post.metadata.title }}</h1>
<div class="post-meta">
<span>📅 {{ post.metadata.date.strftime('%Y-%m-%d') }}</span>
<span>✍️ {{ post.metadata.author }}</span>
<span>📂 {{ post.metadata.category }}</span>
</div>
<div class="post-tags">
{% for tag in post.metadata.tags %}
<a href="/tags/{{ tag }}.html" class="tag">#{{ tag }}</a>
{% endfor %}
</div>
<div class="post-content">
{{ post.content|safe }}
</div>
</article>
</main>
<footer>
<p>© 2024 {{ site_title }}. All rights reserved.</p>
</footer>
</body>
</html>
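Project 13: Stock Data Analysis Tool
Project overview
Fetches real-time stock data for technical analysis and visualization. Supports computing technical indicators, drawing candlestick (K-line) charts, and generating trading signals.
Tech stack
- Data sources: akshare, yfinance
- Data processing: pandas, numpy
- Visualization: matplotlib, mplfinance
- Technical indicators: ta-lib (optional)
- Data storage: SQLite
Core feature implementation
1. Fetching stock data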
python
import akshare as ak
import pandas as pd
import sqlite3
import logging
import re
from datetime import datetime, timedelta
from typing import Optional
# 配置日志
logger = logging.getLogger(__name__)
class StockDataFetcher:
"""
股票数据获取器
支持从akshare获取A股历史数据,并存储到SQLite数据库
"""
# 股票代码格式验证(A股6位数字)
STOCK_CODE_PATTERN = re.compile(r'^\d{6}$')
# 默认数据获取天数
DEFAULT_DAYS = 365
def __init__(self, db_path: str = 'stock_data.db') -> None:
"""
初始化股票数据获取器
Args:
db_path: SQLite数据库文件路径
"""
self.db_path = db_path
self.init_db()
logger.info(f"股票数据获取器已初始化,数据库: {db_path}")
def init_db(self) -> None:
"""初始化数据库表结构"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS stock_daily (
stock_code TEXT,
date DATE,
open REAL,
high REAL,
low REAL,
close REAL,
volume INTEGER,
PRIMARY KEY (stock_code, date)
)
''')
conn.commit()
conn.close()
logger.debug("数据库表初始化完成")
@staticmethod
def validate_stock_code(stock_code: str) -> bool:
"""
验证股票代码格式
Args:
stock_code: 股票代码
Returns:
是否为有效的A股代码(6位数字)
"""
return bool(StockDataFetcher.STOCK_CODE_PATTERN.match(stock_code))
def fetch_stock_data(self,
stock_code: str,
start_date: Optional[str] = None,
end_date: Optional[str] = None) -> Optional[pd.DataFrame]:
"""
获取股票历史数据
Args:
stock_code: 股票代码(6位数字,如 '600519')
start_date: 开始日期(YYYYMMDD格式),默认为一年前
end_date: 结束日期(YYYYMMDD格式),默认为今天
Returns:
包含股票数据的DataFrame,失败返回None
Example:
>>> fetcher = StockDataFetcher()
>>> df = fetcher.fetch_stock_data('600519', '20230101', '20240101')
"""
# 验证股票代码
if not self.validate_stock_code(stock_code):
logger.error(f"无效的股票代码: {stock_code},必须是6位数字")
return None
# 设置默认日期
if start_date is None:
start_date = (datetime.now() - timedelta(days=self.DEFAULT_DAYS)).strftime('%Y%m%d')
if end_date is None:
end_date = datetime.now().strftime('%Y%m%d')
try:
logger.info(f"开始获取股票数据: {stock_code}, {start_date} - {end_date}")
# 使用akshare获取数据
df = ak.stock_zh_a_hist(
symbol=stock_code,
period="daily",
start_date=start_date,
end_date=end_date,
adjust="qfq" # 前复权
)
if df.empty:
logger.warning(f"未获取到数据: {stock_code}")
return None
# 重命名列
df = df.rename(columns={
'日期': 'date',
'开盘': 'open',
'最高': 'high',
'最低': 'low',
'收盘': 'close',
'成交量': 'volume'
})
# 选择需要的列
df = df[['date', 'open', 'high', 'low', 'close', 'volume']]
df['stock_code'] = stock_code
logger.info(f"✓ 获取 {stock_code} 数据: {len(df)} 条")
return df
except Exception as e:
logger.error(f"✗ 获取失败 {stock_code}: {e}")
return None
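    def save_to_db(self, df: pd.DataFrame) -> bool:
        """
        Save rows to the database, replacing rows that already exist.

        Args:
            df: stock data DataFrame
        Returns:
            True if the rows were saved
        """
        columns = ['stock_code', 'date', 'open', 'high', 'low', 'close', 'volume']
        try:
            records = df[columns].copy()
            # Store dates as YYYY-MM-DD text, the format load_from_db filters on
            records['date'] = pd.to_datetime(records['date']).dt.strftime('%Y-%m-%d')
            rows = list(records.itertuples(index=False, name=None))
            conn = sqlite3.connect(self.db_path)
            try:
                # A plain append fails on the (stock_code, date) primary key as soon
                # as a date range is fetched twice; INSERT OR REPLACE makes the
                # save repeatable
                conn.executemany(
                    'INSERT OR REPLACE INTO stock_daily '
                    '(stock_code, date, open, high, low, close, volume) '
                    'VALUES (?, ?, ?, ?, ?, ?, ?)',
                    rows
                )
                conn.commit()
            finally:
                conn.close()
            logger.info(f"💾 Saved {len(rows)} rows to the database")
            return True
        except Exception as e:
            logger.error(f"Failed to save data: {e}")
            return False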
def load_from_db(self,
stock_code: str,
start_date: Optional[str] = None,
end_date: Optional[str] = None) -> pd.DataFrame:
"""
从数据库加载数据(使用参数化查询防止SQL注入)
Args:
stock_code: 股票代码
start_date: 开始日期(YYYY-MM-DD格式)
end_date: 结束日期(YYYY-MM-DD格式)
Returns:
股票数据DataFrame
Raises:
ValueError: 股票代码格式无效
"""
# 验证股票代码(防止SQL注入)
if not self.validate_stock_code(stock_code):
raise ValueError(f"无效的股票代码: {stock_code}")
conn = sqlite3.connect(self.db_path)
# 构建参数化查询(安全)
query = "SELECT * FROM stock_daily WHERE stock_code = ?"
params = [stock_code]
if start_date:
query += " AND date >= ?"
params.append(start_date)
if end_date:
query += " AND date <= ?"
params.append(end_date)
query += " ORDER BY date"
# 使用参数化查询(防止SQL注入)
df = pd.read_sql_query(query, conn, params=params)
conn.close()
logger.info(f"从数据库加载 {stock_code}: {len(df)} 条记录")
return df
# 使用示例
if __name__ == '__main__':
# 配置日志
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# 创建数据获取器
fetcher = StockDataFetcher()
# 获取贵州茅台数据
df = fetcher.fetch_stock_data('600519', '20230101', '20240101')
if df is not None:
fetcher.save_to_db(df)
# 从数据库读取数据
loaded_df = fetcher.load_from_db('600519')
print(f"从数据库加载数据: {len(loaded_df)} 条")
print(loaded_df.head())
2. Technical indicator calculation
python
import numpy as np
import pandas as pd
class TechnicalIndicators:
"""技术指标计算器"""
@staticmethod
def calculate_ma(df, periods=[5, 10, 20, 60]):
"""计算移动平均线"""
for period in periods:
df[f'MA{period}'] = df['close'].rolling(window=period).mean()
return df
@staticmethod
def calculate_ema(df, periods=[12, 26]):
"""计算指数移动平均线"""
for period in periods:
df[f'EMA{period}'] = df['close'].ewm(span=period, adjust=False).mean()
return df
@staticmethod
def calculate_macd(df, fast=12, slow=26, signal=9):
"""计算MACD指标"""
# 计算EMA
ema_fast = df['close'].ewm(span=fast, adjust=False).mean()
ema_slow = df['close'].ewm(span=slow, adjust=False).mean()
# MACD线
df['MACD'] = ema_fast - ema_slow
# 信号线
df['MACD_signal'] = df['MACD'].ewm(span=signal, adjust=False).mean()
# MACD柱
df['MACD_hist'] = df['MACD'] - df['MACD_signal']
return df
@staticmethod
def calculate_rsi(df, period=14):
"""计算RSI相对强弱指标"""
delta = df['close'].diff()
gain = (delta.where(delta > 0, 0)).rolling(window=period).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=period).mean()
rs = gain / loss
df['RSI'] = 100 - (100 / (1 + rs))
return df
@staticmethod
def calculate_bollinger_bands(df, period=20, std_dev=2):
"""计算布林带"""
df['BB_middle'] = df['close'].rolling(window=period).mean()
std = df['close'].rolling(window=period).std()
df['BB_upper'] = df['BB_middle'] + (std_dev * std)
df['BB_lower'] = df['BB_middle'] - (std_dev * std)
return df
    @staticmethod
    def calculate_kdj(df, n=9, m1=3, m2=3):
        """KDJ stochastic indicator."""
        low_min = df['low'].rolling(window=n).min()
        high_max = df['high'].rolling(window=n).max()
        # A flat window (high == low) would divide by zero; treat its RSV as missing
        price_range = (high_max - low_min).replace(0, np.nan)
        rsv = (df['close'] - low_min) / price_range * 100
        # com = m - 1 gives a smoothing factor of 1 / m
        df['K'] = rsv.ewm(com=m1 - 1, adjust=False).mean()
        df['D'] = df['K'].ewm(com=m2 - 1, adjust=False).mean()
        df['J'] = 3 * df['K'] - 2 * df['D']
        return df

    @staticmethod
    def calculate_all(df):
        """Calculate all of the common indicators above."""
        df = TechnicalIndicators.calculate_ma(df)
        df = TechnicalIndicators.calculate_ema(df)
        df = TechnicalIndicators.calculate_macd(df)
        df = TechnicalIndicators.calculate_rsi(df)
        df = TechnicalIndicators.calculate_bollinger_bands(df)
        df = TechnicalIndicators.calculate_kdj(df)
        return df


# Usage example (df is the DataFrame loaded in the previous step)
df = TechnicalIndicators.calculate_all(df)
print(df[['date', 'close', 'MA5', 'MA20', 'RSI', 'MACD']].tail())
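The `ewm(span=..., adjust=False)` calls behind the EMA and MACD columns boil down to a one-line recursion. The following is a minimal pure-Python sketch of that recursion, assuming the usual smoothing factor alpha = 2 / (span + 1) and seeding with the first value. The helper names `ema` and `macd` are illustrative and not part of the project code.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    if span < 1:
        raise ValueError("span must be >= 1")
    alpha = 2.0 / (span + 1)
    result = []
    for x in values:
        if not result:
            result.append(float(x))  # seed: EMA_0 = x_0
        else:
            # EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}
            result.append(alpha * x + (1 - alpha) * result[-1])
    return result


def macd(values, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line, histogram) as plain lists."""
    macd_line = [f - s for f, s in zip(ema(values, fast), ema(values, slow))]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


if __name__ == "__main__":
    closes = [10.0, 11.0, 12.0, 11.5, 12.5]
    print(ema(closes, span=3))
    print(macd(closes, fast=2, slow=3, signal=2)[0])
```

Working through the recursion by hand on a short list is a quick way to sanity-check the pandas output for the first few rows.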
3. Candlestick Chart Plotting
python
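import mplfinance as mpf
import matplotlib.pyplot as plt
import pandas as pd

# Font settings for Chinese titles; assumes the SimHei font is installed
CJK_RC = {'font.sans-serif': ['SimHei'], 'axes.unicode_minus': False}


class StockVisualizer:
    """Stock data visualization."""

    def __init__(self):
        plt.rcParams.update(CJK_RC)

    def plot_candlestick(self, df, title='Candlestick Chart', save_path=None):
        """Draw a candlestick chart with volume and optional MA overlays."""
        # Prepare the data: mplfinance expects a DatetimeIndex
        df_plot = df.copy()
        df_plot['date'] = pd.to_datetime(df_plot['date'])
        df_plot.set_index('date', inplace=True)
        # Market colors: red for up, green for down (the A-share convention)
        mc = mpf.make_marketcolors(
            up='red',
            down='green',
            edge='inherit',
            wick='inherit',
            volume='in'
        )
        # The mplfinance style can override global rcParams, so pass the font settings here too
        style = mpf.make_mpf_style(
            marketcolors=mc,
            gridstyle='-',
            y_on_right=True,
            rc=CJK_RC
        )
        # Overlay moving averages when they have been calculated
        add_plots = []
        if 'MA5' in df_plot.columns:
            add_plots.append(mpf.make_addplot(df_plot['MA5'], color='blue', width=1))
        if 'MA20' in df_plot.columns:
            add_plots.append(mpf.make_addplot(df_plot['MA20'], color='orange', width=1))
        plot_kwargs = dict(
            type='candle',
            style=style,
            title=title,
            ylabel='Price',
            volume=True,
            # The data uses lowercase column names (assumes an 'open' column exists)
            columns=('open', 'high', 'low', 'close', 'volume'),
        )
        # Pass optional arguments only when they are set
        if add_plots:
            plot_kwargs['addplot'] = add_plots
        if save_path:
            plot_kwargs['savefig'] = save_path
        mpf.plot(df_plot, **plot_kwargs)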
    def plot_indicators(self, df, stock_code='Stock'):
        """Plot price with moving averages, MACD, RSI, and volume in four stacked panels."""
        fig, axes = plt.subplots(4, 1, figsize=(14, 12))
        df_plot = df.copy()
        df_plot['date'] = pd.to_datetime(df_plot['date'])
        # 1. Price and moving averages
        axes[0].plot(df_plot['date'], df_plot['close'], label='Close', linewidth=2)
        axes[0].plot(df_plot['date'], df_plot['MA5'], label='MA5', alpha=0.7)
        axes[0].plot(df_plot['date'], df_plot['MA20'], label='MA20', alpha=0.7)
        axes[0].set_title(f'{stock_code} - Price')
        axes[0].set_ylabel('Price')
        axes[0].legend()
        axes[0].grid(True, alpha=0.3)
        # 2. MACD
        axes[1].plot(df_plot['date'], df_plot['MACD'], label='MACD', linewidth=2)
        axes[1].plot(df_plot['date'], df_plot['MACD_signal'], label='Signal', linewidth=2)
        axes[1].bar(df_plot['date'], df_plot['MACD_hist'], label='Histogram', alpha=0.3)
        axes[1].set_title('MACD')
        axes[1].set_ylabel('MACD')
        axes[1].legend()
        axes[1].grid(True, alpha=0.3)
        # 3. RSI with overbought (70) and oversold (30) reference lines
        axes[2].plot(df_plot['date'], df_plot['RSI'], label='RSI', color='purple', linewidth=2)
        axes[2].axhline(y=70, color='r', linestyle='--', label='Overbought', alpha=0.5)
        axes[2].axhline(y=30, color='g', linestyle='--', label='Oversold', alpha=0.5)
        axes[2].set_title('RSI')
        axes[2].set_ylabel('RSI')
        axes[2].set_ylim(0, 100)
        axes[2].legend()
        axes[2].grid(True, alpha=0.3)
        # 4. Volume
        axes[3].bar(df_plot['date'], df_plot['volume'], alpha=0.5)
        axes[3].set_title('Volume')
        axes[3].set_ylabel('Volume')
        axes[3].grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()


# Usage example
visualizer = StockVisualizer()
visualizer.plot_candlestick(df, title='Kweichow Moutai Candlestick Chart')
visualizer.plot_indicators(df, stock_code='600519')
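Project 14: PDF Batch Processing Tool
Project Overview
A command-line tool for batch processing PDF files, with support for merging, splitting, encrypting, decrypting, and watermarking.
Tech Stack
- PDF processing: PyPDF2, pdfrw
- PDF generation: reportlab
- CLI framework: Click
- Progress display: tqdm
Core Feature Implementation
1. PDF Processing Core Class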
python
from PyPDF2 import PdfReader, PdfWriter, PdfMerger
from pathlib import Path
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter
import io


class PDFProcessor:
    """PDF processing helpers."""

    @staticmethod
    def merge_pdfs(pdf_files, output_path):
        """Merge several PDF files into one."""
        merger = PdfMerger()
        for pdf_file in pdf_files:
            try:
                merger.append(pdf_file)
                print(f"  ✓ Added: {Path(pdf_file).name}")
            except Exception as e:
                print(f"  ✗ Failed: {Path(pdf_file).name} - {e}")
        # Write the merged PDF
        merger.write(output_path)
        merger.close()
        print(f"\n✅ Merge complete: {output_path}")

    @staticmethod
    def split_pdf(pdf_file, output_dir, pages_per_file=1):
        """Split a PDF into files of pages_per_file pages each."""
        if pages_per_file < 1:
            raise ValueError("pages_per_file must be at least 1")
        reader = PdfReader(pdf_file)
        total_pages = len(reader.pages)
        print(f"📄 Total pages: {total_pages}")
        output_dir = Path(output_dir)
        output_dir.mkdir(parents=True, exist_ok=True)
        file_count = 0
        for i in range(0, total_pages, pages_per_file):
            writer = PdfWriter()
            # Add the pages in the current range
            for j in range(i, min(i + pages_per_file, total_pages)):
                writer.add_page(reader.pages[j])
            # Save the chunk
            output_path = output_dir / f"split_{file_count + 1}.pdf"
            with open(output_path, 'wb') as f:
                writer.write(f)
            file_count += 1
            print(f"  ✓ Created: {output_path.name}")
        print(f"\n✅ Split complete: {file_count} files")
    @staticmethod
    def encrypt_pdf(pdf_file, output_path, password):
        """Encrypt a PDF file with a password."""
        reader = PdfReader(pdf_file)
        writer = PdfWriter()
        # Copy every page
        for page in reader.pages:
            writer.add_page(page)
        # Encrypt
        writer.encrypt(password)
        # Save
        with open(output_path, 'wb') as f:
            writer.write(f)
        print(f"🔒 PDF encrypted: {output_path}")

    @staticmethod
    def decrypt_pdf(pdf_file, output_path, password):
        """Decrypt a PDF file.

        Raises ValueError if the file is not encrypted or the password is wrong.
        """
        reader = PdfReader(pdf_file)
        if not reader.is_encrypted:
            raise ValueError("PDF is not encrypted")
        # decrypt() returns a falsy value (0) when the password does not match
        if not reader.decrypt(password):
            raise ValueError("Incorrect password")
        writer = PdfWriter()
        # Copy every page
        for page in reader.pages:
            writer.add_page(page)
        # Save
        with open(output_path, 'wb') as f:
            writer.write(f)
        print(f"🔓 PDF decrypted: {output_path}")

    @staticmethod
    def extract_pages(pdf_file, output_path, page_numbers):
        """Extract the given pages (zero-based indices) into a new PDF."""
        reader = PdfReader(pdf_file)
        writer = PdfWriter()
        for page_num in page_numbers:
            if 0 <= page_num < len(reader.pages):
                writer.add_page(reader.pages[page_num])
                print(f"  ✓ Extracted page {page_num + 1}")
            else:
                print(f"  ✗ Page {page_num + 1} is out of range")
        with open(output_path, 'wb') as f:
            writer.write(f)
        print(f"\n✅ Extraction complete: {output_path}")
    @staticmethod
    def add_watermark(pdf_file, output_path, watermark_text):
        """Stamp a text watermark on every page."""
        # Build a one-page watermark PDF in memory.
        # The watermark page is letter-sized; other page sizes get the same absolute position.
        packet = io.BytesIO()
        can = canvas.Canvas(packet, pagesize=letter)
        # Watermark style: large, gray, semi-transparent
        can.setFont("Helvetica", 60)
        can.setFillColorRGB(0.5, 0.5, 0.5, alpha=0.3)
        # Rotate the text
        can.rotate(45)
        can.drawString(200, 100, watermark_text)
        can.save()
        packet.seek(0)
        # Read the watermark PDF
        watermark_pdf = PdfReader(packet)
        watermark_page = watermark_pdf.pages[0]
        # Read the source PDF
        reader = PdfReader(pdf_file)
        writer = PdfWriter()
        # Merge the watermark onto each page
        for page in reader.pages:
            page.merge_page(watermark_page)
            writer.add_page(page)
        # Save
        with open(output_path, 'wb') as f:
            writer.write(f)
        print(f"💧 Watermark added: {output_path}")

    @staticmethod
    def get_pdf_info(pdf_file):
        """Return basic information about a PDF as a dict."""
        reader = PdfReader(pdf_file)
        info = {
            'File name': Path(pdf_file).name,
            'Encrypted': reader.is_encrypted
        }
        if reader.is_encrypted:
            # Page count and metadata usually require the password first
            return info
        info['Pages'] = len(reader.pages)
        # Document metadata
        if reader.metadata:
            info['Title'] = reader.metadata.get('/Title', 'N/A')
            info['Author'] = reader.metadata.get('/Author', 'N/A')
            info['Created'] = reader.metadata.get('/CreationDate', 'N/A')
        return info
2. Command-Line Interface
python
import click
from pathlib import Path
from PyPDF2 import PdfMerger
from tqdm import tqdm

# PDFProcessor is the class from section 1; import it here if it lives in another module.


@click.group()
def cli():
    """PDF batch processing tool"""
    pass


@cli.command()
@click.argument('pdf_files', nargs=-1, type=click.Path(exists=True))
@click.option('--output', '-o', default='merged.pdf', help='Output file name')
def merge(pdf_files, output):
    """Merge several PDF files"""
    if len(pdf_files) < 2:
        click.echo("❌ At least 2 PDF files are required")
        return
    click.echo(f"📚 Merging {len(pdf_files)} PDF files...")
    PDFProcessor.merge_pdfs(pdf_files, output)


@cli.command()
@click.argument('pdf_file', type=click.Path(exists=True))
@click.option('--output-dir', '-d', default='split_output', help='Output directory')
@click.option('--pages', '-p', default=1, type=click.IntRange(min=1), help='Pages per output file')
def split(pdf_file, output_dir, pages):
    """Split a PDF file"""
    click.echo("✂️ Splitting PDF file...")
    PDFProcessor.split_pdf(pdf_file, output_dir, pages)


@cli.command()
@click.argument('pdf_file', type=click.Path(exists=True))
@click.option('--output', '-o', required=True, help='Output file name')
@click.option('--password', '-p', prompt=True, hide_input=True, help='Password')
def encrypt(pdf_file, output, password):
    """Encrypt a PDF file"""
    click.echo("🔒 Encrypting PDF file...")
    PDFProcessor.encrypt_pdf(pdf_file, output, password)


@cli.command()
@click.argument('pdf_file', type=click.Path(exists=True))
@click.option('--output', '-o', required=True, help='Output file name')
@click.option('--password', '-p', prompt=True, hide_input=True, help='Password')
def decrypt(pdf_file, output, password):
    """Decrypt a PDF file"""
    click.echo("🔓 Decrypting PDF file...")
    try:
        PDFProcessor.decrypt_pdf(pdf_file, output, password)
    except Exception as e:
        # Catch Exception rather than using a bare except, so Ctrl+C still works
        click.echo(f"❌ Decryption failed (wrong password or file not encrypted): {e}")
@cli.command()
@click.argument('pdf_file', type=click.Path(exists=True))
@click.option('--output', '-o', required=True, help='Output file name')
@click.option('--pages', '-p', required=True, help='Page numbers, comma-separated (e.g. 1,3,5)')
def extract(pdf_file, output, pages):
    """Extract specific pages"""
    # Convert the 1-based page numbers to zero-based indices
    try:
        page_numbers = [int(p) - 1 for p in pages.split(',')]
    except ValueError:
        raise click.BadParameter("expected comma-separated integers, e.g. 1,3,5", param_hint='--pages')
    click.echo(f"📄 Extracting pages: {pages}")
    PDFProcessor.extract_pages(pdf_file, output, page_numbers)


@cli.command()
@click.argument('pdf_file', type=click.Path(exists=True))
def info(pdf_file):
    """Show PDF information"""
    pdf_info = PDFProcessor.get_pdf_info(pdf_file)
    click.echo("\n" + "=" * 50)
    click.echo("📋 PDF file information")
    click.echo("=" * 50)
    for key, value in pdf_info.items():
        click.echo(f"{key:12s}: {value}")
    click.echo("=" * 50 + "\n")


@cli.command()
@click.argument('directory', type=click.Path(exists=True))
@click.option('--output', '-o', default='batch_merged.pdf', help='Output file name')
def batch_merge(directory, output):
    """Merge every PDF in a directory"""
    pdf_files = sorted(Path(directory).glob('*.pdf'))
    if not pdf_files:
        click.echo("❌ No PDF files found in the directory")
        return
    click.echo(f"📂 Found {len(pdf_files)} PDF files")
    with tqdm(total=len(pdf_files), desc="Merging") as pbar:
        merger = PdfMerger()
        for pdf_file in pdf_files:
            try:
                merger.append(str(pdf_file))
                pbar.update(1)
            except Exception as e:
                click.echo(f"\n✗ {pdf_file.name}: {e}")
        merger.write(output)
        merger.close()
    click.echo(f"\n✅ Batch merge complete: {output}")


if __name__ == '__main__':
    cli()
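The `extract` command accepts a comma-separated list of 1-based page numbers. If range syntax such as `5-7` is also wanted, the parsing step could be factored into a small helper along the following lines. This is only a sketch: `parse_page_spec` is a hypothetical name, and unlike the list comprehension in `extract` it sorts and de-duplicates the result and rejects out-of-range pages up front.

```python
def parse_page_spec(spec, total_pages):
    """Convert a 1-based spec such as "1,3,5-7" into sorted, zero-based page indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total_pages or start > end:
            raise ValueError(f"invalid page range {part!r} for a {total_pages}-page document")
        # Pages start..end (1-based, inclusive) map to indices start-1..end-1
        indices.update(range(start - 1, end))
    return sorted(indices)


if __name__ == "__main__":
    print(parse_page_spec("1,3,5-7", total_pages=10))
```

Non-numeric input surfaces as the `ValueError` raised by `int()`, so a caller can handle both kinds of bad input with a single `except ValueError`.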
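Project 15: A Simple ORM Framework
Project Overview
Build a simple ORM framework from scratch to gain a deeper understanding of advanced Python features such as metaclasses and descriptors, and of how an ORM works.
Tech Stack
- Database: SQLite
- Core techniques: metaclasses, descriptors, reflection
- SQL generation: string formatting, parameter binding
- Type mapping: conversion between Python types and SQL types
Core Feature Implementation
1. Field Descriptors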
python
class Field:
    """Base class for field descriptors."""

    def __init__(self, column_type, primary_key=False, default=None, nullable=True):
        self.column_type = column_type
        self.primary_key = primary_key
        self.default = default
        self.nullable = nullable
        self.name = None  # attribute name, assigned by the metaclass

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Fall back to the field default until a value has been assigned
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class IntegerField(Field):
    """Integer field."""

    def __init__(self, primary_key=False, default=None):
        super().__init__('INTEGER', primary_key, default)


class StringField(Field):
    """String field."""

    def __init__(self, max_length=255, default=None):
        super().__init__(f'VARCHAR({max_length})', False, default)
        self.max_length = max_length


class FloatField(Field):
    """Floating-point field."""

    def __init__(self, default=None):
        super().__init__('REAL', False, default)


class BooleanField(Field):
    """Boolean field."""

    def __init__(self, default=False):
        super().__init__('BOOLEAN', False, default)


class DateTimeField(Field):
    """Date-time field."""

    def __init__(self, default=None):
        super().__init__('DATETIME', False, default)
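The `Field` class relies on the metaclass to fill in `self.name`. As a point of comparison, Python 3.6+ can hand a descriptor its attribute name through `__set_name__`, with no metaclass involved. The sketch below is a standalone illustration rather than part of the ORM: `TypedField` and `Person` are hypothetical names, and the type check in `__set__` is an added assumption to show where validation could live.

```python
class TypedField:
    """Data descriptor that stores values per instance and checks their type."""

    def __init__(self, expected_type, default=None):
        self.expected_type = expected_type
        self.default = default
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically when the owning class is created (Python 3.6+)
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class-level access returns the descriptor itself
        return instance.__dict__.get(self.name, self.default)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, self.expected_type):
            raise TypeError(
                f"{self.name} expects {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        instance.__dict__[self.name] = value


class Person:
    name = TypedField(str)
    age = TypedField(int, default=0)


if __name__ == "__main__":
    p = Person()
    print(p.age)   # default value until something is assigned
    p.name = "Alice"
    print(p.name)
```

Because the descriptor defines both `__get__` and `__set__`, it takes precedence over the instance `__dict__`, which is what lets it intercept every read and write of the attribute.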
2. Model Metaclass
python
class ModelMetaclass(type):
    """Metaclass that collects Field attributes when a Model subclass is defined."""

    def __new__(mcs, name, bases, attrs):
        # Skip the Model base class itself
        if name == 'Model':
            return super().__new__(mcs, name, bases, attrs)
        # Table name: explicit __table__ or the lowercased class name
        table_name = attrs.get('__table__', name.lower())
        # Collect the fields
        fields = {}
        primary_key = None
        for key, value in list(attrs.items()):
            if isinstance(value, Field):
                fields[key] = value
                value.name = key
                if value.primary_key:
                    if primary_key:
                        raise ValueError(f"Duplicate primary key for field: {key}")
                    primary_key = key
        # Without an explicit primary key, add an auto-increment id field
        if not primary_key:
            id_field = IntegerField(primary_key=True)
            id_field.name = 'id'
            fields['id'] = id_field
            primary_key = 'id'
            attrs['id'] = id_field
        # Store the metadata on the class
        attrs['__table__'] = table_name
        attrs['__fields__'] = fields
        attrs['__primary_key__'] = primary_key
        return super().__new__(mcs, name, bases, attrs)
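To see the collection step in isolation, here is a stripped-down metaclass that only gathers column definitions. It is a sketch with hypothetical names (`Column`, `CollectingMeta`, `Article`, `Review`). It also merges fields from base classes, which `ModelMetaclass` above does not do, to show one way model inheritance could be supported.

```python
class Column:
    """Stand-in for the Field classes (illustration only)."""

    def __init__(self, sql_type):
        self.sql_type = sql_type


class CollectingMeta(type):
    """Collect Column attributes into __fields__ when a class is created."""

    def __new__(mcs, name, bases, attrs):
        fields = {}
        # Start with fields inherited from base classes
        for base in bases:
            fields.update(getattr(base, '__fields__', {}))
        # Then add the columns declared in this class body, in declaration order
        for key, value in attrs.items():
            if isinstance(value, Column):
                fields[key] = value
        attrs['__fields__'] = fields
        attrs.setdefault('__table__', name.lower())
        return super().__new__(mcs, name, bases, attrs)


class Article(metaclass=CollectingMeta):
    title = Column('VARCHAR(200)')
    views = Column('INTEGER')


class Review(Article):
    rating = Column('REAL')


if __name__ == "__main__":
    print(Article.__table__, list(Article.__fields__))
    print(Review.__table__, list(Review.__fields__))
```

The metaclass runs once per class definition, so the field scan costs nothing at instance creation time.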
3. Model Base Class
python
import sqlite3
import logging
from typing import List

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class Model(metaclass=ModelMetaclass):
    """
    ORM Model base class.

    Provides the basic database operations; every model class should inherit from it.
    The metaclass handles field definitions and table structure automatically.
    """
    _db_connection = None

    @classmethod
    def set_db(cls, db_path: str):
        """
        Set the database connection.

        Args:
            db_path: path to the SQLite database file
        """
        cls._db_connection = sqlite3.connect(db_path)
        cls._db_connection.row_factory = sqlite3.Row
        logger.info(f"✓ Connected to database: {db_path}")

    @staticmethod
    def _sql_literal(value):
        """Render a Python default as an SQL literal (strings quoted, booleans as 0/1)."""
        if isinstance(value, bool):
            return str(int(value))
        if isinstance(value, (int, float)):
            return str(value)
        return "'" + str(value).replace("'", "''") + "'"

    @classmethod
    def create_table(cls):
        """Create the table if it does not exist yet."""
        fields_sql = []
        for name, field in cls.__fields__.items():
            field_sql = f"{name} {field.column_type}"
            if field.primary_key:
                field_sql += " PRIMARY KEY AUTOINCREMENT"
            if not field.nullable:
                field_sql += " NOT NULL"
            if field.default is not None:
                # Quote string defaults so the DDL stays valid
                field_sql += f" DEFAULT {cls._sql_literal(field.default)}"
            fields_sql.append(field_sql)
        sql = f"CREATE TABLE IF NOT EXISTS {cls.__table__} ({', '.join(fields_sql)})"
        cls._db_connection.execute(sql)
        cls._db_connection.commit()
        logger.info(f"✓ Table created: {cls.__table__}")
    def save(self):
        """Insert this object as a new row."""
        fields = []
        values = []
        for name, field in self.__fields__.items():
            if field.primary_key and getattr(self, name, None) is None:
                continue  # let the database assign the auto-increment key
            # Read the stored value directly so an unset field falls back to its default
            value = self.__dict__.get(name, field.default)
            fields.append(name)
            values.append(value)
        placeholders = ','.join(['?' for _ in values])
        sql = f"INSERT INTO {self.__table__} ({','.join(fields)}) VALUES ({placeholders})"
        cursor = self._db_connection.execute(sql, values)
        self._db_connection.commit()
        # Record the generated primary key
        if getattr(self, self.__primary_key__) is None:
            setattr(self, self.__primary_key__, cursor.lastrowid)
        return self

    def update(self):
        """Update the row that matches this object's primary key."""
        primary_key = self.__primary_key__
        pk_value = getattr(self, primary_key)
        if pk_value is None:
            raise ValueError("Cannot update: primary key is not set")
        fields = []
        values = []
        for name, field in self.__fields__.items():
            if field.primary_key:
                continue
            value = self.__dict__.get(name, field.default)
            fields.append(f"{name} = ?")
            values.append(value)
        values.append(pk_value)
        sql = f"UPDATE {self.__table__} SET {','.join(fields)} WHERE {primary_key} = ?"
        self._db_connection.execute(sql, values)
        self._db_connection.commit()
        return self

    def delete(self):
        """Delete the row that matches this object's primary key."""
        primary_key = self.__primary_key__
        pk_value = getattr(self, primary_key)
        if pk_value is None:
            raise ValueError("Cannot delete: primary key is not set")
        sql = f"DELETE FROM {self.__table__} WHERE {primary_key} = ?"
        self._db_connection.execute(sql, (pk_value,))
        self._db_connection.commit()
    @classmethod
    def get(cls, pk):
        """Fetch one record by primary key, or None if it does not exist."""
        sql = f"SELECT * FROM {cls.__table__} WHERE {cls.__primary_key__} = ?"
        cursor = cls._db_connection.execute(sql, (pk,))
        row = cursor.fetchone()
        if row:
            return cls._row_to_object(row)
        return None

    @classmethod
    def all(cls):
        """Fetch every record."""
        sql = f"SELECT * FROM {cls.__table__}"
        cursor = cls._db_connection.execute(sql)
        return [cls._row_to_object(row) for row in cursor.fetchall()]

    @classmethod
    def filter(cls, **kwargs) -> List['Model']:
        """
        Equality-filtered query.

        Args:
            **kwargs: field_name=value filter conditions, combined with AND
        Returns:
            List[Model]: the matching model objects
        Raises:
            ValueError: if a field name is not defined on the model
        Examples:
            >>> User.filter(age=25, name='Alice')
            >>> Product.filter(price=99.9)
        Security notes:
            - Field names are checked against the model definition, so no
              arbitrary text reaches the SQL string
            - Values are bound as query parameters
        """
        if not kwargs:
            logger.warning("filter() called without conditions; returning all records")
            return cls.all()
        conditions = []
        values = []
        for key, value in kwargs.items():
            # Security check: only known field names may appear in the SQL text
            if key not in cls.__fields__:
                raise ValueError(
                    f"❌ Invalid field name: '{key}' is not defined on {cls.__name__}\n"
                    f"   Available fields: {', '.join(cls.__fields__.keys())}"
                )
            conditions.append(f"{key} = ?")
            values.append(value)
        where_clause = " AND ".join(conditions)
        sql = f"SELECT * FROM {cls.__table__} WHERE {where_clause}"
        logger.debug(f"Executing query: {sql} with values {values}")
        cursor = cls._db_connection.execute(sql, values)
        results = [cls._row_to_object(row) for row in cursor.fetchall()]
        logger.info(f"✓ Query complete: found {len(results)} records")
        return results

    @classmethod
    def _row_to_object(cls, row):
        """Convert a database row into a model object."""
        obj = cls()
        for name in cls.__fields__.keys():
            setattr(obj, name, row[name])
        return obj

    def __repr__(self):
        fields_str = ', '.join(
            f"{name}={getattr(self, name)}"
            for name in self.__fields__.keys()
        )
        return f"<{self.__class__.__name__}({fields_str})>"
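The pattern shared by `save()` and `filter()` is that column names come from a known set and go into the SQL text, while values travel separately as bound parameters. A minimal standalone sketch of that split, run against an in-memory SQLite database, might look like this. `build_insert` and `build_select` are hypothetical helper names, not part of the `Model` class.

```python
import sqlite3


def build_insert(table, data):
    """Return (sql, params) for an INSERT that uses ? placeholders for every value."""
    columns = list(data)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return sql, [data[c] for c in columns]


def build_select(table, allowed_columns, **filters):
    """Return (sql, params) for an equality-filtered SELECT; reject unknown columns."""
    for column in filters:
        if column not in allowed_columns:
            raise ValueError(f"unknown column: {column}")
    sql = f"SELECT * FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} = ?" for c in filters)
    return sql, list(filters.values())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(100), age INTEGER)"
    )
    # The apostrophe needs no escaping because the value is bound, not interpolated
    conn.execute(*build_insert("users", {"name": "O'Brien", "age": 30}))
    sql, params = build_select("users", {"id", "name", "age"}, name="O'Brien")
    print(sql)
    print(conn.execute(sql, params).fetchall())
```

Keeping SQL generation in pure functions like these also makes it easy to unit-test the generated statements without touching a database.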
4. Usage Example
python
# Define the models
class User(Model):
    __table__ = 'users'
    name = StringField(max_length=100)
    email = StringField(max_length=100)
    age = IntegerField(default=0)
    is_active = BooleanField(default=True)


class Post(Model):
    __table__ = 'posts'
    title = StringField(max_length=200)
    content = StringField(max_length=5000)
    user_id = IntegerField()
    created_at = StringField()  # simplified; a real application could use DateTimeField


# Use the ORM
if __name__ == '__main__':
    # Set up the database
    Model.set_db('test.db')
    # Create the tables
    User.create_table()
    Post.create_table()
    # Create records
    user1 = User()
    user1.name = 'Alice'
    user1.email = 'alice@example.com'
    user1.age = 25
    user1.save()
    user2 = User()
    user2.name = 'Bob'
    user2.email = 'bob@example.com'
    user2.age = 30
    user2.save()
    # Query all records
    all_users = User.all()
    print("\nAll users:")
    for user in all_users:
        print(f"  {user}")
    # Filtered query
    alice = User.filter(name='Alice')
    print(f"\nQuery result: {alice}")
    # Query by primary key
    user = User.get(1)
    print(f"\nUser with ID=1: {user}")
    # Update
    if user:
        user.age = 26
        user.update()
        print(f"After update: {user}")
    # Delete
    # user.delete()
    # print("User deleted")
    print("\n✅ ORM demo complete!")
References
Web Frameworks
- Flask documentation: https://flask.palletsprojects.com/
- Django documentation: https://docs.djangoproject.com/
- FastAPI documentation: https://fastapi.tiangolo.com/
- Tornado documentation: https://www.tornadoweb.org/
- Sanic async framework: https://sanic.dev/
Databases and ORMs
- SQLAlchemy documentation: https://www.sqlalchemy.org/
- PyMongo documentation: https://pymongo.readthedocs.io/
- Peewee lightweight ORM: http://docs.peewee-orm.com/
- redis-py documentation: https://redis-py.readthedocs.io/
- SQL style guide: https://www.sqlstyle.guide/
Web Scraping and Data Collection
- Scrapy documentation: https://docs.scrapy.org/
- Beautiful Soup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
- Selenium documentation: https://selenium-python.readthedocs.io/
- Requests documentation: https://requests.readthedocs.io/
- lxml documentation: https://lxml.de/
Data Analysis and Visualization
- Pandas documentation: https://pandas.pydata.org/docs/
- NumPy documentation: https://numpy.org/doc/
- Matplotlib documentation: https://matplotlib.org/stable/contents.html
- Seaborn documentation: https://seaborn.pydata.org/
- Plotly documentation: https://plotly.com/python/
- Bokeh interactive visualization: https://docs.bokeh.org/
Testing and Code Quality
- pytest documentation: https://docs.pytest.org/
- unittest documentation: https://docs.python.org/3/library/unittest.html
- coverage.py code coverage: https://coverage.readthedocs.io/
- black code formatter: https://black.readthedocs.io/
- pylint static analysis: https://pylint.pycqa.org/
- mypy type checking: https://mypy.readthedocs.io/
Asynchronous Programming
- asyncio documentation: https://docs.python.org/3/library/asyncio.html
- aiohttp documentation: https://docs.aiohttp.org/
- httpx async HTTP client: https://www.python-httpx.org/
- Trio async framework: https://trio.readthedocs.io/
Task Scheduling and Queues
- Celery documentation: https://docs.celeryq.dev/
- RQ (Redis Queue): https://python-rq.org/
- APScheduler scheduled jobs: https://apscheduler.readthedocs.io/
- Dramatiq message queue: https://dramatiq.io/
API Development
- RESTful API design guide: https://restfulapi.net/
- GraphQL-Python: https://graphql-python.github.io/
- API design best practices: https://github.com/microsoft/api-guidelines
- Swagger/OpenAPI specification: https://swagger.io/specification/
Deployment and Operations
- Docker documentation: https://docs.docker.com/
- Gunicorn documentation: https://docs.gunicorn.org/
- Nginx configuration: https://nginx.org/en/docs/
- Supervisor process management: http://supervisord.org/
- systemd services: https://www.freedesktop.org/software/systemd/man/
Security
- OWASP Top 10: https://owasp.org/www-project-top-ten/
- Python security best practices: https://python.readthedocs.io/en/stable/library/security_warnings.html
- JWT authentication: https://pyjwt.readthedocs.io/
- cryptography library: https://cryptography.io/
Recommended Books
- 《Flask Web开发实战》
- 《Django企业开发实战》
- 《Python网络数据采集》
- 《利用Python进行数据分析》 (2nd edition)
- 《流畅的Python》
- 《Effective Python》 (2nd edition)
- 《Python并发编程实战》
- 《Python高性能编程》
Online Courses
- Real Python: https://realpython.com/
- Talk Python Training: https://training.talkpython.fm/
- Udemy advanced Python courses
- Coursera Specializations
- Geek Time (极客时间) Python column
Communities and Blogs
- Stack Overflow: https://stackoverflow.com/questions/tagged/python
- Python Weekly newsletter: https://www.pythonweekly.com/
- Awesome Python: https://github.com/vinta/awesome-python
- Python Bytes podcast: https://pythonbytes.fm/
- Talk Python podcast: https://talkpython.fm/
Code Style and Best Practices
- PEP 8 style guide: https://pep8.org/
- Google Python Style Guide: https://google.github.io/styleguide/pyguide.html
- The Hitchhiker's Guide to Python: https://docs.python-guide.org/
- Python best practices: https://gist.github.com/sloria/7001839
Tools and Libraries
- PyPI package index: https://pypi.org/
- Python Wheels: https://pythonwheels.com/
- Python 3 Module of the Week: https://pymotw.com/3/