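Chapter 6: Metaprogramming (Python's "Superpowers")
In Java/Kotlin, metaprogramming is essentially confined to compile time: annotation processors, compiler plugins and bytecode enhancement all need toolchain support. Runtime reflection can inspect and invoke members, but it cannot change how a class is built or how its fields are accessed. Python exposes metaprogramming directly to its users:
- the descriptor protocol controls attribute access;
- metaclasses control class creation;
- `__getattr__`/`__getattribute__` intercept attribute lookup;
- `exec`/`eval`/`compile`, together with the `ast` module, let you generate, inspect and run code at runtime.

These are not privileges reserved for frameworks. They are language primitives every Python developer can use directly. This chapter takes them apart one at a time, and each topic starts from what you already know in Java/Kotlin to help you build the right mental model.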
6.1 The Descriptor Protocol: The Machinery Under Python OOP
Java/Kotlin comparison
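```java
// Java: there is no descriptor protocol. Field access is a plain field read/write and cannot be intercepted.
// To intercept it you write getters/setters by hand (or generate them with Lombok).
public class User {
    private String name;

    // Validation? Only by writing the getter/setter yourself
    public void setName(String name) {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name cannot be blank");
        }
        this.name = name;
    }

    public String getName() {
        return this.name;
    }
}
// Field access obj.name always reads the field directly; it cannot be intercepted.
// Even with a getter, callers can still bypass it (e.g. via reflection).
```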
```kotlin
import kotlin.reflect.KProperty

// Kotlin: property delegation is the closest concept to descriptors,
// but it is a language-level construct resolved at compile time, not a runtime protocol
class User {
    // `by` delegates to an object that provides getValue/setValue
    var name: String by NameValidator()
    // Which delegate is used is fixed at compile time and cannot be swapped at runtime,
    // and the delegate must provide specific operator funs
}

class NameValidator {
    private var value: String = ""
    operator fun getValue(thisRef: Any?, property: KProperty<*>): String = value
    operator fun setValue(thisRef: Any?, property: KProperty<*>, value: String) {
        require(value.isNotBlank()) { "name cannot be blank" }
        this.value = value
    }
}
```
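Core difference: Kotlin property delegation is compile-time syntactic sugar, whereas Python descriptors are a runtime protocol. Descriptors are the mechanism underneath Python attribute access: `property`, `classmethod` and `staticmethod` are all implemented with them.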
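You can check this claim directly, and ordinary methods follow the same rule. A plain function object defines `__get__` but not `__set__`, so every method is a non-data descriptor. When nothing in the instance dict shadows it, `obj.method` amounts to `type(obj).__dict__['method'].__get__(obj, type(obj))`. Below is a minimal sketch; the `Greeter` class is made up for illustration:

```python
class Greeter:
    def hello(self):
        return f"hello from {type(self).__name__}"

    @classmethod
    def make(cls):
        return cls()

g = Greeter()
raw = Greeter.__dict__["hello"]  # the plain function stored in the class body

# A function has __get__ but no __set__: it is a non-data descriptor
print(hasattr(raw, "__get__"), hasattr(raw, "__set__"))  # True False

# Binding by hand produces the same bound method that g.hello does
bound = raw.__get__(g, Greeter)
print(bound())            # hello from Greeter
print(bound == g.hello)   # True

# classmethod is a descriptor too: its __get__ binds the class instead of the instance
cm = Greeter.__dict__["make"]
print(type(cm).__name__)                            # classmethod
print(type(cm.__get__(None, Greeter)()).__name__)   # Greeter
```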
Python implementation
```python
# === The essence of the descriptor protocol ===
# Any class that implements at least one of __get__, __set__, __delete__
# is a descriptor. When an instance of it is stored as a class attribute,
# attribute access on instances of that class goes through the protocol automatically.

# === 1. The simplest descriptor ===
class SimpleDescriptor:
    """A read-only descriptor: every access returns a freshly computed value"""

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed through the class: obj is None
            return self
        return f"value for {obj.__class__.__name__}"

class MyClass:
    attr = SimpleDescriptor()  # defined as a class attribute

obj = MyClass()
print(obj.attr)      # value for MyClass
print(MyClass.attr)  # <__main__.SimpleDescriptor object at 0x...>

# === 2. Data descriptors vs non-data descriptors ===
# Data descriptor: defines __get__ plus __set__ and/or __delete__
# Non-data descriptor: defines only __get__
# Precedence: data descriptor > instance __dict__ > non-data descriptor
class DataDescriptor:
    """Data descriptor: takes precedence over the instance dict"""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Read the per-instance value (storing it on self would share it across all instances)
        return obj.__dict__.get('_data_value', 'default')

    def __set__(self, obj, value):
        obj.__dict__['_data_value'] = value

class NonDataDescriptor:
    """Non-data descriptor: the instance dict takes precedence over it"""

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return "non-data value"

class PriorityDemo:
    data = DataDescriptor()
    non_data = NonDataDescriptor()

demo = PriorityDemo()
demo.data = "from instance"       # calls DataDescriptor.__set__
print(demo.data)                  # from instance (returned by DataDescriptor.__get__)
demo.__dict__['data'] = "shadow"  # an instance-dict entry with the same name...
print(demo.data)                  # from instance (...is ignored: the data descriptor wins)
demo.non_data = "from instance"   # no __set__, so this writes straight into the instance dict
print(demo.non_data)              # from instance (instance dict beats a non-data descriptor)
del demo.non_data                 # delete the instance attribute
print(demo.non_data)              # non-data value (falls back to the non-data descriptor)

# === 3. In practice: a type-checking descriptor ===
class TypedField:
    """A reusable type-checking descriptor"""

    def __init__(self, field_name: str, field_type: type | tuple[type, ...]):
        self.field_name = field_name
        self.field_type = field_type
        self.storage_name = f'_typed_{field_name}'

    def __set_name__(self, owner, name):
        # __set_name__ is covered in detail in 6.4
        self.storage_name = f'_typed_{name}'

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage_name, None)

    def __set__(self, obj, value):
        if not isinstance(value, self.field_type):
            # field_type may be a single type or a tuple of types
            types = self.field_type if isinstance(self.field_type, tuple) else (self.field_type,)
            expected = " or ".join(t.__name__ for t in types)
            raise TypeError(
                f"{self.field_name} must be {expected}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.storage_name, value)

class Product:
    name = TypedField("name", str)
    price = TypedField("price", (int, float))
    in_stock = TypedField("in_stock", bool)

p = Product()
p.name = "Widget"
p.price = 9.99
p.in_stock = True
print(p.name, p.price, p.in_stock)  # Widget 9.99 True
try:
    p.price = "free"
except TypeError as e:
    print(e)  # price must be int or float, got str

# === 4. In practice: a lazily computed descriptor ===
class LazyProperty:
    """Computes the value on first access, then caches it"""

    def __init__(self, func):
        self.func = func
        self.attrname = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # Cache in the instance dict under the same name. This is a non-data
        # descriptor (no __set__), so later lookups hit the instance dict and skip __get__
        setattr(obj, self.attrname, value)
        return value

class HeavyResource:
    def __init__(self, data_source: str):
        self.data_source = data_source

    @LazyProperty
    def parsed_data(self):
        print(f"[simulated slow operation] parsing {self.data_source}...")
        return f"parsed: {self.data_source.upper()}"

resource = HeavyResource("large_dataset.csv")
print(resource.parsed_data)  # [simulated slow operation] parsing large_dataset.csv... then parsed: LARGE_DATASET.CSV
print(resource.parsed_data)  # parsed: LARGE_DATASET.CSV (not recomputed the second time)

# === 5. Behind the curtain: property, classmethod, staticmethod are all descriptors ===
print(type(property()))                  # <class 'property'>
print(hasattr(property, '__set__'))      # True: property is a data descriptor
print(hasattr(classmethod, '__get__'))   # True: classmethod is a descriptor
print(hasattr(staticmethod, '__get__'))  # True: staticmethod is a descriptor

# At its core, property is just a data descriptor
class ManualProperty:
    def __init__(self, getter, setter_fn=None):
        self.getter = getter
        self.setter_fn = setter_fn

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return self.getter(obj)

    def setter(self, setter_fn):
        # Like property.setter: return a new descriptor that has both getter and setter
        return type(self)(self.getter, setter_fn)

    def __set__(self, obj, value):
        if self.setter_fn is None:
            raise AttributeError("can't set attribute")
        self.setter_fn(obj, value)

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @ManualProperty
    def area(self):
        return 3.14159 * self._radius ** 2

    @area.setter
    def area(self, value):
        # Work backwards from the area to the radius
        self._radius = (value / 3.14159) ** 0.5

c = Circle(5)
print(c.area)     # about 78.54 (3.14159 * 25)
c.area = 50
print(c._radius)  # 3.9894...
```
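The standard library already ships a ready-made version of `LazyProperty`: `functools.cached_property` (Python 3.8+). Like `LazyProperty`, it is a non-data descriptor. It stores the computed value in the instance `__dict__` under the attribute's own name, so later lookups never reach the descriptor. The sketch below uses a hypothetical `Report` class, with a call counter added only to make the caching visible:

```python
from functools import cached_property

class Report:
    def __init__(self, source: str):
        self.source = source
        self.calls = 0  # counts how often the expensive method really runs

    @cached_property
    def summary(self) -> str:
        self.calls += 1
        return f"summary of {self.source}"

r = Report("sales.csv")
print(r.summary, r.calls)       # summary of sales.csv 1
print(r.summary, r.calls)       # summary of sales.csv 1 (served from r.__dict__)
print("summary" in r.__dict__)  # True

# Deleting the cached entry forces a recomputation on the next access
del r.summary
print(r.summary, r.calls)       # summary of sales.csv 2
```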
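Core differences
| Aspect | Java/Kotlin | Python |
|---|---|---|
| Intercepting attribute access | getters/setters, or Kotlin delegation (compile time) | descriptor protocol (runtime) |
| Reuse | Java: written per property; Kotlin: delegate classes can be reused | one descriptor class can back any number of attributes |
| Dynamism | fixed at compile time | replaceable at runtime |
| Underlying mechanism | compiler-generated bytecode | `object.__getattribute__` invokes the descriptor protocol internally |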
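The last row of the table can be made concrete. The function below is a simplified pure-Python model of what `object.__getattribute__` does for an instance lookup. It ignores `__slots__`, metaclass details and the `__getattr__` fallback, and `lookup` and `_MISSING` are hypothetical names, not a real API:

```python
_MISSING = object()  # sentinel: "no class attribute with this name"

def lookup(obj, name):
    """Simplified model of object.__getattribute__(obj, name)."""
    # Find the first class in the MRO whose __dict__ defines the name
    cls_attr = _MISSING
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            cls_attr = klass.__dict__[name]
            break
    attr_type = type(cls_attr)
    is_descriptor = cls_attr is not _MISSING and hasattr(attr_type, "__get__")
    is_data = is_descriptor and (hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__"))
    # 1. A data descriptor on the class wins over everything
    if is_data:
        return attr_type.__get__(cls_attr, obj, type(obj))
    # 2. Then the instance dictionary
    instance_dict = getattr(obj, "__dict__", {})
    if name in instance_dict:
        return instance_dict[name]
    # 3. Then non-data descriptors (functions, classmethod, ...)
    if is_descriptor:
        return attr_type.__get__(cls_attr, obj, type(obj))
    # 4. Then plain class attributes
    if cls_attr is not _MISSING:
        return cls_attr
    raise AttributeError(f"{type(obj).__name__!r} object has no attribute {name!r}")

class Demo:
    plain = "class value"

    @property
    def prop(self):
        return "from property"

    def method(self):
        return "from method"

d = Demo()
d.__dict__["prop"] = "shadow"    # ignored: property is a data descriptor
d.__dict__["method"] = "shadow"  # hides the function, which is a non-data descriptor
for attr in ("prop", "method", "plain"):
    print(attr, lookup(d, attr) == getattr(d, attr))  # True for all three
```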
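Common pitfalls
```python
# Pitfall 1: a descriptor must be a class attribute, not an instance attribute
class Wrong:
    def __init__(self):
        self.desc = TypedField("x", int)  # Wrong! This is an instance attribute; the protocol never fires

w = Wrong()
w.desc = 42  # just a plain assignment; TypedField.__set__ is not called

# Pitfall 2: a data descriptor overrides the instance dict
class AlwaysTen:
    def __get__(self, obj, objtype=None):
        return 10

    def __set__(self, obj, value):
        pass  # silently ignores every assignment

class Stubborn:
    x = AlwaysTen()

s = Stubborn()
s.x = 999
print(s.x)  # 10! __set__ stored nothing, and __get__ always returns 10

# Pitfall 3: obj is None inside __get__
class MyDescriptor:
    def __get__(self, obj, objtype=None):
        # Forgetting the `if obj is None: return self` check causes surprises
        return obj.some_attr  # accessed through the class, obj is None and this raises AttributeError
```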
Practical metaclass use cases
Scenario 1: an ORM model base class (a simplified Django Model)
```python
class ModelMeta(type):
    """Automatically collects the table name and field definitions of each model class"""

    def __new__(mcs, name, bases, namespace):
        # Skip the Model base class itself
        if name == 'Model':
            return super().__new__(mcs, name, bases, namespace)
        # Collect the Field definitions from the class body
        # (Field is defined below; it exists by the time a model class is created)
        fields = {}
        for key, value in list(namespace.items()):
            if isinstance(value, Field):
                value.name = key
                fields[key] = value
        namespace['_fields'] = fields
        namespace['_table'] = name.lower()
        cls = super().__new__(mcs, name, bases, namespace)
        print(f"[ModelMeta] registered model: {name}, table: {cls._table}, fields: {list(fields.keys())}")
        return cls

class Field:
    def __init__(self, field_type, primary_key=False):
        self.field_type = field_type
        self.name = None
        self.primary_key = primary_key

    def __repr__(self):
        return f"Field({self.name}, {self.field_type.__name__})"

class Model(metaclass=ModelMeta):
    @classmethod
    def create_table_sql(cls):
        columns = []
        for name, field in cls._fields.items():
            # Column types are just the uppercased Python type names (a real ORM maps them to SQL types)
            col = f"{name} {field.field_type.__name__.upper()}"
            if field.primary_key:
                col += " PRIMARY KEY"
            columns.append(col)
        return f"CREATE TABLE {cls._table} ({', '.join(columns)});"

# Define the models
class User(Model):
    id = Field(int, primary_key=True)
    name = Field(str)
    email = Field(str)

class Post(Model):
    id = Field(int, primary_key=True)
    title = Field(str)
    user_id = Field(int)

print(User.create_table_sql())
# CREATE TABLE user (id INT PRIMARY KEY, name STR, email STR);
print(Post.create_table_sql())
# CREATE TABLE post (id INT PRIMARY KEY, title STR, user_id INT);
```
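Scenario 2: a subclass registry
```python
class RegistryMeta(type):
    """Registers every subclass in a registry shared through the base class"""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, '_registry'):
            # The first class created with this metaclass (RegistryBase) owns the registry
            cls._registry = {}
        elif name != 'RegistryBase':
            # Subclasses inherit _registry and add themselves to it
            cls._registry[name] = cls

class RegistryBase(metaclass=RegistryMeta):
    @classmethod
    def get_all(cls):
        return dict(cls._registry)

    @classmethod
    def get(cls, name):
        return cls._registry.get(name)

class PluginA(RegistryBase):
    def run(self):
        return "PluginA"

class PluginB(RegistryBase):
    def run(self):
        return "PluginB"

print(f"Registered plugins: {list(RegistryBase.get_all().keys())}")
# Registered plugins: ['PluginA', 'PluginB']
plugin_cls = RegistryBase.get('PluginA')
plugin = plugin_cls()  # get() returns the class, so create an instance before calling run()
print(f"PluginA.run(): {plugin.run()}")  # PluginA.run(): PluginA
```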
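When to use a metaclass vs other options
| Need | Recommended approach | Why |
|---|---|---|
| Automatic registration | `__init_subclass__` | Simpler, Python 3.6+ |
| Attribute validation | Descriptors / `__setattr__` | More precise, customizable per attribute |
| Enforcing an API | `ABC` + `@abstractmethod` | The standard approach |
| ORM / model definitions | Metaclass | Django and SQLAlchemy both use one |
| Singleton | Module-level variable / decorator | More Pythonic, simpler |
| Plugin system | `__init_subclass__` + registry | More readable |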
Rule of thumb: if `__init_subclass__` or a descriptor can solve the problem, don't use a metaclass. Metaclasses are the last resort.
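Applying the rule of thumb to scenario 2 gives a registry with no metaclass at all. The same plugin registry can be written with `__init_subclass__` (Python 3.6+). A sketch follows; `PluginBase` and the plugin names are illustrative:

```python
class PluginBase:
    _registry: dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once for every subclass, right after the class object is created
        PluginBase._registry[cls.__name__] = cls

    @classmethod
    def get(cls, name: str):
        return cls._registry.get(name)

class PluginA(PluginBase):
    def run(self):
        return "PluginA"

class PluginB(PluginBase):
    def run(self):
        return "PluginB"

print(list(PluginBase._registry))         # ['PluginA', 'PluginB']
print(PluginBase.get("PluginA")().run())  # PluginA
```

The base class is not registered, because `__init_subclass__` only fires for subclasses. That replaces the name check the metaclass version needed.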
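When to use
- Use a descriptor: when the same access logic (validation, type checking, lazy computation, access control) must be reused across several attributes
- Use `property`: for special logic on a single attribute that will not be reused
- Avoid overuse: for a simple getter/setter, `property` is enough; no custom descriptor class is needed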
6.2 Metaclasses: Controlling Class Creation
Java/Kotlin comparison
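```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

// Java: a ClassLoader loads .class files; it only changes a class if you transform the bytecode yourself.
// Annotation processors generate code at compile time, not at runtime.
// Bytecode enhancement (ASM, ByteBuddy) can modify classes, but needs extra tooling.
// Java has no equivalent of a metaclass.
// The closest thing is an annotation processor (AbstractProcessor):
@SupportedAnnotationTypes("com.example.Entity")
@SupportedSourceVersion(SourceVersion.RELEASE_17)
public class EntityProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations,
                           RoundEnvironment roundEnv) {
        // Scans annotations at compile time and generates new code,
        // but that is code generation; it does not control how the class is created
        return true;
    }
}
// Annotation processor: compile time, needs build tooling, can only generate new files
// Python metaclass: runtime, zero configuration, modifies the class itself
```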
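```kotlin
// Kotlin: compiler plugins (e.g. kotlinx.serialization) come closest,
// but they also run at compile time and need plugin infrastructure.
// Kotlin has no runtime metaclass concept.
```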
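Core difference: metaprogramming in Java/Kotlin happens at compile time and needs tooling. Python metaclasses work at runtime and are built into the language.
Python implementation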
```python
import threading  # used by SingletonMeta in part 4

# === 1. type is the default metaclass ===
# In Python a class is itself an object; the "class" that creates a class is its metaclass
class MyClass:
    pass

print(type(MyClass))    # <class 'type'>: the type of MyClass is type
print(type(MyClass()))  # <class '__main__.MyClass'>: the type of an instance is MyClass
print(type(type))       # <class 'type'>: the type of type is type itself

# type is the metaclass of every class unless you specify another one.
# A class statement essentially boils down to a call type(name, bases, dict)
MyClass2 = type('MyClass2', (), {'x': 10, 'greet': lambda self: 'hi'})
obj = MyClass2()
print(obj.x)        # 10
print(obj.greet())  # hi

# === 2. A custom metaclass ===
class ValidationMeta(type):
    """Metaclass: checks that class attributes following a naming rule have the right type"""

    def __new__(mcs, name, bases, namespace):
        # mcs: the metaclass itself
        # name: the class name
        # bases: tuple of base classes
        # namespace: dict of everything defined in the class body
        # Every class using this metaclass is validated, including ones with no explicit bases
        # Validation: every attribute whose name ends in _FIELD must be a str
        for key, value in namespace.items():
            if key.endswith('_FIELD') and not isinstance(value, str):
                raise TypeError(
                    f"{name}.{key} must be str, got {type(value).__name__}"
                )
        # Automatically add a created_at class attribute
        namespace['created_at'] = '2026-04-13'
        return super().__new__(mcs, name, bases, namespace)

# Choose the metaclass with metaclass=
class Config(metaclass=ValidationMeta):
    host_FIELD = "localhost"
    port_FIELD = "8080"

print(Config.created_at)  # 2026-04-13
print(Config.host_FIELD)  # localhost

try:
    class BadConfig(metaclass=ValidationMeta):
        port_FIELD = 8080  # an int, not a str!
except TypeError as e:
    print(e)  # BadConfig.port_FIELD must be str, got int

# === 3. The three metaclass hooks: __new__, __init__, __call__ ===
class TracingMeta(type):
    """
    __new__:  controls creating the class (returns the class object)
    __init__: controls initializing the class (the class already exists and can be modified)
    __call__: controls creating instances (runs on ClassName(...))
    """

    def __new__(mcs, name, bases, namespace):
        print(f"[TracingMeta.__new__] creating class: {name}")
        cls = super().__new__(mcs, name, bases, namespace)
        return cls

    def __init__(cls, name, bases, namespace):
        print(f"[TracingMeta.__init__] initializing class: {name}")
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        print(f"[TracingMeta.__call__] creating instance: {cls.__name__}({args}, {kwargs})")
        instance = super().__call__(*args, **kwargs)
        return instance

class Service(metaclass=TracingMeta):
    def __init__(self, name: str):
        self.name = name
# Output:
# [TracingMeta.__new__] creating class: Service
# [TracingMeta.__init__] initializing class: Service

s = Service("my-service")
# Output: [TracingMeta.__call__] creating instance: Service(('my-service',), {})

# === 4. In practice: a singleton metaclass ===
class SingletonMeta(type):
    """A thread-safe singleton metaclass"""
    _instances: dict[type, object] = {}
    _lock = threading.RLock()  # guards the check-then-create step below

    def __call__(cls, *args, **kwargs):
        with SingletonMeta._lock:
            if cls not in cls._instances:
                # super().__call__ runs the class's __new__ and __init__
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Database(metaclass=SingletonMeta):
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    def query(self, sql: str) -> str:
        return f"[{self.connection_string}] executing: {sql}"

db1 = Database("postgres://localhost/mydb")
db2 = Database("postgres://localhost/other")  # no new instance; these arguments are ignored
print(db1 is db2)              # True
print(db1.query("SELECT 1"))   # [postgres://localhost/mydb] executing: SELECT 1

# === 5. In practice: a registry metaclass ===
class PluginMeta(type):
    """Registers every subclass automatically"""
    _registry: dict[str, type] = {}

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        # Skip the base class
        if not bases or bases[0] is object:
            return cls
        # Take the plugin name from a class attribute, defaulting to the class name
        plugin_name = namespace.get('plugin_name', name)
        mcs._registry[plugin_name] = cls
        return cls

    @classmethod
    def get_registry(mcs) -> dict[str, type]:
        return dict(mcs._registry)

    @classmethod
    def create(mcs, name: str, *args, **kwargs):
        cls = mcs._registry.get(name)
        if cls is None:
            raise ValueError(f"Unknown plugin: {name}")
        return cls(*args, **kwargs)

class Formatter(metaclass=PluginMeta):
    """Base class for formatter plugins"""

    def format(self, data: str) -> str:
        raise NotImplementedError

class JsonFormatter(Formatter):
    plugin_name = "json"

    def format(self, data: str) -> str:
        return f'{{"data": "{data}"}}'

class XmlFormatter(Formatter):
    plugin_name = "xml"

    def format(self, data: str) -> str:
        return f"<data>{data}</data>"

class CsvFormatter(Formatter):
    plugin_name = "csv"

    def format(self, data: str) -> str:
        return f'"data"\n"{data}"'

# Every subclass is registered automatically
print(PluginMeta.get_registry())
# {'json': <class '__main__.JsonFormatter'>, 'xml': <class '__main__.XmlFormatter'>, 'csv': <class '__main__.CsvFormatter'>}
# Create a plugin by name
json_fmt = PluginMeta.create("json")
print(json_fmt.format("hello"))  # {"data": "hello"}
```
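The `__call__` hook in `TracingMeta` and `SingletonMeta` delegates to `super().__call__`, which is `type.__call__`. Roughly, that default does two things. It calls `cls.__new__`, and if the result is an instance of `cls`, it then calls that object's `__init__`. The metaclass below is a simplified pure-Python model of this behavior. The real implementation is in C and handles more edge cases, and `ManualCallMeta`, `Point` and `AlwaysNone` are made-up names:

```python
class ManualCallMeta(type):
    def __call__(cls, *args, **kwargs):
        # Step 1: allocate the instance
        instance = cls.__new__(cls, *args, **kwargs)
        # Step 2: run __init__ only if __new__ returned an instance of cls
        if isinstance(instance, cls):
            instance.__init__(*args, **kwargs)
        return instance

class Point(metaclass=ManualCallMeta):
    def __init__(self, x, y):
        self.x, self.y = x, y

class AlwaysNone(metaclass=ManualCallMeta):
    def __new__(cls, *args):
        return None  # not an instance of cls, so __init__ is skipped

    def __init__(self, *args):
        raise RuntimeError("never called")

p = Point(1, 2)
print(p.x, p.y)        # 1 2
print(AlwaysNone(42))  # None
```

This is also why a singleton metaclass only needs to cache the result of `super().__call__`. If `__call__` returns the cached object, neither `__new__` nor `__init__` runs again.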
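Core differences
| Aspect | Java annotation processor | Python metaclass |
|---|---|---|
| When it runs | compile time | runtime (when the class statement executes) |
| Capability | can only generate new source files | modifies the class itself |
| Dependencies | needs build tooling (javac/Maven/Gradle) | none, built into the language |
| Debugging | harder (runs inside the compiler, output lands in generated sources) | easy (just print) |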
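Common pitfalls
```python
# Pitfall 1: metaclass inheritance
# A subclass inherits its parent's metaclass
class Parent(metaclass=SingletonMeta):
    pass

class Child(Parent):
    pass  # Child is also an instance of SingletonMeta!

# If Parent and another class have unrelated metaclasses, they conflict
class OtherMeta(type):
    pass

try:
    class BadChild(Parent, metaclass=OtherMeta):
        pass  # TypeError: metaclass conflict
except TypeError as e:
    print(e)

# Fix: make the new metaclass a subclass of SingletonMeta
class CompatibleMeta(SingletonMeta):
    pass

class GoodChild(Parent, metaclass=CompatibleMeta):
    pass  # OK

# Pitfall 2: __new__ vs __init__
# A metaclass's __new__ must return the class object; __init__ returns nothing.
# If __new__ returns something that is not an instance of the metaclass, __init__ is not called

# Pitfall 3: the keys of the namespace a metaclass receives
# namespace holds everything defined in the class body, methods and attributes alike,
# but not attributes inherited from base classes
```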
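When to use
- Use a metaclass: when something must happen automatically at class creation time (registration, validation, singletons, ORM mapping)
- Consider `__init_subclass__`: if all you need is subclass registration, the approach in 6.4 is simpler
- Avoid: most framework-level needs can be met with decorators plus `__init_subclass__`; a metaclass should be the last resort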
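For comparison with the advice above, the `_FIELD` check that `ValidationMeta` performs can also be written as a plain class decorator. One difference matters. A decorator is not inherited, so each class must opt in explicitly, whereas a metaclass automatically applies to all subclasses. A sketch follows; `validate_fields` is a hypothetical helper:

```python
def validate_fields(cls):
    """Class decorator: every attribute whose name ends in _FIELD must be a str."""
    for key, value in vars(cls).items():
        if key.endswith("_FIELD") and not isinstance(value, str):
            raise TypeError(f"{cls.__name__}.{key} must be str, got {type(value).__name__}")
    return cls

@validate_fields
class Config:
    host_FIELD = "localhost"
    port_FIELD = "8080"

try:
    @validate_fields
    class BadConfig:
        port_FIELD = 8080
except TypeError as e:
    print(e)  # BadConfig.port_FIELD must be str, got int
```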
6.3 __getattr__ and __getattribute__: Intercepting Attribute Access
Java/Kotlin comparison
```java
import java.lang.reflect.Proxy;

// Java: no runtime mechanism for intercepting field access.
// The closest thing is java.lang.reflect.Proxy (interface proxies):
interface UserService {
    String getName();
    void setName(String name);
}

UserService proxy = (UserService) Proxy.newProxyInstance(
    UserService.class.getClassLoader(),
    new Class<?>[] { UserService.class },
    (obj, method, args) -> {
        System.out.println("intercepted: " + method.getName());
        return null;  // simplified
    }
);
// Proxy can only proxy interfaces, not classes.
// Proxying a class needs CGLIB or ByteBuddy (bytecode generation).
// A Java agent (Instrumentation) can rewrite bytecode at class-load time,
// but it needs the -javaagent flag and is heavyweight and complex.
```
```kotlin
// Kotlin: `by` delegation can forward an interface to another object
interface Printer {
    fun print(msg: String)
}

class PrinterProxy(private val real: Printer) : Printer by real {
    override fun print(msg: String) {
        println("[proxy] $msg")
        real.print(msg)
    }
}
// Again, this only works for interfaces and cannot intercept arbitrary property access
```
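Core difference: Java/Kotlin proxies can only intercept method calls, and only on interfaces. Python's `__getattr__`/`__getattribute__` can intercept every explicit attribute access on an instance (`obj.name`, `getattr(obj, name)`), including access to attributes that don't exist.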
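There is one caveat to "every explicit attribute access". Sometimes the interpreter looks up a special method implicitly, for `len(obj)`, `obj + 1`, `repr(obj)` and so on. That lookup goes straight to the type and never consults `__getattribute__` or `__getattr__` on the instance. A minimal demonstration with a made-up `Dynamic` class:

```python
class Dynamic:
    def __getattr__(self, name):
        # Pretend every missing attribute is a function that returns 3
        return lambda: 3

obj = Dynamic()
print(obj.__len__())  # 3: explicit attribute access goes through __getattr__

try:
    len(obj)          # implicit lookup happens on type(obj), so __getattr__ is never asked
except TypeError as e:
    print(e)          # object of type 'Dynamic' has no len()
```

To make a proxy work with `len()`, `+`, `repr()` and similar built-ins, the special methods have to be defined on the proxy class itself.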
Python implementation
```python
# === 1. __getattr__: called only when normal lookup fails ===
class DynamicAttrs:
    """Dynamic attributes: computed when a missing attribute is accessed"""

    def __init__(self, data: dict):
        self._data = data

    def __getattr__(self, name):
        # Only called after normal lookup has failed.
        # Careful: touching a missing attribute such as self.name here recurses forever!
        # Use self._data (set in __init__) or object.__getattribute__(self, '_data')
        if name in self._data:
            return self._data[name]
        raise AttributeError(f"'{self.__class__.__name__}' has no attribute '{name}'")

obj = DynamicAttrs({"x": 1, "y": 2})
print(obj.x)      # 1 (fetched from _data through __getattr__)
print(obj.y)      # 2
print(obj._data)  # {'x': 1, 'y': 2} (_data is in the instance dict, so __getattr__ is not involved)
try:
    print(obj.z)  # AttributeError
except AttributeError as e:
    print(e)      # 'DynamicAttrs' has no attribute 'z'

# === 2. __getattribute__: called on every attribute access ===
class StrictAccess:
    """Every attribute access goes through the interceptor"""

    def __init__(self):
        self._allowed = {'name', 'age'}

    def __getattribute__(self, name):
        # Every access lands here, including self._allowed!
        # Fetch real values with object.__getattribute__ (or super().__getattribute__)
        allowed = object.__getattribute__(self, '_allowed')
        if name.startswith('_'):
            # Let internal and special attributes through
            return object.__getattribute__(self, name)
        if name not in allowed:
            raise AttributeError(f"Access to '{name}' is not allowed")
        return object.__getattribute__(self, name)

    def __setattr__(self, name, value):
        if name == '_allowed':
            object.__setattr__(self, name, value)
        elif name in object.__getattribute__(self, '_allowed'):
            object.__setattr__(self, name, value)
        else:
            raise AttributeError(f"Cannot set '{name}'")

s = StrictAccess()
s.name = "Alice"
s.age = 30
print(s.name)  # Alice
try:
    s.email = "a@b.com"
except AttributeError as e:
    print(e)  # Cannot set 'email'

# === 3. Attribute lookup order (important!) ===
"""
How obj.name is resolved:
type(obj).__getattribute__ runs on every access (the default is object.__getattribute__),
and the default implementation checks, in order:
  1. data descriptors found on the class (they define __get__ plus __set__/__delete__)
  2. the instance dict obj.__dict__
  3. non-data descriptors (only __get__) and plain class attributes
If that raises AttributeError and the class defines __getattr__, __getattr__ is the fallback.
"""

class LookupDemo:
    x = 10  # a plain class attribute (an int is not a descriptor)

    def __getattr__(self, name):
        print(f"__getattr__ called: {name}")
        return f"default for {name}"

d = LookupDemo()
print(d.x)  # 10 (not in the instance dict, found on the class; __getattr__ not involved)
print(d.y)  # prints "__getattr__ called: y", then "default for y"
d.x = 20    # written into the instance dict
print(d.x)  # 20 (the instance dict takes precedence over the class attribute)

# === 4. In practice: a dynamic attribute proxy ===
class LazyProxy:
    """Lazy-initialization proxy: creates the real object on first attribute access"""

    def __init__(self, factory):
        # Can't write self._factory = ...: that would go through our __setattr__ below
        object.__setattr__(self, '_factory', factory)
        object.__setattr__(self, '_target', None)
        object.__setattr__(self, '_initialized', False)

    def _get_target(self):
        if not object.__getattribute__(self, '_initialized'):
            factory = object.__getattribute__(self, '_factory')
            object.__setattr__(self, '_target', factory())
            object.__setattr__(self, '_initialized', True)
        return object.__getattribute__(self, '_target')

    def __getattr__(self, name):
        # Only reached for names the proxy itself does not have
        target = self._get_target()
        return getattr(target, name)

    def __setattr__(self, name, value):
        target = self._get_target()
        setattr(target, name, value)

    def __delattr__(self, name):
        target = self._get_target()
        delattr(target, name)

    def __repr__(self):
        if object.__getattribute__(self, '_initialized'):
            return f"<Proxy to {self._get_target()!r}>"
        return "<Uninitialized Proxy>"

class ExpensiveResource:
    def __init__(self):
        print("[creating ExpensiveResource...]")
        self.data = "heavy data"

    def process(self):
        return f"processing: {self.data}"

# Creating the proxy does not initialize anything
proxy = LazyProxy(ExpensiveResource)
print(proxy)            # <Uninitialized Proxy>
# The first attribute access triggers initialization
print(proxy.process())  # [creating ExpensiveResource...] then processing: heavy data
print(proxy)            # <Proxy to <__main__.ExpensiveResource object at 0x...>>

# === 5. In practice: a chainable API builder ===
class QueryBuilder:
    """Builds SQL from dynamic method names (a demo: real code must use parameterized queries)"""

    def __init__(self, table: str):
        self._table = table
        self._conditions: list[str] = []
        self._columns = "*"
        self._order = None

    def __getattr__(self, name):
        # where_name, where_age, order_by_created_at, ...
        if name.startswith('where_'):
            field = name[len('where_'):]  # strip the 'where_' prefix

            def comparator(value):
                self._conditions.append(f"{field} = '{value}'")
                return self  # allows chaining

            return comparator
        if name.startswith('order_by_'):
            field = name[len('order_by_'):]  # strip the 'order_by_' prefix

            def order_by():
                self._order = field
                return self  # allows chaining

            return order_by
        raise AttributeError(f"'{self.__class__.__name__}' has no attribute '{name}'")

    def select(self, *columns):
        self._columns = ', '.join(columns)
        return self

    def build(self) -> str:
        sql = f"SELECT {self._columns} FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        if self._order:
            sql += f" ORDER BY {self._order}"
        return sql

query = (
    QueryBuilder("users")
    .select("name", "email")
    .where_age(25)
    .where_name("Alice")
    .order_by_created_at()
)
print(query.build())
# SELECT name, email FROM users WHERE age = '25' AND name = 'Alice' ORDER BY created_at
```
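Core differences
| Aspect | Java Proxy | Python `__getattr__` |
|---|---|---|
| What it intercepts | interface method calls only | any attribute access that normal lookup cannot satisfy (`__getattribute__`: every access) |
| Implementation effort | needs an InvocationHandler | a single method |
| Class restrictions | interfaces only | any class |
| Performance | reflective dispatch on every call | one extra Python function call per intercepted access |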
Common pitfalls
```python
# Pitfall 1: infinite recursion inside __getattr__
class Bad:
    def __getattr__(self, name):
        return self.name  # infinite recursion if 'name' is missing: self.name calls __getattr__ again

# Correct:
class Good:
    def __getattr__(self, name):
        return object.__getattribute__(self, 'name')  # raises AttributeError instead of recursing

# Pitfall 2: forgetting object.__getattribute__ inside __getattribute__
class Oops:
    def __getattribute__(self, name):
        return self._data  # infinite recursion! self._data calls __getattribute__ again

# Pitfall 3: recursion inside __setattr__
class AlsoBad:
    def __setattr__(self, name, value):
        self.name = value  # infinite recursion!

# Correct:
class AlsoGood:
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
```
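Pitfall 1 also has a less obvious variant. An instance can exist without its `__init__` having run: `copy`, `pickle` and a direct `cls.__new__(cls)` all create instances that way. A `__getattr__` that reads `self._data` then calls itself when `_data` is missing. The resulting `RecursionError` also escapes `hasattr`, which only swallows `AttributeError`. Reading through `self.__dict__` avoids this, because `__dict__` is always found by normal lookup. A sketch with hypothetical class names:

```python
class FragileAttrs:
    def __init__(self, data: dict):
        self._data = data

    def __getattr__(self, name):
        # If _data was never set, self._data calls __getattr__('_data') again, forever
        if name in self._data:
            return self._data[name]
        raise AttributeError(name)

class RobustAttrs:
    def __init__(self, data: dict):
        self._data = data

    def __getattr__(self, name):
        # self.__dict__ never reaches __getattr__, so there is no recursion
        data = self.__dict__.get("_data", {})
        if name in data:
            return data[name]
        raise AttributeError(name)

bare = FragileAttrs.__new__(FragileAttrs)  # an instance whose __init__ never ran
try:
    hasattr(bare, "x")
except RecursionError:
    print("FragileAttrs: RecursionError")

safe = RobustAttrs.__new__(RobustAttrs)
print(hasattr(safe, "x"))       # False
print(RobustAttrs({"x": 1}).x)  # 1
```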
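When to use
- `__getattr__`: dynamic attributes, proxies, chainable APIs, mapping old attribute names for backward compatibility
- `__getattribute__`: when every attribute access must be controlled (access restrictions, permission checks, logging)
- Prefer `__getattr__`: it is safer, because it never intercepts attributes that normal lookup already finds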
6.4 __init_subclass__ and __set_name__: Class Registration and Smarter Descriptors
Java/Kotlin comparison
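```java
import java.util.HashMap;
import java.util.Map;

// Java: no equivalent.
// The closest thing is registering in an abstract base class's constructor:
public abstract class Plugin {
    private static final Map<String, Plugin> registry = new HashMap<>();

    protected Plugin(String name) {
        registry.put(name, this);
    }
}
// But every subclass constructor must call super(name) explicitly, and registration
// only happens when an instance is created, not when the subclass is defined.
// Python's __init_subclass__ runs automatically; subclasses don't have to do anything.
```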
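```kotlin
// Kotlin: no equivalent either.
// A sealed class lets you enumerate subclasses, but that is fixed at compile time, not a runtime registration.
sealed class Plugin {
    data class JsonPlugin(val name: String = "json") : Plugin()
    data class XmlPlugin(val name: String = "xml") : Plugin()
}
// Direct subclasses of a sealed class must live in the same module and package
// (the same file before Kotlin 1.5), so third parties cannot extend it
```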
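Core difference: `__init_subclass__` lets a base class run code automatically whenever a subclass is defined, and the subclass does not need to know about it. It is a clean way to build plugin systems and validation frameworks.
Python implementation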
```python
import re
from datetime import datetime

# === 1. __init_subclass__: a hook on the base class ===
class EventBase:
    """Event base class: registers every subclass automatically"""
    # Class variable holding the registry
    _event_types: dict[str, type] = {}

    def __init_subclass__(cls, event_name: str | None = None, **kwargs):
        super().__init_subclass__(**kwargs)
        # cls is the newly defined subclass
        name = event_name or cls.__name__
        EventBase._event_types[name] = cls
        cls._event_name = name

    @classmethod
    def create(cls, event_name: str, **data):
        event_cls = cls._event_types.get(event_name)
        if event_cls is None:
            raise ValueError(f"Unknown event: {event_name}")
        return event_cls(**data)

    @classmethod
    def list_events(cls) -> list[str]:
        return list(cls._event_types.keys())

# Subclasses only pass event_name and are registered automatically
class UserCreated(EventBase, event_name="user.created"):
    def __init__(self, user_id: int, username: str):
        self.user_id = user_id
        self.username = username

    def __repr__(self):
        return f"UserCreated({self.user_id}, {self.username})"

class OrderPlaced(EventBase, event_name="order.placed"):
    def __init__(self, order_id: int, amount: float):
        self.order_id = order_id
        self.amount = amount

    def __repr__(self):
        return f"OrderPlaced({self.order_id}, {self.amount})"

# Registration has already happened
print(EventBase.list_events())  # ['user.created', 'order.placed']
# Create an event by name
event = EventBase.create("user.created", user_id=42, username="alice")
print(event)  # UserCreated(42, alice)

# === 2. Validation in __init_subclass__ ===
class StrictBase:
    """Requires every subclass to implement certain methods
    (abc.ABC + @abstractmethod does a similar check, but at instantiation time)"""
    required_methods = ('handle', 'validate')

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for method_name in cls.required_methods:
            # StrictBase itself defines none of these methods, so if lookup fails,
            # neither the subclass nor any of its bases implemented the method
            if not callable(getattr(cls, method_name, None)):
                raise NotImplementedError(
                    f"{cls.__name__} must implement '{method_name}'"
                )

class GoodHandler(StrictBase):
    def handle(self, request):
        return f"handled: {request}"

    def validate(self, request):
        return len(request) > 0

try:
    class BadHandler(StrictBase):
        def handle(self, request):
            return request
        # validate is missing!
except NotImplementedError as e:
    print(e)  # BadHandler must implement 'validate'

# === 3. __set_name__: the descriptor learns its own name ===
class ValidatedField:
    """A descriptor that picks up its attribute name automatically"""

    def __init__(self, *, min_length: int = 0, max_length: int = 255,
                 pattern: str | None = None):
        self.min_length = min_length
        self.max_length = max_length
        self.pattern = pattern
        self.name = None  # set in __set_name__

    def __set_name__(self, owner, name):
        """Python calls this automatically when the owner class is created, passing the attribute name"""
        self.name = name
        self.storage_name = f'_field_{name}'

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.storage_name, None)

    def __set__(self, obj, value):
        if not isinstance(value, str):
            raise TypeError(f"{self.name} must be str")
        if len(value) < self.min_length:
            raise ValueError(
                f"{self.name} must be at least {self.min_length} characters"
            )
        if len(value) > self.max_length:
            raise ValueError(
                f"{self.name} must be at most {self.max_length} characters"
            )
        if self.pattern and not re.match(self.pattern, value):
            raise ValueError(f"{self.name} must match pattern {self.pattern}")
        setattr(obj, self.storage_name, value)

    def __repr__(self):
        return f"ValidatedField(name={self.name!r})"

class UserProfile:
    username = ValidatedField(min_length=3, max_length=20, pattern=r'^[a-zA-Z0-9_]+$')
    email = ValidatedField(min_length=5, max_length=100, pattern=r'^[^@]+@[^@]+\.[^@]+$')
    bio = ValidatedField(max_length=500)

# Thanks to __set_name__, each descriptor knows its own name
print(UserProfile.username)  # ValidatedField(name='username')
print(UserProfile.email)     # ValidatedField(name='email')

profile = UserProfile()
profile.username = "alice_123"
profile.email = "alice@example.com"
profile.bio = "Hello world"
try:
    profile.username = "ab"  # too short
except ValueError as e:
    print(e)  # username must be at least 3 characters
try:
    profile.email = "not-an-email"
except ValueError as e:
    print(e)  # email must match pattern ^[^@]+@[^@]+\.[^@]+$

# === 4. In practice: a complete command registration system ===
class CommandRegistry:
    """Command registry built on __init_subclass__"""
    _commands: dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass that sets a non-empty name
        if hasattr(cls, 'name') and cls.name:
            CommandRegistry._commands[cls.name] = cls

    @classmethod
    def get_command(cls, name: str):
        return cls._commands.get(name)

    @classmethod
    def run(cls, name: str, *args, **kwargs):
        cmd_cls = cls._commands.get(name)
        if cmd_cls is None:
            raise ValueError(f"Unknown command: {name}")
        return cmd_cls().execute(*args, **kwargs)

class Command(CommandRegistry):
    """Base class for commands"""
    name: str = ""  # overridden by subclasses

    def execute(self, *args, **kwargs):
        raise NotImplementedError

class GreetCommand(Command):
    name = "greet"

    def execute(self, who: str = "world"):
        return f"Hello, {who}!"

class CalcCommand(Command):
    name = "calc"

    def execute(self, a: float, op: str, b: float) -> float:
        ops = {'+': lambda x, y: x + y, '-': lambda x, y: x - y,
               '*': lambda x, y: x * y, '/': lambda x, y: x / y}
        return ops[op](a, b)

class TimeCommand(Command):
    name = "time"

    def execute(self):
        return datetime.now().isoformat()

# Every command is registered automatically
print(CommandRegistry._commands.keys())
# dict_keys(['greet', 'calc', 'time'])
print(CommandRegistry.run("greet", who="Alice"))  # Hello, Alice!
print(CommandRegistry.run("calc", 10, "+", 32))   # 42
print(CommandRegistry.run("time"))                # the current time in ISO format, e.g. 2026-04-13T...
```
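Core differences
| Aspect | Java/Kotlin | Python |
|---|---|---|
| Automatic subclass registration | subclasses must explicitly call super() to register | `__init_subclass__` fires automatically |
| Descriptor learns its attribute name | Java: not possible; Kotlin: the delegate receives a `KProperty` (`property.name`) | `__set_name__` is called automatically at class creation |
| Implementation effort | annotation processors or manual registration | a few lines of code |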
Common pitfalls
```python
# Pitfall 1: passing arguments to __init_subclass__
# Keyword arguments in the class statement are forwarded to __init_subclass__,
# except metaclass=, which Python consumes itself
class Base:
    def __init_subclass__(cls, custom_arg=None, **kwargs):
        super().__init_subclass__(**kwargs)
        print(f"custom_arg = {custom_arg}")

class Child(Base, custom_arg="hello"):
    pass
# custom_arg = hello

# metaclass= is not passed on as an argument
class Child2(Base, metaclass=type):
    pass
# custom_arg = None (metaclass does not show up in __init_subclass__'s arguments)

# Pitfall 2: __set_name__ is only called for class attributes
class MyDesc:
    def __set_name__(self, owner, name):
        print(f"__set_name__: {name}")

class Foo:
    x = MyDesc()  # __set_name__ is called; prints: __set_name__: x

    def __init__(self):
        self.y = MyDesc()  # not called! This is an instance attribute
```
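Pitfall 2 has a second half. `type.__new__` calls `__set_name__` only for the attributes present in the class body when the class is created. A descriptor attached afterwards with `setattr` never gets the call, so you have to make it by hand. A sketch with a made-up `Named` descriptor:

```python
class Named:
    def __init__(self):
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name

class Host:
    early = Named()  # __set_name__ runs during class creation

late = Named()
setattr(Host, "late", late)       # attached after creation: no automatic __set_name__
print(Host.early.name, late.name)  # early None

late.__set_name__(Host, "late")   # call the hook manually
print(late.name)                  # late
```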
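When to use
- `__init_subclass__`: plugin systems, command registration, automatic subclass validation, framework base classes
- `__set_name__`: descriptors that need to know their attribute name (validation, serialization, ORM field mapping)
- Prefer them to a metaclass: if all you need is to register or validate subclasses, `__init_subclass__` is much simpler than a metaclass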
6.5 Dynamic Code Execution: exec, eval, compile
Java/Kotlin comparison
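```java
// Java: ScriptEngine (JSR 223) can execute dynamic code
import javax.script.*;

ScriptEngine engine = new ScriptEngineManager().getEngineByName("js");
Object result = engine.eval("1 + 2");  // 3
engine.eval("function add(a, b) { return a + b; }");
// The script runs in a separate language engine (JS here); Java objects are only
// visible to it if you pass them in explicitly (e.g. through Bindings).
// Nashorn was removed in Java 15, so getEngineByName("js") returns null
// unless an engine such as GraalJS is on the classpath.
```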
```kotlin
// Kotlin: javax.script works here too, and Kotlin Scripting provides a JSR-223 engine
// that can evaluate Kotlin code, but it needs extra scripting artifacts and embeds the compiler
```
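Core difference: Python's exec/eval execute Python code directly, with full access to the current runtime. Java's ScriptEngine typically executes a different language (JS/Groovy) and is separated from the Java object model.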
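"Full access to the current runtime" means the following. Any object you put into the namespace is the real object, not a copy, so code run by `exec` can mutate it directly without any serialization or bridging. A small sketch with an illustrative `Account` class:

```python
class Account:
    def __init__(self, balance: float):
        self.balance = balance

acct = Account(100.0)
history: list[str] = []

# The snippet sees the very same objects that the caller holds
exec("acct.balance -= 30\nhistory.append('withdraw 30')", {"acct": acct, "history": history})

print(acct.balance)  # 70.0
print(history)       # ['withdraw 30']
```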
Python implementation
```python
import ast
import operator

# === 1. eval: evaluating an expression ===
# eval only evaluates a single expression; it cannot run statements
result = eval("1 + 2 * 3")
print(result)  # 7
result = eval("[x**2 for x in range(5)]")
print(result)  # [0, 1, 4, 9, 16]

# eval accepts a namespace
namespace = {"x": 10, "y": 20}
result = eval("x + y", namespace)
print(result)  # 30

# Restricting the available names: this blocks the obvious attack below,
# but it is NOT a real sandbox (see pitfall 3)
safe_globals = {"__builtins__": {}}
safe_locals = {"x": 10}
result = eval("x * 2", safe_globals, safe_locals)
print(result)  # 20
try:
    eval("__import__('os').system('ls')", safe_globals, safe_locals)
except NameError as e:
    print(f"Blocked: {e}")  # Blocked: name '__import__' is not defined

# === 2. exec: executing statements ===
# exec runs arbitrary Python code (assignments, function and class definitions, ...)
namespace = {}
exec("""
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n - 1)

class Calculator:
    def add(self, a, b):
        return a + b
""", namespace)

# The dynamically defined function and class can be used directly
factorial_fn = namespace['factorial']
print(factorial_fn(5))  # 120
Calc = namespace['Calculator']
calc = Calc()
print(calc.add(3, 4))  # 7

# === 3. compile: building code objects ===
# compile turns source into a code object that can be executed repeatedly
code = compile("x + y", "<calc>", "eval")
print(eval(code, {"x": 1, "y": 2}))    # 3
print(eval(code, {"x": 10, "y": 20}))  # 30

# The three modes of compile
# "eval": a single expression
expr_code = compile("a * b", "<expr>", "eval")
print(eval(expr_code, {"a": 6, "b": 7}))  # 42
# "exec": a sequence of statements
stmt_code = compile("""
total = 0
for i in range(n):
    total += i
result = total
""", "<loop>", "exec")
ns = {"n": 10}
exec(stmt_code, ns)
print(ns["result"])  # 45
# "single": one interactive statement (like the REPL)
single_code = compile("print('hello from compiled code')", "<single>", "single")
exec(single_code)  # hello from compiled code

# === 4. In practice: a safe expression evaluator ===
# Allowed operators (note: ** can still burn CPU/memory with huge exponents)
_SAFE_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}
# Names an expression may use; only the callables in here may be called
_SAFE_NAMES = {
    "abs": abs,
    "min": min,
    "max": max,
    "round": round,
    "sum": sum,
    "len": len,
    "True": True,
    "False": False,
    "None": None,
}

class SafeEvalError(Exception):
    pass

def safe_eval(expression: str, variables: dict | None = None) -> object:
    """
    A safe expression evaluator: allows arithmetic and calls to whitelisted functions only;
    rejects attribute access, lambdas and every other kind of call.
    Built on an AST whitelist, which is safer than eval with a restricted namespace.
    """
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as e:
        raise SafeEvalError(f"Invalid expression: {e}") from e

    def _eval(node):
        if isinstance(node, ast.Constant):  # Python 3.8+
            return node.value
        if isinstance(node, ast.Name):
            if variables and node.id in variables:
                return variables[node.id]
            if node.id in _SAFE_NAMES:
                return _SAFE_NAMES[node.id]
            raise SafeEvalError(f"Unknown name: {node.id}")
        if isinstance(node, ast.BinOp):
            left = _eval(node.left)
            right = _eval(node.right)
            op_type = type(node.op)
            if op_type in _SAFE_OPERATORS:
                return _SAFE_OPERATORS[op_type](left, right)
            raise SafeEvalError(f"Unsupported operator: {op_type.__name__}")
        if isinstance(node, ast.UnaryOp):
            operand = _eval(node.operand)
            op_type = type(node.op)
            if op_type in _SAFE_OPERATORS:
                return _SAFE_OPERATORS[op_type](operand)
            raise SafeEvalError(f"Unsupported operator: {op_type.__name__}")
        if isinstance(node, ast.Call):
            # Only direct calls to whitelisted names such as abs(x):
            # no obj.method(...), no lambdas, no keyword arguments
            if not isinstance(node.func, ast.Name) or not callable(_SAFE_NAMES.get(node.func.id)):
                raise SafeEvalError("Only calls to whitelisted functions are allowed")
            if node.keywords:
                raise SafeEvalError("Keyword arguments are not allowed")
            args = [_eval(arg) for arg in node.args]
            return _SAFE_NAMES[node.func.id](*args)
        raise SafeEvalError(f"Unsupported expression: {type(node).__name__}")

    return _eval(tree.body)

# Safe usage
print(safe_eval("2 + 3 * 4"))                 # 14
print(safe_eval("(10 - 3) ** 2"))             # 49
print(safe_eval("abs(-5) + max(1, 2, 3)"))    # 8
print(safe_eval("price * quantity", {"price": 9.99, "quantity": 3}))  # about 29.97

# Dangerous code is blocked
try:
    safe_eval("__import__('os').system('rm -rf /')")
except SafeEvalError as e:
    print(f"Blocked: {e}")
try:
    safe_eval("open('/etc/passwd').read()")
except SafeEvalError as e:
    print(f"Blocked: {e}")
try:
    safe_eval("(lambda: __import__('os'))()")
except SafeEvalError as e:
    print(f"Blocked: {e}")

# === 5. In practice: a configuration-driven rule engine ===
class RuleEngine:
    """Flexible business rules through dynamic code execution.
    Only for rule text you trust: an empty __builtins__ is not a sandbox (see pitfall 3)."""

    def __init__(self):
        self._rules: list[dict] = []

    @staticmethod
    def _rule_globals() -> dict:
        # A fresh namespace per rule: no builtins except the few that rules may call
        return {"__builtins__": {}, "int": int, "round": round}

    def add_rule(self, name: str, condition: str, action: str):
        """Adds a rule: condition is a Python expression, action is one or more statements"""
        self._rules.append({
            "name": name,
            "condition": compile(condition, f"<rule:{name}>", "eval"),
            "action": compile(action, f"<rule:{name}>", "exec"),
        })

    def evaluate(self, context: dict) -> list[str]:
        """Evaluates all rules in order and returns the names of those that fired.
        Actions write into context, so later rules see the changes made by earlier ones."""
        triggered = []
        for rule in self._rules:
            rule_globals = self._rule_globals()
            try:
                if eval(rule["condition"], rule_globals, context):
                    exec(rule["action"], rule_globals, context)
                    triggered.append(rule["name"])
            except Exception as e:
                print(f"Rule {rule['name']} error: {e}")
        return triggered

engine = RuleEngine()
# Load the rules from configuration (in practice they could come from YAML/JSON)
engine.add_rule(
    "discount",
    "total > 100 and customer_level == 'gold'",
    "total *= 0.9; discount_applied = True"
)
engine.add_rule(
    "free_shipping",
    "total > 50",
    "shipping = 0"
)
engine.add_rule(
    "bonus_points",
    "total > 200",
    "points = int(total * 2)"
)

# Evaluate
context = {"total": 150.0, "customer_level": "gold", "shipping": 10}
triggered = engine.evaluate(context)
print(f"Triggered rules: {triggered}")
# Triggered rules: ['discount', 'free_shipping']
print(f"Final: total={context['total']}, shipping={context['shipping']}")
# Final: total=135.0, shipping=0
```
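Key differences
| Aspect | Java ScriptEngine | Python exec/eval |
|---|---|---|
| Language executed | A different language (JS/Groovy) | Python itself |
| Object access | Requires serialization or a host-language bridge | Direct access to the objects of the current environment |
| Security | Isolated by the engine | Must be restricted by hand (and is hard to restrict fully) |
| Performance | Slower (engine startup overhead) | Faster (runs natively in the same interpreter) |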
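The "object access" row is the practical difference: code run by exec works on the caller's live objects, with no serialization or bridge in between. A minimal sketch, assuming a hypothetical `Order` class that exists only for this illustration:

```python
class Order:
    """Hypothetical domain object, used only for this illustration."""

    def __init__(self, items):
        self.items = items


order = Order([("apple", 3.0), ("pear", 2.5)])
ns = {"order": order}  # pass the live object itself, not a copy
# exec inserts the real builtins into ns when "__builtins__" is missing, so len() works
exec("order.items.append(('plum', 1.0))\ncount = len(order.items)", ns)

print(len(order.items))  # 3: the caller's object was mutated in place
print(ns["count"])       # 3: new names land in the namespace dict
```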
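Common pitfalls
```python
# Pitfall 1: eval only accepts expressions
try:
    eval("x = 1")  # SyntaxError! Assignment is a statement, not an expression
except SyntaxError as e:
    print(f"SyntaxError: {e.msg}")
# Correct: use exec
ns = {}
exec("x = 1", ns)
print(ns["x"])  # 1
# exec always returns None
result = exec("1 + 2")
print(result)  # None

# Pitfall 2: inside a function, exec cannot rebind the function's local variables
def demo():
    x = 10
    exec("x = 20")  # writes into a temporary snapshot of locals()
    print(x)  # still 10!
demo()
# (At module level globals and locals are the same dict, so exec("x = 20") WOULD rebind x.)
# To get results out reliably, pass an explicit namespace
ns = {"x": 10}
exec("x = 20", ns)
print(ns["x"])  # 20

# Pitfall 3: never use eval on untrusted input
# Even {"__builtins__": {}} is not safe enough;
# attackers can get around it in many ways, for example:
# eval("().__class__.__bases__[0].__subclasses__()", {"__builtins__": {}})
# Use the AST-whitelist approach (safe_eval) from the demo above instead
```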
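To see why pitfall 3 matters, the snippet below runs the escape expression quoted above. With an empty `__builtins__` no builtin names are reachable, yet attribute access on a tuple literal still climbs up to `object` and lists the classes loaded in the interpreter. Here this is harmless, but an attacker can often search that list for a class that leads to modules such as `os`. The AST-whitelist `safe_eval` rejects the same string because it never evaluates attribute access.

```python
restricted = {"__builtins__": {}}  # no builtin names: no open(), no __import__()

# () -> tuple -> object -> every direct subclass of object
subclasses = eval("().__class__.__bases__[0].__subclasses__()", restricted)

print(type(subclasses).__name__)  # list
print(type in subclasses)         # True: core classes are reachable anyway
print(len(subclasses) > 0)        # True
```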
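When to use
- eval: evaluating math expressions entered by users. Do not call eval on them directly; use an AST-whitelist evaluator such as safe_eval above
- exec: loading dynamic rules from trusted configuration files, plugin systems, REPL implementations
- compile: precompiling code that will be executed many times
- Avoid: eval/exec on untrusted input unless complete security isolation is in place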
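Here is a sketch of the compile item, using a hypothetical compound-interest formula. Compiling once turns the string into a code object, so each later eval skips parsing. The timings are printed rather than asserted because they vary by machine. What matters is that both versions compute the same value.

```python
import timeit

EXPR = "base * (1 + rate) ** years"        # hypothetical formula from configuration
CODE = compile(EXPR, "<formula>", "eval")  # parse + compile once


def eval_precompiled(ctx: dict) -> float:
    return eval(CODE, {"__builtins__": {}}, ctx)  # runs the cached code object


def eval_from_source(ctx: dict) -> float:
    return eval(EXPR, {"__builtins__": {}}, ctx)  # re-parses the string on every call


ctx = {"base": 1000.0, "rate": 0.05, "years": 10}
print(eval_precompiled(ctx) == eval_from_source(ctx))  # True
print("precompiled:", timeit.timeit(lambda: eval_precompiled(ctx), number=10_000))
print("from source:", timeit.timeit(lambda: eval_from_source(ctx), number=10_000))
```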
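6.6 The types module: dynamic type creation
Java/Kotlin comparison
```java
// Java: creating classes at runtime requires a bytecode library (ASM, ByteBuddy, CGLIB)
import java.lang.reflect.Modifier;
import net.bytebuddy.ByteBuddy;
import net.bytebuddy.dynamic.DynamicType;
import net.bytebuddy.implementation.FixedValue;

DynamicType.Unloaded<?> dynamicType = new ByteBuddy()
    .subclass(Object.class)
    .name("com.example.DynamicClass")
    .defineMethod("greet", String.class, Modifier.PUBLIC)
    .intercept(FixedValue.value("Hello"))
    .make();
Class<?> dynamicClass = dynamicType.load(
    getClass().getClassLoader()
).getLoaded();
Object instance = dynamicClass.getDeclaredConstructor().newInstance();
// Requires a third-party library, and the API is complex
```
```kotlin
// Kotlin: no built-in way to create classes at runtime; it relies on the same Java bytecode libraries
```
Key difference: Python's types module and type() are built into the language, with zero dependencies. Java needs a third-party bytecode-manipulation library.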
Python implementation
```python
import types

# === 1. types.new_class(): creating a class dynamically ===
def _init_method(self, name):
    self.name = name

def _repr_method(self):
    return f"DynamicUser({self.name!r})"

def _greet_method(self):
    return f"Hello, {self.name}!"

# Create the class dynamically; exec_body fills in the class namespace in place
DynamicUser = types.new_class(
    name="DynamicUser",
    bases=(object,),
    exec_body=lambda ns: ns.update({
        "__init__": _init_method,
        "__repr__": _repr_method,
        "greet": _greet_method,
        "species": "Homo sapiens",  # class attribute
    }),
)
user = DynamicUser("Alice")
print(user)  # DynamicUser('Alice')
print(user.greet())  # Hello, Alice!
print(user.species)  # Homo sapiens
print(isinstance(user, DynamicUser))  # True

# === 2. types.FunctionType(): creating a function dynamically ===
# Signature: FunctionType(code, globals, name=None, argdefs=None, closure=None)
# Approach 1: build a function from a code object.
# compile("return a + b", ...) fails with "'return' outside function", so compile a def
# and take the function's code object out of the module code's constants.
module_code = compile("def _template(a, b):\n    return a + b", "<dynamic>", "exec")
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
dynamic_add = types.FunctionType(func_code, {}, "dynamic_add")
print(dynamic_add(3, 4))  # 7
print(dynamic_add.__name__)  # dynamic_add

# Approach 2 (more practical): a factory function that returns closures
def make_operator(op_name: str, op_func):
    """Factory function: create a named binary-operation function."""
    def op(a, b):
        return op_func(a, b)
    op.__name__ = f"op_{op_name}"
    op.__qualname__ = f"make_operator.<locals>.op_{op_name}"
    op.__doc__ = f"Perform {op_name} operation"
    return op

add = make_operator("add", lambda a, b: a + b)
multiply = make_operator("multiply", lambda a, b: a * b)
print(add(3, 4))  # 7
print(multiply(3, 4))  # 12
print(add.__name__)  # op_add

# === 3. types.MethodType(): binding a method dynamically ===
class Greeter:
    pass

def say_hello(self, name: str) -> str:
    return f"Hello, {name}!"

g = Greeter()
# Bind the function as a method of this one instance
g.hello = types.MethodType(say_hello, g)
print(g.hello("World"))  # Hello, World!
# Other instances are not affected
g2 = Greeter()
print(hasattr(g2, 'hello'))  # False

# === 4. In practice: building an API client dynamically ===
def _method_name(endpoint: str, method: str) -> str:
    """("users/{id}", "GET") -> "get_users_id"; including the HTTP verb keeps names unique."""
    path = endpoint.replace("{", "").replace("}", "").replace("/", "_").replace("-", "_")
    return f"{method.lower()}_{path}"

def _make_request_method(endpoint: str, method: str = "GET"):
    """Create one method for an API endpoint."""
    def api_method(self, **params):
        # Fill path placeholders such as {id}; str.format ignores unused keyword arguments
        url = f"{self.base_url}/{endpoint.format(**params)}"
        # Simulated API call
        return {
            "url": url,
            "method": method,
            "params": params,
        }
    api_method.__name__ = _method_name(endpoint, method)
    api_method.__doc__ = f"API {method} /{endpoint}"
    return api_method

API_ENDPOINTS = [
    ("users", "GET"),
    ("users/{id}", "GET"),
    ("users", "POST"),
    ("orders", "GET"),
    ("orders/{id}", "DELETE"),
]

def _api_client_body(ns):
    """exec_body for ApiClient: add __init__ plus one method per endpoint."""
    def __init__(self, base_url="https://api.example.com"):
        self.base_url = base_url
    ns["__init__"] = __init__
    for endpoint, method in API_ENDPOINTS:
        ns[_method_name(endpoint, method)] = _make_request_method(endpoint, method)

# Create the API client class dynamically
ApiClient = types.new_class("ApiClient", (object,), exec_body=_api_client_body)

client = ApiClient("https://api.example.com")
print(client.get_users())
# {'url': 'https://api.example.com/users', 'method': 'GET', 'params': {}}
print(client.get_users_id(id=42))
# {'url': 'https://api.example.com/users/42', 'method': 'GET', 'params': {'id': 42}}

# === 5. Other useful tools in the types module ===
# Checking function types
def regular_func():
    pass

class MyClass:
    @classmethod
    def cm(cls):
        pass

    @staticmethod
    def sm():
        pass

    def im(self):
        pass

print(types.FunctionType)  # <class 'function'>
print(types.MethodType)  # <class 'method'>
print(types.BuiltinFunctionType)  # <class 'builtin_function_or_method'>
print(types.LambdaType)  # <class 'function'> (lambda and def produce the same type)
print(isinstance(regular_func, types.FunctionType))  # True
print(isinstance(print, types.BuiltinFunctionType))  # True
print(isinstance(MyClass.cm, types.MethodType))  # True (a classmethod is bound to the class)
print(isinstance(MyClass.sm, types.FunctionType))  # True (a staticmethod yields the plain function)
print(isinstance(MyClass().im, types.MethodType))  # True (bound instance method)

# types.SimpleNamespace: a lightweight attribute container
config = types.SimpleNamespace(host="localhost", port=8080, debug=True)
print(config.host)  # localhost
config.timeout = 30  # attributes can be added dynamically
print(config)  # namespace(host='localhost', port=8080, debug=True, timeout=30)
```
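Key differences
| Aspect | Java ByteBuddy | Python types |
|---|---|---|
| Dependency | Third-party library | Standard library |
| Creating a class | Complex builder API | type() or types.new_class() |
| Creating methods | Bytecode-level definitions | Ordinary functions |
| Learning curve | Steep | Gentle |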
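The table's "creating a class" row also mentions type(). Its three-argument form, type(name, bases, namespace), does the same job as a class statement. A minimal sketch with a hypothetical Point class:

```python
# The class statement ...
class Point:
    dims = 2

    def norm2(self):
        return self.x ** 2 + self.y ** 2


# ... and the equivalent three-argument type() call
def _norm2(self):
    return self.x ** 2 + self.y ** 2


DynPoint = type("DynPoint", (object,), {"dims": 2, "norm2": _norm2})

p = DynPoint()
p.x, p.y = 3, 4
print(p.norm2(), DynPoint.dims)               # 25 2
print(type(DynPoint) is type(Point) is type)  # True: both are ordinary classes
```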
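Common pitfalls
```python
import types

# Pitfall 1: exec_body in types.new_class
# exec_body receives the class namespace dict; its return value is ignored,
# so it must modify ns in place
# Wrong:
Bad = types.new_class("Bad", exec_body=lambda ns: {"x": 1})  # the returned dict is discarded!
print(hasattr(Bad, "x"))  # False
# Right:
Good = types.new_class("Good", exec_body=lambda ns: ns.update({"x": 1}))
print(Good.x)  # 1

# Pitfall 2: the second argument of MethodType
# MethodType(func, instance): a real instance must be supplied
def say_hello(self, name: str) -> str:
    return f"Hello, {name}!"

try:
    types.MethodType(say_hello, None)  # Python 3 has no unbound methods
except TypeError as e:
    print(f"TypeError: {e}")
```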
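When to use
- type(): simple dynamic class creation
- types.new_class(): when you need finer control (it resolves the correct metaclass, calls __prepare__, and forwards class keyword arguments to __init_subclass__)
- types.MethodType(): binding a method to a single instance at runtime
- types.SimpleNamespace: a replacement for lightweight dicts or small dataclasses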
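Here is a sketch of the "finer control" point. types.new_class accepts class keyword arguments through `kwds`, just like `class CsvExporter(Exporter, fmt="csv")`, and those arguments reach `__init_subclass__`. The `Exporter` registry and the `fmt` keyword are hypothetical names chosen for illustration.

```python
import types


class Exporter:
    registry: dict = {}

    def __init_subclass__(cls, *, fmt: str, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.fmt = fmt
        Exporter.registry[fmt] = cls  # automatic subclass registration


def _csv_body(ns):
    def export(self, rows):
        return "\n".join(",".join(map(str, row)) for row in rows)
    ns["export"] = export  # exec_body must mutate the namespace in place


# Equivalent to: class CsvExporter(Exporter, fmt="csv"): def export(self, rows): ...
CsvExporter = types.new_class("CsvExporter", (Exporter,), kwds={"fmt": "csv"}, exec_body=_csv_body)

print(CsvExporter.fmt)                          # csv
print(Exporter.registry["csv"] is CsvExporter)  # True
print(CsvExporter().export([(1, 2), (3, 4)]))   # two lines: 1,2 and 3,4
```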
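6.7 The inspect module: runtime introspection
Java/Kotlin comparison
```java
// Java: Reflection API
import java.lang.reflect.*;

Class<?> clazz = String.class;
Method[] methods = clazz.getDeclaredMethods();
Field[] fields = clazz.getDeclaredFields();
Constructor<?>[] constructors = clazz.getDeclaredConstructors();
// Parameter information (Java 8+); real names require the -parameters compiler option
Method method = String.class.getMethod("substring", int.class, int.class);
Parameter[] params = method.getParameters();
// Without -parameters, the names come back as arg0, arg1, ...
// Limits of Java reflection:
// 1. Type arguments are erased at runtime (only declared generic signatures survive)
// 2. Parameter names are unavailable by default
// 3. Source code cannot be retrieved
// 4. Local variables of stack frames cannot be read
```
```kotlin
// Kotlin: richer reflection (KReflection), but it needs the extra kotlin-reflect dependency
import kotlin.reflect.full.*

val klass = String::class
val members = klass.members
// KReflection exposes parameter names, nullability, whether a parameter has a default value, etc.
// but kotlin-reflect is a large dependency (a few MB)
```
Key difference: Python's inspect is part of the standard library, with zero dependencies, and it exposes information Java reflection cannot: source code, the local variables of stack frames, actual parameter default values, and more.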
Python implementation
```python
import functools
import inspect

# === 1. Checking object kinds ===
def regular_function():
    """A regular function"""
    pass

async def async_function():
    pass

class MyClass:
    @classmethod
    def class_method(cls):
        pass

    @staticmethod
    def static_method():
        pass

    def instance_method(self):
        pass

    @property
    def my_property(self):
        return 42

obj = MyClass()
print(inspect.isfunction(regular_function))  # True
print(inspect.ismethod(obj.instance_method))  # True (bound method)
print(inspect.ismethod(MyClass.instance_method))  # False (accessed on the class it is a plain function)
print(inspect.isclass(MyClass))  # True
print(inspect.ismethod(MyClass.class_method))  # True (bound to the class)
print(inspect.isroutine(regular_function))  # True (functions, methods, lambdas and builtins are all routines)
print(inspect.isbuiltin(print))  # True
print(inspect.iscoroutinefunction(async_function))  # True (defined with async def)
print(inspect.isdatadescriptor(MyClass.my_property))  # True (property is a data descriptor)

# === 2. Signatures and parameter information ===
def create_user(name: str, age: int = 0, *, active: bool = True, **extra):
    """Create a new user."""
    pass

sig = inspect.signature(create_user)
print(sig)  # (name: str, age: int = 0, *, active: bool = True, **extra)
for param_name, param in sig.parameters.items():
    kind = param.kind.name
    has_default = param.default is not inspect.Parameter.empty
    default = param.default if has_default else "<none>"
    print(f"  {param_name}: kind={kind}, annotation={param.annotation}, default={default}")
# Output:
#   name: kind=POSITIONAL_OR_KEYWORD, annotation=<class 'str'>, default=<none>
#   age: kind=POSITIONAL_OR_KEYWORD, annotation=<class 'int'>, default=0
#   active: kind=KEYWORD_ONLY, annotation=<class 'bool'>, default=True
#   extra: kind=VAR_KEYWORD, annotation=<class 'inspect._empty'>, default=<none>

# === 3. Retrieving source code (needs the .py file; typically fails in an interactive REPL) ===

# Adds two numbers.
def example_function(x, y):
    """This is an example."""
    result = x + y
    return result

print(inspect.getsource(example_function))
# def example_function(x, y):
#     """This is an example."""
#     result = x + y
#     return result
print(inspect.getdoc(example_function))  # This is an example.
print(inspect.getcomments(example_function))  # the "#" lines directly above the def: "# Adds two numbers."
# Source file and line number
print(inspect.getfile(example_function))  # file path
print(inspect.getsourcelines(example_function)[1])  # first line number

# === 4. Inspecting the call stack ===
def level_3():
    frame = inspect.currentframe()  # may be None on implementations without frame support
    print(f"Current function: {frame.f_code.co_name}")
    del frame  # drop the frame reference to avoid reference cycles
    # inspect.stack() lists frames innermost first: index 0 is level_3 itself
    stack = inspect.stack()
    print("\nCall stack (innermost first):")
    for i, frame_info in enumerate(stack):
        print(f"  [{i}] {frame_info.function} at {frame_info.filename}:{frame_info.lineno}")

def level_2():
    level_3()

def level_1():
    level_2()

level_1()

# === 5. In practice: an automatic argument-validation decorator ===
def validate_types(**expected_types):
    """
    Decorator factory: check argument types at call time against the types passed
    to the decorator (not the function's annotations).
    inspect.signature maps positional and keyword arguments to parameter names.
    """
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind the call's arguments to the signature (raises TypeError on a mismatch)
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Check the type of each argument
            for param_name, value in bound.arguments.items():
                if param_name in expected_types:
                    expected = expected_types[param_name]
                    if not isinstance(value, expected):
                        raise TypeError(
                            f"{func.__name__}(): parameter '{param_name}' "
                            f"expected {expected.__name__}, got {type(value).__name__}"
                        )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@validate_types(name=str, age=int, score=float)
def create_student(name, age, score=0.0):
    return {"name": name, "age": age, "score": score}

print(create_student("Alice", 20, 95.5))  # OK
try:
    create_student("Bob", "twenty")  # age should be an int
except TypeError as e:
    print(e)  # create_student(): parameter 'age' expected int, got str

# === 6. In practice: generating API documentation automatically ===
def generate_docs(cls) -> str:
    """Build documentation from a class's method signatures and docstrings."""
    lines = [f"# {cls.__name__}", ""]
    # getmembers returns members sorted by name
    for name, method in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith('_'):
            continue
        sig = inspect.signature(method)
        doc = inspect.getdoc(method) or "No description."
        lines.append(f"## {name}{sig}")
        lines.append("")
        lines.append(doc)
        lines.append("")
    return '\n'.join(lines)

class Calculator:
    """A simple calculator."""

    def add(self, a: float, b: float) -> float:
        """Add two numbers."""
        return a + b

    def subtract(self, a: float, b: float) -> float:
        """Subtract b from a."""
        return a - b

    def multiply(self, a: float, b: float) -> float:
        """Multiply two numbers."""
        return a * b

    def divide(self, a: float, b: float) -> float:
        """Divide a by b. Raises ZeroDivisionError if b is zero."""
        return a / b

print(generate_docs(Calculator))
# # Calculator
#
# ## add(self, a: float, b: float) -> float
#
# Add two numbers.
#
# ## divide(self, a: float, b: float) -> float
#
# Divide a by b. Raises ZeroDivisionError if b is zero.
# ...
```
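Key differences
| Aspect | Java Reflection | Python inspect |
|---|---|---|
| Dependency | Standard library, but limited | Standard library, feature-rich |
| Source code | Not available | Available directly via getsource() |
| Parameter names | Need the -parameters compiler option | Available by default |
| Call stack | Thread.getStackTrace() | inspect.stack(), with richer information (including frame locals) |
| Generics | Erased at runtime | Type hints are kept (__annotations__) |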
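The "generics" row in practice: parameterized annotations such as `list[float]` keep their type arguments at runtime, and `typing.get_origin`/`get_args` can take them apart. The function below is hypothetical. Python records these hints but does not enforce them, and the `list[float]` syntax needs Python 3.9+.

```python
from typing import get_args, get_origin, get_type_hints


def total(prices: list[float], tax: dict[str, float]) -> float:
    """Hypothetical function whose annotations we introspect."""
    return sum(prices) * (1 + tax.get("rate", 0.0))


hints = get_type_hints(total)
print(hints["prices"])              # list[float]
print(get_origin(hints["prices"]))  # <class 'list'>
print(get_args(hints["tax"]))       # (<class 'str'>, <class 'float'>)
print(hints["return"])              # <class 'float'>
```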
Common pitfalls
```python
import inspect

# Pitfall 1: getsource does not work on built-ins and C extensions
try:
    inspect.getsource(len)
except TypeError as e:
    print(e)  # len is implemented in C, so there is no Python source (exact wording varies by version)

# Pitfall 2: ismethod vs isfunction
class Foo:
    def bar(self):
        pass

print(inspect.isfunction(Foo.bar))  # True (accessed on the class: a plain function)
print(inspect.ismethod(Foo().bar))  # True (accessed on an instance: a bound method)

# Pitfall 3: Signature.bind raises TypeError when the arguments don't match
def f(a, b):
    pass

try:
    sig = inspect.signature(f)
    sig.bind(1, 2, 3)  # one argument too many
except TypeError as e:
    print(e)  # too many positional arguments
```
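When to use
- inspect.signature: generic decorators, serialization frameworks, argument validation
- inspect.getsource: documentation generation, code-analysis tools
- inspect.stack: debugging and logging (mind the overhead: it builds a record for every frame and reads source lines)
- inspect.getmembers: iterating over a class's attributes and methods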
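For logging, inspect.stack() often does more work than you need. It builds a FrameInfo record for every frame and reads the source lines for each one. When only the immediate caller matters, stepping one frame up from inspect.currentframe() is cheaper. Here is a sketch; `log_with_caller` is a hypothetical helper.

```python
import inspect


def log_with_caller(message: str) -> str:
    """Hypothetical helper: prefix a message with the calling function's name and line."""
    frame = inspect.currentframe()  # may be None on implementations without frame support
    caller = frame.f_back if frame is not None else None
    try:
        if caller is None:
            return message
        return f"[{caller.f_code.co_name}:{caller.f_lineno}] {message}"
    finally:
        del frame, caller  # drop frame references to avoid reference cycles


def checkout():
    return log_with_caller("payment accepted")


print(checkout())  # e.g. [checkout:NN] payment accepted, where NN is the calling line
```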
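6.8 AST manipulation: code transformation
Java/Kotlin comparison
```java
// Java: annotation processing works on the syntax tree at compile time,
// but the standard API can only generate new files; it cannot modify existing code
@SupportedAnnotationTypes("*")
public class MyProcessor extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations,
                           RoundEnvironment roundEnv) {
        // Can only read annotations and generate new files;
        // it cannot modify the annotated class itself
        return true;
    }
}
// The Java Compiler Tree API can manipulate the AST, but it is extremely complex;
// it is usually wrapped by tools such as Lombok (which hooks into javac internals) and MapStruct
```
```kotlin
// Kotlin: compiler plugins can transform Kotlin's IR (intermediate representation);
// kotlinx.serialization is such a plugin, while KSP (Kotlin Symbol Processing) builds on the compiler to generate new code
// Both need the full Gradle plugin infrastructure, which makes them costly to develop
```
Key difference: Python's ast module can parse, transform, and execute code at runtime. Java/Kotlin code transformation happens only at compile time and needs a complex toolchain.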
Python implementation
```python
import ast
import textwrap

# === 1. ast.parse: parsing source code into an AST ===
source = """
def greet(name):
    return f'Hello, {name}!'

class User:
    def __init__(self, name):
        self.name = name
"""
tree = ast.parse(source)
print(type(tree))  # <class 'ast.Module'>

# === 2. ast.dump: viewing the AST structure ===
# indent= (an int or a string, Python 3.9+) makes the output readable
simple_source = "x = 1 + 2"
simple_tree = ast.parse(simple_source, mode="exec")
print(ast.dump(simple_tree, indent=2))
# Module(
#   body=[
#     Assign(
#       targets=[
#         Name(id='x', ctx=Store())],
#       value=BinOp(
#         left=Constant(value=1),
#         op=Add(),
#         right=Constant(value=2)))],
#   type_ignores=[])
# (Output from Python 3.9-3.12; 3.13+ omits empty fields such as type_ignores=[] by default)

# === 3. ast.NodeVisitor: walking the AST ===
class FunctionCollector(ast.NodeVisitor):
    """Collect every function definition in the source."""

    def __init__(self):
        self.functions: list[dict] = []

    def visit_FunctionDef(self, node):
        # Parameter names (positional parameters only, for brevity)
        args = [arg.arg for arg in node.args.args]
        # Decorators rendered back to source text (ast.unparse needs Python 3.9+)
        decorators = [ast.unparse(dec) for dec in node.decorator_list]
        self.functions.append({
            "name": node.name,
            "args": args,
            "lineno": node.lineno,  # line of the `def`, not of the decorator (Python 3.8+)
            "decorators": decorators,
            "docstring": ast.get_docstring(node),
        })
        # Keep walking the child nodes (e.g. nested functions)
        self.generic_visit(node)
    # async def produces a separate node type; add visit_AsyncFunctionDef to collect those too

source = textwrap.dedent("""
    @staticmethod
    def helper(x, y):
        '''A helper function.'''
        return x + y

    def main():
        def inner():
            pass
        return inner
""")
collector = FunctionCollector()
collector.visit(ast.parse(source))
for func in collector.functions:
    print(func)
# {'name': 'helper', 'args': ['x', 'y'], 'lineno': 3, 'decorators': ['staticmethod'], 'docstring': 'A helper function.'}
# {'name': 'main', 'args': [], 'lineno': 7, 'decorators': [], 'docstring': None}
# {'name': 'inner', 'args': [], 'lineno': 8, 'decorators': [], 'docstring': None}

# === 4. ast.NodeTransformer: transforming the AST ===
class CallLogger(ast.NodeTransformer):
    """
    Insert a logging statement at the start of every function body,
    as if print("calling: <func_name>") had been written as its first line.
    """

    def visit_FunctionDef(self, node):
        # Transform child nodes (nested functions) first
        self.generic_visit(node)
        # Build the statement to insert: print("calling: <func_name>")
        log_call = ast.Expr(
            value=ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=[ast.Constant(value=f"calling: {node.name}")],
                keywords=[],
            )
        )
        # Insert after the docstring, if any, so that it stays a docstring
        insert_at = 1 if ast.get_docstring(node) is not None else 0
        node.body.insert(insert_at, log_call)
        # The new nodes have no line numbers, and compile() requires them
        ast.fix_missing_locations(node)
        return node

# Transform some code
original_code = textwrap.dedent("""
    def add(a, b):
        return a + b

    def multiply(a, b):
        return a * b
""")
tree = ast.parse(original_code)
transformer = CallLogger()
new_tree = transformer.visit(tree)
# Compile and run the transformed code
code_obj = compile(new_tree, "<transformed>", "exec")
namespace = {}
exec(code_obj, namespace)
# Call the transformed functions
print(namespace["add"](3, 4))
# calling: add
# 7
print(namespace["multiply"](3, 4))
# calling: multiply
# 12

# === 5. In practice: automatic timing instrumentation ===
class TimingTransformer(ast.NodeTransformer):
    """
    Insert timing code into every function automatically.
    Before: def foo(): ...
    After:  def foo():
                import time
                _start = time.perf_counter()
                try:
                    ... (original body)
                finally:
                    _elapsed = time.perf_counter() - _start
                    print(f"foo took {_elapsed:.4f}s")
    """

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        func_name = node.name
        # import time
        import_stmt = ast.Import(names=[ast.alias(name='time', asname=None)])
        # _start = time.perf_counter()
        start_assign = ast.Assign(
            targets=[ast.Name(id='_start', ctx=ast.Store())],
            value=ast.Call(
                func=ast.Attribute(
                    value=ast.Name(id='time', ctx=ast.Load()),
                    attr='perf_counter',
                    ctx=ast.Load(),
                ),
                args=[],
                keywords=[],
            ),
        )
        # _elapsed = time.perf_counter() - _start
        elapsed_assign = ast.Assign(
            targets=[ast.Name(id='_elapsed', ctx=ast.Store())],
            value=ast.BinOp(
                left=ast.Call(
                    func=ast.Attribute(
                        value=ast.Name(id='time', ctx=ast.Load()),
                        attr='perf_counter',
                        ctx=ast.Load(),
                    ),
                    args=[],
                    keywords=[],
                ),
                op=ast.Sub(),
                right=ast.Name(id='_start', ctx=ast.Load()),
            ),
        )
        # print(f"{func_name} took {_elapsed:.4f}s")
        print_stmt = ast.Expr(
            value=ast.Call(
                func=ast.Name(id='print', ctx=ast.Load()),
                args=[
                    ast.JoinedStr(values=[
                        ast.Constant(value=f"{func_name} took "),
                        ast.FormattedValue(
                            value=ast.Name(id='_elapsed', ctx=ast.Load()),
                            conversion=-1,
                            format_spec=ast.JoinedStr(values=[
                                ast.Constant(value='.4f'),
                            ]),
                        ),
                        ast.Constant(value='s'),
                    ])
                ],
                keywords=[],
            )
        )
        # try: ... finally: ...
        try_body = node.body  # the original function body
        final_body = [elapsed_assign, print_stmt]
        try_finally = ast.Try(
            body=try_body,
            handlers=[],
            orelse=[],
            finalbody=final_body,
        )
        # Replace the function body
        node.body = [import_stmt, start_assign, try_finally]
        ast.fix_missing_locations(node)
        return node

# Test
test_code = textwrap.dedent("""
    def slow_function():
        total = 0
        for i in range(100000):
            total += i
        return total

    def fast_function():
        return 42
""")
tree = ast.parse(test_code)
timed_tree = TimingTransformer().visit(tree)
code_obj = compile(timed_tree, "<timed>", "exec")
ns = {}
exec(code_obj, ns)
print(ns["slow_function"]())
# slow_function took 0.00XXs   (actual timings vary by machine)
# 4999950000
print(ns["fast_function"]())
# fast_function took 0.0000s
# 42

# === 6. In practice: static code analysis ===
class CodeAnalyzer(ast.NodeVisitor):
    """Static analysis: detect potential problems in code."""

    def __init__(self):
        self.issues: list[str] = []
        self._current_function = None

    def visit_FunctionDef(self, node):
        old_func = self._current_function
        self._current_function = node.name
        # Check: too many parameters
        arg_count = len(node.args.args)
        if arg_count > 5:
            self.issues.append(
                f"Line {node.lineno}: '{node.name}' has {arg_count} parameters "
                f"(consider refactoring)"
            )
        # Check: missing docstring
        if not ast.get_docstring(node):
            self.issues.append(
                f"Line {node.lineno}: '{node.name}' missing docstring"
            )
        self.generic_visit(node)
        self._current_function = old_func

    def visit_ExceptHandler(self, node):
        # Check: bare except (catches every exception)
        if node.type is None:
            self.issues.append(
                f"Line {node.lineno}: bare 'except:' catches all exceptions, "
                f"use 'except Exception:' instead"
            )
        self.generic_visit(node)

    def visit_Call(self, node):
        # Check: use of eval/exec
        if isinstance(node.func, ast.Name) and node.func.id in ('eval', 'exec'):
            self.issues.append(
                f"Line {node.lineno}: use of '{node.func.id}()' is a security risk"
            )
        self.generic_visit(node)

    def visit_Compare(self, node):
        # Check: comparing with None via == (should use is)
        for op, comparator in zip(node.ops, node.comparators):
            if isinstance(op, ast.Eq) and isinstance(comparator, ast.Constant) and comparator.value is None:
                self.issues.append(
                    f"Line {node.lineno}: use 'is None' instead of '== None'"
                )
        self.generic_visit(node)

# Analyze some code
sample_code = textwrap.dedent("""
    def process_data(a, b, c, d, e, f, g):
        result = eval(input("Enter expression: "))
        try:
            do_something()
        except:
            pass
        if x == None:
            return None
""")
# Note: this sample is only parsed for analysis, never executed (it uses undefined names)
analyzer = CodeAnalyzer()
analyzer.visit(ast.parse(sample_code))
for issue in analyzer.issues:
    print(issue)
# Line 2: 'process_data' has 7 parameters (consider refactoring)
# Line 2: 'process_data' missing docstring
# Line 3: use of 'eval()' is a security risk
# Line 6: bare 'except:' catches all exceptions, use 'except Exception:' instead
# Line 8: use 'is None' instead of '== None'

# === 7. ast.unparse: turning an AST back into source (Python 3.9+) ===
source = "x = [i ** 2 for i in range(10)]"
tree = ast.parse(source)
# Modify the AST
tree.body[0].value.elt.right = ast.Constant(value=3)  # i**2 -> i**3
print(ast.unparse(tree))  # x = [i ** 3 for i in range(10)]
```
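Key differences
| Aspect | Java annotation processing | Python ast |
|---|---|---|
| When it runs | Compile time | Runtime |
| Modification | Can only generate new files | Can rewrite existing code before it is compiled and executed |
| Debugging | Hard (generated code) | Easy (just print the AST) |
| Toolchain | Needs javac/Maven/Gradle | Zero dependencies |
| Granularity | Coarse-grained, annotation-driven | Fine-grained, at the level of individual AST nodes |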
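To illustrate the "granularity" row, a NodeTransformer can rewrite one kind of node and leave everything else alone. The sketch below folds arithmetic on numeric literals (only +, - and * for brevity). It is a toy for illustration, not a complete optimizer.

```python
import ast


class ConstantFolder(ast.NodeTransformer):
    """Fold BinOp nodes whose operands are both numeric constants (+, -, * only)."""

    _OPS = {
        ast.Add: lambda a, b: a + b,
        ast.Sub: lambda a, b: a - b,
        ast.Mult: lambda a, b: a * b,
    }

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first, so nested expressions collapse bottom-up
        op = self._OPS.get(type(node.op))
        left, right = node.left, node.right
        if (
            op is not None
            and isinstance(left, ast.Constant) and isinstance(left.value, (int, float))
            and isinstance(right, ast.Constant) and isinstance(right.value, (int, float))
        ):
            # Replace the whole BinOp with one Constant, keeping its source location
            return ast.copy_location(ast.Constant(value=op(left.value, right.value)), node)
        return node


folded = ConstantFolder().visit(ast.parse("seconds = 60 * 60 * 24 + offset"))
print(ast.unparse(folded))  # seconds = 86400 + offset
```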
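Common pitfalls
```python
import ast

# Pitfall 1: nodes built by hand have no line numbers; call ast.fix_missing_locations
# before compile(), otherwise compile() usually fails (e.g. TypeError: required field "lineno" missing)
tree = ast.parse("x = 1")
# Append a hand-built statement: print(x)
tree.body.append(ast.Expr(ast.Call(ast.Name("print", ast.Load()), [ast.Name("x", ast.Load())], [])))
try:
    compile(tree, "<demo>", "exec")
except (TypeError, ValueError) as e:
    print(f"compile failed: {e}")
ast.fix_missing_locations(tree)
exec(compile(tree, "<demo>", "exec"))  # 1

# Pitfall 2: ast.parse defaults to mode="exec"
# To parse a single expression, use mode="eval"
ast.parse("1 + 2", mode="eval")  # OK
try:
    ast.parse("x = 1", mode="eval")  # SyntaxError! An assignment is not an expression
except SyntaxError:
    pass

# Pitfall 3: since Python 3.8 the parser produces ast.Constant for every literal;
# the older Num, Str, Bytes and NameConstant classes are deprecated.
# Code that must also run on Python 3.7 or earlier has to handle those node types separately

# Pitfall 4: ast.unparse is available only in Python 3.9+
# On 3.8 and earlier, write your own or use a third-party library (astor, codegen)
```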
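When to use
- Code analysis: static checks, code metrics, dependency analysis
- Code transformation: automatic instrumentation (logging, timing, permission checks), code obfuscation
- Code generation: generating Python code from a DSL, generating API clients from configuration
- ast.parse + compile + exec: building a custom code-loading and execution pipeline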
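The last bullet, ast.parse + compile + exec, can be combined into a small loader. This is a sketch under stated assumptions. `load_module_from_source` and `UppercaseStrings` are hypothetical names, and the toy transformer upper-cases every string literal, docstrings included. A production version would more likely plug into the import system through importlib.

```python
import ast
import types


def load_module_from_source(name: str, source: str, *transformers: ast.NodeTransformer) -> types.ModuleType:
    """Hypothetical helper: parse -> transform -> compile -> exec into a fresh module object."""
    tree = ast.parse(source, filename=f"<{name}>")
    for transformer in transformers:
        tree = transformer.visit(tree)
    ast.fix_missing_locations(tree)  # any new nodes need line numbers before compile()
    module = types.ModuleType(name)
    exec(compile(tree, f"<{name}>", "exec"), module.__dict__)
    return module


class UppercaseStrings(ast.NodeTransformer):
    """Toy transformer: upper-case every string literal."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            return ast.copy_location(ast.Constant(value=node.value.upper()), node)
        return node


mod = load_module_from_source(
    "greetings",
    "def hello(name):\n    return 'hello, ' + name\n",
    UppercaseStrings(),
)
print(mod.hello("Ada"))  # HELLO, Ada
```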
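Chapter summary: metaprogramming capabilities compared
| Capability | Java | Kotlin | Python |
|---|---|---|---|
| Attribute-access interception | None (needs ByteBuddy) | Property delegation (compile time) | Descriptor protocol (runtime) |
| Controlling class creation | Annotation processors (compile time) | Compiler plugins (compile time) | Metaclasses (runtime) |
| Dynamic attributes | None | None | __getattr__/__getattribute__ |
| Automatic subclass registration | None | sealed class (subclasses fixed at compile time) | __init_subclass__ |
| Dynamic code execution | ScriptEngine (isolated) | Kotlin scripting (extra dependencies) | exec/eval (native) |
| Dynamic type creation | ByteBuddy (third party) | Same JVM libraries (ByteBuddy) | type()/types (standard library) |
| Runtime introspection | Reflection (limited) | KReflection (heavyweight) | inspect (lightweight) |
| Code transformation | APT (compile time) | Compiler plugins (compile time) | ast (runtime) |
Key insight: Python's metaprogramming happens at runtime, is built into the language, and needs no extra dependencies. Most of Java/Kotlin's code-generating metaprogramming happens at compile time and needs toolchain support, which raises the barrier to entry. This is not a matter of "better" or "worse" but of design philosophy: Python trusts developers and exposes these capabilities, while Java/Kotlin guarantee safety through restrictions.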
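Practical advice:
- Prefer property and __init_subclass__; they cover 80% of metaprogramming needs
- Use descriptors for reusable attribute logic (validation, type checking, lazy computation)
- Treat metaclasses as a last resort; decorators can replace them in most cases
- Prefer __getattr__ over __getattribute__; the latter easily leads to infinite-recursion bugs
- Never exec/eval untrusted input; evaluate it with an AST whitelist instead
- The ast module is Python's most distinctive superpower: runtime code transformation, which Java/Kotlin cannot do without bytecode-level tooling such as Java agents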