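15. Getting Started with Python Regular Expressions: A Powerful Tool for Text Processing

Regular expressions are a powerful tool for working with text. In Python, the re module provides full regular expression support. This article introduces five core functions and their use cases to help you get started quickly.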

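I. Regular Expression Quick Start

A regular expression combines ordinary and special characters into a pattern that can be used for:

Text searching
Data validation
String replacement
Data extraction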

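Basic metacharacters:

. matches any single character (except a newline)
\d matches a digit
\w matches a letter, digit, or underscore
^ matches the start of the string
$ matches the end of the string
* matches the preceding item 0 or more times
+ matches the preceding item 1 or more times
? matches the preceding item 0 or 1 times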
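To make the list above concrete, here is a minimal sketch that exercises each metacharacter once. The sample strings and variable names are made up for illustration; re.findall() and re.search() are covered in more detail later in this article.

```python
import re

sample = "cat cot ct coat"

# "." stands for exactly one character, so "c.t" needs one character between c and t.
dot_matches = re.findall(r"c.t", sample)          # ['cat', 'cot']

# "?" makes the preceding item optional: zero or one "a" after "co".
optional_matches = re.findall(r"coa?t", sample)   # ['cot', 'coat']

# "\d+" is one or more digits; "\w+" is one or more word characters.
digits = re.findall(r"\d+", "Order 66, aisle 7")  # ['66', '7']
words = re.findall(r"\w+", "snake_case, 42!")     # ['snake_case', '42']

# "^" and "$" anchor the pattern to the start and end of the string.
starts_with_id = re.search(r"^ID", "ID-1001") is not None    # True
ends_with_digit = re.search(r"\d$", "ID-1001") is not None   # True

# "*" allows zero repetitions, so "ab*" also matches a bare "a".
star_matches = re.findall(r"ab*", "a ab abbb")    # ['a', 'ab', 'abbb']

print(dot_matches, optional_matches, digits, words, star_matches)
```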

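II. Core Functions of the Python re Module

1. re.match() - Match at the Start of the String

Purpose: tries to match the pattern at the very beginning of the string
Return value: a Match object on success, None on failure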

import re

pattern = r"Hello"
text = "Hello World"

result = re.match(pattern, text)
if result:
    print("Match found:", result.group())  # Output: Hello
else:
    print("No match")
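Because re.match() only looks at the start of the string, it is easy to get None when the pattern appears later in the text. The sketch below contrasts it with re.search() (described in the next section) and with re.fullmatch(), which requires the pattern to cover the whole string. The sample text is made up for illustration.

```python
import re

text = "Say Hello World"

# re.match() only tries position 0, so a pattern that occurs later does not match.
at_start = re.match(r"Hello", text)
print(at_start)  # None

# re.search() scans forward and finds the same pattern at index 4.
anywhere = re.search(r"Hello", text)
print(anywhere.start())  # 4

# re.fullmatch() requires the pattern to cover the entire string.
exact = re.fullmatch(r"Hello", "Hello") is not None          # True
partial = re.fullmatch(r"Hello", "Hello World") is not None  # False
print(exact, partial)
```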

2. re.search() - Search the Whole String

Purpose: scans the entire string and returns the first match (or None if nothing matches)

text = "Python version: 3.11.5"
pattern = r"\d+\.\d+\.\d+"

result = re.search(pattern, text)
print(result.group())  # Output: 3.11.5
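Calling .group() directly works here because the text is known to contain a version number, but re.search() returns None when nothing matches. A minimal sketch of a safer wrapper follows; find_version is a hypothetical helper name, not part of the re module. It also shows span(), which reports where the match was found.

```python
import re

def find_version(text):
    """Return the first x.y.z version string in text, or None if there is none."""
    result = re.search(r"\d+\.\d+\.\d+", text)
    if result is None:
        return None
    return result.group()

print(find_version("Python version: 3.11.5"))  # 3.11.5
print(find_version("no version here"))         # None

# A Match object also records where the match was found.
found = re.search(r"\d+\.\d+\.\d+", "Python version: 3.11.5")
print(found.span())  # (16, 22)
```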

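3. re.findall() - Find All Matches

Purpose: returns a list of all non-overlapping matches as strings (if the pattern contains capture groups, the list holds the groups instead of the whole match)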

text = "Emails: user@test.com, admin@demo.org"
pattern = r"\w+@\w+\.\w+"

emails = re.findall(pattern, text)
print(emails)  # ['user@test.com', 'admin@demo.org']
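The return type of re.findall() changes once the pattern contains capture groups, which often surprises newcomers. The sketch below reuses the same sample text to show the tuple form, plus re.finditer() for cases where match positions are needed as well.

```python
import re

text = "Emails: user@test.com, admin@demo.org"

# With two capture groups, findall() returns (user, domain) tuples
# instead of the full matched strings.
pairs = re.findall(r"(\w+)@(\w+\.\w+)", text)
print(pairs)  # [('user', 'test.com'), ('admin', 'demo.org')]

# finditer() yields Match objects, so the start position of each match is available.
positions = [(m.group(), m.start()) for m in re.finditer(r"\w+@\w+\.\w+", text)]
print(positions)  # [('user@test.com', 8), ('admin@demo.org', 23)]
```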

4. re.sub() - Replace Matches

Purpose: replaces every match of the pattern with the given replacement

text = "2023-08-15"
new_text = re.sub(r"-", "/", text)
print(new_text)  # 2023/08/15
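re.sub() can do more than swap one fixed string for another. As a rough sketch, the replacement may refer back to capture groups, the count argument limits how many replacements are made, and a function can compute each replacement. The double helper and the fruit sentence are made up for illustration.

```python
import re

text = "2023-08-15"

# Backreferences (\1, \2, \3) reuse captured groups in the replacement.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(reordered)  # 15/08/2023

# count limits the number of replacements.
first_only = re.sub(r"-", "/", text, count=1)
print(first_only)  # 2023/08-15

# The replacement can also be a function that receives each Match object.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```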

5. re.split() - Split by Pattern

Purpose: splits the string wherever the pattern matches

text = "Apple,Banana;Cherry|Date"
items = re.split(r"[,;|]", text)
print(items)  # ['Apple', 'Banana', 'Cherry', 'Date']
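Two variations on re.split() are worth knowing, sketched here with the same sample string: maxsplit stops after a given number of splits, and wrapping the pattern in a capture group keeps the delimiters in the result. The last line uses a made-up string to show splitting on commas with optional surrounding whitespace.

```python
import re

text = "Apple,Banana;Cherry|Date"

# maxsplit=1 splits only at the first delimiter.
first_split = re.split(r"[,;|]", text, maxsplit=1)
print(first_split)  # ['Apple', 'Banana;Cherry|Date']

# A capture group around the delimiter keeps the delimiters in the output.
with_delims = re.split(r"([,;|])", text)
print(with_delims)  # ['Apple', ',', 'Banana', ';', 'Cherry', '|', 'Date']

# Split on commas and discard any whitespace around them.
spaced = re.split(r"\s*,\s*", "a , b,c ,  d")
print(spaced)  # ['a', 'b', 'c', 'd']
```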

III. Advanced Techniques

1. Extracting with Groups

Use parentheses () to capture specific parts of a match

text = "Phone: 010-12345678"
pattern = r"(\d{3})-(\d{4,8})"

match = re.search(pattern, text)
if match:
    print("Area code:", match.group(1))  # 010
    print("Number:", match.group(2))  # 12345678
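Numbered groups become hard to follow as patterns grow. One common alternative is named groups with the (?P<name>...) syntax; the sketch below rewrites the phone example that way. The group names area and number are chosen here for illustration.

```python
import re

text = "Phone: 010-12345678"
pattern = r"(?P<area>\d{3})-(?P<number>\d{4,8})"

match = re.search(pattern, text)
if match:
    area = match.group("area")    # access a group by name
    parts = match.groupdict()     # all named groups as a dict
    both = match.groups()         # all groups as a tuple
    print(area)   # 010
    print(parts)  # {'area': '010', 'number': '12345678'}
    print(both)   # ('010', '12345678')
```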

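2. Precompiling a Regular Expression

Compile a pattern once with re.compile() and reuse the resulting object when the same pattern is applied many times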

pattern = re.compile(r"\b[A-Z]+\b")
text = "THIS is a TEST"
matches = pattern.findall(text)
print(matches)  # ['THIS', 'TEST']
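A compiled pattern pays off mainly when it is reused. As a small sketch, the same compiled object can be applied to many strings and exposes the same operations (findall, search, sub) as methods. The constant name UPPER_WORD and the sample lines are made up for illustration.

```python
import re

# Compile once, then reuse the object for every line.
UPPER_WORD = re.compile(r"\b[A-Z]+\b")

lines = ["THIS is a TEST", "no capitals here", "NASA and ESA"]
results = [UPPER_WORD.findall(line) for line in lines]
print(results)  # [['THIS', 'TEST'], [], ['NASA', 'ESA']]

# The compiled object offers the same operations as the module-level functions.
has_upper = UPPER_WORD.search("say OK") is not None
replaced = UPPER_WORD.sub("*", "GO to the USA")
print(has_upper)  # True
print(replaced)   # * to the *
```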

3. Non-Greedy Matching

Append ? to a quantifier (for example *? or +?) to match as few characters as possible

text = "<div>content</div><p>text</p>"
matches = re.findall(r"<.*?>", text)
print(matches)  # ['<div>', '</div>', '<p>', '</p>']
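The effect of the ? is easiest to see next to the greedy version of the same pattern. The sketch below runs both on the same text, and adds a negated character class as one alternative way to get the same result without a lazy quantifier.

```python
import re

text = "<div>content</div><p>text</p>"

# Greedy: ".*" grabs as much as possible, so everything from the first "<"
# to the last ">" comes back as a single match.
greedy = re.findall(r"<.*>", text)
print(greedy)  # ['<div>content</div><p>text</p>']

# Non-greedy: ".*?" stops at the first ">" it can.
lazy = re.findall(r"<.*?>", text)
print(lazy)  # ['<div>', '</div>', '<p>', '</p>']

# Alternative: match anything that is not ">" between the brackets.
negated = re.findall(r"<[^>]*>", text)
print(negated == lazy)  # True
```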

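IV. Things to Watch Out For

Raw strings: always write patterns with the r"" prefix so that backslashes reach the regex engine unchanged
Matching scope:
- match() matches only at the start of the string
- search() returns the first match anywhere in the string
- findall() returns all matches
Special characters: these characters must be escaped to match them literally: \ . ^ $ * + ? { } [ ] | ( )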
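Two of these points are easy to demonstrate. The sketch below shows what goes wrong without the r prefix, and uses re.escape() to escape special characters automatically when the text to match comes from a variable. The sample strings are made up for illustration.

```python
import re

# Without the r prefix, "\b" is a single backspace character, not a word boundary.
print(len("\b"), len(r"\b"))  # 1 2

plain = re.findall("\bcat\b", "cat concat")  # looks for backspace + cat + backspace
raw = re.findall(r"\bcat\b", "cat concat")   # looks for the whole word "cat"
print(plain)  # []
print(raw)    # ['cat']

# "+" is a quantifier, so the unescaped pattern does not match the literal text.
literal = "1+1=2"
unescaped = re.search(literal, "so 1+1=2")
escaped = re.search(re.escape(literal), "so 1+1=2")
print(unescaped)        # None
print(escaped.group())  # 1+1=2
```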

V. Practical Use Cases

Data cleaning

# Remove everything except digits and the decimal point
dirty_data = "Price: $12,345.67"
clean = re.sub(r"[^\d.]", "", dirty_data)
print(clean)  # 12345.67

Log analysis

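log = "ERROR [2023-08-15 14:23:45] Connection timeout"
pattern = r"(ERROR|WARN) \[(.*?)\] (.*)"
match = re.search(pattern, log)
if match:
    level, timestamp, message = match.groups()  # one value per capture group
    print(level, timestamp, message)  # ERROR 2023-08-15 14:23:45 Connection timeout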

Form validation

def validate_email(email):
    pattern = r"^[\w\.-]+@[\w\.-]+\.\w+$"
    return re.match(pattern, email) is not None
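One subtlety with this kind of validator: $ also matches just before a trailing newline, so a re.match() check with ^...$ accepts input that ends in "\n". A possible stricter variant, sketched below, uses re.fullmatch() instead. The names EMAIL_PATTERN and validate_email_strict are hypothetical, and the pattern itself remains a simple approximation rather than a complete email grammar.

```python
import re

# Same character classes as the pattern above; anchors are not needed with fullmatch().
EMAIL_PATTERN = r"[\w\.-]+@[\w\.-]+\.\w+"

def validate_email_strict(email):
    """Return True only if the whole string looks like an email address."""
    return re.fullmatch(EMAIL_PATTERN, email) is not None

# The ^...$ form with re.match() lets a trailing newline through.
loose = re.match(r"^[\w\.-]+@[\w\.-]+\.\w+$", "user@test.com\n") is not None
print(loose)  # True

print(validate_email_strict("user@test.com"))    # True
print(validate_email_strict("user@test.com\n"))  # False
print(validate_email_strict("not-an-email"))     # False
```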

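Summary

Regular expressions are an essential skill for Python developers. With these functions in hand, you can:

✅ Process text data quickly

✅ Validate complex formats

✅ Extract key information efficiently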

Next steps:

Learn more metacharacters (such as \s and \b)
Get to know the regex flags (such as re.IGNORECASE)
Practice matching more complex patterns

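Official documentation: the re module documentation

This guide has covered the core ways to use regular expressions in Python. Now try applying them to the text processing problems you meet in your own work!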