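Python Programming in Practice - Practical Python Tools and Libraries - Regular Expression Matching (the re module)

Regular expressions (regex for short) are a powerful tool for pattern matching and text processing on strings. In Python, the standard-library module re provides matching, searching, substitution, and related operations.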

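I. Introducing the re module

Import it with:

import re

Common uses:

Matching: check whether a string fits a pattern
Extraction: pull specific information out of text
Substitution: replace text according to a rule
Splitting: split a string wherever a pattern matches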

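II. Basic matching functions

1. `re.match()`: match at the start of the string

import re

result = re.match(r"Hello", "Hello Python")
if result:
    print("Match found:", result.group())

Output:

Match found: Hello

Note: re.match() only tries the pattern at the very beginning of the string; it returns None if the pattern occurs only later on.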
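As a small sketch of what "start of the string" means in practice, the snippet below contrasts `re.match()` with `re.fullmatch()`. The sample strings and variable names are made up for illustration.

```python
import re

# re.match() anchors at position 0, so a pattern that occurs later is not found.
at_start = re.match(r"Hello", "Hello Python")
not_at_start = re.match(r"Python", "Hello Python")

print(at_start is not None)   # True
print(not_at_start)           # None

# re.match() does not require the pattern to consume the whole string;
# re.fullmatch() does.
prefix_only = re.match(r"\d+", "123abc")
whole_string = re.fullmatch(r"\d+", "123abc")

print(prefix_only.group())    # 123
print(whole_string)           # None
```

When the goal is to validate an entire input (for example, "is this string all digits?"), `re.fullmatch()` is usually the closer fit.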

2. `re.search()`: scan the whole string

import re

result = re.search(r"Python", "Hello Python World")
print(result.group())

Output:

Python

Note: re.search() scans the string from left to right and returns the first match, wherever it occurs.
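Because `re.search()` returns None when nothing matches, calling `.group()` on the result directly can raise an AttributeError. One possible way to guard against that is sketched below; `first_number` is a hypothetical helper name, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    found = re.search(r"\d+", text)
    if found is None:          # re.search() returns None when nothing matches
        return None
    return found.group()

print(first_number("order 42 shipped"))   # 42
print(first_number("no digits here"))     # None

# A match object also records where the match was found.
m = re.search(r"Python", "Hello Python World")
print(m.start(), m.end(), m.span())       # 6 12 (6, 12)
```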

3. `re.findall()`: return all matches as a list

import re

text = "Phone: 12345, Fax: 67890"
numbers = re.findall(r"\d+", text)
print(numbers)

Output:

['12345', '67890']
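What `re.findall()` returns depends on whether the pattern contains capturing groups, which is easy to trip over. The following sketch, using an illustrative sample string, shows the three common cases and the related `re.finditer()`.

```python
import re

text = "Phone: 12345, Fax: 67890"

# With one capturing group, findall() returns the group contents, not the full match.
labels = re.findall(r"(\w+): \d+", text)
print(labels)        # ['Phone', 'Fax']

# With several groups, each result is a tuple of the groups.
pairs = re.findall(r"(\w+): (\d+)", text)
print(pairs)         # [('Phone', '12345'), ('Fax', '67890')]

# finditer() yields match objects, which keep positions as well as text.
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(spans)         # [('12345', (7, 12)), ('67890', (19, 24))]
```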

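4. `re.sub()`: replace matched text

import re

text = "I like Java, and I also like JavaScript"
new_text = re.sub(r"Java", "Python", text)
print(new_text)

Output:

I like Python, and I also like PythonScript

Note: the pattern matches substrings, so the "Java" inside "JavaScript" is replaced as well.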
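A few variations on `re.sub()` are sketched below: restricting the match with a word boundary, limiting the number of replacements, reusing captured groups, and passing a function as the replacement. The sample strings are illustrative only.

```python
import re

text = "I like Java, and I also like JavaScript"

# \b marks a word boundary, so "Java" inside "JavaScript" is left alone.
whole_word = re.sub(r"\bJava\b", "Python", text)
print(whole_word)    # I like Python, and I also like JavaScript

# count limits how many replacements are made.
first_only = re.sub(r"Java", "Python", text, count=1)
print(first_only)    # I like Python, and I also like JavaScript

# \1 and \2 in the replacement refer to captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2025-11")
print(swapped)       # 11/2025

# The replacement can also be a function that receives the match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)       # 6 apples, 20 pears
```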

5. `re.split()`: split a string on a pattern

import re

text = "apple,banana;orange|grape"
fruits = re.split(r"[,;|]", text)
print(fruits)

Output:

['apple', 'banana', 'orange', 'grape']
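`re.split()` has a couple of options that are easy to miss. The sketch below, with made-up input, shows `maxsplit`, the effect of a capturing group in the pattern, and one way to absorb whitespace around delimiters.

```python
import re

text = "apple,banana;orange|grape"

# maxsplit stops after the given number of splits.
limited = re.split(r"[,;|]", text, maxsplit=2)
print(limited)       # ['apple', 'banana', 'orange|grape']

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r"([,;|])", text)
print(with_delims)   # ['apple', ',', 'banana', ';', 'orange', '|', 'grape']

# Allowing optional whitespace around the delimiter avoids stray spaces.
trimmed = re.split(r"\s*[,;|]\s*", "apple , banana;  orange")
print(trimmed)       # ['apple', 'banana', 'orange']
```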

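III. Common regular expression symbols

Symbol	Meaning	Example
`.`	Any single character except a newline	`a.c` → matches `abc`, `a9c`
`\d`	A digit: `[0-9]`, plus other Unicode decimal digits when the pattern is a str	`\d+` → matches one or more digits
`\w`	A word character (letter, digit, or underscore)	`\w+` → matches a word
`\s`	A whitespace character (space, tab, newline, etc.)	`\s+`
`^`	Start of the string	`^Hello`
`$`	End of the string	`world$`
`[]`	Character set	`[abc]` → matches a, b, or c
`[^]`	Negated character set	`[^0-9]` → matches any non-digit
`*`	Zero or more repetitions	`a*`
`+`	One or more repetitions	`a+`
`?`	Zero or one repetition	`a?`
`{m,n}`	Between m and n repetitions	`\d{3,5}` → matches 3 to 5 digits
`()`	Grouping	`(ab)+`
`|`	Alternation (or)	`cat|dog`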
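To make the table concrete, here is a minimal self-checking sketch that runs several of the symbols against sample text. The sample strings and the `checks` list are invented for illustration; `(?:ab)+` uses a non-capturing group (not listed in the table) so that `re.findall()` returns the whole match rather than the group contents.

```python
import re

# Each tuple is (pattern, sample text, expected findall result).
checks = [
    (r"a.c",     "abc a9c ac",      ["abc", "a9c"]),
    (r"\d{3,5}", "12 1234 1234567", ["1234", "12345"]),
    (r"[^0-9]+", "ab12cd",          ["ab", "cd"]),
    (r"(?:ab)+", "ababab ab",       ["ababab", "ab"]),
    (r"cat|dog", "cat, dog, cow",   ["cat", "dog"]),
    (r"^Hello",  "Hello, Hello",    ["Hello"]),   # only the leading one
    (r"world$",  "world, world",    ["world"]),   # only the trailing one
]

for pattern, sample, expected in checks:
    result = re.findall(pattern, sample)
    print(f"{pattern!r:12} {result}")
    assert result == expected
```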

IV. Groups and extraction

1. Extracting email addresses

import re

text = "Contact me at email1@test.com or email2@sample.org"
emails = re.findall(r"[\w.-]+@[\w.-]+\.\w+", text)
print(emails)

Output:

['email1@test.com', 'email2@sample.org']

2. Capturing groups with parentheses

import re

text = "Price: ¥99.5"
match = re.search(r"¥(\d+\.\d+)", text)
print("Extracted price:", match.group(1))

Output:

Extracted price: 99.5
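The different ways of reading groups from a match object can be sketched as follows. The three-group pattern is a variation introduced here for illustration, not the pattern used above.

```python
import re

m = re.search(r"([$¥€])(\d+)\.(\d+)", "Price: ¥99.5")

print(m.group(0))     # ¥99.5  (the whole match)
print(m.group(1))     # ¥      (first group)
print(m.group(2, 3))  # ('99', '5')
print(m.groups())     # ('¥', '99', '5')

# A pattern that may not match should be checked before calling group().
missing = re.search(r"¥(\d+\.\d+)", "Price: unknown")
price = float(missing.group(1)) if missing else None
print(price)          # None
```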

3. Named groups

import re

text = "Date: 2025-11-11"
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", text)
print(match.groupdict())

Output:

{'year': '2025', 'month': '11', 'day': '11'}
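Named groups can be consumed in a few other ways besides `groupdict()`. The sketch below assumes the same date format; `DATE_RE` and `parsed` are illustrative names.

```python
import re
from datetime import date

DATE_RE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

m = DATE_RE.search("Date: 2025-11-11")

# Named groups can be read by name via group() or by indexing the match.
print(m.group("year"), m["month"], m["day"])   # 2025 11 11

# groupdict() plugs directly into keyword arguments after conversion to int.
parsed = date(**{name: int(value) for name, value in m.groupdict().items()})
print(parsed)                                   # 2025-11-11

# Named groups can be referenced in a replacement with \g<name>.
reordered = DATE_RE.sub(r"\g<day>/\g<month>/\g<year>", "Date: 2025-11-11")
print(reordered)                                # Date: 11/11/2025
```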

V. Worked example: extracting links and titles from a web page

import re

html = """
<a href="https://example.com/page1">Page One</a>
<a href="https://example.com/page2">Page Two</a>
"""

# Group 1 captures the URL, group 2 the link text.
pattern = r'<a href="(https?://[^"]+)">([^<]+)</a>'
matches = re.findall(pattern, html)

for url, title in matches:
    print(f"Title: {title}, link: {url}")

Output:

Title: Page One, link: https://example.com/page1
Title: Page Two, link: https://example.com/page2
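The pattern above expects `href` to be the only attribute and to use double quotes. A slightly more tolerant variant might look like the sketch below; `LINK_RE` and the sample markup are hypothetical, and for arbitrary real-world HTML a proper parser is generally a safer choice than a regular expression.

```python
import re

html = """
<a class="nav" href="https://example.com/page1">Page One</a>
<A HREF='https://example.com/page2' target="_blank">Page Two</A>
"""

# Hypothetical, more tolerant variant of the link pattern:
#   [^>]*?   skips other attributes before href
#   (["'])   accepts either quote style; \1 requires the same closing quote
LINK_RE = re.compile(
    r"""<a\b[^>]*?\bhref=(["'])(?P<url>https?://.*?)\1[^>]*>(?P<title>[^<]+)</a>""",
    re.IGNORECASE,
)

links = [(m["title"], m["url"]) for m in LINK_RE.finditer(html)]
for title, url in links:
    print(f"Title: {title}, link: {url}")
```

Using `re.finditer()` with named groups avoids having to remember the position of each group in the tuples that `re.findall()` would return.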

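VI. Tips for better performance

Use raw strings: they keep backslash escapes readable, as in r"\d+".
Compile the regular expression when a pattern is reused many times. This avoids looking the pattern up on every call (re also caches recently used patterns internally, so the gain is usually modest):
```python
import re

pattern = re.compile(r"\d+")
print(pattern.findall("123abc456"))
```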

Use lazy (shortest) matching where needed: *?, +?, {m,n}?

Example:

import re

text = "<p>Python</p><p>Regex</p>"
print(re.findall(r"<p>.*?</p>", text))

Output:

['<p>Python</p>', '<p>Regex</p>']
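The difference between greedy and lazy quantifiers is easiest to see side by side. The sketch below reuses the same sample text; the `re.DOTALL` part is an added illustration of a related pitfall, namely that `.` does not match a newline by default.

```python
import re

text = "<p>Python</p><p>Regex</p>"

# Greedy: .* runs to the last </p>, so both paragraphs come back as one match.
greedy = re.findall(r"<p>.*</p>", text)
print(greedy)         # ['<p>Python</p><p>Regex</p>']

# Lazy: .*? stops at the first </p> it can reach.
lazy = re.findall(r"<p>(.*?)</p>", text)
print(lazy)           # ['Python', 'Regex']

# By default . does not match a newline; re.DOTALL changes that.
multiline = "<p>line one\nline two</p>"
without_flag = re.findall(r"<p>(.*?)</p>", multiline)
with_flag = re.findall(r"<p>(.*?)</p>", multiline, re.DOTALL)
print(without_flag)   # []
print(with_flag)      # ['line one\nline two']
```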

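VII. Summary

Purpose	Function	Description
Match at start	`re.match()`	Matches from the beginning of the string
Search	`re.search()`	Finds the first match anywhere in the string
Find all	`re.findall()`	Returns a list of all matches
Substitute	`re.sub()`	Replaces matched text
Split	`re.split()`	Splits the string on a pattern
Compile	`re.compile()`	Precompiles a regular expression for reuse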
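The functions in the table are also available as methods on a compiled pattern object, which may be convenient when one pattern is used in several ways. A brief sketch, where `WORD_RE` and the sample text are illustrative:

```python
import re

WORD_RE = re.compile(r"[a-z]+", re.IGNORECASE)   # hypothetical example pattern
SPACE_RE = re.compile(r"\s+")
text = "Hello regex world"

first = WORD_RE.match(text).group()
later = WORD_RE.search(text, 5).group()   # second argument: start searching at index 5
words = WORD_RE.findall(text)
masked = WORD_RE.sub("#", text)
parts = SPACE_RE.split(text)

print(first)    # Hello
print(later)    # regex
print(words)    # ['Hello', 'regex', 'world']
print(masked)   # # # #
print(parts)    # ['Hello', 'regex', 'world']
```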
