[Python Basics: Regular Expression Operations with the re Module]

1. Matching strings

(1) Matching with the match() method

re.match(pattern, string, flags=0)

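Matching starts at the beginning of the string. If the start of the string fits the pattern, a match object is returned; otherwise None is returned.

flags: optional modifiers. For example, re.I (short for re.IGNORECASE) makes the match case-insensitive.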

# match() starts matching at the beginning of the string
import re
result = re.match(r'Hello', 'Hello world!')
print(result.group())

Output: Hello

Case-insensitive matching:

import re
result = re.match(r'Hello', 'hello world!', re.I)
print(result.group())

Output: hello
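Both examples above call group() directly, which is only safe because the pattern is known to match. A minimal sketch of what happens when the pattern occurs later in the string, and of a None check before using the result (the variable names are illustrative only):

```python
import re

text = 'Hello world!'

at_start = re.match(r'Hello', text)   # pattern fits at position 0
later_on = re.match(r'world', text)   # 'world' is present, but not at the start

print(at_start.group())  # Hello
print(later_on)          # None

# Check for None before calling group() to avoid an AttributeError
greeting = later_on.group() if later_on else 'no match'
print(greeting)          # no match
```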

Exercise: match a China Mobile phone number

import re

pattern = r'(13[4-9]\d{8})$|(15[01289]\d{8})$'
mobile = '13634232323'
match = re.match(pattern, mobile)
if match is None:
    print(mobile, 'is not a valid phone number!')
else:
    print(mobile, 'is a valid phone number!')

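About the pattern:

|: the alternation (logical OR) operator. It splits the regular expression into two parts, and a string matches if it fits either the left-hand or the right-hand pattern.

(13[4-9]\d{8})$:

13: matches the literal string "13", China Mobile's 13x number range.

[4-9]: matches the third digit, from 4 to 9, which covers the prefixes 134 to 139.

\d{8}: matches the remaining 8 digits, so the whole number is 11 digits long.

$: matches the end of the string, so the number cannot be followed by further digits.

(15[01289]\d{8})$:

15: matches the literal string "15", China Mobile's 15x number range.

[01289]: matches the third digit, which may be 0, 1, 2, 8 or 9, that is, the prefixes 150, 151, 152, 158 and 159.

\d{8} and $: same meaning as above, namely 8 digits and the end of the string.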
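To see each part of the pattern at work, the sketch below runs it against a few sample numbers. The helper name is_china_mobile and the sample numbers are made up for illustration; the pattern is the one from the exercise. The last lines show re.fullmatch as a possible alternative that makes the trailing $ unnecessary.

```python
import re

pattern = r'(13[4-9]\d{8})$|(15[01289]\d{8})$'

def is_china_mobile(number):
    """Hypothetical helper: True if the number fits the exercise pattern."""
    return re.match(pattern, number) is not None

samples = [
    '13634232323',   # 136 prefix, 11 digits
    '15012345678',   # 150 prefix, 11 digits
    '13312345678',   # third digit 3 is outside [4-9]
    '136342323231',  # 12 digits, so $ fails after the 11th
]
results = {number: is_china_mobile(number) for number in samples}
for number, ok in results.items():
    print(number, ok)

# Alternative: fullmatch() requires the whole string to fit, so no $ is needed
strict = re.fullmatch(r'13[4-9]\d{8}|15[01289]\d{8}', '13634232323')
print(strict is not None)
```

One design note: $ also matches just before a trailing newline, so re.match with $ accepts '13634232323\n', while re.fullmatch does not.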

(2) Matching with the search() method

re.search(pattern, string, flags=0)

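Scans the string for the first position where the pattern matches. If a match is found, a match object is returned; otherwise None is returned. In other words, the match may start at any position in the string.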

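# Check whether the text contains any dangerous words
import re
pattern = r'hacker|packet capture|eavesdropping'
about = 'I am a programmer. I like reading computer books and want to look into eavesdropping'
match = re.search(pattern, about)
if match is None:
    print(about, '@ no dangerous words')
else:
    print(about, '@ dangerous word found:', match.group())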
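The difference from match() is easiest to see on the same input. A small sketch, using an illustrative string, that also reads the position of the hit from the match object:

```python
import re

text = 'Hello world!'

# match() only looks at the start of the string
print(re.match(r'world', text))     # None

# search() scans forward until the pattern fits somewhere
found = re.search(r'world', text)
print(found.group(), found.span())  # world (6, 11)
```

span() returns the start and end indexes of the match, so text[6:11] is the matched substring.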

(3) Matching with the findall() method

re.findall(pattern, string, flags=0)

Finds all non-overlapping substrings that match the pattern and returns them as a list.

import re

pattern = r'hello'
about = 'hello word Hello python hello java'
match = re.findall(pattern, about, re.I)
print(match)

Output: ['hello', 'Hello', 'hello']

If nothing matches, an empty list is returned: []
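Two details of findall() are worth a quick sketch: what "non-overlapping" means in practice, and how the return value changes when the pattern contains capture groups. The sample strings are made up for illustration.

```python
import re

# Non-overlapping: each character is used by at most one match
pairs = re.findall(r'aa', 'aaaaa')
print(pairs)     # ['aa', 'aa']

# With two or more capture groups, each result is a tuple of the groups
settings = re.findall(r'(\w+)=(\d+)', 'width=80, height=24')
print(settings)  # [('width', '80'), ('height', '24')]

# No match gives an empty list, never None
print(re.findall(r'xyz', 'hello'))  # []
```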

2. Replacing strings

import re

pattern = r'1[34578]\d{9}'
string = 'Winning number: 84978987 Contact phone: 13612121212'
result = re.sub(pattern, '1**********', string)
print(result)

Output: Winning number: 84978987 Contact phone: 1**********
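The call used here is re.sub(pattern, repl, string, count=0, flags=0), which returns a new string with the matches replaced. As a hedged sketch of two common variations, the example below keeps part of the phone number by using group references in the replacement, and limits the number of replacements with count. The masking format is an illustrative choice, not taken from the example above.

```python
import re

# Keep the first 3 and last 4 digits, mask the middle 4 via \1 and \2
masked = re.sub(r'(1[34578]\d)\d{4}(\d{4})', r'\1****\2', 'Contact: 13612121212')
print(masked)     # Contact: 136****1212

# count limits how many matches are replaced (0, the default, means all)
first_two = re.sub(r'\d', '#', 'a1b2c3', count=2)
print(first_two)  # a#b#c3
```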

3. Splitting strings

import re
pattern = r'[?&]'  # character class: split on '?' or '&'
url = 'http://www.baidu.com/login.jsp?username="mr"&pwd="123456'
result = re.split(pattern, url)
print(result)
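re.split(pattern, string, maxsplit=0, flags=0) cuts the string at every match and returns the pieces as a list. The sketch below uses a simplified, hypothetical URL to show the result, the effect of maxsplit, and why spaces and | inside a character class matter: there they are literal characters, not separators or alternation.

```python
import re

url = 'http://www.example.com/login.jsp?username=mr&pwd=123456'  # hypothetical URL

parts = re.split(r'[?&]', url)
print(parts)
# ['http://www.example.com/login.jsp', 'username=mr', 'pwd=123456']

# maxsplit=1 stops after the first separator
head_tail = re.split(r'[?&]', url, maxsplit=1)
print(head_tail)
# ['http://www.example.com/login.jsp', 'username=mr&pwd=123456']

# Inside [...], a space and | are ordinary characters, so they split too
loose = re.split(r'[? | &]', 'a b|c?d')
print(loose)  # ['a', 'b', 'c', 'd']
```

For real query strings, the standard library's urllib.parse module is usually a more robust choice than a hand-written pattern.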