Advanced Python: Regular Expressions

[1. Basic Matching](#1. Basic Matching)

[2. Metacharacter Matching](#2. Metacharacter Matching)

1. Basic Matching

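A regular expression (regex) is a single string that describes a pattern which other strings may or may not match. It is commonly used to search for, and replace, text that fits a given pattern.

Put simply, a regular expression defines a rule as a string, and that rule is then used to check whether other strings match.

For example, to check whether a string is a well-formed email address, you only need to write a suitable pattern. The pattern (^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$) checks whether a string has the common email shape. Doing the same check with if/else logic alone would be very awkward.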
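As a minimal sketch of the rule above, the pattern can be compiled once and wrapped in a helper. The name `is_email` and the sample addresses are made up for illustration, and the outer parentheses are dropped because no group is needed here. The pattern covers only common address shapes, not everything the email standards allow.

```python
import re

# Pattern quoted in the text, compiled once for reuse.
EMAIL_RE = re.compile(r'^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$')

def is_email(text):
    """Hypothetical helper: True if text has the common user@domain.tld shape."""
    return EMAIL_RE.match(text) is not None

print(is_email('alice.smith@example.com'))  # True
print(is_email('alice@example'))            # False: the domain has no dot
print(is_email('alice example.com'))        # False: there is no @
```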

In Python, regular expressions are provided by the re module. Three basic functions cover most matching tasks: match, search, and findall.

```python
# Import the re module first
import re

s = 'hello Python ,you should study hard'

res = re.match('hello', s)
print(res)  # <re.Match object; span=(0, 5), match='hello'>
# span() returns the (start, end) indices of the match
print(res.span())  # (0, 5)
# group() returns the matched text
print(res.group())  # hello
```

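```python
import re

s = '1hello Python ,you should study hard'

res = re.match('hello', s)
print(res)  # None
# The fact that s contains 'hello' is not enough.
# match only tries the pattern at the start of s;
# if the start does not match, it does not look any further.
```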
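Because `re.match` returns `None` when the start of the string does not match, calling `.group()` directly on the result would raise `AttributeError`. A small sketch of the usual guard follows; the helper name `describe_match` is hypothetical.

```python
import re

def describe_match(pattern, text):
    """Return the matched text, or a note that nothing matched at the start."""
    res = re.match(pattern, text)
    if res is None:
        return 'no match at start'
    return res.group()

print(describe_match('hello', 'hello Python'))   # hello
print(describe_match('hello', '1hello Python'))  # no match at start
```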

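```python
import re

s = 'hello Python ,you can say hello'

# search scans the whole string and returns the first match only
res1 = re.search('hello', s)
print(res1)  # <re.Match object; span=(0, 5), match='hello'>

# If nothing matches anywhere in the string, search returns None
res2 = re.search('tell', s)
print(res2)  # None
```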

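```python
import re

s = 'hello Python ,you can say hello'

# findall returns every non-overlapping match as a list
res1 = re.findall('hello', s)
print(res1)  # ['hello', 'hello']

# If nothing matches, the list is empty
res2 = re.findall('tell', s)
print(res2)  # []
```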
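To see the three functions side by side, here is a short sketch that runs all of them against one string. The sample sentence is invented so that 'hello' does not appear at position 0.

```python
import re

s = 'say hello, then hello again'

# match: tried only at position 0, so 'hello' is not found
print(re.match('hello', s))          # None
# search: scans forward and stops at the first occurrence
print(re.search('hello', s).span())  # (4, 9)
# findall: collects every non-overlapping occurrence as a list of strings
print(re.findall('hello', s))        # ['hello', 'hello']
```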

2. Metacharacter Matching

The real power of regular expressions lies in metacharacter matching rules.
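A few commonly used metacharacters, shown as a rough sketch rather than a full reference. The sample string is invented for illustration.

```python
import re

s = 'Order 42: 3 apples, 15 pears'

print(re.findall(r'\d', s))     # single digits: ['4', '2', '3', '1', '5']
print(re.findall(r'\d+', s))    # runs of one or more digits: ['42', '3', '15']
print(re.findall(r'[A-Z]', s))  # uppercase letters: ['O']
print(re.findall(r'^\w+', s))   # word characters at the start: ['Order']
print(re.findall(r'\w+$', s))   # word characters at the end: ['pears']
print(len(re.findall(r'\s', s)))  # number of whitespace characters: 5
```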

```python
import re

s = 'hello Python 123456@163.com'

# The r prefix marks a raw string: Python does not process
# backslash escapes in it, so \d reaches the regex engine unchanged.
res1 = re.findall(r'\d', s)
print(res1)  # ['1', '2', '3', '4', '5', '6', '1', '6', '3']

# [a-zA-Z] matches any single ASCII letter
res2 = re.findall(r'[a-zA-Z]', s)
print(res2)
# ['h', 'e', 'l', 'l', 'o', 'P', 'y', 't', 'h', 'o', 'n', 'c', 'o', 'm']
```
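The effect of the raw-string prefix is easiest to see by comparing lengths. The sketch below also matches a literal backslash, which is where raw strings help most; the variable `path` is just sample data.

```python
import re

# In a normal literal, '\n' is one newline character;
# in a raw literal, r'\n' is two characters: a backslash and 'n'.
print(len('\n'))   # 1
print(len(r'\n'))  # 2

# To match one literal backslash, the regex engine needs \\ ,
# written r'\\' as a raw string or '\\\\' as a normal string.
path = 'C:\\temp'                # the text C:\temp (one backslash)
print(re.findall(r'\\', path))   # ['\\'] (one backslash, shown escaped by repr)
print(r'\\' == '\\\\')           # True
```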

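```python
import re

s = "c5d252dD"  # renamed from str to avoid shadowing the built-in type
# Rule: letters and digits only, length between 6 and 10
pattern = r'^[0-9a-zA-Z]{6,10}$'
print(re.findall(pattern, s))  # ['c5d252dD']
# A non-empty result means the string satisfies the rule
```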
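```python
import re

s = "03254265"  # renamed from str to avoid shadowing the built-in type
# Rule: digits only, length 5 to 11, first digit not 0
# [1-9] means the first character is a digit from 1 to 9, so it cannot be 0
# [0-9] means every remaining character must be a digit
# {4,10} is the remaining length: one character is already consumed
# by [1-9], so 4 to 10 more give a total of 5 to 11
# ^ anchors the match at the start of the string, $ at the end
pattern = r'^[1-9][0-9]{4,10}$'
print(re.findall(pattern, s))  # []
# An empty result means the string does not satisfy the rule (it starts with 0)
```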
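Inside a character class, a range needs a hyphen: `[1-9]` covers the digits 1 through 9, whereas `[1,9]` is just the three characters `1`, `,` and `9`. A quick check of the difference, with sample inputs invented for illustration:

```python
import re

range_class = r'^[1-9][0-9]{4,10}$'  # hyphens: real digit ranges
comma_class = r'^[1,9][0,9]{4,10}$'  # commas: only '1' ',' '9', then only '0' ',' '9'

for candidate in ['254265', '03254265', '1,0,9']:
    print(candidate,
          bool(re.match(range_class, candidate)),
          bool(re.match(comma_class, candidate)))
# 254265 True False
# 03254265 False False
# 1,0,9 False True
```

The last line shows the risk: the comma version accepts a string that is not a number at all.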

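```python
import re

# Match an email address, allowing only qq, 163, and gmail domains
# [\w-] matches one word character (letter, digit, underscore) or a hyphen
# + means the preceding item occurs one or more times
# (\.[\w-]+)* matches zero or more segments made of a dot followed by
#   one or more word characters or hyphens
# @ separates the user name from the domain
# (qq|163|gmail) matches exactly one of "qq", "163", or "gmail"
r = r'(^[\w-]+(\.[\w-]+)*@(qq|163|gmail)(\.[\w-]+)+$)'
s = 'a.b.c.d.e.f.g@qq.com.a.z.c.d.e'
# When the pattern contains groups (), findall returns the text captured
# by each group instead of the whole match.
# Wrapping the entire pattern in an outer () makes the whole match
# the first item of the result. A repeated group keeps only its last repetition.
print(re.findall(r, s))
# [('a.b.c.d.e.f.g@qq.com.a.z.c.d.e', '.g', 'qq', '.e')]
```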
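If only the full match is wanted, one option is to use non-capturing groups `(?:...)`, so that `findall` has no groups to report and returns the whole match as a plain string. This is a sketch of that variant, not the original author's pattern; the second sample address is invented.

```python
import re

s = 'a.b.c.d.e.f.g@qq.com.a.z.c.d.e'

# (?:...) groups without capturing, so findall returns the whole match
pattern = r'^[\w-]+(?:\.[\w-]+)*@(?:qq|163|gmail)(?:\.[\w-]+)+$'
print(re.findall(pattern, s))  # ['a.b.c.d.e.f.g@qq.com.a.z.c.d.e']

# For validating a single string, a match object works as well
m = re.match(pattern, 'someone@hotmail.com')
print(m)  # None, because hotmail is not in the allowed list
```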