第7章模式匹配与正则表达式

[1. 不用正则表达式来查找文本模式](#1. 不用正则表达式来查找文本模式)
[2. 用正则表达式来查找文本模式](#2. 用正则表达式来查找文本模式)
- [2.1 创建正则表达式（Regex）对象](#2.1 创建正则表达式（Regex）对象)
- [2.2 匹配Regex对象](#2.2 匹配Regex对象)
[3. 用正则表达式匹配更多模式](#3. 用正则表达式匹配更多模式)
- [3.1 利用括号分组](#3.1 利用括号分组)
- [3.2 用管道匹配多个分组](#3.2 用管道匹配多个分组)
- [3.3 用问号实现可选匹配](#3.3 用问号实现可选匹配)
- [3.4 用星号匹配零次或多次](#3.4 用星号匹配零次或多次)
- [3.5 用加号匹配一次或多次](#3.5 用加号匹配一次或多次)
- [3.6 用花括号匹配特定次数](#3.6 用花括号匹配特定次数)
[4. 贪心和非贪心匹配](#4. 贪心和非贪心匹配)
[5. findall() 方法](#5. findall() 方法)
[6. 字符分类](#6. 字符分类)
[7. 建立自己的字符分类](#7. 建立自己的字符分类)
[8. 插入字符和美元字符](#8. 插入字符和美元字符)
[9. 通配字符](#9. 通配字符)
- [9.1 用点-星匹配所有字符](#9.1 用点-星匹配所有字符)
- [9.2 用句点字符匹配换行符](#9.2 用句点字符匹配换行符)
[10. 不区分大小写的匹配](#10. 不区分大小写的匹配)
[11. 用 sub() 方法替换字符串](#11. 用 sub() 方法替换字符串)

1. 不用正则表达式来查找文本模式

python 复制代码

def isPhoneNumber(text):
    if len(text) != 11:
        return False

    for i in range(0, len(text)):
        if (i == 3 or i == 7) and text[i] != "-":
            return False
        elif i != 3 and i != 7 and not text[i].isdecimal():
            return False

    return True


text = "123-456-789"
print(text)
print(isPhoneNumber(text))

2. 用正则表达式来查找文本模式

2.1 创建正则表达式（Regex）对象

python 复制代码

import re

text = re.compile(r'\d\d\d-\d\d\d-\d\d\d')

2.2 匹配Regex对象

python 复制代码

import re

text = re.compile(r'\d\d\d-\d\d\d-\d\d\d')
match = text.search("~~~123-456-789~~~")
print(match.group())

3. 用正则表达式匹配更多模式

3.1 利用括号分组

python 复制代码

import re

text = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d)')
match = text.search("~~~123-456-789~~~")
print(match.group(1))
# 123
print(match.group(2))
# 456-789
print(match.groups())
# ('123', '456-789')

3.2 用管道匹配多个分组

| ：管道

python 复制代码

import re

text = re.compile(r'456|123')
match = text.search("123-456-789")
print(match.group())
# 123

3.3 用问号实现可选匹配

python 复制代码

import re

text = re.compile(r'\d\d\d(~)?\d\d\d')
match = text.search("123123")
print(match.group())
# 123123

match = text.search("123~123")
print(match.group())
# 123~123

3.4 用星号匹配零次或多次

python 复制代码

import re

text = re.compile(r'\d\d\d(~)*\d\d\d')
match = text.search("123123")
print(match.group())
# 123123

match = text.search("123~~~123")
print(match.group())
# 123~~~123

3.5 用加号匹配一次或多次

python 复制代码

import re

text = re.compile(r'\d\d\d(~)+\d\d\d')
match = text.search("123~123")
print(match.group())
# 123~123

match = text.search("123~~~123")
print(match.group())
# 123~~~123

3.6 用花括号匹配特定次数

python 复制代码

import re

text = re.compile(r'\d\d\d(~){3,5}\d\d\d')
match = text.search("123~~~123")
print(match.group())
# 123~~~123

match = text.search("123~~~~~123")
print(match.group())
# 123~~~~~123

4. 贪心和非贪心匹配

贪心匹配：尽可能匹配最长的字符串
非贪心匹配：尽可能匹配最短的字符串

python 复制代码

import re

text = re.compile(r'(123 ){2,4}')
match = text.search("123 123 123 123 123 ")
print(match.group())
# 123 123 123 123

text = re.compile(r'(123 ){2,4}?')
match = text.search("123 123 123 123 123 ")
print(match.group())
# 123 123

5. findall() 方法

python 复制代码

import re

text = re.compile(r'\d\d\d-\d\d\d-\d\d\d')
match = text.search("~~~123-456-789~~~111-222-333~~~")
print(match.group())
# 123-456-789

match = text.findall("~~~123-456-789~~~111-222-333~~~")
print(match)
# ['123-456-789', '111-222-333']

6. 字符分类

编写字符分类	表示
\d	0~9的任何数字
\D	除0~9的数字以外的任何字符
\w	任何字母、数字和下划线字符
\W	除字母、数字和下划线以外的任何字符
\s	空格、制表符或换行符
\S	除空格、制表符和换行符以外的任何字符

7. 建立自己的字符分类

python 复制代码

import re

text = re.compile(r'[0-5]')
match = text.findall("1a2b3c4d")
print(match)
# ['1', '2', '3', '4']

text = re.compile(r'[abc]')
match = text.findall("1a2b3c4d")
print(match)
# ['a', 'b', 'c']

text = re.compile(r'[^abc]')
match = text.findall("1a2b3c4d")
print(match)
# ['1', '2', '3', '4', 'd']

8. 插入字符和美元字符

^ ：以指定文本开始
$ ：以指定文本结束

python 复制代码

import re

text = re.compile(r'^\d\d\d')
match = text.search("123abc456")
print(match)
# <re.Match object; span=(0, 3), match='123'>

text = re.compile(r'\d\d\d$')
match = text.search("123abc456")
print(match)
# <re.Match object; span=(6, 9), match='456'>

9. 通配字符

. ：匹配换行符之外的所有字符

python 复制代码

import re

text = re.compile(r'..23')
match = text.findall("123abc23")
print(match)
# ['bc23']

9.1 用点-星匹配所有字符

python 复制代码

import re

text = re.compile(r'123(.*)456(.*)')
match = text.findall("123abc456def")
print(match)
# [('abc', 'def')]

9.2 用句点字符匹配换行符

re.DOTALL ：让句点字符匹配所有字符（包括换行符）

python 复制代码

import re

text = re.compile(r'.*')
match = text.search("123abc\n456def")
print(match.group())
# 123abc

text = re.compile(r'.*', re.DOTALL)
match = text.search("123abc\n456def")
print(match.group())
# 123abc\n456def

10. 不区分大小写的匹配

re.I ：不区分大小写

python 复制代码

import re

text = re.compile(r'abc', re.I)
match = text.findall("abcABC")
print(match)
# ['abc', 'ABC']

11. 用 sub() 方法替换字符串

python 复制代码

import re

text = re.compile(r'ABC\w*')
match = text.sub("abc", "ABC : 123")
print(match)
# abc : 123

第7章 模式匹配与正则表达式

目录

1. 不用正则表达式来查找文本模式

2. 用正则表达式来查找文本模式

2.1 创建正则表达式（Regex）对象

2.2 匹配Regex对象

3. 用正则表达式匹配更多模式

3.1 利用括号分组

3.2 用管道匹配多个分组

3.3 用问号实现可选匹配

3.4 用星号匹配零次或多次

3.5 用加号匹配一次或多次

3.6 用花括号匹配特定次数

4. 贪心和非贪心匹配

5. findall() 方法

6. 字符分类

7. 建立自己的字符分类

8. 插入字符和美元字符

9. 通配字符

9.1 用点-星匹配所有字符

9.2 用句点字符匹配换行符

10. 不区分大小写的匹配

11. 用 sub() 方法替换字符串

第7章模式匹配与正则表达式