20260414 Regular Expressions and the Shell's Three Musketeers

1. Regular Expressions and the Shell's Three Musketeers

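2. Regular Expressions Overview

A regular expression is a pattern that is matched against the text being searched in order to find one or more strings.
Regular expressions form a self-contained system: a textual pattern made of ordinary characters (for example the letters a to z) and metacharacters.
Ordinary characters: all printable and non-printable characters that are not explicitly designated as metacharacters, including all uppercase and lowercase letters, all digits, and any punctuation or other symbols that carry no special meaning.
Metacharacters: characters with a special meaning in a pattern, such as . [ ] ^ $ *.
Regular expressions are used both by tools (vim, grep, less, etc.) and by programming languages (Perl, Python, C, etc.).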

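Regular expressions come in two kinds:

Basic regular expressions (BRE)
Extended regular expressions (ERE), which support more metacharacters (for example + ? | ( ) { } work without a backslash).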
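A minimal sketch of the BRE/ERE difference, assuming GNU grep: in a basic expression an interval must be written `\{n\}` (a bare `{n}` is literal text), while `-E` treats `{n}` as an interval directly. The three-line input is hypothetical and piped in with printf, so nothing is written to disk.

```shell
#!/bin/sh
# Hypothetical three-line input
input='aa
a{2}
b'

# BRE: \{2\} is an interval, "exactly two a's"; -x requires a whole-line match
printf '%s\n' "$input" | grep -x 'a\{2\}'     # prints: aa

# ERE (-E): {2} is already an interval, no backslashes needed
printf '%s\n' "$input" | grep -xE 'a{2}'      # prints: aa

# BRE without backslashes: {2} is matched as literal characters
printf '%s\n' "$input" | grep -x 'a{2}'       # prints: a{2}
```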

2.1 Environment setup

[root@centos7 ~ 18:57:12]# cat > words <<'EOF'
> cat
> category
> acat
> concatenate
> cbt
> c1t
> cCt
> c-t
> c.t
> dog
> EOF

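2.2 Ordinary characters

An ordinary character matches itself, so 'cat' selects every line that contains the string cat.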

[root@centos7 ~ 19:05:09]# cat words | grep 'cat'

2.3 Character sets

. (dot)

Matches any single character except a line terminator (\n, \r); equivalent to [^\n\r].

[root@centos7 ~ 19:05:41]# cat words | grep 'c.t'
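To match a literal dot, escape it as `\.`. A quick comparison, assuming the word list from 2.1, supplied inline through a helper function (`words_list` is a name introduced here for illustration):

```shell
#!/bin/sh
# Hypothetical helper reproducing the words file from 2.1
words_list() { printf '%s\n' cat category acat concatenate cbt c1t cCt c-t c.t dog; }

# '.' matches any single character, so every c?t sequence counts
words_list | grep -c 'c.t'     # prints: 9

# '\.' is a literal dot, so only the line "c.t" matches
words_list | grep 'c\.t'       # prints: c.t
```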

[...] (bracket expression)

Matches any one of the characters listed inside the brackets.

[root@centos7 ~ 19:06:42]# cat words | grep 'c[ab]t'
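A bracket expression can also be negated: `[^...]` matches any single character that is not listed. A sketch on the same inline word list (`words_list` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper reproducing the words file from 2.1
words_list() { printf '%s\n' cat category acat concatenate cbt c1t cCt c-t c.t dog; }

# [ab]: a or b between c and t -> cat, category, acat, concatenate, cbt
words_list | grep -c 'c[ab]t'     # prints: 5

# [^ab]: any single character EXCEPT a or b -> c1t, cCt, c-t, c.t
words_list | grep 'c[^ab]t'
```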

2.4 [a-z] [A-Z] [0-9]

[a-z] matches any lowercase letter.
[A-Z] matches any uppercase letter.
[0-9] matches any digit.

[root@centos7 ~ 19:07:39]# cat words | grep 'c[a-z]t'

[root@centos7 ~ 19:08:43]# cat words | grep 'c[A-Z]t'

[root@centos7 ~ 19:09:20]# cat words | grep 'c[0-9]t'

[root@centos7 ~ 19:09:53]# cat words | grep 'c[a-z0-9]t'

[root@centos7 ~ 19:10:38]# cat words | grep 'c[a-zA-Z0-9]t'

# To match a literal '-', put it first (or last) inside the brackets
[root@centos7 ~ 19:11:10]# cat words | grep 'c[-a-zA-Z0-9]t'
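The match counts for every bracket expression in this section can be checked in one loop. This sketch forces `LC_ALL=C`, an assumption made so that ranges such as `[a-z]` follow plain ASCII order; depending on the grep version and locale, ranges may otherwise be interpreted by collation order. The last pattern shows that `-` is also literal when placed last.

```shell
#!/bin/sh
# Hypothetical helper reproducing the words file from 2.1
words_list() { printf '%s\n' cat category acat concatenate cbt c1t cCt c-t c.t dog; }

# Print each pattern next to its number of matching lines.
# Expected counts: 5 1 1 6 7 8 8
for re in 'c[a-z]t' 'c[A-Z]t' 'c[0-9]t' 'c[a-z0-9]t' \
          'c[a-zA-Z0-9]t' 'c[-a-zA-Z0-9]t' 'c[a-zA-Z0-9-]t'; do
    printf '%-18s %s\n' "$re" "$(words_list | LC_ALL=C grep -c "$re")"
done
```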

3. The Three Musketeers

grep: filtering
sed: editing
awk: formatted output

3.1 grep

3.1.1 About grep

grep is one of the most important commands on Linux: it selects the lines that match a pattern from text files or from a pipeline's data stream.

3.1.2 grep syntax

Filtering a pipe: command | grep [OPTION]... PATTERNS
Filtering files: grep [OPTION]... PATTERNS [FILE]...
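Both forms select the same lines. A small comparison, assuming a throwaway file created with `mktemp` (the file and its contents are hypothetical):

```shell
#!/bin/sh
# Hypothetical scratch file for the comparison
tmp=$(mktemp)
printf '%s\n' cat category dog > "$tmp"

grep 'cat' "$tmp"          # file form
cat "$tmp" | grep 'cat'    # pipe form
# Both print:
# cat
# category

rm -f "$tmp"
```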

[root@centos7 ~ 17:04:38]# grep --help
Usage: grep [OPTION]... PATTERN [FILE]...
Search for PATTERN in each FILE or standard input.
PATTERN is, by default, a basic regular expression (BRE).
Example: grep -i 'hello world' menu.h main.c

Regexp selection and interpretation:
  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -F, --fixed-strings       PATTERN is a set of newline-separated fixed strings
  -G, --basic-regexp        PATTERN is a basic regular expression (BRE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -e, --regexp=PATTERN      use PATTERN for matching
  -f, --file=FILE           obtain PATTERN from FILE
  -i, --ignore-case         ignore case distinctions
  -w, --word-regexp         force PATTERN to match only whole words
  -x, --line-regexp         force PATTERN to match only whole lines
  -z, --null-data           a data line ends in 0 byte, not newline

Miscellaneous:
  -s, --no-messages         suppress error messages
  -v, --invert-match        select non-matching lines
  -V, --version             display version information and exit
      --help                display this help text and exit

Output control:
  -m, --max-count=NUM       stop after NUM matches
  -b, --byte-offset         print the byte offset with output lines
  -n, --line-number         print line number with output lines
      --line-buffered       flush output on every line
  -H, --with-filename       print the file name for each match
  -h, --no-filename         suppress the file name prefix on output
      --label=LABEL         use LABEL as the standard input file name prefix
  -o, --only-matching       show only the part of a line matching PATTERN
  -q, --quiet, --silent     suppress all normal output
      --binary-files=TYPE   assume that binary files are TYPE;
                            TYPE is 'binary', 'text', or 'without-match'
  -a, --text                equivalent to --binary-files=text
  -I                        equivalent to --binary-files=without-match
  -d, --directories=ACTION  how to handle directories;
                            ACTION is 'read', 'recurse', or 'skip'
  -D, --devices=ACTION      how to handle devices, FIFOs and sockets;
                            ACTION is 'read' or 'skip'
  -r, --recursive           like --directories=recurse
  -R, --dereference-recursive
                            likewise, but follow all symlinks
      --include=FILE_PATTERN
                            search only files that match FILE_PATTERN
      --exclude=FILE_PATTERN
                            skip files and directories matching FILE_PATTERN
      --exclude-from=FILE   skip files matching any file pattern from FILE
      --exclude-dir=PATTERN directories that match PATTERN will be skipped.
  -L, --files-without-match print only names of FILEs containing no match
  -l, --files-with-matches  print only names of FILEs containing matches
  -c, --count               print only a count of matching lines per FILE
  -T, --initial-tab         make tabs line up (if needed)
  -Z, --null                print 0 byte after FILE name

Context control:
  -B, --before-context=NUM  print NUM lines of leading context
  -A, --after-context=NUM   print NUM lines of trailing context
  -C, --context=NUM         print NUM lines of output context
  -NUM                      same as --context=NUM
      --group-separator=SEP use SEP as a group separator
      --no-group-separator  use empty string as a group separator
      --color[=WHEN],
      --colour[=WHEN]       use markers to highlight the matching strings;
                            WHEN is 'always', 'never', or 'auto'
  -U, --binary              do not strip CR characters at EOL (MSDOS/Windows)
  -u, --unix-byte-offsets   report offsets as if CRs were not there
                            (MSDOS/Windows)
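A few of the output-control options listed above, tried on the inline word list (a sketch; `words_list` is a hypothetical helper reproducing the file from 2.1):

```shell
#!/bin/sh
# Hypothetical helper reproducing the words file from 2.1
words_list() { printf '%s\n' cat category acat concatenate cbt c1t cCt c-t c.t dog; }

words_list | grep -n 'dog'      # -n: prefix the line number   -> 10:dog
words_list | grep -c 'cat'      # -c: count matching lines     -> 4
words_list | grep -v 'c'        # -v: lines that do NOT match  -> dog
words_list | grep -m 2 'cat'    # -m: stop after 2 matches     -> cat, category
```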

3.1.3 Preparing the file

[root@centos7 ~ 17:11:18]# vim words
[root@centos7 ~ 17:12:15]# cat words 
cat
category
acat
concatenate
cbt
c1t
cCt
c-t
c.t
dog
dogdog
dogdogdog
dogdogdogdog

3.1.4 Pattern selection and interpretation options

-E option

Enables extended regular expressions; equivalent to the egrep command.

[root@centos7 ~ 17:15:18]# cat words | grep -E '(dog){3}'
# or
[root@centos7 ~ 17:15:33]# cat words | egrep '(dog){3}'
dogdogdog
dogdogdogdog
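The same grouping and interval can be written as a basic regular expression by escaping the parentheses and braces. A sketch assuming GNU grep and an inline copy of the dog lines (`dogs` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper reproducing the dog lines of the words file
dogs() { printf '%s\n' dog dogdog dogdogdog dogdogdogdog; }

dogs | grep -E '(dog){3}'      # ERE: ( ) and { } are metacharacters
dogs | grep '\(dog\)\{3\}'     # BRE: the same expression needs backslashes
# Both print dogdogdog and dogdogdogdog (each contains "dog" three times in a row)

dogs | grep -xE '(dog){3}'     # -x requires the whole line to match: dogdogdog
```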

-e option

Use several -e options to match several PATTERNS; a line is selected if it matches any of them.

[root@centos7 ~ 17:15:40]# cat words | grep -e 'cat' -e 'dog'
# or, with an ERE alternation
[root@centos7 ~ 17:16:35]# cat words | egrep 'cat|dog'
cat
category
acat
concatenate
dog
dogdog
dogdogdog
dogdogdogdog

-f option

Read PATTERNS from a file, one pattern per line.

[root@centos7 ~ 17:16:39]# echo -e 'cat\ndog' > pattens_file
[root@centos7 ~ 17:17:53]# cat pattens_file 
cat
dog
[root@centos7 ~ 17:17:58]# cat words | grep -f pattens_file 
cat
category
acat
concatenate
dog
dogdog
dogdogdog
dogdogdogdog
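Patterns read with `-f` are still regular expressions, so a `.` in the pattern file matches any character; adding `-F` treats each line as a fixed string. A sketch with a temporary pattern file (hypothetical, created with `mktemp`) and the inline word list:

```shell
#!/bin/sh
# Hypothetical helper reproducing the words file from 2.1
words_list() { printf '%s\n' cat category acat concatenate cbt c1t cCt c-t c.t dog; }

pat=$(mktemp)                 # hypothetical pattern file
printf 'c.t\n' > "$pat"

words_list | grep -c -f "$pat"      # regex: '.' matches any character -> 9
words_list | grep -c -F -f "$pat"   # -F: fixed string, only "c.t"     -> 1

rm -f "$pat"
```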

-i option

Ignore case when matching.

[root@centos7 ~ 17:18:13]# cat words | grep -i 'cBt'
cbt

-w option

Match whole words only; '\bcat\b' (word boundaries) gives the same result here.

[root@centos7 ~ 17:19:48]#  cat words | grep -w 'cat'
cat
[root@centos7 ~ 17:20:41]# cat words | grep '\bcat\b'
cat

-x option

Match whole lines only; '^cat$' gives the same result.

[root@centos7 ~ 18:30:03]# cat words | grep -x 'cat'
cat
[root@centos7 ~ 18:30:22]# cat words | grep '^cat$'
cat
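The difference between plain matching, `-w` and `-x` is clearer on lines where `cat` appears inside longer text. A sketch with hypothetical sample lines:

```shell
#!/bin/sh
# Hypothetical sample lines
lines() { printf '%s\n' 'cat' 'the cat sat' 'category' 'acat'; }

lines | grep 'cat'       # substring anywhere: all 4 lines
lines | grep -w 'cat'    # whole word: cat, the cat sat
lines | grep -x 'cat'    # whole line: cat
```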

3.2 sed

sed, short for stream editor, is a non-interactive stream editor: it processes text without user interaction and is very powerful.

3.2.1 sed syntax

sed help

[root@centos7 ~ 17:00:39]# sed --help
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
  -e script, --expression=script
                 add the script to the commands to be executed
  -f script-file, --file=script-file
                 add the contents of script-file to the commands to be executed
  --follow-symlinks
                 follow symlinks when processing in place
  -i[SUFFIX], --in-place[=SUFFIX]
                 edit files in place (makes backup if SUFFIX supplied)
  -c, --copy
                 use copy instead of rename when shuffling files in -i mode
  -b, --binary
                 does nothing; for compatibility with WIN32/CYGWIN/MSDOS/EMX (
                 open files in binary mode (CR+LFs are not treated specially))
  -l N, --line-length=N
                 specify the desired line-wrap length for the `l' command
  --posix
                 disable all GNU extensions.
  -r, --regexp-extended
                 use extended regular expressions in the script.
  -s, --separate
                 consider files as separate rather than as a single continuous
                 long stream.
  -u, --unbuffered
                 load minimal amounts of data from the input files and flush
                 the output buffers more often
  -z, --null-data
                 separate lines by NUL characters
  --help
                 display this help and exit
  --version
                 output version information and exit

The short form of the sed syntax is:

sed [option] [sed-command] [input-file]

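Overall, a sed invocation consists of four parts:

sed, the command itself.
[option], command-line options that change how sed works.
[sed-command], the actual sed commands (the script).
[input-file], the input data; if omitted, sed reads from standard input.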

Example 1: print a file's contents, like cat

[root@centos7 ~ 17:00:42]# cat > data.txt <<'EOF'
> I am studing sed
> I am www.laoma.cloud
> I am a Superman
> I am so handsome
> EOF

[root@centos7 ~ 17:02:00]# sed '' data.txt
I am studing sed
I am www.laoma.cloud
I am a Superman
I am so handsome

Mapping this onto the sed syntax:

No option is used here.
'' is the sed-command: an empty script, so every line passes through unchanged.
data.txt is the input-file that supplies the data.

Example 2: read data from standard input

If no input file is given, sed reads its data from standard input.

[root@centos7 ~ 18:37:07]# sed ''
## type hello world and press Enter
hello world 
## sed echoes the line back
hello world
## press Ctrl+D to quit
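In scripts, standard input usually comes from a pipe or a here-document rather than the keyboard. A minimal sketch:

```shell
#!/bin/sh
# '' is an empty script, so each input line is printed unchanged
printf 'hello world\n' | sed ''

# A here-document feeds standard input the same way
sed '' <<'EOF'
hello world
EOF
```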

-e option

Passes the sed script on the command line; wrap the script in single quotes ('') so the shell leaves it alone.

## print the contents of data.txt
[root@centos7 ~ 18:37:32]# sed -e '' data.txt
I am studing sed
I am www.laoma.cloud
I am a Superman
I am so handsome

## with only one command, -e can be omitted
[root@centos7 ~ 18:38:42]# sed '' data.txt
I am studing sed
I am www.laoma.cloud
I am a Superman
I am so handsome

## -e can be given several times; 1d deletes line 1
[root@centos7 ~ 18:38:56]# sed -e '1d' -e '2d' -e '5d' data.txt
I am a Superman
I am so handsome
## there is no line 5, so 5d has no effect

## commands can also be separated with a semicolon (;)
[root@centos7 ~ 18:39:26]# sed -e '1d;2d;5d' data.txt
I am a Superman
I am so handsome
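Besides several `-e` options and `;`, a range address or a script file passed with `-f` gives the same result. A sketch, with data.txt reproduced inline by a hypothetical `data` helper and a temporary script file created with `mktemp`:

```shell
#!/bin/sh
# Hypothetical helper reproducing data.txt from example 1
data() {
    printf '%s\n' 'I am studing sed' 'I am www.laoma.cloud' 'I am a Superman' 'I am so handsome'
}

data | sed -e '1d' -e '2d'   # two -e options
data | sed '1d;2d'           # two commands separated by ;
data | sed '1,2d'            # one command with a range address

# The same commands read from a script file, one command per line
script=$(mktemp)
printf '1d\n2d\n' > "$script"
data | sed -f "$script"
rm -f "$script"
# Every variant prints the last two lines:
# I am a Superman
# I am so handsome
```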

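-n option (important)

With -n, sed no longer prints the pattern space automatically; a line is output only when a command such as p prints it explicitly.

## the following command prints nothing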
[root@centos7 ~ 18:39:51]# sed -n '' data.txt

## print line 1
[root@centos7 ~ 18:43:30]# sed -n '1p' data.txt
I am studing sed
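Without `-n`, `p` prints a line in addition to sed's automatic output, so the addressed line appears twice. A sketch comparing both, using a hypothetical `data` helper that reproduces data.txt:

```shell
#!/bin/sh
# Hypothetical helper reproducing data.txt from example 1
data() { printf '%s\n' 'I am studing sed' 'I am www.laoma.cloud' 'I am a Superman' 'I am so handsome'; }

data | sed '1p'      # no -n: line 1 printed twice (by p, then by auto-print), then lines 2-4
data | sed -n '1p'   # -n: only the line printed by p
```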

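3.2.2 sed line addressing

Purpose

Line addresses select which lines of the input stream a command operates on.

Syntax

The print command p is used as the example here.

Syntax: address1[,address2]p

address1 and address2 are the start and end addresses; each can be a line number or a regex pattern written as /pattern/.
Both are optional. If neither is given, every line is printed, from the start of the file to the end.
If only address1 is given, only that single line is printed.
The p command writes the current pattern space to standard output; the original file is left unchanged.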

Sample file

[root@centos7 ~ 18:43:47]# echo ' This is 1
> This is 2    
> This is 3
> This is 4
> This is 5 ' > test

Demo
Example 1: print all lines

## print all lines
[root@centos7 ~ 18:49:53]# cat test | sed ''
## output
This is 1
This is 2    
This is 3
This is 4
This is 5 

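## By default sed prints everything in the pattern buffer.

## equivalent to
[root@centos7 ~ 18:49:56]# cat test | sed -n 'p'
## -n turns off sed's automatic printing of the pattern buffer.
## p explicitly prints the pattern buffer.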

Example 2: print specific lines

## print line 1
[root@centos7 ~ 18:50:46]# cat test | sed -n '1p'
## output
 This is 1

## print the last line ($ addresses the last line)
[root@centos7 ~ 18:51:29]# cat test | sed -n '$p'
## output
This is 5

Example 3: print lines 1 to 3

[root@centos7 ~ 18:51:42]# cat test | sed -n '1,3p'

## output
This is 1
This is 2    
This is 3

Example 4: print from line 3 to the last line

[root@centos7 ~ 18:52:54]# cat test | sed -n '3,$p'

## output
This is 3
This is 4
This is 5

Example 5: print line 2 and the two lines after it (the addr,+N form is a GNU sed extension)

[root@centos7 ~ 18:53:46]# cat test | sed -n '2,+2p'

## output
This is 2    
This is 3
This is 4

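3.2.3 sed subcommands

Print
Purpose

p prints the whole pattern space.
P prints the pattern space only up to the first newline, i.e. its first line.

Syntax

Format: address1[,address2]p

address1 and address2 are the start and end addresses and work as in 3.2.2: each can be a line number or a /pattern/, both are optional (no address means every line), and a single address selects a single line.
p writes the pattern space to standard output; the original file is never modified.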
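`P` only behaves differently from `p` when the pattern space holds more than one line, for example after `N` appends the next input line. A minimal sketch:

```shell
#!/bin/sh
# N appends the next input line, so the pattern space becomes "a\nb"
printf '%s\n' a b | sed -n 'N;p'   # p prints the whole pattern space: a and b
printf '%s\n' a b | sed -n 'N;P'   # P prints only up to the first newline: a
```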

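Substitution (essential)
Example 1: replace root with tankzhang in test (here a file of /etc/passwd-style lines, not the test file created above). Without the g flag, only the first match on each line is replaced; sed then moves on to the next line.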

[laoma@shell ~]$ sed 's/root/tankzhang/' test | grep tankzhang
tankzhang:x:0:0:root:/root:/bin/bash
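The trailing flag of `s` decides which matches are replaced: no flag means the first match only, `g` means all of them, and a number N means only the Nth. A sketch on the same passwd-style line:

```shell
#!/bin/sh
line='root:x:0:0:root:/root:/bin/bash'

echo "$line" | sed 's/root/tankzhang/'    # first match: tankzhang:x:0:0:root:/root:/bin/bash
echo "$line" | sed 's/root/tankzhang/g'   # all matches: tankzhang:x:0:0:tankzhang:/tankzhang:/bin/bash
echo "$line" | sed 's/root/tankzhang/2'   # 2nd match:   root:x:0:0:tankzhang:/root:/bin/bash
```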

Disable SELinux by editing its config file (on CentOS 7 this is normally /etc/selinux/config)

[laoma@shell ~]$ sed -i 's/^SELINUX=.*/SELINUX=disabled/g' config
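`-i` rewrites the file in place with no undo; giving a suffix such as `-i.bak` (GNU sed syntax) keeps a backup of the original. A sketch on a hypothetical scratch copy created with `mktemp`, rather than the real SELinux config:

```shell
#!/bin/sh
cfg=$(mktemp)    # hypothetical scratch copy
printf '%s\n' 'SELINUX=enforcing' 'SELINUXTYPE=targeted' > "$cfg"

# -i.bak: edit in place and keep the original as "$cfg.bak"
sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

cat "$cfg"        # SELINUX=disabled; SELINUXTYPE=targeted is untouched (^SELINUX= needs the '=')
cat "$cfg.bak"    # original content
rm -f "$cfg" "$cfg.bak"
```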
