Project 5 Hands-on Training (sed Stream Editor and awk Text Processing Tool)
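【Training Tasks】
The main tasks of this training are to write regular expressions that extract specific information from text according to given patterns, to use the sed command to replace text in bulk or perform other editing operations, and to write awk scripts that process text data through extraction, calculation, and formatted output.
【Training Objectives】
(1) Understand the basic syntax of regular expressions and the common metacharacters.
(2) Master the use of regular expressions for matching and searching text.
(3) Master sed editing operations such as substitution, deletion, and insertion.
(4) Understand awk's pattern-action structure.
(5) Master using awk to split, filter, calculate, and format text.
【Training Content】
(1) Write suitable regular expressions to extract the required data from the given text.
(2) Use the grep command to search text for a given pattern and print the matching lines.
(3) Use the sed command to modify, delete, and insert text, processing it in batches.
(4) Use awk scripts to extract data and format output, specifying patterns and actions to process and transform text flexibly.
【Training Environment】
Before starting the exercises, prepare a Linux environment. Any common distribution, such as RHEL, CentOS Stream, Debian, Ubuntu, Huawei openEuler, or openKylin, can be used.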
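Project Implementation
Task 1 Extracting Text with Regular Expressions
1. Task Description
(1) Create the test file file.txt on the Linux system.
(2) Using the contents of file.txt as sample text, extract data with regular expressions built from different metacharacters and their combinations.
2. Task Implementation
(1) Create the test file file.txt in the user's home directory.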
```bash
[root@redhat04 ~]# vim file.txt
[root@redhat04 ~]# cat file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
1234567890
E-mail: example@example.com
Regular expressions is too sample
```
(2) Extract the lines that contain the letter "o" followed by any character, and print them to the terminal.
```bash
[root@redhat04 ~]# grep "o." file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
Regular expressions is too sample
```
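(3) Extract the lines that contain zero, one, or more occurrences of the letter "o", and print them to the terminal. Because `o*` also matches zero occurrences (the empty string), every line matches, including 1234567890.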
```bash
[root@redhat04 ~]# grep "o*" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
1234567890
E-mail: example@example.com
Regular expressions is too sample
```
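(4) Extract the lines that contain the letter "b" followed, anywhere later on the same line, by the letter "n", and print them to the terminal. The pattern is not anchored, so the line does not have to start with "b" or end with "n".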
```bash
[root@redhat04 ~]# grep "b.*n" file.txt
The quick brown fox jumps over the lazy dog.
```
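Because `b.*n` is not anchored, it selects any line where a "b" is followed later by an "n". The minimal sketch below contrasts it with the anchored form `^b.*n$`, which requires the whole line to start with "b" and end with "n". The sample lines and the temporary file are hypothetical and are not part of file.txt.

```bash
# Hypothetical sample lines in a throwaway directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'brown' 'The quick brown fox' 'bin' 'beacon light' > "$workdir/sample.txt"

# Unanchored: a "b" followed later on the same line by an "n"
unanchored_count=$(grep -c 'b.*n' "$workdir/sample.txt")
# Anchored: the whole line must start with "b" and end with "n"
anchored=$(grep '^b.*n$' "$workdir/sample.txt")

echo "unanchored matches: $unanchored_count"
printf 'anchored matches:\n%s\n' "$anchored"
```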
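(5) Extract the lines that contain one or more occurrences of the letter "u", and print them to the terminal. `egrep` is equivalent to `grep -E` (extended regular expressions). GNU grep 3.8 and later warn that egrep is obsolescent, so `grep -E` is the preferred spelling.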
```bash
[root@redhat04 ~]# egrep "u+" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
Regular expressions is too sample
```
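(6) Extract the lines that contain "Thi" followed by zero or one letter "s" (that is, "Thi" or "This"), and print them to the terminal. The `?` applies only to the preceding character "s", not to the whole word.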
```bash
[root@redhat04 ~]# egrep "This?" file.txt
This is a sample text for testing regular expressions.
```
(7) Extract the lines that begin with the word "The", and print them to the terminal.
```bash
[root@redhat04 ~]# grep "^The" file.txt
The quick brown fox jumps over the lazy dog.
```
(8) Extract the lines that end with "com", and print them to the terminal.
```bash
[root@redhat04 ~]# grep "com$" file.txt
E-mail: example@example.com
```
(9) Extract the lines that contain "fox" or "dog", and print them to the terminal.
```bash
[root@redhat04 ~]# egrep "fox|dog" file.txt
The quick brown fox jumps over the lazy dog.
```
(10) Extract the lines that contain a lowercase vowel, and print them to the terminal.
```bash
[root@redhat04 ~]# grep "[aeiou]" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
Regular expressions is too sample
```
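(11) Extract the lines that contain a period, and print them to the terminal. The backslash escapes ".", which would otherwise match any character.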
```bash
[root@redhat04 ~]# grep "\." file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
```
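(12) Extract the lines that contain a run of 1 to 3 letters "o", and print them to the terminal. The pattern is not anchored, so a single "o" already satisfies `o{1,3}`, and this selects every line that contains at least one "o".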
```bash
[root@redhat04 ~]# egrep "o{1,3}" file.txt
This is a sample text for testing regular expressions.
The quick brown fox jumps over the lazy dog.
E-mail: example@example.com
Regular expressions is too sample
```
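To limit the total number of "o"s in a line to between 1 and 3, the pattern has to describe the whole line. The sketch below assumes GNU grep and uses hypothetical sample lines. Each `(o[^o]*)` group consumes exactly one "o", so `{1,3}` bounds the total count.

```bash
# Hypothetical lines with 0, 1, 2 and 4 letter "o"s
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'cat' 'dog' 'boot' 'foo bar zoo' > "$workdir/o.txt"

# Unanchored o{1,3}: true for any line containing at least one "o"
any_o=$(grep -E 'o{1,3}' "$workdir/o.txt")
# Anchored: the whole line must contain between 1 and 3 "o"s in total
one_to_three=$(grep -E '^[^o]*(o[^o]*){1,3}$' "$workdir/o.txt")

printf 'o{1,3}:\n%s\n\n1 to 3 in total:\n%s\n' "$any_o" "$one_to_three"
```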
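(13) Extract the lines that contain the letter "f", optionally followed by one letter "o", and then the letter "x", and print them to the terminal.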
```bash
[root@redhat04 ~]# egrep "fo?x" file.txt
The quick brown fox jumps over the lazy dog.
```
(14) Extract the lines that contain "fox jumps" or "dog jumps", and print them to the terminal.
```bash
[root@redhat04 ~]# egrep "(fox|dog) jumps" file.txt
The quick brown fox jumps over the lazy dog.
```
(15) Extract the lines that contain two consecutive letters "o", and print them to the terminal.
```bash
[root@redhat04 ~]# egrep "o{2}" file.txt
Regular expressions is too sample
```
(16) Extract the lines that end with "dog.", and print them to the terminal.
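```bash
[root@redhat04 ~]# egrep "dog\.$" file.txt
[root@redhat04 ~]#
```
In this session the command printed nothing, even though line 2 of file.txt appears to end with "dog.". A likely cause is invisible trailing whitespace after the period in the file, because `$` matches only at the very end of the line.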
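If trailing whitespace is the cause, `$` cannot match directly after the period. The minimal sketch below reproduces this with a hypothetical file and allows optional trailing blanks with `[[:space:]]*`. It also uses `cat -A` (GNU coreutils), which marks each line end with `$` so the hidden space becomes visible.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Hypothetical file: the second line deliberately ends with a space after "dog."
printf '%s\n' 'The lazy dog.' 'The lazy dog. ' 'dogs' > "$workdir/dog.txt"

# Strict: "dog." must be the very last characters of the line
strict=$(grep -c 'dog\.$' "$workdir/dog.txt")
# Tolerant: allow any trailing spaces or tabs after "dog."
tolerant=$(grep -cE 'dog\.[[:space:]]*$' "$workdir/dog.txt")

cat -A "$workdir/dog.txt"
echo "strict=$strict tolerant=$tolerant"
```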
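Task 2 sed Examples
1. Task Description
(1) Create the test file file1.txt on the Linux system.
(2) Using the contents of file1.txt as sample text, operate on the text with the sed command and regular expressions.
2. Task Implementation
(1) Create the test file file1.txt in the user's home directory.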
```bash
[root@redhat04 ~]# vim file1.txt
[root@redhat04 ~]# cat file1.txt
Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
```
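(2) Replace "Ubuntu" with "Ubuntu Linux" and print the result to the terminal. The `g` flag replaces every occurrence on a line, not just the first. sed writes to standard output and leaves file1.txt unchanged.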
```bash
[root@redhat04 ~]# sed 's/Ubuntu/Ubuntu Linux/g' file1.txt
Distribution Kernel Version Community
Ubuntu Linux 6.3.4 Ubuntu Linux Community
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
```
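The command above only prints the edited text. To change the file itself, GNU sed offers `-i`. With a suffix such as `-i.bak`, it first keeps a copy of the original under that name. The sketch below runs on a hypothetical two-line copy of the data rather than on file1.txt.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Hypothetical two-line copy of the data
printf '%s\n' 'Ubuntu 6.3.4 Ubuntu Community' 'Fedora 6.3.4 Fedora Project' > "$workdir/distros.txt"

# GNU sed: edit in place, saving the original as distros.txt.bak
sed -i.bak 's/Ubuntu/Ubuntu Linux/g' "$workdir/distros.txt"

cat "$workdir/distros.txt"
first_line=$(head -n 1 "$workdir/distros.txt")
backup_line=$(head -n 1 "$workdir/distros.txt.bak")
```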
(3) Delete the lines that contain "Community" and print the result to the terminal.
```bash
[root@redhat04 ~]# sed '/Community/d' file1.txt
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
RHEL 5.15.113 Red Hat Inc.
```
(4) Print only the lines that begin with "Mint". `-n` suppresses sed's automatic printing, and `p` prints the matching lines.
```bash
[root@redhat04 ~]# sed -n '/^Mint/p' file1.txt
Mint 5.4.243 Mint Community
```
(5) Append " - Stable Version" to the end of every line and print the result to the terminal.
```bash
[root@redhat04 ~]# sed 's/$/ - Stable Version/' file1.txt
Distribution Kernel Version Community - Stable Version
Ubuntu 6.3.4 Ubuntu Community - Stable Version
Fedora 6.3.4 Fedora Project - Stable Version
Debian 6.1.30 Debian Project - Stable Version
Arch Linux 6.2.16 Arch Linux Community - Stable Version
Mint 5.4.243 Mint Community - Stable Version
Manjaro 5.10.180 Manjaro Community - Stable Version
openSUSE 5.10.180 openSUSE Community - Stable Version
openEuler 6.2.16 openEuler Community - Stable Version
RHEL 5.15.113 Red Hat Inc. - Stable Version
```
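(6) Insert the line "New Line" before the line that begins with "Debian", and print the result to the terminal. The one-line form `i New Line` is a GNU sed extension. POSIX sed requires `i\` followed by the text on the next line.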
```bash
[root@redhat04 ~]# sed '/^Debian/i New Line' file1.txt
Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project
New Line
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
```
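(7) Extract the part of each line that matches the pattern "5.x.x" (a version number with major version 5), and print it to the terminal. `\(...\)` captures the version string, and `\1` replaces the whole line with it. Because of `-n` and the `p` flag, only lines on which the substitution succeeded are printed. `\+` (one or more) is a GNU sed extension to basic regular expressions.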
```bash
[root@redhat04 ~]# sed -n 's/.*\(5\.[0-9]\+\.[0-9]\+\).*$/\1/p' file1.txt
5.4.243
5.10.180
5.10.180
5.15.113
```
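The pattern in step (7) is not tied to a word boundary, so it would also pick "5.3.4" out of a version such as "15.3.4". The sketch below compares it with `grep -oE`, which prints only the matched text. `\b` is a GNU grep extension. The "15.3.4" line is a hypothetical extra case that does not appear in file1.txt.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Hypothetical lines; "Demo 15.3.4" is not part of file1.txt
printf '%s\n' 'Mint 5.4.243 Mint Community' 'Fedora 6.3.4 Fedora Project' \
  'Demo 15.3.4 Example' 'RHEL 5.15.113 Red Hat Inc.' > "$workdir/versions.txt"

# The exercise's sed pattern: also matches the "5.3.4" tail of "15.3.4"
via_sed=$(sed -n 's/.*\(5\.[0-9]\+\.[0-9]\+\).*$/\1/p' "$workdir/versions.txt")
# grep -o prints only the match; \b demands a word boundary before the "5"
via_grep=$(grep -oE '\b5\.[0-9]+\.[0-9]+' "$workdir/versions.txt")

printf 'sed:\n%s\n\ngrep -o:\n%s\n' "$via_sed" "$via_grep"
```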
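(8) Write the sed command into a file, have sed read it with the `-f` option, and print the result to the terminal. The file commands contains only the `p` command, so together with `-n` every line is printed exactly once.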
```bash
[root@redhat04 ~]# echo "p" > commands
[root@redhat04 ~]# sed -n -f commands file1.txt
Distribution Kernel Version Community
Ubuntu 6.3.4 Ubuntu Community
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Arch Linux 6.2.16 Arch Linux Community
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
```
(9) Delete lines 1, 2, and 5 and print the result to the terminal.
```bash
[root@redhat04 ~]# sed -e '1d' -e '2d' -e '5d' file1.txt
Fedora 6.3.4 Fedora Project
Debian 6.1.30 Debian Project
Mint 5.4.243 Mint Community
Manjaro 5.10.180 Manjaro Community
openSUSE 5.10.180 openSUSE Community
openEuler 6.2.16 openEuler Community
RHEL 5.15.113 Red Hat Inc.
```
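The three `-e` expressions can also be written as a single sed script, with commands separated by `;` and a range address `1,2`. The sketch below runs both forms on hypothetical numbered lines generated with `seq`.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Hypothetical six numbered lines standing in for file1.txt
seq 1 6 > "$workdir/lines.txt"

# Three -e expressions, as in the exercise
a=$(sed -e '1d' -e '2d' -e '5d' "$workdir/lines.txt")
# One script: a range address 1,2 plus line 5, separated by ";"
b=$(sed '1,2d;5d' "$workdir/lines.txt")

printf 'three -e:\n%s\none script:\n%s\n' "$a" "$b"
```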
(10) Replace every digit in the text with the letter "X" and print the result to the terminal.
```bash
[root@redhat04 ~]# sed 's/[0-9]/X/g' file1.txt
Distribution Kernel Version Community
Ubuntu X.X.X Ubuntu Community
Fedora X.X.X Fedora Project
Debian X.X.XX Debian Project
Arch Linux X.X.XX Arch Linux Community
Mint X.X.XXX Mint Community
Manjaro X.XX.XXX Manjaro Community
openSUSE X.XX.XXX openSUSE Community
openEuler X.X.XX openEuler Community
RHEL X.XX.XXX Red Hat Inc.
```
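(11) Extract the second column (the second blank-separated field) of each line, and print it to the terminal. Because "Arch Linux" contains a space, its second field is "Linux" rather than the kernel version.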
```bash
[root@redhat04 ~]# sed 's/^[^[:blank:]]\+[[:blank:]]\+\([^[:blank:]]\+\).*/\1/' file1.txt
Kernel
6.3.4
6.3.4
6.1.30
Linux
5.4.243
5.10.180
5.10.180
6.2.16
5.15.113
```
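By default, awk splits each line on runs of blanks, so `$2` names the same field as the sed expression above. The sketch below compares the two on two hypothetical lines taken from the data. It assumes GNU sed for `\+`.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'Ubuntu 6.3.4 Ubuntu Community' 'Arch Linux 6.2.16 Arch Linux Community' \
  > "$workdir/cols.txt"

# sed: skip field 1 and the blanks after it, capture the next run of non-blanks
via_sed=$(sed 's/^[^[:blank:]]\+[[:blank:]]\+\([^[:blank:]]\+\).*/\1/' "$workdir/cols.txt")
# awk: default field splitting on blanks, print field 2
via_awk=$(awk '{print $2}' "$workdir/cols.txt")

printf 'sed:\n%s\nawk:\n%s\n' "$via_sed" "$via_awk"
```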
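(12) Add a line number to the beginning of every line and print the result to the terminal. `sed =` prints each line number on a line of its own. The second sed then uses `N` to append the following line and replaces the embedded newline with a space.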
```bash
[root@redhat04 ~]# sed = file1.txt | sed 'N; s/\n/ /'
1 Distribution Kernel Version Community
2 Ubuntu 6.3.4 Ubuntu Community
3 Fedora 6.3.4 Fedora Project
4 Debian 6.1.30 Debian Project
5 Arch Linux 6.2.16 Arch Linux Community
6 Mint 5.4.243 Mint Community
7 Manjaro 5.10.180 Manjaro Community
8 openSUSE 5.10.180 openSUSE Community
9 openEuler 6.2.16 openEuler Community
10 RHEL 5.15.113 Red Hat Inc.
```
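Task 3 awk Examples
1. Task Description
(1) Create the test file awkfile.txt on the Linux system.
(2) Using the contents of awkfile.txt as sample text, operate on the text with the awk command.
2. Task Implementation
(1) Create the test file awkfile.txt in the user's home directory.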
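```bash
[root@redhat04 ~]# cat awkfile.txt
EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28
```
Note: the outputs of steps (10) and (11) below show that the awkfile.txt used in this session begins with an empty line, which is not visible in the listing above. The header is therefore record 2, not record 1. This matters for any command that relies on NR.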
(2) Print the EmployeeID and FirstName fields to the terminal.
```bash
[root@redhat04 ~]# awk -F',' '{print $1, $2}' awkfile.txt
EmployeeID FirstName
101 John
102 Alice
103 Bob
104 Emma
105 David
106 Tommy
```
(3) Print the records of employees aged 30 or older. `$4+0` forces a numeric comparison. Without it, the header value "Age" is compared with 30 as a string ("Age" sorts after "30"), so the header line would be printed as well.
```bash
[root@redhat04 ~]# awk -F',' '$4+0 >= 30 {print}' awkfile.txt
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28
```
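The sketch below shows the comparison rule behind this on a hypothetical three-column CSV. A field that does not look like a number, such as "Age", is compared with the constant 30 as a string, and "Age" sorts after "30". `$3 + 0` converts the field to a number first, so "Age" becomes 0.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'ID,Name,Age' '1,Ann,30' '2,Bob,25' > "$workdir/ages.csv"

# String comparison for the header: "Age" >= "30" is true
naive=$(awk -F',' '$3 >= 30' "$workdir/ages.csv")
# Numeric comparison everywhere: "Age" + 0 is 0
numeric=$(awk -F',' '$3 + 0 >= 30' "$workdir/ages.csv")

printf 'naive:\n%s\nnumeric:\n%s\n' "$naive" "$numeric"
```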
(4) Print the records of all employees whose position is "Project Manager".
```bash
[root@redhat04 ~]# awk -F',' '$5 == "Project Manager" {print}' awkfile.txt
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
```
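(5) Calculate the average age of the employees and print it to the terminal. Only records whose Age field consists of digits are counted. Counting every line instead would divide the total of 203 by 8 records (including the empty line and the header) and report 25.375.
```bash
[root@redhat04 ~]# awk -F',' '$4 ~ /^[0-9]+$/ {sum += $4; count++} END {print "Average Age:", sum/count}' awkfile.txt
Average Age: 33.8333
```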
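The sketch below is a slightly more defensive variant. Like awkfile.txt, its hypothetical data starts with an empty line. It counts only numeric ages, guards against dividing by zero, and rounds with `printf`.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Hypothetical data: empty first line, header, three employees
printf '%s\n' '' 'ID,Name,Age' '1,Ann,30' '2,Bob,25' '3,Cid,41' > "$workdir/ages.csv"

avg=$(awk -F',' '
  $3 ~ /^[0-9]+$/ { sum += $3; count++ }   # only rows whose Age is all digits
  END {
    if (count > 0) printf "%.2f\n", sum / count
    else print "no data"
  }' "$workdir/ages.csv")
echo "Average Age: $avg"
```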
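(6) Print the records of employees who started after January 1, 2022. The dates use the YYYY-MM-DD format, so a string comparison orders them correctly. `$7 ~ /^[0-9]/` excludes the header, whose value "StartDate" would otherwise also compare as greater.
```bash
[root@redhat04 ~]# awk -F',' '$7 ~ /^[0-9]/ && $7 > "2022-01-01" {print}' awkfile.txt
101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
105,David,Anderson,40,HR Manager,Another Company,2023-02-28
```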
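(7) Count the employees in each position and print the counts. The sample file has no department column, so the counts use field 5 (Position). `NF` skips the empty line, and `$5 != "Position"` skips the header. The iteration order of `for (p in pos)` is unspecified, so the output is piped to `sort`.
```bash
[root@redhat04 ~]# awk -F',' 'NF && $5 != "Position" {pos[$5]++} END {for (p in pos) print "Position:", p, "Employee Count:", pos[p]}' awkfile.txt | sort
Position: Data Scientist Employee Count: 1
Position: HR Manager Employee Count: 2
Position: Project Manager Employee Count: 1
Position: Software Engineer Employee Count: 1
Position: UX Designer Employee Count: 1
```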
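To list the counts in the order in which each position first occurs rather than alphabetically, a common POSIX awk pattern records the keys in a numerically indexed array. The sketch below applies it to hypothetical data.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'ID,Position' '1,HR Manager' '2,UX Designer' '3,HR Manager' > "$workdir/pos.csv"

counts=$(awk -F',' '
  NR > 1 {                                   # skip the header row
    if (!($2 in count)) order[++n] = $2     # remember first-seen order
    count[$2]++
  }
  END { for (i = 1; i <= n; i++) print order[i] ": " count[order[i]] }
' "$workdir/pos.csv")
printf '%s\n' "$counts"
```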
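(8) Print the length of each employee's full name, with FirstName and LastName concatenated without a space. The file starts with an empty line, so `NR != 1` would still let the header (record 2) through. Instead, data rows are selected by a numeric EmployeeID.
```bash
[root@redhat04 ~]# awk -F',' '$1 ~ /^[0-9]+$/ {len = length($2 $3); print "EmployeeID:", $1, "FullName Length:", len}' awkfile.txt
EmployeeID: 101 FullName Length: 7
EmployeeID: 102 FullName Length: 10
EmployeeID: 103 FullName Length: 10
EmployeeID: 104 FullName Length: 12
EmployeeID: 105 FullName Length: 13
EmployeeID: 106 FullName Length: 9
```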
(9) Use the regular expression "[,.]" as the field separator, and print the 1st and 4th fields to the terminal.
```bash
[root@redhat04 ~]# awk -F'[,.]' '{print $1, $4}' awkfile.txt
EmployeeID Age
101 30
102 28
103 35
104 25
105 40
106 45
```
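Because `[,.]` also treats periods as separators, a record containing "Huawei Technologies Co.Ltd." gains extra fields, including an empty one between "." and ",". The sketch below counts the fields of the first data row under both separators.

```bash
# First data row of awkfile.txt
line='101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15'

# Comma only: 7 fields
nf_comma=$(printf '%s\n' "$line" | awk -F',' '{print NF}')
# Comma or period: "Co.Ltd.," splits into "Co", "Ltd" and an empty field
nf_regex=$(printf '%s\n' "$line" | awk -F'[,.]' '{print NF}')
f6_f7=$(printf '%s\n' "$line" | awk -F'[,.]' '{print $6 "|" $7}')

echo "NF with -F',': $nf_comma, NF with -F'[,.]': $nf_regex, fields 6|7: $f6_f7"
```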
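(10) Use the built-in variable NF to print the last field of each line. For the empty first line NF is 0, so `$NF` is `$0`, an empty string.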
```bash
[root@redhat04 ~]# awk -F',' '{print "Last Field:" $NF}' awkfile.txt
Last Field:
Last Field:StartDate
Last Field:2022-01-15
Last Field:2021-11-20
Last Field:2022-03-10
Last Field:2020-09-05
Last Field:2023-02-28
Last Field:2019-02-28
```
(11) Use the built-in variable NR to print the number and content of each line.
```bash
[root@redhat04 ~]# awk '{ print "Line:", NR, "Content:", $0 }' awkfile.txt
Line: 1 Content:
Line: 2 Content: EmployeeID,FirstName,LastName,Age,Position,Company,StartDate
Line: 3 Content: 101,John,Doe,30,Software Engineer,Huawei Technologies Co.Ltd.,2022-01-15
Line: 4 Content: 102,Alice,Smith,28,Data Scientist,Red Hat Inc.,2021-11-20
Line: 5 Content: 103,Bob,Johnson,35,Project Manager,CentOS Project,2022-03-10
Line: 6 Content: 104,Emma,Williams,25,UX Designer,Example Company,2020-09-05
Line: 7 Content: 105,David,Anderson,40,HR Manager,Another Company,2023-02-28
Line: 8 Content: 106,Tommy,Alex,45,HR Manager,Other Company,2019-02-28
```
(12) Use the built-in function split to split each record on commas into an array, and print the array elements. split returns the number of elements, and a numeric loop from 1 to that count prints them in field order. `for (i in arr)` does not guarantee any particular order.
```bash
[root@redhat04 ~]# awk '{ n = split($0, arr, ","); for (i = 1; i <= n; i++) print "Element:", arr[i] }' awkfile.txt
Element: EmployeeID
Element: FirstName
Element: LastName
Element: Age
Element: Position
Element: Company
Element: StartDate
Element: 101
Element: John
Element: Doe
Element: 30
Element: Software Engineer
Element: Huawei Technologies Co.Ltd.
Element: 2022-01-15
Element: 102
Element: Alice
Element: Smith
Element: 28
Element: Data Scientist
Element: Red Hat Inc.
Element: 2021-11-20
Element: 103
Element: Bob
Element: Johnson
Element: 35
Element: Project Manager
Element: CentOS Project
Element: 2022-03-10
Element: 104
Element: Emma
Element: Williams
Element: 25
Element: UX Designer
Element: Example Company
Element: 2020-09-05
Element: 105
Element: David
Element: Anderson
Element: 40
Element: HR Manager
Element: Another Company
Element: 2023-02-28
Element: 106
Element: Tommy
Element: Alex
Element: 45
Element: HR Manager
Element: Other Company
Element: 2019-02-28
```
(13) Use the built-in function substr to extract the first three characters of each employee's FirstName, and print the result to the terminal. Only data rows, identified by a numeric EmployeeID, are processed.
```bash
[root@redhat04 ~]# awk -F',' '$1 ~ /^[0-9]+$/ {print $1, "FirstName (3 chars):", substr($2, 1, 3)}' awkfile.txt
101 FirstName (3 chars): Joh
102 FirstName (3 chars): Ali
103 FirstName (3 chars): Bob
104 FirstName (3 chars): Emm
105 FirstName (3 chars): Dav
106 FirstName (3 chars): Tom
```
(14) Use the built-in function tolower to convert the FirstName field to lowercase, and print the result to the terminal. The `NF` pattern skips the empty first line.
```bash
[root@redhat04 ~]# awk -F',' 'NF { print "Lowercase FirstName:", tolower($2) }' awkfile.txt
Lowercase FirstName: firstname
Lowercase FirstName: john
Lowercase FirstName: alice
Lowercase FirstName: bob
Lowercase FirstName: emma
Lowercase FirstName: david
Lowercase FirstName: tommy
```
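(15) Create an awk script that uses a comma as the field separator, prints each employee ID, and calculates each employee's years of service. mktime() and systime() are GNU awk (gawk) extensions. Dividing by 365 days gives an approximate result, and the values depend on the date the script is run.
```bash
[root@redhat04 ~]# vim define-func.awk
[root@redhat04 ~]# cat define-func.awk
BEGIN {
    FS = ","
    OFS = ","                           # comma-separated output, matching the header
    print "EmployeeID,WorkYears"        # print the header row
}
$1 ~ /^[0-9]+$/ {                       # data rows only (skips the empty line and the header)
    # Split the StartDate field into year, month, and day
    split($7, start_date, "-")
    # mktime() expects "YYYY MM DD HH MM SS"
    start_timestamp = mktime(start_date[1] " " start_date[2] " " start_date[3] " 0 0 0")
    current_timestamp = systime()
    work_years = int((current_timestamp - start_timestamp) / (365 * 24 * 3600))
    print $1, work_years
}
# Run the awk script
[root@redhat04 ~]# awk -f define-func.awk awkfile.txt
EmployeeID,WorkYears
101,2
102,2
103,2
104,4
105,1
106,5
```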
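The sketch below is a portable POSIX awk alternative that avoids mktime() and systime(). It counts full years between StartDate and a reference date passed in with `-v`. The fixed reference date 2024-06-01 and the sample rows are hypothetical, chosen so that the result is reproducible.

```bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' 'EmployeeID,StartDate' '101,2022-01-15' '104,2020-09-05' '106,2019-02-28' \
  > "$workdir/start.csv"

# Hypothetical fixed reference date; in practice one might pass "$(date +%F)"
ref_date='2024-06-01'

years=$(awk -F',' -v ref="$ref_date" '
  BEGIN { OFS = ","; split(ref, r, "-"); print "EmployeeID", "WorkYears" }
  $1 ~ /^[0-9]+$/ {
    split($2, s, "-")
    y = r[1] - s[1]
    # Anniversary not yet reached in the reference year: subtract one
    if (r[2] + 0 < s[2] + 0 || (r[2] + 0 == s[2] + 0 && r[3] + 0 < s[3] + 0)) y--
    print $1, y
  }' "$workdir/start.csv")
printf '%s\n' "$years"
```

Counting whole anniversaries avoids the drift that a fixed 365-day year accumulates over leap years.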