Go Regular Expressions

Table of Contents

  • [1. Go Regular Expressions](#1. Go Regular Expressions)
    • [1.1. Check if the string contains the desired value](#1.1. Check if the string contains the desired value)
    • [1.2. MustCompile should not be used](#1.2. MustCompile should not be used)
    • [1.3. Make the regex string always valid by QuoteMeta](#1.3. Make the regex string always valid by QuoteMeta)
    • [1.4. Find the desired word in a string by FindAllString](#1.4. Find the desired word in a string by FindAllString)
    • [1.5. Extract the desired word/value from a string by Submatch](#1.5. Extract the desired word/value from a string by Submatch)

1. Go Regular Expressions

A regular expression is a pattern that describes a set of strings. It is useful for checking whether a string contains the desired value, and it can also extract data from the string.

In this post, we'll go through the basic usage of regexp.

1.1. Check if the string contains the desired value

Let's start with an easy example. The first one only checks whether the value is contained in a string. The regexp package must be imported to use regular expressions, which are called regex in the rest of this post.

```go
import (
    "fmt"
    "regexp"
)

func runMatch() {
    reg, err := regexp.Compile("ID_\\d")
    if err != nil {
        fmt.Println("error in regex string:", err)
        return // reg is nil here, so calling its methods would panic
    }

    fmt.Println(reg.Match([]byte("ID_1")))   // true
    fmt.Println(reg.Match([]byte("ID_ONE"))) // false

    fmt.Println(reg.MatchString("ID_1"))             // true
    fmt.Println(reg.MatchString("ID_123"))           // true
    fmt.Println(reg.MatchString("something_ID_123")) // true
    fmt.Println(reg.MatchString("ID_ONE"))           // false
}
```

We have to compile the regex string first with regexp.Compile("string here"). It returns a regex instance (a *regexp.Regexp) whose methods do the actual matching.

The example searches for ID_<number>. \d is a metacharacter, that is, a character sequence with a special meaning. It is the same as [0-9] and matches a single digit. A class such as [2-9] matches only digits in the range of 2 to 9.
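To see how the character class changes the result, here is a minimal sketch that compiles the same prefix with different classes. The helper name matchesDigitClass is made up for this example and is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesDigitClass reports whether s contains "ID_" followed by one
// character from the given class, e.g. `\d` or `[2-9]`.
// Hypothetical helper, written only for this illustration.
func matchesDigitClass(class, s string) (bool, error) {
	reg, err := regexp.Compile(`ID_` + class)
	if err != nil {
		return false, err
	}
	return reg.MatchString(s), nil
}

func main() {
	// `\d` and `[0-9]` accept any digit, `[2-9]` rejects 0 and 1.
	for _, class := range []string{`\d`, `[0-9]`, `[2-9]`} {
		for _, s := range []string{"ID_1", "ID_7"} {
			ok, err := matchesDigitClass(class, s)
			if err != nil {
				fmt.Println("error in regex string:", err)
				continue
			}
			fmt.Printf("%-6s %s -> %v\n", class, s, ok)
		}
	}
}
```

Only the combination of [2-9] and ID_1 should report false here, because 1 is outside the range.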

Inside a pattern, a backslash is handled as an escape character. Together with the next character it forms a metacharacter such as \d. If the combination isn't defined, the Compile function returns an error.

regexp.Compile("ID_\\d") used in the example has two backslashes because Go's double-quoted string literals also treat the backslash as an escape character. The sequence \\ produces the single backslash that the regex engine must receive. We can use back quotes instead.

```go
regexp.Compile(`ID_\d`)
```

Back quotes create a raw string literal. A backslash in it is handled as a normal character, so the pattern can be written exactly as the regex engine should see it.
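The two spellings can be compared directly. The following sketch only relies on Go's string literal rules; the function name samePattern is an assumption made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// samePattern reports whether the double-quoted and the back-quoted
// literal hold the same pattern text.
// Hypothetical helper, written only for this illustration.
func samePattern() bool {
	interpreted := "ID_\\d" // Go turns \\ into one backslash
	raw := `ID_\d`          // no escape processing at all
	return interpreted == raw
}

func main() {
	fmt.Println(samePattern()) // true
	fmt.Println(len(`ID_\d`))  // 5: I, D, _, \, d

	reg, err := regexp.Compile(`ID_\d`)
	if err != nil {
		fmt.Println("error in regex string:", err)
		return
	}
	fmt.Println(reg.String()) // ID_\d
}
```

Raw string literals tend to be easier to read once a pattern contains several backslashes.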

The Match method accepts only a byte slice, but the regex instance also provides a string version, MatchString.

```go
fmt.Println(reg.Match([]byte("ID_1")))   // true
fmt.Println(reg.Match([]byte("ID_ONE"))) // false

fmt.Println(reg.MatchString("ID_1"))   // true
fmt.Println(reg.MatchString("ID_ONE")) // false
```

Use the right one depending on the data type.

1.2. MustCompile should not be used

There is also the MustCompile function. It returns a regex instance as well, but it panics if the regex string is invalid.

```go
func panicCompile() {
    defer func() {
        if r := recover(); r != nil {
            // panic:  regexp: Compile(`ID_\p`): error parsing regexp: invalid character class range: `\p`
            fmt.Println("panic: ", r)
        }
    }()

    regexp.MustCompile("ID_\\p")
}
```

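\p introduces a Unicode character class and must be followed by a class name (for example \pL), so on its own it's invalid. Generally, MustCompile should not be used because it panics. It's much better to use the normal Compile function and handle the returned error correctly, especially when the pattern is built at runtime.

Don't use MustCompile without a good reason! One accepted reason is a fixed pattern stored in a package-level variable, which is the use case the regexp documentation names for it.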
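As a rough sketch of that split, the example below keeps MustCompile for a fixed pattern in a package-level variable and uses Compile for anything that arrives at runtime. The names idPattern and compileUserPattern are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// idPattern is a fixed pattern written in the source code. If it were
// invalid, the program would panic at startup, not on some later input.
var idPattern = regexp.MustCompile(`ID_\d`)

// compileUserPattern is a hypothetical helper for patterns that are only
// known at runtime. It returns the error instead of panicking.
func compileUserPattern(pattern string) (*regexp.Regexp, error) {
	reg, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return reg, nil
}

func main() {
	fmt.Println(idPattern.MatchString("ID_1")) // true

	// The invalid pattern from above is rejected without a panic.
	if _, err := compileUserPattern(`ID_\p`); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

With this layout the caller decides how to report a bad pattern, and no recover() is needed.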

1.3. Make the regex string always valid by QuoteMeta

QuoteMeta should be used if the regex string is generated from input. Note that metacharacters in the input lose their special meaning in this case, because QuoteMeta escapes every special character and the text is matched literally.

```go
func useQuoteMeta() {
    originalStr := "ID_\\p"
    quotedStr := regexp.QuoteMeta(originalStr)
    fmt.Println(originalStr) // ID_\p
    fmt.Println(quotedStr)   // ID_\\p
    _, err := regexp.Compile(quotedStr)
    if err != nil {
        fmt.Println("error in regex string")
    }
}
```

The original string is invalid, as we saw in the previous section, but QuoteMeta() turns it into a valid pattern by escaping all special characters. The resulting regex matches the literal text ID_\p.

1.4. Find the desired word in a string by FindAllString

FindString or FindAllString can be used to find the specified word in a string. Regex is only necessary when the word is not fixed, so metacharacters are used here. The following example finds all matches of ID_X, where X is a single digit.

```go
func runFindSomething() {
    reg, _ := regexp.Compile(`ID_\d`)

    text := "ID_1, ID_42, RAW_ID_52, "
    fmt.Printf("%q\n", reg.FindAllString(text, 2)) // ["ID_1" "ID_4"]
    fmt.Printf("%q\n", reg.FindAllString(text, 5)) // ["ID_1" "ID_4" "ID_5"]
    fmt.Printf("%q\n", reg.FindString(text))       // "ID_1"
}
```

Pass the desired value as the second parameter if you want to limit the number of results. If the value is bigger than the number of matches, the result contains all of them. With 1, the result holds the same match that FindString returns, but wrapped in a slice.

Pass -1 if all the matches are needed.

```go
result := reg.FindAllString(text, -1)
fmt.Printf("%q\n", result)    // ["ID_1" "ID_4" "ID_5"]
fmt.Printf("%q\n", result[2]) // "ID_5"
```

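Use an Index variant such as FindAllStringIndex if you need the positions of the matches, for example to cut the string. Each pair holds the start offset and the end offset (exclusive) of one match.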

```go
fmt.Printf("%v\n", reg.FindAllStringIndex(text, 5)) // [[0 4] [6 10] [17 21]]
```

1.5. Extract the desired word/value from a string by Submatch

How can we implement it if we want to know both the key and the value? If the string is ID_1, the key is ID and the value is 1. A Submatch method needs to be used in this case, but the following attempt doesn't work well.

```go
text := "ID_1, ID_42, RAW_ID_52, "
reg, _ := regexp.Compile(`ID_\d`)
fmt.Printf("%q\n", reg.FindAllStringSubmatch(text, -1)) // [["ID_1"] ["ID_4"] ["ID_5"]]
```

The last key should be RAW_ID, the values 42 and 52 are cut off after the first digit, and the matched strings would still need to be split at the underscore _. It's not a good way. The target values can be extracted by using parentheses (), which define capturing groups. To get the key and the value, the pattern can be written in the following way. It contains several metacharacters. If you are not familiar with the syntax, see the official regexp/syntax documentation.

```go
reg2, err := regexp.Compile(`([^,\s]+)_(\d+)`)
if err != nil {
    fmt.Println("error in regex string")
}

fmt.Printf("%q\n", reg2.FindAllString(text, -1)) // ["ID_1" "ID_42" "RAW_ID_52"]
result2 := reg2.FindAllStringSubmatch(text, -1)
fmt.Printf("%q\n", result2) // [["ID_1" "ID" "1"] ["ID_42" "ID" "42"] ["RAW_ID_52" "RAW_ID" "52"]]
fmt.Printf("Key: %s, ID: %s\n", result2[2][1], result2[2][2]) // Key: RAW_ID, ID: 52
```

FindAllString just returns the matched strings, so we would need additional work to get the key and the value. The Submatch method returns them directly. In each result, index 0 is always the whole matched string, the same one that appears in FindAllString. Index 1 corresponds to the first pair of parentheses, which is the key in this case. Index 2 corresponds to the second pair, which is the value. With this result, we can easily use the key and the value in the code that follows.
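The submatch slices can then be turned into typed values. This is a minimal sketch, assuming a made-up pair struct and helper parsePairs; a slice is used instead of a map because the key ID appears twice in the sample text.

```go
package main

import (
	"fmt"
	"regexp"
)

// pair holds one key and value extracted from the text.
// Hypothetical type, written only for this illustration.
type pair struct {
	Key   string
	Value string
}

// parsePairs extracts every <key>_<number> item from text.
func parsePairs(text string) ([]pair, error) {
	reg, err := regexp.Compile(`([^,\s]+)_(\d+)`)
	if err != nil {
		return nil, err
	}

	var pairs []pair
	for _, m := range reg.FindAllStringSubmatch(text, -1) {
		// m[0] is the whole match, m[1] the key, m[2] the value.
		pairs = append(pairs, pair{Key: m[1], Value: m[2]})
	}
	return pairs, nil
}

func main() {
	pairs, err := parsePairs("ID_1, ID_42, RAW_ID_52, ")
	if err != nil {
		fmt.Println("error in regex string:", err)
		return
	}
	for _, p := range pairs {
		fmt.Printf("Key: %s, ID: %s\n", p.Key, p.Value)
	}
}
```

If the value is needed as a number, strconv.Atoi can convert p.Value, since the pattern only lets digits into that group.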
