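Processing JSON Data with Python
25.1 Introduction to JSON
25.1.1 What Is JSON
JSON stands for JavaScript Object Notation. It is a lightweight data-interchange format based on a subset of ECMAScript, and it stores and represents data as text that is completely independent of any programming language. Its concise, clearly layered structure makes JSON well suited to data exchange. Its main strengths: it is easy for people to read, easy for machines to parse and generate, and compact enough to reduce the amount of data sent over the network.
25.1.2 The Two Structures of JSON
Put simply, JSON can be understood as the arrays and objects of JavaScript. These two structures are enough to represent all kinds of complex data.
25.1.2.1 Arrays
In JavaScript, an array is defined with square brackets **[ ]**. A typical definition looks like this: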
let array=["Surpass","28","Shanghai"];
Elements of an array are read by index. An element can be a number, a string, an array, an object, and so on.
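As a minimal sketch of the same idea on the Python side, the snippet below parses a JSON array with the standard json module and reads elements by index. The variable names are illustrative only.

```python
import json

# A JSON array written as text; json.loads turns it into a Python list.
text = '["Surpass", "28", "Shanghai"]'
array = json.loads(text)

first = array[0]   # indexes start at 0
last = array[-1]   # negative indexes count from the end
print(type(array).__name__, first, last)
```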
25.1.2.2 Objects
In JavaScript, an object is defined with curly braces **{ }**. A typical definition looks like this:
let personInfo={
name:"Surpass",
age:28,
location:"Shanghai"
}
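An object is a collection of key/value pairs. In JavaScript a value is read with variable.key. A value can be a number, a string, an array, an object, and so on. Note that in JSON text itself every key must be a double-quoted string, unlike the JavaScript literal above.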
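For comparison, here is a small sketch of reading the same object in Python. It assumes the object arrives as JSON text; after parsing it is a dict, so values are read with square brackets or .get() rather than with dot access. The "nickname" key is a made-up example of a missing key.

```python
import json

# In JSON text, keys are double-quoted strings.
text = '{"name": "Surpass", "age": 28, "location": "Shanghai"}'
person_info = json.loads(text)

name = person_info["name"]                     # raises KeyError if the key is absent
age = person_info["age"]
nickname = person_info.get("nickname", "n/a")  # .get() supplies a default instead
print(name, age, nickname)
```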
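25.1.3 Supported Data Types
JSON supports the following main data types:
- Array: written with square brackets
- Object: written with curly braces
- Integer, floating-point, Boolean and null
- String: must use double quotes; single quotes are not allowed
Multiple items are separated by commas. The table below shows how these types correspond to Python data types: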
JSON | Python |
---|---|
Object | dict |
array | list |
string | str |
number(int) | int |
number(real) | float |
true | True |
false | False |
null | None |
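The mapping above can be checked with a short sketch that decodes one value of each JSON type and records the resulting Python type. It also shows that single-quoted text is rejected by the parser. The one-letter keys are arbitrary.

```python
import json

text = (
    '{"o": {"k": 1}, "a": [1, 2], "s": "hi", '
    '"i": 7, "r": 7.5, "t": true, "f": false, "n": null}'
)
decoded = json.loads(text)

# Map each key to the name of the Python type it was decoded into.
type_names = {key: type(value).__name__ for key, value in decoded.items()}
print(type_names)

# Strings and keys must use double quotes; single quotes are a syntax error.
try:
    json.loads("{'name': 'Surpass'}")
    single_quotes_ok = True
except json.JSONDecodeError:
    single_quotes_ok = False
print(single_quotes_ok)
```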
25.2 JSON Support in Python
25.2.1 Python and JSON Data Types
Python mainly handles JSON data with the json module. Import it before use:
import json
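The json module provides four main functions:
- json.dumps: serialize a Python object to a JSON string
- json.dump: serialize a Python object as JSON to a file-like object
- json.loads: parse a JSON string into a Python object
- json.load: read and parse JSON from a file-like object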
During processing, Python's native types and JSON types are converted into each other. The conversion tables are as follows:
- Python to JSON
Python | JSON |
---|---|
dict | Object |
list | array |
tuple | array |
str | string |
int | number |
float | number |
True | true |
False | false |
None | null |
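A quick way to see this table in action is to encode one sample value per Python type. This is only a sketch; the sample values are arbitrary.

```python
import json

# One sample per row of the table: dict, list, tuple, str, int, float, True, False, None.
samples = [{"k": 1}, [1, 2], (1, 2), "text", 28, 8.95, True, False, None]

encoded = [json.dumps(value) for value in samples]
for value, text in zip(samples, encoded):
    print(f"{type(value).__name__:>8} -> {text}")
```

Both the list and the tuple come out as a JSON array, which is why the reverse direction cannot tell them apart.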
- JSON to Python
JSON | Python |
---|---|
Object | dict |
array | list |
string | str |
number(int) | int |
number(real) | float |
true | True |
false | False |
null | None |
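Because the two tables are not exact mirrors of each other, a round trip through JSON does not always return the original object. The sketch below, with made-up sample data, shows two common cases: a tuple comes back as a list, and a non-string dict key comes back as a string.

```python
import json

original = {"point": (1, 2), 1: "int key", "price": 8.0}

# Encode to JSON text, then decode again.
restored = json.loads(json.dumps(original))
print(restored)

tuple_became_list = isinstance(restored["point"], list)
int_key_became_str = "1" in restored and 1 not in restored
price_is_float = isinstance(restored["price"], float)
```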
25.2.2 Common Methods of the json Module
For details on Python's built-in json module, see my earlier article: https://www.cnblogs.com/surpassme/p/13034972.html
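For quick reference, here is a minimal sketch of the module's four main functions: dumps and loads work on strings, dump and load work on file-like objects. An in-memory io.StringIO buffer stands in for a real file so the example has no side effects.

```python
import io
import json

person_info = {"name": "Surpass", "age": 28, "location": "Shanghai"}

# dumps / loads: Python object <-> JSON string
text = json.dumps(person_info, ensure_ascii=False, indent=2)
from_text = json.loads(text)

# dump / load: Python object <-> file-like object
buffer = io.StringIO()   # stands in for a file opened in text mode
json.dump(person_info, buffer)
buffer.seek(0)           # rewind before reading back
from_file = json.load(buffer)

print(text)
```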
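25.3 Processing JSON Data with JSONPath
The built-in json module is easy and convenient for simple JSON data, but picking values out of large, deeply nested JSON with it still takes some effort. This section uses a third-party tool, JSONPath, for that job.
25.3.1 What Is JSONPath
JSONPath is an expression language for querying JSON data. It is often used to extract values from JSON nested several levels deep, and it works much like XPath, the expression language used to query XML.
25.3.2 Installation
Install it as follows: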
# pip install -U jsonpath
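25.3.3 JSONPath Syntax
JSONPath syntax closely resembles XPath. The table below compares the two:
XPath | JSONPath | Description |
---|---|---|
/ | $ | The root node/element |
. | @ | The current node/element |
/ | . or [] | Child element |
.. | n/a | Parent element |
// | .. | Recursive descent into child elements |
* | * | Wildcard: all elements |
@ | n/a | Attribute access; JSON structures have no attributes |
[] | [] | Subscript operator (supports simple operations such as indexing an array or selecting by content) |
| | [,] | Union operator: selects several names or indexes in one subscript |
n/a | [start:end:step] | Array slice operator |
[] | ?() | Filter expression |
n/a | () | Script expression evaluation |
() | n/a | Grouping; not supported by JSONPath |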
The content above can be found in the official documentation: JSONPath - XPath for JSON
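To make the recursive descent operator (..) more concrete, here is a plain-Python sketch of roughly what an expression such as $..author asks for: every value stored under a given key, at any depth. The find_all helper is hypothetical and written only for illustration. It is not part of the jsonpath package and does not reproduce its implementation.

```python
def find_all(node, key):
    """Collect every value stored under `key`, at any depth, in document order."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_all(value, key))   # keep descending into the value
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


data = {
    "store": {
        "book": [
            {"author": "Nigel Rees", "price": 8.95},
            {"author": "Evelyn Waugh", "price": 12.99},
        ],
        "bicycle": {"color": "red", "price": 19.95},
    }
}

authors = find_all(data, "author")          # comparable to $..author
prices = find_all(data["store"], "price")   # comparable to $.store..price
print(authors, prices)
```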
The comparison below uses the following sample data:
{ "store":
{
"book": [
{ "category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{ "category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{ "category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{ "category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
}
}
XPath | JSONPath | Result |
---|---|---|
/store/book/author | $.store.book[*].author | The author of every book under the book node |
//author | $..author | All authors |
/store/* | $.store.* | Everything in store, that is, book and bicycle |
/store//price | $.store..price | Every price in store |
//book[3] | $..book[2] | All information about the third book |
//book[last()] | $..book[(@.length-1)] or $..book[-1:] | The last book |
//book[position()<3] | $..book[0,1] or $..book[:2] | The first two books |
//book[isbn] | $..book[?(@.isbn)] | Books that have an isbn |
//book[price<10] | $..book[?(@.price<10)] | Books with a price below 10 |
//* | $..* | All elements |
Note that indexes start at 1 in XPath but at 0 in JSONPath.
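The zero-based indexes, slices and filters in the table behave much like ordinary Python list operations. The sketch below shows rough plain-Python equivalents on a shortened copy of the book list. It is meant as an aid for reading the expressions, not as a description of how JSONPath is implemented.

```python
books = [
    {"title": "Sayings of the Century", "price": 8.95},
    {"title": "Sword of Honour", "price": 12.99},
    {"title": "Moby Dick", "isbn": "0-553-21311-3", "price": 8.99},
    {"title": "The Lord of the Rings", "isbn": "0-395-19395-8", "price": 22.99},
]

third = books[2]                                # like $..book[2]  (XPath //book[3])
last = books[-1:]                               # like $..book[-1:]
first_two = books[:2]                           # like $..book[:2]
with_isbn = [b for b in books if "isbn" in b]   # like $..book[?(@.isbn)]
cheap = [b for b in books if b["price"] < 10]   # like $..book[?(@.price<10)]

print(third["title"], [b["title"] for b in cheap])
```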
To practice JSONPath online, see: JSONPath Online Evaluator
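25.3.4 JSONPath Usage
The basic call form is:
jsonPath(obj, expr [, args])
Its parameters are:
- obj (object|array):
The JSON data object
- expr (string):
The JSONPath expression
- args (object|undefined):
Changes the output format, for example whether values or paths are returned.
args.resultType accepts "VALUE", "PATH" or "IPATH" (the Python example in 25.3.5 passes this option as the result_type argument)
- Return type (array|false):
An array means the expression matched data; false means nothing matched.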
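Since a failed match is reported as false rather than as an empty array, looping directly over the result can raise a TypeError. One way to guard against that is sketched below. The matches_or_empty helper is hypothetical, and the two sample values merely stand in for what a jsonpath call might return, so the sketch runs without the third-party package.

```python
def matches_or_empty(result):
    """Turn the False that signals "no match" into an empty list."""
    return [] if result is False else result


# Stand-ins for the return value of a jsonpath call:
found = matches_or_empty(["Nigel Rees", "Evelyn Waugh"])   # a successful match
missing = matches_or_empty(False)                          # nothing matched

# Both results can now be iterated safely.
for author in found:
    print(author)
for author in missing:
    print(author)   # never reached: the list is empty
```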
25.3.5 Using JSONPath in Python
from jsonpath import jsonpath
import json
data = {
"store":
{
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
}
}
# Get the author of every book under the book node
getAllBookAuthor=jsonpath(data,"$.store.book[*].author")
print(f"getAllBookAuthor is :{json.dumps(getAllBookAuthor,indent=4)}")
# Get all authors, wherever they appear
getAllAuthor=jsonpath(data,"$..author")
print(f"getAllAuthor is {json.dumps(getAllAuthor,indent=4)}")
# Get everything in store, that is, book and bicycle
getAllStoreElement=jsonpath(data,"$.store.*")
print(f"getAllStoreElement is {json.dumps(getAllStoreElement,indent=4)}")
# Get every price in store
getAllStorePriceA=jsonpath(data,"$[store]..price")
getAllStorePriceB=jsonpath(data,"$.store..price")
print(f"getAllStorePrictA is {getAllStorePriceA}\ngetAllStorePriceB is {getAllStorePriceB}")
# Get all information about the third book
getThirdBookInfo=jsonpath(data,"$..book[2]")
print(f"getThirdBookInfo is {json.dumps(getThirdBookInfo,indent=4)}")
# Get the last book
getLastBookInfo=jsonpath(data,"$..book[-1:]")
print(f"getLastBookInfo is {json.dumps(getLastBookInfo,indent=4)}")
# Get the first two books
getFirstAndSecondBookInfo=jsonpath(data,"$..book[:2]")
print(f"getFirstAndSecondBookInfo is {json.dumps(getFirstAndSecondBookInfo,indent=4)}")
# Keep only the books that have an isbn
getWithFilterISBN=jsonpath(data,"$..book[?(@.isbn)]")
print(f"getWithFilterISBN is {json.dumps(getWithFilterISBN,indent=4)}")
# Keep only the books with a price below 10
getWithFilterPrice=jsonpath(data,"$..book[?(@.price<10)]")
print(f"getWithFilterPrice is {json.dumps(getWithFilterPrice,indent=4)}")
# Get all elements
getAllElement=jsonpath(data,"$..*")
print(f"getAllElement is {json.dumps(getAllElement,indent=4)}")
# When nothing matches, the result is False
noMatchElement=jsonpath(data,"$..surpass")
print(f"noMatchElement is {noMatchElement}")
# Change the output format: return paths instead of values
controlleOutput=jsonpath(data,expr="$..author",result_type="PATH")
print(f"controlleOutput is {json.dumps(controlleOutput,indent=4)}")
The final output is as follows:
getAllBookAuthor is :[
"Nigel Rees",
"Evelyn Waugh",
"Herman Melville",
"J. R. R. Tolkien"
]
getAllAuthor is [
"Nigel Rees",
"Evelyn Waugh",
"Herman Melville",
"J. R. R. Tolkien"
]
getAllStoreElement is [
[
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
{
"color": "red",
"price": 19.95
}
]
getAllStorePrictA is [8.95, 12.99, 8.99, 22.99, 19.95]
getAllStorePriceB is [8.95, 12.99, 8.99, 22.99, 19.95]
getThirdBookInfo is [
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
}
]
getLastBookInfo is [
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
]
getFirstAndSecondBookInfo is [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
}
]
getWithFilterISBN is [
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
]
getWithFilterPrice is [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
}
]
getAllElement is [
{
"book": [
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
"bicycle": {
"color": "red",
"price": 19.95
}
},
[
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
}
],
{
"color": "red",
"price": 19.95
},
{
"category": "reference",
"author": "Nigel Rees",
"title": "Sayings of the Century",
"price": 8.95
},
{
"category": "fiction",
"author": "Evelyn Waugh",
"title": "Sword of Honour",
"price": 12.99
},
{
"category": "fiction",
"author": "Herman Melville",
"title": "Moby Dick",
"isbn": "0-553-21311-3",
"price": 8.99
},
{
"category": "fiction",
"author": "J. R. R. Tolkien",
"title": "The Lord of the Rings",
"isbn": "0-395-19395-8",
"price": 22.99
},
"reference",
"Nigel Rees",
"Sayings of the Century",
8.95,
"fiction",
"Evelyn Waugh",
"Sword of Honour",
12.99,
"fiction",
"Herman Melville",
"Moby Dick",
"0-553-21311-3",
8.99,
"fiction",
"J. R. R. Tolkien",
"The Lord of the Rings",
"0-395-19395-8",
22.99,
"red",
19.95
]
noMatchElement is False
controlleOutput is [
"$['store']['book'][0]['author']",
"$['store']['book'][1]['author']",
"$['store']['book'][2]['author']",
"$['store']['book'][3]['author']"
]
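The path strings returned with result_type="PATH" can be turned back into values with a little standard-library code. The sketch below assumes paths in exactly the bracketed form shown in the output above. The resolve_path helper is hypothetical, not something the jsonpath package provides, and it does not handle keys that contain quotes.

```python
import re


def resolve_path(data, path):
    """Follow a normalized path such as $['store']['book'][0]['author']."""
    node = data
    # Each step is either ['key'] (first group) or [index] (second group).
    for key, index in re.findall(r"\['([^']*)'\]|\[(\d+)\]", path):
        node = node[key] if index == "" else node[int(index)]
    return node


data = {
    "store": {
        "book": [
            {"author": "Nigel Rees"},
            {"author": "Evelyn Waugh"},
        ]
    }
}

author = resolve_path(data, "$['store']['book'][1]['author']")
whole_document = resolve_path(data, "$")   # no steps: returns the root itself
print(author)
```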