Serialization in Rust: Serde (Part 1)


What is Serde?

Serde is a framework for serializing and deserializing Rust data structures efficiently and generically.
The name is a contraction of "serialization" and "deserialization", which is exactly what the framework does for Rust data structures.


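What are serialization and deserialization?

Serialization converts structured data into a form that is easier to store or transmit, such as a byte stream. Deserialization does the reverse: it restores the stream to its original structure so that the program can parse it and apply its logic. The two are most commonly used in network communication, with well-known formats such as protobuf.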
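
To make the round trip concrete, here is a minimal sketch that uses only the standard library and does not involve Serde at all. The `Message` type, the `to_bytes` and `from_bytes` functions, and the byte layout are all assumptions invented for this illustration.

```rust
// Hypothetical record type used only for this illustration.
#[derive(Debug, PartialEq)]
struct Message {
    id: u64,
    text: String,
}

// Serialize: the u64 as 8 little-endian bytes, then the string as a
// 4-byte little-endian length followed by its UTF-8 bytes.
fn to_bytes(msg: &Message) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&msg.id.to_le_bytes());
    out.extend_from_slice(&(msg.text.len() as u32).to_le_bytes());
    out.extend_from_slice(msg.text.as_bytes());
    out
}

// Deserialize: read the fields back in the same order.
// Returns None if the input is truncated or is not valid UTF-8.
fn from_bytes(bytes: &[u8]) -> Option<Message> {
    let mut id_buf = [0u8; 8];
    id_buf.copy_from_slice(bytes.get(0..8)?);
    let mut len_buf = [0u8; 4];
    len_buf.copy_from_slice(bytes.get(8..12)?);
    let len = u32::from_le_bytes(len_buf) as usize;
    let text = std::str::from_utf8(bytes.get(12..12 + len)?).ok()?.to_owned();
    Some(Message { id: u64::from_le_bytes(id_buf), text })
}

fn main() {
    let original = Message { id: 7, text: "hello".to_string() };
    let wire = to_bytes(&original);
    let restored = from_bytes(&wire).expect("well-formed input");
    assert_eq!(original, restored);
    println!("{} bytes on the wire, restored: {:?}", wire.len(), restored);
}
```

Writing this by hand for every type and every format does not scale, which is the problem Serde addresses.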

How Serde works

[Figure: overview of how Serde works]

Serde Data Model

The Serde data model is the API by which data structures and data formats interact. You can think of it as Serde's type system.

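Here "data structures" means your own data types, and "data formats" means encodings such as JSON.

The data model comprises the Serializer and Deserializer APIs as well as the Visitor API. Each API is a set of methods, and each method corresponds to one type in the data model.

In other words, the Serde data model is the intermediate layer of the whole conversion. A data type and a data format know nothing about each other; each side only needs to map its own data onto the types of the Serde data model through this API.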
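
The decoupling described above can be sketched with a toy model built on the standard library only. This is not Serde's real API: `MiniSerializer`, `MiniSerialize`, `JsonLike`, `KeyValue` and `Point` are hypothetical names, and the "data model" here has just two value types.

```rust
// Toy model of the idea, NOT Serde's real traits.

// The "data model": the small set of calls every format must understand.
trait MiniSerializer {
    fn begin_struct(&mut self, name: &str);
    fn field_u64(&mut self, key: &str, value: u64);
    fn field_str(&mut self, key: &str, value: &str);
    fn end(&mut self);
}

// Implemented by data types, which describe themselves using only the data model.
trait MiniSerialize {
    fn serialize(&self, s: &mut dyn MiniSerializer);
}

struct Point {
    x: u64,
    label: String,
}

impl MiniSerialize for Point {
    fn serialize(&self, s: &mut dyn MiniSerializer) {
        s.begin_struct("Point");
        s.field_u64("x", self.x);
        s.field_str("label", &self.label);
        s.end();
    }
}

// Format 1: JSON-like text (no string escaping, to keep the sketch short).
#[derive(Default)]
struct JsonLike {
    out: String,
    first: bool,
}

impl JsonLike {
    // Emit a comma before every field except the first one.
    fn separator(&mut self) {
        if !self.first {
            self.out.push(',');
        }
        self.first = false;
    }
}

impl MiniSerializer for JsonLike {
    fn begin_struct(&mut self, _name: &str) {
        self.out.push('{');
        self.first = true;
    }
    fn field_u64(&mut self, key: &str, value: u64) {
        self.separator();
        self.out.push_str(&format!("\"{}\":{}", key, value));
    }
    fn field_str(&mut self, key: &str, value: &str) {
        self.separator();
        self.out.push_str(&format!("\"{}\":\"{}\"", key, value));
    }
    fn end(&mut self) {
        self.out.push('}');
    }
}

// Format 2: one key=value pair per line under a [section] header.
#[derive(Default)]
struct KeyValue {
    out: String,
}

impl MiniSerializer for KeyValue {
    fn begin_struct(&mut self, name: &str) {
        self.out.push_str(&format!("[{}]\n", name));
    }
    fn field_u64(&mut self, key: &str, value: u64) {
        self.out.push_str(&format!("{}={}\n", key, value));
    }
    fn field_str(&mut self, key: &str, value: &str) {
        self.out.push_str(&format!("{}={}\n", key, value));
    }
    fn end(&mut self) {}
}

fn main() {
    let p = Point { x: 3, label: "origin".to_string() };
    let mut json = JsonLike::default();
    p.serialize(&mut json);
    let mut kv = KeyValue::default();
    p.serialize(&mut kv);
    println!("{}", json.out);
    println!("{}", kv.out);
}
```

`Point` never mentions either format, and neither format mentions `Point`; both depend only on the shared trait. Serde's real data model plays the same role with a much richer set of types.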

Visitor API

    // Methods of the serde::de::Visitor<'de> trait
    fn expecting(&self, formatter: &mut Formatter<'_>) -> fmt::Result;
    fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_i8<E>(self, v: i8) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_i16<E>(self, v: i16) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_i32<E>(self, v: i32) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_i64<E>(self, v: i64) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_i128<E>(self, v: i128) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_u8<E>(self, v: u8) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_u16<E>(self, v: u16) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_u32<E>(self, v: u32) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_u64<E>(self, v: u64) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_u128<E>(self, v: u128) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_f32<E>(self, v: f32) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_f64<E>(self, v: f64) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_char<E>(self, v: char) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_borrowed_str<E>(self, v: &'de str) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_string<E>(self, v: String) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_bytes<E>(self, v: &[u8]) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_borrowed_bytes<E>(self, v: &'de [u8]) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_byte_buf<E>(self, v: Vec<u8>) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_none<E>(self) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_some<D>(self, deserializer: D) -> Result<Self::Value, D::Error>
       where D: Deserializer<'de> { ... }
    fn visit_unit<E>(self) -> Result<Self::Value, E>
       where E: Error { ... }
    fn visit_newtype_struct<D>(
        self,
        deserializer: D,
    ) -> Result<Self::Value, D::Error>
       where D: Deserializer<'de> { ... }
    fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
       where A: SeqAccess<'de> { ... }
    fn visit_map<A>(self, map: A) -> Result<Self::Value, A::Error>
       where A: MapAccess<'de> { ... }
    fn visit_enum<A>(self, data: A) -> Result<Self::Value, A::Error>
       where A: EnumAccess<'de> { ... }

Serializer API

    // Provided methods
    fn serialize_i128(self, v: i128) -> Result<Self::Ok, Self::Error> { ... }
    fn serialize_u128(self, v: u128) -> Result<Self::Ok, Self::Error> { ... }
    fn collect_seq<I>(self, iter: I) -> Result<Self::Ok, Self::Error>
       where I: IntoIterator,
             <I as IntoIterator>::Item: Serialize { ... }
    fn collect_map<K, V, I>(self, iter: I) -> Result<Self::Ok, Self::Error>
       where K: Serialize,
             V: Serialize,
             I: IntoIterator<Item = (K, V)> { ... }
    fn collect_str<T>(self, value: &T) -> Result<Self::Ok, Self::Error>
       where T: ?Sized + Display { ... }
    fn is_human_readable(&self) -> bool { ... }
    // Required methods
    fn serialize_bool(self, v: bool) -> Result<Self::Ok, Self::Error>;
    fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error>;
    fn serialize_i16(self, v: i16) -> Result<Self::Ok, Self::Error>;
    fn serialize_i32(self, v: i32) -> Result<Self::Ok, Self::Error>;
    fn serialize_i64(self, v: i64) -> Result<Self::Ok, Self::Error>;
    fn serialize_u8(self, v: u8) -> Result<Self::Ok, Self::Error>;
    fn serialize_u16(self, v: u16) -> Result<Self::Ok, Self::Error>;
    fn serialize_u32(self, v: u32) -> Result<Self::Ok, Self::Error>;
    fn serialize_u64(self, v: u64) -> Result<Self::Ok, Self::Error>;
    fn serialize_f32(self, v: f32) -> Result<Self::Ok, Self::Error>;
    fn serialize_f64(self, v: f64) -> Result<Self::Ok, Self::Error>;
    fn serialize_char(self, v: char) -> Result<Self::Ok, Self::Error>;
    fn serialize_str(self, v: &str) -> Result<Self::Ok, Self::Error>;
    fn serialize_bytes(self, v: &[u8]) -> Result<Self::Ok, Self::Error>;
    fn serialize_none(self) -> Result<Self::Ok, Self::Error>;
    fn serialize_some<T>(self, value: &T) -> Result<Self::Ok, Self::Error>
       where T: ?Sized + Serialize;
    fn serialize_unit(self) -> Result<Self::Ok, Self::Error>;
    ...

Deserializer API

    fn deserialize_any<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_bool<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_i8<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_i16<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_i32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_i64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_u8<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_u16<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_u32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_u64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_f32<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_f64<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_char<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_str<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_string<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_bytes<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_byte_buf<V>(
        self,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_option<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_unit<V>(self, visitor: V) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_unit_struct<V>(
        self,
        name: &'static str,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
    fn deserialize_newtype_struct<V>(
        self,
        name: &'static str,
        visitor: V,
    ) -> Result<V::Value, Self::Error>
       where V: Visitor<'de>;
       ...
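
Every Deserializer method above takes a Visitor, and the division of labor between the two can be sketched with a toy model. This is a simplified illustration using only the standard library, not Serde's real traits: `MiniVisitor`, `TextDeserializer`, `U64Visitor` and `StringVisitor` are hypothetical names, and errors are plain strings.

```rust
// Toy model, NOT Serde's real traits.

// The visitor is the data type's side: it says what to do with each
// kind of value the format might hand over.
trait MiniVisitor {
    type Value;
    fn visit_u64(self, v: u64) -> Result<Self::Value, String>;
    fn visit_str(self, v: &str) -> Result<Self::Value, String>;
}

// The deserializer is the format's side: it inspects the input and
// calls whichever visitor method matches what it found.
struct TextDeserializer<'a> {
    input: &'a str,
}

impl<'a> TextDeserializer<'a> {
    fn deserialize_any<V: MiniVisitor>(self, visitor: V) -> Result<V::Value, String> {
        match self.input.parse::<u64>() {
            Ok(n) => visitor.visit_u64(n),
            Err(_) => visitor.visit_str(self.input),
        }
    }
}

// A visitor that builds a u64 and rejects anything else.
struct U64Visitor;

impl MiniVisitor for U64Visitor {
    type Value = u64;
    fn visit_u64(self, v: u64) -> Result<u64, String> {
        Ok(v)
    }
    fn visit_str(self, v: &str) -> Result<u64, String> {
        Err(format!("expected an unsigned integer, found string {:?}", v))
    }
}

// A visitor that accepts both kinds of input and builds a String.
struct StringVisitor;

impl MiniVisitor for StringVisitor {
    type Value = String;
    fn visit_u64(self, v: u64) -> Result<String, String> {
        Ok(v.to_string())
    }
    fn visit_str(self, v: &str) -> Result<String, String> {
        Ok(v.to_owned())
    }
}

fn main() {
    let n = TextDeserializer { input: "42" }.deserialize_any(U64Visitor);
    let s = TextDeserializer { input: "abc" }.deserialize_any(StringVisitor);
    let e = TextDeserializer { input: "abc" }.deserialize_any(U64Visitor);
    println!("{:?} {:?} {:?}", n, s, e);
}
```

The format decides which `visit_*` method to call, and the visitor decides what value to build or whether to reject the input. The real traits follow the same pattern across the full list of methods shown above.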

A worked example

Steps:

  1. Initialize the project:
cargo init whatserde
  2. Add serde to Cargo.toml:
[dependencies]
serde = { version = "1", features = ["derive"] }
  3. main.rs:
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct MyTestData {
    a: u64,
    b: String,
}
fn main() {
    println!("Hello, world!");
}
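  4. Expand the derive macros with cargo expand.
    If cargo-expand is not installed yet, install it first by following its installation instructions (typically `cargo install cargo-expand`).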
cargo expand > expand.rs

Running this produces the expanded code below (shown in part):

1) Serialization code:
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
    #[allow(unused_extern_crates, clippy::useless_attribute)]
    extern crate serde as _serde;
    #[automatically_derived]
    impl _serde::Serialize for MyTestData {
        fn serialize<__S>(
            &self,
            __serializer: __S,
        ) -> _serde::__private::Result<__S::Ok, __S::Error>
        where
            __S: _serde::Serializer,
        {
            let mut __serde_state = _serde::Serializer::serialize_struct(
                __serializer,
                "MyTestData",
                false as usize + 1 + 1,
            )?;
            _serde::ser::SerializeStruct::serialize_field(
                &mut __serde_state,
                "a",
                &self.a,
            )?;
            _serde::ser::SerializeStruct::serialize_field(
                &mut __serde_state,
                "b",
                &self.b,
            )?;
            _serde::ser::SerializeStruct::end(__serde_state)
        }
    }
};

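As we can see, the generated code first calls serialize_struct, then serializes fields a and b in turn with serialize_field, and finally calls end. The third argument of serialize_struct, false as usize + 1 + 1, evaluates to 2, the number of fields. Nested data types are handled the same way: each level is serialized recursively, and end marks the close of that level.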

2) Deserialization code:

            #[doc(hidden)]
            const FIELDS: &'static [&'static str] = &["a", "b"];
            _serde::Deserializer::deserialize_struct(
                __deserializer,
                "MyTestData",
                FIELDS,
                __Visitor {
                    marker: _serde::__private::PhantomData::<MyTestData>,
                    lifetime: _serde::__private::PhantomData,
                },
            )

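Here deserialize_struct is a method of the Deserializer trait (not a tuple struct). It is called with four arguments: the deserializer __deserializer, the struct name, the field list FIELDS, and a Visitor.

The field list FIELDS is defined as:
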
const FIELDS: &'static [&'static str] = &["a", "b"];

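FIELDS tells the deserializer which fields the struct expects, and in what order. For formats that encode a struct as a sequence, the Visitor reads the fields in exactly this order; for map-like formats such as JSON, it matches each incoming key against these names. Either way, the corresponding deserialization method is called for each field until no fields remain.

How do we arrive at this conclusion? Consider the beginning of the generated Deserialize impl:
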
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
    #[allow(unused_extern_crates, clippy::useless_attribute)]
    extern crate serde as _serde;
    #[automatically_derived]
    impl<'de> _serde::Deserialize<'de> for MyTestData {
        fn deserialize<__D>(
            __deserializer: __D,
        ) -> _serde::__private::Result<Self, __D::Error>
        where
            __D: _serde::Deserializer<'de>,
        {
            #[allow(non_camel_case_types)]
            #[doc(hidden)]
            enum __Field {
                __field0,
                __field1,
                __ignore,
            }
            #[doc(hidden)]
            struct __FieldVisitor;

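The enum has the variants __field0, __field1 and __ignore. This indirectly confirms that Serde uses __FieldVisitor, together with FIELDS, to identify each field and deserialize it.

What is __ignore?

By default, Serde tolerates input that carries more fields than the target data type declares (but not fewer). Unknown fields are mapped to __ignore and skipped, which greatly improves compatibility (somewhat like optional fields in protobuf). As long as every field that deserialization requires is present, it succeeds.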

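Serde supports many attributes that restrict or extend this behavior:

#[serde(rename = "?")] serializes and deserializes the field under a different name.

#[serde(bound = "T: MyTrait")] sets the trait bounds on the generated impls, so that only types implementing the given trait can be serialized or deserialized.

#[serde(default)] gives the field its default value when it is missing from the input, instead of reporting an error.

#[serde(crate = "...")] specifies the path of the serde crate that the generated code should refer to, which is useful when serde is re-exported or imported under another name.
See the Serde attributes documentation for the full list.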

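What is 'de?

Note that deserialization introduces a lifetime named 'de. The lifetimes we usually encounter are either 'static or a single letter such as 'a.

Here is the official explanation: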

This lifetime is what enables Serde to safely perform efficient zero-copy deserialization across a variety of data formats, something that would be impossible or recklessly unsafe in languages other than Rust.

Zero-copy deserialization means deserializing into a data structure, like the User struct below, that borrows string or byte array data from the string or byte array holding the input. This avoids allocating memory to store a string for each individual field and then copying string data out of the input over to the newly allocated field. Rust guarantees that the input data outlives the period during which the output data structure is in scope, meaning it is impossible to have dangling pointer errors as a result of losing the input data while the output data structure still refers to it.

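In other words, thanks to Rust's lifetime rules, Serde can perform zero-copy deserialization both safely and efficiently, something that would be almost inevitably unsafe in other languages.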

use serde::Deserialize;

#[derive(Deserialize)]
struct User<'a> {
    id: u32,
    name: &'a str,
    screen_name: &'a str,
    location: &'a str,
}

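Rust guarantees that the input data outlives the output data structure for as long as the latter is in scope. A dangling pointer therefore cannot occur while the output structure still refers to the input, which keeps the program both safe and efficient.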
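
The borrowing idea can be demonstrated without Serde. The following standard-library sketch uses a hypothetical `parse_user` function and a made-up comma-separated input format; its `User` is a reduced version of the struct above and only illustrates how the output borrows from the input.

```rust
// Hypothetical toy parser, not Serde: it shows the borrowing idea only.
#[derive(Debug, PartialEq)]
struct User<'a> {
    id: u32,
    name: &'a str,     // borrows from the input, no new String is allocated
    location: &'a str, // same here
}

// Parse "id,name,location". The returned User borrows from `input`,
// so the signature ties both to the same lifetime 'a.
fn parse_user<'a>(input: &'a str) -> Option<User<'a>> {
    let mut parts = input.splitn(3, ',');
    let id = parts.next()?.parse().ok()?;
    let name = parts.next()?;
    let location = parts.next()?;
    Some(User { id, name, location })
}

fn main() {
    let input = String::from("7,alice,Berlin");
    let user = parse_user(&input).expect("well-formed input");

    // `name` points into the buffer owned by `input` rather than at a copy.
    assert_eq!(user.name.as_ptr(), input[2..].as_ptr());
    println!("{:?}", user);

    // Uncommenting the next two lines fails to compile, because `user`
    // would still refer to `input` after it has been dropped:
    // drop(input);
    // println!("{:?}", user);
}
```

In Serde, the 'de lifetime plays the role that 'a plays in `parse_user`: it names how long the input lives, so that borrowed fields in the output cannot outlive it.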


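Summary

That concludes this discussion of Serde's basic principles. A follow-up post will look at how to implement custom serialization and deserialization.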
