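### Overview
Compress an 18-character ID number (including its check digit) into a 48-bit integer that fits in the low 48 bits of a `u64`. The layout, from most to least significant bits, is:

| Field | Width | Description |
|---|---|---|
| addr | 20 bits | Administrative division code (top 3 bits = 0 means non-mainland, anything else means mainland) |
| date | 18 bits | Birth date as a day offset from 2000-03-01 (stored with a bias of +131072) |
| seq | 10 bits | Sequence code (ID characters 15-17, 0 to 999) |

Total width: 20 + 18 + 10 = 48 bits.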
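Because only the low 48 bits carry information, the packed value can also be serialized as 6 bytes. The following is a minimal sketch of that idea, not part of the scheme itself; the helper names `to_bytes48` and `from_bytes48` and the choice of big-endian byte order are assumptions made for illustration.

```rust
/// Mask selecting the low 48 bits of a u64.
const MASK_48: u64 = (1u64 << 48) - 1;

/// Serialize the low 48 bits of `packed` as 6 big-endian bytes.
/// (`to_bytes48` is a hypothetical helper name.)
pub fn to_bytes48(packed: u64) -> [u8; 6] {
    let full = (packed & MASK_48).to_be_bytes(); // 8 bytes, the top 2 are zero
    let mut out = [0u8; 6];
    out.copy_from_slice(&full[2..]);
    out
}

/// Rebuild the 48-bit value from its 6-byte form.
pub fn from_bytes48(bytes: [u8; 6]) -> u64 {
    let mut full = [0u8; 8];
    full[2..].copy_from_slice(&bytes);
    u64::from_be_bytes(full)
}
```

Big-endian order keeps the byte-wise sort order identical to the numeric order of the packed values, which can be convenient for keys in an ordered store.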
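### Address encoding (20 bits)
#### Non-mainland (top 3 bits = 0)
- `addr = 1` → Taiwan (710000)
- `addr = 2` → Hong Kong (810000)
- `addr = 3` → Macau (820000)
- `addr = 4` → Overseas (910000)
- Non-mainland test: `(addr >> 17) == 0`

#### Mainland (top 3 bits ≥ 1)
The 6-digit address code `ABCCDD` is split into:
- `prov_high`: first digit of the province code (1-6 for mainland provinces) → 3 bits
- `prov_low`: second digit of the province code (0-7, e.g. 7 in Shandong's 37) → 3 bits
- `city`: prefecture-level code (00-99) → 7 bits
- `district`: county-level code (00-99) → 7 bits

Packing: `addr = (prov_high << 17) | (prov_low << 14) | (city << 7) | district`

Unpacking reverses the shifts: `prov_high = (addr >> 17) & 0x7`, `prov_low = (addr >> 14) & 0x7`, `city = (addr >> 7) & 0x7F`, `district = addr & 0x7F`. Each province digit must be at most 7 to fit its 3-bit field, which is why the codes beginning with 8 and 9 are handled only through the special values above.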
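As a worked instance of the packing rule, the address code 110101 splits into `prov_high = 1`, `prov_low = 1`, `city = 1`, `district = 1`, which gives `(1 << 17) | (1 << 14) | (1 << 7) | 1 = 0x24081`. The sketch below checks the field widths before packing; `pack_mainland` and `unpack_mainland` are illustrative names, and returning `Option` is an assumption rather than part of the scheme.

```rust
/// Pack the four parts of a mainland address code into 20 bits.
/// Returns None when a part does not fit its field, or when `prov_high`
/// is 0 and would collide with the non-mainland marker.
pub fn pack_mainland(prov_high: u32, prov_low: u32, city: u32, district: u32) -> Option<u32> {
    if !(1..=7).contains(&prov_high) || prov_low > 7 || city > 99 || district > 99 {
        return None;
    }
    Some((prov_high << 17) | (prov_low << 14) | (city << 7) | district)
}

/// Split a packed mainland address back into (prov_high, prov_low, city, district).
pub fn unpack_mainland(addr: u32) -> (u32, u32, u32, u32) {
    (
        (addr >> 17) & 0x7,
        (addr >> 14) & 0x7,
        (addr >> 7) & 0x7F,
        addr & 0x7F,
    )
}
```

A code such as 810000 is rejected here because 8 needs four bits; under this layout it can only be represented through the special non-mainland values.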
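### Date encoding (18 bits)
- Base date: 2000-03-01
- Offset: `offset = (birth_date - base_date).days`
- Stored value: `stored = offset + 131072`, which keeps the value non-negative in the range 0 to 262143. The representable window therefore spans 131072 days before the base date to 131071 days after it, about 359 years on either side.
- Decoding: `offset = stored - 131072`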
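The offset arithmetic can be checked without a date library. The sketch below swaps in Howard Hinnant's `days_from_civil` algorithm for the day count; that substitution, the helper name `encode_date_offset`, and the `Option` return type are assumptions for illustration only.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date, following
/// Howard Hinnant's `days_from_civil` algorithm (used here instead of chrono).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // count Jan/Feb as part of the previous year
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

const BIAS: i64 = 1 << 17; // 131072
const LIMIT: i64 = 1 << 18; // stored values must lie in 0..LIMIT

/// Hypothetical helper: the biased 18-bit value for a birth date,
/// or None when the date falls outside the representable window.
pub fn encode_date_offset(y: i64, m: i64, d: i64) -> Option<u32> {
    let offset = days_from_civil(y, m, d) - days_from_civil(2000, 3, 1);
    let stored = offset + BIAS;
    if (0..LIMIT).contains(&stored) {
        Some(stored as u32)
    } else {
        None
    }
}
```

For 1990-03-07 the offset is -3647 days, so the stored value is 131072 - 3647 = 127425. A date such as 1600-01-01 lies more than 131072 days before the base date and yields `None`.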
### Sequence code (10 bits)
Characters 15-17 of the ID number, stored directly as an integer (0 to 999).
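### Check digit handling
- The check digit is not stored when compressing: it is derived from the first 17 digits and is therefore redundant.
- When decompressing, the check digit is recomputed from the first 17 digits to restore the full 18 characters. As a consequence, an input whose check digit is wrong decodes to a different string (one with the correct check digit).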
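Since the check digit is discarded, validating it before compression is what guarantees an exact round trip. The following is a minimal sketch of such a check using the standard weights and remainder table; `check_char` and `has_valid_check_char` are hypothetical helper names, and rejecting a lowercase `x` is an assumption.

```rust
const WEIGHTS: [u32; 17] = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
const CHECK_DIGITS: &[u8; 11] = b"10X98765432";

/// Compute the check character from the first 17 digits.
/// Returns None when the input is not exactly 17 decimal digits.
pub fn check_char(id17: &str) -> Option<char> {
    if id17.len() != 17 {
        return None;
    }
    let mut sum = 0u32;
    for (ch, w) in id17.chars().zip(WEIGHTS.iter()) {
        sum += ch.to_digit(10)? * *w;
    }
    Some(CHECK_DIGITS[(sum % 11) as usize] as char)
}

/// True when the 18th character matches the computed check character.
pub fn has_valid_check_char(id18: &str) -> bool {
    id18.len() == 18
        && id18.is_char_boundary(17)
        && check_char(&id18[..17]) == id18[17..].chars().next()
}
```

For the prefix `11010119900307663` the weighted sum is 230, and 230 mod 11 = 10 selects the last table entry, `2`.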
### Rust implementation
```rust
use chrono::{Datelike, Duration, NaiveDate};

const OFFSET_BIAS: i64 = 1 << 17; // 131072
const MASK_ADDR: u64 = 0xFFFFF; // 20 bits
const MASK_DATE: u64 = 0x3FFFF; // 18 bits
const MASK_SEQ: u64 = 0x3FF; // 10 bits
const SHIFT_DATE: u64 = 10;
const SHIFT_ADDR: u64 = 28;

// Weights and remainder table for the check digit
const WEIGHTS: [u8; 17] = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2];
const CHECK_DIGITS: &[u8] = b"10X98765432";

/// Base date for the day offset. Built in a function rather than a `const`
/// so the code does not rely on `Option::unwrap` being usable in const context.
fn base_date() -> NaiveDate {
    NaiveDate::from_ymd_opt(2000, 3, 1).expect("2000-03-01 is a valid date")
}

/// Compute the check digit from the first 17 digits.
fn checksum(id17: &str) -> char {
    let sum: u32 = id17
        .chars()
        .take(17)
        .zip(WEIGHTS.iter())
        .map(|(ch, w)| ch.to_digit(10).unwrap() * (*w as u32))
        .sum();
    CHECK_DIGITS[(sum % 11) as usize] as char
}

/// Encode a 6-digit address code as a 20-bit integer.
fn encode_address(addr6: &str) -> u32 {
    match addr6 {
        "710000" => return 1, // Taiwan
        "810000" => return 2, // Hong Kong
        "820000" => return 3, // Macau
        "910000" => return 4, // Overseas
        _ => {}
    }
    // Mainland address
    let prov_high = addr6[0..1].parse::<u32>().unwrap();
    let prov_low = addr6[1..2].parse::<u32>().unwrap();
    let city = addr6[2..4].parse::<u32>().unwrap(); // 00-99
    let district = addr6[4..6].parse::<u32>().unwrap(); // 00-99
    // Each province digit has only 3 bits, and a leading 0 would collide with
    // the non-mainland marker, so reject such codes instead of corrupting bits.
    assert!(
        (1..=7).contains(&prov_high) && prov_low <= 7,
        "unsupported address code: {}",
        addr6
    );
    (prov_high << 17) | (prov_low << 14) | (city << 7) | district
}

/// Restore the 6-digit address code from its 20-bit form.
fn decode_address(addr_code: u32) -> String {
    // Top 3 bits are 0 -> non-mainland
    if (addr_code >> 17) == 0 {
        return match addr_code {
            1 => "710000".to_string(),
            2 => "810000".to_string(),
            3 => "820000".to_string(),
            4 => "910000".to_string(),
            _ => panic!("Unknown non-mainland type: {}", addr_code),
        };
    }
    // Mainland address
    let prov_high = (addr_code >> 17) & 0x7;
    let prov_low = (addr_code >> 14) & 0x7;
    let city = (addr_code >> 7) & 0x7F;
    let district = addr_code & 0x7F;
    format!("{}{}{:02}{:02}", prov_high, prov_low, city, district)
}

/// Encode a birth date as a biased 18-bit integer.
fn encode_date(birth: NaiveDate) -> u32 {
    let stored = birth.signed_duration_since(base_date()).num_days() + OFFSET_BIAS;
    assert!(
        (0..=MASK_DATE as i64).contains(&stored),
        "birth date out of range: {}",
        birth
    );
    stored as u32
}

/// Restore the birth date from its 18-bit form.
fn decode_date(stored: u32) -> NaiveDate {
    let delta = stored as i64 - OFFSET_BIAS;
    base_date() + Duration::days(delta)
}

/// Encode an 18-character ID number into a 48-bit integer (low 48 bits are significant).
pub fn encode_id(id18: &str) -> u64 {
    let id17 = &id18[..17];
    let addr6 = &id17[0..6];
    let year = id17[6..10].parse::<i32>().unwrap();
    let month = id17[10..12].parse::<u32>().unwrap();
    let day = id17[12..14].parse::<u32>().unwrap();
    let birth = NaiveDate::from_ymd_opt(year, month, day).expect("Invalid date");
    let seq = id17[14..17].parse::<u32>().unwrap(); // 0-999

    let addr_code = encode_address(addr6) as u64;
    let date_code = encode_date(birth) as u64;
    let seq_code = seq as u64;
    (addr_code << SHIFT_ADDR) | (date_code << SHIFT_DATE) | seq_code
}

/// Decode a 48-bit integer into the full 18-character ID number (with check digit).
pub fn decode_id(compressed: u64) -> String {
    let addr_code = ((compressed >> SHIFT_ADDR) & MASK_ADDR) as u32;
    let date_code = ((compressed >> SHIFT_DATE) & MASK_DATE) as u32;
    let seq = (compressed & MASK_SEQ) as u32;

    let addr6 = decode_address(addr_code);
    let birth = decode_date(date_code);
    // One placeholder per argument: address, year, month, day, sequence code
    let id17 = format!(
        "{}{:04}{:02}{:02}{:03}",
        addr6,
        birth.year(),
        birth.month(),
        birth.day(),
        seq
    );
    let check = checksum(&id17);
    format!("{}{}", id17, check)
}

#[cfg(test)]
mod tests {
    use super::*;

    // The sample IDs carry check digits that match the checksum algorithm;
    // the round trip can only succeed for inputs with a valid check digit.
    #[test]
    fn test_mainland() {
        let original = "110101199003076632";
        assert_eq!(original, decode_id(encode_id(original)));
    }

    #[test]
    fn test_taiwan() {
        let original = "710000198001011232";
        assert_eq!(original, decode_id(encode_id(original)));
    }

    #[test]
    fn test_hongkong() {
        let original = "810000199912312341";
        assert_eq!(original, decode_id(encode_id(original)));
    }

    #[test]
    fn test_macau() {
        let original = "820000200001015677";
        assert_eq!(original, decode_id(encode_id(original)));
    }

    #[test]
    fn test_overseas() {
        let original = "910000199509090981";
        assert_eq!(original, decode_id(encode_id(original)));
    }
}
```
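### Notes
- Dependency: the `chrono` crate is needed for date handling. Add it to `Cargo.toml`:
~~~
[dependencies]
chrono = "0.4"
~~~
- Non-mainland test: `decode_address` uses `(addr_code >> 17) == 0`, a single shift and comparison.
- Error handling: the example uses `unwrap` and `panic!`; production code should return `Result` instead.
- Lossless within the supported domain: an ID round-trips exactly when its check digit is valid, its birth date lies inside the 18-bit window, and its address code is either one of the four special codes (710000, 810000, 820000, 910000) or a code whose first digit is 1-7 and second digit is 0-7. Other codes beginning with 8 or 9 do not fit the 20-bit layout.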
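To illustrate the `Result`-based style suggested above, here is a minimal sketch of a fallible field splitter. The `ParseError` enum, the `Fields` struct, and `split_fields` are hypothetical names introduced for this example; calendar validity (for instance month 13) is deliberately left to the date library.

```rust
/// Hypothetical error type for structural problems in the input string.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// The input is not 18 bytes long; carries the actual length.
    BadLength(usize),
    /// One of the first 17 characters is not an ASCII digit; carries its index.
    NonDigit(usize),
}

/// The raw fields of an ID number before any bit packing.
pub struct Fields<'a> {
    pub addr6: &'a str,
    pub year: i32,
    pub month: u32,
    pub day: u32,
    pub seq: u32,
}

/// Split an 18-character ID into its fields without panicking on bad input.
pub fn split_fields(id18: &str) -> Result<Fields<'_>, ParseError> {
    if id18.len() != 18 {
        return Err(ParseError::BadLength(id18.len()));
    }
    if let Some(i) = id18.bytes().take(17).position(|b| !b.is_ascii_digit()) {
        return Err(ParseError::NonDigit(i));
    }
    // The first 17 bytes are ASCII digits, so slicing cannot split a character
    // and parsing at most four digits into a u32 cannot fail.
    let num = |r: std::ops::Range<usize>| id18[r].parse::<u32>().unwrap();
    Ok(Fields {
        addr6: &id18[..6],
        year: num(6..10) as i32,
        month: num(10..12),
        day: num(12..14),
        seq: num(14..17),
    })
}
```

A caller could then map `NaiveDate::from_ymd_opt` returning `None` and an unsupported address code to further error variants, so that `encode_id` never panics on untrusted input.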
### Usage example
```rust
fn main() {
    // Sample ID with a check digit that matches the checksum algorithm
    let id = "110101199003076632";
    let compressed = encode_id(id);
    println!("compressed: {:#018x}", compressed); // 0x0000240817c70697
    let restored = decode_id(compressed);
    println!("restored: {}", restored);
    assert_eq!(id, restored);
}