如何硬解析 .shp 文件中的几何体，拯救 .dbf、.shx 等文件缺失的 ESRI Shapefile 格式文件

最近遇到一个问题就是 ESRI Shapefile 格式文件中 .dbf 文件损坏，导致 .shp 无法打开，又不能放弃这个数据，所以只能做修复，网上搜了很多方法，结果都失败了，所以用 deepseek 查询应解析的方法，亲测可行（简单多边形），现将方法分享：

ESRI Shapefile 文件组成

1 核心文件

扩展名	必选	用途
.shp	✅	存储几何图形（点、线、面等）
.shx	✅	存储几何索引（加速查询）
.dbf	✅	存储属性数据（类似表格）
.prj	❌	坐标系统定义（如 WGS84）
.cpg	❌	字符编码声明（如 UTF-8）

必须同时存在 .shp、.shx、.dbf 文件才能正常使用。
缺失 .prj 可能导致坐标系统未知。

2 .shp 文件结构（二进制格式）

偏移量	长度	类型	描述
0	4B	`int32`	文件标识码（固定 `9994`）
24	4B	`int32`	版本号（`1000`）
28	4B	`int32`	几何类型（见下表）
36-68	32B	`double`	边界框（xmin, ymin, xmax, ymax）

几何记录（Record）

记录头 （8B）：
- int32 记录编号（从 1 开始）
- int32 记录长度（以 16-bit 字为单位）
记录内容 ：
- int32 几何类型
- 几何数据（坐标点等）

Shapefile 几何类型

类型值	名称	描述
`0`	Null Shape	空几何
`1`	Point	二维点 `[x, y]`
`3`	PolyLine	多段线
`5`	Polygon	多边形（首尾闭合）
`8`	MultiPoint	多点集合
`11`	PointZ	三维点 `[x, y, z, m]`

多边形规则

外环：顶点按 逆时针 排列
内环（孔洞）：顶点按 顺时针 排列

Python 硬解析只有 .shp 文件的已损坏的 ESRI Shapefile 格式数据（Polygon）

环境和数据准备

安装 gma：pip install gma
本文基于：gma 2.1.7，Python 3.12
本文用到数据：

链接: https://pan.baidu.com/s/1kMlji9xqSsYIVn1cygIewA?pwd=9nc1

提取码: 9nc1

代码

python 复制代码

import struct

def parse_polygon_shp(shp_path):
    polygons = []
    iii = 0
    with open(shp_path, "rb") as f:
        # 跳过文件头（100字节）
        f.read(100)
        
        while True:
            # 读取记录头（8字节：记录号+长度）
            record_header = f.read(8)
            if not record_header:
                break
            
            # 解析记录头（大端序）
            record_num = struct.unpack(">i", record_header[:4])[0]
            record_len = struct.unpack(">i", record_header[4:8])[0]  # 单位：16-bit words
            
            # 读取记录内容（长度：record_len * 2 字节）
            try:
                record_data = f.read(record_len * 2)
            except:
                break
                
            if len(record_data) < 44:  # 多边形至少需要44字节
                continue
            
            # 解析几何类型（小端序，从记录内容开始）
            shape_type = struct.unpack("<i", record_data[0:4])[0]
            if shape_type != 5:  # 仅处理多边形
                continue

            # 解析边界框（4个double）
            bbox = struct.unpack("<4d", record_data[4:36])
            
            # 解析环数和点数
            num_parts = struct.unpack("<i", record_data[36:40])[0]
            num_points = struct.unpack("<i", record_data[40:44])[0]
            
            # 解析环的起始索引（num_parts个int32）
            parts = []
            offset = 44
            parts_data = record_data[offset : offset + 4 * num_parts]
            parts = list(struct.unpack(f"<{num_parts}i", parts_data))
            offset += 4 * num_parts
            
            # 解析所有点（num_points个xy对）
            points = []
            points_data = record_data[offset : offset + 16 * num_points]
            for i in range(num_points):
                x = struct.unpack("<d", points_data[16*i : 16*i + 8])[0]
                y = struct.unpack("<d", points_data[16*i + 8 : 16*i + 16])[0]
                points.append((x, y))
            # 保存多边形数据
            polygons.append({
                "shape_type": shape_type,
                "bbox": bbox,
                "num_parts": num_parts,
                "num_points": num_points,
                "parts": parts,
                "points": points
            })
    
    return polygons

shp_path = "青海省.shp"
res = parse_polygon_shp(shp_path)

import pandas as pd
from gma import io

fts = []
for data in res:
    pt = data['points']
    ft = io.CreateFeature(pt, GeomType = 'Polygon', Projection = 'WGS84')
    fts.append(ft)
    
ly = io.CreateLayer(fts)
ly.SaveAs("青海省_New.shp")

注意：此方法暂不适合复杂的多边形。如果是复杂多边形需要进一步优化。

如何硬解析 .shp 文件中的几何体，拯救 .dbf、.shx 等文件缺失的 ESRI Shapefile 格式文件

ESRI Shapefile 文件组成

1 核心文件

2 .shp 文件结构（二进制格式）

文件头（Header）

几何记录（Record）

Python 硬解析只有 .shp 文件的已损坏的 ESRI Shapefile 格式数据（Polygon）

环境和数据准备

代码