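Batch-Collecting Driving Time and Distance Between Two Locations with the Baidu Maps API (Concurrent Requests + Global QPS Rate Limiting + Automatic Retries)

In research and business work on transport accessibility, OD (origin-destination) spatio-temporal linkages, commuting-zone identification, or linkage strength within urban agglomerations, we often need the driving time (duration) and driving distance (distance) between two locations.

When the number of OD pairs is large (thousands, tens of thousands, or even hundreds of thousands), manual lookup is clearly impractical. Naive multithreaded requests, on the other hand, easily trip Baidu Maps' QPS limits and throttling, causing large numbers of failures (302/401/429, "requests too frequent", and so on).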

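This article provides a stable, reusable batch-collection tool. Its key features:

✅ Thread-pool concurrency (higher throughput)

✅ Smooth global QPS rate limiting (shared by all threads, so requests never pile up and momentarily exceed the limit)

✅ Exponential-backoff retries with random jitter (automatic recovery from throttling and network hiccups)

✅ CSV batch input, CSV/Excel output (ready for downstream analysis)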

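1. The API being called (what results do you actually need?)

Baidu Maps' route-planning (direction) APIs typically return:

distance: route distance (in meters)
duration: route travel time (in seconds)

For batch collection these are the two fields we use most. They can later be converted to:

distance_km: kilometers
duration_min: minutes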
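The conversion is simple arithmetic, sketched below. The helper names `to_km` and `to_min` are made up for illustration; the rounding (2 decimals for kilometers, 1 for minutes) mirrors what the full script does when it saves results.

```python
def to_km(distance_m):
    """Convert meters to kilometers (2 decimals); a missing value stays None."""
    return None if distance_m is None else round(distance_m / 1000, 2)


def to_min(duration_s):
    """Convert seconds to minutes (1 decimal); a missing value stays None."""
    return None if duration_s is None else round(duration_s / 60, 1)


print(to_km(305123), to_min(5400))
```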

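2. Why do concurrent requests fail? The key is exceeding the QPS limit

Many people run into this situation:

The thread count is set to 20/30/50
Requests pile up the instant the job starts
The API returns: requests too frequent / concurrency limit exceeded / 429 / quota limit

The root cause is not "too few threads" but this:
you are sending requests faster than the QPS (queries per second) the API allows.

So the right approach is:

Concurrency is fine, but there must be a global throttle so that all threads share one QPS ceiling.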
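One practical consequence is that, with a global cap in place, the total run time is set mainly by the QPS value and not by the thread count. The sketch below is a rough estimate only: `approx_runtime_seconds` is a hypothetical helper, and it ignores retries, backoff sleeps, and network latency.

```python
def approx_runtime_seconds(n_requests: int, qps: float) -> float:
    """Rough lower bound on run time when requests are paced at `qps` per second."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    return n_requests / qps


for n in (5_000, 50_000, 500_000):
    secs = approx_runtime_seconds(n, 24)
    print(f"{n:>7} OD pairs at 24 QPS: at least {secs / 60:.1f} min")
```

Adding threads beyond what is needed to keep the limiter busy does not shorten this estimate; only a higher permitted QPS does.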

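3. Design: thread pool + smooth global QPS limiting + automatic retries

The script does several key things:

✅ 3.1 Global rate limiter (the core piece)

A traditional token bucket allows bursts, so requests easily pile up and exceed the limit at startup.

Here I use a steadier strategy:

Enforce request interval = 1 / QPS
No bursts allowed
All threads share the same limiter

This markedly reduces cases of "QPS is set to 24, yet requests still get throttled". (The class in the code is still named TokenBucketRateLimiter, but it implements this fixed-interval pacing, not a true token bucket.)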
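To see what fixed-interval pacing does to a startup burst, here is a small simulation of the same slot logic without threads or sleeping. `release_times` is a hypothetical helper written for this illustration; it assumes the arrival times are given in ascending order.

```python
def release_times(arrivals, qps):
    """Return the time at which each request is released under fixed-interval pacing.

    `arrivals` are request arrival times in seconds, in ascending order.
    """
    interval = 1.0 / qps
    next_time = float("-inf")  # no slot reserved yet
    released = []
    for now in arrivals:
        start = max(now, next_time)   # wait if the slot is not open yet
        released.append(start)
        next_time = start + interval  # reserve the following slot
    return released


# Five requests arriving at the same instant are spread 0.25 s apart at 4 QPS
print(release_times([0, 0, 0, 0, 0], qps=4))
```

A request that arrives after the reserved slot has already passed is released immediately, which matches the `now >= next_time` branch in the full script.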

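✅ 3.2 A separate Session per thread (faster and more stable)

Sharing a requests.Session() across threads is not recommended; it can lead to hard-to-diagnose problems.

The script uses threading.local():

Each thread has its own Session
Connection reuse (Keep-Alive) makes requests faster
More stable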
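The thread-local pattern can be sketched with the standard library alone. In the sketch below `FakeSession` is a stand-in class so that the example runs without third-party packages; in the real script the object created per thread is a `requests.Session`.

```python
import threading

_local = threading.local()


class FakeSession:
    """Stand-in for requests.Session (assumption made for this sketch)."""


def get_session(factory=FakeSession):
    # Create the object on first use in this thread, then reuse it
    if not hasattr(_local, "session"):
        _local.session = factory()
    return _local.session


def worker(out, idx):
    first, second = get_session(), get_session()
    # Store the session and whether both calls returned the same object
    out[idx] = (first, first is second)


results = {}
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

distinct = len({id(session) for session, _ in results.values()})
print(f"{distinct} distinct sessions across {len(threads)} threads")
```

Within one thread the same object comes back on every call, which is what allows connections to be reused.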

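✅ 3.3 Exponential-backoff retries (for throttling and network jitter)

When throttled, the script does not retry immediately. Instead it:

Backs off progressively: 0.6 s, 1.2 s, 2.4 s, ...
Adds random jitter (so that threads do not all pile up again at the same moment)

This keeps long-running collection jobs more robust.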
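The delay schedule can be written as a one-line formula. The function below is a sketch with a hypothetical name, `backoff_delay`; the base of 0.6 s and the jitter of up to 0.3 s are the values used in the full script.

```python
import random


def backoff_delay(attempt, base=0.6, max_jitter=0.3):
    """Delay before retry number `attempt` (0-based): base * 2^attempt plus random jitter."""
    return base * (2 ** attempt) + random.uniform(0, max_jitter)


for attempt in range(4):
    print(f"attempt {attempt}: sleep about {backoff_delay(attempt):.2f} s")
```

With the default of 4 retries, the delays without jitter are 0.6, 1.2, 2.4, and 4.8 s, so one request spends at most roughly 10 s in backoff sleeps.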

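4. Input CSV format (very important)

After reading the CSV, the script turns each row into one OD request, so your input table should contain at least the columns below. The column names must match exactly, because the code looks values up by these names.

Column name	Description
序号	Sequence number; optional (handy for cross-checking)
起点	Origin name; optional (used in log output)
终点	Destination name; optional (used in log output)
起点经度	Origin longitude; required
起点纬度	Origin latitude; required
终点经度	Destination longitude; required
终点纬度	Destination latitude; required

Example (suggested CSV header, followed by two data rows):

序号,起点,终点,起点经度,起点纬度,终点经度,终点纬度
1,A,B,118.796623,32.060255,120.155070,30.274085
2,C,D,121.473701,31.230416,116.407526,39.904030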
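Checking the header before starting a long run can save a failed batch. The sketch below uses only the standard `csv` module and a hypothetical helper, `missing_columns`; it is an optional pre-check and is not part of the original script.

```python
import csv
import io

# The four coordinate columns the script reads by name
REQUIRED = ["起点经度", "起点纬度", "终点经度", "终点纬度"]


def missing_columns(csv_text):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    # Trim whitespace and a possible byte-order mark left in the first name
    header = [h.strip().lstrip("\ufeff") for h in header]
    return [c for c in REQUIRED if c not in header]


sample = "序号,起点,终点,起点经度,起点纬度,终点经度,终点纬度\n1,A,B,118.796623,32.060255,120.155070,30.274085\n"
print(missing_columns(sample))
```

An empty list means all required columns are present.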

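5. Before you run it

5.1 Install dependencies

pip install requests pandas openpyxl

(If you only save CSV and not Excel, you can skip openpyxl, as long as you also remove the save_to_excel call at the end of the script.)

5.2 Fill in your AK

Create an application on the Baidu Maps Open Platform to obtain an AK, then replace it in the code:

AK = "your AK"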
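Since the AK should not be exposed publicly, one alternative to hard-coding it is to read it from an environment variable. This is only a sketch: the variable name `BAIDU_MAP_AK` and the helper `load_ak` are assumptions for illustration, not something the original script or Baidu defines.

```python
import os


def load_ak(env_var="BAIDU_MAP_AK"):
    """Read the AK from an environment variable and fail early if it is missing."""
    ak = os.environ.get(env_var, "").strip()
    if not ak:
        raise RuntimeError(f"Set the {env_var} environment variable to your Baidu Maps AK")
    return ak
```

In the script, `AK = load_ak()` would then replace the hard-coded string.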

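6. Full code (concurrent collection + global QPS limiting + automatic retries + result output)

The code below can be copied and run as-is (you only need to change AK and file_path).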

# -*- coding: utf-8 -*-
import os
import time
import random
import threading
from typing import List, Dict

import requests
import pandas as pd
from concurrent.futures import ThreadPoolExecutor, as_completed


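class TokenBucketRateLimiter:
    """
    Smooth global rate limiter that enforces "request interval = 1/qps".
    Despite the class name, this is fixed-interval pacing, not a true token bucket.
    - No bursts allowed (avoids the startup pile-up that exceeds the QPS limit)
    - All threads share one limiter, so QPS is controlled globally
    """
    def __init__(self, qps: float, burst: int = None):
        # `burst` is accepted but unused: this limiter never allows bursts
        self.qps = float(qps)
        # Minimum spacing between requests; falls back to 0.05 s if qps is not positive
        self.min_interval = 1.0 / self.qps if self.qps > 0 else 0.05
        # Earliest time at which the next request may be released
        self.next_time = time.monotonic()
        self.lock = threading.Lock()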

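    def acquire(self):
        # Reserve the next send slot while holding the lock; sleep outside the lock
        with self.lock:
            now = time.monotonic()
            if now >= self.next_time:
                # Slot is open: go immediately, next slot is one interval from now
                wait = 0.0
                self.next_time = now + self.min_interval
            else:
                # Slot is taken: wait until it opens and push the next slot back
                wait = self.next_time - now
                self.next_time = self.next_time + self.min_interval

        if wait > 0:
            time.sleep(wait)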


class BaiduMapDistanceCollector:
    def __init__(
        self,
        ak: str,
        max_workers: int = 8,
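        qps_limit: float = 24.0,   # suggested 20-28; do not run right at 30
        max_retries: int = 4,
        timeout: int = 10
    ):
        """
        Initialize the Baidu Maps distance collector (global QPS limiting + backoff retries)

        Args:
            ak: AK key for the Baidu Maps API
            max_workers: number of concurrent threads (more threads cannot exceed the QPS cap, because the limiter is global)
            qps_limit: global QPS cap (suggested < 30)
            max_retries: number of retries on failure (automatic retry on throttling / network jitter)
            timeout: per-request timeout (seconds)
        """
        self.ak = ak
        self.max_workers = max_workers
        self.timeout = timeout

        self.base_url = "https://api.map.baidu.com/directionlite/v1/driving"

        # Global rate limiter shared by all threads (the core piece)
        self.limiter = TokenBucketRateLimiter(qps=qps_limit)

        # One Session per thread (sharing requests.Session across threads is not recommended)
        self._local = threading.local()

        # Lock for print output
        self.print_lock = threading.Lock()

        self.max_retries = max_retries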

    def _get_session(self) -> requests.Session:
        # Lazily create one Session per thread and cache it in thread-local storage
        if not hasattr(self._local, "session"):
            s = requests.Session()
            adapter = requests.adapters.HTTPAdapter(
                pool_connections=self.max_workers,
                pool_maxsize=self.max_workers,
                max_retries=0
            )
            s.mount("https://", adapter)
            s.mount("http://", adapter)
            self._local.session = s
        return self._local.session

    @staticmethod
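    def _is_qps_or_throttle_error(api_data: Dict) -> bool:
        """
        Decide whether an error is a rate / throttling / QPS error (tries to cover different response formats)
        """
        status = api_data.get("status")
        msg = str(api_data.get("message", "")).lower()

        # The Chinese keywords are kept as-is so that they can match Chinese error messages
        keyword_hit = any(k in msg for k in [
            "qps", "quota", "limit", "too frequent", "too many", "throttle",
            "频率", "配额", "并发", "限制", "请求过于频繁", "超过限制"
        ])

        # Some APIs use status codes such as 429 / 302 (not guaranteed to be consistent; a fallback)
        status_hit = status in {302, 401, 429}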

        return keyword_hit or status_hit

    def get_distance_duration(self, data: Dict) -> Dict:
        """
        Get the driving distance and duration between two points (with global QPS limiting + backoff retries)
        """
        # The keys are the CSV column names; coordinates are passed as "latitude,longitude"
        origin = (data['起点纬度'], data['起点经度'])
        destination = (data['终点纬度'], data['终点经度'])

        params = {
            'ak': self.ak,
            'origin': f"{origin[0]},{origin[1]}",
            'destination': f"{destination[0]},{destination[1]}",
            'coord_type': 'wgs84'
        }

        last_error = None

        for attempt in range(self.max_retries + 1):
            # ===== Core: wait for the global limiter (keeps global QPS <= qps_limit) =====
            self.limiter.acquire()

            # Micro-jitter (optional): further avoids requests stacking at the same instant
            time.sleep(random.uniform(0.005, 0.02))

            try:
                session = self._get_session()
                response = session.get(self.base_url, params=params, timeout=self.timeout)

                response.raise_for_status()

                api_data = response.json()

                # Success
                if api_data.get('status') == 0 and api_data.get('message') == 'ok':
                    route = api_data['result']['routes'][0]
                    result = {
                        'distance': route.get('distance'),
                        'duration': route.get('duration'),
                        'status': (route.get('restriction_info') or {}).get('status'),
                        'error': None
                    }
                    result.update(data)

                    with self.print_lock:
                        print(f"Finished: {data.get('起点')} -> {data.get('终点')}  distance:{result['distance']}m duration:{result['duration']}s")
                    return result

                # Not successful: if it looks like throttling/QPS, back off and retry
                if self._is_qps_or_throttle_error(api_data) and attempt < self.max_retries:
                    backoff = (0.6 * (2 ** attempt)) + random.uniform(0, 0.3)  # exponential backoff + jitter
                    last_error = f"Throttled / QPS exceeded: {api_data.get('message')}; retrying after {backoff:.2f}s backoff"
                    time.sleep(backoff)
                    continue

                # Other errors: return immediately
                result = {
                    'distance': None,
                    'duration': None,
                    'status': None,
                    'error': api_data.get('message', 'API returned an error')
                }
                result.update(data)

                with self.print_lock:
                    print(f"Finished: {data.get('起点')} -> {data.get('终点')}")
                    print(f"  Error: {result['error']}")
                return result

            except requests.exceptions.RequestException as e:
                # Network/HTTP exception: retryable
                if attempt < self.max_retries:
                    backoff = (0.6 * (2 ** attempt)) + random.uniform(0, 0.3)
                    last_error = f"Network/HTTP error: {str(e)}; retrying after {backoff:.2f}s backoff"
                    time.sleep(backoff)
                    continue

                result = {
                    'distance': None,
                    'duration': None,
                    'status': None,
                    'error': f'Network/HTTP error: {str(e)}'
                }
                result.update(data)

                with self.print_lock:
                    print(f"Finished: {data.get('起点')} -> {data.get('终点')}")
                    print(f"  Error: {result['error']}")
                return result

            except (KeyError, IndexError, ValueError) as e:
                # Parse errors are usually not worth retrying
                result = {
                    'distance': None,
                    'duration': None,
                    'status': None,
                    'error': f'Data parsing error: {str(e)}'
                }
                result.update(data)

                with self.print_lock:
                    print(f"Finished: {data.get('起点')} -> {data.get('终点')}")
                    print(f"  Error: {result['error']}")
                return result

        # In theory this point is never reached
        result = {'distance': None, 'duration': None, 'status': None, 'error': last_error or 'Unknown error'}
        result.update(data)
        return result

    def batch_collect(self, coordinate_data: List[Dict]) -> List[Dict]:
        results = []
        total = len(coordinate_data)

        print("Starting batch collection of distance and duration...")
        print(f"{total} rows to process, {self.max_workers} threads, smooth global QPS limiting enabled")

        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            # Map each future back to its input row so failures can be reported
            future_to_data = {
                executor.submit(self.get_distance_duration, data): data
                for data in coordinate_data
            }

            for i, future in enumerate(as_completed(future_to_data), start=1):
                try:
                    results.append(future.result())
                    if i % 10 == 0:
                        print(f"Progress: {i}/{total} ({(i / total) * 100:.1f}%)")
                except Exception as e:
                    data = future_to_data[future]
                    error_result = {
                        'distance': None,
                        'duration': None,
                        'status': None,
                        'error': f'Thread execution error: {str(e)}'
                    }
                    error_result.update(data)
                    results.append(error_result)
                    print(f"Error while processing {data.get('起点')} -> {data.get('终点')}: {str(e)}")

        return results

    @staticmethod
    def _build_frame(results: List[Dict]) -> pd.DataFrame:
        """Build the output table: add km/min columns and apply a fixed column order."""
        df = pd.DataFrame(results)
        # pd.notna covers both None and NaN (pandas stores missing numbers as NaN)
        df['distance_km'] = df['distance'].apply(lambda x: round(x / 1000, 2) if pd.notna(x) else None)
        df['duration_min'] = df['duration'].apply(lambda x: round(x / 60, 1) if pd.notna(x) else None)

        columns_order = [
            '序号', '起点', '终点',
            '起点经度', '起点纬度', '终点经度', '终点纬度',
            'distance', 'distance_km', 'duration', 'duration_min', 'status', 'error'
        ]
        # Keep only the columns that actually exist in the input
        columns_order = [col for col in columns_order if col in df.columns]
        return df[columns_order]

    @staticmethod
    def save_to_excel(results: List[Dict], filename: str):
        df = BaiduMapDistanceCollector._build_frame(results)
        df.to_excel(filename, index=False)
        print(f"Results saved to {filename}")

    @staticmethod
    def save_to_csv(results: List[Dict], filename: str):
        df = BaiduMapDistanceCollector._build_frame(results)
        # utf-8-sig writes a BOM so that Excel displays the Chinese text correctly
        df.to_csv(filename, index=False, encoding='utf-8-sig')
        print(f"Results saved to {filename}")


def read_coordinates_from_file(filepath: str) -> List[Dict]:
    """
    Read coordinate data from a CSV file
    """
    # Try UTF-8 first, then fall back to the common Chinese encodings
    try:
        df = pd.read_csv(filepath, encoding='utf-8')
    except UnicodeDecodeError:
        try:
            df = pd.read_csv(filepath, encoding='gbk')
        except UnicodeDecodeError:
            df = pd.read_csv(filepath, encoding='gb2312')

    # One dict per row, keyed by column name
    coordinate_data = df.to_dict('records')
    print(f"Read {len(coordinate_data)} coordinate rows")
    return coordinate_data


if __name__ == "__main__":
    # !!! Do not expose your AK in public places; if it has been exposed, reset it before use !!!
    AK = "your AK"

    file_path = r"path to your CSV file"

    if not os.path.exists(file_path):
        print(f"Error: file does not exist - {file_path}")
        raise SystemExit(1)

    input_dir = os.path.dirname(file_path)
    input_filename = os.path.basename(file_path)
    input_name_without_ext = os.path.splitext(input_filename)[0]

    output_csv = os.path.join(input_dir, f"{input_name_without_ext}_结果.csv")
    output_xlsx = os.path.join(input_dir, f"{input_name_without_ext}_结果.xlsx")

    # Suggestion: start with qps_limit at 22-26 to stay safe; if throttling still occurs, lower it to 20-22
    collector = BaiduMapDistanceCollector(
        ak=AK,
        max_workers=8,
        qps_limit=24,     # change to 22 for extra safety
        max_retries=4,
        timeout=10
    )

    print("Reading CSV file...")
    coordinate_data = read_coordinates_from_file(file_path)

    print("\nFirst 3 sample rows:")
    for i, data in enumerate(coordinate_data[:3]):
        print(f"Row {data.get('序号', i + 1)}: {data.get('起点')} -> {data.get('终点')}")
        print(f"  Origin coordinates: ({data.get('起点纬度')}, {data.get('起点经度')})")
        print(f"  Destination coordinates: ({data.get('终点纬度')}, {data.get('终点经度')})")

    start_time = time.time()
    results = collector.batch_collect(coordinate_data)
    end_time = time.time()

    success_count = sum(1 for r in results if r.get('error') is None)
    fail_count = len(results) - success_count

    print("\nCollection finished!")
    print(f"Total time: {end_time - start_time:.2f} s")
    print(f"Succeeded: {success_count}, failed: {fail_count}")
    if len(results) > 0:
        print(f"Average time per row: {(end_time - start_time) / len(results):.2f} s")

    print("\nFirst 5 results:")
    for i, result in enumerate(results[:5]):
        print(f"Row {result.get('序号', i + 1)}:")
        print(f"  Origin: {result.get('起点')} -> Destination: {result.get('终点')}")
        if result.get('error'):
            print(f"  Error: {result.get('error')}")
        else:
            d = result.get('distance')
            t = result.get('duration')
            print(f"  Distance: {d} m ({(d or 0) / 1000:.2f} km)")
            print(f"  Duration: {t} s ({(t or 0) / 60:.1f} min)")
            print(f"  Status: {result.get('status')}")
        print()

    BaiduMapDistanceCollector.save_to_csv(results, output_csv)
    BaiduMapDistanceCollector.save_to_excel(results, output_xlsx)

    print("\nAll results saved to:")
    print(f"CSV file: {output_csv}")
    print(f"Excel file: {output_xlsx}")

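7. What does the output look like?

The script prints information like the following to the console:

Distance/duration for each OD pair
Current progress (printed every 10 rows)
Success/failure counts and elapsed time

It also writes two files next to the input file, where xxx is the input file name and the suffix 结果 means "results":

xxx_结果.csv
xxx_结果.xlsx

New fields include:

distance_km (kilometers)
duration_min (minutes)
error (failure reason)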
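Because every failed row carries a non-empty error field, a natural follow-up is to separate the failures and feed them into another run. The sketch below works on the in-memory list of dicts that batch_collect returns; `split_results` is a hypothetical helper and is not part of the original script.

```python
# Fields the collector adds on top of the input columns
RESULT_KEYS = ("distance", "duration", "status", "error")


def split_results(results):
    """Split results into successful rows and input rows that need another attempt."""
    ok = [r for r in results if r.get("error") is None]
    failed = [r for r in results if r.get("error") is not None]
    # Drop the result fields so failed rows look like fresh input rows again
    retry_input = [{k: v for k, v in r.items() if k not in RESULT_KEYS} for r in failed]
    return ok, retry_input


demo = [
    {"序号": 1, "起点": "A", "终点": "B", "distance": 305123, "duration": 12345, "status": 0, "error": None},
    {"序号": 2, "起点": "C", "终点": "D", "distance": None, "duration": None, "status": None, "error": "timeout"},
]
ok_rows, retry_rows = split_results(demo)
print(len(ok_rows), retry_rows)
```

The second list could then be passed to `collector.batch_collect(...)` again, ideally with a lower qps_limit.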