The key to a stable Android BLE connection is not scanning, but the GATT operation queue

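Most people writing Android BLE code for the first time focus on scanning: whether the device shows up, whether the permissions are granted, whether the UUID is right.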

But once BLE ends up in a real project, the problems usually come not from scanning but from what happens after the connection is up.

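The most common scenario goes like this. The device is found, the connection succeeds, the services are discovered, and everything up to that point looks fine. Then you start to:

enable notifications
write a descriptor
read a characteristic
write a characteristic
request an MTU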

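Once these operations pile up, the code becomes unstable. Sometimes it works, sometimes a callback never arrives, sometimes the state gets scrambled, and sometimes you just get a status 133. The first reaction is often to blame an unstable Android Bluetooth stack or poor device compatibility. That is not entirely wrong, but in many BLE projects the root cause is simpler: the GATT operations are not queued.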

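The most commonly misunderstood thing about BLE on Android is that, although it looks like an ordinary API, most operations are not synchronous calls: the call only starts the operation, the result arrives later in a callback, and issuing several calls does not make them run in parallel. Most GATT operations should be treated as serial tasks on a single channel. Until the previous one has completed, do not send the next.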

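This is also why many demos appear to work, yet things start to drift once the project gets complex. A demo might do just one thing, such as writing a single packet after connecting, so nothing collides. Real business logic is rarely that simple. You may need to call discoverServices(), then enable notifications, then write an initialization command, then wait for the device's response before moving on. Each step is driven by a callback, and if you fire all of these off like ordinary function calls, the state gets messy fast.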

The typical mistake looks like this:

gatt.discoverServices()
gatt.requestMtu(247)
gatt.setCharacteristicNotification(notifyCharacteristic, true)
gatt.writeDescriptor(cccdDescriptor)
gatt.writeCharacteristic(
    writeCharacteristic,
    "hello".toByteArray(),
    BluetoothGattCharacteristic.WRITE_TYPE_DEFAULT
)

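This code looks straightforward, but the problem is just as obvious. A series of BLE operations that should advance in order is fired off back to back like ordinary method calls. They do not line up on their own. Instead, one step never gets its callback, another fails, another is overwritten, and eventually the whole connection ends up in a dirty state.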
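To make the collision concrete, here is a toy model in plain Kotlin. `SingleSlotLink` and `fireAll` are hypothetical names invented for this illustration, not Android classes, and the model only assumes what the text describes: one operation may be outstanding at a time. How a real stack signals a rejected call differs by operation and API level, so treat this as a sketch of the idea rather than of Android's implementation.

```kotlin
// Toy model only: SingleSlotLink is a hypothetical stand-in, not an Android class.
class SingleSlotLink {
    private var inFlight: String? = null

    /** Returns true if the request was accepted, false if another one is still pending. */
    fun request(name: String): Boolean {
        if (inFlight != null) return false
        inFlight = name
        return true
    }

    /** Simulates the completion callback; returns the name of the finished request. */
    fun complete(): String? {
        val done = inFlight
        inFlight = null
        return done
    }
}

/** Fires every request back to back and returns only the ones that were accepted. */
fun fireAll(link: SingleSlotLink, names: List<String>): List<String> =
    names.filter { link.request(it) }

fun main() {
    val link = SingleSlotLink()
    val accepted = fireAll(link, listOf("discoverServices", "requestMtu", "writeDescriptor"))
    println(accepted) // [discoverServices]
}
```

Only the first request gets through; the others are dropped without ever producing a completion, which is the "callback never arrives" symptom in miniature.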

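A more sensible approach is to accept one fact from the start: GATT operations must be queued.

You can begin by defining an operation type that gathers every BLE action into one model: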

sealed class BleOperation {
    data object DiscoverServices : BleOperation()
    data class RequestMtu(val mtu: Int) : BleOperation()
    data class WriteDescriptor(
        val descriptor: BluetoothGattDescriptor,
        val value: ByteArray
    ) : BleOperation()
    data class WriteCharacteristic(
        val characteristic: BluetoothGattCharacteristic,
        val value: ByteArray,
        val writeType: Int
    ) : BleOperation()
    data class ReadCharacteristic(
        val characteristic: BluetoothGattCharacteristic
    ) : BleOperation()
}

With this model in place, the next step is not to call the API right away, but to build an operation queue.

private val operationQueue: ArrayDeque<BleOperation> = ArrayDeque()
private var currentOperation: BleOperation? = null

Then write a single method for enqueueing:

fun enqueueOperation(operation: BleOperation) {
    operationQueue.add(operation)
    if (currentOperation == null) {
        doNextOperation()
    }
}

When it is time to execute, take only the one operation at the head of the queue:

private fun doNextOperation() {
    val gatt = bluetoothGatt ?: return
    val operation = operationQueue.removeFirstOrNull() ?: run {
        currentOperation = null
        return
    }

    currentOperation = operation

    when (operation) {
        is BleOperation.DiscoverServices -> {
            gatt.discoverServices()
        }

        is BleOperation.RequestMtu -> {
            gatt.requestMtu(operation.mtu)
        }

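        is BleOperation.WriteDescriptor -> {
            // API 33+ overload, matching the three-argument writeCharacteristic below;
            // it avoids the deprecated descriptor.value setter.
            gatt.writeDescriptor(operation.descriptor, operation.value)
        }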

        is BleOperation.WriteCharacteristic -> {
            gatt.writeCharacteristic(
                operation.characteristic,
                operation.value,
                operation.writeType
            )
        }

        is BleOperation.ReadCharacteristic -> {
            gatt.readCharacteristic(operation.characteristic)
        }
    }
}

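At this point the overall model is on the right track. You no longer think "call whatever I want right now"; you think "put the operation in the queue and move to the next one once the current one completes."

The crucial part is not enqueueOperation() but how the callbacks push the queue forward.

For example, after service discovery completes: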

override fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("discoverServices failed: $status")
    }
}

After a characteristic write completes:

override fun onCharacteristicWrite(
    gatt: BluetoothGatt,
    characteristic: BluetoothGattCharacteristic,
    status: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("writeCharacteristic failed: $status")
    }
}

After a descriptor write completes:

override fun onDescriptorWrite(
    gatt: BluetoothGatt,
    descriptor: BluetoothGattDescriptor,
    status: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("writeDescriptor failed: $status")
    }
}

After a characteristic read completes:

override fun onCharacteristicRead(
    gatt: BluetoothGatt,
    characteristic: BluetoothGattCharacteristic,
    value: ByteArray,
    status: Int
) {
    if (status == BluetoothGatt.GATT_SUCCESS) {
        finishCurrentOperation()
    } else {
        failCurrentOperation("readCharacteristic failed: $status")
    }
}
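The RequestMtu operation needs a completion callback too, otherwise the queue would stall right after requestMtu(). A minimal sketch of that callback follows. `MtuAwareCallback`, `onFinished` and `onFailed` are hypothetical names; the two lambdas stand in for finishCurrentOperation() and failCurrentOperation() from this article. It needs the Android SDK to compile.

```kotlin
import android.bluetooth.BluetoothGatt
import android.bluetooth.BluetoothGattCallback

// Sketch: onFinished / onFailed are hypothetical hooks standing in for
// finishCurrentOperation() / failCurrentOperation().
class MtuAwareCallback(
    private val onFinished: () -> Unit,
    private val onFailed: (String) -> Unit
) : BluetoothGattCallback() {

    override fun onMtuChanged(gatt: BluetoothGatt, mtu: Int, status: Int) {
        if (status == BluetoothGatt.GATT_SUCCESS) {
            // mtu is the negotiated value and may be smaller than the one requested.
            onFinished()
        } else {
            onFailed("requestMtu failed: $status")
        }
    }
}
```

In a real project this override would sit in the same callback object as the other four, so that every operation type in the queue has exactly one place that advances it.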

Finally, consolidate the advancing logic in one place:

private fun finishCurrentOperation() {
    currentOperation = null
    doNextOperation()
}

private fun failCurrentOperation(message: String) {
    Log.e("BLE", message)
    currentOperation = null
    doNextOperation()
}

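At this point the skeleton of the GATT queue is in place.

The biggest value of this structure is not that the code is "more elegant", but that it turns BLE communication from a pile of scattered callbacks into a flow you can reason about. You know what is running now, what is queued behind it, and which callback should advance which operation.

Take a very common initialization flow. After connecting, many BLE devices need the following steps first:

discover services
request an MTU
enable notifications
write an initialization command

Without a queue, this code tends to get messy.

With a queue, the flow becomes much clearer: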

enqueueOperation(BleOperation.DiscoverServices)
enqueueOperation(BleOperation.RequestMtu(247))
enqueueOperation(
    BleOperation.WriteDescriptor(
        descriptor = cccdDescriptor,
        value = BluetoothGattDescriptor.ENABLE_NOTIFICATION_VALUE
    )
)
enqueueOperation(
    BleOperation.WriteCharacteristic(
        characteristic = writeCharacteristic,
        value = byteArrayOf(0xA5.toByte(), 0x01, 0x00),
        writeType = BluetoothGattCharacteristic.WRITE_TYPE_DEFAULT
    )
)

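The whole initialization chain then runs one step at a time, without the steps fighting each other. Note that writing the CCCD only tells the peripheral to start sending notifications; to actually receive them you still need the local setCharacteristicNotification(characteristic, true) call, which has no completion callback and therefore does not need to go through the queue.

Many people start seeing 133 partway through a BLE project and write it off as black magic. The system Bluetooth stack does share the blame for 133, but messy state management on your side can easily drive the connection into that bad state too. For example:

the previous connection was never closed with close()
a characteristic write started before the descriptor write finished
connect called repeatedly while a scan is still running
the next step sent before the previous callback arrived

None of these is the fault of a single API. The communication chain as a whole is not under control.

So the further a BLE project goes, the less it looks like "calling Bluetooth APIs" and the more it looks like "state machine plus queue scheduling". That is what really makes it harder than ordinary network requests. Network requests are often naturally independent; BLE operations are not. Many BLE steps have strict ordering dependencies, and once the order is wrong, the problems cascade.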

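If you want to take this one step further toward production quality, you would usually add two more layers.

The first layer is timeouts. BLE callbacks are not always reliable, and if an operation never returns, the queue stalls for good. In a real project, give every operation a timeout. For example, if no callback arrives within 5 seconds, treat the operation as failed, clean it up, and either move on or disconnect and reconnect.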
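One way the timeout layer could look is sketched below in plain Kotlin. `TimedQueue` and everything in it are hypothetical names, operations are identified by a name that is assumed to be unique, and a `java.util.concurrent` scheduler is used so the sketch runs off-device; on Android a Handler with postDelayed on a single thread would be the more usual choice. On timeout this sketch simply moves on to the next operation; disconnecting and reconnecting is the other policy mentioned above.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledFuture
import java.util.concurrent.TimeUnit

// Hypothetical sketch: TimedQueue is not part of the article's code or the Android SDK.
class TimedQueue(private val timeoutMs: Long) {
    // All queue state is touched on this single thread, so no extra locking is needed.
    private val worker = Executors.newSingleThreadScheduledExecutor()
    private val pending = ArrayDeque<Pair<String, () -> Unit>>()
    private var current: String? = null
    private var watchdog: ScheduledFuture<*>? = null

    /** Outcome of each operation, in order, as "name:done" or "name:timeout". */
    val log = CopyOnWriteArrayList<String>()

    fun enqueue(name: String, start: () -> Unit) {
        worker.execute {
            pending.addLast(name to start)
            if (current == null) next()
        }
    }

    /** Call from the completion callback of the operation called [name]. */
    fun complete(name: String) {
        worker.execute {
            // A callback arriving after its operation already timed out is ignored.
            if (current == name) finish("$name:done")
        }
    }

    private fun next() {
        val (name, start) = pending.removeFirstOrNull() ?: return
        current = name
        watchdog = worker.schedule(
            Runnable { if (current == name) finish("$name:timeout") },
            timeoutMs,
            TimeUnit.MILLISECONDS
        )
        start()
    }

    private fun finish(entry: String) {
        watchdog?.cancel(false)
        log.add(entry)
        current = null
        next()
    }

    fun shutdownAndWait() {
        worker.shutdown()
        worker.awaitTermination(2, TimeUnit.SECONDS)
    }
}

fun main() {
    val queue = TimedQueue(timeoutMs = 100)
    queue.enqueue("stuck") { /* simulates a callback that never arrives */ }
    queue.enqueue("ok") { queue.complete("ok") }
    Thread.sleep(500)
    queue.shutdownAndWait()
    println(queue.log) // [stuck:timeout, ok:done]
}
```

Funnelling enqueue, completion and timeout through one thread also keeps callbacks and callers from racing on the queue state.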

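The second layer is a state machine. The queue answers "how does the current operation run serially"; the state machine answers "what is allowed in the current connection phase". For example, no business packets while disconnected, no enabling notifications before service discovery finishes, and no entering the ready state before notifications are enabled. With both in place, BLE stability improves noticeably.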
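The state machine layer can be sketched as a small transition table. The state names and the allowed transitions below are illustrative assumptions derived from the three rules just listed, not a prescribed design; a real device protocol may need more phases.

```kotlin
// Hypothetical sketch: state names and the transition table are illustrative assumptions.
enum class ConnState { DISCONNECTED, CONNECTING, DISCOVERING, ENABLING_NOTIFY, READY }

class ConnectionStateMachine {
    var state = ConnState.DISCONNECTED
        private set

    // Each phase may only move to the next phase or fall back to DISCONNECTED.
    private val allowed = mapOf(
        ConnState.DISCONNECTED to setOf(ConnState.CONNECTING),
        ConnState.CONNECTING to setOf(ConnState.DISCOVERING, ConnState.DISCONNECTED),
        ConnState.DISCOVERING to setOf(ConnState.ENABLING_NOTIFY, ConnState.DISCONNECTED),
        ConnState.ENABLING_NOTIFY to setOf(ConnState.READY, ConnState.DISCONNECTED),
        ConnState.READY to setOf(ConnState.DISCONNECTED)
    )

    /** Moves to [next] if the transition is legal; otherwise returns false and stays put. */
    fun transition(next: ConnState): Boolean {
        if (next !in allowed.getValue(state)) return false
        state = next
        return true
    }

    /** Business packets are allowed only after initialization has fully completed. */
    fun canSendBusinessPacket(): Boolean = state == ConnState.READY
}

fun main() {
    val machine = ConnectionStateMachine()
    println(machine.transition(ConnState.READY))      // false
    println(machine.transition(ConnState.CONNECTING)) // true
    println(machine.canSendBusinessPacket())          // false
}
```

The queue would consult something like canSendBusinessPacket() before accepting a business write, while the GATT callbacks drive transition().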

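If you only remember one thing, I think the most valuable takeaway in BLE is not a particular API but this distinction:

Scanning solves "finding the device"; the GATT queue solves "keeping the connection stable".

Many BLE articles focus on scan filters, permission requests, and the device list UI. These matter, of course, but they mostly decide whether you can get started. What really decides whether BLE keeps working reliably over time is usually how you organize GATT operations after connecting.

So if your BLE code already shows these symptoms:

callbacks occasionally missing
writes occasionally failing
an initialization flow that works only some of the time
growing instability after a few reconnects

then rather than continuing to suspect the UUID or testing more devices, stop and check whether your GATT operations are still running unguarded. Very often the problem is not scanning; it is that GATT was never given a queue.

A minimal working version

To finish, here is a tightened-up minimal skeleton you can copy into your own project and adapt: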

class BleOperationQueue(
    private val gattProvider: () -> BluetoothGatt?
) {
    private val queue = ArrayDeque<BleOperation>()
    private var current: BleOperation? = null

    fun enqueue(operation: BleOperation) {
        queue.add(operation)
        if (current == null) {
            next()
        }
    }

    fun onOperationFinished() {
        current = null
        next()
    }

    fun onOperationFailed() {
        current = null
        next()
    }

    private fun next() {
        val gatt = gattProvider() ?: return
        val op = queue.removeFirstOrNull() ?: run {
            current = null
            return
        }

        current = op

        when (op) {
            is BleOperation.DiscoverServices -> gatt.discoverServices()
            is BleOperation.RequestMtu -> gatt.requestMtu(op.mtu)
            is BleOperation.ReadCharacteristic -> gatt.readCharacteristic(op.characteristic)
            is BleOperation.WriteDescriptor -> {
                // API 33+ overload; avoids the deprecated descriptor.value setter.
                gatt.writeDescriptor(op.descriptor, op.value)
            }
            is BleOperation.WriteCharacteristic -> {
                gatt.writeCharacteristic(op.characteristic, op.value, op.writeType)
            }
        }
    }
}

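This version is still incomplete, but it is enough to make the point: what matters most in BLE is not "which method to call" but "when to call it, in what order, and who drives the next step".

Once that is sorted out, BLE code becomes much more stable.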