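Hands-On Face Fatigue Detection and Age/Gender Recognition with dlib + OpenCV

1. Introduction

Face-related techniques have long been a popular area of computer vision. From face detection and landmark localization to fatigue detection and age/gender recognition, they are used in a wide range of scenarios such as driver fatigue monitoring, smart access control, and human-computer interaction.

This article uses dlib and OpenCV to build two classic face applications from scratch: real-time fatigue detection (based on the eye aspect ratio, EAR) and age/gender recognition (based on pretrained CNN models).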

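2. dlib Basics

2.1 What is dlib?

dlib is a third-party open-source library for C++ and Python that bundles tools for machine learning, computer vision, and image processing. It runs in many environments, including robots, embedded devices, mobile devices, and high-performance servers. It is free, open source, and usable in commercial projects, which makes it one of the go-to libraries for face-related work.

2.2 dlib vs. OpenCV Face Detection

OpenCV also provides face detection, so why use dlib at all? Here is a side-by-side comparison: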

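| | OpenCV face detection | dlib face detection |
|----|------------------------------------------------|----------------------------------------------------------------------------------|
| Pros | 1. Runs in real time on a CPU 2. Simple architecture, easy to get started 3. Detects faces at different scales | 1. Handles frontal and slightly non-frontal faces 2. Minimal, convenient API 3. Stays stable under small occlusions |
| Cons | 1. High false-positive rate; non-faces are often reported as faces 2. Frontal faces only; poor results on non-frontal faces 3. Weak under occlusion | 1. Cannot detect small faces (the smallest face in the training data is 80×80; smaller faces need custom training) 2. The detection box often cuts off part of the forehead or chin 3. Poor results on extreme non-frontal poses (profile, looking down or up) |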

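Use OpenCV when speed matters most and dlib when accuracy matters most; choose between them according to the scenario.

3. Installing dlib

Many people hit the error Failed building wheel for dlib when installing dlib. Here are two reliable ways to install it:

3.1 Method 1: Install with pip from a Mirror

Install directly from a mirror in mainland China to speed up the download and avoid network problems: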


pip install dlib -i https://pypi.tuna.tsinghua.edu.cn/simple

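Note: this method compiles dlib from source, so a C++ build environment (Visual Studio Build Tools on Windows) and CMake must be set up locally; otherwise the build fails.

3.2 Method 2: Offline Install from a whl File

If the build environment is the problem, download a prebuilt whl file that matches your Python version and operating system, then install it by running pip install followed by the file name.

Pick the file for your own version. For Python 3.8 on 64-bit Windows, for example, that is dlib-19.19.0-cp38-cp38-win_amd64.whl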
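
To work out which whl file matches the local interpreter, the tag in the file name (cp38 in the example above) can be read from Python itself. The snippet below is a minimal sketch, and the helper name wheel_python_tag is made up for illustration:

```python
import platform
import sys


def wheel_python_tag() -> str:
    """Return the CPython tag used in wheel file names, e.g. 'cp38' for Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


# Print the facts needed to choose a matching wheel file.
print("Python tag:", wheel_python_tag())
print("Machine:", platform.machine())
print("64-bit interpreter:", sys.maxsize > 2**32)
```

The Python tag must match the cpXX part of the file name, and a 64-bit interpreter on Windows corresponds to the win_amd64 suffix.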

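4. Core Feature 1: Real-Time Fatigue Detection with dlib

4.1 How It Works

Fatigue detection is built on the eye aspect ratio (EAR). When the eye is open, the ratio of its vertical height to its horizontal width (the EAR) is relatively large; when the eye is closed or half closed (a sign of fatigue), the EAR drops sharply.

From dlib's 68 facial landmarks we take the 6 landmarks of each eye and compute the EAR. When the EAR stays below a threshold (0.3 in this article; the best value depends on the camera and the person) for many consecutive frames, the state is judged as fatigue and a warning is shown.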
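
As a worked example of the ratio described above, with the six eye points numbered p0 to p5, EAR = (|p1 - p5| + |p2 - p4|) / (2 * |p0 - p3|). The sketch below uses made-up coordinates and only the standard library (math.dist needs Python 3.8 or later); it is an illustration, not the code used later in this article:

```python
import math


def eye_aspect_ratio(points):
    """EAR for six (x, y) eye points ordered p0..p5."""
    a = math.dist(points[1], points[5])  # first vertical distance
    b = math.dist(points[2], points[4])  # second vertical distance
    c = math.dist(points[0], points[3])  # horizontal distance (eye corners)
    return (a + b) / (2.0 * c)


# Made-up coordinates: a wide-open eye and an almost closed eye of the same width.
open_eye = [(0, 0), (1, -1), (3, -1), (4, 0), (3, 1), (1, 1)]
closed_eye = [(0, 0), (1, -0.2), (3, -0.2), (4, 0), (3, 0.2), (1, 0.2)]

ear_open = eye_aspect_ratio(open_eye)      # about 0.5, above the 0.3 threshold
ear_closed = eye_aspect_ratio(closed_eye)  # about 0.1, below the 0.3 threshold
print(ear_open, ear_closed)
```

Because both vertical distances shrink while the horizontal distance stays the same, the ratio falls as the eye closes, regardless of how large the face appears in the frame.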

4.2 The 68 Facial Landmarks

dlib's shape_predictor_68_face_landmarks.dat model detects 68 landmarks on a face. Among them:

Points 36-41: right eye
Points 42-47: left eye
Points 48-67: mouth (usable for extensions such as smile detection)
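
The index ranges above map directly onto Python slices, where the end index is exclusive. The sketch below uses placeholder coordinates instead of real predictor output, and the constant names are chosen only for illustration:

```python
# Slice objects for the landmark groups listed above (end index is exclusive).
RIGHT_EYE = slice(36, 42)  # points 36-41
LEFT_EYE = slice(42, 48)   # points 42-47
MOUTH = slice(48, 68)      # points 48-67

# Placeholder landmarks: point i sits at (i, i). Real values come from the shape predictor.
landmarks = [(i, i) for i in range(68)]

right_eye = landmarks[RIGHT_EYE]
left_eye = landmarks[LEFT_EYE]
mouth = landmarks[MOUTH]
print(len(right_eye), len(left_eye), len(mouth))
```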

4.3 Complete Code

import numpy as np
import cv2
import dlib
from sklearn.metrics.pairwise import euclidean_distances
from PIL import Image, ImageDraw, ImageFont


def eye_aspect_ratio(eye):
    # Compute the eye aspect ratio (EAR); each distance is a 1x1 array
    A = euclidean_distances(eye[1].reshape(1,2), eye[5].reshape(1,2))
    B = euclidean_distances(eye[2].reshape(1,2), eye[4].reshape(1,2))
    C = euclidean_distances(eye[0].reshape(1,2), eye[3].reshape(1,2))
    ear = ((A + B) /2.0) / C
    return ear


def cv2AddChineseText(img, text, position, textColor=(255, 0, 0), textSize=50):
    # Draw Chinese text on an OpenCV image via PIL (textColor is RGB)
    if isinstance(img, np.ndarray):
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img)
    fontStyle = ImageFont.truetype("simsun.ttc", textSize, encoding="utf-8")
    draw.text(position, text, textColor, font=fontStyle)
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)


def drawEye(eye, frame):
    # Draw the eye contour
    eyeHull = cv2.convexHull(eye)
    cv2.drawContours(frame, [eyeHull], -1, (0, 255, 0), 2)


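# Parameters
COUNTER = 0  # number of consecutive frames with the eyes closed
EYE_AR_THRESH = 0.3  # EAR threshold
EYE_AR_CONSEC_FRAMES = 50  # trigger the warning after 50 consecutive closed-eye frames

# Load the dlib models
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Open the camera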
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Detect faces (0 = do not upsample the image)
    faces = detector(frame, 0)
    for face in faces:
        # Locate the 68 landmarks and convert them to an (N, 2) array
        shape = predictor(frame, face)
        shape = np.array([[p.x, p.y] for p in shape.parts()])
        # Extract the landmarks of both eyes
        rightEye = shape[36:42]
        leftEye = shape[42:48]
        # Compute the EAR of each eye and average them
        rightEAR = eye_aspect_ratio(rightEye)
        leftEAR = eye_aspect_ratio(leftEye)
        ear = (leftEAR + rightEAR) / 2.0

        # Fatigue check
        if ear < EYE_AR_THRESH:
            COUNTER += 1
            # Eyes closed for 50 consecutive frames: show the warning
            if COUNTER >= EYE_AR_CONSEC_FRAMES:
                # The text is drawn with PIL, so textColor is RGB: red is (255, 0, 0)
                frame = cv2AddChineseText(frame, "!!!!!危险！疲劳驾驶！!!!!!", (250,250), textColor=(255,0,0), textSize=50)
        else:
            COUNTER = 0
            # Draw the eye contours
            drawEye(leftEye, frame)
            drawEye(rightEye, frame)

        # Show the EAR value (ear is a 1x1 array)
        info = f"EAR: {ear[0][0]:.2f}"
        frame = cv2AddChineseText(frame, info, (0,30), textColor=(0,255,0), textSize=30)

    cv2.imshow("Frame", frame)
    # Press ESC to quit
    if cv2.waitKey(1) == 27:
        break

# Release resources
cv2.destroyAllWindows()
cap.release()

4.4 Code Walkthrough

def eye_aspect_ratio(eye):
    # The two vertical distances of the eye
    A = euclidean_distances(eye[1].reshape(1,2), eye[5].reshape(1,2))
    B = euclidean_distances(eye[2].reshape(1,2), eye[4].reshape(1,2))
    # The horizontal distance of the eye
    C = euclidean_distances(eye[0].reshape(1,2), eye[3].reshape(1,2))
    # Formula: (mean vertical distance) / horizontal length
    ear = ((A + B) /2.0) / C
    return ear

How it works

Each eye has 6 landmarks (0-5)
A, B: vertical heights
C: horizontal width
Smaller EAR = eye more closed
Rule of thumb used here: EAR < 0.3 counts as a closed eye

def cv2AddChineseText(img, text, position, textColor=(255, 0, 0), textSize=50):
    if isinstance(img, np.ndarray):         # the input is an OpenCV image
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # convert to a PIL image
    draw = ImageDraw.Draw(img)
    fontStyle = ImageFont.truetype("simsun.ttc", textSize, encoding="utf-8")  # SimSun font
    draw.text(position, text, textColor, font=fontStyle)  # draw the text (textColor is RGB)
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)  # convert back to OpenCV format

Purpose: works around OpenCV's inability to draw Chinese text directly by routing the drawing through PIL.
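
One detail of this round trip is worth spelling out: the text is drawn while the image is in PIL's RGB order, so the textColor tuple is read as RGB, whereas OpenCV drawing calls such as cv2.rectangle read colors as BGR. The sketch below shows the conversion with plain tuples; bgr_to_rgb is a hypothetical helper, not part of either library:

```python
def bgr_to_rgb(color):
    """Reverse a 3-channel colour tuple (BGR -> RGB; the same swap also maps RGB -> BGR)."""
    b, g, r = color
    return (r, g, b)


RED_BGR = (0, 0, 255)            # red as OpenCV drawing calls expect it
RED_RGB = bgr_to_rgb(RED_BGR)    # red as PIL's draw.text expects it
GREEN = bgr_to_rgb((0, 255, 0))  # green is the same tuple in both orders
print(RED_RGB, GREEN)
```

Passing (0, 0, 255) as textColor therefore produces blue text, while (255, 0, 0) produces red.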


def drawEye(eye, frame):
    eyeHull = cv2.convexHull(eye)       # build the outer contour of the eye
    cv2.drawContours(frame, [eyeHull], -1, (0, 255, 0), 2)  # draw it in green

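Adjust EYE_AR_CONSEC_FRAMES to the camera's frame rate. A typical webcam runs at about 30 frames per second,
so 50 frames ≈ 1.7 seconds of closed eyes before fatigue is flagged.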
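
The conversion between frame counts and seconds is simple arithmetic. The two helpers below are illustrative names, not part of the article's code, and show how a frame threshold could be derived for cameras with other frame rates:

```python
def frames_to_seconds(frames, fps):
    """How long a run of `frames` consecutive frames lasts at `fps` frames per second."""
    return frames / fps


def seconds_to_frames(seconds, fps):
    """How many consecutive frames correspond to `seconds` at `fps` frames per second."""
    return round(seconds * fps)


print(frames_to_seconds(50, 30))   # duration of the 50-frame threshold at 30 fps
print(seconds_to_frames(1.5, 30))  # frame threshold for 1.5 s at 30 fps
print(seconds_to_frames(2.0, 15))  # frame threshold for 2 s on a 15 fps camera
```

Note that the loop only advances as fast as frames are actually processed, so the effective frame rate may be lower than the camera's nominal one.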


detector = dlib.get_frontal_face_detector()               # dlib face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat") # 68-point landmark model

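Model loading: the shape_predictor_68_face_landmarks.dat file must be downloaded beforehand and placed where the script can find it (with the path above, the working directory); otherwise the code cannot run.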

while True:
    ret, frame = cap.read()      # read one frame
    if not ret: break            # stop if the read failed
    
    faces = detector(frame, 0)   # detect all faces in the frame
    
    for face in faces:           # loop over each face
        # 1. Locate the 68 landmarks
        shape = predictor(frame, face)
        shape = np.array([[p.x, p.y] for p in shape.parts()])
        
        # 2. Extract the eye coordinates (fixed indices)
        rightEye = shape[36:42]   # right eye
        leftEye = shape[42:48]    # left eye
        
        # 3. Compute the EAR of both eyes
        rightEAR = eye_aspect_ratio(rightEye)
        leftEAR = eye_aspect_ratio(leftEye)
        ear = (leftEAR + rightEAR) / 2.0  # the average is more stable
        
        # ================== Fatigue check ==================
        if ear < EYE_AR_THRESH:    # eyes closed
            COUNTER += 1
            if COUNTER >= EYE_AR_CONSEC_FRAMES:
                # Red warning (textColor is RGB because the text is drawn with PIL)
                frame = cv2AddChineseText(frame, "!!!!!危险！疲劳驾驶！!!!!!", (250,250), textColor=(255,0,0), textSize=50)
        else:                      # eyes open
            COUNTER = 0
            drawEye(leftEye, frame)
            drawEye(rightEye, frame)
        
        # Show the live EAR value
        info = f"EAR: {ear[0][0]:.2f}"
        frame = cv2AddChineseText(frame, info, (0,30), textColor=(0,255,0), textSize=30)

    # Show the frame
    cv2.imshow("Frame", frame)
    
    # Press ESC to quit
    if cv2.waitKey(1) == 27:
        break

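Key logic

shape[36:42] and shape[42:48]: these are the fixed eye indices in dlib's 68-point layout and must not be changed.
ear = average of both eyes: a single eye is noisy, so averaging the two is more stable.
COUNTER: it only accumulates while the eyes stay closed and resets to zero as soon as they open, so a normal blink is not mistaken for fatigue.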
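
The counter logic can be tried without a camera by feeding it a list of EAR values. The sketch below mirrors the threshold and frame count used in this article; the function name alert_frames and the EAR sequences are made up for illustration:

```python
def alert_frames(ear_values, threshold=0.3, consec_frames=50):
    """Return the indices of the frames on which the fatigue warning would be shown."""
    counter = 0
    alerts = []
    for i, ear in enumerate(ear_values):
        if ear < threshold:
            counter += 1                 # eyes still closed: keep counting
            if counter >= consec_frames:
                alerts.append(i)
        else:
            counter = 0                  # eyes open: reset immediately
    return alerts


# Made-up sequences: a 5-frame blink and a 60-frame eye closure.
blink = [0.35] * 10 + [0.15] * 5 + [0.35] * 10
long_closure = [0.35] * 10 + [0.15] * 60

print(alert_frames(blink))            # the blink never reaches 50 frames
print(alert_frames(long_closure)[0])  # first frame on which the warning appears
```

In the second sequence the eyes close at frame 10, so the 50th consecutive closed frame is frame 59, and the warning stays on for as long as the eyes remain closed.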

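5. Core Feature 2: Age and Gender Recognition with OpenCV

5.1 How It Works

Age and gender recognition relies on pretrained CNN models that follow the paper "Age and Gender Classification using Convolutional Neural Networks" by Levi et al. The network structure is:

Input: a 227×227 face image
3 convolutional layers + pooling + normalization
2 fully connected layers + Dropout
Output: gender (2 classes) / age (8 classes)

We use the pretrained caffemodel files directly and run them through OpenCV's DNN module to get real-time age and gender recognition.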

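5.2 Preparing the Models

Download the following 6 model files (3 models, each with a network definition file and a weights file) in advance and place them in the model folder:

Face detection model: opencv_face_detector.pbtxt, opencv_face_detector_uint8.pb
Age model: deploy_age.prototxt, age_net.caffemodel
Gender model: deploy_gender.prototxt, gender_net.caffemodel

The model files are provided as an attachment to the original post.

5.3 Complete Code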

import cv2
from PIL import Image, ImageDraw, ImageFont
import numpy as np

# =================Model setup=================
# Network definitions and pretrained weights for face / age / gender
faceProto = "model/opencv_face_detector.pbtxt"
faceModel = "model/opencv_face_detector_uint8.pb"

ageProto = "model/deploy_age.prototxt"
ageModel = "model/age_net.caffemodel"

genderProto = "model/deploy_gender.prototxt"
genderModel = "model/gender_net.caffemodel"

# Load the networks
ageNet = cv2.dnn.readNet(ageModel, ageProto)  # age model
genderNet = cv2.dnn.readNet(genderModel, genderProto)  # gender model
faceNet = cv2.dnn.readNet(faceModel, faceProto)  # face detection model

# =================Variables=================
# Age brackets and gender labels
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
genderList = ['男性', '女性']  # male, female
mean = (78.426337603, 87.7689143744, 114.8958788766)  # mean values from model training, subtracted during preprocessing

# =================Helper: get the face bounding boxes=================
def getBoxes(net, frame):
    """Run the face detector and return the annotated frame plus the face boxes."""
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    # Preprocess the image: resize and subtract the mean
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0, size=(300, 300),
                                  mean=(104, 117, 123), swapRB=True, crop=False)
    net.setInput(blob)  # feed the image to the face detector
    detections = net.forward()  # get the detection results

    faceBoxes = []  # detected face boxes
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]  # confidence score
        if confidence > 0.7:  # keep faces with confidence > 0.7
            # Convert the normalized box coordinates to pixels
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            faceBoxes.append([x1, y1, x2, y2])

            # Draw the face box
            cv2.rectangle(frame, pt1=(x1, y1), pt2=(x2, y2),
                          color=(0, 255, 0), thickness=int(round(frameHeight / 150)), lineType=cv2.LINE_AA)

    return frame, faceBoxes

# =================Helper: draw Chinese text=================
def cv2AddChineseText(img, text, position, textColor=(0, 255, 0), textSize=30):
    """Draw Chinese text on an image (textColor is RGB)."""
    if isinstance(img, np.ndarray):
        # Convert to a PIL image
        img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img)
    # Load the system SimSun font
    fontStyle = ImageFont.truetype("simsun.ttc", textSize, encoding="utf-8")
    # Draw the text
    draw.text(position, text, textColor, font=fontStyle)
    # Convert back to OpenCV format
    return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR)

# =================Main program: open the camera=================
cap = cv2.VideoCapture(0)  # open the default camera

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, flipCode=1)  # mirror the image so it feels natural to the viewer

    # Get the face bounding boxes
    frame, faceBoxes = getBoxes(faceNet, frame)

    if not faceBoxes:  # no face in the frame: show it and move on
        print("当前镜头中没有人")  # "nobody in front of the camera"
        cv2.imshow("result", frame)
        if cv2.waitKey(1) == 27:  # ESC to quit
            break
        continue

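    # Predict age and gender for each face
    for faceBox in faceBoxes:
        x1, y1, x2, y2 = faceBox
        # Clamp the box to the frame so the crop never uses negative indices
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(frame.shape[1], x2), min(frame.shape[0], y2)
        face = frame[y1:y2, x1:x2]  # crop the face region
        if face.size == 0:  # skip degenerate boxes
            continue

        # Preprocess: resize and subtract the mean
        blob = cv2.dnn.blobFromImage(face, scalefactor=1.0, size=(227, 227), mean=mean)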

        # Predict gender
        genderNet.setInput(blob)
        genderOuts = genderNet.forward()
        gender = genderList[genderOuts[0].argmax()]

        # Predict age
        ageNet.setInput(blob)
        ageOuts = ageNet.forward()
        age = ageList[ageOuts[0].argmax()]

        # Draw the result
        result = f"{gender},{age}"
        frame = cv2AddChineseText(frame, result, (x1, y1-30), (0, 255, 0), 30)

    # Show the result
    cv2.imshow("result", frame)
    # ESC to quit
    if cv2.waitKey(1) == 27:
        break

# Release resources
cv2.destroyAllWindows()
cap.release()

5.4 Code Walkthrough

# Face detection model: config + weights
faceProto = "model/opencv_face_detector.pbtxt"
faceModel = "model/opencv_face_detector_uint8.pb"

# Age prediction model
ageProto = "model/deploy_age.prototxt"
ageModel = "model/age_net.caffemodel"

# Gender prediction model
genderProto = "model/deploy_gender.prototxt"
genderModel = "model/gender_net.caffemodel"

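Key points

These are deep learning models that others have already trained; you can use them as they are.
The model files must be placed in the model/ folder, otherwise the code raises an error.
File roles: .prototxt / .pbtxt = network structure, .caffemodel / .pb = trained weights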
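
Since a missing file only shows up as an error when the networks are loaded, it can help to check the folder first. The sketch below uses only the standard library; the function name missing_model_files is hypothetical, and the file names are the six listed in section 5.2:

```python
from pathlib import Path

# The six files this article expects inside the model folder.
REQUIRED_FILES = [
    "opencv_face_detector.pbtxt",
    "opencv_face_detector_uint8.pb",
    "deploy_age.prototxt",
    "age_net.caffemodel",
    "deploy_gender.prototxt",
    "gender_net.caffemodel",
]


def missing_model_files(model_dir="model"):
    """Return the required file names that are not present in model_dir."""
    base = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


missing = missing_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```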


ageNet = cv2.dnn.readNet(ageModel, ageProto)    # age model
genderNet = cv2.dnn.readNet(genderModel, genderProto) # gender model
faceNet = cv2.dnn.readNet(faceModel, faceProto)    # face detection model

Purpose: loads the model files into memory, ready for inference.

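# The 8 age brackets the model outputs
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']

genderList = ['男性', '女性']  # gender labels: male, female

mean = (78.426337603, 87.7689143744, 114.8958788766)  # preprocessing: mean subtraction (required by the model)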

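The model output is an array of probabilities; the index of the largest one selects the matching label above.
mean holds the image mean values from model training; it must stay fixed and should not be changed.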
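
The index-to-label mapping can be illustrated without loading any model. In the sketch below the probability values are made up, and top_label is a hypothetical helper that does with plain lists what argmax() does in the article's code:

```python
AGE_LIST = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
GENDER_LIST = ['男性', '女性']  # male, female


def top_label(probabilities, labels):
    """Return (label, probability) for the highest-scoring class."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Made-up network outputs, one probability per class.
gender_probs = [0.12, 0.88]
age_probs = [0.01, 0.02, 0.05, 0.10, 0.60, 0.15, 0.05, 0.02]

print(top_label(gender_probs, GENDER_LIST))
print(top_label(age_probs, AGE_LIST))
```

The order of the labels has to match the order of the network's outputs, which is why the lists above should be kept exactly as given.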


def getBoxes(net, frame):
    # Frame height and width
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    # Preprocess the image into a blob, the input format the network expects
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), (104, 117, 123), swapRB=True)
    
    net.setInput(blob)        # feed the image to the model
    detections = net.forward()  # run inference; returns all face detections

    faceBoxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0,0,i,2]  # confidence that this detection is a face

        if confidence > 0.7:  # keep only detections above 0.7 (filters false positives)
            # Convert the normalized coordinates to pixel coordinates
            x1 = int(detections[0,0,i,3] * frameWidth)
            y1 = int(detections[0,0,i,4] * frameHeight)
            x2 = int(detections[0,0,i,5] * frameWidth)
            y2 = int(detections[0,0,i,6] * frameHeight)
            
            faceBoxes.append([x1,y1,x2,y2])
            
            # Draw a green face box
            cv2.rectangle(frame, (x1,y1), (x2,y2), (0,255,0), 2)

    return frame, faceBoxes

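Core ideas

blob: the image converted into the input format the neural network expects
detections: every face position the model outputs, each with a confidence score
confidence > 0.7: keep only the faces the model is reasonably sure about
Return values:
- the image with the boxes drawn on it
- faceBoxes, the list of coordinates of all detected faces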
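
The filtering and scaling step can be sketched on plain lists. The detection rows below are made up and follow the layout that getBoxes indexes (confidence at position 2, normalized x1, y1, x2, y2 at positions 3 to 6). The clamping to the frame is an addition for illustration and is not part of the getBoxes function above:

```python
def detections_to_boxes(detections, frame_width, frame_height, min_confidence=0.7):
    """Convert normalized detections to pixel boxes, keeping confident ones only."""
    boxes = []
    for det in detections:
        if det[2] <= min_confidence:
            continue  # low confidence: likely a false positive
        # Scale to pixels, then clamp to the frame (clamping is an extra safety step).
        x1 = min(max(int(det[3] * frame_width), 0), frame_width)
        y1 = min(max(int(det[4] * frame_height), 0), frame_height)
        x2 = min(max(int(det[5] * frame_width), 0), frame_width)
        y2 = min(max(int(det[6] * frame_height), 0), frame_height)
        if x2 > x1 and y2 > y1:  # drop empty boxes
            boxes.append([x1, y1, x2, y2])
    return boxes


# Made-up rows: [image_id, class_id, confidence, x1, y1, x2, y2]
detections = [
    [0, 1, 0.95, 0.25, 0.25, 0.75, 0.75],   # confident, fully inside the frame
    [0, 1, 0.40, 0.10, 0.10, 0.20, 0.20],   # below the confidence threshold
    [0, 1, 0.90, -0.125, 0.5, 0.25, 1.25],  # confident, partly outside the frame
]
boxes = detections_to_boxes(detections, frame_width=640, frame_height=480)
print(boxes)
```

Boxes that extend past the frame edge are a realistic case when a face is only partly visible, and clamping keeps later crops such as frame[y1:y2, x1:x2] well defined.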


def cv2AddChineseText(img, text, position, textColor=(0,255,0), textSize=30):
    # Convert from OpenCV to PIL format
    img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    draw = ImageDraw.Draw(img)
    
    # Use the system SimSun font
    fontStyle = ImageFont.truetype("simsun.ttc", textSize)
    draw.text(position, text, textColor, font=fontStyle)
    
    # Convert back to OpenCV format
    return cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR)

OpenCV's built-in text drawing (cv2.putText) cannot render Chinese characters; they come out garbled, so the text is drawn through PIL instead.

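while True:
    ret, frame = cap.read()  # read one frame
    if not ret: break        # stop if the read failed
    
    frame = cv2.flip(frame, 1)  # mirror the image so it feels natural to the viewer

    # Run face detection: returns the annotated frame and all face coordinates
    frame, faceBoxes = getBoxes(faceNet, frame)

    if not faceBoxes:
        print("当前镜头中没有人")  # "nobody in front of the camera"
        cv2.imshow("result", frame)
        if cv2.waitKey(1) == 27: break
        continue  # no face: skip the rest of the loop body

    # Loop over every detected face
    for faceBox in faceBoxes:
        x1,y1,x2,y2 = faceBox
        # Clamp the box to the frame so the crop never uses negative indices
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(frame.shape[1], x2), min(frame.shape[0], y2)
        face = frame[y1:y2, x1:x2]  # crop the face region out of the frame
        if face.size == 0: continue  # skip degenerate boxes

        # Convert the face to the model's input format
        blob = cv2.dnn.blobFromImage(face, 1.0, (227,227), mean, swapRB=False)

        # ========== Predict gender ==========
        genderNet.setInput(blob)
        genderPred = genderNet.forward()  # outputs [male probability, female probability]
        gender = genderList[genderPred[0].argmax()] # take the most probable class

        # ========== Predict age ==========
        ageNet.setInput(blob)
        agePred = ageNet.forward()
        age = ageList[agePred[0].argmax()]

        # ========== Annotate the result ==========
        result = f"{gender},{age}"
        frame = cv2AddChineseText(frame, result, (x1, y1-30), (0,255,0), 30)

    cv2.imshow("result", frame)  # show the final frame
    if cv2.waitKey(1) == 27:     # press ESC to quit
        break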

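Key logic

face = frame[y1:y2, x1:x2]: crops just the face out of the full frame and passes it to the age and gender models
genderNet.forward(): outputs two probabilities, [male probability, female probability]; argmax() picks the index of the larger one
ageNet.forward(): outputs 8 probabilities, one for each of the 8 age brackets
cv2AddChineseText: draws the result above the face box, for example 男性,(25-32) (male, 25-32)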
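
The crop frame[y1:y2, x1:x2] puts the y range first because images are stored row by row: the first index selects rows (y) and the second selects columns (x). The sketch below imitates this with a nested list instead of a NumPy array; crop is a hypothetical helper and the tiny image is made up:

```python
def crop(image, box):
    """Crop a row-major image (list of rows) with an [x1, y1, x2, y2] box."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]  # rows (y) first, then columns (x)


# A tiny 4-row by 6-column image in which each pixel stores its own (x, y) position.
image = [[(x, y) for x in range(6)] for y in range(4)]

face = crop(image, [1, 2, 4, 4])
print(len(face), len(face[0]))  # height and width of the crop
print(face[0][0])               # top-left pixel of the crop
```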

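6. Summary

6.1 Project Summary

This article used dlib and OpenCV to implement two classic face applications:

Fatigue detection: based on the eye aspect ratio (EAR), with real-time monitoring driven by dlib landmark localization
Age and gender recognition: based on pretrained CNN models, with real-time recognition through OpenCV's DNN module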