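DriverGen | Automatic Generation of Serial Device Drivers / Fuzz Drivers for Windows GUI Programs

Note: this article is a collection of material related to "DriverGen | automatic driver generation".

The English text is quoted from the original papers; the Chinese translation is machine-generated and has not been proofread.

If anything looks wrong, please refer to the original papers.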


DriverGen: Automating the Generation of Serial Device Drivers

DriverGen:串行设备驱动程序的自动化生成

Jiannan Zhai¹(B), Yuheng Du², Shiree Hughes¹, and Jason O. Hallstrom¹

翟建南¹(B)、杜宇恒²、希里·休斯¹、杰森·O·霍尔斯特伦¹

¹ Institute for Sensing and Embedded Network Systems Engineering, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA

¹ 佛罗里达大西洋大学传感与嵌入式网络系统工程研究所,美国佛罗里达州博卡拉顿市格莱兹路 777 号,邮编 33431

{jzhai,shughes2015,jhallstrom}@fau.edu

² School of Computing, Clemson University, Clemson, SC 29634, USA

² 克莱姆森大学计算机学院,美国南卡罗来纳州克莱姆森市,邮编 29634

yuhengd@clemson.edu

Abstract

摘要

Microprocessors operate most serial devices in the same way, issuing commands and parsing corresponding responses. Writing the device drivers for these peripherals is a repetitive task. Moreover, measuring the response time of each command can be time-consuming and error prone. In this paper, we present DriverGen, a configuration-based tool developed to provide accurate response time measurement and automated serial device driver generation. DriverGen (i) simulates the command execution sequence of a microprocessor using a Java program running on a desktop, (ii) measures the response time of the target device to each command, and (iii) generates a device driver based on the received responses and measured response times. To evaluate DriverGen, three case studies are considered.

微处理器对大多数串行设备的操作方式相同,均为下发命令并解析相应的响应。为这些外设编写设备驱动程序是一项重复性工作。此外,测量每条命令的响应时间既耗时又容易出错。本文提出一种基于配置的工具 DriverGen,其旨在实现精确的响应时间测量与串行设备驱动程序的自动化生成。DriverGen 的功能包括:(i)通过在桌面端运行的 Java 程序模拟微处理器的命令执行序列;(ii)测量目标设备对每条命令的响应时间;(iii)基于接收的响应数据和测量得到的响应时间生成设备驱动程序。为验证 DriverGen 的性能,本文设计了三组案例研究。

1 Introduction

1 引言

Our work is motivated by the recurrent structure of most serial device drivers and the importance of accurate timing. The main contributions of our work are as follows: (i) We present a serial device driver configuration language that generalizes the specification of a serial device driver. (ii) We present an approach that measures response times with precision on the order of tens of microseconds by monitoring data signals in the communication interface. (iii) We implement DriverGen, a configuration-based tool developed to accurately measure response times, and to automatically generate the specified serial device driver. (iv) Finally, we evaluate DriverGen, considering the performance of generated drivers for three serial devices.

本文的研究动机源于大多数串行设备驱动程序的重复结构以及精确计时的重要性。本文的主要贡献如下:(i)提出一种串行设备驱动程序配置语言,用于泛化串行设备驱动程序的规格说明;(ii)提出一种通过监控通信接口中的数据信号,实现数十微秒量级精度响应时间测量的方法;(iii)实现基于配置的工具 DriverGen,该工具可精确测量响应时间并自动生成指定的串行设备驱动程序;(iv)通过三组串行设备的驱动程序生成性能测试,完成对 DriverGen 的评估验证。

2 Related Work

2 相关工作

Automated driver synthesis is discussed in [8]. Ratter proposes synthesis as a method to ensure correct driver construction. A state machine is generated automatically using specifications for both the device and the (desktop) operating system, and ultimately supports the generation of a driver for the device in C. We similarly provide the ability to generate a microprocessor driver for device communication. When generating a driver for a microprocessor, we experience the added challenges of memory and power constraints, timing precision, and a single-threaded operating system. Our driver must be efficient with respect to both memory usage and power consumption.

文献 [8] 探讨了驱动程序的自动化合成技术。拉特(Ratter)提出将合成方法作为确保驱动程序正确构建的手段,通过设备和(桌面端)操作系统的规格说明自动生成状态机,最终支持生成 C 语言设备驱动程序。本文同样实现了用于设备通信的微处理器驱动程序生成功能,但在生成微处理器驱动程序时,需额外应对内存与功耗限制、计时精度以及单线程操作系统带来的挑战。因此,本文设计的驱动程序需在内存占用和功耗方面具备高效性。

Another method for automating device driver generation is Termite [9]. Termite acts as an interface between the OS and a target device. It uses a formal specification of the device to generate a set of OS-independent commands. It allows the device creator to focus on the device, and the OS expert to focus on the OS, and still create a communication link between the two. Similarly, we create a method to automatically generate drivers for serial devices, eliminating the need for developers to manually write the drivers.

另一种设备驱动程序自动化生成方法是 Termite [9]。Termite 作为操作系统与目标设备之间的接口,通过设备的形式化规格说明生成一组与操作系统无关的命令。该方法使设备开发者可专注于设备本身,操作系统专家可专注于操作系统开发,同时仍能建立两者之间的通信链路。本文同样提出了一种串行设备驱动程序自动生成方法,无需开发者手动编写驱动程序。

In [6], O'Nils et al. show that by using synthesis, development time can be reduced by as much as 98 %. Their method uses ProGram, a specification language, to model the behavior of a device based on sequences of permissible events. Three inputs are required to synthesize the device driver from its behavior: architecture independent protocols, a specification of the processor and bus interface, and a specification of the target operating system.

文献 [6] 中,奥尼尔斯(O'Nils)等人指出,通过合成方法可将驱动程序开发时间缩短高达 98%。该方法采用规格说明语言 ProGram,基于允许的事件序列对设备行为进行建模。从设备行为合成驱动程序需输入三项内容:与架构无关的协议、处理器和总线接口的规格说明以及目标操作系统的规格说明。

An important requirement of automatically generated code is that the quality must be equal to or surpass that of hand-written code. In [7], O'Nils et al. argue that their tool produces a quality driver (generated in C) that is comparable to handwritten code. This tool requires a protocol specification for both the device and the operating system.

自动生成代码的一项重要要求是其质量必须等于或优于手写代码。文献 [7] 中,奥尼尔斯(O'Nils)等人称其工具生成的 C 语言驱动程序质量可与手写代码相媲美。该工具同样需要设备和操作系统的协议规格说明。

3 System Design/Implementation

3 系统设计与实现

DriverGen is based on the observation that all serial device drivers work in almost the same way. Our system simulates the execution of each command and generates the target device driver based on the execution results. Each command sequence is implemented as a function in the driver. To match the response pattern and save the desired information, we implement regular expression libraries in Java and C, used by DriverGen and the generated drivers, respectively. To determine if the target device is responding, or the response is finished, timeouts are used, making accurate timing important. DriverGen monitors the UART communication signals to measure the response time of a device to each command (response time), and the time between bytes in the response (inter-byte times).

DriverGen 的设计基于一个关键观察:所有串行设备驱动程序的工作方式几乎相同。该系统通过模拟每条命令的执行过程,并基于执行结果生成目标设备驱动程序。每条命令序列在驱动程序中以函数形式实现。为匹配响应模式并提取所需信息,本文分别在 Java 和 C 语言中实现了正则表达式库,分别供 DriverGen 工具和生成的驱动程序使用。系统通过超时机制判断目标设备是否响应以及响应是否完成,因此精确计时至关重要。DriverGen 通过监控通用异步收发传输器(UART)通信信号,测量设备对每条命令的响应时间(指令响应时间)以及响应中字节之间的时间间隔(字节间时间)。
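The two timing quantities named above can be illustrated with a minimal Python sketch. It assumes that byte arrival timestamps (in microseconds) are already available from some monitor; the `Timing` record, the `measure` and `suggest_timeouts` helpers, and the 1.5x safety margin are hypothetical choices made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Timing:
    response_time_us: Optional[int]  # end of command -> first response byte
    inter_byte_us: List[int]         # gaps between consecutive response bytes


def measure(command_end_us: int, byte_times_us: List[int]) -> Timing:
    """Derive response time and inter-byte times from byte arrival timestamps."""
    if not byte_times_us:
        return Timing(None, [])  # the device never answered
    gaps = [b - a for a, b in zip(byte_times_us, byte_times_us[1:])]
    return Timing(byte_times_us[0] - command_end_us, gaps)


def suggest_timeouts(samples: List[Timing], margin: float = 1.5) -> Tuple[int, int]:
    """Assumed policy: worst observed value multiplied by a safety margin."""
    worst_response = max(
        t.response_time_us for t in samples if t.response_time_us is not None
    )
    worst_gap = max((g for t in samples for g in t.inter_byte_us), default=0)
    return int(worst_response * margin), int(worst_gap * margin)


# Two simulated executions of the same command (timestamps in microseconds).
runs = [measure(1000, [1450, 1537, 1624]), measure(5000, [5520, 5607, 5700])]
response_timeout_us, inter_byte_timeout_us = suggest_timeouts(runs)
```

A driver that waits only as long as these derived timeouts, rather than a conservative fixed delay, spends less time idle after each command, which is the effect the evaluation section attributes the speed-up to.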

3.1 Hardware Setup

3.1 硬件配置

The DriverGen hardware, shown in Fig. 1, consists of a desktop running a Java program, two FT232R chips, and a MoteStack [3]. The FT232R chips are used by the desktop to communicate with the target device and a MoteStack, respectively. The MoteStack is used to monitor the UART data signals to measure the response and inter-byte times.

DriverGen 的硬件配置如图 1 所示,包括一台运行 Java 程序的桌面计算机、两块 FT232R 芯片以及一个 MoteStack [3]。其中,两块 FT232R 芯片分别用于桌面计算机与目标设备、桌面计算机与 MoteStack 之间的通信;MoteStack 用于监控 UART 数据信号,以测量响应时间和字节间时间。

Fig. 1. Hardware setup

图 1. 硬件配置

3.2 Driver Configuration

3.2 驱动配置

DriverGen runs based on a driver configuration file that is used to configure UART communication, control execution of each command, and generate the target driver. The configuration parameters specify (i) basic driver information, such as driver name, version; (ii) global definitions, such as response timeout, which specifies the maximum time before the first response byte should be received; and (iii) function details, such as function names, the commands to be sent to the target device, the responses expected, and other information.

DriverGen 基于驱动配置文件运行,该文件用于配置 UART 通信参数、控制每条命令的执行以及生成目标驱动程序。配置参数包括:(i)驱动程序基本信息,如驱动名称、版本号;(ii)全局定义,如响应超时时间(指定接收第一个响应字节的最长等待时间);(iii)函数详情,如函数名称、待发送至目标设备的命令、预期响应以及其他相关信息。

3.3 System Architecture

3.3 系统架构

The DriverGen system consists of three modules. The Parser module is used to read, parse, and validate a driver configuration. The Executor module is used to execute the functions specified in the configuration, and to control the MoteStack to measure response times and inter-byte times. The Generator module is used to generate the driver source code based on the configuration parameters and execution results.

DriverGen 系统包含三个模块:解析器(Parser)模块用于读取、解析和验证驱动配置文件;执行器(Executor)模块用于执行配置文件中指定的函数,并控制 MoteStack 测量响应时间和字节间时间;生成器(Generator)模块用于基于配置参数和执行结果生成驱动程序源代码。
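The division of labor among the three modules can be sketched in a few lines of Python. Everything below is hypothetical: the text does not show the actual configuration syntax, so the `CONFIG` dictionary, its field names, the `fake_device` stand-in for the serial device and MoteStack, and the shape of the emitted C declaration are all assumptions made for illustration.

```python
import re

# Hypothetical configuration; the real DriverGen syntax is not shown in the text.
CONFIG = {
    "driver": {"name": "demo", "version": "0.1"},
    "globals": {"response_timeout_us": 500000},
    "functions": [
        {"name": "demo_get_signal", "command": "AT+CSQ\r",
         "expect": r"\+CSQ: (\d+),(\d+)"},
    ],
}


def parse(config):
    """Parser: check required sections and fields, and validate the patterns."""
    for key in ("driver", "globals", "functions"):
        if key not in config:
            raise ValueError(f"missing section: {key}")
    for fn in config["functions"]:
        for field in ("name", "command", "expect"):
            if field not in fn:
                raise ValueError(f"function missing field: {field}")
        re.compile(fn["expect"])  # raises re.error on a malformed pattern
    return config


def execute(config, device):
    """Executor: send each command and record the match result and timing."""
    results = {}
    for fn in config["functions"]:
        response, response_time_us = device(fn["command"])
        matched = re.search(fn["expect"], response) is not None
        results[fn["name"]] = {"matched": matched,
                               "response_time_us": response_time_us}
    return results


def generate(config, results):
    """Generator: emit one C declaration per configured function."""
    lines = []
    for fn in config["functions"]:
        measured = results[fn["name"]]["response_time_us"]
        lines.append(
            f"int {fn['name']}(void); /* measured response time: {measured} us */"
        )
    return "\n".join(lines)


def fake_device(command):
    """Stand-in for the target device plus the timing measurement."""
    return "+CSQ: 17,0\r\nOK\r\n", 1200


cfg = parse(CONFIG)
results = execute(cfg, fake_device)
source = generate(cfg, results)
```

The point of the sketch is only the data flow: the executor's measurements feed the generator, so the generated code can embed timing values observed on the real device.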

4 Evaluation

4 评估

We now present our evaluation of the driver generation approach. We introduce three serial devices and corresponding applications previously developed to operate with functionally equivalent, time-tested, handwritten drivers. We validate the correctness of each generated driver via substitution within the corresponding application. Finally, we consider the relative performance of the drivers, both in terms of space and execution speed.

本节将对驱动程序生成方法进行评估。本文选取三组串行设备及对应的应用程序,这些应用程序此前已配备功能等效且经时间检验的手写驱动程序。通过将生成的驱动程序替换到对应应用程序中,验证其正确性;最后从空间占用和执行速度两方面,对比生成驱动程序与手写驱动程序的性能表现。

In our experiments, the drivers and applications are implemented based on the AVR platform. To evaluate the WiFi and cellular devices, a standard x86 server is used to collect data sent from the devices.

实验中,驱动程序和应用程序均基于 AVR 平台实现。为评估 WiFi 设备和蜂窝网络设备,采用标准 x86 服务器收集设备发送的数据。

4.1 Test Devices and Applications

4.1 测试设备与应用程序

Three serial devices are used to evaluate our approach. The WH2004A is an LCD device that executes commands to display characters. The RN131 is a standalone embedded WiFi device with built-in TCP/IP support. The GM862 is a quad-band GSM/GPRS cellular modem with built-in TCP/IP, FTP, and SMTP support.

本文采用三组串行设备进行评估:WH2004A 是一款通过执行命令显示字符的液晶显示(LCD)设备;RN131 是一款内置 TCP/IP 协议的独立嵌入式 WiFi 设备;GM862 是一款四频 GSM/GPRS 蜂窝调制解调器,内置 TCP/IP、FTP 和 SMTP 协议支持。

To evaluate the generated driver for the WH2004A, an application which detects a door trespassing event and displays the event counts on the LCD is used. Since the WH2004A does not respond to incoming commands, the evaluation is focused on correctness only. The generated driver displayed the event counts without any errors for 100 door trespassing events.

为评估 WH2004A 的生成驱动程序,采用一款可检测门禁闯入事件并在 LCD 上显示事件计数的应用程序。由于 WH2004A 不对输入命令进行响应,因此评估仅聚焦于正确性。在 100 次门禁闯入事件测试中,生成的驱动程序无错误地显示了事件计数。

To evaluate the generated drivers for the RN131 and GM862, two test applications were used. The applications sense data from a group of sensors every 10 and 120 s, respectively, and record the execution time of each function. Sensor readings and execution times are then sent to a server. Each application is configured to perform 1000 transmission rounds in each test, and the average is used. Based on stored messages in the database, both drivers work as expected.

为评估 RN131 和 GM862 的生成驱动程序,设计了两款测试应用程序:分别每 10 秒和 120 秒从一组传感器采集数据,并记录每个函数的执行时间,随后将传感器读数和执行时间发送至服务器。每个应用程序在每次测试中配置为执行 1000 轮传输,结果取平均值。基于数据库中存储的消息数据,两款生成的驱动程序均达到预期工作效果。

4.2 Performance Evaluation

4.2 性能评估

We next evaluate the performance of the generated drivers relative to the handwritten drivers, both in terms of space and execution speed. We focus on the WiFi and cellular devices.

接下来,从空间占用和执行速度两方面,对比生成驱动程序与手写驱动程序的性能,评估重点为 WiFi 设备和蜂窝网络设备对应的驱动程序。

Execution Speed
执行速度

We first evaluate the execution speed of the generated drivers by sending 1000 850-byte messages to the server and tracking the execution time of each associated function. Figures 2a and b summarize the speed of the generated driver functions for the RN131 and the GM862 compared to the handwritten drivers. The x-axis represents the driver functions, and the y-axis represents the average cumulative execution time, in seconds, in a single transmission round. The functions are ordered by execution time, in decreasing order from left to right. As the figures illustrate, the generated drivers run faster than the handwritten drivers across all functions. The speed-up is achieved by reducing the time spent waiting for each response. The cumulative speed-up is proportional to the number of executions of each function in a transmission round. For example, in each round, the gm862_gsm_registered function executes approximately 40 times before detecting a valid network registration. Therefore, it shows a high speed-up in Fig. 2b. For the GM862, the overall execution time in each round is 48.50 s for the generated driver, compared to 59.60 s for the handwritten driver. For the RN131, the overall execution time in each round is 11.99 s for the generated driver, and 14.68 s for the handwritten driver.

通过向服务器发送 1000 条 850 字节的消息,并跟踪每个相关函数的执行时间,评估生成驱动程序的执行速度。图 2a 和图 2b 分别总结了 RN131 和 GM862 的生成驱动程序与手写驱动程序的函数执行速度对比。x 轴表示驱动程序函数,y 轴表示单次传输轮次中函数的平均累计执行时间(单位:秒),函数按执行时间从长到短(从左至右)排序。如图所示,生成的驱动程序在所有函数上的运行速度均快于手写驱动程序,速度提升源于减少了每条命令的等待响应时间。累计速度提升与传输轮次中每个函数的执行次数成正比。例如,在每轮传输中,gm862_gsm_registered 函数需执行约 40 次才能检测到有效的网络注册,因此在图 2b 中该函数呈现出显著的速度提升。对于 GM862,生成驱动程序的单轮总执行时间为 48.50 秒,而手写驱动程序为 59.60 秒;对于 RN131,生成驱动程序的单轮总执行时间为 11.99 秒,手写驱动程序为 14.68 秒。
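The per-round totals quoted above can be turned into relative savings with a short calculation. The sketch below uses only the four figures reported in the text; the `reduction` helper is a name introduced here for illustration.

```python
# Overall execution time per transmission round, in seconds, as reported above.
rounds = {
    "GM862": {"generated": 48.50, "handwritten": 59.60},
    "RN131": {"generated": 11.99, "handwritten": 14.68},
}


def reduction(generated, handwritten):
    """Return (seconds saved, percentage of the handwritten time saved)."""
    saved = handwritten - generated
    return saved, 100.0 * saved / handwritten


for device, t in rounds.items():
    saved, percent = reduction(t["generated"], t["handwritten"])
    print(f"{device}: {saved:.2f} s saved per round ({percent:.1f}% less time)")
```

Under these figures the generated drivers save about 11.10 s per round for the GM862 and 2.69 s for the RN131, a reduction of roughly 18% in both cases.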

Memory Usage
内存占用

We next evaluate the memory overhead introduced by the generated drivers. Avr-size is used to collect the memory data. Figure 3a summarizes the drivers' program memory (ROM) usage. The x-axis represents the drivers, and the y-axis represents size, in bytes. The hashed area represents ROM overhead introduced by the regular expression library. The ROM overhead is approximately 3400 bytes for both drivers. Figure 3b summarizes the drivers' data memory (RAM) usage. Again, the x-axis represents the drivers, and the y-axis represents size, in bytes. The hashed area represents RAM overhead introduced by the regular expressions used in the generated driver. The RAM overhead is closely related to the number of regular expressions used. Since the GM862 requires more regular expressions, the overhead for the GM862 is slightly larger than the WiFi chip, at 503 bytes.

接下来评估生成驱动程序引入的内存开销,采用 Avr-size 工具收集内存数据。图 3a 总结了驱动程序的程序存储器(ROM)占用情况,x 轴表示驱动程序,y 轴表示大小(单位:字节),阴影区域表示正则表达式库引入的 ROM 开销,两款生成驱动程序的 ROM 开销均约为 3400 字节。图 3b 总结了驱动程序的数据存储器(RAM)占用情况,x 轴表示驱动程序,y 轴表示大小(单位:字节),阴影区域表示生成驱动程序中正则表达式引入的 RAM 开销。RAM 开销与所用正则表达式的数量密切相关:由于 GM862 所需的正则表达式更多,其 RAM 开销略大于 WiFi 芯片对应的驱动程序,为 503 字节。

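Fig. 2. Driver function execution time: (a) RN131, (b) GM862 (x-axis: driver functions; y-axis: execution time in seconds; series: handwritten driver, generated driver)

图 2. 驱动程序函数执行时间:(a)RN131,(b)GM862(x 轴:驱动程序函数;y 轴:执行时间,单位为秒;图例:手写驱动程序、生成驱动程序)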

Fig. 3. Memory usage: (a) driver ROM usage, (b) driver RAM usage (x-axis: drivers for the RN131 and GM862; y-axis: size in bytes; series: generated driver excluding regular expressions, regular expressions, handwritten driver)

图 3. 内存占用情况:(a)驱动程序 ROM 占用,(b)驱动程序 RAM 占用(x 轴:RN131 和 GM862 的驱动程序;y 轴:大小,单位为字节;图例:生成驱动程序(不含正则表达式库)、正则表达式库、手写驱动程序)

5 Conclusion

5 结论

We described a configuration-based system to automatically generate serial device drivers and accurately measure the timeout characteristics associated with each driver command. Results show that the generated drivers perform as expected, introducing modest memory overhead. Importantly, the execution time of each command is reduced compared to the handwritten drivers. As a result, driver performance is increased, and improved energy efficiency is achieved.

本文提出一种基于配置的串行设备驱动程序自动生成系统,该系统可精确测量每条驱动命令的超时特性。实验结果表明,生成的驱动程序达到预期工作效果,仅引入适度的内存开销;更重要的是,与手写驱动程序相比,生成驱动程序的每条命令执行时间更短,从而提升了驱动程序性能并实现了更高的能效。

Acknowledgments

致谢

This work is supported by the NSF through awards CNS-1541917 and CNS-1545705.

本研究得到美国国家科学基金会(NSF)资助,资助项目编号为 CNS-1541917 和 CNS-1545705。

References

参考文献

  1. CESANTA. SLRE: super light regular expression library, September 2013. slre.sourceforge.net/
  2. Chou, P., Ortega, R., Borriello, G.: Synthesis of the hardware/software interface in microcontroller-based systems. In: Proceedings of the 1992 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 1992, pp. 488–495. IEEE Computer Society Press, Los Alamitos (1992)
  3. Eidson, G.W., Esswein, S.T., Gemmill, J.B., Hallstrom, J.O., Howard, T.R., Lawrence, J.K., Post, C.J., Sawyer, C.B., Wang, K.C., White, D.L.: The South Carolina digital watershed: end-to-end support for real-time management of water resources. IJDSN, 1 (2010)
  4. Li, J., Xie, F., Ball, T., Levin, V., McGravey, C.: Formalizing hardware/software interface specifications. In: Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering, ASE 2011, pp. 143–152. IEEE Computer Society, Washington (2011)
  5. Locke, J.: Jakarta regexp Java regular expression package, April 2011. jakarta.apache.org/regexp/
  6. O'Nils, M., Jantsch, A.: Operating system sensitive device driver synthesis from implementation independent protocol specification. In: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition, pp. 562–567 (1999)
  7. O'Nils, M., Jantsch, A.: Device driver and DMA controller synthesis from HW/SW communication protocol specifications. Des. Autom. Embed. Syst. 6(2), 177–205 (2001)
  8. Ratter, A.: Automatic device driver synthesis from device specifications. The University of New South Wales, November 2012
  9. Ryzhyk, L., Chubb, P., Kuz, I., Le Sueur, E., Heiser, G.: Automatic device driver synthesis with Termite. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP 2009, pp. 73–86. ACM, New York (2009)
  10. Shier, P., Garban, P.L., Oney, A.: System and method for validating communication specification conformance between a device driver and a hardware device. US2005246722 (2005)

DriverGen: Automatic Driver Generation for Windows GUI program

DriverGen:Windows GUI 程序的自动驱动生成

Zhaoyang Feng

冯朝阳

China National Digital Switching System Engineering and Technological Research Center Zhengzhou,China

中国国家数字交换系统工程技术研究中心 郑州,中国

Xinlei Wang *

王新蕾 *

China National Digital Switching System Engineering and Technological Research Center Zhengzhou,China

中国国家数字交换系统工程技术研究中心 郑州,中国

Guomiao Zhou

周国淼

China National Digital Switching System Engineering and Technological Research Center Zhengzhou,China

中国国家数字交换系统工程技术研究中心 郑州,中国

Qianqiong Wu

吴倩琼

China National Digital Switching System Engineering and Technological Research Center Zhengzhou,China

中国国家数字交换系统工程技术研究中心 郑州,中国

Abstract

Fuzz testing is the most effective method for software vulnerability detection. However, for programs on the Windows platform, the graphical user interface (GUI) obstructs the fuzz testing process. Among existing solutions, manually writing fuzz drivers is widely used, but it cannot be applied at scale because it demands strong reverse engineering skills. To solve this problem, we design DriverGen, an automatic driver generation system for Windows GUI programs. DriverGen uses dynamic binary instrumentation to collect trace information from GUI programs, then rebuilds function relations, including function dependency extraction, LCA recognition, and static control flow recovery, to assemble the fuzz driver. We test DriverGen on 9 real GUI programs in 4 different categories, such as Iqiyi Pictures and Format Factory. Compared with manually written drivers, DriverGen improves basic block coverage and the number of crashes found. Our results show that DriverGen identified 7 zero-day vulnerabilities, of which 1 received a CNVD number and 1 received a CVE number.

摘要

模糊测试是软件漏洞检测中最有效的方法。然而,对于 Windows 平台的程序,用户图形界面(GUI)会阻碍模糊测试过程。现有解决方案中,人工编写模糊测试驱动的方式被广泛使用,但由于对逆向工程能力要求较高,无法大规模应用。为解决该问题,本文设计了一种面向 Windows GUI 程序的自动驱动生成系统 DriverGen。DriverGen 采用动态二进制插桩技术收集 GUI 程序的轨迹信息,随后重构函数关系(包括函数依赖提取、最近公共祖先(LCA)识别和静态控制流恢复),进而组合生成模糊测试驱动。我们在 4 个不同类别(如视频、图像等)的 9 个真实 GUI 程序(包括爱奇艺图片、格式工厂等)上对 DriverGen 进行测试。与人工驱动相比,DriverGen 提高了基本块覆盖率和崩溃检测数量。实验结果表明,DriverGen 成功识别出 7 个零日漏洞,其中 1 个获得 CNVD 编号,1 个获得 CVE 编号。

Keywords

关键词

Fuzzing test; GUI program testing; Dynamic binary instrumentation; Binary analysis

模糊测试;GUI 程序测试;动态二进制插桩;二进制分析

I. INTRODUCTION

I. 引言

Fuzz testing, an effective technique for automatically finding software errors, has found tens of thousands of vulnerabilities since its introduction. Today, fuzz testing has become an indispensable method and tool for security researchers. However, most automated fuzz testing targets Linux systems and neglects fuzz testing on Windows [1-4]. As of 2021, Windows still accounts for about 73% of the market share, and it is an important target for attackers because it is the system that users interact with directly.

模糊测试作为一种自动发现软件错误的有效技术,自问世以来已发现数万计的漏洞。如今,模糊测试已成为安全研究人员不可或缺的学习方法和工具。然而,自动化模糊测试的对象大多集中在 Linux 系统,却忽视了对 Windows 系统的模糊测试[1-4]。截至 2021 年,Windows 仍占据约 73% 的市场份额,由于其是用户直接接触的系统,因此成为攻击者的重要目标。

Windows applications rely on a graphical user interface (GUI) to interact with end users, which constitutes a major obstacle to fuzz testing [5-6]. For example, the image analysis software ABCView requires the user to click a button in a GUI dialog window first; only then does the main program call the corresponding core image analysis functions for further loading. This over-reliance on the GUI prevents the fuzz testing process from being automated, so an automated framework is urgently needed to assist in building fuzz drivers for GUI programs on the Windows platform.

Windows 应用程序依赖图形用户界面(GUI)与终端用户交互,这构成了模糊测试的主要障碍[5-6]。例如,图像分析软件 ABCView 要求用户先通过 GUI 图形对话框点击按钮,主程序才会继续调用相应的核心图像分析函数进行后续加载。这种对 GUI 的过度依赖导致模糊测试无法实现自动化流程,因此迫切需要一种自动化框架来辅助 Windows 平台 GUI 程序的模糊测试驱动构建过程。

Fig. 1 The Rendering Process of ABCView

图 1 ABCView 的渲染流程

Based on this, we propose an automatic driver generation approach for Windows GUI programs. It traces the target software to collect dynamic behavior information about the application and searches for the LCA of the target to ensure that the key function sequence is covered. Finally, static analysis is used to extract and reconstruct that sequence to build fuzz drivers. The contributions of this paper are the following three points:

基于此,本文提出一种 Windows GUI 程序自动驱动生成方案:通过追踪目标软件收集应用程序的动态行为信息,搜索目标的最近公共祖先(LCA)以确保覆盖关键函数序列,最终利用静态分析提取并重构生成模糊测试驱动。本文的贡献包括以下 3 点:

  1. We propose a dynamic path tracing framework that can extract path information from closed-source programs on the Windows platform. It is more widely applicable than FUDGE and other existing methods that analyze static call relationships based on source files.

    提出一种动态路径追踪框架,可从 Windows 平台闭源程序中提取路径信息,相比现有基于源代码分析静态调用关系的 FUDGE 等方法,具有更强的适用性。

  2. We propose a set of algorithms for offline function dependency extraction and LCA identification. Compared with manually written drivers, drivers based on these algorithms achieve higher block coverage and a stronger ability to detect crashes.

    提出一套离线的函数依赖提取与 LCA 识别算法,基于该算法生成的驱动相比人工驱动,具有更高的块覆盖率和更强的崩溃检测能力。

  3. We design DriverGen, an automatic driver generation system aimed at bypassing the graphical interface of Windows programs. DriverGen can discover zero-day vulnerabilities in real-world applications, for which CVE numbers have been obtained.

    设计了面向 Windows 程序图形界面绕过的自动驱动生成系统 DriverGen,该系统可发现实际应用中的零日漏洞并获得 CVE 编号。

978-1-6654-0886-8/22/$31.00 ©2022 IEEE

DOI: 10.1109/ICCECE54139.2022.9712749

II. BACKGROUND AND RELATED WORK

II. 背景与相关工作

A. Graphical user interface(GUI) bypass

A. 图形用户界面(GUI)绕过

Researchers have proposed various solutions to the problem of the GUI hindering the fuzzing process; the mainstream ones fall into the following three types:

针对 GUI 阻碍模糊测试过程的问题,研究人员提出了多种解决方案,主流方案可分为以下 3 类:

(1) Simulating human operation with a script: The essence of this method is to use automated scripts to simulate a person's actions. Typical examples are Quan [7] and René [8], who use AutoIt3 to write the simulation script in advance to assist WinAFL in fuzz testing.

(1) 通过编写脚本模拟人工操作:该方法的本质是利用自动化脚本模拟人的操作行为。典型案例为全[7]和勒内[8]利用 AutoIt3 预先编写模拟脚本,辅助 Winafl 进行模糊测试。

(2) Eliminating GUI element code with patches: This method patches the program to skip the graphical interface functions that interact with users. For example, Nadav [9] applies patches to eliminate the message boxes and dialog boxes that require user interaction when fuzz testing the WinRAR compression software.

(2) 基于补丁剔除 GUI 元素代码:该方法通过打补丁的方式跳过与用户交互的图像界面相关函数,例如纳达夫[9]在对 WinRAR 压缩软件进行模糊测试时,通过打补丁剔除需要用户交互的消息框和对话框。

(3) Building a fuzz driver: This method involves writing a driver that extracts only the core functions of the target program and turns them into an independent program that can call the target's specific functions directly from the command line; fuzz testing is then performed on that independent program.

(3) 构建模糊测试驱动:该方法通过编写实现,仅提取目标程序的核心函数,再将其转化为可通过命令行直接调用目标特定函数的独立程序,最终对该独立程序进行模糊测试。

Among the three methods, the first has the advantages of convenience and simplicity, and it can be completed without changing the target application. Its disadvantage is the low speed of script execution, which greatly reduces fuzzing efficiency. The second method is very difficult to implement because it requires deep reverse engineering of the software and has a high probability of failure. The third method also requires reverse analysis to understand the internal logic of the software, but once the driver is built it can be reused, so it has become the mainstream method for discovering vulnerabilities in Windows platform software. Although this method is widely used, manually creating an effective fuzz driver remains a time-consuming and challenging task.

在这三种方法中,第一种方法具有便捷简单的优势,无需修改目标应用程序即可完成;但其缺点是受限于脚本执行速度慢,会大幅降低模糊测试效率。第二种方法实施难度极大,需要进行深度软件逆向,且失败概率高。第三种方法同样需要通过逆向分析理解软件内部逻辑,但驱动构建完成后可重复使用,因此成为 Windows 平台软件漏洞挖掘的主流方法。尽管该方法应用广泛,但人工创建有效的模糊测试驱动仍是一项耗时且具有挑战性的任务。

B. Automated fuzz driver generation

B. 自动化模糊测试驱动生成

To solve the above problems, recent academic research has addressed automatic fuzz driver generation. FUDGE [10] slices the target source code with ClangMR, extracts the calling sequences and context of APIs to create fuzz drivers, and tests them with dynamic analysis. FuzzGen [11] uses the target source code to infer function call relationships, constructs an Abstract API Dependence Graph (A2DG), and finally synthesizes the driver based on the A2DG.

为解决上述问题,近年来学术界在自动化模糊测试驱动生成领域开展了新的研究:FUDGE[10] 利用 ClangMR 对目标源代码进行切片,提取 API 的调用序列和上下文以创建模糊测试驱动,并通过动态分析进行测试;FuzzGen[11] 利用目标源代码推断函数调用关系,进而构建抽象 API 依赖图(A2DG),最终基于 A2DG 合成驱动程序。

IntelliGen [12] generates a driver for a selected entry function through hierarchical parameter replacement and type inference. Autoharness uses CodeQL to generate a control flow graph from the source code and then makes an informed guess about how to generate the fuzz driver.

IntelliGen[12] 提出通过分层参数替换和类型推断,为选定的入口函数生成驱动;Autoharness 利用 CodeQL 为源代码生成控制流图,随后对模糊测试驱动的生成进行合理推测。

In addition, there is other related research [13-15], but existing automatic methods depend on source code, so they cannot easily handle closed-source software on Windows.

此外,还有其他一些相关研究[13-15],但现有自动化方法受限于源代码,因此难以处理 Windows 平台上的闭源软件。

III. THE DESIGN OF DRIVERGEN

III. DriverGen 的设计

To address the problems mentioned in the previous sections, we propose DriverGen, which can bypass the GUI of Windows programs and generate fuzz drivers. As shown in Fig. 2, the system mainly consists of 4 modules: A) dynamic path tracing framework, B) function relation identification and extraction, C) LCA recognition, and D) static recovery.

为解决前文所述问题,本文提出 DriverGen 系统,该系统可绕过 Windows 程序的 GUI 界面并生成模糊测试驱动。如图 2 所示,系统主要包含 4 个模块:A)动态路径追踪框架、B)函数关系识别与提取、C)LCA(最近公共祖先)识别、D)静态恢复。

Fig. 2 Workflow of DriverGen

图 2 DriverGen 的工作流程

Specifically, the program and the input supplied by the user are executed in an instrumented monitoring environment. The dynamic path tracing framework extracts the path information of the process, and function relation identification and extraction determines the dependency relationships between pointer types and functions. The key sequence is then obtained by using static analysis to identify the lowest common ancestor node in the collected sequence. Finally, a driver that does not depend on the GUI is generated for fuzz testing.

具体而言,用户给定的程序及输入在插桩监控环境中执行,动态路径追踪框架用于提取进程的路径信息,函数关系识别与提取模块确定指针类型与函数之间的依赖关系,随后通过静态分析识别收集序列中的最近公共祖先节点,得到关键序列。最终生成不依赖 GUI 的驱动程序,用于模糊测试。

A. Dynamic tracking framework

A. 动态追踪框架

The dynamic tracking framework module monitors the LoadLibrary function with DynamoRIO to determine which dynamic link library (DLL file) the application is currently loading.

动态追踪框架模块的设计目标是利用 DynamoRIO 监控 LoadLibrary 函数,以确定应用程序当前加载的动态链接库(dll 文件)。

Each time the application finishes executing in the GUI and loads a new DLL file, DynamoRIO redirects control flow to the module for parsing. The module then enumerates the exported functions in the DLL file and registers a special callback function for each of them.

每当应用程序在 GUI 界面执行完毕并加载新的 dll 文件时,DynamoRIO 会将控制流重定向至该模块进行解析。此时,模块会枚举 dll 文件中的导出函数,并为每个函数注册一个特殊的回调函数。

After this step, whenever the application calls any exported function, the callback function in the module executes before the exported function, so that the dynamic tracking framework module can record the key function calls, parameters, and return values exchanged between the DLL file and the main program. The information flow is shown in Fig. 3.

完成该步骤后,当应用程序再次调用任意导出函数时,模块中的回调函数会在导出函数执行前运行,从而使动态追踪框架模块能够记录 DLL 文件与主程序之间的关键调用函数、参数及返回值。信息流如图 3 所示。
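The idea of registering a callback for every exported function can be illustrated, by loose analogy only, with ordinary Python function wrapping. This sketch does not use DynamoRIO or any real instrumentation API; `hook`, `hook_exports`, `img_open`, and `img_decode` are hypothetical names, and the "exports" are plain Python functions standing in for the contents of a loaded DLL.

```python
import functools

TRACE = []  # one record per call: function name, arguments, return value


def hook(func):
    """Wrap func so a record is created before the call and completed after it."""
    @functools.wraps(func)
    def wrapper(*args):
        record = {"name": func.__name__, "args": args, "ret": None}
        TRACE.append(record)          # runs before the exported function
        record["ret"] = func(*args)   # filled in once the function returns
        return record["ret"]
    return wrapper


def hook_exports(module_exports):
    """Register the hook for every exported function of a simulated library."""
    return {name: hook(fn) for name, fn in module_exports.items()}


# Hypothetical exports standing in for functions found in a loaded DLL.
def img_open(path):
    return 0x1000            # pretend handle


def img_decode(handle):
    return handle + 0x10     # pretend pointer to decoded data


exports = hook_exports({"img_open": img_open, "img_decode": img_decode})
handle = exports["img_open"]("sample.bmp")
exports["img_decode"](handle)
```

After the two calls, `TRACE` holds the call order together with the arguments and return values, which is the raw material that the dependency extraction step consumes.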

Fig. 3 Using DBI to capture inter-process information

图 3 利用 DBI 捕获进程间信息

B. Function dependency extraction

B. 函数依赖提取

In order to describe the function dependency extraction algorithm, it is necessary to define the data source, dependency relationship and trace projection.

为描述函数依赖提取算法,需定义数据源、依赖关系和轨迹投影。

Definition 1 (Data source) For any objective function $F$, let its input set be $F.in$ and its output set be $F.out$. The two are the consumption data source and the production data source of the objective function $F$, respectively.

定义 1(数据源) 对于任意目标函数 $F$,设其输入为 $F.in$,输出为 $F.out$。二者分别表示目标函数 $F$ 的消费数据源和生产数据源。

Definition 2 (Data dependence) If an input parameter of the objective function is the output return value of another function, or an input parameter of another function is the output return value of the objective function, then it can be determined that there is a data dependency relationship. If the functions $F_{\alpha}$ and $F_{\gamma}$ satisfy condition (1), the relationship can be recorded as $(F_{\alpha}, F_{\gamma})$.

定义 2(数据依赖) 若目标函数的一个输入参数是另一个函数的输出返回值,或另一个函数的一个输入参数是该目标函数的输出返回值,则可判定二者存在数据依赖关系。设函数 $F_{\alpha}$ 和 $F_{\gamma}$ 满足条件(1),则记为 $(F_{\alpha}, F_{\gamma})$。

$$\left(F_{\alpha}.in \cap F_{\gamma}.out\right) \cup \left(F_{\gamma}.in \cap F_{\alpha}.out\right) \neq \emptyset \quad (1)$$

Definition 3 (Tracer projection) Let the length of the trace of program $P$ with respect to input $I$ be $k$. The tracer projection is an ordered sequence $T = \langle (\rho_{1}, \sigma_{1}), \cdots, (\rho_{k}, \sigma_{k}) \rangle$, where $\rho_{i} \in N$ ($1 \leq i \leq k$), $\langle \rho_{1}, \rho_{2}, \cdots, \rho_{k} \rangle$ is the path of the tracer, and $\sigma_{i}$ ($1 \leq i \leq k$) is the mapping from program variables to their values before function $\rho_{i}$ executes.

定义 3(轨迹投影) 设程序 $P$ 针对输入 $I$ 的轨迹长度为 $k$,其轨迹投影为有序序列 $T = \langle (\rho_{1}, \sigma_{1}), \cdots, (\rho_{k}, \sigma_{k}) \rangle$,其中 $\rho_{i} \in N$($1 \leq i \leq k$),$\langle \rho_{1}, \rho_{2}, \cdots, \rho_{k} \rangle$ 为轨迹路径,$\sigma_{i}$($1 \leq i \leq k$)表示程序变量到函数 $\rho_{i}$ 执行前值的映射。

According to the above definitions, the function dependency extraction process is realized as shown in Algorithm 1.

基于上述定义,函数依赖提取过程通过算法 1 实现。

Algorithm 1: Function dependency extraction

算法 1:函数依赖提取

Input: $SeqFunc = \langle (\rho_1, \sigma_1), (\rho_2, \sigma_2), \cdots, (\rho_k, \sigma_k) \rangle$

输入:$SeqFunc = \langle (\rho_1, \sigma_1), (\rho_2, \sigma_2), \cdots, (\rho_k, \sigma_k) \rangle$

Output: Final function dependency set $S$

输出:最终函数依赖集 $S$

Algorithm 1 extracts function dependencies from the dynamic trace. In the initialization stage, the dependency set $S$ and the temporary array $D$ are both set to empty; the algorithm then determines the type of each entry in the original function sequence extracted by the DBI tool and stores the corresponding function and pointer data in $D$.

算法 1 是一种动态轨迹函数依赖提取算法。初始化阶段,依赖集 $S$ 和临时数组 $D$ 均设为空;随后算法对 DBI 工具提取的原始函数序列进行类型判断,将对应的函数及指针数据信息存入临时数组 $D$。

Then, for any two functions $F_{\alpha}$ and $F_{\gamma}$ in $D$, if the type of the data produced by one matches the type of the data consumed by the other and the pair satisfies the data dependence of Definition 2, the function pair is stored in the dependency set $S$, which provides the basis for the subsequent recovery of call relationships.

之后,对于 $D$ 中任意两个函数 $F_{\alpha}$ 和 $F_{\gamma}$,若其生产数据与消费数据类型匹配,且满足定义 2 中的数据依赖关系,则将该函数对存入数据依赖集 $S$,为后续调用关系恢复提供依据。
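The two stages described above can be sketched as follows. The body of Algorithm 1 is not reproduced in this text, so this is only an illustration under stated assumptions: each trace entry is a hypothetical `(name, inputs, outputs)` triple, each value is a `(type, value)` pair so that a match requires both type and value to agree, and the filter in stage 1 simply drops entries that carry no data.

```python
from itertools import combinations


def extract_dependencies(seq_func):
    """Return the set S of data-dependent function pairs found in a trace.

    seq_func: list of (name, inputs, outputs) in trace order, where inputs
    and outputs are iterables of (type, value) pairs (assumed layout).
    """
    S = set()   # final dependency set
    D = []      # temporary array of usable records

    # Stage 1: keep only entries that carry function or pointer data.
    for name, ins, outs in seq_func:
        if ins or outs:
            D.append((name, frozenset(ins), frozenset(outs)))

    # Stage 2: pairwise test of the Definition 2 condition.
    for (na, ia, oa), (ng, ig, og) in combinations(D, 2):
        if (ia & og) | (ig & oa):
            S.add((na, ng))  # pair kept in trace order
    return S


# Hypothetical trace: "load" produces an image handle used by two later calls.
trace = [
    ("load", [("buf", 1)], [("img", 7)]),
    ("get_width", [("img", 7)], [("int", 640)]),
    ("unload", [("img", 7)], []),
    ("idle", [], []),
]
deps = extract_dependencies(trace)
print(sorted(deps))  # [('load', 'get_width'), ('load', 'unload')]
```

Here `get_width` and `unload` both consume the handle but neither produces data for the other, so the pair is not recorded.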

C. LCA recognition

C. LCA 识别

In a directed acyclic graph, the LCA (Lowest Common Ancestor) of two nodes is the deepest node from which both nodes can be reached. This part aims to determine the LCA points.

在有向无环图中,两个节点的 LCA(最近公共祖先)是能够同时到达这两个节点的最深节点。本部分旨在确定 LCA 点。
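The definition can be made concrete with a small brute-force sketch. It assumes the call graph is given as a successor dictionary, measures depth as the longest path length from the root, and treats a node as an ancestor of itself; the graph and its function names are hypothetical.

```python
def dag_lca(graph, root, u, v):
    """Deepest node in an acyclic graph from which both u and v are reachable."""
    memo = {}

    def reach(n):
        # Set of nodes reachable from n, including n itself.
        if n not in memo:
            r = {n}
            for s in graph.get(n, ()):
                r |= reach(s)
            memo[n] = r
        return memo[n]

    depth = {}

    def visit(n, d):
        # Longest-path depth from the root; simple relaxation, fine for small graphs.
        if depth.get(n, -1) >= d:
            return
        depth[n] = d
        for s in graph.get(n, ()):
            visit(s, d + 1)

    visit(root, 0)
    common = [n for n in depth if u in reach(n) and v in reach(n)]
    return max(common, key=lambda n: depth[n]) if common else None


# Hypothetical call graph of a file-parsing program.
call_graph = {
    "main": ["init", "load"],
    "load": ["open_file", "parse"],
    "parse": ["parse_header", "parse_body"],
}
print(dag_lca(call_graph, "main", "open_file", "parse_body"))     # load
print(dag_lca(call_graph, "main", "parse_header", "parse_body"))  # parse
```

In this toy graph `load` plays the role of the LCA point: it sits before the file-reading call and before every parsing function.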

To ensure that the target point selected for the fuzz driver covers the key function sequences, we look for the deepest node in the call graph that meets the following two conditions:

为确保模糊测试驱动目标点的选择能覆盖关键函数序列,我们在调用图中寻找满足以下两个条件的最深节点:

(1) The node precedes the file-reading operations, whose signature API calls are $openFile$ and $readFile$, two file operation functions of the Windows system, so that the generated driver can mutate corpus seeds normally;

(1)位于文件读取相关操作之前,标志性 API 调用为 Windows 系统的两个文件操作函数 $openFile$ 和 $readFile$,以确保生成的驱动能正常执行语料库种子的变异操作;

(2) The node precedes the calls to the parsing functions. In theory, the more parsing functions are included, the more objects are tested during fuzzing, so the choice of LCA point directly affects test coverage.

(2)位于调用解析函数之前,理论上包含的解析函数越多,模糊测试过程中被测对象就越多,LCA 点的选择直接影响测试覆盖率。

Given the above definition and the characteristics of the target object, the lowest common ancestor of the key functions must satisfy both conditions at once; in Figure 4, node 2 is therefore the LCA point. The offline algorithm that locates the lowest common ancestor of the key functions is shown in Algorithm 2.

根据上述定义及所需目标对象的特征,如图 4 所示,其关键函数的最近公共祖先选择需同时满足两个条件,最终确定节点 2 为 LCA 点。确定关键函数最近公共祖先位置的离线算法具体实现流程如算法 2 所示。

Fig. 4 The LCA of Function

图 4 函数的 LCA 示意图

Algorithm 2: Offline algorithm for LCA

算法 2:LCA 离线算法

Input: Any point $u$ in function dependency set $S$

输入:函数依赖集 $S$ 中的任意点 $u$

Output: The common ancestor $ancestor[i]$ of all nodes $i$

输出:所有节点 $i$ 的公共祖先 $ancestor[i]$
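The body of Algorithm 2 is not preserved in this text. The `ancestor[i]` output and the word "offline" suggest the classic Tarjan offline LCA algorithm, so the sketch below shows that technique as an assumption, not as the authors' exact procedure. It works on a tree (for example a spanning tree of the call graph); the tree, the node names, and the query list are hypothetical.

```python
def tarjan_offline_lca(tree, root, queries):
    """Answer all LCA queries in one DFS using union-find (Tarjan's offline LCA)."""
    parent = {}     # union-find parent pointers
    ancestor = {}   # ancestor[rep] = current ancestor of the set represented by rep
    visited = set()
    answers = {}

    # Index each query under both of its endpoints.
    pending = {}
    for a, b in queries:
        pending.setdefault(a, []).append((b, (a, b)))
        pending.setdefault(b, []).append((a, (a, b)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def dfs(u):
        parent[u] = u
        ancestor[u] = u
        for child in tree.get(u, ()):
            dfs(child)
            parent[find(child)] = u   # merge the finished subtree into u's set
            ancestor[find(u)] = u
        visited.add(u)
        for other, key in pending.get(u, ()):
            if other in visited:
                answers[key] = ancestor[find(other)]

    dfs(root)
    return answers


# Hypothetical call tree and queries.
call_tree = {
    "main": ["init", "load"],
    "load": ["open_file", "parse"],
    "parse": ["parse_header", "parse_body"],
}
qs = [("open_file", "parse_body"), ("parse_header", "parse_body"), ("init", "parse_body")]
result = tarjan_offline_lca(call_tree, "main", qs)
for q in qs:
    print(q, "->", result[q])
```

All queries must be known before the traversal starts, which is what makes the algorithm offline; in exchange, every query is answered during a single depth-first pass.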


D. Static recovery

D. 静态恢复

Most of the information extracted by the tracing framework lacks the necessary symbols and addresses. We use the static analyzer IDA to obtain this information and restore it in the extracted relation sequence.

追踪框架提取的大部分信息缺失必要的符号信息和地址。我们利用静态分析工具 IDA 获取所需信息,辅助已提取的关系序列进行信息恢复。

In IDA, a binary is represented as a control flow graph. The control flow graph $G$ is a directed graph $G=(N, E, Entry, Exit)$ whose nodes are basic blocks, each with a unique entry and a unique exit.

在 IDA 中,二进制文件以控制流图形式存在。控制流图 $G$ 为有向图,即 $G=(N, E, Entry, Exit)$,图中节点为具有唯一入口和出口的基本块。

If control flow transfers from basic block $A$ to basic block $B$, a directed edge connects node $A$ to node $B$. Here $N$ is the set of nodes, representing the basic blocks of the program; $E$ is the set of edges, each represented as an ordered node pair $(N_{i}, N_{j})$ denoting a transfer of control flow from node $N_{i}$ to $N_{j}$; $Entry$ and $Exit$ are the entry point and exit point of the subroutine.

若控制流从基本块 $A$ 转移至基本块 $B$,则从节点 $A$ 到节点 $B$ 连接一条有向边。其中,$N$ 是节点集合,代表程序的基本块;$E$ 是边集合,每条边表示为有序节点对 $(N_{i}, N_{j})$,代表从节点 $N_{i}$ 到 $N_{j}$ 的控制流转移;$Entry$ 和 $Exit$ 分别代表子程序的入口点和出口点。
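The tuple $G=(N, E, Entry, Exit)$ maps directly onto a small data structure. The sketch below is illustrative only: the class and field names are hypothetical, and the edge set is an invented five-block example, not the actual graph of Fig. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlFlowGraph:
    nodes: frozenset      # N: basic blocks
    edges: frozenset      # E: ordered pairs (src, dst)
    entry: int            # Entry block
    exit_node: int        # Exit block

    def successors(self, n):
        """Blocks that control flow can reach directly from block n."""
        return sorted(dst for src, dst in self.edges if src == n)

    def reachable_from_entry(self):
        """All blocks reachable from the entry block (iterative DFS)."""
        seen, stack = set(), [self.entry]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(self.successors(n))
        return seen


# Invented example: block 0 branches to 1 and 2, which rejoin at 3 before exit 4.
cfg = ControlFlowGraph(
    nodes=frozenset(range(5)),
    edges=frozenset({(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)}),
    entry=0,
    exit_node=4,
)
print(cfg.successors(0))                             # [1, 2]
print(cfg.reachable_from_entry() == set(cfg.nodes))  # True
```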

Fig. 5 is a schematic diagram of the control flow of the program, which consists of five basic blocks, of which 0 is the entry basic block and 4 is the exit basic block.

图 5 是程序控制流示意图,包含 5 个基本块,其中 0 为入口基本块,4 为出口基本块。

Fig. 5 Program control flow diagram

图 5 程序控制流图

Based on this, IDAPython is used to extract function information from the control flow graph. We locate the entry basic block in the control flow graph produced by IDA's analysis, use $SEG\_CODE$ to check whether the current position belongs to a code segment, and then use the $get\_func\_name$ and $get\_tinfo$ functions to obtain the required function information, including the function type, return type, and function name.

基于此,利用 IDAPython 实现控制流图中的函数信息提取流程。在 IDA 分析生成的控制流图中找到入口基本块位置,通过 $SEG\_CODE$ 判断当前位置是否属于代码段,随后调用 $get\_func\_name$ 和 $get\_tinfo$ 函数获取所需函数信息(包括函数类型、返回类型、函数名等)。

After serialization, the information is temporarily stored in a local JSON file, and the check continues at the next position.

序列化后将其临时存储在本地 JSON 文件中,继续对下一个位置进行函数判断。
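The filter-then-serialize loop can be sketched without IDA. In the sketch below the list of locations stands in for what IDAPython would report; the record layout, the `"CODE"`/`"DATA"` segment labels, the addresses, and the function names are all hypothetical, and the file is written as one JSON object per line so that records can be appended position by position.

```python
import json
import os
import tempfile


def extract_to_json(locations, path):
    """Write one JSON record per line for every location inside a code segment."""
    kept = 0
    with open(path, "w", encoding="utf-8") as fh:
        for loc in locations:
            if loc["segment"] != "CODE":  # stand-in for the SEG_CODE check
                continue
            record = {
                "address": hex(loc["address"]),
                "name": loc["name"],
                "return_type": loc["return_type"],
                "type": loc["type"],
            }
            fh.write(json.dumps(record) + "\n")
            kept += 1
    return kept


# Hypothetical analysis output: two functions and one data item.
locations = [
    {"address": 0x401000, "segment": "CODE", "name": "load_image",
     "return_type": "void *", "type": "void *(const char *)"},
    {"address": 0x403000, "segment": "DATA", "name": "g_table",
     "return_type": "", "type": "int[16]"},
    {"address": 0x401200, "segment": "CODE", "name": "get_width",
     "return_type": "int", "type": "int(void *)"},
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "functions.json")
    count = extract_to_json(locations, out_path)
    with open(out_path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh]

print(count, [r["name"] for r in records])  # 2 ['load_image', 'get_width']
```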

IV. EVALUATION

IV. 评估

Through these experiments, the following research questions are tackled:

通过这些实验,我们旨在解决以下研究问题:

RQ1: Can DriverGen generate fuzz drivers for real-world GUI programs?

RQ1:DriverGen 能否为实际的 GUI 程序生成模糊测试驱动?

RQ2: Is the fuzz driver generated by DriverGen more effective than a manually written one?

RQ2:DriverGen 生成的模糊测试驱动是否比人工编写的更有效?

RQ3: Can DriverGen find new unpublished vulnerabilities in real-world applications?

RQ3:DriverGen 能否从实际应用中发现未公开的新漏洞?

We select 9 real applications that are common on the Windows platform, such as IQIYI Picture, as the test set. They cover 4 categories (video, text editing, image, and subtitle processing), involve 7 different third-party libraries, and accept 8 file formats, such as PDF and AVI, as the inputs that trigger the key function sequences. All the test objects have GUIs, and for some of them security researchers have already written drivers. For example, the FreeImage component used in the image software has a public driver from Google, and IrfanView has a test driver published by Hardik Shah of McAfee, as shown in Table 1.

我们选取爱奇艺图片等 9 款 Windows 平台常见的真实应用作为测试集,涵盖视频、文本编辑、图像和字幕处理 4 个类别,涉及 7 个不同的第三方库,触发包含 PDF、AVI 等 8 种文件格式的关键函数序列输入。所有测试对象均具备 GUI 界面,其中部分已有安全研究人员编写的测试驱动。例如,图像软件中的 FreeImage 组件曾有谷歌公开的驱动程序,而 IrfanView 则有迈克菲(McAfee)的哈迪克·沙阿(Hardik Shah)发布的测试驱动,具体如下表 1 所示。

A. Applicability experiment

A. 适用性实验

To evaluate the applicability of DriverGen, we record the relevant results while the system is running.

为评估 DriverGen 的适用性,我们记录了系统运行过程中的相关结果。

Table 2 lists the results of dynamic trace recording. We observed that the number of APIs is positively correlated with trace size. For example, the intermediate trace file generated for Format Factory is large (418 Mb), and the number of effective function call relationships finally extracted is also relatively large (19).

表 2 列出了动态轨迹记录结果,我们观察到 API 数量与轨迹大小呈正相关。例如,格式工厂中提取生成的中间路径文件较大(418 Mb),最终提取的有效函数调用关系数量也相对较多(19 个)。

Table 2 Dynamic information extraction

表 2 动态信息提取结果

Table 3 lists the driver recovery results generated by DriverGen. The Ratio reported for this part of the experiment ranges from 8.7% (IQIYI Picture) to 22.3% (IrfanView), and for most target applications it is around 10%, which indicates that the automation can replace some manual operations.

表 3 列出了 DriverGen 生成的驱动恢复提取结果。可以看出,系统该实验部分的比例范围为 8.7%(爱奇艺图片)至 22.3%(IrfanView),且大多数目标应用程序的比例在 10% 左右,这表明该系统的自动化程度可替代部分人工操作。

Table 3 Driver recovery extraction

表 3 驱动恢复提取结果

B. Effectiveness experiment

B. 有效性实验

For the 4 applications in the Table 1 test set that have manually written drivers, we compare those drivers with the ones generated by DriverGen to explore the effectiveness of the system. During fuzzing, basic block coverage and the number of crashes are selected as the metrics to visualize.

针对表 1 测试集中 4 款拥有人工驱动的软件,我们将其与 DriverGen 系统生成的驱动进行对比,以探究系统的有效性。在模糊测试过程中,我们选取基本块覆盖率和崩溃数量作为数据可视化指标。

(1) Basic block coverage comparison

(1) 基本块覆盖率对比

Fig. 6 compares the basic block coverage of DriverGen and the manual drivers over 24 h. According to the Mann-Whitney U test, the p-values for the four targets are 1.9E-05, 1.6E-01, 2.1E-03 and 1.1E-05; all except Acrobat's are below 5E-02, which indicates that DriverGen has a positive effect on basic block coverage.

图 6 展示了 24 小时内 DriverGen 与人工驱动的基本块覆盖率对比结果。基于曼-惠特尼 U 检验(Mann-Whitney U test)统计,四个目标对象的 P 值分别为 1.9E-05、1.6E-01、2.1E-03 和 1.1E-05,除 Adobe Acrobat 外均低于 5E-02,这表明 DriverGen 系统对提升块覆盖范围具有积极作用。
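For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Mann-Whitney U test using the normal approximation (average ranks for ties, no tie correction in the variance). The coverage numbers are invented purely for illustration and are not the paper's measurements; the paper does not state which implementation or settings it used.

```python
import math


def mann_whitney_u(xs, ys):
    """Return (U, two-sided p-value) using the normal approximation."""
    combined = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])

    # Assign 1-based ranks, averaging over runs of tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n1, n2 = len(xs), len(ys)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)

    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Invented basic-block counts from eight repeated runs of each driver.
generated = [5120, 5230, 5310, 5405, 5190, 5280, 5350, 5260]
manual = [4310, 4425, 4390, 4500, 4280, 4460, 4350, 4410]

u_stat, p_value = mann_whitney_u(generated, manual)
print(u_stat, p_value < 0.05)  # 0.0 True
```

With small samples an exact test is usually preferred over the normal approximation; the approximation is used here only to keep the sketch dependency-free.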

The reason is that manual drivers are limited by incomplete information and a shallow view of function relationships, while DriverGen focuses on the relationships between functions. In the Acrobat experiment, however, the manual driver performs better, mainly because the researcher made more precise judgments in it: they determined the offset value in the iOffset file by using hdr->key=0x9AC6CDD7. By comparison, DriverGen is somewhat lacking in the precision and flexibility of conditional control.

原因在于人工驱动受限于获取的信息不完整以及函数关联程度较低,而 DriverGen 重点关注函数间的关联关系。但在 Adobe Acrobat 的实验中,人工驱动表现更优,主要原因是研究人员在人工驱动中做出了更精准的判断,他们通过 hdr->key=0x9AC6CDD7 确定了 iOffset 文件中的偏移值。相比之下,DriverGen 在条件控制的精准度和灵活性方面略有不足。

Fig. 6 Comparison of Coverage

图 6 覆盖率对比

(2) Crash count comparison

(2) 崩溃数量对比

Fig. 7 compares the number of crashes discovered by DriverGen and the manual drivers in 24 h. According to the Mann-Whitney U test, the p-values for two targets are 5.3E-04 and 3.9E-05, both below 5E-02, which indicates that DriverGen's improvement in crash detection is statistically significant.

图 7 展示了 24 小时内 DriverGen 与人工驱动发现的崩溃数量对比结果。基于曼-惠特尼 U 检验统计,两个目标对象的 P 值分别为 5.3E-04 和 3.9E-05,均低于 5E-02,这表明 DriverGen 系统对提升崩溃检测能力具有积极作用,且具有统计学意义。

According to Table 3, a driver generated by DriverGen generally contains more function call relationships than the corresponding manual driver. For example, for IrfanView, the driver generated by DriverGen calls 14 different functions, while the manually written driver calls only 2.

根据表 3 的调查结果,我们发现 DriverGen 生成的驱动所包含的函数调用关系数量通常多于人工驱动。例如,在 IrfanView 中,DriverGen 生成的驱动包含 14 个不同的调用函数,而人工生成的驱动仅包含 2 个。

Because the relationship extraction algorithm finds more explicit dependencies and thereby expands the search space of program paths, more basic blocks, loops, and calls can in theory be reached during fuzzing, and the probability of finding crashes increases correspondingly.

由于关系提取算法发现了更多显性依赖,扩展了程序路径的搜索空间,理论上在模糊测试过程中能够覆盖更多基本块、循环和调用,发现崩溃的概率也相应提高。

We further examine the types of crashes captured by DriverGen and by the manually written drivers; the results are summarized in Figure 8. DriverGen not only captures more crash samples but also captures more types of crash exceptions.

我们进一步探究了 DriverGen 与人工驱动捕获的崩溃类型,结果总结如图 8 所示。DriverGen 不仅能捕获更多崩溃样本,还能增加捕获的崩溃异常类型。

Fig. 7 Comparison of Crashes

图 7 崩溃数量对比

Fig. 8 Bugs found by DriverGen and Manual

图 8 DriverGen 与人工驱动发现的漏洞类型对比

In conclusion, the driver generated by DriverGen achieves higher basic block coverage and stronger crash detection in most cases, but this does not mean DriverGen is superior to manual writing in all aspects. The comparison shows that each kind of driver has its own advantages: the manual driver is smaller, simpler, and more flexible, and it judges function conditions with finer granularity. DriverGen, in contrast, is more complete and contains more function call relationships, so its probability of finding vulnerabilities is greater.

综上所述,DriverGen 生成的驱动在大多数情况下实现了更高的基本块覆盖率和更强的崩溃检测能力,但这并不意味着 DriverGen 在所有方面都优于人工编写。通过实验中两种不同驱动的对比,发现它们各有优势:人工驱动体积更小、更简洁灵活,在函数条件判断上更为精细;相反,DriverGen 在完整性方面表现出色,包含更多函数调用关系,因此发现漏洞的概率更高。

C. Real vulnerability detection

C. 真实漏洞检测

To explore the real vulnerability detection ability of DriverGen, we ran long-term continuous fuzz testing on the applications in Table 1. During testing, hundreds of crash records were collected. After manual classification, 7 unpublished vulnerabilities were found in 6 of the 9 applications, covering 4 vulnerability types, including integer overflow.

为探究 DriverGen 的真实漏洞检测能力,我们对表 1 中的测试对象进行了长期连续的模糊测试。测试过程中收集到数百条崩溃记录,经人工分类后,最终在 9 款应用中的 6 款中发现了 7 个未公开漏洞,包括整数溢出等 4 种漏洞类型。

One of them has been submitted to the China National Vulnerability Database (CNVD) and assigned a CNVD number, and another has been assigned a CVE number. Details of the vulnerabilities are shown in Table 4.

其中 1 个漏洞已提交至国家信息安全漏洞共享平台(CNVD)并获得 CNVD 编号,另 1 个获得 CVE 编号。漏洞详情如下表 4 所示。

Table 4 Discovered vulnerabilities

表 4 发现的漏洞详情表

(a) Case 1: ABCView

(a) 案例 1:ABCView

ABCView is an image application with a GUI; its image loading is implemented mainly by calling the FreeImage library. DriverGen first records the calls between the main process and the library with the dynamic tracing framework, then extracts the call relationships with the function dependency extraction algorithm.

ABCView 是一款具备 GUI 界面的图像软件,其图像加载功能主要通过调用 FreeImage 函数库实现。DriverGen 首先利用动态追踪框架记录主进程与该库之间的调用信息,再通过函数依赖提取算法提取调用关系。

The main dependencies of the program are as follows: Main program -> FreeImage_LoadFromMemory -> FreeImage_GetImageType -> FreeImage_GetWidth -> FreeImage_GetHeight -> FreeImage_Unload. Finally, the fuzz driver for the key function sequence is successfully generated.

程序的主要依赖关系如下:主程序 -> FreeImage_LoadFromMemory -> FreeImage_GetImageType -> FreeImage_GetWidth -> FreeImage_GetHeight -> FreeImage_Unload。最终成功生成关键序列函数模糊测试驱动。

Fuzzing caught a crash while the driver was running. Analysis identified the vulnerable statement: IO->read_proc(FreeImage_GetBits(DIB), height*pitch). If the value of "height * pitch" is not bounded, it may lead to a heap overflow; the vulnerability is recorded as CNVD-2020-30165.

测试中捕获到驱动运行过程中的崩溃现象。经分析,存在脆弱语句:IO->read_proc(FreeImage_GetBits(DIB), height*pitch)。若"height * pitch"的值未受控制,可能导致堆溢出漏洞,该漏洞编号为 CNVD-2020-30165。
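The missing control on `height * pitch` amounts to a size check before the read. The sketch below illustrates that kind of check in Python; it is not FreeImage's actual fix. The function name, the assumption that the size is held in a 32-bit unsigned integer, and the sample dimensions are all hypothetical.

```python
UINT32_MAX = 0xFFFFFFFF  # assumed width of the size argument


def checked_read_size(height, pitch, buffer_size):
    """Return height * pitch if the read fits the destination buffer, else raise."""
    if height < 0 or pitch < 0:
        raise ValueError("negative dimension")
    size = height * pitch  # Python integers do not wrap, so this is the true product
    if size > UINT32_MAX:
        raise ValueError("height * pitch does not fit in 32 bits")
    if size > buffer_size:
        raise ValueError("read would exceed the allocated buffer")
    return size


# A well-formed 480-row image with 1920 bytes per row fits its buffer.
ok_size = checked_read_size(480, 1920, 480 * 1920)
print(ok_size)  # 921600

# Attacker-controlled header fields are rejected instead of reaching the read.
try:
    checked_read_size(0x10000, 0x10000, 1 << 20)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```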

(b) Case 2: Format Factory

(b) 案例 2:格式工厂

Format Factory is a multifunctional format conversion application with a GUI. The subtitle rendering in its video conversion is implemented by libass. DriverGen identified its main dependencies as: Main program -> ass_library_init -> ass_read_memory -> ass_render_frame -> ass_free_track.

格式工厂是一款具备 GUI 界面的多功能格式转换软件,其视频转换中的字幕渲染功能通过 libass 实现。DriverGen 系统识别出其主要依赖关系为:主程序 -> ass_library_init -> ass_read_memory -> ass_render_frame -> ass_free_track。

After the fuzz driver for the key function sequence was generated, fuzzing caught a crash. Analysis shows that the cause is an integer overflow when calling the outline_stroke function, which is a high-risk vulnerability. The vulnerability was verified to exist in the libass component on other platforms as well and is recorded as CVE-2020-26682.

生成关键序列函数模糊测试驱动后,模糊测试捕获到崩溃现象。经分析,原因是调用 outline_stroke 函数时发生整数溢出,属于高危漏洞。该漏洞经验证在其他平台的 libass 组件中同样存在,编号为 CVE-2020-26682。

V. CONCLUSIONS AND FURTHER WORK

V. 结论与未来工作

In this paper, we propose an automatic driver generation system for Windows GUI programs. First, a dynamic path tracing framework collects information from GUI programs; then the dependencies of key functions are determined so that more sequences can be covered. Finally, static analysis is used to reconstruct the result into a fuzz driver that needs no GUI operation.

本文提出了一种面向 Windows GUI 程序的自动驱动生成系统。首先,设计动态路径追踪框架以收集 GUI 程序的相关信息;其次,确定关键函数的依赖关系,确保覆盖更多序列;最后,通过静态分析将其重构为无需 GUI 操作的模糊测试驱动。

We evaluated the system on real software in 3 aspects and showed that DriverGen is effective. However, the system can still be improved in the following respects:

我们从 3 个方面对真实软件进行了评估,证明了 DriverGen 系统的有效性。但该系统仍有以下改进方向:

(1) The system relies on the program execution trace to extract information, but extracting disordered function sequences may lead to errors and false positives during driver generation. In the future, we will draw on FuzzGen's method of analyzing header files to improve the accuracy of function information extraction.

(1)系统依赖程序执行轨迹提取信息进行处理,但提取无序的函数序列可能导致驱动生成过程中出现错误和误报。未来,我们将借鉴 FuzzGen 分析头文件的方法,提高函数信息提取的准确性。

(2) The system mainly focuses on the relationship between the main program and third-party libraries. In the future, we will capture complete data flow and dependence information between different modules.

(2)系统目前主要关注主程序与第三方库之间的关系。未来,我们将捕获不同模块之间完整的数据流和依赖关系信息。

References

参考文献

[1] Boehme M, Cadar C, Roychoudhury A. Fuzzing: Challenges and Reflections[J]. IEEE Software, 2021, 38(3): 79-86.

[2] Li J, Zhao B, Zhang C. Fuzzing: a survey[J]. Cybersecurity, 2018, 1(1): 1-13.

[3] Manès V J M, Han H S, Han C, et al. The art, science, and engineering of fuzzing: A survey[J]. IEEE Transactions on Software Engineering, 2019.

[4] Godefroid P. Fuzzing: Hack, art, and science[J]. Communications of the ACM, 2020, 63(2): 70-76.

[5] Jung J, Tong S, Hu H, et al. WINNIE: Fuzzing Windows Applications with Harness Synthesis and Fast Cloning[C]//Proceedings of the 2021 Annual Network and Distributed System Security Symposium (NDSS), Virtual. 2021.

[6] Zhang X, Feng C, Lei J, Tang C J. Real time idle state detection method in fuzzing test in GUI program[J]. Ruan Jian Xue Bao/Journal of Software, 2018, 29(5): 1288-1302.

[7] Quan Jin. How I Found 16 Microsoft Office Excel Vulnerabilities in 6 Months.

[8] R. Freingruber. Fuzzing Closed Source Applications. 2017.

[9] Nadav Grossman. Extracting a 19 Year Old Code Execution from WinRAR.

[10] D. Babic, S. Bucur, Y. Chen, F. Ivan, T. King, M. Kusano, C. Lemieux, L. Szekeres, and W. Wang. FUDGE: Fuzz Driver Generation At Scale[C]//Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 2019, pp. 975-985.

[11] K. K. Ispoglou, D. Austin, V. Mohan, and M. Payer. FuzzGen: Automatic Fuzzer Generation[C]//Proceedings of the 29th USENIX Security Symposium (Security), Boston, MA, USA, Aug. 2020.

[12] Zhang M, Liu J, Ma F, et al. IntelliGen: Automatic Driver Synthesis for Fuzz Testing[C]//2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2021: 318-327.

[13] Zhang C, Lin X, Li Y, et al. APICraft: Fuzz Driver Generation for Closed-source SDK Libraries[C]//30th USENIX Security Symposium (USENIX Security 21). 2021: 2811-2828.

[14] Chen W, Wang Y, Zhang Z, et al. SyzGen: Automated Generation of Syscall Specification of Closed-Source macOS Drivers[J]. 2021.

[15] Chen Y, Zhong R, Hu H, et al. One engine to fuzz'em all: Generic language processor testing with semantic validation[C]//Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland). 2021.

[16] Bruening D, Garnett T. Building dynamic instrumentation tools with DynamoRIO[C]//Proc. Int. Conf. IEEE/ACM Code Generation and Optimization (CGO), Shenzhen, China. 2013.

## via:

* DriverGen: Automating the Generation of Serial Device Driver (978-3-319-47075-7_37)
* DriverGen: Automatic Driver Generation for Windows GUI program | IEEE Conference Publication | IEEE Xplore