Testing an optimization of calls to the Expat library's XML_Parse function

The Expat library documentation says:

XML_Parse
enum XML_Status XMLCALL
XML_Parse(XML_Parser p,
          const char *s,
          int len,
          int isFinal);
enum XML_Status {
  XML_STATUS_ERROR = 0,
  XML_STATUS_OK = 1
};
Parse some more of the document. The string s is a buffer containing part (or perhaps all) of the document. The number of bytes of s that are part of the document is indicated by len. This means that s doesn't have to be null-terminated. It also means that if len is larger than the number of bytes in the block of memory that s points at, then a memory fault is likely. Negative values for len are rejected since Expat 2.2.1. The isFinal parameter informs the parser that this is the last piece of the document. Frequently, the last piece is empty (i.e. len is zero.)

If a parse error occurred, it returns XML_STATUS_ERROR. Otherwise it returns XML_STATUS_OK value. Note that regardless of the return value, there is no guarantee that all provided input has been parsed; only after the concluding call will all handler callbacks and parsing errors have happened.

Simplified, XML_Parse can be considered a convenience wrapper that is pairing calls to XML_GetBuffer and XML_ParseBuffer (when Expat is built with macro XML_CONTEXT_BYTES defined to a positive value, which is both common and default). XML_Parse is then functionally equivalent to calling XML_GetBuffer, memcpy, and XML_ParseBuffer.

To avoid double copying of the input, direct use of functions XML_GetBuffer and XML_ParseBuffer is advised for most production use, e.g. if you're using read or similar functionality to fill your buffers, fill directly into the buffer from XML_GetBuffer, then parse with XML_ParseBuffer.

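The last two paragraphs say that this function is really a wrapper around XML_GetBuffer and XML_ParseBuffer, with a copy from the user's buffer into the parser's buffer inserted in between. If the read function uses the parser's buffer directly as its destination, the memcpy can be skipped.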

I tested this with my earlier program that converts an XML file to CSV.

The original code, expatfile.c, calls XML_Parse:

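    char buffer[8192];
    int done;
    do {
        /* Read the next chunk; a short read means end of file (or a read error). */
        size_t len = fread(buffer, 1, sizeof(buffer), file);
        done = (len < sizeof(buffer));

        /* Per the documentation, XML_Parse copies buffer into the parser's own buffer. */
        if (XML_Parse(parser, buffer, (int)len, done) == XML_STATUS_ERROR) {
            break;
        }
    } while (!done);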

The modified code, expatfile2.c, calls XML_GetBuffer and XML_ParseBuffer:

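    int done;
    do {
        /* Ask the parser for its internal buffer and read straight into it. */
        void *buff = XML_GetBuffer(parser, 8192);
        if (buff == NULL) {
            break;  /* XML_GetBuffer failed, e.g. out of memory */
        }
        size_t len = fread(buff, 1, 8192, file);
        done = (len < 8192);

        if (XML_ParseBuffer(parser, (int)len, done) == XML_STATUS_ERROR) {
            break;
        }
    } while (!done);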

Compile and run:

gcc expatfile.c -o expatfile -lexpat -O3

time ./expatfile /par/lineitem/xl/worksheets/sheet1.xml A1:P1000000
CSV已保存到 /par/lineitem/xl/worksheets/sheet1.csv

real	0m18.882s
user	0m18.168s
sys	0m0.324s

gcc expatfile2.c -o expatfile2 -lexpat -O3

time ./expatfile2 /par/lineitem/xl/worksheets/sheet1.xml A1:P1000000
CSV已保存到 /par/lineitem/xl/worksheets/sheet1.csv

real	0m18.909s
user	0m18.116s
sys	0m0.284s

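The test shows that the two approaches perform almost identically: 18.882 s versus 18.909 s of real time, and 18.168 s versus 18.116 s of user time. A gap that small in a single run is likely just measurement noise. Perhaps memcpy is simply so fast nowadays that its cost does not show up next to the parsing work.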
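One rough way to check that guess is to time the copies on their own. The sketch below was not part of the original test; the function name `time_chunked_copy` and its parameters are hypothetical. It copies a given number of bytes in 8192-byte pieces, as XML_Parse would for each user buffer, and returns the CPU time spent. Because both buffers stay hot in the cache, treat the result as an order-of-magnitude estimate only.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

#define CHUNK 8192  /* same chunk size as the read loops in the test */

/* Hypothetical helper: copy `total` bytes in CHUNK-sized pieces and return
 * the CPU seconds spent. `checksum` receives a value derived from the copied
 * data so the compiler cannot discard the copies. */
static double time_chunked_copy(size_t total, unsigned long *checksum)
{
    static char src[CHUNK], dst[CHUNK];
    unsigned long sum = 0;
    memset(src, 'x', sizeof src);

    clock_t start = clock();
    for (size_t done = 0; done < total; done += CHUNK) {
        /* The last piece may be shorter than CHUNK. */
        size_t n = total - done < CHUNK ? total - done : CHUNK;
        src[0] = (char)(done / CHUNK);   /* vary the input per chunk */
        memcpy(dst, src, n);
        sum += (unsigned char)dst[0] + (unsigned char)dst[n - 1];
    }
    clock_t end = clock();

    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with the size of the XML input file and comparing the result with the roughly 18 s of user time above would show how much of the run the extra copy could account for at most.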
