Testing an optimization of calls to the expat library's XML_Parse function

The expat library documentation says:

XML_Parse
enum XML_Status XMLCALL
XML_Parse(XML_Parser p,
          const char *s,
          int len,
          int isFinal);
enum XML_Status {
  XML_STATUS_ERROR = 0,
  XML_STATUS_OK = 1
};
Parse some more of the document. The string s is a buffer containing part (or perhaps all) of the document. The number of bytes of s that are part of the document is indicated by len. This means that s doesn't have to be null-terminated. It also means that if len is larger than the number of bytes in the block of memory that s points at, then a memory fault is likely. Negative values for len are rejected since Expat 2.2.1. The isFinal parameter informs the parser that this is the last piece of the document. Frequently, the last piece is empty (i.e. len is zero.)

If a parse error occurred, it returns XML_STATUS_ERROR. Otherwise it returns XML_STATUS_OK value. Note that regardless of the return value, there is no guarantee that all provided input has been parsed; only after the concluding call will all handler callbacks and parsing errors have happened.

Simplified, XML_Parse can be considered a convenience wrapper that is pairing calls to XML_GetBuffer and XML_ParseBuffer (when Expat is built with macro XML_CONTEXT_BYTES defined to a positive value, which is both common and default). XML_Parse is then functionally equivalent to calling XML_GetBuffer, memcpy, and XML_ParseBuffer.

To avoid double copying of the input, direct use of functions XML_GetBuffer and XML_ParseBuffer is advised for most production use, e.g. if you're using read or similar functionality to fill your buffers, fill directly into the buffer from XML_GetBuffer, then parse with XML_ParseBuffer.

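The last two paragraphs say that this function is really a wrapper around XML_GetBuffer and XML_ParseBuffer, with a copy from the user's buffer into the parser's buffer inserted in between. If the read function uses the parser's buffer directly as its destination, the memcpy can be skipped.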

I ran a test using my earlier program that converts an XML file to CSV.

The original code, expatfile.c, calls XML_Parse:

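    char buffer[8192];
    int done;
    do {
        /* Read into our own buffer; XML_Parse then copies it into the parser's buffer. */
        size_t len = fread(buffer, 1, sizeof(buffer), file);
        done = (len < sizeof(buffer));  /* a short read means EOF (or a read error) */

        if (XML_Parse(parser, buffer, (int)len, done) == XML_STATUS_ERROR) {
            break;
        }
    } while (!done);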

The modified version, expatfile2.c, calls XML_GetBuffer and XML_ParseBuffer:

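    int done;
    do {
        /* Ask the parser for its own buffer and read straight into it. */
        void *buff = XML_GetBuffer(parser, 8192);
        if (buff == NULL) {
            break;  /* XML_GetBuffer failed, e.g. out of memory */
        }
        size_t len = fread(buff, 1, 8192, file);
        done = (len < 8192);

        if (XML_ParseBuffer(parser, (int)len, done) == XML_STATUS_ERROR) {
            break;
        }
    } while (!done);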

Compile and run:

gcc expatfile.c -o expatfile -lexpat -O3

time ./expatfile /par/lineitem/xl/worksheets/sheet1.xml A1:P1000000
CSV已保存到 /par/lineitem/xl/worksheets/sheet1.csv

real	0m18.882s
user	0m18.168s
sys	0m0.324s

gcc expatfile2.c -o expatfile2 -lexpat -O3

time ./expatfile2 /par/lineitem/xl/worksheets/sheet1.xml A1:P1000000
CSV已保存到 /par/lineitem/xl/worksheets/sheet1.csv

real	0m18.909s
user	0m18.116s
sys	0m0.284s

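The test shows that the two call styles perform almost identically (18.882 s versus 18.909 s real time, 18.168 s versus 18.116 s user time). Perhaps memcpy is now fast enough that the extra copy has no measurable effect.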
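One way to check the memcpy explanation is to time the copy on its own. The sketch below is a rough estimate under assumptions, not a measurement of expat itself: `time_chunked_memcpy` is a hypothetical helper that copies a given number of bytes in fixed-size chunks (8192 would match the buffer size used above) and returns the CPU time spent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies at least `total` bytes using `chunk`-sized
   memcpy calls, imitating one extra copy per parse call.
   Returns elapsed CPU seconds, or -1.0 if allocation fails.
   If `checksum` is non-NULL it receives a value derived from the copies,
   so the work has an observable result. */
static double time_chunked_memcpy(size_t total, size_t chunk, unsigned *checksum)
{
    assert(chunk > 0);
    unsigned char *src = malloc(chunk);
    unsigned char *dst = malloc(chunk);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 'x', chunk);

    unsigned sum = 0;
    clock_t start = clock();
    for (size_t copied = 0; copied < total; copied += chunk) {
        src[0] = (unsigned char)(copied / chunk);  /* vary the input per chunk */
        memcpy(dst, src, chunk);
        sum += dst[0];                             /* consume the copied data */
    }
    clock_t end = clock();

    free(src);
    free(dst);
    if (checksum != NULL) {
        *checksum = sum;
    }
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with `total` set to the size of sheet1.xml and `chunk` set to 8192 gives a ballpark for the copy cost. If that figure is tiny next to the roughly 18 seconds of user time, the copy is simply lost in the noise of parsing, which would be consistent with the result above.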
