[Dart/Flutter] 让你的字符串搜索提速一倍,无需任何复杂算法知识。

哈哈哈,标题党了,其实这件事说起来非常简单,一言以蔽之:Dart的正则搜索比字符串搜索快很多,所以直接将字符串转换成正则表达式来搜索就行了。

我们先做一个小测试:

下载微软研究院的WikiQA语料库,在里面进行搜索。

先导入库:

dart 复制代码
import 'dart:io';

late final String file;

定义一个计时器:

dart 复制代码
timer(String algorithmName, String target,
    Iterable<Match> Function(String target, String text) findFunction) {
  final start = DateTime.now().millisecondsSinceEpoch;
  final result = findFunction.call(target, file);
  final length = result.length;
  final end = DateTime.now().millisecondsSinceEpoch;
  print('$algorithmName finds $length matches, costs ${end - start} milliseconds.');
}

有人可能会问为什么要单独建立一个length变量,直接在print里面引用${result.length}不行吗?

还真不行,我们后面会解释。

下面进行搜索:

dart 复制代码
Future<void> main() async {
  file = await File('WikiQA').readAsString();
  final targets = [
    "newspaper editor",
    "school girl",
    "United States",
    "Chinese Civil War",
    "native speaker",
    "very interesting",
    "disappeared",
    "African Americans",
    "There are also",
    "nuclear power",
    "comprehensive astronomical",
    "First Lady of the United States"
  ];
  
  for (String target in targets) {
    print('\nSearching "$target":');
    timer('String', target, (target, text) => target.allMatches(text));
    timer('RegExp', target, (target, text) => RegExp(RegExp.escape(target)).allMatches(text));
  }
}

targets里面的字符串其实是我在语料库里面看心情随便选出来的,当然我们也可以用math库的Random类来实现一个在语料库里随机选择字符串的函数,但是我懒得写。

当然,如果有谁要觉得我是故意选出来一些正则搜得比字符串搜得快的案例来博人眼球,那倒也duck不必,因为我们后面会阅读源码来证明正则是真的快。

运行结果如下:

dart 复制代码
Searching "newspaper editor":
String finds 2 matches, costs 24 milliseconds.
RegExp finds 2 matches, costs 10 milliseconds.

Searching "school girl":
String finds 1 matches, costs 20 milliseconds.
RegExp finds 1 matches, costs 9 milliseconds.

Searching "United States":
String finds 3078 matches, costs 18 milliseconds.
RegExp finds 3078 matches, costs 9 milliseconds.

Searching "Chinese Civil War":
String finds 21 matches, costs 17 milliseconds.
RegExp finds 21 matches, costs 12 milliseconds.

Searching "native speaker":
String finds 6 matches, costs 20 milliseconds.
RegExp finds 6 matches, costs 12 milliseconds.

Searching "very interesting":
String finds 1 matches, costs 18 milliseconds.
RegExp finds 1 matches, costs 12 milliseconds.

Searching "disappeared":
String finds 9 matches, costs 19 milliseconds.
RegExp finds 9 matches, costs 9 milliseconds.

Searching "African Americans":
String finds 14 matches, costs 17 milliseconds.
RegExp finds 14 matches, costs 11 milliseconds.

Searching "There are also":
String finds 18 matches, costs 18 milliseconds.
RegExp finds 18 matches, costs 13 milliseconds.

Searching "nuclear power":
String finds 19 matches, costs 21 milliseconds.
RegExp finds 19 matches, costs 11 milliseconds.

Searching "comprehensive astronomical":
String finds 2 matches, costs 18 milliseconds.
RegExp finds 2 matches, costs 11 milliseconds.

Searching "First Lady of the United States":
String finds 9 matches, costs 17 milliseconds.
RegExp finds 9 matches, costs 13 milliseconds.

我们看到,最快的能将时间缩短到原来的一半还短。

我们上面的正则搜索的语句是这样写的:

dart 复制代码
RegExp(RegExp.escape(target)).allMatches(text)

RegExp.escape(target)是将正则表达式自动转义的方法,可以避免一些特殊符号被识别成正则语法。转义后的字符串用RegExp(String text)即可转换为正则表达式。

但是这样写仍然太麻烦了,于是我们可以将其写成一个extension,方便后续调用:

dart 复制代码
extension on String {
  Iterable<RegExpMatch> allRegexMatches(String input, [int start = 0]) =>
    RegExp(RegExp.escape(this)).allMatches(input, start);
}

后续只要使用target.allRegexMatches(text)即可。

不想看具体原理的到这里的就可以结束了。 ---------------------------------------------------------------------------

然后我们来读源码:

以下代码位于dart-lang/sdk/lib/_internal/vm/lib/string_patch.dart中:

dart 复制代码
Iterable<Match> allMatches(String string, [int start = 0]) {
  if (start < 0 || start > string.length) {
    throw new RangeError.range(start, 0, string.length, "start");
  }
  return new _StringAllMatchesIterable(string, this, start);
}

显然下一步应该去找_StringAllMatchesIterable

dart 复制代码
final class _StringAllMatchesIterable extends Iterable<Match> {
  final String _input;
  final String _pattern;
  final int _index;

  _StringAllMatchesIterable(this._input, this._pattern, this._index);

  Iterator<Match> get iterator =>
      new _StringAllMatchesIterator(_input, _pattern, _index);

  Match get first {
    int index = _input.indexOf(_pattern, _index);
    if (index >= 0) {
      return new _StringMatch(index, _input, _pattern);
    }
    throw IterableElementError.noElement();
  }
}

final class _StringAllMatchesIterator implements Iterator<Match> {
  final String _input;
  final String _pattern;
  int _index;
  Match? _current;

  _StringAllMatchesIterator(this._input, this._pattern, this._index);

  bool moveNext() {
    if (_index + _pattern.length > _input.length) {
      _current = null;
      return false;
    }
    var index = _input.indexOf(_pattern, _index);
    if (index < 0) {
      _index = _input.length + 1;
      _current = null;
      return false;
    }
    int end = index + _pattern.length;
    _current = new _StringMatch(index, _input, _pattern);
    // Empty match, don't start at same location again.
    if (end == _index) end++;
    _index = end;
    return true;
  }

  Match get current => _current as Match;
}

(以上代码能看出来,allMatches返回的只是一个Iterable,搜索实际上是在访问时才真正进行的,所以上面的timer函数必须在获取结束时间之前访问一次这个Iterable,否则返回的时间差理论上是0。)

可以看到搜索时使用的是_input.indexOf(_pattern, _index)

然后去看indexOf的源码:

dart 复制代码
int indexOf(Pattern pattern, [int start = 0]) {
  if ((start < 0) || (start > this.length)) {
    throw new RangeError.range(start, 0, this.length, "start");
  }
  if (pattern is String) {
    String other = pattern;
    int maxIndex = this.length - other.length;
    // TODO: Use an efficient string search (e.g. BMH).
    for (int index = start; index <= maxIndex; index++) {
      if (_substringMatches(index, other)) {
        return index;
      }
    }
    return -1;
  }
  for (int i = start; i <= this.length; i++) {
    // TODO(11276); This has quadratic behavior because matchAsPrefix tries
    // to find a later match too. Optimize matchAsPrefix to avoid this.
    if (pattern.matchAsPrefix(this, i) != null) return i;
  }
  return -1;
}

震惊!居然就是for (int index = start; index <= maxIndex; index++)爆算,那么正则搜索比字符串搜索更快也不足为奇了。

相关推荐
孤鸿玉11 小时前
Fluter InteractiveViewer 与ScrollView滑动冲突问题解决
flutter
叽哥18 小时前
Flutter Riverpod上手指南
android·flutter·ios
BG1 天前
Flutter 简仿Excel表格组件介绍
flutter
zhangmeng2 天前
FlutterBoost在iOS26真机运行崩溃问题
flutter·app·swift
恋猫de小郭2 天前
对于普通程序员来说 AI 是什么?AI 究竟用的是什么?
前端·flutter·ai编程
卡尔特斯2 天前
Flutter A GlobalKey was used multipletimes inside one widget'schild list.The ...
flutter
w_y_fan2 天前
Flutter 滚动组件总结
前端·flutter
醉过才知酒浓2 天前
Flutter Getx 的页面传参
flutter
火柴就是我3 天前
flutter 之真手势冲突处理
android·flutter
Speed1233 天前
`mockito` 的核心“打桩”规则
flutter·dart