Matrix APM Framework Source Code Analysis (10): Slow-Method Monitoring with EvilMethodTracer

EvilMethodTracer's slow-method monitoring rests on three mechanisms:

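LooperMonitor listens for the start and end of each main-thread message (see Matrix APM Framework Source Code Analysis (2): LooperAnrTracer Jank and ANR Monitoring)
Bytecode instrumentation inserts the i (method entry) and o (method exit) calls (see Matrix APM Framework Source Code Analysis (7): Bytecode Instrumentation)
AppMethodBeat records those calls so executed methods can be traced back (see Matrix APM Framework Source Code Analysis (9): AppMethodBeat Source Code Analysis)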
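Everything below decodes the long records produced by the instrumented i/o calls. As a minimal sketch, assuming AppMethodBeat packs each record as 1 entry/exit flag bit, a 20-bit method id, and a 43-bit millisecond time offset (an assumption; check AppMethodBeat.mergeData before relying on it), packing and decoding look like this. The class TraceRecord is hypothetical; the decoder names mirror the isIn, getMethodId, and getTime helpers used by structuredDataToStack.

```java
// Hypothetical helper sketching how one i/o record could be packed into a long.
// ASSUMED layout (verify against AppMethodBeat.mergeData):
//   bit 63     -> 1 for an i (method entry) record, 0 for an o (method exit) record
//   bits 62-43 -> method id (20 bits)
//   bits 42-0  -> time offset in ms (43 bits)
final class TraceRecord {
    private static final long TIME_MASK = 0x7FFFFFFFFFFL; // low 43 bits
    private static final long METHOD_ID_MASK = 0xFFFFFL;  // 20 bits

    private TraceRecord() {}

    static long pack(boolean isIn, int methodId, long timeOffsetMs) {
        long trueId = 0L;
        if (isIn) {
            trueId |= 1L << 63;                             // entry flag
        }
        trueId |= ((long) methodId & METHOD_ID_MASK) << 43; // method id
        trueId |= timeOffsetMs & TIME_MASK;                 // time offset
        return trueId;
    }

    static boolean isIn(long trueId) {
        return ((trueId >>> 63) & 0x1L) == 1L;
    }

    static int getMethodId(long trueId) {
        return (int) ((trueId >>> 43) & METHOD_ID_MASK);
    }

    static long getTime(long trueId) {
        return trueId & TIME_MASK;
    }

    public static void main(String[] args) {
        long in = pack(true, 1048574, 1200L);
        long out = pack(false, 1048574, 1928L);
        System.out.println(isIn(in) + " " + getMethodId(in) + " " + getTime(in)); // prints true 1048574 1200
        System.out.println(getTime(out) - getTime(in));                          // prints 728 (the call's cost)
    }
}
```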

When TracePlugin starts, it calls EvilMethodTracer's onAlive method:

@Override
public void onAlive() {
    super.onAlive();
    // Slow-method monitoring switch from the config
    if (isEvilMethodTraceEnable) {
        // Register the message listener
        LooperMonitor.register(this);
    }
}

Once the listener is registered, onDispatchBegin runs before each message is dispatched and onDispatchEnd runs after dispatch finishes:

@Override
public void onDispatchBegin(String log) {
    // Mark the starting position in AppMethodBeat's buffer via maskIndex
    indexRecord = AppMethodBeat.getInstance().maskIndex("EvilMethodTracer#dispatchBegin");
}

@Override
public void onDispatchEnd(String log, long beginNs, long endNs) {
    // endNs: message end time; beginNs: message start time; dispatchCost: message execution time in ms
    long dispatchCost = (endNs - beginNs) / Constants.TIME_MILLIS_TO_NANO;
    try {
        // At or above the threshold (700 ms by default), the message counts as a jank
        if (dispatchCost >= evilThresholdMs) {
            // Copy the recorded range out of AppMethodBeat's sBuffer as a long array
            long[] data = AppMethodBeat.getInstance().copyData(indexRecord);
            // Currently visible page
            String scene = AppActiveMatrixDelegate.INSTANCE.getVisibleScene();
            // Analyze on a worker thread
            MatrixHandlerThread.getDefaultHandler().post(new AnalyseTask(isForeground(), scene, data, dispatchCost, endNs));
        }
    } finally {
        // Release the marked node
        indexRecord.release();
    }
}
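copyData(indexRecord) hands back the records written between the index marked in onDispatchBegin and the latest write. Assuming sBuffer is a fixed-size array whose write index wraps around to 0 when it reaches the end, the copy has to handle the wrapped case. The sketch below uses a hypothetical copyRange helper and is not Matrix's actual copyData:

```java
import java.util.Arrays;

// Hypothetical sketch: copy the records between two indices of a circular buffer.
// ASSUMPTIONS: writes wrap to index 0 after the last slot, and endIndex is the
// last written slot (inclusive).
final class RingCopy {
    private RingCopy() {}

    static long[] copyRange(long[] buffer, int startIndex, int endIndex) {
        if (startIndex <= endIndex) {
            // No wrap-around: one contiguous slice
            return Arrays.copyOfRange(buffer, startIndex, endIndex + 1);
        }
        // Wrapped: the tail of the array followed by its head
        int tailLen = buffer.length - startIndex;
        long[] data = new long[tailLen + endIndex + 1];
        System.arraycopy(buffer, startIndex, data, 0, tailLen);
        System.arraycopy(buffer, 0, data, tailLen, endIndex + 1);
        return data;
    }

    public static void main(String[] args) {
        long[] buffer = {50, 60, 70, 10, 20, 30, 40};
        // Marked at index 3; the writer wrapped and last wrote index 2
        System.out.println(Arrays.toString(copyRange(buffer, 3, 2))); // prints [10, 20, 30, 40, 50, 60, 70]
    }
}
```

If more records are written than the buffer can hold, the marked start has already been overwritten and the copied slice is meaningless; this sketch does not detect that case.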

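All of this was covered in earlier articles, so it only gets a brief mention here. Next comes AnalyseTask's analyse method; this article focuses on the algorithm.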

AnalyseTask

Before the analysis, take a look at the official description and diagram; read alongside the code, they make it clear:

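Stack clustering: if the raw collected data were reported as-is, the volume would be large and the backend would have a hard time clustering the problematic stacks. So before reporting, the collected data is lightly consolidated and trimmed, and a key that represents the jank stack is derived to make backend aggregation easy.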

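By traversing the collected buffer, where an i and its matching o form one complete function execution, a call tree and each function's execution time are computed, identical function calls within each level are aggregated, and finally a simple strategy picks out the level of functions that accounts for most of the time, which becomes the key representing the jank stack.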

void analyse() {
    LinkedList<MethodItem> stack = new LinkedList<>();
    // Only proceed if copyData returned any records
    if (data.length > 0) {
        // [1. structuredDataToStack] build the stack list
        TraceDataUtils.structuredDataToStack(data, stack, true, endMs);
        // [2. trimStack] trim the stack
        TraceDataUtils.trimStack(stack, Constants.TARGET_EVIL_METHOD_STACK, new TraceDataUtils.IStructuredDataFilter() {
            @Override
            public boolean isFilter(long during, int filterCount) {
                return during < (long) filterCount * Constants.TIME_UPDATE_CYCLE_MS;
            }

            @Override
            public int getFilterMaxCount() {
                return Constants.FILTER_STACK_MAX_COUNT;
            }

            @Override
            public void fallback(List<MethodItem> stack, int size) {
                MatrixLog.w(TAG, "[fallback] size:%s targetSize:%s stack:%s", size, Constants.TARGET_EVIL_METHOD_STACK, stack);
                Iterator<MethodItem> iterator = stack.listIterator(Math.min(size, Constants.TARGET_EVIL_METHOD_STACK));
                while (iterator.hasNext()) {
                    iterator.next();
                    iterator.remove();
                }
            }
        });
    }

    StringBuilder reportBuilder = new StringBuilder();
    StringBuilder logcatBuilder = new StringBuilder();
    // [3. stackToString] build the stack strings
    long stackCost = Math.max(cost, TraceDataUtils.stackToString(stack, reportBuilder, logcatBuilder));
    // [4. getTreeKey] the key that represents the jank stack
    String stackKey = TraceDataUtils.getTreeKey(stack, stackCost);

    MatrixLog.w(TAG, "%s", printEvil(scene, processStat, isForeground, logcatBuilder, stack.size(), stackKey, cost)); // for logcat

    // report
    try {
        TracePlugin plugin = Matrix.with().getPluginByClass(TracePlugin.class);
        if (null == plugin) {
            return;
        }
        JSONObject jsonObject = new JSONObject();
        DeviceUtil.getDeviceInfo(jsonObject, Matrix.with().getApplication());

        jsonObject.put(SharePluginInfo.ISSUE_STACK_TYPE, Constants.Type.NORMAL);
        jsonObject.put(SharePluginInfo.ISSUE_COST, stackCost);
        jsonObject.put(SharePluginInfo.ISSUE_SCENE, scene);
        jsonObject.put(SharePluginInfo.ISSUE_TRACE_STACK, reportBuilder.toString());
        jsonObject.put(SharePluginInfo.ISSUE_STACK_KEY, stackKey);

        Issue issue = new Issue();
        issue.setTag(SharePluginInfo.TAG_PLUGIN_EVIL_METHOD);
        issue.setContent(jsonObject);
        plugin.onDetectIssue(issue);

    } catch (JSONException e) {
        MatrixLog.e(TAG, "[JSONException error: %s", e);
    }
}

1. structuredDataToStack (building the stack list)

/**
 * Traverses the long array, decodes each methodId, and wraps each method's execution time
 * and depth into a MethodItem stored in a linked list.
 *
 * @param buffer   the range of AppMethodBeat's sBuffer copied out as a long array
 * @param result   linked list that receives the MethodItems
 * @param isStrict strict mode; true for slow-method monitoring
 * @param endTime  the message's end time
 */
public static void structuredDataToStack(long[] buffer, LinkedList<MethodItem> result, boolean isStrict, long endTime) {
    long lastInId = 0L;
    // Call depth, used later when converting to a tree
    int depth = 0;
    // i records waiting to be paired with an o
    LinkedList<Long> rawData = new LinkedList<>();
    boolean isBegin = !isStrict;
    // Traverse the buffer
    for (long trueId : buffer) {
        // Skip empty slots
        if (0 == trueId) {
            continue;
        }
        // In strict mode, only records from the dispatchMessage i onwards count
        if (isStrict) {
            if (isIn(trueId) && AppMethodBeat.METHOD_ID_DISPATCH == getMethodId(trueId)) {
                isBegin = true;
            }
            if (!isBegin) {
                // MatrixLog.d(TAG, "never begin! pass this method[%s]", getMethodId(trueId));
                continue;
            }
        }
        // An i record is pushed straight onto rawData
        if (isIn(trueId)) {
            // Decode the methodId from the long
            lastInId = getMethodId(trueId);
            // A new message starts at depth 0
            if (lastInId == AppMethodBeat.METHOD_ID_DISPATCH) {
                depth = 0;
            }
            // One level deeper
            depth++;
            // Push onto the head of rawData
            rawData.push(trueId);
        } else { // An o record: try to pair it with an i
            // Decode the methodId
            int outMethodId = getMethodId(trueId);
            if (!rawData.isEmpty()) {
                // Pop the most recent i from the head of rawData
                long in = rawData.pop();
                // One level up
                depth--;
                int inMethodId;
                // Keep the popped records so they can be pushed back if no match is found
                LinkedList<Long> tmp = new LinkedList<>();
                tmp.add(in);
                // Method ids of i and o differ: keep popping until they match or rawData runs out
                while ((inMethodId = getMethodId(in)) != outMethodId && !rawData.isEmpty()) {
                    MatrixLog.w(TAG, "pop inMethodId[%s] to continue match ouMethodId[%s]", inMethodId, outMethodId);
                    in = rawData.pop();
                    depth--;
                    // Remember the unmatched i
                    tmp.add(in);
                }
                // Popped all the way down to dispatchMessage's i without a match (the long array is
                // a slice, so this o's i may have been cut off): push the popped i records back into
                // rawData and discard this o
                if (inMethodId != outMethodId
                        && inMethodId == AppMethodBeat.METHOD_ID_DISPATCH) {
                    MatrixLog.e(TAG, "inMethodId[%s] != outMethodId[%s] throw this outMethodId!", inMethodId, outMethodId);
                    rawData.addAll(tmp);
                    depth += rawData.size();
                    continue;
                }
                // Reaching here means i and o matched
                // Time offset when the method finished
                long outTime = getTime(trueId);
                // Time offset when the method started
                long inTime = getTime(in);
                // The difference is the method's execution time
                long during = outTime - inTime;
                // Invalid data: give up on the whole slice
                if (during < 0) {
                    MatrixLog.e(TAG, "[structuredDataToStack] trace during invalid:%d", during);
                    rawData.clear();
                    result.clear();
                    return;
                }
                // Wrap method id, cost, and depth into a MethodItem
                MethodItem methodItem = new MethodItem(outMethodId, (int) during, depth);
                // Add it to the result list
                addMethodItem(result, methodItem);
            } else {
                MatrixLog.w(TAG, "[structuredDataToStack] method[%s] not found in! ", outMethodId);
            }
        }
    }
    // A non-empty rawData means some methods were never matched (still running when the range ended)
    while (!rawData.isEmpty() && isStrict) {
        long trueId = rawData.pop();
        int methodId = getMethodId(trueId);
        boolean isIn = isIn(trueId);
        // Start time: base time + time offset
        long inTime = getTime(trueId) + AppMethodBeat.getDiffTime();
        MatrixLog.w(TAG, "[structuredDataToStack] has never out method[%s], isIn:%s, inTime:%s, endTime:%s,rawData size:%s",
                methodId, isIn, inTime, endTime, rawData.size());
        // Everything in rawData should be an i; skip anything else
        if (!isIn) {
            MatrixLog.e(TAG, "[structuredDataToStack] why has out Method[%s]? is wrong! ", methodId);
            continue;
        }
        // Use endTime as the end time
        MethodItem methodItem = new MethodItem(methodId, (int) (endTime
                - inTime), rawData.size());
        // Add it to the result list
        addMethodItem(result, methodItem);
    }
    // At this point result carries each call's depth, but the list is not yet in call order

    // Build a tree to put result into its proper order
    TreeNode root = new TreeNode(null, null);
    // List -> tree
    int count = stackToTree(result, root);
    MatrixLog.i(TAG, "stackToTree: count=%s", count);
    // Clear result
    result.clear();
    // Tree -> list; result is now in call order
    treeToStack(root, result);
}

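Traversing the buffer:

i: pushed onto the head of rawData
o: paired by popping i records from the head of rawData
i and o don't match: the i goes into tmp and the next i is popped from rawData, until a match is found or rawData runs out; if the search ends at dispatchMessage's i without a match, the popped i records are pushed back into rawData and the o is dropped
i and o match: wrapped into a MethodItem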

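structuredDataToStack traverses the collected buffer, treats each matching i/o pair as one complete function execution, wraps the method id, cost, and call depth into a MethodItem, stores it in a linked list, and then puts the list in call order.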
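The pairing logic can be distilled into a much smaller sketch. It uses hypothetical Event and Item types in place of the packed longs and MethodItem, and deliberately leaves out strict mode, merging, and the tree round trip. As a simplification of the dispatchMessage-specific check above, it drops any o that finds no matching i. Depth is taken as the number of calls still open around the finished one, which gives 0 for the outermost call, as in the original.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Simplified sketch of the i/o pairing in structuredDataToStack (hypothetical types).
// Left out on purpose: strict mode, addMethodItem merging, and the tree round trip.
final class IoPairing {
    // Stand-in for one decoded long record
    static final class Event {
        final boolean isIn;
        final int methodId;
        final long time;

        Event(boolean isIn, int methodId, long time) {
            this.isIn = isIn;
            this.methodId = methodId;
            this.time = time;
        }
    }

    // Stand-in for MethodItem
    static final class Item {
        final int methodId;
        final long during;
        final int depth;

        Item(int methodId, long during, int depth) {
            this.methodId = methodId;
            this.during = during;
            this.depth = depth;
        }

        @Override
        public String toString() {
            return depth + "," + methodId + "," + during;
        }
    }

    static List<Item> pair(List<Event> events) {
        Deque<Event> open = new ArrayDeque<>(); // unmatched i records, innermost on top
        List<Item> done = new ArrayList<>();     // finished calls, in completion order
        for (Event e : events) {
            if (e.isIn) {
                open.push(e);
                continue;
            }
            // o record: pop i records until one with the same method id turns up
            Deque<Event> skipped = new ArrayDeque<>();
            Event match = null;
            while (!open.isEmpty()) {
                Event in = open.pop();
                if (in.methodId == e.methodId) {
                    match = in;
                    break;
                }
                skipped.push(in);
            }
            if (match == null) {
                // No matching i in range: restore the popped records in order, drop this o
                while (!skipped.isEmpty()) {
                    open.push(skipped.pop());
                }
                continue;
            }
            // i records skipped on the way to the match are discarded
            done.add(new Item(e.methodId, e.time - match.time, open.size()));
        }
        return done;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(true, 1048574, 0), new Event(true, 1, 10), new Event(false, 1, 40),
                new Event(true, 2, 40), new Event(false, 2, 90), new Event(false, 1048574, 100));
        // prints 1,1,30 then 1,2,50 then 0,1048574,100
        pair(events).forEach(System.out::println);
    }
}
```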

addMethodItem

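When the same function is called several times in a row at the same depth, addMethodItem merges the calls, accumulating their execution time and call count: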

private static int addMethodItem(LinkedList<MethodItem> resultStack, MethodItem item) {
    MethodItem last = null;
    if (!resultStack.isEmpty()) {
        // Peek at the head of the list (the most recently added item)
        last = resultStack.peek();
    }
    // Same method at the same non-zero depth as the head: a repeated back-to-back call
    if (null != last && last.methodId == item.methodId && last.depth == item.depth
            && 0 != item.depth) {
        item.durTime = item.durTime == Constants.DEFAULT_ANR ? last.durTime : item.durTime;
        // Accumulate execution time and call count
        last.mergeMore(item.durTime);
        return last.durTime;
    } else {
        // Push onto the head of the list
        resultStack.push(item);
        return item.durTime;
    }
}
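A stripped-down sketch of the merge rule, with a hypothetical Item type standing in for MethodItem (the DEFAULT_ANR substitution is omitted). Because only the head of the list is compared, only back-to-back calls at the same non-zero depth are merged:

```java
import java.util.LinkedList;

// Hypothetical sketch of addMethodItem's merge rule; Item stands in for MethodItem.
final class MergeSketch {
    static final class Item {
        final int methodId;
        final int depth;
        int durTime;
        int count = 1;

        Item(int methodId, int durTime, int depth) {
            this.methodId = methodId;
            this.durTime = durTime;
            this.depth = depth;
        }
    }

    static void addMethodItem(LinkedList<Item> result, Item item) {
        Item last = result.peek(); // head of the list, or null if empty
        if (last != null && last.methodId == item.methodId
                && last.depth == item.depth && item.depth != 0) {
            last.durTime += item.durTime; // accumulate cost
            last.count++;                 // and call count
        } else {
            result.push(item);            // new entry at the head
        }
    }

    public static void main(String[] args) {
        LinkedList<Item> result = new LinkedList<>();
        addMethodItem(result, new Item(7, 20, 2)); // call #1 inside a loop
        addMethodItem(result, new Item(7, 25, 2)); // call #2, merged into #1
        addMethodItem(result, new Item(3, 60, 1)); // the enclosing method returns
        for (Item it : result) {
            // prints 1,3,1,60 then 2,7,2,45 (depth,methodId,count,durTime)
            System.out.println(it.depth + "," + it.methodId + "," + it.count + "," + it.durTime);
        }
    }
}
```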

stackToTree

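stackToTree converts the list into a multi-way tree based on each item's depth; the tree is built so that the call stack can be put into its proper order.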

public static int stackToTree(LinkedList<MethodItem> resultStack, TreeNode root) {
    // lastNode: the previously processed node, the tentative parent of the next one
    TreeNode lastNode = null;
    // ListIterator walks (and can modify) the list; start from the head
    ListIterator<MethodItem> iterator = resultStack.listIterator(0);
    int count = 0;
    // Traverse
    while (iterator.hasNext()) {
        // Build a TreeNode from the current item and its tentative parent
        TreeNode node = new TreeNode(iterator.next(), lastNode);
        count++;
        // The first node must have depth 0; anything else is an error
        if (null == lastNode && node.depth() != 0) {
            MatrixLog.e(TAG, "[stackToTree] begin error! why the first node'depth is not 0!");
            return 0;
        }
        int depth = node.depth();
        // Top-level node: attach directly to the root
        if (lastNode == null || depth == 0) {
            root.add(node);
        } else if (lastNode.depth() >= depth) { // Not deeper than the previous node: climb up to find its place
            while (null != lastNode && lastNode.depth() > depth) {
                // Climb until lastNode's depth is no greater than the current node's, i.e. lastNode is a sibling
                lastNode = lastNode.father;
            }
            // Attach to the sibling's parent
            if (lastNode != null && lastNode.father != null) {
                node.father = lastNode.father;
                lastNode.father.add(node);
            }
        } else {
            // Deeper than the previous node: it is the previous node's child
            lastNode.add(node);
        }
        lastNode = node;
    }
    return count;
}
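The same depth-driven construction can be sketched with an explicit stack of ancestors instead of climbing lastNode.father. This is a swapped-in technique with the same idea, not Matrix's code; Node is hypothetical, and the input is a pre-order list of {methodId, depth} rows using a few sample ids:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch: build a multi-way tree from a pre-order list of (methodId, depth) rows,
// keeping an explicit stack of ancestors (hypothetical Node type).
final class DepthTree {
    static final class Node {
        final int methodId;
        final int depth;
        final List<Node> children = new ArrayList<>();

        Node(int methodId, int depth) {
            this.methodId = methodId;
            this.depth = depth;
        }
    }

    // Each row is {methodId, depth}; depth-0 rows hang directly under the root
    static Node build(List<int[]> rows) {
        Node root = new Node(-1, -1); // virtual root, like TreeNode(null, null)
        Deque<Node> ancestors = new ArrayDeque<>();
        ancestors.push(root);
        for (int[] row : rows) {
            Node node = new Node(row[0], row[1]);
            // Pop until the top is shallower than the new node: that top is its parent
            while (ancestors.peek().depth >= node.depth) {
                ancestors.pop();
            }
            ancestors.peek().children.add(node);
            ancestors.push(node);
        }
        return root;
    }

    static void print(Node node, String indent) {
        for (Node child : node.children) {
            System.out.println(indent + child.methodId);
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
                new int[]{1048574, 0}, new int[]{199, 1}, new int[]{2616, 2},
                new int[]{3809, 3}, new int[]{23976, 2}, new int[]{535, 1});
        // 199 and 535 become children of 1048574; 2616 and 23976 children of 199
        print(build(rows), "");
    }
}
```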

treeToStack

private static void treeToStack(TreeNode root, LinkedList<MethodItem> list) {
    // Depth-first (pre-order) traversal of the multi-way tree
    for (int i = 0; i < root.children.size(); i++) {
        TreeNode node = root.children.get(i);
        if (null == node) continue;
        // Emit the node at this level
        if (node.item != null) {
            list.add(node.item);
        }
        // Recurse into its children, if any
        if (!node.children.isEmpty()) {
            treeToStack(node, list);
        }
    }
}
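treeToStack is a plain pre-order depth-first traversal: each node is emitted before its children. An iterative variant with an explicit stack (an alternative to the recursion above, with a hypothetical Node type) produces the same order and avoids deep recursion on very deep call trees:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: iterative pre-order flattening of a multi-way tree (hypothetical Node type).
// The root is virtual, like TreeNode(null, null), so it is not emitted.
final class PreOrder {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    static List<String> flatten(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        // Push children in reverse so the first child is popped (visited) first
        for (int i = root.children.size() - 1; i >= 0; i--) {
            stack.push(root.children.get(i));
        }
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            out.add(node.name); // parent before its children
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("root",
                new Node("dispatch", new Node("A", new Node("A1")), new Node("B")),
                new Node("dispatch2"));
        System.out.println(flatten(root)); // prints [dispatch, A, A1, B, dispatch2]
    }
}
```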

2. trimStack (trimming the stack)

/**
 * Trims the stack down to the target size (30 here).
 *
 * @param stack       the ordered list
 * @param targetCount the number of methods to keep after trimming
 * @param filter      the filter
 */
public static void trimStack(List<MethodItem> stack, int targetCount, IStructuredDataFilter filter) {
    // An invalid (negative) targetCount clears the stack
    if (0 > targetCount) {
        stack.clear();
        return;
    }

    int filterCount = 1;
    int curStackSize = stack.size();
    // Trim while the stack is larger than targetCount
    while (curStackSize > targetCount) {
        // Walk backwards with an iterator: child nodes at the end of the call stack tend to be cheaper
        ListIterator<MethodItem> iterator = stack.listIterator(stack.size());
        while (iterator.hasPrevious()) {
            MethodItem item = iterator.previous();
            // Remove methods cheaper than filterCount * 5 ms (thresholds 5 ms, 10 ms, 15 ms, 20 ms, ...)
            if (filter.isFilter(item.durTime, filterCount)) {
                iterator.remove();
                curStackSize--;
                // Stop once the target is reached
                if (curStackSize <= targetCount) {
                    return;
                }
            }
        }
        curStackSize = stack.size();
        filterCount++;
        // Cap the number of outer passes
        if (filter.getFilterMaxCount() < filterCount) {
            // Getting here means every method under 300 ms (60 passes * 5 ms) has been removed
            break;
        }
    }
    int size = stack.size();
    // Still larger than targetCount: cut off everything beyond it
    if (size > targetCount) {
        filter.fallback(stack, size);
    }
}

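The list is walked from back to front (the child nodes at the end of the call stack tend to be cheaper). Each pass removes methods cheaper than 5 ms × the pass number and stops as soon as the target is reached, with the number of passes capped at 60. If the target still has not been reached after that, the entries beyond it are simply cut off.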
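A compact sketch of the same strategy on plain durations. TrimSketch is hypothetical, and the 5 ms step, the 60-pass cap, and the target size are passed in as parameters rather than read from Constants:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Sketch of the trimStack strategy on a list of durations (ms); hypothetical class.
final class TrimSketch {
    static void trim(List<Integer> durTimes, int targetCount, int stepMs, int maxRounds) {
        if (targetCount < 0) {
            durTimes.clear();
            return;
        }
        // Pass n removes entries cheaper than n * stepMs, walking from the tail
        for (int round = 1; round <= maxRounds && durTimes.size() > targetCount; round++) {
            ListIterator<Integer> it = durTimes.listIterator(durTimes.size());
            while (it.hasPrevious()) {
                if (it.previous() < round * stepMs) {
                    it.remove();
                    if (durTimes.size() <= targetCount) {
                        return; // target reached
                    }
                }
            }
        }
        // Fallback: keep only the first targetCount entries
        if (durTimes.size() > targetCount) {
            durTimes.subList(targetCount, durTimes.size()).clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> costs = new ArrayList<>(Arrays.asList(400, 3, 120, 8, 2, 60, 12));
        trim(costs, 4, 5, 60);
        System.out.println(costs); // prints [400, 120, 60, 12]
    }
}
```

In the sample run, the first pass (threshold 5 ms) removes 2 and 3, and the second pass (10 ms) removes 8, which brings the list down to the target of 4.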

3. stackToString (building the stack string)

// Build the stack StringBuilders; stackCost is the larger of dispatchCost (cost) and the longest method time
long stackCost = Math.max(cost, TraceDataUtils.stackToString(stack, reportBuilder, logcatBuilder));

public static long stackToString(LinkedList<MethodItem> stack, StringBuilder reportBuilder, StringBuilder logcatBuilder) {
    // logcatBuilder is for logcat output: [id count cost]
    logcatBuilder.append("|*\t\tTraceStack:").append("\n");
    logcatBuilder.append("|*\t\t[id count cost]").append("\n");
    Iterator<MethodItem> listIterator = stack.iterator();
    long stackCost = 0; // fix cost
    while (listIterator.hasNext()) {
        MethodItem item = listIterator.next();
        // reportBuilder is for the report, format: depth + "," + methodId + "," + count + "," + durTime
        reportBuilder.append(item.toString()).append('\n');
        logcatBuilder.append("|*\t\t").append(item.print()).append('\n');

        if (stackCost < item.durTime) {
            // Track the longest method time
            stackCost = item.durTime;
        }
    }
    return stackCost;
}

stackToString walks the stack and builds the report string, one line per method in the format: call depth, method id, call count, method cost.
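To make the format concrete, here is a sketch that parses report lines back into their four fields and prints them indented by depth, which recovers the shape of the call tree. ReportParser and ReportLine are hypothetical; the input is the first five lines of the example that follows:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parse "depth,methodId,count,durTime" report lines (hypothetical types).
final class ReportParser {
    static final class ReportLine {
        final int depth;
        final int methodId;
        final int count;
        final long durTime;

        ReportLine(int depth, int methodId, int count, long durTime) {
            this.depth = depth;
            this.methodId = methodId;
            this.count = count;
            this.durTime = durTime;
        }
    }

    static List<ReportLine> parse(String report) {
        List<ReportLine> lines = new ArrayList<>();
        for (String line : report.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            String[] f = line.split(",");
            lines.add(new ReportLine(Integer.parseInt(f[0]), Integer.parseInt(f[1]),
                    Integer.parseInt(f[2]), Long.parseLong(f[3])));
        }
        return lines;
    }

    public static void main(String[] args) {
        String report = "0,236,1,142\n1,243,1,26\n2,37,1,21\n0,1048574,1,428\n1,199,1,377\n";
        for (ReportLine r : parse(report)) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < r.depth; i++) {
                sb.append("  "); // two spaces per depth level
            }
            sb.append(r.methodId).append(" x").append(r.count).append(' ').append(r.durTime).append(" ms");
            System.out.println(sb);
        }
    }
}
```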

Example:

"0,236,1,142\n1,243,1,26\n2,37,1,21\n0,1048574,1,428\n1,199,1,377\n2,9259,1,21\n2,2616,1,193\n3,3809,1,183\n4,3827,1,117\n5,3829,1,87\n6,10595,1,26\n6,15035,1,20\n5,10827,1,30\n6,15311,1,30\n7,15316,1,30\n8,15102,1,30\n2,23976,1,61\n3,23977,1,61\n4,24324,1,61\n5,23614,1,61\n6,23615,1,61\n7,23612,1,61\n0,1048574,1,53\n1,10800,1,21\n0,1048574,1,20\n0,1048574,1,34\n1,10802,1,20\n0,1048574,1,20\n0,1048574,1,62\n1,535,1,51\n"

4. getTreeKey (the key representing the jank stack)

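getTreeKey identifies the level of functions that accounts for most of the time and uses it as the key representing the jank stack.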

public static String getTreeKey(List<MethodItem> stack, long stackCost) {
    StringBuilder ss = new StringBuilder();
    // A major method takes at least 30% of the total cost
    long allLimit = (long) (stackCost * Constants.FILTER_STACK_KEY_ALL_PERCENT);

    LinkedList<MethodItem> sortList = new LinkedList<>();
    // Keep only the major methods
    for (MethodItem item : stack) {
        if (item.durTime >= allLimit) {
            sortList.add(item);
        }
    }
    // Sort in descending order of (depth + 1) * durTime
    Collections.sort(sortList, new Comparator<MethodItem>() {
        @Override
        public int compare(MethodItem o1, MethodItem o2) {
            // depth acts as a weight: at similar cost, deeper methods rank higher
            return Integer.compare((o2.depth + 1) * o2.durTime, (o1.depth + 1) * o1.durTime);
        }
    });
    // No major method: fall back to the first method in the stack
    if (sortList.isEmpty() && !stack.isEmpty()) {
        MethodItem root = stack.get(0);
        sortList.add(root);
    } else if (sortList.size() > 1
            && sortList.peek().methodId == AppMethodBeat.METHOD_ID_DISPATCH) {
        // If the top entry is dispatchMessage, drop it
        sortList.removeFirst();
    }
    // Build the key from the first entry only, format: methodId|
    for (MethodItem item : sortList) {
        ss.append(item.methodId + "|");
        break;
    }
    return ss.toString();
}

Summary

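This article analyzed the algorithm that turns the collected data into stack information:

copyData copies a long array out of AppMethodBeat's sBuffer
structuredDataToStack builds the method list: it pairs i and o records and wraps the method id, call depth, cost, and call count into MethodItems stored in stack
A multi-way tree (the call tree) is built from stack using each element's depth
A depth-first traversal of the tree yields an ordered stack
The ordered stack is trimmed, removing cheaper methods first, until it holds at most 30 entries
The report stack string is built (format: call depth, method id, call count, method cost)
The level of functions that accounts for most of the time (at least 30% of the total cost) is identified and used as the key representing the jank stack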