Sort-and-Match Algorithm
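Background: On our production line, equipment generates a large volume of alarm records while it runs. We need to analyze these alarms statistically, in particular the duration and frequency of each alarm type, so that the data can help locate equipment faults. The difficulty is that the equipment reports the start of an alarm and the end of an alarm as two independent records. To compute how long an alarm lasted, each alarm-start record must first be paired with its alarm-end record.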
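Matching rules
- Same equipment
- Same alarm type
- One record has the status "alarm start", the other "alarm end"
- The end time is later than the start time, and among all records that satisfy the conditions above it is the one closest in time to the start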
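The four rules can be written down as a predicate plus a "closest candidate" selection. The following is only a minimal sketch: `AlarmEvent`, `AlarmState`, `IsCandidate` and `FindClosestEnd` are hypothetical names introduced for illustration, and the alarm code is assumed to stand in for the alarm type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified types used only in this sketch.
public enum AlarmState { Start, End }

public class AlarmEvent
{
    public string EquipmentCode { get; set; }
    public string AlarmCode { get; set; }   // assumed to identify the alarm type
    public AlarmState State { get; set; }
    public DateTime Time { get; set; }
}

public static class AlarmMatching
{
    // Rules 1-3, plus the ordering half of rule 4 (end strictly later than start).
    public static bool IsCandidate(AlarmEvent start, AlarmEvent end) =>
        start.State == AlarmState.Start
        && end.State == AlarmState.End
        && start.EquipmentCode == end.EquipmentCode
        && start.AlarmCode == end.AlarmCode
        && end.Time > start.Time;

    // The other half of rule 4: among all candidates, take the earliest one.
    // Returns null when no end record qualifies.
    public static AlarmEvent FindClosestEnd(AlarmEvent start, IEnumerable<AlarmEvent> ends) =>
        ends.Where(e => IsCandidate(start, e))
            .OrderBy(e => e.Time)
            .FirstOrDefault();
}
```

This version scans every end record for each start record, which is simple but slow on large inputs.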
Entity model
Alarm entity class
```csharp
using System;

public class Alarm
{
    /// <summary>
    /// Equipment code
    /// </summary>
    public string EquipmentCode { get; set; }

    /// <summary>
    /// Alarm message text
    /// </summary>
    public string AlarmContent { get; set; }

    /// <summary>
    /// Alarm status (start or end)
    /// </summary>
    public AlarmStatusEnum AlarmStatus { get; set; }

    /// <summary>
    /// Time the record was created
    /// </summary>
    public DateTime CreateTime { get; set; }

    /// <summary>
    /// Alarm code
    /// </summary>
    public string AlarmCode { get; set; }
}

/// <summary>
/// Alarm status
/// </summary>
public enum AlarmStatusEnum
{
    Start = 0,
    End = 1
}
```
Output entity class
```csharp
public class AlarmDto
{
    /// <summary>
    /// Equipment code
    /// </summary>
    public string EquipmentCode { get; set; }

    /// <summary>
    /// Alarm code
    /// </summary>
    public string AlarmCode { get; set; }

    /// <summary>
    /// Alarm message text
    /// </summary>
    public string AlarmContent { get; set; }

    /// <summary>
    /// Start time of the alarm
    /// </summary>
    public DateTime StartTime { get; set; }

    /// <summary>
    /// End time of the alarm
    /// </summary>
    public DateTime EndTime { get; set; }

    /// <summary>
    /// Alarm duration, formatted as text
    /// </summary>
    public string Duration { get; set; }
}
```
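The task is to take the alarm records that fall within the time window [startTime, deadLine], pair each alarm start with its alarm end and merge the pair into a single record, so that the duration and count of each alarm type within that window can be computed afterwards.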
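Once the records are paired, the per-type duration and count reduce to a group-by over the merged records. The sketch below assumes a hypothetical `MatchedAlarm` type holding only the fields the summary needs, and a hypothetical `Summarize` helper; neither name comes from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical merged record: one alarm with both its start and end time.
public class MatchedAlarm
{
    public string AlarmCode { get; set; }
    public DateTime StartTime { get; set; }
    public DateTime EndTime { get; set; }
}

public static class AlarmStatistics
{
    // Returns, for each alarm code, how often it occurred and its total duration.
    public static Dictionary<string, (int Count, TimeSpan Total)> Summarize(IEnumerable<MatchedAlarm> matched) =>
        matched
            .GroupBy(m => m.AlarmCode)
            .ToDictionary(
                g => g.Key,
                g => (Count: g.Count(),
                      Total: TimeSpan.FromTicks(g.Sum(m => (m.EndTime - m.StartTime).Ticks))));
}
```

Summing ticks keeps the total as a `TimeSpan`, so formatting can be left to the presentation layer.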
Original algorithm
```csharp
// Requires: using System; using System.Collections.Generic; using System.Linq;
public static List<AlarmDto> CreateAlarmDto(List<Alarm> alarms, DateTime startTime, DateTime deadLine)
{
    // 1. Split the records by status and sort each list by time.
    var startAlarms = alarms.Where(s => s.AlarmStatus == AlarmStatusEnum.Start).OrderBy(s => s.CreateTime).ToList();
    var endAlarms = alarms.Where(s => s.AlarmStatus == AlarmStatusEnum.End).OrderBy(s => s.CreateTime).ToList();
    var alarmDtos = new List<AlarmDto>();

    foreach (var alarm in startAlarms)
    {
        var windingAlarmDto = new AlarmDto
        {
            EquipmentCode = alarm.EquipmentCode,
            AlarmCode = alarm.AlarmCode,
            AlarmContent = alarm.AlarmContent,
            StartTime = alarm.CreateTime
        };

        // Match on AlarmCode, not AlarmContent: different alarm codes can share the same message text.
        // endAlarms is already sorted by time, so the first hit is the closest end record.
        var endAlarm = endAlarms.FirstOrDefault(s =>
            s.EquipmentCode == alarm.EquipmentCode
            && s.AlarmCode == alarm.AlarmCode
            && s.CreateTime >= alarm.CreateTime);

        if (endAlarm != null)
        {
            windingAlarmDto.EndTime = endAlarm.CreateTime;
            // Remove the matched end record: the leftover pass below relies on endAlarms holding
            // only unmatched records. Testing showed that one machine does not raise the same alarm
            // twice in a row, so double matching is not the concern here, but removal also shrinks
            // the list and makes later lookups faster.
            endAlarms.Remove(endAlarm);
        }
        else
        {
            // An unmatched start must end after deadLine. Only the window is counted,
            // so deadLine is used as the end time.
            windingAlarmDto.EndTime = deadLine;
        }

        TimeSpan ts = windingAlarmDto.EndTime - windingAlarmDto.StartTime;
        // TotalMinutes keeps durations of an hour or more intact; ts.Minutes alone would drop the hours.
        windingAlarmDto.Duration = $"{(long)ts.TotalMinutes:D2} min {ts.Seconds:D2} s";
        alarmDtos.Add(windingAlarmDto);
    }

    // End records that are still left had no matching start inside the window,
    // so the start of the window is used as their start time.
    foreach (var alarm in endAlarms)
    {
        var windingAlarmDto = new AlarmDto
        {
            EquipmentCode = alarm.EquipmentCode,
            AlarmCode = alarm.AlarmCode,
            AlarmContent = alarm.AlarmContent,
            StartTime = startTime,
            EndTime = alarm.CreateTime
        };
        TimeSpan ts = windingAlarmDto.EndTime - windingAlarmDto.StartTime;
        windingAlarmDto.Duration = $"{(long)ts.TotalMinutes:D2} min {ts.Seconds:D2} s";
        alarmDtos.Add(windingAlarmDto);
    }

    return alarmDtos;
}
```
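One detail worth keeping in mind when formatting the duration: `TimeSpan.Minutes` is only the 0-59 minutes component, while `TimeSpan.TotalMinutes` is the whole span expressed in minutes. For an alarm that lasts an hour or longer the two differ. A small sketch, with `FormatDuration` as a hypothetical helper name and the "min"/"s" labels chosen only for illustration:

```csharp
using System;

public static class DurationFormatting
{
    // Formats a span as whole minutes plus the remaining seconds.
    // Casting TotalMinutes to long truncates the fractional minutes,
    // which are then reported through the Seconds component.
    public static string FormatDuration(TimeSpan ts) =>
        $"{(long)ts.TotalMinutes:D2} min {ts.Seconds:D2} s";
}
```

For a span of 1 hour, 5 minutes and 7 seconds, `ts.Minutes` is 5 while `(long)ts.TotalMinutes` is 65.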
Improved algorithm
```csharp
// Requires: using System; using System.Collections.Generic; using System.Linq;
public static List<AlarmDto> CreateAlarmDtoNew(List<Alarm> alarms, DateTime startTime, DateTime deadLine)
{
    var startAlarms = alarms.Where(s => s.AlarmStatus == AlarmStatusEnum.Start).ToList();
    var endAlarms = alarms.Where(s => s.AlarmStatus == AlarmStatusEnum.End).ToList();
    var alarmDtos = new List<AlarmDto>();

    // Group the end records by (equipment, alarm code) and sort each group by time,
    // so that every group can be binary-searched.
    var endGroups = endAlarms
        .GroupBy(e => new { e.EquipmentCode, e.AlarmCode })
        .ToDictionary(
            g => g.Key,
            g => g.OrderBy(e => e.CreateTime).ToList()
        );

    // Compares alarms by time only; created once instead of once per iteration.
    var byCreateTime = Comparer<Alarm>.Create((x, y) => x.CreateTime.CompareTo(y.CreateTime));

    foreach (var alarm in startAlarms.OrderBy(a => a.CreateTime))
    {
        var windingAlarmDto = new AlarmDto
        {
            EquipmentCode = alarm.EquipmentCode,
            AlarmContent = alarm.AlarmContent,
            AlarmCode = alarm.AlarmCode,
            StartTime = alarm.CreateTime
        };

        // Look up the end records for the same equipment and alarm code.
        var key = new { alarm.EquipmentCode, alarm.AlarmCode };
        if (endGroups.TryGetValue(key, out var endList))
        {
            // BinarySearch returns the index of an element whose CreateTime equals alarm's.
            // If there is none, it returns the bitwise complement (~) of the index of the first
            // element with a later CreateTime, or ~endList.Count when every element is earlier.
            int index = endList.BinarySearch(alarm, byCreateTime);
            if (index < 0) index = ~index;
            if (index < endList.Count)
            {
                windingAlarmDto.EndTime = endList[index].CreateTime;
                endList.RemoveAt(index); // removing by index avoids the linear search of Remove(item)
            }
        }

        // No matching end record: close the alarm at deadLine.
        if (windingAlarmDto.EndTime == DateTime.MinValue)
        {
            windingAlarmDto.EndTime = deadLine;
        }

        TimeSpan ts = windingAlarmDto.EndTime - windingAlarmDto.StartTime;
        // TotalMinutes keeps durations of an hour or more intact; ts.Minutes alone would drop the hours.
        windingAlarmDto.Duration = $"{(long)ts.TotalMinutes:D2} min {ts.Seconds:D2} s";
        alarmDtos.Add(windingAlarmDto);
    }

    // End records that are still left belong to alarms that began before startTime,
    // so startTime is used as their start time.
    foreach (var alarm in endGroups.SelectMany(g => g.Value))
    {
        var windingAlarmDto = new AlarmDto
        {
            EquipmentCode = alarm.EquipmentCode,
            AlarmContent = alarm.AlarmContent,
            AlarmCode = alarm.AlarmCode,
            StartTime = startTime,
            EndTime = alarm.CreateTime
        };
        TimeSpan ts = windingAlarmDto.EndTime - windingAlarmDto.StartTime;
        windingAlarmDto.Duration = $"{(long)ts.TotalMinutes:D2} min {ts.Seconds:D2} s";
        alarmDtos.Add(windingAlarmDto);
    }

    return alarmDtos;
}
```
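The `if (index < 0) index = ~index;` step depends on how `List<T>.BinarySearch` reports a miss: it returns the bitwise complement of the position where the item would be inserted. The sketch below isolates that behavior on a plain list of timestamps. `FirstAtOrAfter` is a hypothetical helper name, and the walk-back over duplicates is an addition for this sketch, not something the code above does.

```csharp
using System;
using System.Collections.Generic;

public static class LowerBoundDemo
{
    // Returns the index of the first element >= target in an ascending list,
    // or sorted.Count when every element is smaller than target.
    public static int FirstAtOrAfter(List<DateTime> sorted, DateTime target)
    {
        int index = sorted.BinarySearch(target);
        if (index < 0)
        {
            // Miss: ~index is the position of the first larger element (or Count).
            return ~index;
        }
        // Hit: BinarySearch may land on any of several equal elements,
        // so step back to the first one.
        while (index > 0 && sorted[index - 1] == target)
        {
            index--;
        }
        return index;
    }
}
```

With a list holding 08:00, 08:10 and 08:20, a target of 08:05 yields index 1, and a target of 08:30 yields 3, which equals `Count` and signals that no later element exists.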
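Conclusion
Overall, testing shows that the improved algorithm matches faster: with roughly 2,000 records it takes about 2 ms, against about 14 ms for the original algorithm. The gap widens as the data volume grows, because the original algorithm scans the list of end records for every start record, whereas the improved version goes straight to the group for the same equipment and alarm code and binary-searches it.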
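The article does not say how the timings were taken. One possible way to reproduce such a comparison is a small `Stopwatch` harness like the sketch below; `AverageMilliseconds` is a hypothetical helper, and the warm-up call is an assumption about what a fair measurement needs, not a description of the original test.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once as a warm-up (so JIT compilation is not measured),
    // then returns the average wall-clock time per call in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }
        action();
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Each algorithm would be wrapped in a lambda and given a fresh copy of the input list, so that both runs measure the same data.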