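In large-scale graph data processing, identifier conflicts are a common and thorny problem. When multiple data sources or business chains are merged, different nodes may carry the same name, ID, or other business identifier even though the business rules require global uniqueness. Blunt renaming breaks references, and manual fixes are slow and error-prone. The code framework described in this article is designed to solve exactly this problem: it provides a graph data normalization and deduplication engine that detects identifier conflicts automatically, renames nodes according to a configurable strategy, and repairs internal references so that the data stays consistent.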
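1. Architecture Overview
The framework consists of four main modules:
GraphFoundation.Normalization: the core normalization engine, responsible for identifier extraction, conflict analysis, and rename-plan generation.
GraphFoundation.Topology: the topology transformation framework, which modifies the graph structure through connection transformers (for example, deleting redundant nodes or redirecting edges).
GraphFoundation.Normalization.Bridge: the bridge layer, which provides a thread-safe graph accessor and a command factory and wraps transactional operations in the command pattern.
TopologyCore: the advanced smart deduplication engine, which adds strategy-based priority evaluation, expression evaluation, and a decision mechanism for more complex business deduplication scenarios.
The modules are decoupled through interfaces and support dependency injection, so components are easy to extend or replace.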
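2. Thread Safety and Transaction Support
- Graph accessor and lock management
DefaultGraphAccessor<TKey, TNodeData, TEdgeData> wraps access to the underlying ICoreGraph and obtains read/write locks through an ILockManager (ReaderWriterLockManager by default). Every read operation (such as fetching node data) runs under a read lock, and every write operation (such as updating a node) runs under a write lock. An upgradeable read lock is also exposed for more complex operations.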
csharp
public TNodeData? GetNodeData(TKey key) => ReadOperation(() => _graph.GetNodeData(key));
public void UpdateNode(TKey key, TNodeData newData, string? newSymbol) => WriteOperation(() => _graph.UpdateNode(key, newData, newSymbol));
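The disposable lock scopes used by the accessor can be sketched as follows. This is a minimal illustration built on the standard ReaderWriterLockSlim. SketchLockManager and its Releaser helper are hypothetical names, not the framework's ReaderWriterLockManager, whose implementation is not shown here. The recursion policy is also an assumption, chosen because the engine code enters a read lock and then calls accessor methods that enter it again.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the framework's lock manager (assumption, not the real class).
public sealed class SketchLockManager : IDisposable
{
    // SupportsRecursion is assumed so that nested read locks on one thread do not throw.
    private readonly ReaderWriterLockSlim _lock = new(LockRecursionPolicy.SupportsRecursion);

    public bool IsReadLockHeld => _lock.IsReadLockHeld;
    public bool IsWriteLockHeld => _lock.IsWriteLockHeld;

    public IDisposable EnterReadLock()
    {
        _lock.EnterReadLock();
        return new Releaser(_lock.ExitReadLock);
    }

    public IDisposable EnterWriteLock()
    {
        _lock.EnterWriteLock();
        return new Releaser(_lock.ExitWriteLock);
    }

    public IDisposable EnterUpgradeableReadLock()
    {
        _lock.EnterUpgradeableReadLock();
        return new Releaser(_lock.ExitUpgradeableReadLock);
    }

    public void Dispose() => _lock.Dispose();

    // Releases the lock when the using-scope ends.
    private sealed class Releaser : IDisposable
    {
        private Action? _release;

        public Releaser(Action release) => _release = release;

        public void Dispose()
        {
            // Swap the delegate out first so a second Dispose call is a no-op.
            Interlocked.Exchange(ref _release, null)?.Invoke();
        }
    }
}
```

A caller writes `using var _ = manager.EnterReadLock();` and the lock is released when the scope ends, which is the same shape as ReadOperation and WriteOperation in the accessor.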
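- Command pattern and transactions
Node updates are wrapped in UpdateNodeCommand. Its Execute method records the old values, sets the new component, and queues a NodeUpdatedEvent. Commands are placed in a transaction queue and committed together through ITransaction, which provides atomicity and the ability to roll back.
csharp
public void Execute(TransactionContextBase context)
{
    // Save the old data and symbol, set the new component, add the pending event
}
public void Undo(TransactionContextBase context)
{
    // Restore the old data and symbol
}
With this design, a batch rename either succeeds completely or is rolled back completely, which keeps the graph state consistent.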
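The all-or-nothing behavior can be illustrated with a small sketch. It assumes a plain dictionary as the node store and undoes already executed commands in reverse order when one command fails. ISketchCommand, RenameCommand, and SketchTransaction are hypothetical names, and the real ITransaction may handle rollback differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical command abstraction operating on a simple key -> name store.
public interface ISketchCommand
{
    void Execute(Dictionary<string, string> store);
    void Undo(Dictionary<string, string> store);
}

public sealed class RenameCommand : ISketchCommand
{
    private readonly string _key;
    private readonly string _newName;
    private string? _oldName;
    private bool _executed;

    public RenameCommand(string key, string newName) => (_key, _newName) = (key, newName);

    public void Execute(Dictionary<string, string> store)
    {
        // Remember the old value so the change can be reverted later.
        if (!store.TryGetValue(_key, out _oldName))
            throw new InvalidOperationException($"Node '{_key}' does not exist.");
        store[_key] = _newName;
        _executed = true;
    }

    public void Undo(Dictionary<string, string> store)
    {
        if (_executed && _oldName != null) store[_key] = _oldName;
    }
}

public sealed class SketchTransaction
{
    private readonly Dictionary<string, string> _store;
    private readonly List<ISketchCommand> _queue = new();

    public SketchTransaction(Dictionary<string, string> store) => _store = store;

    public void Enqueue(ISketchCommand command) => _queue.Add(command);

    public void Commit()
    {
        var done = new Stack<ISketchCommand>();
        try
        {
            foreach (var command in _queue)
            {
                command.Execute(_store);
                done.Push(command);
            }
        }
        catch
        {
            // Roll back in reverse order, then surface the original failure.
            while (done.Count > 0) done.Pop().Undo(_store);
            throw;
        }
        finally
        {
            _queue.Clear();
        }
    }
}
```

If the second of two queued renames targets a missing node, Commit throws and the first rename is reverted, so the store looks exactly as it did before the transaction.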
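3. Identifier Extraction and Conflict Detection
- Identifier handler interface
IIdentifierHandler<TKey, TNodeData> is the core abstraction through which the framework decides "what counts as an identifier". It defines:
IsUniqueGlobally: whether the identifier must be globally unique (such as a primary key) or is only a symbol (such as a display name).
ExtractValues: extracts identifier strings from the node data and flags whether each one is the main identifier.
UpdateValue: writes the new identifier back to the node object after a conflict has been resolved.
The framework ships with handlers for Key, Symbol, and additional symbols, and developers can implement custom handlers to adapt any business field.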
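A custom handler for a business field might look roughly like the sketch below. The interface here is a simplified, hypothetical version with the same three responsibilities. The real IIdentifierHandler<TKey, TNodeData> takes additional arguments (node key, location) that are left out, and the Employee type and BadgeHandler are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

// Invented business type used only for this illustration.
public sealed class Employee
{
    public string BadgeNo { get; set; } = "";
    public string DisplayName { get; set; } = "";
}

// Simplified, hypothetical handler contract (not the framework's actual signature).
public interface ISketchIdentifierHandler<TNode>
{
    bool IsUniqueGlobally { get; }
    IEnumerable<(string Value, bool IsMain)> ExtractValues(TNode node);
    TNode UpdateValue(TNode node, string oldValue, string newValue);
}

public sealed class BadgeHandler : ISketchIdentifierHandler<Employee>
{
    // Badge numbers act like primary keys, so they must be unique across all chains.
    public bool IsUniqueGlobally => true;

    public IEnumerable<(string Value, bool IsMain)> ExtractValues(Employee node)
    {
        // Nodes without a badge number contribute no identifier.
        if (!string.IsNullOrEmpty(node.BadgeNo))
            yield return (node.BadgeNo, true);
    }

    public Employee UpdateValue(Employee node, string oldValue, string newValue)
    {
        // Only touch the field if it still carries the conflicting value.
        if (node.BadgeNo == oldValue) node.BadgeNo = newValue;
        return node;
    }
}
```

In the engine, UpdateValue is applied to a cloned node rather than the original, so mutating the argument as done here does not affect the source object.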
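- Extractor and snapshots
IdentifierExtractor<TKey, TNodeData, TEdgeData> scans the given list of node locations and:
Builds a global value-count map for each handler, counting how many times each identifier value occurs across all nodes in the graph.
Produces a snapshot (HandlerSnapshot) for each handler. The snapshot contains the groups of values that occur more than once among the given locations, and for each value the list of node locations together with a flag indicating whether it is the "main identifier".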
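The two outputs described above, a count map and the groups of repeated values, can be sketched for a single handler as follows. DuplicateScan is a hypothetical helper working on plain (node key, value) pairs, and the ordinal comparison and value ordering mirror what the source code does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that mimics the extractor for one handler.
public static class DuplicateScan
{
    // Counts how often each identifier value occurs.
    public static Dictionary<string, int> CountValues(IEnumerable<(string NodeKey, string Value)> nodes)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        foreach (var (_, value) in nodes)
            counts[value] = counts.TryGetValue(value, out var n) ? n + 1 : 1;
        return counts;
    }

    // Returns only the values shared by more than one node, ordered by value.
    public static List<(string Value, List<string> NodeKeys)> FindDuplicateGroups(
        IEnumerable<(string NodeKey, string Value)> nodes) =>
        nodes.GroupBy(n => n.Value, n => n.NodeKey, StringComparer.Ordinal)
             .Where(g => g.Count() > 1)
             .OrderBy(g => g.Key, StringComparer.Ordinal)
             .Select(g => (Value: g.Key, NodeKeys: g.ToList()))
             .ToList();
}
```

For three nodes with values "A", "B", "A", the count map contains A = 2 and B = 1, and the only duplicate group is "A" with the two node keys in their original order.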
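4. Rename Planning and Execution
- Planner
NormalizationPlanner uses the global counts and the INormalizationConfig configuration to produce a rename plan for each duplicate group:
Preservation: when PreserveFirstOccurrence is true, the first node in the group that holds the value as its main identifier is kept unchanged (the first node in the group if none is flagged as main), and new names are generated for the remaining nodes.
Cross-chain references: when the configuration allows cross-chain references, CrossChainStrategy decides how a name reference that spans chains is resolved (always the first, always the last, prefer the same chain, or a specified chain).
Unique ID generation: IUniqueIdGenerator produces a new name from a pattern (such as "{0}_chain{1}_dup{2}") and makes sure it does not collide with an existing name.
The plan is stored in RenameInfo, which holds the new-name list, the location list, the main-identifier flags, and the preserved index. The original value is the key under which the RenameInfo is stored in the rename map.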
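The interplay between preservation and unique-name generation can be sketched like this. RenamePlanSketch is a hypothetical helper. It preserves the first entry of the group rather than the first main identifier, and it throws a plain InvalidOperationException when no free name is found, both of which are simplifications.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified version of the planning step for one duplicate group.
public static class RenamePlanSketch
{
    public static string GenerateUnique(
        string baseId, int chainIndex, Func<string, bool> isTaken,
        string pattern = "{0}_chain{1}_dup{2}", int maxAttempts = 100)
    {
        // Try increasing suffixes until a free name is found.
        for (int i = 1; i <= maxAttempts; i++)
        {
            string candidate = string.Format(pattern, baseId, chainIndex, i);
            if (!isTaken(candidate)) return candidate;
        }
        throw new InvalidOperationException(
            $"No unique name found for '{baseId}' after {maxAttempts} attempts.");
    }

    // chainIndexes holds the chain index of every node that shares 'value'.
    public static List<string> Plan(
        string value, IReadOnlyList<int> chainIndexes, ISet<string> takenNames, bool preserveFirst)
    {
        var newNames = new List<string>();
        for (int i = 0; i < chainIndexes.Count; i++)
        {
            if (preserveFirst && i == 0)
            {
                newNames.Add(value); // the preserved node keeps its name
                continue;
            }
            string name = GenerateUnique(value, chainIndexes[i], takenNames.Contains);
            takenNames.Add(name);    // reserve the name for later iterations
            newNames.Add(name);
        }
        return newNames;
    }
}
```

For the value "A" appearing in chains 0, 1, and 1 with preservation enabled, the plan is "A", "A_chain1_dup1", "A_chain1_dup2": the second node in chain 1 skips the suffix already reserved by the first.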
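- Execution engine
The ProcessCore method of NormalizationEngine drives the overall flow:
csharp
private NormalizationResult<TKey, TNodeData> ProcessCore(IList<NodeLocation<TKey>> locations, bool applyRenames)
{
    var configSnapshot = GetConfigSnapshot();
    // 1. Extract identifiers and global counts
    var (globalValueCounts, snapshots) = ExtractIdentifiers(locations);
    // 2. Plan the renames
    PlanRenames(snapshots, globalValueCounts, configSnapshot);
    // 3. When applying, collect the node updates and execute them (including reference updates)
    if (!applyRenames || CollectNodeUpdates(snapshots) is not { Count: > 0 } updates) return BuildResult(snapshots);
    ApplyRenames(updates, snapshots, locations, configSnapshot);
    return BuildResult(snapshots);
}
During the apply phase, the engine:
Clones the original node data (a deep copy through IObjectCloner) so that the original object is not modified.
Calls UpdateValue on each identifier handler to update the fields of the clone.
If reference updating is enabled, uses IReferenceFieldUpdater and IReferenceResolver to walk the reference fields inside each node and replace old identifiers with the new values.
Commits all node update commands as one batch through a transaction.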
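The copy-then-update steps above can be sketched with immutable records. StepNode and ApplySketch are hypothetical. The `with` expression stands in for the deep copy done by IObjectCloner, and the resolveReference delegate plays the role of the reference resolver, receiving the old name and the chain index of the referencing node.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Invented node shape: a name plus references to other nodes by name.
public sealed record StepNode(string Key, int ChainIndex, string Name, IReadOnlyList<string> DependsOn);

public static class ApplySketch
{
    public static List<StepNode> Apply(
        IEnumerable<StepNode> nodes,
        IReadOnlyDictionary<string, string> newNameByKey,
        Func<string, int, string> resolveReference)
    {
        var result = new List<StepNode>();
        foreach (var node in nodes)
        {
            // Rename the node itself if the plan contains an entry for it.
            string name = newNameByKey.TryGetValue(node.Key, out var renamed) ? renamed : node.Name;

            // Rewrite every reference using the chain of the referencing node.
            var references = node.DependsOn
                .Select(r => resolveReference(r, node.ChainIndex))
                .ToList();

            // 'with' creates a copy, so the input node stays untouched.
            result.Add(node with { Name = name, DependsOn = references });
        }
        return result;
    }
}
```

With a same-chain resolver, a node in chain 1 that referenced "A" ends up pointing at the renamed "A_chain1_dup1", while the original node list still contains the old reference.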
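5. Topology Transformation and Connection Deduplication
Besides normalizing identifier fields, the framework also offers a deduplication mechanism based on graph topology.
- Topology engine
TopologyEngine<TKey, TNodeData, TEdgeData> takes a set of chain descriptions (ChainInfo) and connection transformers (IConnectionTransformer), builds a temporary graph, and applies the transformers in order. Each transformer can modify or remove edges in the graph, and the engine finally returns the updated chains together with a change record.
- Deduplication transformer
DeduplicationTransformer is a concrete connection transformer. It uses the identifier handlers to find nodes that share an identifier value and builds a redirect map that points each duplicate node at the first, preserved node. When transforming an edge, it replaces the source or target with the redirected node and removes the edge if the replacement produces a self-loop. Redundant nodes can optionally be deleted as well.
This topology-based approach suits scenarios where node data does not need to change and only the relationships between nodes have to be merged.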
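The redirect map and the edge rewrite can be sketched as follows. EdgeDedupSketch is a hypothetical helper on plain string keys. It keeps the first node seen for each identifier and drops self-loops, as described above, and it makes no attempt to merge parallel edges.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified version of the deduplication transformer.
public static class EdgeDedupSketch
{
    // Maps every duplicate node to the first node that carried the same identifier.
    public static Dictionary<string, string> BuildRedirects(
        IEnumerable<(string NodeKey, string Identifier)> nodes)
    {
        var keeperByIdentifier = new Dictionary<string, string>(StringComparer.Ordinal);
        var redirects = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var (nodeKey, identifier) in nodes)
        {
            if (keeperByIdentifier.TryGetValue(identifier, out var keeper))
                redirects[nodeKey] = keeper;              // duplicate: point at the keeper
            else
                keeperByIdentifier[identifier] = nodeKey; // first occurrence: becomes the keeper
        }
        return redirects;
    }

    public static List<(string Source, string Target)> RewriteEdges(
        IEnumerable<(string Source, string Target)> edges,
        IReadOnlyDictionary<string, string> redirects)
    {
        var result = new List<(string Source, string Target)>();
        foreach (var (source, target) in edges)
        {
            string s = redirects.TryGetValue(source, out var rs) ? rs : source;
            string t = redirects.TryGetValue(target, out var rt) ? rt : target;
            if (s == t) continue; // redirecting produced a self-loop, so drop the edge
            result.Add((s, t));
        }
        return result;
    }
}
```

With nodes n1 and n3 sharing identifier "A", n3 is redirected to n1. The edges n1→n2, n2→n3, n3→n1 then become n1→n2 and n2→n1, and the third edge disappears because it collapses into n1→n1.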
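6. Advanced Smart Deduplication Engine
SmartDeduplicationEngine in the TopologyCore namespace makes deduplication more flexible and more rule-driven.
- Node context modeling
NodeDedupeContext gives each node a rich runtime context:
The chain it belongs to (ChainId, ChainIndex, NodeIndex)
References to the previous and next node (the adjacency within the chain)
Lists of nodes referenced by incoming and outgoing edges
A custom property dictionary (for use in strategy expressions)
ChainTopologyAnalyzer builds these contexts and links them together, which includes extracting reference dependencies between nodes through INodeHandler.
- Strategy-driven priority evaluation
Through DedupeConfiguration, users can configure several DedupeStrategy entries. Each strategy contains:
Order: the execution order.
Condition: a Boolean expression that decides whether the strategy applies to the current node.
PriorityExpression: a numeric expression that computes the node's priority score under this strategy.
BreakOnMatch: whether to stop evaluating later strategies after a match.
StrategyEvaluator uses DynamicExpresso as the expression engine. It injects the properties of NodeDedupeContext as variables and computes the total priority of each node dynamically. This lets business users write priority expressions such as chainIndex == 0 ? 100 : 0 and obtain highly customized preservation logic.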
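The evaluation loop can be sketched without the expression engine by using lambdas in place of expression strings. NodeContext, Strategy, and PrioritySketch are hypothetical simplifications, and summing the scores of all matching strategies is an assumption about how the "total priority" is formed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced node context (the real NodeDedupeContext carries more data).
public sealed record NodeContext(string ChainId, int ChainIndex, int NodeIndex);

// Lambdas replace the Condition and PriorityExpression strings of the real strategy.
public sealed record Strategy(
    int Order,
    Func<NodeContext, bool> Condition,
    Func<NodeContext, double> Priority,
    bool BreakOnMatch);

public static class PrioritySketch
{
    public static double Evaluate(NodeContext context, IEnumerable<Strategy> strategies)
    {
        double total = 0;
        foreach (var strategy in strategies.OrderBy(s => s.Order))
        {
            if (!strategy.Condition(context)) continue; // strategy does not apply
            total += strategy.Priority(context);        // assumption: scores are summed
            if (strategy.BreakOnMatch) break;           // skip all later strategies
        }
        return total;
    }
}
```

A strategy written as `new Strategy(1, c => true, c => c.ChainIndex == 0 ? 100 : 0, false)` corresponds to the expression chainIndex == 0 ? 100 : 0 from the text. Setting BreakOnMatch on it would prevent any later strategy from adding to the score.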
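- Conflict resolution decisions
DecisionResolver picks the node to keep according to the configured ConflictResolution enum value (for example highest priority, lowest chain index, or first occurrence) and generates unique new names for the other nodes. The name generator INameGenerator supports template-based naming and handles illegal characters automatically.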
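Choosing the node to keep might be sketched as below. KeepRule, Candidate, and DecisionSketch are hypothetical, and the enum members only mirror the three examples named in the text, not the actual ConflictResolution values. Because LINQ's ordering is stable, ties fall to the earlier candidate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical counterpart of the ConflictResolution options mentioned in the text.
public enum KeepRule { HighestPriority, LowestChainIndex, FirstOccurrence }

public sealed record Candidate(string Key, int ChainIndex, double Priority);

public static class DecisionSketch
{
    public static Candidate PickKeeper(IReadOnlyList<Candidate> group, KeepRule rule)
    {
        if (group.Count == 0)
            throw new ArgumentException("Group must not be empty.", nameof(group));

        return rule switch
        {
            // Stable ordering: among equal scores the earlier candidate wins.
            KeepRule.HighestPriority => group.OrderByDescending(c => c.Priority).First(),
            KeepRule.LowestChainIndex => group.OrderBy(c => c.ChainIndex).First(),
            _ => group[0] // FirstOccurrence
        };
    }
}
```

Every candidate other than the returned keeper would then receive a generated name, in the same way as in the rename planning step.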
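- Integration with the underlying normalization engine
SmartDeduplicationEngine is in fact a high-level wrapper around the underlying NormalizationEngine. It converts the chain data structure into an ICoreGraph, constructs the matching identifier handlers and reference updaters, drives the normalization flow, and finally writes the result back to the chain model. This layering lets the advanced engine reuse the concurrency, transaction, and renaming infrastructure of the lower layer.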
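7. Extensibility
The framework exposes a number of extension points through interfaces and dependency injection:

| Extension point | Purpose |
| --- | --- |
| IIdentifierHandler | Define custom identifier fields to deduplicate |
| IUniqueIdGenerator | Replace the default {0}_chain{1}_dup{2} naming pattern |
| IObjectCloner | Supply the deep-copy implementation and avoid shared-state pollution |
| IReferenceResolver | Control how names are mapped when resolving cross-chain references |
| IReferenceFieldUpdater | Update the reference fields inside a node object |
| INormalizationContext | Pass custom services and data through the flow |
| IExpressionEvaluator | Swap the expression engine (DynamicExpresso by default) |
| IConnectionTransformer | Implement custom graph topology transformations |
| INodeHandler<TNode, TChain> | Adapt a chain-based business model to the framework's standard operations (get/set the node list, clone a chain, and so on) |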
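8. Summary
The code analyzed in this article implements a graph data normalization and deduplication solution aimed at enterprise use. Its main strengths are:
Thread safety and transactional consistency: read/write locks combined with the command pattern keep concurrent modifications correct.
Flexible identifier definition: the handler abstraction adapts to any data field.
Rename planning: supports preserving the first occurrence, resolving cross-chain references, and generating unique names.
Automatic reference repair: prevents the "dangling references" that renaming would otherwise cause.
Topology-level deduplication: merges redundant nodes by redirecting edges, which simplifies the graph structure.
Strategy- and expression-driven rules: in the advanced engine, business rules can be configured directly as dynamic expressions, which reduces the cost of custom development.
The framework fits graph applications that need to integrate several data sources, handle duplicate identifiers, and preserve referential integrity, such as knowledge graph construction, business process chain normalization, and user merging in social networks. Developers only need to implement a small number of adapter interfaces to plug in their business model and use the deduplication capability.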
csharp
#region Normalization
namespace GraphFoundation.Normalization.Bridge
{
using Common;
using GraphFoundation.Core;
using GraphFoundation.Core.Implementation;
using System;
using System.Collections.Generic;
public class DefaultGraphAccessor<TKey, TNodeData, TEdgeData> : IGraphAccessor<TKey, TNodeData, TEdgeData>
where TKey : notnull where TNodeData : class
{
private readonly ICoreGraph<TKey, TNodeData, TEdgeData> _graph;
private readonly ILockManager _lockManager;
public DefaultGraphAccessor(ICoreGraph<TKey, TNodeData, TEdgeData> graph, ILockManager? lockManager = null)
{
_graph = graph ?? throw new ArgumentNullException(nameof(graph));
_lockManager = lockManager ?? new ReaderWriterLockManager();
}
public IDisposable EnterReadLock() => _lockManager.EnterReadLock();
public IDisposable EnterWriteLock() => _lockManager.EnterWriteLock();
public IDisposable EnterUpgradeableReadLock() => _lockManager.EnterUpgradeableReadLock();
private T ReadOperation<T>(Func<T> operation) { using var _ = EnterReadLock(); return operation(); }
private void WriteOperation(Action operation) { using var _ = EnterWriteLock(); operation(); }
public TNodeData? GetNodeData(TKey key) => ReadOperation(() => _graph.GetNodeData(key));
public string? GetNodeSymbol(TKey key) => ReadOperation(() => _graph.GetNodeSymbol(key));
public bool ContainsNode(TKey key) => ReadOperation(() => _graph.ContainsNode(key));
public IReadOnlyList<TKey> GetAllNodeKeys() => ReadOperation(_graph.GetAllNodeKeys);
public TNodeData? GetNodeDataObject(TKey key) => GetNodeData(key);
public void UpdateNode(TKey key, TNodeData newData, string? newSymbol) => WriteOperation(() => _graph.UpdateNode(key, newData, newSymbol));
public ITransaction BeginTransaction() => _graph.BeginTransaction();
public IContext Context => _graph.Context;
}
public class DefaultCommandFactory<TKey, TNodeData, TEdgeData> : ICommandFactory<TKey, TNodeData, TEdgeData>
where TKey : notnull where TNodeData : class
{
public ICommand CreateUpdateNodeCommand(TKey key, TNodeData newData, string? newSymbol) => new UpdateNodeCommand(key, newData, newSymbol);
private class UpdateNodeCommand : ICommand
{
private readonly TKey _key;
private readonly TNodeData _newData;
private readonly string? _newSymbol;
private TNodeData? _oldData;
private string? _oldSymbol;
private bool _executed;
public UpdateNodeCommand(TKey key, TNodeData newData, string? newSymbol) => (_key, _newData, _newSymbol) = (key, newData, newSymbol);
public void Execute(TransactionContextBase context)
{
var typedCtx = context as TransactionContext<TKey, TNodeData, TEdgeData>
?? throw new InvalidOperationException($"Unsupported context type: {context?.GetType().Name ?? "null"}");
var entity = typedCtx.Core.GetNodeEntityInternal(_key);
if (!entity.IsValid) throw new InvalidOperationException($"Node entity for key '{_key}' is invalid.");
var oldComp = typedCtx.Core.EntityManager.GetComponent<NodeComponent<TKey, TNodeData>>(entity);
(_oldData, _oldSymbol) = (oldComp.Data, oldComp.Symbol);
typedCtx.Core.EntityManager.SetComponent(entity, new NodeComponent<TKey, TNodeData>(_key, _newData, _newSymbol));
typedCtx.PendingEvents.Add(new NodeUpdatedEvent<TKey>(_key, _oldData, _newData, _oldSymbol, _newSymbol));
_executed = true;
}
public void Undo(TransactionContextBase context)
{
if (!_executed || _oldData == null) return;
var typedCtx = context as TransactionContext<TKey, TNodeData, TEdgeData>
?? throw new InvalidOperationException($"Unsupported context type: {context?.GetType().Name ?? "null"}");
var entity = typedCtx.Core.GetNodeEntityInternal(_key);
if (entity.IsValid)
typedCtx.Core.EntityManager.SetComponent(entity, new NodeComponent<TKey, TNodeData>(_key, _oldData, _oldSymbol));
}
}
}
}
namespace GraphFoundation.Normalization
{
using Force.DeepCloner;
using GraphFoundation.Core;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
public class FullRenameContext
{
public IReadOnlyDictionary<string, RenameInfo> KeyMap { get; set; } = new Dictionary<string, RenameInfo>();
public IReadOnlyDictionary<string, RenameInfo> SymbolMap { get; set; } = new Dictionary<string, RenameInfo>();
public IReadOnlyDictionary<IIdentifierHandler<object, object>, IReadOnlyDictionary<string, RenameInfo>> CustomMaps { get; set; } = new Dictionary<IIdentifierHandler<object, object>, IReadOnlyDictionary<string, RenameInfo>>();
public string Resolve(string oldValue, int currentChainIndex, INormalizationConfig config, IReadOnlyDictionary<string, RenameInfo>? additionalMap = null)
{
if (KeyMap.TryGetValue(oldValue, out var keyInfo)) return ResolveByRenameInfo(keyInfo, currentChainIndex, config);
if (SymbolMap.TryGetValue(oldValue, out var symInfo)) return ResolveByRenameInfo(symInfo, currentChainIndex, config);
if (additionalMap?.TryGetValue(oldValue, out var addInfo) == true) return ResolveByRenameInfo(addInfo, currentChainIndex, config);
return oldValue;
}
private static string ResolveByRenameInfo(RenameInfo info, int currentChainIndex, INormalizationConfig config)
{
if (info.NewNames.Count == 0) return info.Locations.FirstOrDefault()?.Key?.ToString() ?? string.Empty;
if (!config.HandleCrossChainReferences)
return info.NewNames[info.PreservedIndex >= 0 && info.PreservedIndex < info.NewNames.Count ? info.PreservedIndex : 0];
return config.CrossChainStrategy switch
{
CrossChainReferenceStrategy.AlwaysFirst => info.NewNames[0],
CrossChainReferenceStrategy.AlwaysLast => info.NewNames[^1],
CrossChainReferenceStrategy.PreferSameChain => FindNameByChainIndex(info, currentChainIndex),
CrossChainReferenceStrategy.UseSpecifiedChain => FindNameByTargetChain(info, config),
_ => info.NewNames[0]
};
}
private static string FindNameByChainIndex(RenameInfo info, int chainIdx)
{
var chainIndexMap = new Dictionary<int, string>();
for (int i = 0; i < info.Locations.Count && i < info.NewNames.Count; i++)
chainIndexMap[info.Locations[i].ChainIndex] = info.NewNames[i];
return chainIndexMap.TryGetValue(chainIdx, out var name) ? name : (info.NewNames.Count > 0 ? info.NewNames[0] : string.Empty);
}
private static string FindNameByTargetChain(RenameInfo info, INormalizationConfig config) =>
config.CrossChainTargetChainIndex is int target && info.Locations.Select((loc, i) => (loc, i)).FirstOrDefault(x => x.loc.ChainIndex == target) is var match && match.i < info.NewNames.Count
? info.NewNames[match.i]
: (info.NewNames.Count > 0 ? info.NewNames[0] : string.Empty);
}
public interface IUniqueIdGenerator
{
string Generate(string baseId, object context, Func<string, bool> isNameTaken);
void Update(string pattern, int maxAttempts);
}
public class DefaultUniqueIdGenerator : IUniqueIdGenerator
{
private readonly object _syncLock = new();
private string _pattern;
private int _maxAttempts;
public DefaultUniqueIdGenerator(string pattern = "{0}_chain{1}_dup{2}", int maxAttempts = 100)
{
if (!pattern.Contains("{0}") || !pattern.Contains("{1}") || !pattern.Contains("{2}"))
throw new ArgumentException("Pattern must contain {0}, {1}, {2} placeholders.");
(_pattern, _maxAttempts) = (pattern, maxAttempts);
}
public void Update(string pattern, int maxAttempts) { lock (_syncLock) { (_pattern, _maxAttempts) = (pattern, maxAttempts); } }
public string Generate(string baseId, object context, Func<string, bool> isNameTaken)
{
int chainIndex = context is int idx ? idx : 0;
string pattern; int maxAttempts;
lock (_syncLock) { (pattern, maxAttempts) = (_pattern, _maxAttempts); }
return GenerateInternal(baseId, chainIndex, isNameTaken, pattern, maxAttempts, "id");
}
public string Generate(string baseId, int chainIndex, Func<string, bool> isNameTaken, INormalizationConfig config) =>
GenerateInternal(baseId, chainIndex, isNameTaken, config.RenamePattern, config.MaxDuplicates, "name");
private string GenerateInternal(string baseId, int chainIndex, Func<string, bool> isNameTaken, string pattern, int maxAttempts, string entityType)
{
for (int i = 1; i <= maxAttempts; i++)
{
string name = string.Format(pattern, baseId, chainIndex, i);
if (!isNameTaken(name)) return name;
}
ExceptionHelper.ThrowIfUniqueIdGenerationFailed(baseId, maxAttempts, entityType);
return string.Empty;
}
}
internal static class RenameMapHelper
{
public static IReadOnlyDictionary<string, RenameInfo> BuildRenameMap<TKey, TNodeData>(
IEnumerable<HandlerSnapshot<TKey, TNodeData>> snapshots,
Func<IIdentifierHandler<TKey, TNodeData>, bool> predicate)
where TKey : notnull where TNodeData : class =>
snapshots.Where(s => predicate(s.Handler))
.SelectMany(s => s.RenameMap)
.GroupBy(kv => kv.Key, kv => kv.Value)
.ToDictionary(g => g.Key, g => g.First(), StringComparer.Ordinal);
}
public interface IReferenceResolver<TKey, TNodeData> where TKey : notnull where TNodeData : class
{
object Resolve(object originalValue, TKey contextNodeKey, int currentChainIndex, FullRenameContext renameContext, INormalizationConfig config);
}
public class DelegateReferenceResolver<TKey, TNodeData> : IReferenceResolver<TKey, TNodeData>
where TKey : notnull where TNodeData : class
{
private readonly Func<object, TKey, int, FullRenameContext, INormalizationConfig, object> _resolveFunc;
public DelegateReferenceResolver(Func<object, TKey, int, FullRenameContext, INormalizationConfig, object> resolveFunc) => _resolveFunc = resolveFunc ?? throw new ArgumentNullException(nameof(resolveFunc));
public object Resolve(object originalValue, TKey contextNodeKey, int currentChainIndex, FullRenameContext renameContext, INormalizationConfig config) =>
_resolveFunc(originalValue, contextNodeKey, currentChainIndex, renameContext, config);
}
public interface IReferenceFieldUpdater<TNodeData> { void UpdateReferences(TNodeData data, Func<string?, string?> resolve); }
public interface INormalizationEngine<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class
{
NormalizationResult<TKey, TNodeData> Normalize(IList<NodeLocation<TKey>> locations);
NormalizationResult<TKey, TNodeData> Analyze(IList<NodeLocation<TKey>> locations);
void UpdateConfig(INormalizationConfig newConfig);
}
public interface IGraphAccessor<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class
{
IDisposable EnterReadLock();
IDisposable EnterWriteLock();
IDisposable EnterUpgradeableReadLock();
TNodeData? GetNodeData(TKey key);
string? GetNodeSymbol(TKey key);
bool ContainsNode(TKey key);
IReadOnlyList<TKey> GetAllNodeKeys();
TNodeData? GetNodeDataObject(TKey key);
void UpdateNode(TKey key, TNodeData newData, string? newSymbol);
ITransaction BeginTransaction();
IContext Context { get; }
}
public interface ICommandFactory<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class
{
ICommand CreateUpdateNodeCommand(TKey key, TNodeData newData, string? newSymbol);
}
public interface IObjectCloner<T> where T : class { T Clone(T source); }
public class DeepClonerAdapter<T> : IObjectCloner<T> where T : class { public T Clone(T source) => source.DeepClone(); }
public interface INormalizationContext
{
T GetService<T>() where T : class;
void SetService<T>(T service) where T : class;
T GetOrAddService<T>(Func<T> factory) where T : class;
bool TryGetData<T>(out T value, string? key = null);
void SetData<T>(T value, string? key = null);
}
public class DefaultNormalizationContext : INormalizationContext
{
private readonly ConcurrentDictionary<Type, object> _services = new();
private readonly ConcurrentDictionary<string, object> _data = new();
public T GetService<T>() where T : class => (T)_services[typeof(T)];
public void SetService<T>(T service) where T : class => _services[typeof(T)] = service;
public T GetOrAddService<T>(Func<T> factory) where T : class => (T)_services.GetOrAdd(typeof(T), _ => factory());
public bool TryGetData<T>(out T value, string? key = null)
{
key ??= typeof(T).FullName;
if (_data.TryGetValue(key, out var obj) && obj is T t) { value = t; return true; }
value = default!;
return false;
}
public void SetData<T>(T value, string? key = null) => _data[key ?? typeof(T).FullName] = value!;
}
public interface IGraphModelAdapter<TModel, TKey, TNodeData, TEdgeData>
where TKey : notnull where TNodeData : class
{
(ICoreGraph<TKey, TNodeData, TEdgeData> Graph, IReadOnlyList<NodeLocation<TKey>> Locations) ConvertToGraph(IEnumerable<TModel> models);
TModel ConvertFromGraph(ICoreGraph<TKey, TNodeData, TEdgeData> graph, NodeLocation<TKey> rootLocation, object? context);
}
public class LinearChainAdapter<TNodeData, TChain> : IGraphModelAdapter<TChain, string, TNodeData, object>
where TNodeData : class where TChain : class
{
private readonly Func<TChain, IList<TNodeData?>> _getNodes;
private readonly Action<TChain, IList<TNodeData?>> _setNodes;
private readonly Func<TChain, TNodeData> _getStartNode;
private readonly Action<TChain, TNodeData> _setStartNode;
private readonly Func<TChain, TNodeData> _getEndNode;
private readonly Action<TChain, TNodeData> _setEndNode;
private readonly Func<TChain, TChain> _cloneChain;
private readonly IObjectCloner<TNodeData> _nodeCloner;
private readonly Func<TChain, int, int, string> _keyGenerator;
private readonly Dictionary<int, List<NodeLocation<string>>> _chainLocations = new();
private readonly Dictionary<int, TChain> _originalChains = new();
private readonly Dictionary<string, TNodeData> _originalNodeByKey = new();
public LinearChainAdapter(
Func<TChain, IList<TNodeData?>> getNodes, Action<TChain, IList<TNodeData?>> setNodes,
Func<TChain, TNodeData> getStartNode, Action<TChain, TNodeData> setStartNode,
Func<TChain, TNodeData> getEndNode, Action<TChain, TNodeData> setEndNode,
Func<TChain, TChain> cloneChain, IObjectCloner<TNodeData>? nodeCloner = null,
Func<TChain, int, int, string>? keyGenerator = null)
{
_getNodes = getNodes ?? throw new ArgumentNullException(nameof(getNodes));
_setNodes = setNodes ?? throw new ArgumentNullException(nameof(setNodes));
_getStartNode = getStartNode ?? throw new ArgumentNullException(nameof(getStartNode));
_setStartNode = setStartNode ?? throw new ArgumentNullException(nameof(setStartNode));
_getEndNode = getEndNode ?? throw new ArgumentNullException(nameof(getEndNode));
_setEndNode = setEndNode ?? throw new ArgumentNullException(nameof(setEndNode));
_cloneChain = cloneChain ?? throw new ArgumentNullException(nameof(cloneChain));
_nodeCloner = nodeCloner ?? new DeepClonerAdapter<TNodeData>();
_keyGenerator = keyGenerator ?? ((chain, chainIdx, nodeIdx) => $"chain_{chainIdx}_node_{nodeIdx}");
}
public (ICoreGraph<string, TNodeData, object> Graph, IReadOnlyList<NodeLocation<string>> Locations) ConvertToGraph(IEnumerable<TChain> models)
{
_chainLocations.Clear(); _originalChains.Clear(); _originalNodeByKey.Clear();
var graph = new CoreGraph<string, TNodeData, object>(new ConcreteContext());
var locations = new List<NodeLocation<string>>();
int chainIndex = 0;
foreach (var chain in models)
{
if (chain == null) continue;
var nodes = _getNodes(chain);
if (nodes == null) continue;
var chainLocationList = new List<NodeLocation<string>>();
for (int i = 0; i < nodes.Count; i++)
{
var node = nodes[i];
if (node == null) continue;
var key = _keyGenerator(chain, chainIndex, i);
graph.AddNode(key, node, null);
var loc = new NodeLocation<string>(key, chainIndex, i, node);
locations.Add(loc);
chainLocationList.Add(loc);
_originalNodeByKey[key] = node;
}
_chainLocations[chainIndex] = chainLocationList;
_originalChains[chainIndex] = chain;
chainIndex++;
}
return (graph, locations);
}
public TChain ConvertFromGraph(ICoreGraph<string, TNodeData, object> graph, NodeLocation<string> rootLocation, object? context)
{
int chainIdx = rootLocation.ChainIndex;
if (!_originalChains.TryGetValue(chainIdx, out var originalChain))
throw new InvalidOperationException($"No original chain found for index {chainIdx}.");
var chain = _cloneChain(originalChain);
if (!_chainLocations.TryGetValue(chainIdx, out var locations) || locations == null || locations.Count == 0)
{
_setNodes(chain, new List<TNodeData?>());
return chain;
}
var chainNodes = locations.OrderBy(l => l.NodeIndex).Select(loc => GetClonedNodeData(graph.GetNodeData(loc.Key), loc.Key)).ToList();
_setNodes(chain, chainNodes);
if (chainNodes.Count > 0)
{
var firstNode = chainNodes.FirstOrDefault(n => n != null);
var lastNode = chainNodes.LastOrDefault(n => n != null);
if (firstNode != null) _setStartNode(chain, firstNode);
if (lastNode != null) _setEndNode(chain, lastNode);
}
return chain;
}
private TNodeData? GetClonedNodeData(TNodeData? nodeData, string key) =>
nodeData != null ? _nodeCloner.Clone(nodeData) : _originalNodeByKey.TryGetValue(key, out var originalNode) ? _nodeCloner.Clone(originalNode) : null;
}
}
namespace GraphFoundation.Normalization.Processing
{
using GraphFoundation.Normalization;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
internal class IdentifierExtractor<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class
{
private readonly IGraphAccessor<TKey, TNodeData, TEdgeData> _accessor;
private readonly IEnumerable<IIdentifierHandler<TKey, TNodeData>> _handlers;
private ConcurrentDictionary<TKey, TNodeData>? _cachedAllNodeData;
public IdentifierExtractor(IGraphAccessor<TKey, TNodeData, TEdgeData> accessor, IEnumerable<IIdentifierHandler<TKey, TNodeData>> handlers) =>
(_accessor, _handlers) = (accessor ?? throw new ArgumentNullException(nameof(accessor)), handlers ?? throw new ArgumentNullException(nameof(handlers)));
public Dictionary<IIdentifierHandler<TKey, TNodeData>, Dictionary<string, int>> BuildGlobalValueCounts()
{
_cachedAllNodeData = new ConcurrentDictionary<TKey, TNodeData>();
var result = _handlers.ToDictionary(h => h, _ => new Dictionary<string, int>(StringComparer.Ordinal));
foreach (var key in _accessor.GetAllNodeKeys())
{
var data = _accessor.GetNodeDataObject(key);
if (data == null) continue;
_cachedAllNodeData[key] = data;
var loc = new NodeLocation<TKey>(key);
foreach (var handler in _handlers)
foreach (var (value, _, _) in handler.ExtractValues(key, data, loc))
{
var counts = result[handler];
counts[value] = counts.TryGetValue(value, out var cnt) ? cnt + 1 : 1;
}
}
return result;
}
public List<HandlerSnapshot<TKey, TNodeData>> BuildHandlerSnapshots(IList<NodeLocation<TKey>> locations)
{
var nodeDataCache = _cachedAllNodeData ?? BuildNodeDataCache(locations);
var handlerValueMaps = _handlers.ToDictionary(h => h, _ => new Dictionary<string, List<(NodeLocation<TKey>, bool)>>(StringComparer.Ordinal));
foreach (var loc in locations.Where(loc => _accessor.ContainsNode(loc.Key)))
{
if (!nodeDataCache.TryGetValue(loc.Key, out var data) && (data = _accessor.GetNodeDataObject(loc.Key)) != null)
nodeDataCache[loc.Key] = data;
if (data == null) continue;
foreach (var handler in _handlers)
{
var valueMap = handlerValueMaps[handler];
foreach (var (value, _, isMain) in handler.ExtractValues(loc.Key, data, loc))
{
if (!valueMap.TryGetValue(value, out var list))
valueMap[value] = list = new List<(NodeLocation<TKey>, bool)>();
list.Add((loc, isMain));
}
}
}
return _handlers.Select(handler => new HandlerSnapshot<TKey, TNodeData>(handler,
handlerValueMaps[handler].Where(kv => kv.Value.Count > 1).Select(kv => new OriginalValueGroup<TKey>(kv.Key, kv.Value))
.OrderBy(g => g.Value, StringComparer.Ordinal).ToList())).ToList();
}
private ConcurrentDictionary<TKey, TNodeData> BuildNodeDataCache(IList<NodeLocation<TKey>> locations)
{
var cache = new ConcurrentDictionary<TKey, TNodeData>();
foreach (var key in locations.Select(loc => loc.Key).Distinct())
if (_accessor.ContainsNode(key) && _accessor.GetNodeDataObject(key) is TNodeData data)
cache[key] = data;
return cache;
}
public void ClearCache() => _cachedAllNodeData = null;
}
internal class NormalizationPlanner<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class
{
private readonly IUniqueIdGenerator _idGenerator;
public NormalizationPlanner(IUniqueIdGenerator idGenerator) => _idGenerator = idGenerator ?? throw new ArgumentNullException(nameof(idGenerator));
public void PlanRenames(List<HandlerSnapshot<TKey, TNodeData>> snapshots, Dictionary<IIdentifierHandler<TKey, TNodeData>, Dictionary<string, int>> globalValueCounts, INormalizationConfig config)
{
foreach (var snapshot in snapshots)
{
var renameMap = new Dictionary<string, RenameInfo>(StringComparer.Ordinal);
var globalCounts = globalValueCounts[snapshot.Handler];
var localCounts = new Dictionary<string, int>(globalCounts, StringComparer.Ordinal);
foreach (var group in snapshot.OriginalValues.Where(g => localCounts.GetValueOrDefault(g.Value) > 1))
{
int preservedIndex = config.PreserveFirstOccurrence ? Math.Max(0, group.Items.FindIndex(i => i.isMain)) : -1;
var locations = new List<NodeLocation<object>>();
var newNames = new List<string>();
var isMainList = new List<bool>();
for (int i = 0; i < group.Items.Count; i++)
{
var (loc, isMain) = group.Items[i];
locations.Add(new NodeLocation<object>(loc.Key, loc.ChainIndex, loc.NodeIndex, loc.UserData));
isMainList.Add(isMain);
if (i == preservedIndex) newNames.Add(group.Value);
else
{
string newName = _idGenerator.Generate(group.Value, loc.ChainIndex, n => localCounts.ContainsKey(n));
newNames.Add(newName);
localCounts[newName] = localCounts.GetValueOrDefault(newName) + 1;
}
}
renameMap[group.Value] = new RenameInfo { Locations = locations, NewNames = newNames, IsMain = isMainList, PreservedIndex = preservedIndex };
}
snapshot.RenameMap = renameMap;
}
}
}
public class DefaultReferenceResolver<TKey, TNodeData> : IReferenceResolver<TKey, TNodeData> where TKey : notnull where TNodeData : class
{
public object Resolve(object originalValue, TKey contextNodeKey, int currentChainIndex, FullRenameContext renameContext, INormalizationConfig config) =>
ReferenceResolver.Resolve(originalValue, str => renameContext.Resolve(str, currentChainIndex, config));
}
public class NormalizationEngine<TKey, TNodeData, TEdgeData> : INormalizationEngine<TKey, TNodeData, TEdgeData>
where TKey : notnull where TNodeData : class
{
private readonly IGraphAccessor<TKey, TNodeData, TEdgeData> _accessor;
private readonly ICommandFactory<TKey, TNodeData, TEdgeData> _commandFactory;
private readonly IEnumerable<IIdentifierHandler<TKey, TNodeData>> _handlers;
private readonly IObjectCloner<TNodeData> _cloner;
private readonly IReferenceResolver<TKey, TNodeData>? _referenceResolver;
private readonly IReferenceFieldUpdater<TNodeData>? _referenceFieldUpdater;
private readonly INormalizationContext _context;
private readonly IFactoryManager _factoryManager;
private readonly object _syncRoot = new();
private INormalizationConfig _config;
private IUniqueIdGenerator _idGenerator;
public NormalizationEngine(
IGraphAccessor<TKey, TNodeData, TEdgeData> accessor,
ICommandFactory<TKey, TNodeData, TEdgeData> commandFactory,
IEnumerable<IIdentifierHandler<TKey, TNodeData>> handlers,
INormalizationContext? context = null,
INormalizationConfig? config = null,
IUniqueIdGenerator? idGenerator = null,
IObjectCloner<TNodeData>? cloner = null,
IReferenceResolver<TKey, TNodeData>? referenceResolver = null,
IReferenceFieldUpdater<TNodeData>? referenceFieldUpdater = null,
IFactoryManager? factoryManager = null)
{
_accessor = accessor ?? throw new ArgumentNullException(nameof(accessor));
_commandFactory = commandFactory ?? throw new ArgumentNullException(nameof(commandFactory));
_handlers = handlers ?? Enumerable.Empty<IIdentifierHandler<TKey, TNodeData>>();
_context = context ?? new DefaultNormalizationContext();
_factoryManager = factoryManager ?? new DefaultFactoryManager();
_config = config ?? new NormalizationConfig();
_idGenerator = idGenerator ?? _factoryManager.GetUniqueIdGenerator();
_cloner = cloner ?? _factoryManager.GetObjectCloner<TNodeData>();
_referenceResolver = referenceResolver ?? _factoryManager.GetReferenceResolver<TKey, TNodeData>();
_referenceFieldUpdater = referenceFieldUpdater;
_context.SetService(_config);
_context.SetService(_idGenerator);
_context.SetService(_cloner);
_context.SetService(_factoryManager);
if (_referenceResolver != null) _context.SetService(_referenceResolver);
if (_referenceFieldUpdater != null) _context.SetService(_referenceFieldUpdater);
}
public void UpdateConfig(INormalizationConfig newConfig)
{
if (newConfig == null) throw new ArgumentNullException(nameof(newConfig));
lock (_syncRoot) { _config = newConfig; _idGenerator.Update(_config.RenamePattern, _config.MaxDuplicates); _context.SetService(_config); }
}
public NormalizationResult<TKey, TNodeData> Normalize(IList<NodeLocation<TKey>> locations) =>
!_config.EnableNormalization ? new NormalizationResult<TKey, TNodeData>() : ProcessCore(locations, true);
public NormalizationResult<TKey, TNodeData> Analyze(IList<NodeLocation<TKey>> locations) =>
!_config.EnableNormalization ? new NormalizationResult<TKey, TNodeData>() : ProcessCore(locations, false);
private NormalizationResult<TKey, TNodeData> ProcessCore(IList<NodeLocation<TKey>> locations, bool applyRenames)
{
var configSnapshot = GetConfigSnapshot();
var (globalValueCounts, snapshots) = ExtractIdentifiers(locations);
PlanRenames(snapshots, globalValueCounts, configSnapshot);
if (!applyRenames || CollectNodeUpdates(snapshots) is not { Count: > 0 } updates) return BuildResult(snapshots);
ApplyRenames(updates, snapshots, locations, configSnapshot);
return BuildResult(snapshots);
}
private INormalizationConfig GetConfigSnapshot() { lock (_syncRoot) { return _config; } }
private (Dictionary<IIdentifierHandler<TKey, TNodeData>, Dictionary<string, int>>, List<HandlerSnapshot<TKey, TNodeData>>) ExtractIdentifiers(IList<NodeLocation<TKey>> locations)
{
using (_accessor.EnterReadLock())
{
var extractor = new IdentifierExtractor<TKey, TNodeData, TEdgeData>(_accessor, _handlers);
return (extractor.BuildGlobalValueCounts(), extractor.BuildHandlerSnapshots(locations));
}
}
private void PlanRenames(List<HandlerSnapshot<TKey, TNodeData>> snapshots, Dictionary<IIdentifierHandler<TKey, TNodeData>, Dictionary<string, int>> globalValueCounts, INormalizationConfig configSnapshot) =>
new NormalizationPlanner<TKey, TNodeData, TEdgeData>(_idGenerator).PlanRenames(snapshots, globalValueCounts, configSnapshot);
private void ApplyRenames(Dictionary<TKey, NodeUpdateInfo> updatesPerNode, List<HandlerSnapshot<TKey, TNodeData>> snapshots, IList<NodeLocation<TKey>> locations, INormalizationConfig configSnapshot)
{
using (_accessor.EnterWriteLock())
{
if (configSnapshot.UpdateNodeReferences && _referenceFieldUpdater != null && _referenceResolver != null)
{
var fullContext = BuildFullRenameContext(snapshots);
var chainIndexMap = BuildChainIndexMap(snapshots, locations);
UpdateNodeReferences(updatesPerNode, fullContext, chainIndexMap, configSnapshot);
}
ExecuteNodeUpdates(updatesPerNode);
}
}
private Dictionary<TKey, NodeUpdateInfo> CollectNodeUpdates(List<HandlerSnapshot<TKey, TNodeData>> snapshots)
{
var updates = new Dictionary<TKey, NodeUpdateInfo>();
foreach (var snapshot in snapshots)
foreach (var (oldValue, info) in snapshot.RenameMap)
for (int i = 0; i < info.Locations.Count; i++)
{
if (!(info.Locations[i].Key is TKey key) || !_accessor.ContainsNode(key)) continue;
var oldData = _accessor.GetNodeDataObject(key);
if (oldData == null) continue;
if (!updates.TryGetValue(key, out var entry))
updates[key] = entry = new NodeUpdateInfo(oldData, _cloner.Clone(oldData), _accessor.GetNodeSymbol(key));
entry.NewData = snapshot.Handler.UpdateValue(key, entry.NewData, oldValue, info.NewNames[i], info.IsMain[i]);
if (info.IsMain[i]) entry.NewSymbol = info.NewNames[i];
entry.IdentifierUpdates.Add((snapshot.Handler, oldValue, info.NewNames[i], info.IsMain[i]));
}
return updates;
}
private FullRenameContext BuildFullRenameContext(List<HandlerSnapshot<TKey, TNodeData>> snapshots) => new()
{
KeyMap = RenameMapHelper.BuildRenameMap(snapshots, h => h.IsUniqueGlobally),
SymbolMap = RenameMapHelper.BuildRenameMap(snapshots, h => !h.IsUniqueGlobally),
CustomMaps = new Dictionary<IIdentifierHandler<object, object>, IReadOnlyDictionary<string, RenameInfo>>()
};
private IReadOnlyDictionary<TKey, int> BuildChainIndexMap(List<HandlerSnapshot<TKey, TNodeData>> snapshots, IList<NodeLocation<TKey>> locations) =>
locations.Where(loc => loc.Key is TKey).Select(loc => ((TKey)loc.Key, loc.ChainIndex))
.Concat(snapshots.SelectMany(s => s.OriginalValues).SelectMany(g => g.Items).Select(item => (key: item.location.Key, chainIndex: item.location.ChainIndex)).Where(item => item.key is TKey).Select(item => ((TKey)item.key, item.chainIndex)))
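// A key can be reported under more than one chain index; keep the first one seen instead of letting ToDictionary throw on a duplicate key.
.GroupBy(x => x.Item1).ToDictionary(g => g.Key, g => g.First().Item2);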
private void UpdateNodeReferences(Dictionary<TKey, NodeUpdateInfo> updates, FullRenameContext renameContext, IReadOnlyDictionary<TKey, int> chainIndexMap, INormalizationConfig config)
{
if (_referenceFieldUpdater == null || _referenceResolver == null) return;
foreach (var (key, info) in updates)
{
int chainIdx = chainIndexMap.GetValueOrDefault(key, -1);
_referenceFieldUpdater.UpdateReferences(info.NewData, old => old == null ? null : (_referenceResolver.Resolve(old, key, chainIdx, renameContext, config) as string));
}
}
private void ExecuteNodeUpdates(Dictionary<TKey, NodeUpdateInfo> updates)
{
using var tx = _accessor.BeginTransaction();
foreach (var (key, info) in updates)
tx.Enqueue(_commandFactory.CreateUpdateNodeCommand(key, info.NewData, info.NewSymbol));
tx.Commit();
}
private static NormalizationResult<TKey, TNodeData> BuildResult(List<HandlerSnapshot<TKey, TNodeData>> snapshots) => new()
{
HandlerRenameMaps = snapshots.Where(s => s.RenameMap.Count > 0).ToDictionary(s => s.Handler, s => (IReadOnlyDictionary<string, RenameInfo>)s.RenameMap),
KeyRenameMap = RenameMapHelper.BuildRenameMap(snapshots, h => h.IsUniqueGlobally),
SymbolRenameMap = RenameMapHelper.BuildRenameMap(snapshots, h => !h.IsUniqueGlobally)
};
private class NodeUpdateInfo
{
public TNodeData OldData { get; }
public TNodeData NewData { get; set; }
public string? NewSymbol { get; set; }
public List<(IIdentifierHandler<TKey, TNodeData> handler, string oldValue, string newValue, bool isMain)> IdentifierUpdates { get; } = new();
public NodeUpdateInfo(TNodeData oldData, TNodeData newData, string? newSymbol) => (OldData, NewData, NewSymbol) = (oldData, newData, newSymbol);
}
}
}
#nullable enable
#region ConnectionTransformationFramework
namespace GraphFoundation.Topology.Engine
{
using ContextManagement;
using GraphFoundation.Core;
using GraphFoundation.Core.Implementation;
using GraphFoundation.Topology;
using GraphFoundation.Topology.Adapters;
using GraphFoundation.Topology.Transformers;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public class GraphBuilder<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class where TEdgeData : class
{
public CoreGraph<TKey, TNodeData, TEdgeData> BuildFromChains(IList<ChainInfo<TKey, TNodeData, TEdgeData>> chains)
{
var graph = new CoreGraph<TKey, TNodeData, TEdgeData>(new ConcreteContext());
foreach (var chain in chains)
{
// Register every node first so that single-node chains are not silently dropped.
// Nodes are created with null data here, so transformers that read node data (such as DeduplicationTransformer) will skip them.
foreach (var nodeKey in chain.NodeKeys)
if (!graph.ContainsNode(nodeKey)) graph.AddNode(nodeKey, default(TNodeData)!);
for (int i = 0; i < chain.NodeKeys.Count - 1; i++)
{
var source = chain.NodeKeys[i];
var target = chain.NodeKeys[i + 1];
var edgeId = $"{source}_{target}_{i}";
if (!graph.ContainsEdge(edgeId)) graph.AddEdge(edgeId, source, target, true, default(TEdgeData)!, 1.0);
}
}
return graph;
}
public CoreGraph<TKey, TNodeData, TEdgeData> ConvertFromReadOnlyGraph(IReadOnlyGraph<TKey, TNodeData, TEdgeData> readOnlyGraph)
{
var graph = new CoreGraph<TKey, TNodeData, TEdgeData>(new ConcreteContext());
foreach (var nodeKey in readOnlyGraph.GetAllNodeKeys()) graph.AddNode(nodeKey, readOnlyGraph.GetNodeData(nodeKey) ?? default(TNodeData)!);
foreach (var edgeId in readOnlyGraph.GetAllEdgeIds())
{
var source = readOnlyGraph.GetEdgeSource(edgeId);
var target = readOnlyGraph.GetEdgeTarget(edgeId);
graph.AddEdge(edgeId, source, target, true, readOnlyGraph.GetEdgeData(edgeId) ?? default(TEdgeData)!, readOnlyGraph.GetEdgeWeight(edgeId));
}
return graph;
}
}
public class TransformerExecutor<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class where TEdgeData : class
{
public List<ConnectionChange<TKey>> ExecuteTransformers(IEnumerable<IConnectionTransformer<TKey, TNodeData, TEdgeData>> transformers, CoreGraph<TKey, TNodeData, TEdgeData> graph, IReadOnlyGraph<TKey, TNodeData, TEdgeData> readOnlyGraph, IConnectionSelector<TKey, TNodeData, TEdgeData>? selector, IContext context)
{
var changes = new List<ConnectionChange<TKey>>();
foreach (var transformer in transformers)
{
if (context.CancellationToken.IsCancellationRequested) break;
var edgeIds = graph.GetAllEdgeIds().ToList();
foreach (var edgeId in edgeIds)
{
var source = graph.GetEdgeSource(edgeId);
var target = graph.GetEdgeTarget(edgeId);
var weight = graph.GetEdgeWeight(edgeId);
var oldData = graph.GetEdgeData(edgeId);
var connection = new Connection<TKey, TNodeData, TEdgeData>(edgeId, source, target, true, weight, oldData);
if (selector != null && !selector.ShouldTransform(connection, readOnlyGraph)) continue;
var result = transformer.Transform(connection, readOnlyGraph, new TransformationContext(context, context.CancellationToken));
if (result == null)
{
graph.RemoveEdge(edgeId);
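// Record the removed edge's endpoints and payload so that the change can be audited or reverted.
changes.Add(new ConnectionChange<TKey> { Type = ConnectionChange<TKey>.ChangeType.Removed, EdgeId = edgeId, OldSource = source, OldTarget = target, OldData = oldData });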
}
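// Also treat a changed payload or weight as an update; otherwise transformers that only rewrite edge data are silently ignored.
else if (!result.EdgeId.Equals(edgeId) || !result.Source.Equals(source) || !result.Target.Equals(target) || !Equals(result.Data, oldData) || !result.Weight.Equals(weight))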
{
graph.RemoveEdge(edgeId);
graph.AddEdge(result.EdgeId, result.Source, result.Target, result.IsDirected, result.Data, result.Weight);
changes.Add(new ConnectionChange<TKey> { Type = ConnectionChange<TKey>.ChangeType.Updated, EdgeId = result.EdgeId, Source = result.Source, Target = result.Target, Data = result.Data, OldSource = source, OldTarget = target, OldData = oldData });
}
}
}
return changes;
}
}
public class ResultBuilder<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class where TEdgeData : class
{
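// Placeholder: returns a shallow copy of the input list. Connection changes are not written back into the chains here.
public List<ChainInfo<TKey, TNodeData, TEdgeData>> WriteBackChains(IList<ChainInfo<TKey, TNodeData, TEdgeData>> chains) => chains.ToList();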
public TopologyTransformationResult<TKey, TNodeData, TEdgeData> Build(List<ChainInfo<TKey, TNodeData, TEdgeData>> updatedChains, List<ConnectionChange<TKey>> allChanges, TimeSpan elapsed) => new() { ProcessedChains = updatedChains, ConnectionChanges = allChanges, Elapsed = elapsed, HasErrors = false, Errors = new List<Exception>() };
}
public class TopologyEngine<TKey, TNodeData, TEdgeData> : ITopologyEngine<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class where TEdgeData : class
{
private readonly IContext _defaultContext;
private readonly GraphBuilder<TKey, TNodeData, TEdgeData> _graphBuilder = new();
private readonly TransformerExecutor<TKey, TNodeData, TEdgeData> _transformerExecutor = new();
private readonly ResultBuilder<TKey, TNodeData, TEdgeData> _resultBuilder = new();
public TopologyEngine(IContext? defaultContext = null) => _defaultContext = defaultContext ?? new ConcreteContext();
public TopologyTransformationResult<TKey, TNodeData, TEdgeData> Execute(IList<ChainInfo<TKey, TNodeData, TEdgeData>>? chains, IEnumerable<IConnectionTransformer<TKey, TNodeData, TEdgeData>> transformers, IConnectionSelector<TKey, TNodeData, TEdgeData>? selector = null, IContext? context = null)
{
if (chains == null || chains.Count == 0) throw new ArgumentException("Chains cannot be null or empty when using ChainInfo input.", nameof(chains));
var stopwatch = Stopwatch.StartNew();
context ??= _defaultContext;
var graph = _graphBuilder.BuildFromChains(chains);
var allChanges = _transformerExecutor.ExecuteTransformers(transformers, graph, new CoreGraphReadOnlyAdapter<TKey, TNodeData, TEdgeData>(graph), selector, context);
return _resultBuilder.Build(_resultBuilder.WriteBackChains(chains), allChanges, stopwatch.Elapsed);
}
public TopologyTransformationResult<TKey, TNodeData, TEdgeData> Execute(IReadOnlyGraph<TKey, TNodeData, TEdgeData> graph, IEnumerable<IConnectionTransformer<TKey, TNodeData, TEdgeData>> transformers, IConnectionSelector<TKey, TNodeData, TEdgeData>? selector = null, IContext? context = null)
{
if (graph == null) throw new ArgumentNullException(nameof(graph));
var stopwatch = Stopwatch.StartNew();
context ??= _defaultContext;
var coreGraph = _graphBuilder.ConvertFromReadOnlyGraph(graph);
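// Expose the working copy to transformers and selectors, as the chain-based overload does, so that they see edges already rewritten by earlier transformers.
var allChanges = _transformerExecutor.ExecuteTransformers(transformers, coreGraph, new CoreGraphReadOnlyAdapter<TKey, TNodeData, TEdgeData>(coreGraph), selector, context);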
return _resultBuilder.Build(new List<ChainInfo<TKey, TNodeData, TEdgeData>>(), allChanges, stopwatch.Elapsed);
}
}
}
namespace GraphFoundation.Topology.Transformers.Deduplication
{
using GraphFoundation.Core;
using GraphFoundation.Topology.Transformers;
using System;
using System.Collections.Generic;
using System.Linq;
public class DeduplicationTransformer<TKey, TNodeData, TEdgeData> : IConnectionTransformer<TKey, TNodeData, TEdgeData> where TKey : notnull where TNodeData : class where TEdgeData : class
{
private readonly IIdentifierHandler<TKey, TNodeData> _identifierHandler;
private readonly bool _deleteRedundantNodes;
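// Built lazily from the first graph passed to Transform and then reused; this state is not synchronized, so use one transformer instance per graph.
private Dictionary<TKey, TKey> _nodeRedirectionMap = new();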
private bool _redirectionMapInitialized;
public DeduplicationTransformer(IIdentifierHandler<TKey, TNodeData> identifierHandler, bool deleteRedundantNodes = true) =>
(_identifierHandler, _deleteRedundantNodes) = (identifierHandler ?? throw new ArgumentNullException(nameof(identifierHandler)), deleteRedundantNodes);
public Connection<TKey, TNodeData, TEdgeData>? Transform(Connection<TKey, TNodeData, TEdgeData> connection, IReadOnlyGraph<TKey, TNodeData, TEdgeData> graph, ITransformationContext context)
{
if (!_redirectionMapInitialized) { BuildRedirectionMap(graph, context); _redirectionMapInitialized = true; }
// If the source or target is null, return null so the edge is dropped (adjust this to your business rules).
if (connection.Source == null || connection.Target == null)
return null;
var newTarget = _nodeRedirectionMap.GetValueOrDefault(connection.Target, connection.Target);
var newSource = _nodeRedirectionMap.GetValueOrDefault(connection.Source, connection.Source);
if (newSource.Equals(connection.Source) && newTarget.Equals(connection.Target))
return connection;
if (newSource.Equals(newTarget))
return null;
return new Connection<TKey, TNodeData, TEdgeData>(connection.EdgeId, newSource, newTarget, connection.IsDirected, connection.Weight, connection.Data);
}
private void BuildRedirectionMap(IReadOnlyGraph<TKey, TNodeData, TEdgeData> graph, ITransformationContext context)
{
foreach (var group in FindDuplicateNodes(graph))
for (int i = 1; i < group.Count; i++)
_nodeRedirectionMap[group[i]] = group[0];
}
private List<List<TKey>> FindDuplicateNodes(IReadOnlyGraph<TKey, TNodeData, TEdgeData> graph)
{
var valueToKeys = new Dictionary<string, List<TKey>>();
foreach (var key in graph.GetAllNodeKeys())
{
var data = graph.GetNodeData(key);
if (data == null) continue;
var loc = new NodeLocation<TKey>(key);
foreach (var (value, _, _) in _identifierHandler.ExtractValues(key, data, loc))
{
if (!valueToKeys.TryGetValue(value, out var list)) valueToKeys[value] = list = new List<TKey>();
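// A node that yields the same value more than once must not count as a duplicate of itself.
if (!list.Contains(key)) list.Add(key);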
}
}
return valueToKeys.Values.Where(g => g.Count > 1).Select(g => g.ToList()).ToList();
}
}
}
#endregion
namespace TopologyCore
{
using DynamicExpresso;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
public class NodeDedupeContext<TNode> where TNode : class
{
public TNode Node { get; }
public string ChainId { get; }
public int ChainIndex { get; }
public int NodeIndex { get; }
public NodeDedupeContext<TNode>? PreviousNode { get; set; }
public NodeDedupeContext<TNode>? NextNode { get; set; }
public IReadOnlyList<NodeDedupeContext<TNode>> IncomingReferences { get; set; } = Array.Empty<NodeDedupeContext<TNode>>();
public IReadOnlyList<NodeDedupeContext<TNode>> OutgoingReferences { get; set; } = Array.Empty<NodeDedupeContext<TNode>>();
public Dictionary<string, object> Attributes { get; } = new();
public bool IsProcessed { get; set; }
public bool IsDuplicate { get; set; }
public string? OriginalIdentifier { get; set; }
public string? NewIdentifier { get; set; }
public NodeDedupeContext(TNode node, string chainId, int chainIndex, int nodeIndex) =>
(Node, ChainId, ChainIndex, NodeIndex) = (node ?? throw new ArgumentNullException(nameof(node)), string.IsNullOrEmpty(chainId) ? throw new ArgumentException("Value cannot be null or empty", nameof(chainId)) : chainId, chainIndex < 0 ? throw new ArgumentOutOfRangeException(nameof(chainIndex), "Value cannot be negative") : chainIndex, nodeIndex < 0 ? throw new ArgumentOutOfRangeException(nameof(nodeIndex), "Value cannot be negative") : nodeIndex);
public void AddAttribute(string key, object value) { if (string.IsNullOrEmpty(key)) throw new ArgumentException("Value cannot be null or empty", nameof(key)); Attributes[key] = value; }
public T? GetAttribute<T>(string key) =>
!Attributes.TryGetValue(key, out var value) || value == null ? default : value is T typedValue ? typedValue : throw new InvalidCastException($"Cannot cast attribute '{key}' value of type '{value.GetType().Name}' to '{typeof(T).Name}'");
public override string ToString() => $"NodeContext[Chain={ChainId}, Index={NodeIndex}, IsDuplicate={IsDuplicate}]";
}
public class IdentifierDefinition
{
public string Name { get; set; } = string.Empty;
public bool IsGloballyUnique { get; set; }
public bool LinkedToMain { get; set; }
public string? MainIdentifierName { get; set; }
public Func<object, string?> Getter { get; set; } = _ => null;
public Action<object, string?> Setter { get; set; } = (_, _) => { };
public string? ValidationPattern { get; set; }
public bool Required { get; set; }
public string? DefaultValue { get; set; }
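// An empty value is valid only for optional identifiers; a non-empty value must match ValidationPattern whenever one is set, whether or not the identifier is required.
public bool IsValid(string? value) => string.IsNullOrEmpty(value) ? !Required : string.IsNullOrEmpty(ValidationPattern) || System.Text.RegularExpressions.Regex.IsMatch(value, ValidationPattern);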
public override string ToString() => $"IdentifierDefinition[Name={Name}, IsGloballyUnique={IsGloballyUnique}, LinkedToMain={LinkedToMain}]";
}
public class DedupeStrategy
{
public string Name { get; set; } = string.Empty;
public int Order { get; set; }
public string Condition { get; set; } = "true";
public string PriorityExpression { get; set; } = "0";
public string? NewNameTemplate { get; set; }
public bool Enabled { get; set; } = true;
public string? Description { get; set; }
public bool BreakOnMatch { get; set; }
public override string ToString() => $"DedupeStrategy[Name={Name}, Order={Order}, Enabled={Enabled}]";
}
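// Illustrative sketch only, not part of the original framework: one way the string-based
// Condition and PriorityExpression fields of DedupeStrategy might be filled in.
// The class name, strategy names, and weights below are hypothetical.
// The variable names (incomingRefsCount, chainIndex) are the ones that
// StrategyEvaluator.UpdateVariables exposes to expressions, and {0}, {chainIndex}, and
// {sequence} are placeholders that DecisionResolver substitutes in NewNameTemplate.
internal static class SampleDedupeStrategies
{
public static IReadOnlyList<DedupeStrategy> Create() => new List<DedupeStrategy>
{
// Nodes that other nodes reference receive a priority proportional to the number of incoming references.
new DedupeStrategy { Name = "PreferReferenced", Order = 0, Condition = "incomingRefsCount > 0", PriorityExpression = "incomingRefsCount * 10.0" },
// Every node additionally receives a bonus that shrinks as its chain index grows.
new DedupeStrategy { Name = "PreferEarlierChain", Order = 1, Condition = "true", PriorityExpression = "100.0 - chainIndex", NewNameTemplate = "{0}_chain{chainIndex}_dup{sequence}" }
};
}
// Priorities of all matching strategies are summed per node, so with ConflictResolution.PickHighestPriority
// the node with the largest sum keeps its identifier and the remaining nodes in the group are renamed.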
public enum ConflictResolution { PickHighestPriority, PickLowestPriority, PickFirstOccurrence, PickLastOccurrence, PickSmallestChainIndex, PickLargestChainIndex, ThrowException }
public interface IDedupeConfiguration
{
string DefaultRenamePattern { get; }
int MaxDuplicates { get; }
bool EnableCrossChainReference { get; }
CrossChainReferenceStrategy CrossChainStrategy { get; }
int? CrossChainTargetChainIndex { get; }
ConflictResolution ConflictResolution { get; }
IReadOnlyList<IdentifierDefinition> Identifiers { get; }
IReadOnlyList<DedupeStrategy> Strategies { get; }
bool IsValid();
}
internal static class Constants { public const int DefaultMaxDuplicates = 100; public const int DefaultTimeoutMilliseconds = 30000; public const int MaxNameGenerationAttempts = 1000; public const string DefaultRenamePattern = "{0}_chain{1}_dup{2}"; public const string DefaultUnnamedValue = "unnamed"; }
public class DedupeConfiguration : IDedupeConfiguration
{
private INormalizationConfig _normalizationConfig = new NormalizationConfig();
public INormalizationConfig NormalizationConfig { get => _normalizationConfig; set => _normalizationConfig = value ?? throw new ArgumentNullException(nameof(value)); }
// The forwarding setters below need the concrete NormalizationConfig; fail with a clear message if a different INormalizationConfig implementation was assigned.
private NormalizationConfig MutableConfig => _normalizationConfig as NormalizationConfig ?? throw new InvalidOperationException("The assigned INormalizationConfig is not a mutable NormalizationConfig.");
public string DefaultRenamePattern { get => _normalizationConfig.RenamePattern; set => MutableConfig.RenamePattern = value; }
public int MaxDuplicates { get => _normalizationConfig.MaxDuplicates; set => MutableConfig.MaxDuplicates = value; }
public bool EnableCrossChainReference { get => _normalizationConfig.HandleCrossChainReferences; set => MutableConfig.HandleCrossChainReferences = value; }
public CrossChainReferenceStrategy CrossChainStrategy { get => _normalizationConfig.CrossChainStrategy; set => MutableConfig.CrossChainStrategy = value; }
public int? CrossChainTargetChainIndex { get => _normalizationConfig.CrossChainTargetChainIndex; set => MutableConfig.CrossChainTargetChainIndex = value; }
public ConflictResolution ConflictResolution { get; set; } = ConflictResolution.PickHighestPriority;
public IReadOnlyList<IdentifierDefinition> Identifiers { get; set; } = new List<IdentifierDefinition>();
public IReadOnlyList<DedupeStrategy> Strategies { get; set; } = new List<DedupeStrategy>();
public bool EnableLogging { get; set; }
public int TimeoutMilliseconds { get; set; } = Constants.DefaultTimeoutMilliseconds;
public bool SkipValidation { get; set; }
public bool IsValid() => SkipValidation || (!string.IsNullOrEmpty(DefaultRenamePattern) && MaxDuplicates > 0 && Identifiers?.Count > 0 && Strategies != null && Identifiers.All(id => !string.IsNullOrEmpty(id.Name) && id.Getter != null && id.Setter != null));
public IdentifierDefinition? GetMainIdentifier() => Identifiers.FirstOrDefault(id => !id.LinkedToMain);
public override string ToString() => $"DedupeConfiguration[Identifiers={Identifiers.Count}, Strategies={Strategies.Count}]";
}
public interface ISmartDeduplicationEngine<TNode> where TNode : class { DeduplicationResult<TNode> Execute(IList<ChainInfo<TNode>> chains, IDedupeConfiguration config, INodeHandler<TNode, ChainInfo<TNode>> handler); }
public class DeduplicationResult<TNode> where TNode : class { public List<ChainInfo<TNode>> ProcessedChains { get; set; } = new(); public Dictionary<string, Dictionary<string, string>> RenameMaps { get; set; } = new(); }
public interface IChainTopologyAnalyzer<TNode> where TNode : class { IReadOnlyList<NodeDedupeContext<TNode>> Analyze(IList<ChainInfo<TNode>> chains, INodeHandler<TNode, ChainInfo<TNode>> handler); }
public interface IIdentifierExtractor<TNode> where TNode : class { List<(string IdentifierName, string? Value, NodeDedupeContext<TNode> Context)> Extract(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, IReadOnlyList<IdentifierDefinition> identifiers); }
public interface IStrategyEvaluator<TNode> where TNode : class { Dictionary<NodeDedupeContext<TNode>, double> Evaluate(IReadOnlyList<NodeDedupeContext<TNode>> duplicateGroup, string identifierValue, IReadOnlyList<DedupeStrategy> strategies, IExpressionEvaluator evaluator); }
public interface IDecisionResolver<TNode> where TNode : class { (NodeDedupeContext<TNode> Preserved, Dictionary<NodeDedupeContext<TNode>, string> NewNames) Resolve(IReadOnlyList<NodeDedupeContext<TNode>> duplicateGroup, string duplicatedValue, Dictionary<NodeDedupeContext<TNode>, double> priorities, IDedupeConfiguration config, INameGenerator nameGenerator); }
public interface IRenameExecutor<TNode> where TNode : class { void Execute(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, Dictionary<string, Dictionary<string, string>> renameMaps, IReadOnlyList<IdentifierDefinition> identifiers); }
public interface IReferenceUpdater<TNode> where TNode : class { void UpdateReferences(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, Dictionary<string, Dictionary<string, string>> renameMaps, IDedupeConfiguration config, INodeHandler<TNode, ChainInfo<TNode>> handler); }
public interface INameGenerator { string Generate(string baseValue, NodeDedupeContext<object> context, IDedupeConfiguration config); }
public interface IExpressionEvaluator { double EvaluateDouble(string expression, IDictionary<string, object> variables); bool EvaluateBool(string expression, IDictionary<string, object> variables); }
public class ChainTopologyAnalyzer<TNode> : IChainTopologyAnalyzer<TNode> where TNode : class
{
public IReadOnlyList<NodeDedupeContext<TNode>> Analyze(IList<ChainInfo<TNode>> chains, INodeHandler<TNode, ChainInfo<TNode>> handler)
{
if (chains == null) throw new ArgumentNullException(nameof(chains));
if (handler == null) throw new ArgumentNullException(nameof(handler));
var allContexts = new List<NodeDedupeContext<TNode>>();
for (int chainIndex = 0; chainIndex < chains.Count; chainIndex++)
{
// Use the position in the input list so that null entries do not shift the index of later chains.
var chain = chains[chainIndex];
if (chain == null) continue;
var chainId = $"chain_{chainIndex}";
var nodes = handler.GetNodes(chain);
if (nodes != null)
for (int i = 0; i < nodes.Count; i++)
if (nodes[i] != null) allContexts.Add(new NodeDedupeContext<TNode>(nodes[i]!, chainId, chainIndex, i));
}
BuildNodeLinks(allContexts);
BuildReferenceLinks(allContexts, handler);
return allContexts;
}
private void BuildNodeLinks(IList<NodeDedupeContext<TNode>> contexts)
{
var byChain = contexts.GroupBy(c => c.ChainId).ToDictionary(g => g.Key, g => g.OrderBy(c => c.NodeIndex).ToList());
foreach (var (_, list) in byChain)
for (int i = 0; i < list.Count; i++)
{
if (i > 0) list[i].PreviousNode = list[i - 1];
if (i < list.Count - 1) list[i].NextNode = list[i + 1];
}
}
private void BuildReferenceLinks(IList<NodeDedupeContext<TNode>> contexts, INodeHandler<TNode, ChainInfo<TNode>> handler)
{
var nodeByKey = new Dictionary<string, NodeDedupeContext<TNode>>();
var keyToReferencingNodes = new Dictionary<string, List<NodeDedupeContext<TNode>>>();
foreach (var ctx in contexts)
{
var key = handler.GetKey(ctx.Node);
if (!string.IsNullOrEmpty(key))
nodeByKey[key] = ctx;
}
foreach (var ctx in contexts)
{
var outgoingRefs = new List<NodeDedupeContext<TNode>>();
var referencedKeys = ExtractReferencedKeys(ctx.Node, handler);
foreach (var refKey in referencedKeys.Where(k => !string.IsNullOrEmpty(k)))
if (nodeByKey.TryGetValue(refKey!, out var target))
{
outgoingRefs.Add(target);
if (!keyToReferencingNodes.TryGetValue(refKey!, out var list)) keyToReferencingNodes[refKey!] = list = new List<NodeDedupeContext<TNode>>();
list.Add(ctx);
}
ctx.OutgoingReferences = outgoingRefs;
}
foreach (var ctx in contexts)
{
var key = handler.GetKey(ctx.Node);
ctx.IncomingReferences = !string.IsNullOrEmpty(key) && keyToReferencingNodes.TryGetValue(key!, out var referencingNodes) ? referencingNodes : Array.Empty<NodeDedupeContext<TNode>>();
}
}
private HashSet<string> ExtractReferencedKeys(TNode node, INodeHandler<TNode, ChainInfo<TNode>> handler)
{
var keys = new HashSet<string>();
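// Reuse the handler's reference-rewriting hook as a read-only visitor: the callback records each referenced key and returns it unchanged.
handler.UpdateReferences(node, (oldVal, _) => { if (!string.IsNullOrEmpty(oldVal)) keys.Add(oldVal); return oldVal; });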
return keys;
}
}
public class IdentifierExtractor<TNode> : IIdentifierExtractor<TNode> where TNode : class
{
public List<(string IdentifierName, string? Value, NodeDedupeContext<TNode> Context)> Extract(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, IReadOnlyList<IdentifierDefinition> identifiers)
{
if (allContexts == null) throw new ArgumentNullException(nameof(allContexts));
if (identifiers == null) throw new ArgumentNullException(nameof(identifiers));
var result = new List<(string, string?, NodeDedupeContext<TNode>)>();
foreach (var ctx in allContexts.Where(c => c != null && c.Node != null))
foreach (var id in identifiers.Where(i => i != null))
{
if (id.Getter == null) continue;
var value = id.Getter(ctx.Node);
if (string.IsNullOrEmpty(value))
{
if (id.Required) continue;
value = id.DefaultValue;
if (string.IsNullOrEmpty(value)) continue;
}
if (!id.IsValid(value)) continue;
result.Add((id.Name, value, ctx));
}
return result;
}
}
public class DynamicExpressoEvaluator : IExpressionEvaluator
{
private readonly Dictionary<string, Lambda> _expressionCache = new();
private readonly object _cacheLock = new();
private readonly Interpreter _sharedInterpreter = new();
// Convert instead of unboxing: an expression such as "0" evaluates to a boxed int, and a direct (double) cast on it would throw InvalidCastException.
public double EvaluateDouble(string expression, IDictionary<string, object> variables) => Convert.ToDouble(Evaluate(expression, variables), System.Globalization.CultureInfo.InvariantCulture);
public bool EvaluateBool(string expression, IDictionary<string, object> variables) => Convert.ToBoolean(Evaluate(expression, variables), System.Globalization.CultureInfo.InvariantCulture);
private object Evaluate(string expression, IDictionary<string, object> variables)
{
// Sort once so that the cache key, the declared parameters, and the invocation arguments all share the same order.
var ordered = variables.OrderBy(kv => kv.Key, StringComparer.Ordinal).ToList();
var lambda = GetOrParse(expression, ordered);
return lambda.Invoke(ordered.Select(kv => kv.Value).ToArray());
}
private Lambda GetOrParse(string expression, List<KeyValuePair<string, object>> ordered)
{
string cacheKey = GenerateCacheKey(expression, ordered);
lock (_cacheLock)
{
if (!_expressionCache.TryGetValue(cacheKey, out var lambda))
_expressionCache[cacheKey] = lambda = _sharedInterpreter.Parse(expression, BuildParameters(ordered));
return lambda;
}
}
private static string GenerateCacheKey(string expression, IEnumerable<KeyValuePair<string, object>> ordered)
{
var sb = new StringBuilder("expr:").Append(expression).Append(':');
foreach (var kv in ordered) sb.Append(kv.Key).Append('_').Append(ParameterType(kv.Value).FullName).Append(';');
return sb.ToString();
}
// Null values are declared as System.Object because no runtime type is available for them.
private static Type ParameterType(object? value) => value?.GetType() ?? typeof(object);
private static Parameter[] BuildParameters(IEnumerable<KeyValuePair<string, object>> ordered) => ordered.Select(kv => new Parameter(kv.Key, ParameterType(kv.Value))).ToArray();
}
public class StrategyEvaluator<TNode> : IStrategyEvaluator<TNode> where TNode : class
{
public Dictionary<NodeDedupeContext<TNode>, double> Evaluate(IReadOnlyList<NodeDedupeContext<TNode>> duplicateGroup, string identifierValue, IReadOnlyList<DedupeStrategy> strategies, IExpressionEvaluator evaluator)
{
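// Skip null and repeated entries up front; ToDictionary would otherwise throw before the null filter in the loop below is reached.
var priorities = duplicateGroup.Where(c => c != null).Distinct().ToDictionary(ctx => ctx, _ => 0.0);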
var orderedStrategies = strategies.Where(s => s != null && s.Enabled).OrderBy(s => s.Order).ToList();
var reusableVariables = new Dictionary<string, object>();
foreach (var ctx in duplicateGroup.Where(c => c != null))
{
double totalPriority = 0;
UpdateVariables(reusableVariables, ctx, identifierValue);
foreach (var strategy in orderedStrategies)
if (evaluator.EvaluateBool(strategy.Condition, reusableVariables))
{
totalPriority += evaluator.EvaluateDouble(strategy.PriorityExpression, reusableVariables);
if (strategy.BreakOnMatch) break;
}
priorities[ctx] = totalPriority;
}
return priorities;
}
private void UpdateVariables(Dictionary<string, object> variables, NodeDedupeContext<TNode> ctx, string identifierValue)
{
variables["node"] = ctx.Node;
variables["chainIndex"] = ctx.ChainIndex;
variables["nodeIndex"] = ctx.NodeIndex;
variables["previousNode"] = ctx.PreviousNode?.Node ?? new object();
variables["nextNode"] = ctx.NextNode?.Node ?? new object();
variables["incomingRefs"] = ctx.IncomingReferences;
variables["outgoingRefs"] = ctx.OutgoingReferences;
variables["duplicatedValue"] = identifierValue ?? string.Empty;
variables["attributes"] = ctx.Attributes;
variables["isDuplicate"] = ctx.IsDuplicate;
variables["originalIdentifier"] = ctx.OriginalIdentifier ?? string.Empty;
variables["newIdentifier"] = ctx.NewIdentifier ?? string.Empty;
variables["incomingRefsCount"] = ctx.IncomingReferences.Count;
variables["outgoingRefsCount"] = ctx.OutgoingReferences.Count;
}
}
public class DecisionResolver<TNode> : IDecisionResolver<TNode> where TNode : class
{
private readonly INameGenerator _nameGenerator;
public DecisionResolver(INameGenerator? nameGenerator = null) => _nameGenerator = nameGenerator ?? new DefaultNameGenerator();
public (NodeDedupeContext<TNode> Preserved, Dictionary<NodeDedupeContext<TNode>, string> NewNames) Resolve(IReadOnlyList<NodeDedupeContext<TNode>> duplicateGroup, string duplicatedValue, Dictionary<NodeDedupeContext<TNode>, double> priorities, IDedupeConfiguration config, INameGenerator nameGenerator)
{
if (duplicateGroup == null) throw new ArgumentNullException(nameof(duplicateGroup));
if (priorities == null) throw new ArgumentNullException(nameof(priorities));
if (config == null) throw new ArgumentNullException(nameof(config));
if (duplicateGroup.Count == 0) throw new InvalidOperationException("Duplicate group is empty");
// Prefer the generator supplied by the caller; fall back to the one injected through the constructor.
var generator = nameGenerator ?? _nameGenerator;
var preserved = SelectPreservedNode(duplicateGroup, priorities, config, duplicatedValue);
return (preserved, GenerateNewNames(duplicateGroup, preserved, duplicatedValue, config, generator));
}
private NodeDedupeContext<TNode> SelectPreservedNode(IReadOnlyList<NodeDedupeContext<TNode>> duplicateGroup, Dictionary<NodeDedupeContext<TNode>, double> priorities, IDedupeConfiguration config, string duplicatedValue)
{
if (config.ConflictResolution == ConflictResolution.ThrowException) throw new InvalidOperationException($"Duplicate found: {duplicatedValue}");
// OrderBy and OrderByDescending are stable, so ties keep the group's original order. Nodes without a priority entry count as 0.
return config.ConflictResolution switch
{
ConflictResolution.PickHighestPriority => duplicateGroup.OrderByDescending(ctx => priorities.GetValueOrDefault(ctx)).First(),
ConflictResolution.PickLowestPriority => duplicateGroup.OrderBy(ctx => priorities.GetValueOrDefault(ctx)).First(),
ConflictResolution.PickFirstOccurrence => duplicateGroup.OrderBy(ctx => ctx.ChainIndex).ThenBy(ctx => ctx.NodeIndex).First(),
ConflictResolution.PickLastOccurrence => duplicateGroup.OrderByDescending(ctx => ctx.ChainIndex).ThenByDescending(ctx => ctx.NodeIndex).First(),
ConflictResolution.PickSmallestChainIndex => duplicateGroup.OrderBy(ctx => ctx.ChainIndex).First(),
ConflictResolution.PickLargestChainIndex => duplicateGroup.OrderByDescending(ctx => ctx.ChainIndex).First(),
_ => duplicateGroup.First()
};
}
private Dictionary<NodeDedupeContext<TNode>, string> GenerateNewNames(IReadOnlyList<NodeDedupeContext<TNode>> duplicateGroup, NodeDedupeContext<TNode> preserved, string duplicatedValue, IDedupeConfiguration config, INameGenerator generator)
{
var newNames = new Dictionary<NodeDedupeContext<TNode>, string>();
int duplicateSeq = 1;
// Uniqueness is only checked within this duplicate group; names used elsewhere in the graph are not known here.
var usedNames = new HashSet<string> { duplicatedValue };
foreach (var ctx in duplicateGroup.Where(c => c != preserved))
{
var newName = GenerateUniqueName(ctx, duplicatedValue, duplicateSeq, config, usedNames, generator);
newNames[ctx] = newName;
usedNames.Add(newName);
duplicateSeq++;
}
return newNames;
}
private string GenerateUniqueName(NodeDedupeContext<TNode> ctx, string duplicatedValue, int sequence, IDedupeConfiguration config, HashSet<string> usedNames, INameGenerator generator)
{
string newName = string.Empty;
try
{
var objectContext = new NodeDedupeContext<object>(ctx.Node, ctx.ChainId, ctx.ChainIndex, ctx.NodeIndex) { IsDuplicate = ctx.IsDuplicate, OriginalIdentifier = ctx.OriginalIdentifier, NewIdentifier = ctx.NewIdentifier, IsProcessed = ctx.IsProcessed };
newName = generator.Generate(duplicatedValue, objectContext, config);
}
catch (Exception)
{
// A failing custom generator is deliberately not fatal: fall through to the template-based name below.
}
if (string.IsNullOrEmpty(newName))
{
// Use the first enabled strategy that defines a template; otherwise fall back to the configured default pattern.
var template = config.Strategies.FirstOrDefault(s => s != null && s.Enabled && !string.IsNullOrEmpty(s.NewNameTemplate))?.NewNameTemplate;
newName = template != null
? template.Replace("{0}", duplicatedValue).Replace("{1}", ctx.ChainIndex.ToString()).Replace("{2}", sequence.ToString()).Replace("{chainIndex}", ctx.ChainIndex.ToString()).Replace("{nodeIndex}", ctx.NodeIndex.ToString()).Replace("{sequence}", sequence.ToString())
: string.Format(config.DefaultRenamePattern, duplicatedValue, ctx.ChainIndex, sequence);
}
if (string.IsNullOrEmpty(newName)) newName = $"{duplicatedValue}_dup_{sequence}";
return MakeUnique(newName, usedNames);
}
private string MakeUnique(string baseName, HashSet<string> usedNames)
{
if (string.IsNullOrEmpty(baseName)) throw new ArgumentException("Base name cannot be null or empty", nameof(baseName));
string name = baseName;
int suffix = 1, attempts = 0;
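// Append _1, _2, ... until the name is free or the attempt budget is spent.
while (usedNames.Contains(name) && attempts < Constants.MaxNameGenerationAttempts) { name = $"{baseName}_{suffix++}"; attempts++; }
// Use a random suffix only if the last candidate still collides; a free name found on the final attempt is kept.
return usedNames.Contains(name) ? $"{baseName}_{Guid.NewGuid().ToString().Substring(0, 8)}" : name;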
}
}
public class RenameExecutor<TNode> : IRenameExecutor<TNode> where TNode : class
{
public void Execute(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, Dictionary<string, Dictionary<string, string>> renameMaps, IReadOnlyList<IdentifierDefinition> identifiers)
{
var mainIdentifierMap = BuildMainIdentifierMap(allContexts, renameMaps, identifiers);
foreach (var ctx in allContexts.Where(c => c != null && c.Node != null))
{
bool nodeRenamed = false;
foreach (var id in identifiers.Where(i => i != null))
{
if (id.Getter == null || id.Setter == null) continue;
string? newValue = GetNewValue(ctx, id, renameMaps, mainIdentifierMap);
var oldValue = id.Getter(ctx.Node);
if (!string.IsNullOrEmpty(oldValue) && newValue != null && newValue != oldValue) { id.Setter(ctx.Node, newValue); nodeRenamed = true; }
}
if (nodeRenamed) ctx.IsProcessed = true;
}
}
private Dictionary<string, string> BuildMainIdentifierMap(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, Dictionary<string, Dictionary<string, string>> renameMaps, IReadOnlyList<IdentifierDefinition> identifiers)
{
var mainIdentifierMap = new Dictionary<string, string>();
var mainIdentifier = identifiers.FirstOrDefault(id => id != null && !id.LinkedToMain);
var mainIdentifierName = mainIdentifier?.Name;
if (mainIdentifier != null && !string.IsNullOrEmpty(mainIdentifierName) && renameMaps.TryGetValue(mainIdentifierName!, out var mainMap) && mainIdentifier.Getter != null)
foreach (var ctx in allContexts.Where(c => c != null && c.Node != null))
{
var oldValue = mainIdentifier.Getter(ctx.Node);
if (!string.IsNullOrEmpty(oldValue) && mainMap.TryGetValue(oldValue, out var newVal))
{
var nodeKey = GetNodeKey(ctx.Node);
if (!string.IsNullOrEmpty(nodeKey)) mainIdentifierMap[nodeKey] = newVal;
}
}
return mainIdentifierMap;
}
private string? GetNewValue(NodeDedupeContext<TNode> ctx, IdentifierDefinition id, Dictionary<string, Dictionary<string, string>> renameMaps, Dictionary<string, string> mainIdentifierMap)
{
if (string.IsNullOrEmpty(id.Name)) return null;
if (id.LinkedToMain && !string.IsNullOrEmpty(id.MainIdentifierName))
{
var nodeKey = GetNodeKey(ctx.Node);
if (!string.IsNullOrEmpty(nodeKey) && mainIdentifierMap.TryGetValue(nodeKey, out var mainNew)) return mainNew;
}
else if (renameMaps.TryGetValue(id.Name, out var map) && id.Getter != null)
{
var oldValue = id.Getter(ctx.Node);
if (!string.IsNullOrEmpty(oldValue) && map.TryGetValue(oldValue, out var mapped)) return mapped;
}
return null;
}
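// NOTE: this key is only unique if TNode overrides ToString() with a per-node value.
// With the default Object.ToString() every node of the same type yields the same key,
// so entries in mainIdentifierMap would overwrite each other.
private string GetNodeKey(TNode node) => node.ToString() ?? $"node_{node.GetHashCode()}";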
}
public class DefaultNameGenerator : INameGenerator
{
private static readonly char[] InvalidChars = { '<', '>', ':', '"', '/', '\\', '|', '?', '*', '[', ']' };
public string Generate(string baseValue, NodeDedupeContext<object> context, IDedupeConfiguration config)
{
if (string.IsNullOrEmpty(baseValue)) throw new ArgumentException("Value cannot be null or empty", nameof(baseValue));
if (context == null) throw new ArgumentNullException(nameof(context));
if (config == null) throw new ArgumentNullException(nameof(config));
try
{
string pattern = config.DefaultRenamePattern;
if (string.IsNullOrEmpty(pattern)) pattern = "{0}_chain{1}_dup{2}";
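// The sequence slot ({2}) is always 1 here because Generate receives no sequence number;
// callers are expected to resolve any resulting collisions themselves.
string generatedName = string.Format(pattern, baseValue, context.ChainIndex, 1);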
return SanitizeName(generatedName);
}
catch { return $"{baseValue}_dup_{context.ChainIndex}_{context.NodeIndex}"; }
}
private string SanitizeName(string name)
{
var sb = new StringBuilder(name);
// Replace reserved characters with underscores.
foreach (var invalidChar in InvalidChars) sb.Replace(invalidChar, '_');
// Replace ASCII control characters (below 0x20) in place.
for (int i = 0; i < sb.Length; i++) if (sb[i] < 0x20) sb[i] = '_';
string sanitized = sb.ToString();
return string.IsNullOrEmpty(sanitized) ? Constants.DefaultUnnamedValue : sanitized;
}
}
public class ReferenceUpdater<TNode> : IReferenceUpdater<TNode> where TNode : class
{
public void UpdateReferences(IReadOnlyList<NodeDedupeContext<TNode>> allContexts, Dictionary<string, Dictionary<string, string>> renameMaps, IDedupeConfiguration config, INodeHandler<TNode, ChainInfo<TNode>> handler)
{
var referenceMaps = BuildReferenceMaps(renameMaps, config);
foreach (var ctx in allContexts.Where(c => c != null && c.Node != null))
{
bool nodeUpdated = false;
handler.UpdateReferences(ctx.Node, (oldRef, _) =>
{
if (string.IsNullOrEmpty(oldRef)) return oldRef;
string newRef = referenceMaps.TryGetValue(oldRef, out var mapped) ? mapped : oldRef;
if (newRef != oldRef) nodeUpdated = true;
return newRef;
});
if (nodeUpdated) ctx.IsProcessed = true;
}
}
private Dictionary<string, string> BuildReferenceMaps(Dictionary<string, Dictionary<string, string>> renameMaps, IDedupeConfiguration config)
{
var referenceMaps = new Dictionary<string, string>();
var mainIdentifier = config.Identifiers.FirstOrDefault(id => id != null && !id.LinkedToMain);
if (mainIdentifier != null && !string.IsNullOrEmpty(mainIdentifier.Name) && renameMaps.TryGetValue(mainIdentifier.Name, out var mainMap))
foreach (var entry in mainMap.Where(e => !string.IsNullOrEmpty(e.Key))) referenceMaps[entry.Key] = entry.Value;
foreach (var kv in renameMaps.Where(k => !string.IsNullOrEmpty(k.Key)))
foreach (var entry in kv.Value.Where(e => !string.IsNullOrEmpty(e.Key) && !referenceMaps.ContainsKey(e.Key))) referenceMaps[entry.Key] = entry.Value;
return referenceMaps;
}
}
public class SmartDeduplicationEngine<TNode> : ISmartDeduplicationEngine<TNode> where TNode : class
{
private readonly IChainTopologyAnalyzer<TNode> _topologyAnalyzer;
private readonly IIdentifierExtractor<TNode> _identifierExtractor;
private readonly IStrategyEvaluator<TNode> _strategyEvaluator;
private readonly IDecisionResolver<TNode> _decisionResolver;
private readonly IRenameExecutor<TNode> _renameExecutor;
private readonly IReferenceUpdater<TNode> _referenceUpdater;
private readonly IExpressionEvaluator _expressionEvaluator;
private readonly GraphFoundation.Normalization.Handlers.INameGenerator _nameGenerator;
public SmartDeduplicationEngine(IChainTopologyAnalyzer<TNode>? topologyAnalyzer = null, IIdentifierExtractor<TNode>? identifierExtractor = null, IStrategyEvaluator<TNode>? strategyEvaluator = null, IDecisionResolver<TNode>? decisionResolver = null, IRenameExecutor<TNode>? renameExecutor = null, IReferenceUpdater<TNode>? referenceUpdater = null, IExpressionEvaluator? expressionEvaluator = null, GraphFoundation.Normalization.Handlers.INameGenerator? nameGenerator = null) =>
(_topologyAnalyzer, _identifierExtractor, _strategyEvaluator, _nameGenerator, _decisionResolver, _renameExecutor, _referenceUpdater, _expressionEvaluator) =
(topologyAnalyzer ?? new ChainTopologyAnalyzer<TNode>(), identifierExtractor ?? new IdentifierExtractor<TNode>(), strategyEvaluator ?? new StrategyEvaluator<TNode>(), nameGenerator ?? new GraphFoundation.Normalization.Handlers.DefaultNameGenerator(),
decisionResolver ?? new DecisionResolver<TNode>(nameGenerator as INameGenerator ?? new DefaultNameGenerator()), renameExecutor ?? new RenameExecutor<TNode>(), referenceUpdater ?? new ReferenceUpdater<TNode>(), expressionEvaluator ?? new DynamicExpressoEvaluator());
public DeduplicationResult<TNode> Execute(IList<ChainInfo<TNode>> chains, IDedupeConfiguration config, INodeHandler<TNode, ChainInfo<TNode>> handler)
{
if (chains == null) throw new ArgumentNullException(nameof(chains));
if (config == null) throw new ArgumentNullException(nameof(config));
if (handler == null) throw new ArgumentNullException(nameof(handler));
if (!config.IsValid()) throw new InvalidOperationException("Invalid configuration");
var normalizedHandler = new NodeHandlerAdapter<TNode, ChainInfo<TNode>>(handler);
var (graph, locations) = ConvertToGraph(chains, normalizedHandler);
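// The normalization engine needs the concrete DedupeConfiguration; fail with a clear message rather than an InvalidCastException.
var dedupeConfig = config as DedupeConfiguration ?? throw new ArgumentException($"Configuration must be a {nameof(DedupeConfiguration)}.", nameof(config));
var normEngine = CreateNormalizationEngine(graph, locations, normalizedHandler, dedupeConfig);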
var normResult = normEngine.Normalize(locations);
var processedChains = ConvertFromGraph(graph, locations, normalizedHandler, chains);
var renameMaps = BuildRenameMaps(normResult);
return new DeduplicationResult<TNode> { ProcessedChains = processedChains, RenameMaps = renameMaps };
}
private (GraphFoundation.Core.ICoreGraph<string, TNode, object> Graph, List<NodeLocation<string>> Locations) ConvertToGraph(IList<ChainInfo<TNode>> chains, INodeHandler<TNode, ChainInfo<TNode>> handler)
{
var graph = new GraphFoundation.Core.Implementation.CoreGraph<string, TNode, object>(new ConcreteContext());
var locations = new List<NodeLocation<string>>();
for (int chainIndex = 0; chainIndex < chains.Count; chainIndex++)
{
var chain = chains[chainIndex];
var nodes = handler.GetNodes(chain);
for (int nodeIndex = 0; nodeIndex < nodes.Count; nodeIndex++)
{
var node = nodes[nodeIndex];
if (node == null) continue;
var key = handler.GetKey(node);
if (string.IsNullOrEmpty(key)) continue;
graph.AddNode(key, node, null);
locations.Add(new NodeLocation<string>(key, chainIndex, nodeIndex, chain));
}
}
return (graph, locations);
}
private NormalizationEngine<string, TNode, object> CreateNormalizationEngine(
GraphFoundation.Core.ICoreGraph<string, TNode, object> graph,
List<NodeLocation<string>> locations,
INodeHandler<TNode, ChainInfo<TNode>> handler,
DedupeConfiguration config)
{
var graphAccessor = new GraphFoundation.Normalization.Bridge.DefaultGraphAccessor<string, TNode, object>(graph);
var commandFactory = new GraphFoundation.Normalization.Bridge.DefaultCommandFactory<string, TNode, object>();
var handlers = new List<IIdentifierHandler<string, TNode>>
{
new KeyHandler<TNode, ChainInfo<TNode>>(handler),
new SymbolHandler<TNode, ChainInfo<TNode>>(handler),
new AdditionalSymbolsHandler<TNode, ChainInfo<TNode>>(handler)
};
var normContext = new GraphFoundation.Normalization.DefaultNormalizationContext();
var idGenerator = new GraphFoundation.Normalization.DefaultUniqueIdGenerator(config.DefaultRenamePattern, config.MaxDuplicates);
var cloner = new GraphFoundation.Normalization.DeepClonerAdapter<TNode>();
var referenceResolver = new GraphFoundation.Normalization.Processing.DefaultReferenceResolver<string, TNode>();
var referenceUpdater = new ReferenceUpdater<TNode, ChainInfo<TNode>>(handler);
return new GraphFoundation.Normalization.Processing.NormalizationEngine<string, TNode, object>(
graphAccessor,
commandFactory,
handlers,
normContext,
config.NormalizationConfig,
idGenerator,
cloner,
referenceResolver,
referenceUpdater
);
}
private List<ChainInfo<TNode>> ConvertFromGraph(
GraphFoundation.Core.ICoreGraph<string, TNode, object> graph,
List<NodeLocation<string>> locations,
INodeHandler<TNode, ChainInfo<TNode>> handler,
IList<ChainInfo<TNode>> originalChains)
{
var processedChains = new List<ChainInfo<TNode>>();
var chainLocations = locations.GroupBy(loc => loc.ChainIndex).ToDictionary(g => g.Key, g => g.OrderBy(loc => loc.NodeIndex).ToList());
foreach (var kv in chainLocations)
{
var chainIndex = kv.Key;
if (chainIndex < 0 || chainIndex >= originalChains.Count) continue;
var chainLocationsList = kv.Value;
var originalChain = originalChains[chainIndex];
var clonedChain = handler.CloneChain(originalChain);
// OfType<TNode>() drops null entries and yields a List<TNode>, so SetNodes needs no cast.
var nodes = chainLocationsList.Select(loc => graph.GetNodeData(loc.Key)).OfType<TNode>().ToList();
handler.SetNodes(clonedChain, nodes);
if (nodes.Any())
{
handler.SetStartNode(clonedChain, nodes.First()!);
handler.SetEndNode(clonedChain, nodes.Last()!);
}
processedChains.Add(clonedChain);
}
return processedChains;
}
private Dictionary<string, Dictionary<string, string>> BuildRenameMaps(NormalizationResult<string, TNode> normResult)
{
var renameMaps = new Dictionary<string, Dictionary<string, string>>();
if (normResult.KeyRenameMap.Count > 0)
renameMaps["Key"] = normResult.KeyRenameMap.ToDictionary(kv => kv.Key, kv => kv.Value.NewNames.FirstOrDefault() ?? kv.Key);
if (normResult.SymbolRenameMap.Count > 0)
renameMaps["Symbol"] = normResult.SymbolRenameMap.ToDictionary(kv => kv.Key, kv => kv.Value.NewNames.FirstOrDefault() ?? kv.Key);
foreach (var (handler, renameMap) in normResult.HandlerRenameMaps)
renameMaps[handler.GetType().Name] = renameMap.ToDictionary(kv => kv.Key, kv => kv.Value.NewNames.FirstOrDefault() ?? kv.Key);
return renameMaps;
}
private class KeyHandler<TNodeHandler, TChainHandler> : IIdentifierHandler<string, TNodeHandler> where TNodeHandler : class where TChainHandler : class
{
private readonly INodeHandler<TNodeHandler, TChainHandler> _handler;
public KeyHandler(INodeHandler<TNodeHandler, TChainHandler> handler) => _handler = handler;
public bool IsUniqueGlobally => true;
public bool SupportsInPlaceUpdate => true;
public IEnumerable<(string identifierValue, NodeLocation<string> location, bool isMainIdentifier)> ExtractValues(string key, TNodeHandler data, NodeLocation<string> loc)
{
var nodeKey = _handler.GetKey(data);
if (!string.IsNullOrEmpty(nodeKey)) yield return (nodeKey, loc, true);
}
public TNodeHandler UpdateValue(string key, TNodeHandler data, string oldValue, string newValue, bool isMain) { _handler.SetKey(data, newValue); return data; }
}
private class SymbolHandler<TNodeHandler, TChainHandler> : IIdentifierHandler<string, TNodeHandler> where TNodeHandler : class where TChainHandler : class
{
private readonly INodeHandler<TNodeHandler, TChainHandler> _handler;
public SymbolHandler(INodeHandler<TNodeHandler, TChainHandler> handler) => _handler = handler;
public bool IsUniqueGlobally => false;
public bool SupportsInPlaceUpdate => true;
public IEnumerable<(string identifierValue, NodeLocation<string> location, bool isMainIdentifier)> ExtractValues(string key, TNodeHandler data, NodeLocation<string> loc)
{
var symbol = _handler.GetSymbol(data);
if (!string.IsNullOrEmpty(symbol)) yield return (symbol, loc, true);
}
public TNodeHandler UpdateValue(string key, TNodeHandler data, string oldValue, string newValue, bool isMain) { _handler.SetSymbol(data, newValue); return data; }
}
private class AdditionalSymbolsHandler<TNodeHandler, TChainHandler> : IIdentifierHandler<string, TNodeHandler> where TNodeHandler : class where TChainHandler : class
{
private readonly INodeHandler<TNodeHandler, TChainHandler> _handler;
public AdditionalSymbolsHandler(INodeHandler<TNodeHandler, TChainHandler> handler) => _handler = handler;
public bool IsUniqueGlobally => false;
public bool SupportsInPlaceUpdate => true;
public IEnumerable<(string identifierValue, NodeLocation<string> location, bool isMainIdentifier)> ExtractValues(string key, TNodeHandler data, NodeLocation<string> loc)
{
foreach (var symbol in _handler.GetAdditionalSymbols(data))
if (!string.IsNullOrEmpty(symbol)) yield return (symbol, loc, false);
}
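public TNodeHandler UpdateValue(string key, TNodeHandler data, string oldValue, string newValue, bool isMain)
{
var symbols = _handler.GetAdditionalSymbols(data).ToList();
int index = symbols.IndexOf(oldValue);
// IndexOf returns -1 when oldValue is absent; indexing with it would throw ArgumentOutOfRangeException.
if (index < 0) return data;
symbols[index] = newValue;
_handler.SetAdditionalSymbols(data, symbols);
return data;
}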
}
private class ReferenceUpdater<TNodeHandler, TChainHandler> : GraphFoundation.Normalization.IReferenceFieldUpdater<TNodeHandler> where TNodeHandler : class where TChainHandler : class
{
private readonly INodeHandler<TNodeHandler, TChainHandler> _handler;
public ReferenceUpdater(INodeHandler<TNodeHandler, TChainHandler> handler) => _handler = handler;
public void UpdateReferences(TNodeHandler data, Func<string?, string?> resolve) => _handler.UpdateReferences(data, (refKey, node) => resolve(refKey) ?? refKey);
}
}
public class NodeHandlerAdapter<TNode, TChain> : INodeHandler<TNode, TChain> where TNode : class where TChain : class
{
private readonly INodeHandler<TNode, TChain> _innerHandler;
public NodeHandlerAdapter(INodeHandler<TNode, TChain> innerHandler) => _innerHandler = innerHandler ?? throw new ArgumentNullException(nameof(innerHandler));
public string GetKey(TNode node) => _innerHandler.GetKey(node) ?? string.Empty;
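// Forward to the wrapped handler. With empty setter bodies, the renames computed by the engine
// (for example through KeyHandler.UpdateValue) were never written back to the nodes.
public void SetKey(TNode node, string key) => _innerHandler.SetKey(node, key);
public string GetSymbol(TNode node) => _innerHandler.GetSymbol(node) ?? string.Empty;
public void SetSymbol(TNode node, string symbol) => _innerHandler.SetSymbol(node, symbol);
public IEnumerable<string> GetAdditionalSymbols(TNode node) => _innerHandler.GetAdditionalSymbols(node) ?? Enumerable.Empty<string>();
public void SetAdditionalSymbols(TNode node, IEnumerable<string> symbols) => _innerHandler.SetAdditionalSymbols(node, symbols);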
public TNode CloneNode(TNode node) => node;
public TChain CloneChain(TChain chain) => _innerHandler.CloneChain(chain);
public void UpdateReferences(TNode node, Func<string, TNode, string> resolve) => _innerHandler.UpdateReferences(node, resolve);
public IList<TNode> GetNodes(TChain chain) => _innerHandler.GetNodes(chain);
public void SetNodes(TChain chain, IList<TNode> nodes) => _innerHandler.SetNodes(chain, nodes);
public TNode GetStartNode(TChain chain) => _innerHandler.GetStartNode(chain);
public void SetStartNode(TChain chain, TNode node) => _innerHandler.SetStartNode(chain, node);
public TNode GetEndNode(TChain chain) => _innerHandler.GetEndNode(chain);
public void SetEndNode(TChain chain, TNode node) => _innerHandler.SetEndNode(chain, node);
}
}
#endregion