马尔可夫转换场 python 实践

研究了一下将一维信号转成二维图像的方法，这里对马尔可夫转换场单独整理一下，方便以后查阅。

马尔可夫转换场 python 实践

数据概览
马尔可夫转换场（MTF）
- [MTF 概述](#MTF 概述)
- [MTF 详细计算过程](#MTF 详细计算过程)
- - [Step 1. 数据离散化](#Step 1. 数据离散化)
  - [Step 2. 建立马尔可夫转换矩阵](#Step 2. 建立马尔可夫转换矩阵)
  - [Step 3. 计算转换概率](#Step 3. 计算转换概率)
  - [Step 4. 计算马尔可夫转换场(MTF)](#Step 4. 计算马尔可夫转换场(MTF))
  - [Step 5. MTF聚合压缩](#Step 5. MTF聚合压缩)
  - [Step 6. 提取一些有意义的信息和指标](#Step 6. 提取一些有意义的信息和指标)
  - [Step 7. 转换概率映射回原始信号](#Step 7. 转换概率映射回原始信号)
网络图的讨论
- 概述
- 网络图构建的详细过程
- - [Step 1. 构建网络图](#Step 1. 构建网络图)
  - [Step 2. 计算分区和模块用来表示图](#Step 2. 计算分区和模块用来表示图)
  - [Step 3. 画出新图](#Step 3. 画出新图)
  - [Step 4. 将分区颜色映射回原来的时序](#Step 4. 将分区颜色映射回原来的时序)
Bug修改
- [module 'matplotlib.cm' has no attribute 'get_cmap'](#module 'matplotlib.cm' has no attribute 'get_cmap')
- [module 'networkx' has no attribute 'from_numpy_matrix'](#module 'networkx' has no attribute 'from_numpy_matrix')
参考资料

马尔可夫转移场（MTF）是一种突出时间序列行为的可视化技术。这里深入探讨了如何构建和解释这些领域。

然后，在马尔可夫转换场的基础上进一步探索网络图的解释。

python 复制代码

import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import sys

from matplotlib import gridspec
from numba import njit, prange
from pyts.image import MarkovTransitionField

import tsia.plot
import tsia.markov
import tsia.network_graph

python 复制代码

%matplotlib inline
plt.style.use('Solarize_Light2')

数据概览

使用的数据全部采集自工业设备，时间跨度为一年，采集频率为一个小时。

加载数据

python 复制代码

DATA = "/home/torch/LiangGuangjin/mtf-deep-dive-main/data/"
tag_df = pd.read_csv(os.path.join(DATA, 'signal-1.csv'))
tag_df['timestamp'] = pd.to_datetime(tag_df['timestamp'], format="%Y-%m-%d %H:%M:%S")
tag_df = tag_df.set_index('timestamp')

将数据以波形图的形式绘制

python 复制代码

fig = plt.figure(figsize=(28,4))
plt.plot(tag_df, linewidth=0.5)
plt.show()

马尔可夫转换场（MTF）

MTF 概述

pyts 软件包包含一些开箱即用的时间序列成像功能，包括 MTF：通过这种转换，之前的信号可以用下面的矩阵来表示：

计算马尔可夫转换场

python 复制代码

n_bins = 8
strategy = 'quantile'
X = tag_df.values.reshape(1, -1)
n_samples, n_timestamps = X.shape

mtf = MarkovTransitionField(image_size=0.01, n_bins=n_bins, strategy=strategy)
tag_mtf = mtf.fit_transform(X)

绘制马尔可夫转换场

python 复制代码

fig = plt.figure(figsize=(5,4))
ax = fig.add_subplot(111)
_, mappable_image = tsia.plot.plot_markov_transition_field(mtf=tag_mtf[0], ax=ax, reversed_cmap=True)
plt.colorbar(mappable_image)

MTF 详细计算过程

下面我们将详细拆解一下MTF，以便深入理解MTF的特性从而进一步洞察时序数据。

我们将采用如下步骤：

将时序数据离散化
构建马尔可夫转换矩阵
估算转换概率
计算马尔可夫转换场
计算聚合的MTF
提取有意义的指标
将转换概率映射回初始信号

Step 1. 数据离散化

python 复制代码

X_binned, bin_edges = tsia.markov.discretize(tag_df)

经此运算，时间序列的每个值 x [ i ] x[i] x[i] 都与刚刚计算出的一个分段 b i n bin bin （量化 q u a n t i l e quantile quantile）相关联。

看一下 X _ b i n n e d X\_binned X_binned 的内容：

python 复制代码

unique, counts = np.unique(X_binned, return_counts=True)
print(np.asarray((unique, counts)).T)

> [[  0 552]
>  [  1 552]
>  [  2 552]
>  [  3 552]
>  [  4 552]
>  [  5 552]
>  [  6 552]
>  [  7 552]]

整个信号的长度为 4416 4416 4416，而 552 × 8 = 4416 552\times 8=4416 552×8=4416，这也就说明离散化就是将 X X X 分成了 Q Q Q 个分位数段。

看一下 b i n _ e d g e s bin\_edges bin_edges 的内容：

python 复制代码

bin_edges

> [ 82.4318845 , 118.41579271, 137.42079667, 156.7783225 ,
>   166.35528917, 175.224915  , 183.85208333, 196.53184021,
>   255.378145  ]

第一个和最后一个bin edges分别对应信号的最低值和最高值。中间的bin edges由离散器计算。对于之前加载的信号，量化过程会将该信号的不同取值分入以下bins：

Bin	Start	End
0	< 82.4	82.4
1	82.4	118.4
2	118.4	137.4
3	137.4	156.8
4	156.8	166.4
5	166.4	175.2
6	175.2	183.9
7	183.9	196.5
8	196.5	255.3
9	255.3	> 255.3

在时间序列上绘制这些分段，以直观地了解数值是如何离散的：

python 复制代码

tsia.plot.plot_timeseries_quantiles(tag_df, bin_edges, label='signal-1')
plt.legend()

Step 2. 建立马尔可夫转换矩阵

python 复制代码

X_mtm = tsia.markov.markov_transition_matrix(X_binned)

> array([[465,  86,   1,   0,   0,   0,   0,   0],
>        [ 80, 405,  63,   2,   0,   2,   0,   0],
>        [  3,  59, 379,  96,   9,   2,   2,   1],
>        [  2,   2,  94, 352,  75,  19,   6,   2],
>        [  0,   0,  12,  89, 314, 110,  23,   4],
>        [  0,   0,   0,   9, 125, 312,  86,  20],
>        [  2,   0,   2,   4,  21,  89, 320, 114],
>        [  0,   0,   0,   0,   8,  18, 115, 411]])

简单解读一下，第一个单元的 465 465 465表示有 465 465 465个时间点上从 b i n 0 bin 0 bin0到 b i n 0 bin 0 bin0， b i n 0 bin 0 bin0代表 82.4 82.4 82.4到 118.4 118.4 118.4的区间中；

第一行第二个单元格中的 86 86 86表示有 86 86 86个时间点从 b i n 0 bin 0 bin0转换到了 b i n 1 bin 1 bin1, b i n 1 bin 1 bin1代表 118.4 118.4 118.4到 137.4 137.4 137.4的区间。

其他单元格的意义以此类推。显然对角线上的数值表示了自我转换的计数。

Step 3. 计算转换概率

现在将对每个 b i n bin bin 进行标准化。该矩阵现在包含幅度轴上的转移概率：

python 复制代码

X_mtm = tsia.markov.markov_transition_probabilities(X_mtm)
np.round(X_mtm * 100, 1)

> array([[84.2, 15.6,  0.2,  0. ,  0. ,  0. ,  0. ,  0. ],
>       [14.5, 73.4, 11.4,  0.4,  0. ,  0.4,  0. ,  0. ],
>       [ 0.5, 10.7, 68.8, 17.4,  1.6,  0.4,  0.4,  0.2],
>       [ 0.4,  0.4, 17. , 63.8, 13.6,  3.4,  1.1,  0.4],
>       [ 0. ,  0. ,  2.2, 16.1, 56.9, 19.9,  4.2,  0.7],
>       [ 0. ,  0. ,  0. ,  1.6, 22.6, 56.5, 15.6,  3.6],
>       [ 0.4,  0. ,  0.4,  0.7,  3.8, 16.1, 58. , 20.7],
>       [ 0. ,  0. ,  0. ,  0. ,  1.4,  3.3, 20.8, 74.5]])

规范化数值后得到转化概率。

以第二行为例， 14.5 14.5 14.5 代表有 14.5 % 14.5\% 14.5%的时间点从 b i n 1 bin 1 bin1 转化到了 b i n 0 bin 0 bin0 ; 11.4 11.4 11.4代表有 11.4 11.4% 11.4 的时间点从 b i n 1 bin 1 bin1 转化到了 b i n 2 bin 2 bin2。其他以此类推。

Step 4. 计算马尔可夫转换场(MTF)

转移场的思想是顺序表示马尔可夫转移概率以保留时域中的信息。 MTF 生成过程沿时间顺序排列每个概率，以构建整个信号的 MTF：

python 复制代码

def _markov_transition_field(X_binned, X_mtm, n_timestamps, n_bins):
    X_mtf = np.zeros((n_timestamps, n_timestamps))
 
    # We loop through each timestamp twice to build a N x N matrix:
    for i in prange(n_timestamps):
        for j in prange(n_timestamps):
            # We align each probability along the temporal order:
            # MTF(i,j) denotes the transition probability of the bin
            # "i" to the bin "j":
            X_mtf[i, j] = X_mtm[X_binned[i], X_binned[j]]
 
    return X_mtf

X_mtf = _markov_transition_field(X_binned, X_mtm, n_timestamps, n_bins)
np.round(X_mtf * 100, 1)

> array([[68.8, 68.8, 68.8, ..., 68.8, 68.8, 68.8],
>        [68.8, 68.8, 68.8, ..., 68.8, 68.8, 68.8],
>        [68.8, 68.8, 68.8, ..., 68.8, 68.8, 68.8],
>        ...,
>        [68.8, 68.8, 68.8, ..., 68.8, 68.8, 68.8],
>        [68.8, 68.8, 68.8, ..., 68.8, 68.8, 68.8],
>        [68.8, 68.8, 68.8, ..., 68.8, 68.8, 68.8]])

X_mtf.shape

> (4416, 4416)

MTF主要是在保留时间域信息的情况下顺序表示马尔可夫转换概率。注意MTF是以时间为下标的。我们来解读一下MTF中的数值。例如，MTF[1,2] = 68.8%，代表时间点1和2。我们查一下这两个时间点位于哪个分位数或者说位于哪个bin中：

python 复制代码

(X_binned[1],X_binned[2])

> (2, 2)

这两点都位于bin 2中，也就是说从 b i n 2 bin 2 bin2到 b i n 2 bin 2 bin2的概率为 68.8 % 68.8\% 68.8%。

python 复制代码

(X_mtf[1,6],X_binned[6])

> (0.10707803992740472, 1)

时序点 6 6 6所处的分位桶或者说状态是 1 1 1，所以MTF表明时序点 1 1 1到 6 6 6也就是状态从 2 2 2到 1 1 1的转换概率为 10.7 % 10.7\% 10.7%.

下面画一下完整的MTF的图像:

python 复制代码

fig = plt.figure(figsize=(15,12))
ax = fig.add_subplot(1,1,1)
_, mappable_image = tsia.plot.plot_markov_transition_field(mtf=X_mtf, ax=ax, reversed_cmap=True)
plt.colorbar(mappable_image)

Step 5. MTF聚合压缩

为了使图像大小易于管理，计算效率更高（上述 MTF 矩阵的维数为 4116 x 4116），我们使用模糊核 1 / m 2 1/m² 1/m2 对每个非重叠 m × m m \times m m×m 补丁中的像素进行平均，从而减小 MTF 的大小。 m m m 是图像大小，我们将设置为 48 48 48。也就是说，我们将长度为 m = 48 m = 48 m=48 的每个子序列中的过渡概率汇总在一起。让我们来计算相应的聚合 MTF：如果信号长度可以被选定的图像大小（48）整除（这里就是这种情况），我们就可以用这个简单的函数来计算聚合 MTF：

python 复制代码

image_size = 48
window_size, remainder = divmod(n_timestamps, image_size)

if remainder == 0:
    X_amtf = np.reshape(
        X_mtf, (image_size, window_size, image_size, window_size)
    ).mean(axis=(1, 3))
    
else:
    # Needs to compute piecewise aggregate approximation in this case. This
    # is fully implemented in the pyts package
    pass

python 复制代码

fig = plt.figure(figsize=(5,4))
ax = fig.add_subplot(1,1,1)
_, mappable_image = tsia.plot.plot_markov_transition_field(mtf=X_amtf, ax=ax, reversed_cmap=True)
plt.colorbar(mappable_image)

Step 6. 提取一些有意义的信息和指标

MTF 的对角线包含自转移的概率：

自转移概率是在下一时间步从一个分位数移动到相同分位数的概率）。
我们可以提取该分布的特征（平均值和标准差）。
MTF 的其他对角线较难解释，但仍然可以绘制。

python 复制代码

_ = tsia.plot.plot_mtf_metrics(X_amtf)

Step 7. 转换概率映射回原始信号

采用 MTF 对角线上显示的转移概率将这些概率映射回原始信号的一种方法：

python 复制代码

mtf_map = tsia.markov.get_mtf_map(tag_df, X_amtf, reversed_cmap=True)
_ = tsia.plot.plot_colored_timeseries(tag_df, mtf_map)

让我们看一下粗略的 MTF（image_size=8 会使其变得相当粗略），以更好地理解我们可以从这个马尔可夫场中提取什么：

python 复制代码

def plot_colored_timeseries(tag, image_size=96, colormap='jet'):
    # Loads the signal from disk:
    tag_df = pd.read_csv(os.path.join(DATA, f'{tag}.csv'))
    tag_df['timestamp'] = pd.to_datetime(tag_df['timestamp'], format='%Y-%m-%d %H:%M:%S')
    tag_df = tag_df.set_index('timestamp')

    # Build the MTF for this signal:
    X = tag_df.values.reshape(1, -1)
    mtf = MarkovTransitionField(image_size=image_size, n_bins=n_bins, strategy=strategy)
    tag_mtf = mtf.fit_transform(X)

    # Initializing figure:
    fig = plt.figure(figsize=(28, 4))
    gs = gridspec.GridSpec(1, 2, width_ratios=[1,4])

    # Plotting MTF:
    ax = fig.add_subplot(gs[0])
    ax.set_title('Markov transition field')
    _, mappable_image = tsia.plot.plot_markov_transition_field(mtf=tag_mtf[0], ax=ax, reversed_cmap=True)
    plt.colorbar(mappable_image)
    
    # Plotting signal:
    ax = fig.add_subplot(gs[1])
    ax.set_title(f'Signal timeseries for tag {tag}')
    mtf_map = tsia.markov.get_mtf_map(tag_df, tag_mtf[0], reversed_cmap=True, step_size=0)
    _ = tsia.plot.plot_colored_timeseries(tag_df, mtf_map, ax=ax)
        
    return tag_mtf

python 复制代码

stats = []
mtf = plot_colored_timeseries('signal-1', image_size=8)
s = tsia.markov.compute_mtf_statistics(mtf[0])
s.update({'Signal': 'signal-1'})
stats.append(s)

我们该如何解释这幅图呢？

平均而言，第一部分（黄色）的平均转换概率约为 19%：这意味着，当我们观察整个信号时，我们在该部分看到的转换并不频繁（19% 的时间）。
相比之下，深蓝色部分（第六个部分）的转换在该信号中发生的频率更高（约为 50%）。

网络图的讨论

概述

从MTF中我们可以生成图 G = ( V , E ) G=(V,E) G=(V,E)，节点 V V V 和时间 i i i 是一一对应的关系。我们有两种思路来对这里面的信息进行编码：

Flow encoding： 这种表示法有助于观察大信息流发生在哪里。将时间流映射到顶点，使用从 T 0 T_0 T0 到 T N T_N TN 的颜色梯度为网络图的每个节点着色，并使用 MTF 权重为顶点之间的边着色。
Modularity encoding： 模块化是网络分析中识别特定局部结构的重要模式。我们将模块标签（即社区 ID）映射到每个顶点，并为每个社区附加特定颜色，将顶点的大小映射到聚类系数，并将边的颜色映射到目标顶点的模块标签。

网络图构建的详细过程

下面我们将分解建立图分析的步骤，以求更好地理解其特性同时更好的洞察时序数据中所蕴涵的信息。总共分位四步：

构建信号 MTF 并从中提取网络图
计算图分区和模块并将其编码用来表示图
绘制网络图
将分区颜色映射回时间序列

Step 1. 构建网络图

python 复制代码

G = tsia.network_graph.get_network_graph(tag_mtf[0])
_ = tsia.plot.plot_network_graph(G, title='Network graph')

Step 2. 计算分区和模块用来表示图

上面的网络图并没有给我们带来很多新的见解。我们将使用 Louvain 方法搜索该网络图中的社区：

python 复制代码

encoding = tsia.network_graph.get_modularity_encoding(G)

除了模块化和社群（或分区）数量外，我们还可以从网络图中计算出其他相关统计数据：

python 复制代码

stats = tsia.network_graph.compute_network_graph_statistics(G)
stats

> {
>     'Diameter': 1,
>     'Average degree': 48.0,
>     'Average weighted degree': 46.273244774344036,
>     'Density': 1.0425531914893618,
>     'Average path length': 0.00650027056715701,
>     'Average clustering coefficient': 0.10427714321139174,
>     'Modularity': 0.3747117973757859,
>     'Partitions': 4
> }

Step 3. 画出新图

python 复制代码

nb_partitions = stats['Partitions']
modularity = stats['Modularity']
title = rf'Partitions: $\bf{nb_partitions}$ - Modularity: $\bf{modularity:.3f}$'
_ = tsia.plot.plot_network_graph(G, title=title, encoding=encoding)

社区发现算法在这个信号中抽象出了四个模块或者分区。但我们还不清楚这个新信息对我们有什么用。

Step 4. 将分区颜色映射回原来的时序

python 复制代码

ng_map = tsia.network_graph.get_network_graph_map(tag_df, encoding, reversed_cmap=True)
_ = tsia.plot.plot_colored_timeseries(tag_df, ng_map)

图中所发现的分区在原始时序中都有对应的子序列，并且该子序列具备相对显著的特点，在时序分析中称为 shapelet。

下面画出了其他信号的类似表示：

python 复制代码

def plot_communities_timeseries(tag, image_size=48, colormap='jet'):
    # Loads the signal from disk:
    tag_df = pd.read_csv(os.path.join(DATA, f'{tag}.csv'))
    tag_df['timestamp'] = pd.to_datetime(tag_df['timestamp'], format='%Y-%m-%d %H:%M:%S')
    tag_df = tag_df.set_index('timestamp')
    
    X = tag_df.values.reshape(1, -1)
    mtf = MarkovTransitionField(image_size=image_size, n_bins=n_bins, strategy=strategy)
    tag_mtf = mtf.fit_transform(X)
    
    G = tsia.network_graph.get_network_graph(tag_mtf[0])
    statistics = tsia.network_graph.compute_network_graph_statistics(G)
    nb_partitions = statistics['Partitions']
    modularity = statistics['Modularity']
    encoding = tsia.network_graph.get_modularity_encoding(G, reversed_cmap=True)
    ng_map = tsia.network_graph.get_network_graph_map(tag_df, encoding, reversed_cmap=True)
    
    fig = plt.figure(figsize=(28, 4))
    gs = gridspec.GridSpec(1, 2, width_ratios=[1,4])

    ax = fig.add_subplot(gs[0])
    title = rf'Partitions: $\bf{nb_partitions}$ - Modularity: $\bf{modularity:.3f}$'
    tsia.plot.plot_network_graph(G, ax=ax, title=title, reversed_cmap=True, encoding=encoding)
    
    ax = fig.add_subplot(gs[1])
    tsia.plot.plot_colored_timeseries(tag_df, ng_map, ax=ax)
    
    return statistics

python 复制代码

signals = [f'signal-{i}' for i in range(1,7)]

stats = []
for signal in signals:
    s = plot_communities_timeseries(signal)
    s.update({'Signal': signal})
    stats.append(s)

Bug修改

module 'matplotlib.cm' has no attribute 'get_cmap'

直接追到错误出现的地方：

python 复制代码

if reversed_cmap == True:
    colormap = plt.cm.get_cmap(colormap).reversed()
else:
    colormap = plt.cm.get_cmap(colormap)

修改上述代码为下述模样;

python 复制代码

if reversed_cmap == True:
    colormap = plt.get_cmap(colormap).reversed()
else:
    colormap = plt.get_cmap(colormap)

module 'networkx' has no attribute 'from_numpy_matrix'

在networkx3.3 中**from_numpy_matrix()**函数已经被弃用，需要更改为from_numpy_array()

参考资料

Advanced visualization techniques for time series analysis

https://github.com/michaelhoarau/mtf-deep-dive/tree/main