Using the Information Theoretical Estimators (ITE) Toolbox (MATLAB)

What is the Information Theoretical Estimators (ITE) Toolbox?

ITE can estimate several entropy, mutual information, divergence, association measures, cross quantities and kernels on distributions. Thanks to its highly modular design, ITE additionally supports (i) the combination of estimation techniques, (ii) the easy construction and embedding of novel information theoretical estimators, and (iii) their immediate application in information theoretical optimization problems.

ITE is

(i) written in Matlab/Octave,

(ii) multi-platform (tested extensively on Windows and Linux),

(iii) free and open source (released under the GNU GPLv3(>=) license).

ITE offers

(i) solvers for Independent Subspace Analysis (ISA), and its extensions to different linear-, controlled-, post nonlinear-, complex valued-, partially observed models, as well as to systems with nonparametric source dynamics.

(ii) several consistency tests (analytical vs estimated value),

(iii) illustrations for information theoretical image registration and distribution regression [supervised entropy learning and aerosol prediction based on multispectral satellite images].

For further details, see "https://bitbucket.org/szzoli/ite-in-python/" (Python: new), "https://bitbucket.org/szzoli/ite/" (Matlab/Octave)

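In short, it is a toolbox for estimating entropy, mutual information and related quantities from samples.

I came across it because a reviewer repeatedly asked me to compute information entropy with this tool, so it presumably carries some authority.

My earlier experiments were all done in MATLAB, so I tried the MATLAB version first. For reasons I could not identify, the results differed greatly from what I expected, and I eventually gave up. What follows is a record of those attempts and my own interpretation; treat it with caution.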

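How to install the Information Theoretical Estimators (ITE) Toolbox

The toolbox can be installed with the install.m file in the code directory. (This did not work for me: it kept raising errors partway through, for reasons I could not determine, so I gave up on it.) The manual steps I used instead:

(i) Download the zip archive and extract it. Because my earlier experiments were written in MATLAB, I chose the MATLAB version: I downloaded the zip from "https://bitbucket.org/szzoli/ite/" (Matlab/Octave) and extracted it into the folder where I keep MATLAB code on my machine.

(ii) Set the path, save, and close. Click "Set Path" on the "Home" tab. Because ITE has subfolders, choose "Add with Subfolders", then select the folder where ITE was extracted. Save, then close.

(iii) Update the toolbox path cache: "Preferences" > "General" > "Update Toolbox Path Cache".

(iv) Test that the toolbox is usable. A simple check is to see whether the following function, which belongs to the ITE toolbox, can be found: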

help D_initialization
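
The path setup and the check can also be done from the command line instead of through the Set Path dialog. The following is a minimal sketch, assuming the archive was extracted to a folder named ITE under the current directory; the variable names `ite_root` and `ite_available` and that location are hypothetical and need adjusting to your machine.

```matlab
% Hypothetical location of the extracted ITE archive; adjust as needed.
ite_root = fullfile(pwd, 'ITE');

if exist(ite_root, 'dir') == 7
    % Add the folder and all of its subfolders to the search path.
    addpath(genpath(ite_root));
    rehash;   % refresh the list of files known on the path
end

% exist(...,'file') returns 2 when an .m file of that name is on the path.
ite_available = (exist('D_initialization', 'file') == 2);
if ite_available
    disp('ITE found on the MATLAB path.');
else
    disp('ITE not found; check the folder passed to addpath.');
end
```

Calling `savepath` afterwards keeps the change across sessions, which corresponds to the "Save" button in the dialog.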

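A more official check is to follow the user manual: download the PDF that matches your version (for the version I downloaded, it was the first PDF listed). The manual is linked from the toolbox site.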

Using the ITE toolbox

KGV

IKGV_estimation

IKGV_estimation (the KGV method in the PDF manual) is a mutual information estimator based on the Gaussian distribution.

It should correspond to $I(X;Y)$.

%function [I] = IKGV_estimation(Y,ds,co)
%Estimates mutual information (I) using the KGV (kernel generalized variance) method. 
%
%We use the naming convention 'I<name>_estimation' to ease embedding new mutual information estimation methods.
%
%INPUT:
%   Y: Y(:,t) is the t^th sample.
%  ds: subspace dimensions. ds(m) = dimension of the m^th subspace, m=1,...,M (M=length(ds)).
%  co: mutual information estimator object.
%
%REFERENCE:
%   Zoltan Szabo, Barnabas Poczos, Andras Lorincz. Undercomplete Blind Subspace Deconvolution. Journal of Machine Learning Research 8(May):1063-1095, 2007. (multidimensional case, i.e., ds(j)>=1)
%   Francis Bach, Michael I. Jordan. Kernel Independent Component Analysis. Journal of Machine Learning Research, 3: 1-48, 2002. (one-dimensional case, i.e., ds(1)=ds(2)=...=ds(end)=1)

Initialization before computing the mutual information

 function [co] = IKGV_initialization(mult)
 function [co] = IKGV_initialization(mult,post_init)
 Initialization of the KGV (kernel generalized variance) mutual information estimator.
 
 Note:
    1)The estimator is treated as a cost object (co).
    2)We use the naming convention 'I<name>_initialization' to ease embedding new mutual information estimation methods.
 
 INPUT:
    mult: is a multiplicative constant relevant (needed) in the estimation; '=1' means yes (='exact' estimation), '=0' no (=estimation up to 'proportionality').
    post_init: {field_name1,field_value1,field_name2,field_value2,...}; cell array containing the names and the values of the cost object fields that are to be used
    (instead of their default values). For further details, see 'post_initialization.m'.
 OUTPUT:
    co: cost object (structure).

To compute $I(\mathbf{A};\mathbf{B})-I(\mathbf{B};\mathbf{E})$, first generate three two-dimensional variables according to the definitions of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{E}$. Statistically, $\mathbf{A} \sim \mathcal{CN}(0,\delta_a^2)$, $\mathbf{B} \sim \mathcal{CN}(0,\delta_b^2)$ and $\mathbf{E} \sim \mathcal{CN}(0,\delta_e^2)$, and the three variables are related to one another.

For example, suppose MATLAB has generated 5000 samples each of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{E}$, and we want $I(\mathbf{A};\mathbf{B})$ and $I(\mathbf{B};\mathbf{E})$, i.e.

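% Note: ds=[1;1] fits A, B, E stored as 1 x T rows; if each is stored as a
% 2 x T matrix (real and imaginary parts), ds=[2;2] would be needed instead.
ds=[1;1];        % subspace dimensions; sum(ds) should equal the number of rows of Y
mult=1;          % 1 = 'exact' estimation, 0 = estimation up to proportionality
Y_1=[A;B];       % samples are columns: Y_1(:,t) is the t-th joint sample of (A,B)
Y_2=[B;E];       % joint samples of (B,E)
co_1=IKGV_initialization(mult);
co_2=IKGV_initialization(mult);
I_1=IKGV_estimation(Y_1,ds,co_1);   % estimate of I(A;B)
I_2=IKGV_estimation(Y_2,ds,co_2);   % estimate of I(B;E)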
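
The snippet above assumes that A, B and E already exist. As a rough sketch of how such inputs could be built, the block below generates real-valued toy data and calls the two IKGV functions only if they are on the path. The linear relations between A, B and E, the noise levels and the seed are hypothetical choices made for illustration; the source only states that the variables are related.

```matlab
% Assumption: real-valued toy data; the linear relations below are hypothetical.
rng(0);                        % fixed seed for reproducibility
T = 5000;                      % number of samples
A = randn(1, T);               % 1 x T row: A(:,t) is the t-th sample
B = A + 0.5 * randn(1, T);     % B depends on A
E = B + 2.0 * randn(1, T);     % E is a noisier copy of B

Y_1 = [A; B];                  % 2 x T, two subspaces of dimension 1
Y_2 = [B; E];
ds = [1; 1];
assert(sum(ds) == size(Y_1, 1));   % ds should account for every row of Y

% Call ITE only if it is on the path, so the sketch also runs without it.
if exist('IKGV_initialization', 'file') == 2 && exist('IKGV_estimation', 'file') == 2
    co = IKGV_initialization(1);               % mult = 1
    I_1 = IKGV_estimation(Y_1, ds, co);        % estimate of I(A;B)
    I_2 = IKGV_estimation(Y_2, ds, co);        % estimate of I(B;E)
    fprintf('I_1 - I_2 = %g\n', I_1 - I_2);
end
```

Since E is built here as a degraded copy of B, one would expect the difference to reflect how much more B shares with A than with E, but the numeric result depends on the estimator and its settings.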

Estimation of Conditional Quantities

Conditional entropy $H(X|Y)$

The general identity is: $H(X|Y)=H(X,Y)-H(Y)$
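
To make the identity concrete, it can be checked in closed form for a real bivariate Gaussian, where every term has a known expression. This is only an illustrative sketch: the standard deviations and the correlation are made-up values, and no ITE function is involved.

```matlab
% Closed-form check of H(X|Y) = H(X,Y) - H(Y) for a real bivariate Gaussian.
% All numbers below are illustrative assumptions.
sx = 1.0;  sy = 2.0;  rho = 0.6;              % std. deviations and correlation
Sigma = [sx^2, rho*sx*sy; rho*sx*sy, sy^2];   % joint covariance of (X, Y)

H_XY = 0.5 * log((2*pi*exp(1))^2 * det(Sigma));   % joint entropy, nats
H_Y  = 0.5 * log(2*pi*exp(1) * sy^2);             % marginal entropy, nats
H_X_given_Y = H_XY - H_Y;                         % the identity above

% Direct formula: X given Y is Gaussian with variance sx^2 * (1 - rho^2).
H_direct = 0.5 * log(2*pi*exp(1) * sx^2 * (1 - rho^2));
fprintf('via identity: %.6f, direct: %.6f\n', H_X_given_Y, H_direct);
```

Analytic values of this kind are useful as a reference when judging whether an estimated conditional entropy is plausible.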

As above, generate three two-dimensional variables according to the definitions of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$. Statistically,
$\mathbf{A} \sim \mathcal{CN}(0,\delta_a^2)$, $\mathbf{B} \sim \mathcal{CN}(0,\delta_b^2)$ and $\mathbf{C} \sim \mathcal{CN}(0,\delta_c^2)$, and the three variables are related to one another.

For example, suppose MATLAB has generated 5000 samples each of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$, and we want $H(\mathbf{A}|\mathbf{B})$ and $H(\mathbf{B}|\mathbf{C})$, i.e. condHShannon_HShannon_estimation(A,B,co) and condHShannon_HShannon_estimation(B,C,co).
ds=[2;2]
First stack the data of $\mathbf{A}$ and $\mathbf{B}$ (giving a 4x5000 matrix Y_1) to obtain $H(\mathbf{A}|\mathbf{B})$;
then stack the data of $\mathbf{B}$ and $\mathbf{C}$ (giving a 4x5000 matrix Y_2) to obtain $H(\mathbf{B}|\mathbf{C})$.
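
A sketch of this procedure is given below, using the two-argument call form quoted in the text. Several things here are assumptions: the toy relations between A, B and C, the helper names `cn` and `stack`, and the initializer name `condHShannon_HShannon_initialization`, which is inferred from the toolbox's '<name>_initialization' naming convention and should be verified against the manual. The ITE calls are guarded so the block also runs when the toolbox is absent.

```matlab
% Assumption: toy complex Gaussian data stacked as [real; imag] (2 x T each).
rng(1);
T = 5000;
cn = @(v) sqrt(v/2) * (randn(1, T) + 1i * randn(1, T));  % samples of CN(0, v)
stack = @(z) [real(z); imag(z)];                         % 1 x T complex -> 2 x T real

a = cn(1.0);
b = a + cn(0.5);      % hypothetical relation between A and B
c = b + cn(2.0);      % hypothetical relation between B and C
A = stack(a);  B = stack(b);  C = stack(c);

% Initializer name is assumed from the '<name>_initialization' convention.
have_ite = exist('condHShannon_HShannon_initialization', 'file') == 2 && ...
           exist('condHShannon_HShannon_estimation', 'file') == 2;
if have_ite
    co = condHShannon_HShannon_initialization(1);              % mult = 1
    H_A_given_B = condHShannon_HShannon_estimation(A, B, co);  % H(A|B)
    H_B_given_C = condHShannon_HShannon_estimation(B, C, co);  % H(B|C)
    fprintf('H(A|B) = %g, H(B|C) = %g\n', H_A_given_B, H_B_given_C);
end
```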

Conditional mutual information $I(X;Y|Z)$

The general identity is: $I(X;Y|Z)=-H(X,Y,Z)+H(X,Z)+H(Y,Z)-H(Z)$

For example, suppose MATLAB has generated 5000 samples each of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$.

ds=[2;2;2]

Stack the data of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ (giving a 6x5000 matrix Y) to obtain $I(\mathbf{A};\mathbf{B}|\mathbf{C})$, i.e. condIShannon_HShannon_estimation(Y,ds,co).
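
The stacking step and the call can be sketched as follows. The toy data and its dependence structure are hypothetical, the initializer name `condIShannon_HShannon_initialization` is inferred from the naming convention rather than confirmed by the text, and the reading that the last block of ds is the conditioning variable follows the [A;B;C] ordering used above and should be checked against the manual.

```matlab
% Assumption: toy 2 x T real data; the dependence structure is hypothetical.
rng(2);
T = 5000;
A = randn(2, T);
C = randn(2, T);
B = A + C + 0.5 * randn(2, T);   % B depends on both A and C

Y = [A; B; C];                   % 6 x T: Y(:,t) is the t-th joint sample
ds = [2; 2; 2];                  % three blocks; C is assumed to be the conditioning one
assert(sum(ds) == size(Y, 1));

% Initializer name is assumed from the '<name>_initialization' convention.
have_ite = exist('condIShannon_HShannon_initialization', 'file') == 2 && ...
           exist('condIShannon_HShannon_estimation', 'file') == 2;
if have_ite
    co = condIShannon_HShannon_initialization(1);                 % mult = 1
    I_AB_given_C = condIShannon_HShannon_estimation(Y, ds, co);   % I(A;B|C)
    fprintf('I(A;B|C) = %g\n', I_AB_given_C);
end
```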

Conditional entropy of complex Gaussian variables

Suppose X and Y follow $\mathcal{CN}(0,\sigma_X^2)$ and $\mathcal{CN}(0,\sigma_Y^2)$ respectively. That is, the real and imaginary parts of X are each Gaussian with mean 0 and variance $\frac{\sigma_X^2}{2}$, and the real and imaginary parts of Y are each Gaussian with mean 0 and variance $\frac{\sigma_Y^2}{2}$. X and Y are therefore two-dimensional variables.
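
The variance split between the real and imaginary parts can be checked numerically with base MATLAB. This is a small sketch with an illustrative value for the total variance; the variable names are hypothetical. It also shows the 2 x T real representation that a real-valued estimator would need.

```matlab
rng(3);
T = 100000;
sigma2_X = 4.0;                  % illustrative total variance of X (assumption)

% X ~ CN(0, sigma2_X): independent real and imaginary parts, each N(0, sigma2_X/2).
X = sqrt(sigma2_X / 2) * (randn(1, T) + 1i * randn(1, T));
X_real = [real(X); imag(X)];     % 2 x T real representation of the complex samples

var_re = var(real(X));           % expected to be close to sigma2_X / 2
var_im = var(imag(X));           % expected to be close to sigma2_X / 2
var_total = mean(abs(X).^2);     % expected to be close to sigma2_X
fprintf('var(Re) = %.3f, var(Im) = %.3f, E|X|^2 = %.3f\n', var_re, var_im, var_total);

% Differential entropy of a circular complex Gaussian, in nats.
H_X = log(pi * exp(1) * sigma2_X);
```

The closed-form entropy `H_X` gives a reference value to compare against an estimate computed from `X_real`.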

The ITE package describes the condH_estimation function as follows:

function [condH] = condH_estimation(Y1,Y2,co)
 Conditional entropy estimation (condH) of Y using the specified conditional entropy estimator.
 
 INPUT:
    Y1: Y(:,t1) is the t1^th sample.
    Y2: Y(:,t2) is the t2^th sample.
   co: conditional entropy estimator object.

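Extension:

Real-valued case: computing mutual information (see "Overview of Mutual Information and Its MATLAB Implementation"). For jointly Gaussian real variables with correlation coefficient $\rho$:
$I=-\frac{1}{2}\log_2(1-\rho^2)$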
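
As a quick illustration of this formula, the sketch below compares the analytic value with a plug-in value obtained from the sample correlation of simulated data. The correlation 0.5, the sample size and the variable names are illustrative assumptions; only base MATLAB functions are used.

```matlab
rng(4);
T = 100000;
rho = 0.5;                                     % illustrative correlation (assumption)
x = randn(1, T);
y = rho * x + sqrt(1 - rho^2) * randn(1, T);   % corr(x, y) = rho by construction

I_true = -0.5 * log2(1 - rho^2);               % analytic value, in bits

R = corrcoef(x, y);                            % 2 x 2 sample correlation matrix
rho_hat = R(1, 2);
I_plugin = -0.5 * log2(1 - rho_hat^2);         % plug-in estimate, in bits
fprintf('analytic: %.4f bits, plug-in: %.4f bits\n', I_true, I_plugin);
```

Because the formula uses log base 2 the result is in bits; replacing log2 with the natural logarithm gives the same quantity in nats, which matters when comparing against values reported by another tool.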