
Total correlation


The total correlation (Watanabe 1960) is one of several generalizations of the mutual information. It is also known as the multivariate constraint (Garner 1962) or multiinformation (Studený & Vejnarová 1999). It expresses the amount of redundancy or dependency (and thus structure) existing among a set of variables.

For a given set of n variables <math>\{X_{1},X_{2},\ldots,X_{n}\}</math>, the total correlation <math>C(X_{1},X_{2},\ldots,X_{n})</math> is given by


<math>C(X_{1},X_{2},\ldots,X_{n}) = \sum_{i=1}^{n} H(X_{i})-H(X_{1},X_{2},\ldots,X_{n})</math>


where <math>H(X_{i}) \,</math> is the information entropy of variable <math>X_{i} \,</math>, and <math>H(X_{1},X_{2},\ldots,X_{n})</math> is the joint entropy of the variable set <math>\{X_{1},X_{2},\ldots,X_{n}\}</math>. In terms of the discrete probability distributions on the variables <math>\{X_{1},X_{2},\ldots,X_{n}\}</math>, the total correlation is given by


<math>C(X_{1},X_{2},\ldots,X_{n}) = \sum_{x_{1}\in\mathcal{X}_{1}} \sum_{x_{2}\in\mathcal{X}_{2}} \ldots \sum_{x_{n}\in\mathcal{X}_{n}} p(x_{1},x_{2},\ldots,x_{n})\log\frac{p(x_{1},x_{2},\ldots,x_{n})}{p(x_{1})p(x_{2})\cdots p(x_{n})}</math>
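The entropy form of the definition can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the joint distribution is supplied as a dictionary mapping outcome tuples to probabilities, and the names `entropy`, `marginal` and `total_correlation` are chosen here for illustration only.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, i):
    """Marginal distribution of the i-th variable of a joint distribution."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[i]] += p
    return dict(m)


def total_correlation(joint):
    """Sum of the marginal entropies minus the joint entropy, in bits."""
    n = len(next(iter(joint)))  # number of variables per outcome tuple
    return sum(entropy(marginal(joint, i)) for i in range(n)) - entropy(joint)


# Two fair bits that always agree: one bit of redundancy.
copied = {(0, 0): 0.5, (1, 1): 0.5}

# Two independent fair bits: no redundancy.
independent = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}

print(total_correlation(copied))       # 1.0
print(total_correlation(independent))  # 0.0
```

For two variables the quantity computed here coincides with the ordinary mutual information.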


The total correlation is the amount of information shared among the variables in the set. The sum <math>\begin{matrix}\sum_{i=1}^{n} H(X_{i})\end{matrix}</math> represents the amount of information in bits (assuming base-2 logs) that the variables would possess if they were totally independent of one another (non-redundant), or, equivalently, the average code length to transmit the values of all variables if each variable were (optimally) coded independently. The term <math>H(X_{1},X_{2},\ldots ,X_{n})</math> is the actual amount of information that the variable set contains, or equivalently, the average code length to transmit the values of all variables if the set of variables were (optimally) coded together. The difference between these terms therefore represents the absolute redundancy (in bits) present in the given set of variables, and thus provides a general quantitative measure of the structure or organization embodied in the set of variables (Rothstein 1952). The total correlation is also the Kullback-Leibler divergence between the actual distribution <math>p(X_{1},X_{2},\ldots,X_{n})</math> and its maximum entropy product approximation <math>p(X_{1})p(X_{2})\cdots p(X_{n})</math>.
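The equivalence between the entropy difference and the Kullback-Leibler divergence from the product of marginals can be checked numerically. The sketch below assumes a joint distribution given as a dictionary of outcome tuples; the helper names and the example distribution are illustrative and not taken from any of the cited sources.

```python
import math
from collections import defaultdict


def marginals(joint):
    """List of marginal distributions, one per variable."""
    n = len(next(iter(joint)))
    margs = [defaultdict(float) for _ in range(n)]
    for outcome, p in joint.items():
        for i, x in enumerate(outcome):
            margs[i][x] += p
    return margs


def total_correlation_kl(joint):
    """KL divergence (bits) between the joint and the product of its marginals."""
    margs = marginals(joint)
    total = 0.0
    for outcome, p in joint.items():
        if p > 0:
            q = math.prod(m[x] for m, x in zip(margs, outcome))
            total += p * math.log2(p / q)
    return total


def total_correlation_entropy(joint):
    """Sum of marginal entropies minus the joint entropy (bits)."""
    def h(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)
    return sum(h(m.values()) for m in marginals(joint)) - h(joint.values())


# Two fair bits that agree 80% of the time (hypothetical example).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

kl = total_correlation_kl(joint)
ent = total_correlation_entropy(joint)
print(kl, ent)  # the two values agree up to floating-point rounding
```

Here coding the two bits separately costs 2 bits on average, while coding them jointly costs about 1.72 bits, so the redundancy is roughly 0.28 bits under either formula.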

Total correlation tells us, in the most general sense, how cohesive or related a group of variables is. A near-zero total correlation indicates that the variables in the group are essentially statistically independent; they are completely unrelated, in the sense that knowing the value of one variable does not provide any clue as to the values of the other variables. On the other hand, the maximum total correlation, given by

<math>C_{max} = \sum_{i=1}^{n} H(X_{i})-\max\limits_{X_{i}}H(X_{i})</math>

occurs when one of the variables is completely redundant with all of the other variables. The variables are then maximally related in the sense that knowing the value of one variable provides complete information about the values of all the other variables, and the variables can be figuratively regarded as cogs, in which the position of one cog determines the positions of all the others (Rothstein 1952).
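A small numerical case shows the bound being attained. In the sketch below (an illustrative construction, with hypothetical helper names), <math>X_{1}</math> is uniform on four values and <math>X_{2}</math>, <math>X_{3}</math> are its low and high bits, so <math>X_{1}</math> determines the other two variables.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, i):
    """Marginal distribution of the i-th variable."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[i]] += p
    return dict(m)


# X1 is uniform on {0, 1, 2, 3}; X2 and X3 are its low and high bits.
joint = {(x, x % 2, x // 2): 0.25 for x in range(4)}

marginal_entropies = [entropy(marginal(joint, i)) for i in range(3)]

# Total correlation: sum of marginal entropies minus joint entropy.
c = sum(marginal_entropies) - entropy(joint)

# Upper bound: sum of marginal entropies minus the largest one.
c_max = sum(marginal_entropies) - max(marginal_entropies)

print(marginal_entropies)  # [2.0, 1.0, 1.0]
print(c, c_max)            # 2.0 2.0
```

The marginal entropies sum to 4 bits while the joint entropy is only 2 bits, the entropy of <math>X_{1}</math> alone, so the total correlation equals the bound of 2 bits.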

It is important to note that the total correlation counts up all the redundancies among a set of variables, but that these redundancies may be distributed throughout the variable set in a variety of complicated ways (Garner 1962). For example, some variables in the set may be totally inter-redundant while others in the set are completely independent. Perhaps more significantly, redundancy may be carried in interactions of various degrees: a group of variables may not possess any pairwise redundancies, but may possess higher-order interaction redundancies of the kind exemplified by the parity function. The decomposition of total correlation into its constituent redundancies is explored in a number of sources (McGill 1954, Watanabe 1960, Garner 1962, Studený & Vejnarová 1999, Jakulin & Bratko 2003a, Jakulin & Bratko 2003b, Nemenman 2004, Han 1978, Han 1980).
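The parity case can be made concrete with three bits, where the third is the exclusive-or of two independent fair bits. The sketch below is illustrative only (the helper names are assumptions); it computes the total correlation of every pair, which for two variables equals their mutual information, and of the full triple.

```python
import math
from collections import defaultdict
from itertools import combinations, product


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def project(joint, indices):
    """Joint distribution of the variables at the given positions."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in indices)] += p
    return dict(m)


def total_correlation(joint):
    """Sum of the marginal entropies minus the joint entropy, in bits."""
    n = len(next(iter(joint)))
    singles = sum(entropy(project(joint, (i,))) for i in range(n))
    return singles - entropy(joint)


# X3 = X1 XOR X2, with X1 and X2 independent fair bits.
parity = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

# Pairwise redundancy: total correlation of each two-variable marginal.
pairwise = {
    pair: total_correlation(project(parity, pair))
    for pair in combinations(range(3), 2)
}

print(pairwise)                   # every pair has 0.0 bits
print(total_correlation(parity))  # 1.0
```

Every pair of variables is statistically independent, yet the three together carry one bit of redundancy, which lives entirely in the third-order interaction.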

Uses of Total Correlation

Clustering and feature selection algorithms based on total correlation have been explored by Watanabe.

References
