Entropy and Mutual Information of the bivariate Student t distribution
Available from
Rafael Calsaverini's profile on Mendeley.
Rafael Calsaverini
Dep. de Física Geral, Instituto de Física, Universidade de São Paulo, Brazil
(Dated: May 27, 2009)
This article presents a simple derivation of the entropy and mutual information of the Student
t distribution used in an earlier article. We define the distribution and the concept of mutual information
and calculate it using a simple 'replica-like' trick.
I. INTRODUCTION
We present here, in some detail, a simple derivation of the mutual information of a multivariate Student t distribution
that was briefly commented on in an earlier article. The derivation itself was not central to the theme of that article and not
interesting enough to justify another refereed article, but still interesting enough for a brief e-print that might be useful
to someone else.
II. STANDARD 1-DIM STUDENT T DISTRIBUTION
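If we define the variable $\chi^2 = \sum_{i=1}^{\nu} z_i^2$, with the $z_i$ standard normally distributed variables ($z_i \sim N(0, 1)$), its distribution
function is known as the $\chi^2$-distribution with $\nu$ degrees of freedom, and it is given by:
$$p(\chi^2 = x) = q_\nu(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\; e^{-x/2}\, x^{\nu/2 - 1}$$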
The t-distribution is the distribution of the variable:
$$t = \sqrt{\frac{\nu}{\chi^2}}\; z$$
with $z \sim N(0, 1)$ independent of $\chi^2$. The expression for this distribution is easily calculated if we write:
$$p(t) = \int_0^\infty p(t \mid \chi^2 = x)\; p(\chi^2 = x)\, dx$$
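The distribution $p(t \mid \chi^2 = x)$ is a normal with zero mean and variance $\nu/x$. So:
$$p(t) = \int_0^\infty \sqrt{\frac{x}{2\pi\nu}}\; e^{-\frac{x t^2}{2\nu}}\, q_\nu(x)\, dx
= \frac{1}{\sqrt{2\pi\nu}}\, \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \int_0^\infty x^{\frac{\nu - 1}{2}}\, e^{-\left(1 + \frac{t^2}{\nu}\right)\frac{x}{2}}\, dx.$$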
The last integral is easily solved by recalling the definition of the Gamma function:
$$\frac{1}{x^z}\,\Gamma(z) = \int_0^\infty ds\; s^{z-1} e^{-xs}$$
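Therefore we finally have:
$$p_\nu(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}.$$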
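The derivation of section II can be checked numerically. The following is a minimal sketch, not part of the original derivation: it integrates the normal/chi-square mixture with Simpson's rule and compares the result with the closed-form Student t density. The function names, the truncation point `x_max` and the step count are illustrative assumptions, and the quadrature is only set up for $\nu > 1$, where the integrand vanishes at $x = 0$.

```python
import math

def chi2_pdf(x, nu):
    """Chi-square density q_nu(x) for x > 0."""
    log_q = (-0.5 * x + (0.5 * nu - 1.0) * math.log(x)
             - 0.5 * nu * math.log(2.0) - math.lgamma(0.5 * nu))
    return math.exp(log_q)

def normal_pdf(t, var):
    """Zero-mean normal density with variance var."""
    return math.exp(-0.5 * t * t / var) / math.sqrt(2.0 * math.pi * var)

def student_pdf(t, nu):
    """Closed-form Student t density p_nu(t)."""
    log_c = (math.lgamma(0.5 * (nu + 1.0)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(t * t / nu))

def student_pdf_mixture(t, nu, x_max=200.0, steps=20000):
    """Simpson's rule for int_0^inf N(t | 0, nu/x) q_nu(x) dx (assumes nu > 1)."""
    h = x_max / steps  # steps must be even for Simpson's rule

    def integrand(x):
        # For nu > 1 the integrand goes to zero as x -> 0.
        return normal_pdf(t, nu / x) * chi2_pdf(x, nu) if x > 0.0 else 0.0

    total = integrand(0.0) + integrand(x_max)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

for t in (0.0, 0.7, 2.5):
    print(t, student_pdf_mixture(t, 5), student_pdf(t, 5))
```

The two columns printed for each $t$ should agree to many digits, which is a quick way to catch a dropped factor in the prefactor or in the exponent.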
III. d-DIM STUDENT T DISTRIBUTION
The same thing can be done in more than one dimension by defining:
$$t_i = \sqrt{\frac{\nu}{\chi^2}}\; z_i$$
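with the $z_i$ normal variables with a given covariance matrix. If we use zero means and unit variances (but non-zero
correlations) for the $z_i$ variables, the same procedure used in section II gives the standard Student t distribution:
$$p_{\hat\Sigma,\nu}(\mathbf{t}) = \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{(\pi\nu)^d\, |\hat\Sigma|}} \left[1 + \frac{\mathbf{t}^T \hat\Sigma^{-1} \mathbf{t}}{\nu}\right]^{-\frac{\nu+d}{2}}$$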
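where:
$\hat\Sigma$ is the correlation matrix:
$$\hat\Sigma = \begin{pmatrix}
1 & \rho_{1,2} & \cdots & \rho_{1,d} \\
\rho_{2,1} & 1 & \cdots & \rho_{2,d} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{d,1} & \rho_{d,2} & \cdots & 1
\end{pmatrix}$$
$\nu$ is a parameter called "degrees of freedom".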
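The normalizing prefactor can also be written as:
$$\frac{1}{Z(\nu)} = e^{-F(\nu)} = \frac{\Gamma\!\left(\frac{\nu+d}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{(\pi\nu)^d\, |\hat\Sigma|}} = \frac{\Gamma\!\left(\frac{d}{2}\right)}{B\!\left(\frac{\nu}{2}, \frac{d}{2}\right)\sqrt{(\pi\nu)^d\, |\hat\Sigma|}} \tag{1}$$
with $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$ being the Beta function.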
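For $d = 2$, the case for which we calculate the mutual information, this reduces to:
$$p_{\rho,\nu}(x, y) = \frac{\Gamma\!\left(1 + \frac{\nu}{2}\right)}{\pi\nu\,\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{1-\rho^2}} \left[1 + \frac{q_\rho(x,y)}{\nu}\right]^{-\left(1 + \frac{\nu}{2}\right)}$$
with $q_\rho(x, y) = \dfrac{x^2 + y^2 - 2\rho x y}{1 - \rho^2}$.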
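The construction $t_i = \sqrt{\nu/\chi^2}\, z_i$ translates directly into a sampler. The sketch below is only an illustration under stated assumptions: the function names, the seed and the parameter values are hypothetical, and it relies on two standard facts not derived in the text, namely that a $\chi^2$ variable with $\nu$ degrees of freedom is a Gamma variable with shape $\nu/2$ and scale 2, and that for $\nu > 2$ the covariance of the Student t vector is $\frac{\nu}{\nu-2}\hat\Sigma$, so the Pearson correlation of the samples should approach $\rho$.

```python
import math
import random

def sample_bivariate_t(nu, rho, size, seed=0):
    """Draw pairs (t1, t2) using t_i = sqrt(nu / chi2) * z_i."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        # Two standard normals with correlation rho.
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1, z2 = u, rho * u + math.sqrt(1.0 - rho * rho) * v
        # chi2 with nu degrees of freedom = Gamma(shape nu/2, scale 2).
        chi2 = rng.gammavariate(0.5 * nu, 2.0)
        scale = math.sqrt(nu / chi2)  # the same scale multiplies both components
        pairs.append((scale * z1, scale * z2))
    return pairs

def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)

pairs = sample_bivariate_t(nu=10.0, rho=0.6, size=40000)
print("sample correlation:", sample_correlation(pairs))
```

Note that the shared factor $\sqrt{\nu/\chi^2}$ is what makes the components dependent even when $\rho = 0$; this is the dependence that the excess mutual information computed later quantifies.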
IV. ENTROPY AND MUTUAL INFORMATION
The differential entropy of a given set of variables $\mathbf{t}$ distributed as $p(\mathbf{t})$ is given by[1]:
$$H[\mathbf{t}] = -\int d^d t\; p(\mathbf{t}) \log p(\mathbf{t}) = -E_{\mathbf{t}}\left[\log p(\mathbf{t})\right]$$
The following trick will be useful for the calculation of the entropy of Student t variables. The logarithm function can
be written as the limit:
$$\log(x) = \lim_{n \to 0} \frac{x^n - 1}{n} = \lim_{n \to 0} \frac{d}{dn}\, x^n.$$
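Using this trick on the integral that defines $H[\mathbf{t}]$ we have:
$$H[\mathbf{t}] = -\lim_{n \to 0} \frac{d}{dn}\, E_{\mathbf{t}}\left[p(\mathbf{t})^n\right] = -\lim_{n \to 0} \frac{d}{dn} \int d^d t\; p(\mathbf{t})^{n+1} \tag{2}$$
Here $n$ is the auxiliary ('replica') exponent, not the number of variables.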
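Eq. (2) can be tried out on a case where the answer is known. The sketch below is a sanity check on a standard normal density rather than on the Student t: it evaluates $G(n) = \int p(t)^{n+1}\,dt$ by Simpson's rule, differentiates at $n = 0$ with a central finite difference, and compares with the exact Gaussian entropy $\frac{1}{2}\log(2\pi e)$. The function names, integration window and step sizes are illustrative assumptions.

```python
import math

def normal_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def replica_moment(n, half_width=12.0, steps=4000):
    """Simpson's rule for G(n) = int p(t)^(n+1) dt over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * normal_pdf(-half_width + k * h) ** (n + 1.0)
    return total * h / 3.0

def entropy_by_replica(eps=1e-4):
    """H = -dG/dn at n = 0, estimated by a central finite difference."""
    return -(replica_moment(eps) - replica_moment(-eps)) / (2.0 * eps)

exact = 0.5 * math.log(2.0 * math.pi * math.e)
print(entropy_by_replica(), exact)
```

For the normal density the moment is available in closed form, $G(n) = (2\pi)^{-n/2}(1+n)^{-1/2}$, so the derivative can also be taken by hand; the point of the trick is that for the Student t the analogous $G(n)$ is a Beta function of $n$.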
The mutual information is defined as the reduction in the entropy of a given variable given knowledge of another:
$$I[t_1; t_2] = H[t_1] - H[t_1 \mid t_2] \tag{3}$$
where the second entropy is calculated using the conditional probability density of $t_1$ given $t_2$. As a functional of the
distributions of these variables, the mutual information is given by:
$$I[t_1; t_2] = \int p_{t_1,t_2}(t_1, t_2)\, \log \frac{p_{t_1,t_2}(t_1, t_2)}{p_{t_1}(t_1)\, p_{t_2}(t_2)}\; dt_1\, dt_2 \tag{4}$$
This allows us to write the mutual information in the following useful form:
$$I[t_1; t_2] = H[t_1] + H[t_2] - H[t_1, t_2] \tag{5}$$
where $H[t_i]$ is the entropy associated with the marginal distribution of $t_i$ and $H[t_1, t_2]$ is the entropy of the joint
distribution. The mutual information of a standard bivariate Student t distribution $p_{\rho,\nu}(x, y)$ will thus be given by:
$$I[X; Y] = 2H[1] - H[2] \tag{6}$$
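where $H[d]$ is the entropy of a $d$-dimensional Student t variable, and we used the fact that each marginal is a 1-dim Student t (see Appendix A). The definition of mutual information can be generalized to
many variables as:
$$I[X_1; X_2; \dots; X_d] = \int p_{\mathbf{X}}(\mathbf{x})\, \log\left[\frac{p_{\mathbf{X}}(\mathbf{x})}{\prod_{i=1}^{d} p_{X_i}(x_i)}\right] d^d x. \tag{7}$$
For identically distributed variables ($p_{X_i}(x) = p(x)$ for all $i = 1, 2, \dots, d$) we have:
$$I[X_1; X_2; \dots; X_d] = d\,H[1] - H[d] \tag{8}$$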
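The step from eq. (7) to eq. (8) is short but not spelled out. As a sketch, one can split the logarithm and integrate each marginal term over all the other variables:

```latex
\begin{aligned}
I[X_1;\dots;X_d]
 &= \int p_{\mathbf X}(\mathbf x)\,\log p_{\mathbf X}(\mathbf x)\, d^d x
    \;-\; \sum_{i=1}^{d} \int p_{\mathbf X}(\mathbf x)\,\log p_{X_i}(x_i)\, d^d x \\
 &= -H[X_1,\dots,X_d] \;-\; \sum_{i=1}^{d} \int p_{X_i}(x_i)\,\log p_{X_i}(x_i)\, dx_i \\
 &= \sum_{i=1}^{d} H[X_i] \;-\; H[X_1,\dots,X_d].
\end{aligned}
```

When all marginals are the same 1-dim density, the sum is $d\,H[1]$ and the joint entropy is $H[d]$, which is eq. (8); for $d = 2$ this is eq. (5).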
V. ENTROPY OF A 1-DIM STUDENT VARIABLE
As we did in eq. (2), let us calculate:
$$H[1] = -\lim_{n \to 0} \frac{d}{dn} \int dt\; p_\nu(t)^{n+1} \tag{9}$$
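Defining the normalization factor to be:
$$e^{-F(\nu)} = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} = \frac{1}{\sqrt{\nu}\; B\!\left(\frac{\nu}{2}, \frac{1}{2}\right)} \tag{10}$$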
we have:
$$H[1] = -\lim_{n \to 0} \frac{d}{dn} \left\{ e^{-(n+1)F(\nu)} \int dt \left(1 + \frac{t^2}{\nu}\right)^{-\frac{1}{2}(n+1)(\nu+1)} \right\}. \tag{11}$$
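According to eq. (A5) in Appendix A, the integral can be evaluated in closed form, which gives:
$$H[1] = -\sqrt{\nu}\, \lim_{n \to 0} \frac{d}{dn} \left\{ e^{-(n+1)F(\nu)}\; B\!\left(\tfrac{1}{2}\, n(\nu + 1) + \tfrac{\nu}{2},\; \tfrac{1}{2}\right) \right\} \tag{12}$$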
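Applying the derivative (eq. (A11)) and the limit, and using eq. (10), we have:
$$H[1] = \log\left[\sqrt{\nu}\; B\!\left(\frac{\nu}{2}, \frac{1}{2}\right)\right] + \frac{\nu + 1}{2} \left[\psi\!\left(\frac{\nu + 1}{2}\right) - \psi\!\left(\frac{\nu}{2}\right)\right] \tag{13}$$
where $\psi$ is the digamma function.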
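Eq. (13) can be spot-checked against two limits that are known independently of this derivation: for $\nu = 1$ (the Cauchy distribution) the entropy is $\log 4\pi$, and for $\nu \to \infty$ it should approach the Gaussian value $\frac{1}{2}\log(2\pi e)$. The sketch below uses only the standard library; the `digamma` helper (recurrence plus asymptotic series) and the function names are illustrative assumptions, not part of the original text.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x, eq. (A12)
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def log_beta(a, b):
    """log B(a, b) via log-Gamma functions, eq. (A6)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def student_entropy_1d(nu):
    """Eq. (13): differential entropy (in nats) of a 1-dim Student t variable."""
    return (0.5 * math.log(nu) + log_beta(0.5 * nu, 0.5)
            + 0.5 * (nu + 1.0) * (digamma(0.5 * (nu + 1.0)) - digamma(0.5 * nu)))

cauchy_entropy = math.log(4.0 * math.pi)                   # known value for nu = 1
gaussian_entropy = 0.5 * math.log(2.0 * math.pi * math.e)  # limit nu -> infinity

print(student_entropy_1d(1.0), cauchy_entropy)
print(student_entropy_1d(1e6), gaussian_entropy)
```

Working with `math.lgamma` instead of Gamma functions directly keeps the evaluation stable for large $\nu$, where the individual Gamma values overflow.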
VI. ENTROPY OF A d-DIM STUDENT VARIABLE
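We can just as easily calculate the entropy of a d-variate Student distribution by using the same expedient. First write
$H[d]$ as:
$$H[d] = -\lim_{n \to 0} \frac{d}{dn} \int d^d t\; p_{\hat\Sigma,\nu}(\mathbf{t})^{n+1}
= -\lim_{n \to 0} \frac{d}{dn} \left\{ \frac{1}{Z(\nu)^{n+1}} \int d^d t \left[1 + \frac{\mathbf{t}^T \hat\Sigma^{-1} \mathbf{t}}{\nu}\right]^{-\frac{1}{2}(n+1)(\nu+d)} \right\}$$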
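We can always simplify this integral by making the transformation $\mathbf{x} = \hat{U}\mathbf{t}$, where $\hat{U}$ is the unitary (here real orthogonal) matrix that
diagonalizes $\hat\Sigma$. Calling $I$ the expression inside the curly braces, we have:
$$I = \frac{1}{Z(\nu)^{n+1}} \int d^d x \left[1 + \sum_{i=1}^{d} \frac{1}{\nu\lambda_i}\, x_i^2\right]^{-\frac{1}{2}(n+1)(\nu+d)} \tag{14}$$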
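where $\lambda_i$ is the eigenvalue of $\hat\Sigma$ corresponding to the $i$-th direction. We can also choose variables $r_i = \sqrt{\frac{1}{\nu\lambda_i}}\; x_i$, whose Jacobian is $\sqrt{\nu^d\, |\hat\Sigma|}$, go to spherical coordinates with $r^2 = \sum_i r_i^2$, and
have:
$$I = \frac{2\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}\right)}\; \frac{\sqrt{\nu^d\, |\hat\Sigma|}}{Z(\nu)^{n+1}} \int_0^\infty dr\; r^{d-1} \left(1 + r^2\right)^{-\frac{1}{2}(n+1)(\nu+d)} \tag{15}$$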
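where $S_d = \dfrac{2\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}\right)}$ is the surface area of the unit sphere in $d$ dimensions. The integral is just a definition of the Beta function,
as seen in eq. (A9):
$$I = \frac{2\pi^{d/2}}{\Gamma\!\left(\frac{d}{2}\right)}\; \frac{\sqrt{\nu^d\, |\hat\Sigma|}}{2\, Z(\nu)^{n+1}}\; B\!\left(\tfrac{1}{2}\, n(\nu + d) + \tfrac{\nu}{2},\; \tfrac{d}{2}\right). \tag{16}$$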
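Substituting this back into the expression for $H[d]$ and using the definition of $Z(\nu)$, eq. (1), gives:
$$H[d] = \frac{1}{2} \log\left[(\pi\nu)^d\, |\hat\Sigma|\right] + \log\left[\frac{B\!\left(\frac{\nu}{2}, \frac{d}{2}\right)}{\Gamma\!\left(\frac{d}{2}\right)}\right] + \frac{\nu + d}{2} \left[\psi\!\left(\frac{\nu + d}{2}\right) - \psi\!\left(\frac{\nu}{2}\right)\right]. \tag{17}$$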
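As with the 1-dim case, eq. (17) can be checked against known limits. The sketch below assumes that the only property of $\hat\Sigma$ needed is $\log|\hat\Sigma|$, passed in as `log_det_sigma` (a hypothetical parameter name). It verifies that $d = 1$, $\nu = 1$ reproduces the Cauchy entropy $\log 4\pi$ and that for large $\nu$ the result approaches the entropy of a normal distribution with the same correlation matrix, $\frac{d}{2}\log(2\pi e) + \frac{1}{2}\log|\hat\Sigma|$. The `digamma` helper is an illustrative standard-library implementation.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def student_entropy(nu, d, log_det_sigma=0.0):
    """Eq. (17): entropy (nats) of a d-dim Student t with correlation matrix Sigma."""
    # log Z = (1/2) log[(pi nu)^d |Sigma|] + log[B(nu/2, d/2) / Gamma(d/2)]
    log_z = (0.5 * d * math.log(math.pi * nu) + 0.5 * log_det_sigma
             + math.lgamma(0.5 * nu) - math.lgamma(0.5 * (nu + d)))
    return log_z + 0.5 * (nu + d) * (digamma(0.5 * (nu + d)) - digamma(0.5 * nu))

def gaussian_entropy(d, log_det_sigma=0.0):
    """Entropy of a d-dim normal with unit variances and correlation matrix Sigma."""
    return 0.5 * d * math.log(2.0 * math.pi * math.e) + 0.5 * log_det_sigma

log_det = math.log(1.0 - 0.5 ** 2)  # d = 2 with rho = 0.5, so |Sigma| = 1 - rho^2
print(student_entropy(5.0, 2, log_det))
print(student_entropy(1e6, 2, log_det), gaussian_entropy(2, log_det))
```

The correlation matrix enters only through the additive term $\frac{1}{2}\log|\hat\Sigma|$, which is why it later cancels against nothing else and survives as the Gaussian part of the mutual information.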
VII. MUTUAL INFORMATION
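We can finally calculate the mutual information of the d-dimensional Student distribution using eq. (8)
with the value of the entropy calculated in eq. (17). Simple substitution and rearrangement yield:
\begin{align}
I_d(\nu) = {} & -\frac{1}{2} \log |\hat\Sigma| + \log\left\{ \frac{B\!\left(\frac{\nu}{2}, \frac{1}{2}\right)^{d}\, \Gamma\!\left(\frac{d}{2}\right)}{\pi^{d/2}\, B\!\left(\frac{\nu}{2}, \frac{d}{2}\right)} \right\} \tag{18} \\
 & - \frac{\nu(d - 1)}{2}\, \psi\!\left(\frac{\nu}{2}\right) + \frac{d(\nu + 1)}{2}\, \psi\!\left(\frac{\nu + 1}{2}\right) - \frac{\nu + d}{2}\, \psi\!\left(\frac{\nu + d}{2}\right) \tag{19}
\end{align}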
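For readers who want the intermediate step, the rearrangement can be sketched as follows, writing out $d\,H[1]$ from eq. (13) and $H[d]$ from eq. (17) and collecting terms:

```latex
\begin{aligned}
d\,H[1] &= \frac{d}{2}\log\nu + d\,\log B\!\left(\tfrac{\nu}{2},\tfrac{1}{2}\right)
          + \frac{d(\nu+1)}{2}\left[\psi\!\left(\tfrac{\nu+1}{2}\right) - \psi\!\left(\tfrac{\nu}{2}\right)\right], \\
H[d]    &= \frac{d}{2}\log(\pi\nu) + \frac{1}{2}\log|\hat\Sigma|
          + \log\frac{B\!\left(\tfrac{\nu}{2},\tfrac{d}{2}\right)}{\Gamma\!\left(\tfrac{d}{2}\right)}
          + \frac{\nu+d}{2}\left[\psi\!\left(\tfrac{\nu+d}{2}\right) - \psi\!\left(\tfrac{\nu}{2}\right)\right], \\
\frac{d}{2}\log\nu - \frac{d}{2}\log(\pi\nu) &= -\frac{d}{2}\log\pi, \qquad
-\frac{d(\nu+1)}{2} + \frac{\nu+d}{2} = -\frac{\nu(d-1)}{2}.
\end{aligned}
```

The first identity in the last line produces the factor $\pi^{d/2}$ in the denominator of eq. (18), and the second is the coefficient of $\psi(\nu/2)$ in eq. (19).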
Note that $I_{\text{normal}} = -\frac{1}{2} \log |\hat\Sigma|$ is just the mutual information of a normal distribution with correlation matrix
given by $\hat\Sigma$, and it is the only term depending on this correlation matrix. The 'excess' term, with respect to the normal,
is a measure of the deviation of the Student distribution from normal dependence. For $d = 2$ we have:
$$I_2(\nu) = I_{\text{normal}} + I_{\text{excess}} \tag{20}$$
where:
$$I_{\text{normal}} = -\frac{1}{2} \log\left(1 - \rho^2\right) \tag{21}$$
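and:
$$I_{\text{excess}} = 2 \log\left[\sqrt{\frac{\nu}{2\pi}}\; B\!\left(\frac{\nu}{2}, \frac{1}{2}\right)\right] - \frac{2 + \nu}{\nu} + (1 + \nu) \left[\psi\!\left(\frac{\nu + 1}{2}\right) - \psi\!\left(\frac{\nu}{2}\right)\right], \tag{22}$$
where we used eq. (A10) and eq. (A12). A plot of this excess term as a function of the number of degrees
of freedom $\nu$ is shown in fig. 1.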
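The excess term of eq. (22) is easy to tabulate. The sketch below evaluates it for a few values of $\nu$ and cross-checks eq. (20) against the direct combination $2H[1] - H[2]$ of eq. (6), with the entropies taken from eq. (17). Function names and the `digamma` helper are illustrative assumptions; everything is in nats.

```python
import math

def digamma(x):
    """Digamma function psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def student_entropy(nu, d, log_det_sigma=0.0):
    """Eq. (17); for d = 1 it coincides with eq. (13)."""
    log_z = (0.5 * d * math.log(math.pi * nu) + 0.5 * log_det_sigma
             + math.lgamma(0.5 * nu) - math.lgamma(0.5 * (nu + d)))
    return log_z + 0.5 * (nu + d) * (digamma(0.5 * (nu + d)) - digamma(0.5 * nu))

def excess_information(nu):
    """Eq. (22): the part of I_2 that does not depend on rho."""
    log_beta = math.lgamma(0.5 * nu) + math.lgamma(0.5) - math.lgamma(0.5 * (nu + 1.0))
    log_term = 0.5 * math.log(nu / (2.0 * math.pi)) + log_beta
    return (2.0 * log_term - (2.0 + nu) / nu
            + (1.0 + nu) * (digamma(0.5 * (nu + 1.0)) - digamma(0.5 * nu)))

def mutual_information_2d(nu, rho):
    """Eq. (20): I_2 = I_normal + I_excess."""
    return -0.5 * math.log(1.0 - rho * rho) + excess_information(nu)

for nu in (1, 2, 5, 10, 100):
    print(nu, excess_information(nu))
```

The printed values should decrease toward zero as $\nu$ grows, reflecting the convergence of the Student t to the normal; for finite $\nu$ the excess stays positive even at $\rho = 0$, because uncorrelated Student t components are still dependent.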
FIG. 1: The excess term $I_{\text{excess}}$, eq. (22), as a function of the number of degrees of freedom $\nu$.
Appendix A: Useful facts
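The marginal distribution of each variable of an n-dim Student t is a 1-dim Student t
From the integral identity[2]:
$$p_{\hat\Sigma,\nu}(\mathbf{t}) = \int dx\; N\!\left(\mathbf{t} \,\middle|\, \mu_i = 0,\; \sigma_i^2 = \frac{\nu}{x},\; \hat\Sigma\right) q_\nu(x) \tag{A1}$$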
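we have that the marginal distribution for $t_k$ is:
$$p(t_k) = \int dx\; q_\nu(x) \int N\!\left(\mathbf{t} \,\middle|\, \mu_i = 0,\; \sigma_i^2 = \frac{\nu}{x},\; \hat\Sigma\right) \prod_{i \neq k} dt_i \tag{A2}$$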
The marginal distribution for the k-th variable of an n-dim normal distribution is just the 1-dim normal distribution
with the k-th mean and k-th variance. This leaves us with:
$$p(t_k) = \int dx\; q_\nu(x)\; N\!\left(t_k \,\middle|\, \mu = 0,\; \sigma^2 = \frac{\nu}{x}\right). \tag{A3}$$
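As shown in section II, this is just the 1-dim Student distribution:
$$p(t_k) = p_\nu(t_k) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{t_k^2}{\nu}\right)^{-\frac{\nu+1}{2}} \tag{A4}$$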
Normalization integral of the Student distribution
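From the normalization factor of the Student distribution we conclude already that, for $n > 1/2$:
$$\int_{-\infty}^{\infty} dx \left(1 + \frac{x^2}{\nu}\right)^{-n} = \sqrt{\pi\nu}\; \frac{\Gamma\!\left(n - \frac{1}{2}\right)}{\Gamma(n)} = \sqrt{\nu}\; B\!\left(n - \frac{1}{2},\; \frac{1}{2}\right) \tag{A5}$$
where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$ is the Beta function.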
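Eq. (A5) is the identity that turns the replica integral of eq. (11) into the Beta function of eq. (12), so it is worth a direct numerical check. The sketch below compares a truncated Simpson's-rule estimate of the left-hand side with the right-hand side; the function names, the truncation `half_width` and the step count are illustrative assumptions chosen so that the neglected tails are far below the printed precision for the parameters shown.

```python
import math

def log_beta(a, b):
    """log B(a, b) via log-Gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def lhs_a5(n, nu, half_width=400.0, steps=40000):
    """Simpson's rule for int (1 + x^2/nu)^(-n) dx over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * (1.0 + x * x / nu) ** (-n)
    return total * h / 3.0

def rhs_a5(n, nu):
    """sqrt(nu) * B(n - 1/2, 1/2), the right-hand side of eq. (A5)."""
    return math.sqrt(nu) * math.exp(log_beta(n - 0.5, 0.5))

for n, nu in ((3.0, 5.0), (2.5, 4.0)):
    print(n, nu, lhs_a5(n, nu), rhs_a5(n, nu))
```

In eq. (11) the exponent is $n_{\text{eff}} = \frac{1}{2}(n+1)(\nu+1)$, so $n_{\text{eff}} - \frac{1}{2} = \frac{1}{2}n(\nu+1) + \frac{\nu}{2}$, which is the first argument of the Beta function in eq. (12).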
Properties of the Beta function
The Beta function admits the following representations used in the previous calculations:
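1. Definition by Gamma functions:
$$B(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)} \tag{A6}$$
2. Integral definition:
$$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\, dt \tag{A7}$$
3. Another integral definition:
$$B(x, y) = 2 \int_0^\infty \frac{r^{2x-1}}{(1 + r^2)^{x+y}}\, dr \tag{A8}$$
This last result implies that:
$$\int_0^\infty \frac{r^{d-1}}{(1 + r^2)^{\alpha}}\, dr = \frac{1}{2}\, B\!\left(\alpha - \frac{d}{2},\; \frac{d}{2}\right) \tag{A9}$$
4. When $y = 1$ we have:
$$B(x, 1) = \frac{1}{x} \tag{A10}$$
5. Derivative of the Beta function:
$$\frac{\partial}{\partial x} B(x, y) = B(x, y) \left(\psi(x) - \psi(x + y)\right) \tag{A11}$$
where $\psi(x)$ is the digamma function.
6. Property of the digamma function:
$$\psi(x + 1) - \psi(x) = \frac{1}{x} \tag{A12}$$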
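The properties listed above can be verified numerically with a few lines of standard-library code. This is a hedged sketch: the helper names, the test arguments, the quadrature settings and the finite-difference step are illustrative assumptions. It checks eq. (A9) by Simpson's rule, eq. (A10) directly, eq. (A11) by a central finite difference, and eq. (A12) with a series-based `digamma`.

```python
import math

def beta(x, y):
    """Beta function from log-Gamma functions, eq. (A6)."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))

def digamma(x):
    """Digamma function psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(x) - 0.5 / x - series

def radial_integral(d, alpha, r_max=200.0, steps=20000):
    """Simpson's rule for int_0^r_max r^(d-1) / (1 + r^2)^alpha dr (left side of A9)."""
    h = r_max / steps
    total = 0.0
    for k in range(steps + 1):
        r = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * r ** (d - 1) / (1.0 + r * r) ** alpha
    return total * h / 3.0

def dbeta_dx(x, y, h=1e-5):
    """Central finite difference for the x-derivative of B(x, y)."""
    return (beta(x + h, y) - beta(x - h, y)) / (2.0 * h)

print("A9 :", radial_integral(3, 4.0), 0.5 * beta(4.0 - 1.5, 1.5))
print("A10:", beta(2.5, 1.0), 1.0 / 2.5)
print("A11:", dbeta_dx(1.7, 0.8), beta(1.7, 0.8) * (digamma(1.7) - digamma(2.5)))
print("A12:", digamma(4.2) - digamma(3.2), 1.0 / 3.2)
```

The sign in eq. (A11) matters for the entropy: since $\psi$ is increasing, $\partial B/\partial x$ is negative, and the overall minus sign in eq. (2) turns it into the positive digamma difference of eqs. (13) and (17).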
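[1] $E_{\mathbf{t}}[A(\mathbf{t})]$ denotes the expected value of $A(\mathbf{t})$.
[2] We use the notation $N(\mathbf{t} \mid \mu_i, \sigma_i^2, \hat\Sigma)$ for a multivariate normal distribution with means $\mu_i$, variances $\sigma_i^2$ and correlation matrix
$\hat\Sigma$.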


