Entropy and Mutual Information of the bivariate Student t distribution

by Rafael Calsaverini
Entropy (2009)

Rafael Calsaverini
Dep. de Física Geral, Instituto de Física, Universidade de São Paulo, Brazil
(Dated: May 27, 2009)
This article presents a simple derivation of the entropy and mutual information of the Student t distribution used in reference [?]. We define the distribution and the concept of mutual information and calculate it using a simple 'replica-like' trick.
I. INTRODUCTION
We present here, in some detail, a simple derivation of the mutual information of a multivariate Student t distribution that was briefly commented on in reference [?]. The derivation itself was not central to the theme of that article and not interesting enough to justify another refereed article, but it is still interesting enough for a brief e-print that might be useful to someone else.
II. STANDARD 1-DIM STUDENT T DISTRIBUTION
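If we define the variable $\chi^2_\nu = \sum_{i=1}^{\nu} z_i^2$, with $z_i$ standard normally distributed variables ($z_i \sim N(0, 1)$), its distribution function is known as the $\chi^2$-distribution and is given by:
$$p(\chi^2_\nu = x) = q_\nu(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\; e^{-\frac{x}{2}}\, x^{\frac{\nu}{2}-1}$$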
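The t-distribution is said to be the distribution of the variable:
$$t = \sqrt{\frac{\nu}{\chi^2_\nu}}\; z$$
with $z \sim N(0, 1)$. The expression for this distribution is easily calculated if we write:
$$p_\nu(t) = \int_0^\infty p\left(t \mid \chi^2_\nu = x\right)\, p\left(\chi^2_\nu = x\right) dx$$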
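The distribution $p(t \mid \chi^2_\nu = x)$ is a normal with zero mean and variance $\nu/x$. So:
$$p_\nu(t) = \int_0^\infty \sqrt{\frac{x}{2\pi\nu}}\; e^{-\frac{x}{2\nu} t^2}\, q_\nu(x)\, dx = \frac{1}{\sqrt{2\pi\nu}}\, \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)} \int_0^\infty x^{\frac{\nu-1}{2}}\, e^{-\left(1+\frac{t^2}{\nu}\right)\frac{x}{2}}\, dx.$$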
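The last integral is easily solved by remembering the definition of the Gamma function:
$$\frac{1}{x^z}\,\Gamma(z) = \int_0^\infty ds\; s^{z-1} e^{-xs}$$
Therefore we finally have:
$$p_\nu(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}.$$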
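The result of this section can be checked numerically. The following is a minimal sketch, assuming only the Python standard library: it integrates the normal/chi-squared mixture with a plain midpoint rule and compares it with the closed form. The function names, the cutoff `x_max` and the number of steps are illustrative choices, not part of the original derivation.

```python
import math

def chi2_pdf(x, nu):
    """Density q_nu(x) of the chi-squared distribution with nu degrees of freedom."""
    log_q = (-x / 2 + (nu / 2 - 1) * math.log(x)
             - (nu / 2) * math.log(2) - math.lgamma(nu / 2))
    return math.exp(log_q)

def normal_pdf(t, var):
    """Zero-mean normal density with variance var."""
    return math.exp(-t * t / (2 * var)) / math.sqrt(2 * math.pi * var)

def student_t_mixture(t, nu, x_max=100.0, steps=100_000):
    """Midpoint-rule estimate of int_0^inf N(t | 0, nu/x) q_nu(x) dx.

    The upper limit is truncated at x_max (an assumption of this sketch);
    the chi-squared tail beyond it is negligible for small nu.
    """
    h = x_max / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += normal_pdf(t, nu / x) * chi2_pdf(x, nu)
    return total * h

def student_t_closed(t, nu):
    """Closed-form Student t density obtained at the end of section II."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(math.pi * nu))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

# Compare both expressions at one point.
print(student_t_mixture(1.3, 5), student_t_closed(1.3, 5))
```

The two printed numbers should agree to several decimal places; the remaining difference comes from the quadrature, not from the formula.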
III. d-DIM STUDENT T DISTRIBUTION
The same thing can be done in more than one dimension by defining:
$$t_i = \sqrt{\frac{\nu}{\chi^2_\nu}}\; z_i$$
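with $z_i$ normal variables with a given covariance matrix. If we use zero means and unit variances (but non-zero correlations) for the $z_i$ variables, the same procedure used in section II gives the standard Student t distribution:
$$p_{\hat\Sigma,\nu}(\mathbf{t}) = \frac{\Gamma\left(\frac{\nu+d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{(\pi\nu)^d\,|\hat\Sigma|}} \left[1 + \frac{\mathbf{t}^T \hat\Sigma^{-1} \mathbf{t}}{\nu}\right]^{-\frac{\nu+d}{2}}$$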
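where:
• $\hat\Sigma$ is the correlation matrix:
$$\hat\Sigma = \begin{pmatrix} 1 & \rho_{1,2} & \cdots & \rho_{1,d} \\ \rho_{2,1} & 1 & \cdots & \rho_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{d,1} & \rho_{d,2} & \cdots & 1 \end{pmatrix}$$
• $\nu$ is a parameter called "degrees of freedom".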
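The normalizing prefactor can also be written as:
$$\frac{1}{Z(\nu)} = e^{F(\nu)} = \frac{\Gamma\left(\frac{\nu+d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{(\pi\nu)^d\,|\hat\Sigma|}} = \frac{\Gamma\left(\frac{d}{2}\right)}{B\left(\frac{\nu}{2}, \frac{d}{2}\right)\sqrt{(\pi\nu)^d\,|\hat\Sigma|}} \tag{1}$$
with $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$ being the Beta function.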
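The construction $t_i = \sqrt{\nu/\chi^2_\nu}\, z_i$ used at the start of this section can be sketched as a sampler for $d = 2$. This is an illustration under stated assumptions: $\nu$ is taken to be a positive integer, the $\chi^2_\nu$ variable is drawn independently of the $z_i$, and the helper names, the seed and the sample size are hypothetical choices. For $\nu > 2$ the covariance of the multivariate t is $\frac{\nu}{\nu-2}\hat\Sigma$, so the sample correlation should be close to $\rho$.

```python
import math
import random

def sample_bivariate_t(nu, rho, rng):
    """Draw one (t1, t2) pair via t_i = sqrt(nu / chi2) * z_i (integer nu)."""
    # Two standard normals with correlation rho.
    u, v = rng.gauss(0, 1), rng.gauss(0, 1)
    z1 = u
    z2 = rho * u + math.sqrt(1 - rho * rho) * v
    # Independent chi-squared variable: sum of nu squared standard normals.
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(nu))
    scale = math.sqrt(nu / chi2)
    return scale * z1, scale * z2

def sample_correlation(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is reproducible
pairs = [sample_bivariate_t(nu=10, rho=0.6, rng=rng) for _ in range(20_000)]
estimated_rho = sample_correlation(pairs)
print(estimated_rho)  # expected to be close to 0.6, up to sampling noise
```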
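For d = 2, the case for which we calculate the mutual information, this reduces to:
$$p_{\rho,\nu}(x, y) = \frac{\Gamma\left(1 + \frac{\nu}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\pi\nu\sqrt{1-\rho^2}} \left[1 + \frac{q_\rho(x, y)}{\nu}\right]^{-\left(1+\frac{\nu}{2}\right)}$$
with
$$q_\rho(x, y) = \frac{x^2 + y^2 - 2\rho x y}{1 - \rho^2}$$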
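As a consistency check, the $d = 2$ expression can be compared with the general $d$-dimensional formula evaluated on an explicit $2 \times 2$ correlation matrix. The sketch below assumes only the standard library; the function names are illustrative and the matrix inverse is written out by hand.

```python
import math

def bivariate_t_pdf(x, y, rho, nu):
    """d = 2 density written in terms of q_rho(x, y)."""
    q = (x * x + y * y - 2 * rho * x * y) / (1 - rho * rho)
    log_norm = (math.lgamma(1 + nu / 2) - math.lgamma(nu / 2)
                - math.log(math.pi * nu) - 0.5 * math.log(1 - rho * rho))
    return math.exp(log_norm - (1 + nu / 2) * math.log1p(q / nu))

def general_t_pdf_2d(t, sigma, nu):
    """General d-dim formula, specialised to a 2x2 matrix sigma."""
    d = 2
    (a, b), (c, e) = sigma
    det = a * e - b * c
    inv = ((e / det, -b / det), (-c / det, a / det))
    # Quadratic form t^T sigma^{-1} t.
    quad = sum(t[i] * inv[i][j] * t[j] for i in range(d) for j in range(d))
    log_norm = (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
                - 0.5 * (d * math.log(math.pi * nu) + math.log(det)))
    return math.exp(log_norm - (nu + d) / 2 * math.log1p(quad / nu))

rho, nu = 0.4, 5.0
sigma = ((1.0, rho), (rho, 1.0))
print(bivariate_t_pdf(0.7, -1.2, rho, nu), general_t_pdf_2d((0.7, -1.2), sigma, nu))
```

Since $\Gamma(1+\nu/2)/\Gamma(\nu/2) = \nu/2$, the prefactor equals $1/(2\pi\sqrt{1-\rho^2})$, the same as for the bivariate normal.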
IV. ENTROPY AND MUTUAL INFORMATION
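The differential entropy of a given set of variables $\mathbf{t}$ distributed as $p(\mathbf{t})$ is given by[1]:
$$H[\mathbf{t}] = -\int d\mathbf{t}\; p(\mathbf{t}) \log p(\mathbf{t}) = -E_{\mathbf{t}}\left[\log p(\mathbf{t})\right]$$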
The following trick will be useful for calculating the entropy of Student t variables. The logarithm function can be written as the limit:
$$\log(x) = \lim_{n \to 0} \frac{x^n - 1}{n} = \lim_{n \to 0} \frac{d}{dn}\, x^n.$$
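Using this trick on the integral that defines $H[\mathbf{t}]$ we have:
$$H[\mathbf{t}] = -\lim_{n \to 0} \frac{d}{dn}\, E_{\mathbf{t}}\left[p(\mathbf{t})^n\right] = -\lim_{n \to 0} \frac{d}{dn} \int d\mathbf{t}\; p(\mathbf{t})^{n+1} \tag{2}$$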
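As a quick sanity check of eq. (2), which is not part of the original derivation, the trick can be applied to a standard normal density, whose entropy is known to be $\frac{1}{2}\log(2\pi e)$:

```latex
\begin{align}
p(t) &= \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}, \\
\int dt\; p(t)^{n+1} &= (2\pi)^{-\frac{n+1}{2}} \sqrt{\frac{2\pi}{n+1}}
  = (2\pi)^{-\frac{n}{2}}\, (n+1)^{-\frac{1}{2}}, \\
H &= -\lim_{n \to 0} \frac{d}{dn} \left[ (2\pi)^{-\frac{n}{2}} (n+1)^{-\frac{1}{2}} \right]
  = \frac{1}{2}\log(2\pi) + \frac{1}{2}
  = \frac{1}{2}\log(2\pi e).
\end{align}
```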
The mutual information is defined as the reduction in the entropy of a given variable given knowledge of another:
$$I[t_1; t_2] = H[t_1] - H[t_1 | t_2] \tag{3}$$
where the second entropy is calculated using the conditional probability density of $t_1$ given $t_2$. As a functional of the distributions of these variables, the mutual information is given by:
$$I[t_1; t_2] = \int dt_1\, dt_2\; p_{t_1,t_2}(t_1, t_2) \log\left[\frac{p_{t_1,t_2}(t_1, t_2)}{p_{t_1}(t_1)\, p_{t_2}(t_2)}\right] \tag{4}$$
This allows us to write the mutual information in the following useful form:
$$I[t_1; t_2] = H[t_1] + H[t_2] - H[t_1, t_2] \tag{5}$$
where $H[t_i]$ is the entropy associated with the marginal distribution of $t_i$ and $H[t_1, t_2]$ is the entropy of the joint distribution. The mutual information of a standard bivariate Student t distribution $p_{\rho,\nu}(x, y)$ will thus be given by:
$$I[X; Y] = 2H[1] - H[2] \tag{6}$$
where $H[k]$ is the entropy of a k-dim Student t variable. The definition of mutual information can be generalized to many variables as:
$$I[X_1; X_2; \ldots; X_d] = \int d\mathbf{x}\; p_{\mathbf{X}}(\mathbf{x}) \log\left[\frac{p_{\mathbf{X}}(\mathbf{x})}{\prod_{i=1}^{d} p_{X_i}(x_i)}\right]. \tag{7}$$
For identically distributed variables ($p_{X_i}(x) = p(x)$ for all $i = 1, 2, \ldots, d$) we have:
$$I[X_1; X_2; \ldots; X_d] = d\,H[1] - H[d] \tag{8}$$
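The step from eq. (7) to eq. (8) can be spelled out by splitting the logarithm and integrating out, in each term of the sum, all variables except $x_i$. The following is a short worked version using only the definitions above:

```latex
\begin{align}
I[X_1; \ldots; X_d]
  &= \int d\mathbf{x}\; p_{\mathbf{X}}(\mathbf{x}) \log p_{\mathbf{X}}(\mathbf{x})
   - \sum_{i=1}^{d} \int d\mathbf{x}\; p_{\mathbf{X}}(\mathbf{x}) \log p_{X_i}(x_i) \\
  &= -H[\mathbf{X}]
   - \sum_{i=1}^{d} \int dx_i\; p_{X_i}(x_i) \log p_{X_i}(x_i) \\
  &= \sum_{i=1}^{d} H[X_i] - H[\mathbf{X}]
   \;=\; d\,H[1] - H[d] \quad \text{(identical marginals).}
\end{align}
```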
V. ENTROPY OF A 1-DIM STUDENT VARIABLE
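As we did in eq. (2), let's calculate:
$$H[1] = -\lim_{n \to 0} \frac{d}{dn} \int dt\; p_\nu(t)^{n+1} \tag{9}$$
Defining the normalization factor to be:
$$e^{F(\nu)} = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} = \frac{1}{\sqrt{\nu}\; B\left(\frac{\nu}{2}, \frac{1}{2}\right)} \tag{10}$$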
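we have:
$$H[1] = -\lim_{n \to 0} \frac{d}{dn} \left\{ e^{(n+1)F(\nu)} \int dt \left(1 + \frac{t^2}{\nu}\right)^{-\frac{1}{2}(n+1)(\nu+1)} \right\}. \tag{11}$$
According to eq. (A5) in appendix A, evaluating this integral gives:
$$H[1] = -\sqrt{\nu}\, \lim_{n \to 0} \frac{d}{dn} \left[ e^{(n+1)F(\nu)}\, B\left(\frac{1}{2}\, n(\nu+1) + \frac{\nu}{2},\; \frac{1}{2}\right) \right] \tag{12}$$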
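Applying the derivative (eq. A11) and the limit, and using eq. (10), we have:
$$H[1] = \log\left[\sqrt{\nu}\; B\left(\frac{\nu}{2}, \frac{1}{2}\right)\right] + \frac{\nu+1}{2} \left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right] \tag{13}$$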
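Eq. (13) can be evaluated numerically and checked against two known cases: for $\nu = 1$ (the Cauchy distribution) the entropy is $\log(4\pi)$, and for large $\nu$ it approaches the normal value $\frac{1}{2}\log(2\pi e)$. The sketch below assumes only the standard library; since that library has no digamma function, the `digamma` helper here is an illustrative implementation based on the recurrence $\psi(x) = \psi(x+1) - 1/x$ and a truncated asymptotic series.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: recurrence up to x >= 10, then asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6) + 1/(240 x^8)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series

def log_beta(a, b):
    """log B(a, b) from log-Gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def student_t_entropy_1d(nu):
    """Eq. (13): entropy of the 1-dim Student t with nu degrees of freedom."""
    return (0.5 * math.log(nu) + log_beta(nu / 2, 0.5)
            + (nu + 1) / 2 * (digamma((nu + 1) / 2) - digamma(nu / 2)))

for nu in (1, 2, 5, 30, 1000):
    print(nu, student_t_entropy_1d(nu))
```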
VI. ENTROPY OF A d-DIM STUDENT VARIABLE
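We can just as easily calculate the entropy of a d-variate Student distribution by using the same expedient. First write $H[d]$ as:
$$H[d] = -\lim_{n \to 0} \frac{d}{dn} \int d^d t\; p_{\hat\Sigma,\nu}(\mathbf{t})^{n+1} = -\lim_{n \to 0} \frac{d}{dn} \left\{ \frac{1}{Z(\nu)^{n+1}} \int d^d t \left[1 + \frac{\mathbf{t}^T \hat\Sigma^{-1} \mathbf{t}}{\nu}\right]^{-\frac{1}{2}(n+1)(\nu+d)} \right\}$$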
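We can always simplify this integral by making the transformation $\mathbf{x} = \hat{U}\mathbf{t}$, where $\hat{U}$ is the unitary matrix that diagonalizes $\hat\Sigma$. Calling $\mathcal{I}$ the expression inside the curly braces, we have:
$$\mathcal{I} = \frac{1}{Z(\nu)^{n+1}} \int d^d x \left[1 + \sum_{i=1}^{d} \frac{1}{\nu\lambda_i}\, x_i^2\right]^{-\frac{1}{2}(n+1)(\nu+d)} \tag{14}$$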
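where $\lambda_i$ is the eigenvalue of $\hat\Sigma$ corresponding to the i-th direction. We can also choose variables $r_i = \sqrt{\frac{1}{\nu\lambda_i}}\, x_i$ and, going to spherical coordinates with $r^2 = \sum_i r_i^2$, have:
$$\mathcal{I} = \frac{2\pi^{\frac{d}{2}}}{\Gamma\left(\frac{d}{2}\right)}\, \frac{\sqrt{\nu^d\,|\hat\Sigma|}}{Z(\nu)^{n+1}} \int_0^\infty dr\; r^{d-1} \left(1 + r^2\right)^{-\frac{1}{2}(n+1)(\nu+d)} \tag{15}$$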
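where $S_d = \frac{2\pi^{d/2}}{\Gamma(d/2)}$ is the surface area of the d-dimensional unit sphere. The integral is just a definition of the Beta function, as seen in eq. (A9):
$$\mathcal{I} = \frac{2\pi^{\frac{d}{2}}}{\Gamma\left(\frac{d}{2}\right)}\, \frac{\sqrt{\nu^d\,|\hat\Sigma|}}{2\,Z(\nu)^{n+1}}\; B\left(\frac{1}{2}\, n(\nu+d) + \frac{\nu}{2},\; \frac{d}{2}\right). \tag{16}$$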
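Substituting eq. (16) back into the expression for $H[d]$ and using the definition of $Z(\nu)$, eq. (1), gives:
$$H[d] = \frac{1}{2} \log\left[(\pi\nu)^d\,|\hat\Sigma|\right] + \log\left[\frac{B\left(\frac{\nu}{2}, \frac{d}{2}\right)}{\Gamma\left(\frac{d}{2}\right)}\right] + \frac{\nu+d}{2} \left[\psi\left(\frac{\nu+d}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right]. \tag{17}$$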
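Eq. (17) can be evaluated in the same way. For $d = 2$, eqs. (A10) and (A12) reduce it to $\log(2\pi e) + \frac{1}{2}\log|\hat\Sigma| + \frac{2}{\nu}$, that is, the bivariate normal entropy plus $2/\nu$, which makes a convenient check. The sketch below assumes only the standard library; `digamma` is an illustrative helper (recurrence plus truncated asymptotic series), and `log_det_sigma` is a hypothetical argument holding $\log|\hat\Sigma|$.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: recurrence up to x >= 10, then asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series

def log_beta(a, b):
    """log B(a, b) from log-Gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def student_t_entropy(d, nu, log_det_sigma=0.0):
    """Eq. (17): entropy of the d-dim Student t with correlation matrix Sigma."""
    return (0.5 * (d * math.log(math.pi * nu) + log_det_sigma)
            + log_beta(nu / 2, d / 2) - math.lgamma(d / 2)
            + (nu + d) / 2 * (digamma((nu + d) / 2) - digamma(nu / 2)))

rho, nu = 0.5, 7.0
log_det = math.log(1 - rho * rho)  # determinant of a 2x2 correlation matrix
normal_entropy = math.log(2 * math.pi * math.e) + 0.5 * log_det
print(student_t_entropy(2, nu, log_det), normal_entropy + 2 / nu)
```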
VII. MUTUAL INFORMATION
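We can finally calculate the mutual information of the d-dimensional Student distribution using eq. (8) with the value of the entropy calculated in eq. (17). Simple substitution and rearrangement yield:
$$I_d(\nu) = -\frac{1}{2}\log|\hat\Sigma| + \log\left\{ \frac{B\left(\frac{\nu}{2}, \frac{1}{2}\right)^d\, \Gamma\left(\frac{d}{2}\right)}{\pi^{\frac{d}{2}}\, B\left(\frac{\nu}{2}, \frac{d}{2}\right)} \right\} \tag{18}$$
$$\qquad - \frac{\nu(d-1)}{2}\, \psi\left(\frac{\nu}{2}\right) + \frac{d(\nu+1)}{2}\, \psi\left(\frac{\nu+1}{2}\right) - \frac{\nu+d}{2}\, \psi\left(\frac{\nu+d}{2}\right) \tag{19}$$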
Note that $I_{\text{normal}} = -\frac{1}{2}\log|\hat\Sigma|$ is just the mutual information of a normal distribution with correlation matrix given by $\hat\Sigma$, and it is the only term depending on this correlation matrix. The 'excess' term, with respect to the normal, is a measure of the deviation of the Student distribution from normal dependence. For d = 2 we have:
$$I_2(\nu) = I_{\text{normal}} + I_{\text{excess}} \tag{20}$$
where:
$$I_{\text{normal}} = -\frac{1}{2} \log\left(1 - \rho^2\right) \tag{21}$$
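and:
$$I_{\text{excess}} = 2\log\left[\sqrt{\frac{\nu}{2\pi}}\; B\left(\frac{\nu}{2}, \frac{1}{2}\right)\right] - \frac{2+\nu}{\nu} + (1+\nu)\left[\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right)\right], \tag{22}$$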
where we used eqs. (A10) and (A12). A plot of this excess term as a function of the number of degrees of freedom $\nu$ is shown in fig. 1.
FIG. 1: Excess mutual information $I_{\text{excess}}$ as a function of the number of degrees of freedom $\nu$.
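The behaviour of the excess term can also be tabulated directly from eq. (22). The sketch below, which assumes only the standard library, evaluates eq. (22) and cross-checks it against the general expression of eqs. (18)-(19) at $d = 2$. The `digamma` helper is an illustrative implementation (recurrence plus truncated asymptotic series), and the function names are hypothetical.

```python
import math

def digamma(x):
    """Approximate psi(x) for x > 0: recurrence up to x >= 10, then asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 / 240)))
    return result + math.log(x) - 0.5 / x - series

def log_beta(a, b):
    """log B(a, b) from log-Gamma functions."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def mutual_information(d, nu, log_det_sigma):
    """Eqs. (18)-(19): mutual information of the d-dim Student t."""
    log_term = (d * log_beta(nu / 2, 0.5) + math.lgamma(d / 2)
                - (d / 2) * math.log(math.pi) - log_beta(nu / 2, d / 2))
    psi_term = (-(d - 1) * nu / 2 * digamma(nu / 2)
                + d * (nu + 1) / 2 * digamma((nu + 1) / 2)
                - (nu + d) / 2 * digamma((nu + d) / 2))
    return -0.5 * log_det_sigma + log_term + psi_term

def excess_information(nu):
    """Eq. (22): excess over the normal mutual information for d = 2."""
    return (2 * (0.5 * math.log(nu / (2 * math.pi)) + log_beta(nu / 2, 0.5))
            - (2 + nu) / nu
            + (1 + nu) * (digamma((nu + 1) / 2) - digamma(nu / 2)))

# Tabulate the excess term for a few values of nu.
for nu in (1, 2, 5, 10, 50, 1000):
    print(nu, excess_information(nu))
```

For $\nu = 1$, eq. (22) evaluates in closed form to $\log(8\pi) - 3$, and the excess vanishes as $\nu \to \infty$, where the Student t approaches the normal distribution.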
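Appendix A: Useful facts
• The marginal distribution of each variable of an n-dim Student t is a 1-dim Student t
From the integral identity[2]:
$$p_{\hat\Sigma,\nu}(\mathbf{t}) = \int dx\; N\left(\mathbf{t} \,\middle|\, \mu_i = 0,\; \sigma_i = \frac{\nu}{x},\; \hat\Sigma\right) q_\nu(x) \tag{A1}$$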
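we have that the marginal distribution for $t_k$ is:
$$p(t_k) = \int dx\; q_\nu(x) \int N\left(\mathbf{t} \,\middle|\, \mu_i = 0,\; \sigma_i = \frac{\nu}{x},\; \hat\Sigma\right) \prod_{i \neq k} dt_i \tag{A2}$$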
The marginal distribution for the k-th variable of an n-dim normal distribution is just the 1-dim normal distribution with the k-th mean and k-th variance. This leaves us with:
$$p(t_k) = \int dx\; q_\nu(x)\; N\left(t_k \,\middle|\, \mu = 0,\; \sigma = \frac{\nu}{x}\right). \tag{A3}$$
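As shown in section II, this is just the 1-dim Student distribution:
$$p(t_k) = p_\nu(t_k) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}} \left(1 + \frac{t_k^2}{\nu}\right)^{-\frac{\nu+1}{2}} \tag{A4}$$
• Normalization integral of the Student distribution
From the normalization factor of the Student distribution we conclude already that:
$$\int_{-\infty}^{\infty} dx \left(1 + \frac{x^2}{\nu}\right)^{-n} = \sqrt{\pi\nu}\; \frac{\Gamma\left(n - \frac{1}{2}\right)}{\Gamma(n)} = \sqrt{\nu}\; B\left(n - \frac{1}{2},\; \frac{1}{2}\right) \tag{A5}$$
where $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$ is the Beta function.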
• Properties of the Beta function
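The Beta function admits the following representations and properties used in the previous calculations:
1. Definition by Gamma functions:
$$B(x, y) = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)} \tag{A6}$$
2. Integral definition:
$$B(x, y) = \int_0^1 t^{x-1} (1-t)^{y-1}\, dt \tag{A7}$$
3. Another integral definition:
$$B(x, y) = 2\int_0^\infty \frac{r^{2x-1}}{(1+r^2)^{x+y}}\, dr \tag{A8}$$
This last result implies that:
$$\int_0^\infty \frac{r^{d-1}}{(1+r^2)^{\alpha}}\, dr = \frac{1}{2}\, B\left(\alpha - \frac{d}{2},\; \frac{d}{2}\right) \tag{A9}$$
4. When y = 1 we have:
$$B(x, 1) = \frac{1}{x} \tag{A10}$$
5. Derivative of the Beta function:
$$\frac{\partial}{\partial x} B(x, y) = -B(x, y)\left(\psi(x+y) - \psi(x)\right) \tag{A11}$$
where $\psi(x)$ is the digamma function.
6. Property of the digamma function:
$$\psi(x+1) - \psi(x) = \frac{1}{x} \tag{A12}$$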
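The integral identities (A5) and (A9) can be verified numerically. The sketch below, assuming only the standard library, maps the infinite ranges onto finite ones with the substitutions $x = \sqrt{\nu}\tan\theta$ and $r = \tan\theta$ (a choice made here for convenience, not taken from the text) and applies a midpoint rule. Function names and parameter values are illustrative.

```python
import math

def beta(a, b):
    """Beta function from log-Gamma functions, eq. (A6)."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def midpoint(f, a, b, steps=20_000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def lhs_a5(n, nu):
    """Left side of (A5); with x = sqrt(nu) tan(theta) the integrand is cos^(2n-2)."""
    return math.sqrt(nu) * midpoint(
        lambda th: math.cos(th) ** (2 * n - 2), -math.pi / 2, math.pi / 2)

def lhs_a9(d, alpha):
    """Left side of (A9); with r = tan(theta) the integrand is sin^(d-1) cos^(2 alpha-d-1)."""
    return midpoint(
        lambda th: math.sin(th) ** (d - 1) * math.cos(th) ** (2 * alpha - d - 1),
        0.0, math.pi / 2)

n, nu = 2.5, 3.0
print(lhs_a5(n, nu), math.sqrt(nu) * beta(n - 0.5, 0.5))

d, alpha = 3, 4.0
print(lhs_a9(d, alpha), 0.5 * beta(alpha - d / 2, d / 2))
```

Each printed pair should agree closely. The sketch requires $2\alpha - d - 1 \ge 0$ and $n \ge 1$ so that the transformed integrands stay bounded.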
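[1] $E_{\mathbf{t}}[A(\mathbf{t})]$ denotes the expected value of $A(\mathbf{t})$.
[2] We use the notation $N(\mathbf{t} \mid \mu_i, \sigma_i, \hat\Sigma)$ for a multivariate normal distribution with mean $\mu_i$, variance $\sigma_i$ and correlation matrix $\hat\Sigma$.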
