Definition
Let X and Y be random variables with joint probability mass function (or density) f(x, y). The covariance of X and Y is the number

Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)],

where μ_X = E(X) and μ_Y = E(Y).
Remark
By the definitions of Cov(X, Y) and E[g(X, Y)], taking g(x, y) = (x - μ_X)(y - μ_Y), we obtain:

Cov(X, Y) = Σ_x Σ_y (x - μ_X)(y - μ_Y) f(x, y)   for discrete X, Y,

Cov(X, Y) = ∫∫ (x - μ_X)(y - μ_Y) f(x, y) dx dy   for continuous X, Y.
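To make the discrete formula concrete, here is a minimal Python sketch that evaluates the double sum directly; the joint pmf used here is a made-up example for illustration, not one from the lecture:

```python
# Minimal sketch: Cov(X, Y) for a discrete pair, computed directly from
# the definition via the double sum above. The pmf is a made-up example.

pmf = {  # (x, y) -> f(x, y)
    (0, 0): 0.2, (0, 1): 0.3,
    (1, 0): 0.1, (1, 1): 0.4,
}

mu_X = sum(x * p for (x, y), p in pmf.items())  # E(X) = 0.5
mu_Y = sum(y * p for (x, y), p in pmf.items())  # E(Y) = 0.7

# Sum of (x - mu_X)(y - mu_Y) f(x, y) over the support.
cov = sum((x - mu_X) * (y - mu_Y) * p for (x, y), p in pmf.items())
print(cov)  # ~0.05
```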
Notation: instead of Cov(X, Y) we often write σ_XY.
Interpretation. Covariance is a measure of the relationship between two random variables:
(a) If large values of X (larger than μ_X) tend to occur with large values of Y (larger than μ_Y), and small values of X (smaller than μ_X) with small values of Y (smaller than μ_Y), then Cov(X, Y) > 0.
(b) If large values of X (larger than μ_X) tend to occur with small values of Y (smaller than μ_Y), and small values of X (smaller than μ_X) with large values of Y (larger than μ_Y), then Cov(X, Y) < 0.
(c) Note that for X = Y we get Cov(X, Y) = Var(X) ≥ 0.
Proposition
Cov(X, Y) = E(XY) - μ_X μ_Y.
Proof
Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)] = E(XY - Xμ_Y - Yμ_X + μ_X μ_Y) =
= E(XY) - μ_Y E(X) - μ_X E(Y) + μ_X μ_Y = E(XY) - μ_X μ_Y,
using the linearity of expectation together with E(X) = μ_X and E(Y) = μ_Y.
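As a quick sanity check of the proposition, the following sketch (again on a made-up illustrative pmf) confirms that the shortcut formula and the defining double sum give the same number:

```python
# Sketch: on a small made-up joint pmf, E(XY) - mu_X * mu_Y agrees
# with the covariance computed from the definition.

pmf = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

mu_X = sum(x * p for (x, y), p in pmf.items())
mu_Y = sum(y * p for (x, y), p in pmf.items())

by_definition = sum((x - mu_X) * (y - mu_Y) * p for (x, y), p in pmf.items())
by_shortcut = sum(x * y * p for (x, y), p in pmf.items()) - mu_X * mu_Y

print(abs(by_definition - by_shortcut) < 1e-12)  # True
```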
Theorem
If random variables X and Y are independent, then Cov(X,Y) = 0.
Proof
For independent random variables, E(XY) = E(X)E(Y). Combining this with the formula from the proposition above, we get:
Cov(X, Y) = E(XY) - μ_X μ_Y = E(X)E(Y) - μ_X μ_Y = 0.
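A rough Monte Carlo illustration of the theorem: the sample covariance of two independently generated sequences stays near zero. The uniform distribution and sample size here are arbitrary choices.

```python
# Sketch: sample covariance of two independently drawn sequences is
# close to 0, as the theorem predicts (up to Monte Carlo error).
import random

random.seed(0)
n = 100_000
xs = [random.random() for _ in range(n)]
ys = [random.random() for _ in range(n)]  # drawn independently of xs

mx, my = sum(xs) / n, sum(ys) / n
sample_cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
print(sample_cov)  # near 0
```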
Remark
The converse usually does not hold. For example, let (X, Y) be a discrete random vector with the joint probability mass function concentrated on four points:

(x, y):     (2, 2)   (-2, -2)   (-4, 1)   (4, -1)
f(x, y):     1/4       1/4        1/4       1/4

Note that the value of X determines the value of Y, so X and Y are dependent. At the same time, EX = EY = 0 and E(XY) = (1/4)(2×2 + (-2)×(-2) + (-4)×1 + 4×(-1)) = 0. Therefore Cov(X, Y) = 0.
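The following sketch verifies the moments of this counterexample exactly, using the four support points from the table above:

```python
# Sketch: exact check of the counterexample's moments. The support
# points are taken from the table above, each with probability 1/4.

points = [(2, 2), (-2, -2), (-4, 1), (4, -1)]
p = 1 / 4

e_X = sum(x * p for x, y in points)        # 0.0
e_Y = sum(y * p for x, y in points)        # 0.0
e_XY = sum(x * y * p for x, y in points)   # 0.0
print(e_XY - e_X * e_Y)  # Cov(X, Y) = 0.0, yet Y is a function of X
```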
Theorem
For any constants a and b,
Var(aX + bY) = a²Var(X) + b²Var(Y) + 2ab Cov(X, Y).
Proof
Var(aX + bY) = E{[(aX + bY) - (aμ_X + bμ_Y)]²} = E{[a(X - μ_X) + b(Y - μ_Y)]²} =
= E{a²(X - μ_X)²} + E{2ab(X - μ_X)(Y - μ_Y)} + E{b²(Y - μ_Y)²} =
= a²Var(X) + 2ab Cov(X, Y) + b²Var(Y).
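A numerical spot-check of the identity, on a made-up joint pmf with arbitrarily chosen constants a and b:

```python
# Sketch: check Var(aX + bY) = a^2 Var X + b^2 Var Y + 2ab Cov(X, Y)
# on a small made-up joint pmf; a and b are arbitrary test values.

pmf = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.1, (1, 1): 0.4}
a, b = 3.0, -2.0

def E(g):  # expectation of g(X, Y) under the joint pmf
    return sum(g(x, y) * p for (x, y), p in pmf.items())

var_X = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
var_Y = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2
cov_XY = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

lhs = E(lambda x, y: (a * x + b * y) ** 2) - E(lambda x, y: a * x + b * y) ** 2
rhs = a ** 2 * var_X + b ** 2 * var_Y + 2 * a * b * cov_XY
print(abs(lhs - rhs) < 1e-12)  # True
```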
Corollary
If X and Y are independent random variables, then
Var(aX + bY) = a²Var(X) + b²Var(Y).
Example
Let X_1, ..., X_5 be the numbers of spots obtained in 5 independent rolls of a die. Then
Var((X_1 + X_2)/2) = (1/2)Var(X_1), and Var((X_1 + X_2 + ... + X_5)/5) = (1/5)Var(X_1).
We can see that the variance of the average number of spots decreases in inverse proportion to the number of rolls, while its standard deviation decreases in inverse proportion to the square root of the number of rolls. The same property holds for the variance of the average of results obtained in independent repetitions of the same experiment.
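A Monte Carlo sketch of this decay, assuming a fair six-sided die (so Var(X_1) = 35/12); the trial count is an arbitrary choice:

```python
# Sketch: the variance of the average of n fair-die rolls behaves like
# Var(X_1)/n, where Var(X_1) = 35/12 for a fair die.
import random

random.seed(1)
trials = 100_000

def var_of_average(n):
    avgs = [sum(random.randint(1, 6) for _ in range(n)) / n
            for _ in range(trials)]
    m = sum(avgs) / trials
    return sum((a - m) ** 2 for a in avgs) / trials

for n in (1, 2, 5):
    # simulated variance vs. the theoretical value 35/12/n
    print(n, round(var_of_average(n), 3), round(35 / 12 / n, 3))
```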