Conditional expectations.



Deriving equations has always been fascinating to me. Due to a current project requirement I recently started reading All of Statistics; several chapters omit some proofs, which I'm going to outline here.

The properties of conditional expectations.

First we lay down the definition of conditional expectation: if X and Y are two (continuous) random variables, then

    \[\EE(Y|X)=\int_{-\infty}^{\infty} y p(y|x) dy\]

where p(y|x)=\frac{p(y,x)}{p(x)}, as expected.
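
As a quick worked example (my own, not taken from the book), take the joint density p(x,y)=x+y on the unit square [0,1]^2. Then p(x)=\int_0^1 (x+y)dy=x+\tfrac{1}{2}, and

    \[ \EE(Y|X=x)=\int_0^1 y\,\frac{x+y}{x+\tfrac{1}{2}}\,dy=\frac{\tfrac{x}{2}+\tfrac{1}{3}}{x+\tfrac{1}{2}}=\frac{3x+2}{6x+3} \]

which, as noted next, is itself a function of x.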

Notice that \EE(Y|X) is itself a function of x, and in place of Y we can take a function of Y; just as with ordinary expectations, we have

    \[ \EE(f(Y)|X)=\int_{-\infty}^\infty f(y)p(y|x)dy \]

This is in fact not an obvious result, because strictly by the definition of expectation we would have to integrate against the density of the random variable f(Y) itself,

    \[ \EE(f(Y)|X)=\int_{-\infty}^\infty z\, p_{f(Y)}(z|x)\,dz \]

I will omit the proof here; the result is known as the law of the unconscious statistician. Today we simply want to discuss some of its properties, which are easy and fun to derive while remaining not so trivial.
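
Before we do, here is the rule applied concretely to f(y)=y^2 with the same example density p(x,y)=x+y from above (again my own example):

    \[ \EE(Y^2|X=x)=\int_0^1 y^2\,\frac{x+y}{x+\tfrac{1}{2}}\,dy=\frac{\tfrac{x}{3}+\tfrac{1}{4}}{x+\tfrac{1}{2}}=\frac{4x+3}{12x+6} \]

Note that the density of Y^2 is never needed; we integrate f(y) directly against p(y|x).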

Theorem 1: \EE(c|X)=c, where c is a constant.

Proof: 

(1)   \begin{align*}\EE(c|X)&=\int_{-\infty}^\infty c p(y|x)dy\\&= c \int_{-\infty}^\infty \cfrac{p(y,x)}{p(x)} dy\\&= c\cdot \cfrac{1}{p(x)} \int_{-\infty}^\infty p(y,x) dy\\&= c\cdot \cfrac{1}{p(x)} \cdot p(x)\\&=c\end{align*}

The author specifically mentions the regression problem: we use the function r(x)=\EE(Y|X=x) to represent the regression function, and if Y is the random variable giving the value on the actual curve, then the random variable Y-r(X) represents the error between the regression and the real curve. The book says that if we let \epsilon=Y-r(X), then \EE(\epsilon)=0. To show this, we first need to establish a lemma.

Lemma 1: \EE(\EE(Y|X))=\EE(Y)

Proof:  

(2)    \begin{align*} \EE(\EE(Y|X))&=\int_{-\infty}^\infty \EE(Y|X)p(x) dx\\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty y p(y|x) dy \, p(x) dx\\ \end{align*}

Notice that in the first step, since \EE(Y|X) is a function g(x) of x, when we expand the outer expectation we are actually expanding \EE(g(X)) for some g, so we first integrate w.r.t. x; when we then expand the inner expectation \EE(Y|X) itself, by definition it is an expectation of Y, so we integrate w.r.t. y. Now, since p(x) is constant w.r.t. y, we can move it inside the inner integral. Furthermore, in statistics we often make some aggressive assumptions; here, for example, we suppose that the iterated integral satisfies the hypotheses of Fubini's theorem, so we can exchange the order of integration,

(3)    \begin{align*} &\int_{-\infty}^\infty \int_{-\infty}^\infty y p(y|x) dy p(x) dx\\ =& \int_{-\infty}^\infty\int_{-\infty}^\infty y p(x)p(y|x) dx dy \end{align*}

But notice that p(x)p(y|x) is just p(y,x), and since y is constant w.r.t. x we can move it out of the inner integral, thus

(4)    \begin{align*} &\int_{-\infty}^\infty\int_{-\infty}^\infty y p(x)p(y|x) dx dy\\ =&\int_{-\infty}^\infty y \int_{-\infty}^\infty  p(y,x) dx dy \\ =&\int_{-\infty}^\infty y p(y) dy\\ =& \EE(Y) \end{align*}
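
As a quick numerical sanity check of Lemma 1 (a sketch of mine, not from the book, assuming numpy is available): for jointly normal X and Y the conditional expectation has a known closed form, so we can compare a Monte Carlo estimate of \EE(\EE(Y|X)) against one of \EE(Y).

```python
import numpy as np

rng = np.random.default_rng(0)
rho, mu, n = 0.7, 1.0, 1_000_000

# X is standard normal; Y = mu + rho*X + noise, so E(Y|X) = mu + rho*X and E(Y) = mu.
x = rng.standard_normal(n)
y = mu + rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

cond_mean = mu + rho * x      # E(Y|X) evaluated at the sampled X's
print(cond_mean.mean())       # Monte Carlo estimate of E(E(Y|X))
print(y.mean())               # Monte Carlo estimate of E(Y); both should be close to 1.0
```

Both printed values agree up to Monte Carlo error, as Lemma 1 predicts.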

Having proven this, we need another lemma, namely the linearity of conditional expectation, which the author also omits in the original text.

Lemma 2: \EE(aX+c|Y)=a \EE(X|Y)+c

Proof:

(5)   \begin{align*}\EE(aX+c|Y)&=\int_{-\infty}^\infty (ax+c)p(x|y) dx\\&=a \int_{-\infty}^\infty x p(x|y) dx + c \int_{-\infty}^\infty p(x|y) dx\\&=a \EE(X|Y) + c\int_{-\infty}^\infty \frac{p(x,y)}{p(y)} dx\\&=a \EE(X|Y) + c\end{align*}

Now we can prove that \EE(\epsilon)=0 by noticing that

(6)    \begin{align*} \EE(\epsilon) &= \EE(\EE(\epsilon|X)) \\ &=\EE(\EE(Y-r(X)|X))\\ &=\EE(\EE(Y|X)-\EE(r(X)|X))\\ &=\EE(\EE(Y|X)-r(X)) \end{align*}

In the last step, note that conditional on X=x, r(X)=r(x) is a constant, so Theorem 1 gives \EE(r(X)|X)=r(X). But \EE(Y|X) is by definition just r(X), so \EE(\EE(Y|X)-r(X))=\EE(r(X)-r(X))=0.

This means that the error between the regression and the actual curve can be modeled by a distribution whose expectation is 0. In fact, it should come as no surprise that the error \epsilon is usually modeled by a normal distribution centered at 0, i.e. \epsilon \sim \mathcal{N}(0, \sigma^2). That is, every curve regression problem can be modeled as the regression function plus a perturbation \epsilon that follows a normal distribution centered at 0.
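
To close with a small simulation (again a sketch of mine, assuming numpy): if we generate data as Y=r(X)+\epsilon with a known r and noise centered at 0, the sample mean of the residuals Y-r(X) hovers around 0, exactly as \EE(\epsilon)=0 predicts.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 1_000_000, 0.3

def r(x):
    # the "true" regression function in this toy example
    return np.sin(x) + 0.5 * x

x = rng.uniform(-2.0, 2.0, n)
y = r(x) + rng.normal(0.0, sigma, n)    # Y = r(X) + eps, with eps ~ N(0, sigma^2)

eps = y - r(x)                          # error between the data and the true curve
print(eps.mean())                       # should be close to 0
```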

Until death do us part.