Fisher information matrix for linear regression
I'm going to assume that the variance $\sigma^2$ is known, since you appear to be treating only the parameter vector $\beta$ as unknown. If I observe a single instance $(x, y)$, then the log-likelihood of that observation is
$$\ell(\beta) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(y - x^T\beta)^2}{2\sigma^2}.$$
This is just the log of the Gaussian density. The Fisher information matrix is the expected value of the negative of the Hessian matrix of $\ell(\beta)$. So, taking the gradient gives the score
$$S(\beta) = \nabla_\beta\!\left[-\frac{(y - x^T\beta)^2}{2\sigma^2}\right] = \nabla_\beta\!\left[-\frac{y^2}{2\sigma^2} + \frac{y\,x^T\beta}{\sigma^2} - \frac{\beta^T x x^T \beta}{2\sigma^2}\right] = \frac{yx}{\sigma^2} - \frac{x x^T\beta}{\sigma^2} = \frac{(y - x^T\beta)\,x}{\sigma^2}.$$
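As a quick numerical sanity check (a sketch of my own with made-up data, not part of the derivation itself), the analytic score can be compared against a finite-difference gradient of $\ell(\beta)$:

```python
import numpy as np

# Hypothetical setup: 3 covariates, known noise variance sigma^2 = 0.5.
rng = np.random.default_rng(0)
p, sigma2 = 3, 0.5
x = rng.normal(size=p)
beta = rng.normal(size=p)
y = x @ beta + rng.normal(scale=np.sqrt(sigma2))  # y ~ N(x^T beta, sigma^2)

def loglik(b):
    # log-density of y under N(x^T b, sigma^2)
    return -0.5 * np.log(2 * np.pi * sigma2) - (y - x @ b) ** 2 / (2 * sigma2)

score = (y - x @ beta) * x / sigma2  # analytic S(beta)

# Central finite differences along each coordinate.
eps = 1e-6
fd = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
               for e in np.eye(p)])

print(np.allclose(score, fd, atol=1e-5))  # True
```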
Taking another derivative, the Hessian is
$$H(\beta) = \frac{\partial}{\partial \beta^T}\,\frac{(y - x^T\beta)\,x}{\sigma^2} = \frac{1}{\sigma^2}\left[\frac{\partial(xy)}{\partial \beta^T} - \frac{\partial(x x^T \beta)}{\partial \beta^T}\right] = -\frac{x x^T}{\sigma^2},$$
so the Fisher information is
$$I(\beta) = -E_\beta[H(\beta)] = \frac{x x^T}{\sigma^2}.$$
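There is also a second route to the same matrix (a Monte Carlo sketch, again with simulated data): under the usual regularity conditions the Fisher information equals the covariance of the score, $E_\beta[S(\beta)S(\beta)^T]$, so averaging $S S^T$ over many draws of $y$ should recover $x x^T/\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma2, n_mc = 3, 0.5, 200_000
x = rng.normal(size=p)
beta = rng.normal(size=p)

# Many independent draws of y from N(x^T beta, sigma^2).
y = x @ beta + rng.normal(scale=np.sqrt(sigma2), size=n_mc)
scores = (y - x @ beta)[:, None] * x / sigma2  # one score vector per draw
emp_info = scores.T @ scores / n_mc            # Monte Carlo E[S S^T]

# Matches x x^T / sigma^2 up to Monte Carlo error.
print(np.allclose(emp_info, np.outer(x, x) / sigma2, atol=0.05))  # True
```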
Because the log-likelihoods of independent observations add, so do their gradients and Hessians; if I observe $n$ data items I just add the individual Fisher information matrices,
$$I(\beta) = \sum_i \frac{x_i x_i^T}{\sigma^2},$$
which, if $X^T = (x_1, x_2, \ldots, x_n)$, can be written compactly as
$$I(\beta) = X^T X / \sigma^2.$$
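To see this identity numerically (a trivial but reassuring sketch with arbitrary simulated $X$), the per-observation sum and the compact form agree exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2 = 50, 3, 0.5
X = rng.normal(size=(n, p))  # row i of X is x_i^T

per_obs_sum = sum(np.outer(xi, xi) for xi in X) / sigma2
compact = X.T @ X / sigma2
print(np.allclose(per_obs_sum, compact))  # True
```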
It is well known that the variance of the MLE $\hat\beta$ in a linear model is $\sigma^2 (X^T X)^{-1}$, and in more general settings the asymptotic variance of the MLE equals the inverse of the Fisher information, so we know we've got the right answer.
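And as a final check (a simulation sketch of my own, not part of the original answer), the empirical covariance of $\hat\beta = (X^T X)^{-1} X^T y$ over repeated noise draws lines up with $\sigma^2 (X^T X)^{-1} = I(\beta)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2, reps = 50, 3, 0.5, 100_000
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
XtX_inv = np.linalg.inv(X.T @ X)

# Refit the model on `reps` independent noise realizations.
Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
beta_hats = Y @ X @ XtX_inv    # OLS solution for every replication
emp_cov = np.cov(beta_hats.T)  # empirical Cov(beta_hat), shape (p, p)

# Matches sigma^2 (X^T X)^{-1} up to Monte Carlo error.
print(np.allclose(emp_cov, sigma2 * XtX_inv, rtol=0.05, atol=5e-4))  # True
```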