In linear regression, the confidence interval (CI) of the population *DV* (the conditional mean) is narrower than that of the predicted *DV* (a new observation). Under the assumption of generalizability, the CI of \tilde{Y}_{\left[1\times1\right]} at x_{\left[1\times p\right]} is

\;\hat{Y}\pm\left(x\left(X^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p},

while the CI of Y\left(x\right)=\tilde{Y}\left(x\right)+\varepsilon (usually called the prediction interval) is

\;\hat{Y}\pm\left(1+x\left(X^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p}.
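As a rough numerical illustration of the two half-widths, the sketch below covers only simple regression with an intercept (p = 2), where x(X^τ X)^{-1}x^τ reduces to 1/N + (x0 - mean(x))^2 / Sxx. The function names and the toy data are hypothetical. The t critical value is passed in by the caller (for example, read from a table) rather than computed, because the Python standard library has no t quantile function.

```python
import math

def leverage(xs, x0):
    """x (X'X)^{-1} x' for simple regression with an intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    return 1.0 / n + (x0 - xbar) ** 2 / sxx

def interval_half_widths(xs, ys, x0, t_crit):
    """Return (y_hat, half-width for the mean, half-width for a new Y)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    sxy = sum((v - xbar) * (w - ybar) for v, w in zip(xs, ys))
    b1 = sxy / sxx                      # slope
    b0 = ybar - b1 * xbar               # intercept
    sse = sum((w - b0 - b1 * v) ** 2 for v, w in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))  # df = N - p with p = 2
    h = leverage(xs, x0)
    y_hat = b0 + b1 * x0
    hw_mean = math.sqrt(h) * sigma_hat * t_crit      # CI of the population DV
    hw_new = math.sqrt(1 + h) * sigma_hat * t_crit   # CI of the predicted DV
    return y_hat, hw_mean, hw_new

# Toy data (made up); 3.182 is the tabulated upper 2.5% point of t with 3 df.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, hw_mean, hw_new = interval_half_widths(xs, ys, x0=3.0, t_crit=3.182)
print(y_hat, hw_mean, hw_new)
```

Whatever the data, the second half-width exceeds the first, because 1 + h > h for any leverage h.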

The pivotal quantities behind both intervals are quite similar, as follows.

\;\frac{\hat{Y}-\tilde{Y}}{s_{\hat{Y}}}\sim t_{df=N-p} ,

so \tilde{Y}_{critical}=\hat{Y}-s_{\hat{Y}}\times t_{critical} .

\;\frac{\hat{Y}-Y}{s_{\left(\hat{Y}-Y\right)}}\sim t_{df=N-p} ,

so Y_{critical}=\hat{Y}-s_{\left(\hat{Y}-Y\right)}\times t_{critical}=\hat{Y}-s_{\left(\hat{Y}-\tilde{Y}-\varepsilon\right)}\times t_{critical} .
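The two standard errors differ only by the error variance. A short worked step, assuming \varepsilon is independent of \hat{Y} (the usual assumption for a new observation) and writing h=x\left(X^{\tau}X\right)^{-1}x^{\tau} as a shorthand that is not part of the notation above:

\;s_{\left(\hat{Y}-Y\right)}^{2}=s_{\left(\hat{Y}-\tilde{Y}\right)}^{2}+s_{\varepsilon}^{2}=h\hat{\sigma}^{2}+\hat{\sigma}^{2}=\left(1+h\right)\hat{\sigma}^{2},

which is where the extra 1 under the square root of the second interval comes from.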

R^{2} of linear regression is the point estimate of

\;\eta^{2}\equiv\frac{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)}{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)+N\sigma^{2}}

for the fixed-*IV*(s) model. Alternatively, it is the point estimate of \rho^{2} , where \rho denotes the correlation between *Y* and X\beta, the linear combination of the random *IV*(s). For the same R^{2} and confidence level, the CI of \rho^{2} is wider than that of \eta^{2} .
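To make the fixed-*IV* definition concrete, here is a small seeded simulation. Everything specific in it is an assumption chosen for illustration: a balanced design with x in {-1, 0, 1}, true means equal to x, \sigma^{2}=1, and SS taken as the sum of squared deviations from the mean. Under these choices \eta^{2} is exactly 0.4, and the sample R^{2} should scatter around it.

```python
import random

def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def eta_squared(y_tilde, sigma2):
    """eta^2 = SS(Y~) / (SS(Y~) + N * sigma^2) for a fixed design."""
    ss = sum_sq(y_tilde)
    return ss / (ss + len(y_tilde) * sigma2)

def r_squared_simple(xs, ys):
    """R^2 of a one-predictor regression, i.e. the squared Pearson r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy ** 2 / (sum_sq(xs) * sum_sq(ys))

random.seed(1)                                  # fixed seed for repeatability
n = 3000
xs = [float(i % 3 - 1) for i in range(n)]       # fixed design: -1, 0, 1 repeated
y_tilde = list(xs)                              # true means: slope 1, intercept 0
sigma2 = 1.0
ys = [m + random.gauss(0.0, sigma2 ** 0.5) for m in y_tilde]

eta2 = eta_squared(y_tilde, sigma2)
r2 = r_squared_simple(xs, ys)
print(eta2, r2)
```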

[update] The CI of \rho^{2} clearly depends on the distributional assumption about the *IV*(s) and the *DV*, since fixed *IV*(s) are just a special case of generally random *IV*(s). Usually, the assumption is that all the *IV*(s) and the *DV* follow a multivariate normal distribution.

In the bivariate normal case with a single random *IV*, the CI of the re-sampled R^{\prime2}=r^{\prime2} (the R^{2} of a replication) can also be constructed through Fisher's *z*-transform of Pearson's *r*. Intuitively, it should be wider than the CI of \rho^{2} .

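For a replication of the same size *N*, \tanh^{-1}\left(r\right) and \tanh^{-1}\left(r^{\prime}\right) are independent and each approximately N\left(\tanh^{-1}\left(\rho\right),\frac{1}{N-3}\right). Thus,

\;\tanh^{-1}\left(r^{\prime}\right)-\tanh^{-1}\left(r\right){appr\atop \sim}N\left(0,\frac{2}{N-3}\right).

The CI of \tanh^{-1}\left(r^{\prime}\right) can be constructed as \tanh^{-1}\left(r\right)\pm\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}} . With the inverse transform \tanh\left(.\right), the CI bounds of R^{\prime2} are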

\;\left(\max\left(0,\tanh\left(\tanh^{-1}\left(R\right)-\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)\right)^{2}

and

\;\left(\tanh\left(\tanh^{-1}\left(R\right)+\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)^{2}.
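The two bounds can be sketched in a few lines of standard-library Python. The function name and its arguments are hypothetical. The `replication=False` branch uses the usual Fisher variance 1/(N-3) to give an approximate CI of \rho^{2} ; it is included here only for comparison and is not one of the formulas above.

```python
import math
from statistics import NormalDist

def fisher_ci_r2(r, n, alpha=0.05, replication=False):
    """Approximate CI for rho^2, or for r'^2 of a same-size replication.

    Variance on the Fisher z scale: 1/(N-3) for rho, 2/(N-3) for a replication.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # z_{1 - alpha/2}
    var = (2.0 if replication else 1.0) / (n - 3)
    centre = math.atanh(abs(r))                        # R = |r| with a single IV
    margin = z_crit * math.sqrt(var)
    lower = max(0.0, math.tanh(centre - margin)) ** 2  # truncate at 0 before squaring
    upper = math.tanh(centre + margin) ** 2
    return lower, upper

# Made-up example: r = 0.5 observed in a sample of N = 50.
rho_lo, rho_hi = fisher_ci_r2(0.5, 50)
rep_lo, rep_hi = fisher_ci_r2(0.5, 50, replication=True)
print((rho_lo, rho_hi), (rep_lo, rep_hi))
```

With these inputs the replication interval contains the \rho^{2} interval, in line with the intuition that it should be the wider of the two.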

In the case of multiple (*p*) *IV*s, Fisher's *z*-transform gives

\;\left(N-2-p\right)\left(\tanh^{-1}\left(R\right)\right)^{2}\;{appr\atop \sim}\;\chi_{df=p,ncp=\left(N-2-p\right)\left(\tanh^{-1}\left(\rho\right)\right)^{2}}^{2} .

Although it could also be used to construct a CI of \rho^{2} , it is inferior to the noncentral *F* approximation for *R* (Lee, 1971). The latter is the algorithm adopted by the MS-DOS software *R2* (Steiger & Fouladi, 1992) and by the R function ci.R2(...) in the package MBESS (Kelley, 2008).

In the literature, "CI(s) of R-square" are hardly ever the literal CI(s) of R^{2} in a replication. Most of them actually refer to the CI of \rho^{2} . Authors in the social sciences who are unfamiliar with LaTeX hate to type \rho when it is more convenient to type *r* or *R*. Users of experimentally designed, fixed *IV*(s) should report the CI of \eta^{2} . However, if they are so used to Steiger's software *R2* that they overlook his series of papers on CIs of effect sizes, there is a good chance that they will report the looser CI of \rho^{2} , under the even looser name "CI of R^{2}".

----

Lee, Y. S. (1971). Some results on the sampling distribution of the multiple correlation coefficient. *Journal of the Royal Statistical Society, B, 33*, 117–130.

Kelley, K. (2008). MBESS: Methods for the Behavioral, Educational, and Social Sciences. R package version 1.0.1. [Computer software]. Available from http://www.indiana.edu/~kenkel

Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. *Behavior Research Methods, Instruments, and Computers, 4*, 581–582.

R Code of Part I:

R Code of Part II:
