## “Confidence interval of R-square”, but, which one?

In linear regression, confidence interval (CI) of population DV is narrower than that of predicted DV. With the assumption of generalizability, CI of $\tilde{Y}_{\left[1\times1\right]}$ at $x_{\left[1\times p\right]}$ is

$\;\hat{Y}\pm\left(x\left(X^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p}$,

while CI of $Y\left(x\right)=\tilde{Y}\left(x\right)+\varepsilon$ is

$\;\hat{Y}\pm\left(1+x\left(X^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p}$.

The pivot methods of both are quite similar as following.

$\;\frac{\hat{Y}-\tilde{Y}}{s_{\hat{Y}}}\sim t_{df=N-p}$ ,

so $\tilde{Y}_{critical}=\hat{Y}-s_{\hat{Y}}\times t_{critical}$ .

$\;\frac{\hat{Y}-Y}{s_{\left(\hat{Y}-Y\right)}}\sim t_{df=N-p}$,

so $Y_{critical}=\hat{Y}-s_{\left(\hat{Y}-Y\right)}\times t_{critical}=\hat{Y}-s_{\left(\hat{Y}-\tilde{Y}-\varepsilon\right)}\times t_{critical}$

$R^{2}$ of linear regression is the point estimate of

$\;\eta^{2}\equiv\frac{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)}{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)+N\sigma^{2}}$

for fixed IV(s) model. Or, it is the point estimate of $\rho^{2}$ wherein $\rho$ denotes the correlation of Y and $X\beta$, the linear composition of random IV(s) . The CI of $\rho^{2}$ is wider than that of $\eta^{2}$ with the same $R^{2}$ and confidence level.

[update] It is obvious that CI of $\rho^{2}$ relies on the distribution presumption of IV(s) and DV, as fixed IV(s) are just special cases of generally random IV(s). Usually, the presumption is that all IV(s) and DV are from multivariate normal distribution.

In the bivariate normal case with a single random IV, through Fisher's z-transform of Pearson's r, CI of the re-sampled $R^{\prime2}=r^{\prime2}$ can also be constructed. Intuitively, it should be wider than CI of $\rho^{2}$.

$\;\tanh^-\left(r\right)\equiv\frac{1}{2}\log\frac{1+r}{1-r}\;{appr\atop \sim}\; N\left(\tanh^-\left(\rho\right),\frac{1}{N-3}\right)$

Thus,

$\;\tanh^-\left(r^{\prime}\right)-\tanh^-\left(r\right){appr\atop \sim}N\left(0,\frac{2}{N-3}\right)$

CI of $\tanh^-\left(r^{\prime}\right)$ can be constructed as $\tanh^-\left(r\right)\pm\sqrt{\frac{2}{N-3}}z_{\frac{\alpha}{2}}$ . With the reverse transform $\tanh\left(.\right)$, the CI bounds of $R^{\prime2}$ are

$\;\left(\max\left(0,\tanh\left(\tanh^-\left(R\right)-\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)\right)^{2}$

and

$\;\left(\tanh\left(\tanh^{-1}\left(R\right)+\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)^{2}$.

In multiple p IV(s) case, Fisher's z-transform is

$\;\left(N-2-p\right)\left(\tanh^-\left(R\right)\right)^{2}\;{appr\atop \sim}\;\chi_{df=p,ncp=\left(N-2-p\right)\left(\tanh^-\left(\rho\right)\right)^{2}}^{2}$ .

Although it could also be used to construct CI of $\rho^{2}$ , it is inferior to noncentral F approximation of R (Lee, 1971). The latter is the algorithm adopted by MSDOS software R2 (Steiger & Fouladi, 1992) and R-function ci.R2(...) within package MBESS (Kelley, 2008).

In literature, "CI(s) of R-square" are hardly the literal CI(s) of $R^{2}$ in replication once more. Most of them actually refer to CI of $\rho^{2}$ . Authors in social science unfamiliar to $L^AT_EX$ hate to type $\rho$ when they feel convenient to type r or R. Users of experimentally designed fixed IV(s) should have reported CI of $\eta^{2}$ . However, if they were too familiar to Steiger's software R2 to ignore his series papers on CI of effect size, it would be significant chance for them to report a loose CI of $\rho^{2}$, even in a looser name "CI of $R^{2}$".

----

Lee, Y. S. (1971). Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society, B, 33, 117–130.

Kelley, K. (2008). MBESS: Methods for the Behavioral, Educational, and Social Sciences. R package version 1.0.1. [Computer software]. Available from http://www.indiana.edu/~kenkel

Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior research methods, instruments and computers, 4, 581–582.

R Code of Part I:

R Code of Part II:

## WordPress (and WPMU) Plugin for R Web Interface

 Download: RwebFriend.zip [Update] Including Chinese UTF8 Version Plugin Name: RwebFriend Plugin URL: http://xiaoxu.lxxm.com/RwebFriend Description: Set Rweb url options and transform [rcode]...[/rcode] or ... tag-pair into TEXTAREA which supports direct submit to web interface of R. *Credit notes：codes of two relevant plugins are studied and imported. One of the plugins deals with auto html tags within TEXTAREA tag-pair, the other stops WordPress to auto-transform quotation marks. Version: 1.0 Author: Xiaoxu LI Author URI: http://xiaoxu.lxxm.com/ Setup:Wordpress 3.5 WordPress 3.4 Usage:

[update] The free Chinese wordpress platform yo2.cn has installed this plugin.  See my demo.

More online demos -- http://wiki.qixianglu.cn/rwebfriendttest/

[update, June 2009] 72pines.com (here!) installed this plugin. Try

[update2009JUL18]Test installed packages of Rweb:
http://pbil.univ-lyon1.fr/cgi-bin/Rweb/Rweb.cgi
https://rweb.stat.umn.edu/cgi-bin/Rweb/Rweb.cgi

## Type III ANOVA in R

Type III ANOVA SS for factor A within interaction of factor B is defined as $SS_{A:B+A+B}-SS_{A:B+B}$, wherein A:B  is the pure interaction effect orthogonal to main effects of A, B, and intercept. There are some details in R to get pure interaction dummy IV(s).

Data is from SAS example PROC GLM, Example 30.3: Unbalanced ANOVA for Two-Way Design with Interaction

##
##Data from http://www.otago.ac.nz/sas/stat/chap30/sect52.htm
##
drug <- as.factor(c(t(t(rep(1,3)))%*%t(1:4))); ##Factor A
disease <- as.factor(c(t(t(1:3)) %*% t(rep(1,4))));##Factor B
y <- t(matrix(c(
42 ,44 ,36 ,13 ,19 ,22
,33 ,NA ,26 ,NA ,33 ,21
,31 ,-3 ,NA ,25 ,25 ,24
,28 ,NA ,23 ,34 ,42 ,13
,NA ,34 ,33 ,31 ,NA ,36
,3 ,26 ,28 ,32 ,4 ,16
,NA ,NA ,1 ,29 ,NA ,19
,NA ,11 ,9 ,7 ,1 ,-6
,21 ,1 ,NA ,9 ,3 ,NA
,24 ,NA ,9 ,22 ,-2 ,15
,27 ,12 ,12 ,-5 ,16 ,15
,22 ,7 ,25 ,5 ,12 ,NA
),nrow=6));
## verify data with http://www.otago.ac.nz/sas/stat/chap30/sect52.htm
(cbind(drug,disease,y));
##
## make a big table
y <- c(y);
drug <- rep(drug,6);
disease <- rep(disease,6);
##
## Design the PURE interaction dummy variables
m <- model.matrix(lm(rep(0,length(disease)) ~ disease + drug +disease:drug));
##! If lm(y~ ...) is used, the is.na(y) rows will be dropped. The residuals will be orthogonal to observed A, & B rather than designed cell A & B. It will be Type II SS rather than Type III SS.
c <- attr(m,"assign")==3;
(IV_Interaction <-residuals( lm(m[,c] ~ m[,!c])));
##
## verify data through type I & II ANOVA to http://www.otago.ac.nz/sas/stat/chap30/sect52.htm
## Type I ANOVA of A, defined by SS_A --
anova(lm(y~drug*disease));
##
## Type II ANOVA of A, defined by SS_{A+B}-SS_B --
require(car);
Anova(lm(y~drug*disease),type='II');
anova(lm(y~disease),lm(y~drug + disease))
##
##
## Type III ANOVA of A defined by SS_{A:B+A+B}-SS_{A:B+B}
t(t(c( anova(lm(y~IV_Interaction+disease),lm(y~disease * drug))$'Sum of Sq'[2] ,anova(lm(y~IV_Interaction+drug),lm(y~disease*drug))$'Sum of Sq'[2]
,anova(lm(y~disease+drug),lm(y~disease*drug))$'Sum of Sq'[2]))) ## ## Currently, Anova(...) of Prof John Fox's car package (V. 1.2-8 or 1.2-9) used "impure" interaction dummy IV(s), which made its type III result relying upon the order of factor levels. I think in its next version, the "pure" interaction dummy IV(s) will be adopted to give consistent type III SS. [update:] In Prof John FOX's car package, with parameter contrasts in inputted lm object, Example(Anova) gave type III SS consistent to other softwares. In this case, the code line should be -- Anova(lm(y~drug*disease, contrasts=list(drug=contr.sum, disease=contr.sum)),type='III'); Contrasts patterns are defined within lm(...) rather than Anova(...). An lm object with default contrasts parameter is inappropriate to calculate type III SS, or the result will rely on the level names in any nominal factor -- require(car); M2<-Moore; M2$f1<-M2$fcategory; M2$f2<-as.factor(- as.integer(M2$fcategory)); mod1<-lm(formula = conformity ~ f1 * partner.status,data=M2); mod2<-lm(formula = conformity ~ f2 * partner.status,data=M2); c(Anova(mod1,type='III')$'Sum Sq'[3],Anova(mod2,type='III')\$'Sum Sq'[3])

There was hot discussion of type III ANOVA on R-help newsgroup. Thomas Lumley thought Types of SS nowadays don't have to make any real sense --

http://tolstoy.newcastle.edu.au/R/help/05/04/3009.html

This is one of many examples of an attempt to provide a mathematical answer to something that isn't a mathematical question.

As people have already pointed out, in any practical testing situation you have two models you want to compare. If you are working in an interactive statistical environment, or even in a modern batch-mode system, you can fit the two models and compare them. If you want to compare two other models, you can fit them and compare them.

However, in the Bad Old Days this was inconvenient (or so I'm told). If you had half a dozen tests, and one of the models was the same in each test, it was a substantial saving of time and effort to fit this model just once.

This led to a system where you specify a model and a set of tests: eg I'm going to fit y~a+b+c+d and I want to test (some of) y~a vs y~a+b, y~a+b vs y~a+b+c and so on. Or, I want to test (some of) y~a+b+c vs y~a+b+c+d, y~a+b+d vs y~a+b+c+d and so on. This gives the "Types" of sums of squares, which are ways of specifying sets of tests. You could pick the "Type" so that the total number of linear models you had to fit was minimized. As these are merely a computational optimization, they don't have to make any real sense. Unfortunately, as with many optimizations, they have gained a life of their own.

The "Type III" sums of squares are the same regardless of order, but this is a bad property, not a good one. The question you are asking when you test "for" a term X really does depend on what other terms are in the model, so order really does matter. However, since you can do anything just by specifying two models and comparing them, you don't actually need to worry about any of this.

-thomas

## DV predicted by two IVs, vs. triangular pyramid

-- Diagram from Wiki

It is easier to imagine relation in three spatial vectors by their angles, than by their correlations. For standardized $DV$ $Y=\left(y_{1},y_{2},\dots,y_{N}\right)^{\tau}$ and $IV$s $X_{1}=\left(x_{1,1},x_{2,1},\dots,x_{N,1}\right)^{\tau}$, $X_{2}=\left(x_{1,2},x_{2,2},\dots,x_{N,2}\right)^{\tau}$, cosines of three angles of the triangular pyramid determinate the correlation matrix, thus, all statistics of the regressions $Y=\beta_{1}X_{1}+\beta_{2}X_{2}+\varepsilon$ and $Y=\beta_{1}X_{1}+\varepsilon$ . Unexpected but imaginative results on the impact of introducing $X_{2}$ are --

1. Both $IV$s are nearly independent of $DV$. Togethor they predict $DV$ almost perfectly ($\angle YX_{1}=\angle YX_{2}=89^{\circ}$ and $\angle X_{1}X_{2}=177.9^{\circ}$).

2. Both $IV$s are almost perfectly correlated with $DV$. Togethor, one of the regressive coefficient is significantly negative ($1^{\circ}$, $0.6^{\circ}$ and $0.5^{\circ}$ respectively).

3. Redundancy (Cohen, Cohen, West, & Aiken, 2003) increases to full and then decreases to zero and even negative ($\angle YX_{1}=60^{\circ}$, $\angle YX_{2}=45{}^{\circ}$ and $\angle X_{1}X_{2}$ closes from $90^{\circ}$ to $45^{\circ}$ then to $15^{\circ}+\epsilon$ ).

--
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S.  (2003). Applied multiple regression/correlation analysis for the behavioral sciences(3rd ed.) Mahwah, NJ: Lawrence Erlbaum Associates.

## Correction: the convenient radius for 95% confidence interval of t-test

-- What do you call a tea party with more than 30 people?
-- A Z party!!!
Joke #123 on http://www.ilstu.edu/~gcramsey/Gallery.html

2*SE is a popular convenient radius to eye 95% CI for t-test. Statisticians take t with df>=30 as z. However, I was incorrect to teach that 1.96*SE could be the precise radius when df>=30.

Guess what is the critical df to use decimally precise 1.96*SE. Larger than expected --

## Unexpectedly, the theoretically best reject-region of T-test is bounded.

$f_{t}\left(x;\mu,df\right)\equiv C\left(df\right)\left(1+\frac{\left(x-\mu\right)^{2}}{df}\right)^{-\frac{df+1}{2}}$ $\lambda\left(x;\mu_{0},\mu_{1},df\right)\equiv\frac{f_{t}\left(x;\mu_{1},df\right)}{f_{t}\left(x;\mu_{0},df\right)}=\left(\frac{v+\left(x-\mu_{1}\right)^{2}}{v+\left(x-\mu_{0}\right)^{2}}\right)^{-\frac{df+1}{2}}{\longrightarrow\atop x\rightarrow\infty}1$

For NHST $H_{0}:T\sim t_{df}$ vs $H_{1}:T-1\sim t_{df}$, theoretically, $p\left(t\right)=\int_{\left\{ x:\lambda\left(x\right)\ge\lambda\left(t\right)\right\} }f_{t}\left(x,\mu_{0},df\right)dx$ is s.t. $\lim_{t\rightarrow\infty}p\left(t\right)=\frac{1}{2}$ , rather than zero. Nevertheless, pratically a large t, rejecting both $H_{0}$ and $H_{1}$, should not be counted as any evidence to retain or reject $H_{0}$.

To verify the shape of $\lambda\left(x\right)$ --

## Confidence Region and Not-reject Region

Either Confidence Interval (CI) or Null Hypothesis Significance Test (NHST) has the same business, to advise whether some sample $X\equiv\left(X_{1},X_{2},\dots,X_{n}\right)$ is or is not disliked by some hypothesized parameter $\vartheta$.

NHST.com manages a database. For each Miss $\vartheta$, NHST spies out all she dislikes. Mr X logs in NHST.com and inputs a girl name and his credit card number, to bet his luck whispering-- Does she dislike me?

CI.com manages a database too. For each Mr X, CI only needs his credit card with his name X on it, then serves him a full list of available girls.

NHST.com has been historically monopolizing the market. Nevertheless, somebody prefer visiting CI.com and find that the two may share database in most cases.

Not-reject Region of $\vartheta$ is defined as $A\left(\vartheta\right)=\left\{ x:\vartheta\; doesn't\;dislike\;x\right\}$.

Confidence Region of x is defined as $S\left(x\right)\equiv\left\{ \vartheta:\vartheta\; doesn't\;dislike\;x\right\}$.

$\theta\in S\left(X\right)\Leftrightarrow \theta\,does\,not\,dislike\,X$ $\Leftrightarrow\,X\in\,A\left(\theta\right)$

So, $Pr_{\vartheta}\left(\vartheta\in S\left(X\right)\right)\ge1-\alpha,\forall\vartheta\Longleftrightarrow Pr_{\vartheta}\left(X\notin A\left(\vartheta\right)\right)\le\alpha,\forall\vartheta$

## Automatize LISREL jobs

LISREL routine can run in DOS or in command line mode of windows (windows-key + R -> CMD) . The command line is just like --

D:\My Documents>"C:\Program Files\lisrel87\lisrel87.exe" "C:\Program Files\lisrel87\LS8EX\EX61.LS8" D:\myOutput.out

1. You only need edit and input the bold part.
2. Quotation marks are used wherever the paths or filenames include blanks.
3. The 2nd argument is the output file. You can still specify more output options in your .ls8 file.
4. A .bat file can automatize batches of such lisrel jobs.

## Developing normal pdf from symmetry & independence

When I was in the 3rd grade of my middle school, I enjoyed my town bookstore as a standing library. There a series of six math-story books by Zhang Yuan-Nan impressed me a lot. I cited a case from one in my PPT when I taught the normal distribution -- the normal pdf can be derived from simple symmetry & independence conditions.

Today I can even google out an illegal pdf of its new edition to verify the case (2005, pp. 89). Actually I have bought the new edition series (now 3 books) and lent them to students. Those conditions are as instinctive as--

1. For white noise errors on 2-D, the independence means pdf at $(x,y)$ is the product of 1-D pdf, that is, $\phi\left(x\right)\phi\left(y\right)$.

2. The symmetry means pdf at $(x,y)$ is just a function of $x^{2}+y^{2}$, nothing to do with direction. That is, $\phi\left(x\right)\phi\left(y\right)=f\left(x^{2}+y^{2}\right)$.

So, $f\left(x^{2}\right)f\left(y^{2}\right)=f\left(x^{2}+0\right)f\left(0+y^{2}\right)=\phi^{2}\left(0\right)f\left(x^{2}+y^{2}\right)$.

For middle school students, the book stated a gap here to arrive at the final result $f\left(x^{2}\right)=ke^{bx^{2}}$, which is $\phi\left(x\right)=\frac{1}{\phi\left(0\right)}f\left(x^{2}+0\right)=\frac{k}{\phi\left(0\right)}e^{bx^{2}}$.

I think non-math graduate students with interests can close the gap by themselves with following small hints.

Denote $\alpha=x^{2},\beta=y^{2}$.
We have
$\log f\left(\alpha\right)+\log f\left(\beta\right)=\log\phi^{2}\left(0\right)+\log f\left(\alpha+\beta\right)$,
or
$\;\;\left[\log f\left(\alpha\right)-\log\phi^{2}\left(0\right)\right]+\left[\log f\left(\beta\right)-\log\phi^{2}\left(0\right)\right]$  $=\left(\log f\left(\alpha+\beta\right)-\log\phi^{2}\left(0\right)\right)$.
Denote $g\left(\alpha\right)=\log f\left(\alpha\right)-\log\phi^{2}\left(0\right)$.
That is, $g\left(\alpha\right)+g\left(\beta\right)=g\left(\alpha+\beta\right)$.

Now to prove $g\left(\frac{m}{n}\right)=\frac{m}{n}g\left(1\right),\forall m,n\in\mathbb{N}$. With continuousness, it gets $g\left(\alpha\right)=\alpha g\left(1\right),\forall\alpha\in\mathbb{R}$.

## My first wp plugin work: LaTeX_Math_cgi 1.0

It is actually used as a mimeTeX plugin rather than just a $L^{A}T_{E}X$ plugin. My contribution is technically trivial. But, you must need it if you change from a light theme to a cool black one while find the default mimeTeX images are black in front and transparent in background. This plugin provides mimeTeX's \reverse and \opaque options to tune your math forms to your wp theme without editing them one by one.