## R code for selected chapters of Structural Equation Modeling and Its Applications (Hou, Wen, & Cheng, 2004)

Using Professor John Fox's sem package, I tried several examples from this textbook; the results all agree with the Minimum Fit Function Chi-Square reported by LISREL 8. Note, however, that LISREL's other fit indices are based on the Normal Theory Weighted Least Squares Chi-Square, so they look somewhat better than the ones computed from the Minimum Fit Function Chi-Square. The GFI/AGFI indices that LISREL historically promoted were once criticized for routinely making poorly fitting models look good; friends in the field joked privately that this only made LISREL sell better, since buyers of the software were happy to see poor fit reported as good, and only the readers of the resulting papers were fooled.

[update] AMOS, Mplus, and SAS/STAT CALIS all report and use the Minimum Fit Function Chi-Square by default, which can be verified by comparing the output of the various packages (Albright & Park, 2009). The Normal Theory Weighted Least Squares Chi-Square does not always yield "better"-looking fit than the Minimum Fit Function Chi-Square, although it often does. Olsson, Foss, and Breivik (2004) compared the two on simulated data and confirmed that fit indices computed from the Minimum Fit Function Chi-Square are the more appropriate ones in all situations other than small samples.

The sem package for R is not yet very good at computing starting values for its iterations. In the several cases I encountered where the default starting values failed to converge, the iterations all converged after I set the parameters to arbitrary starting values within a reasonable range (e.g., setting the TX and TY entries to positive reals between 0 and 1).
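A minimal sketch of how such start values can be supplied (the paths, parameter names, and 0.5 values here are illustrative, not from the book's model): in the sem 'mod' matrix format, when column 2 names a free parameter, column 3 is read as its start value, while NA leaves the choice to sem.

```r
## Illustrative only: column 3 as the start value for free parameters;
## when column 2 is NA, column 3 is instead a fixed value.
model.start <- matrix(ncol = 3, byrow = TRUE, data = c(
  'X1 <-> X1' , 'TD1_1', '0.5',  ## free error variance, start at 0.5
  'X1 <- xi1' , 'LX1_1', '0.5',  ## free loading, start at 0.5
  'xi1<-> xi1', NA     , '1'     ## no name in column 2 => fixed at 1
))
class(model.start) <- 'mod'
print(model.start)
```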

--

Albright, J. J., & Park., H. M. (2009). Confirmatory factor analysis using Amos, LISREL, Mplus, and SAS/STAT CALIS. The University Information Technology Services Center for Statistical and Mathematical Computing, Indiana University. Retrieved July 7, 2009, from http://www.indiana.edu/~statmath/stat/all/cfa/cfa.pdf.

Olsson, U. H., Foss, T., & Breivik, E. (2004). Two equivalent discrepancy functions for maximum likelihood estimation: Do their test statistics follow a non-central chi-square distribution under model misspecification? Sociological Methods & Research, 32(4), 453-500.

[update] One nice feature of R's sem package is that it can power an online interface for structural equation modeling. The crude example below corresponds to the results of Chap3_1_4_CFA_MB.LS8 in the original book.
-- Credit belongs to the author of the sem package, the author of Rweb, and the provider of the server computing resources.

##Input Correlation Matrix
R.DHP<-matrix(0,ncol=17,nrow=17);
R.DHP[col(R.DHP) >= row(R.DHP)] <- c(
1,
.34,1,
.38,.35,1,
.02,.03,.04,1,
.15,.19,.14,.02,1,
.17,.15,.20,.01,.42,1,
.20,.13,.12,.00,.40,.21,1,
.32,.32,.21,.03,.10,.10,.07,1,
.10,.17,.12,.02,.15,.18,.23,.13,1,
.14,.16,.15,.03,.14,.19,.18,.18,.37,1,
.14,.15,.19,.01,.18,.30,.13,.08,.38,.38,1,
.18,.16,.24,.02,.14,.21,.21,.22,.06,.23,.18,1,
.19,.20,.15,.01,.14,.24,.09,.24,.15,.21,.21,.45,1,
.18,.21,.18,.03,.25,.18,.18,.18,.22,.12,.24,.28,.35,1,
.08,.18,.16,.01,.22,.20,.22,.12,.12,.16,.21,.25,.20,.26,1,
.12,.16,.25,.02,.15,.12,.20,.14,.17,.20,.14,.20,.15,.20,.50,1,
.20,.16,.18,.04,.25,.14,.21,.17,.21,.21,.23,.15,.21,.22,.29,.41,1
);
R.DHP<-t(R.DHP);
colnames(R.DHP)<-rownames(R.DHP)<-paste('X',1:17,sep='');
print('Input Correlation Matrix');
print(R.DHP);
##
##
##Input Model Specification of Chap3_1_4_CFA_MB.LS8
require(sem);
model.B <- matrix(ncol=3,byrow=TRUE,data=c(
'X1 <-> X1' , 'TD1_1'  , NA  ,
'X2 <-> X2' , 'TD2_2'  , NA  ,
'X3 <-> X3' , 'TD3_3'  , NA  ,
'X5 <-> X5' , 'TD5_5'  , NA  ,
'X6 <-> X6' , 'TD6_6'  , NA  ,
'X7 <-> X7' , 'TD7_7'  , NA  ,
'X8 <-> X8' , 'TD8_8'  , NA  ,
'X9 <-> X9' , 'TD9_9'  , NA  ,
'X10<-> X10', 'TD10_10', NA  ,
'X11<-> X11', 'TD11_11', NA  ,
'X12<-> X12', 'TD12_12', NA  ,
'X13<-> X13', 'TD13_13', NA  ,
'X14<-> X14', 'TD14_14', NA  ,
'X15<-> X15', 'TD15_15', NA  ,
'X16<-> X16', 'TD16_16', NA  ,
'X17<-> X17', 'TD17_17', NA  ,
'xi1<-> xi1', NA       , '1' ,
'xi2<-> xi2', NA       , '1' ,
'xi3<-> xi3', NA       , '1' ,
'xi4<-> xi4', NA       , '1' ,
'xi5<-> xi5', NA       , '1' ,
'xi1<-> xi2', 'PH12'   , NA  ,
'xi1<-> xi3', 'PH13'   , NA  ,
'xi1<-> xi4', 'PH14'   , NA  ,
'xi1<-> xi5', 'PH15'   , NA  ,
'xi2<-> xi3', 'PH23'   , NA  ,
'xi2<-> xi4', 'PH24'   , NA  ,
'xi2<-> xi5', 'PH25'   , NA  ,
'xi3<-> xi4', 'PH34'   , NA  ,
'xi3<-> xi5', 'PH35'   , NA  ,
'xi4<-> xi5', 'PH45'   , NA  ,
'X1 <- xi1' , 'LX1_1'  , NA  ,
'X2 <- xi1' , 'LX2_1'  , NA  ,
'X3 <- xi1' , 'LX3_1'  , NA  ,
'X5 <- xi2' , 'LX5_2'  , NA  ,
'X6 <- xi2' , 'LX6_2'  , NA  ,
'X7 <- xi2' , 'LX7_2'  , NA  ,
'X8 <- xi1' , 'LX8_1'  , NA  ,
'X9 <- xi3' , 'LX9_3'  , NA  ,
'X10<- xi3' , 'LX10_3' , NA  ,
'X11<- xi3' , 'LX11_3' , NA  ,
'X12<- xi4' , 'LX12_4' , NA  ,
'X13<- xi4' , 'LX13_4' , NA  ,
'X14<- xi4' , 'LX14_4' , NA  ,
'X15<- xi5' , 'LX15_5' , NA  ,
'X16<- xi5' , 'LX16_5' , NA  ,
'X17<- xi5' , 'LX17_5' , NA  )
);
class(model.B)<-'mod';
##
##
N=350;##sample size;
##R.DHP[-4,-4] excludes X4
## Result
(summary(sem.B<-sem(model.B, R.DHP[-4,-4], N)));
## Residuals
(round(residuals(sem.B),3));
#####################
boxplot.matrix <- function(M, ylim=c(-1,1)) {
M <- as.matrix(M);
v <- c(M[row(M) > col(M)]); ##below-diagonal elements only
boxplot(v, at=1, xlab='', ylab='', ylim=ylim);
points(rep(1, length(v)), v, pch='-', col='red');
stem(v); ##stem-and-leaf display printed to the console
boxplot.stats(v); ##returned as the function value
}
####################
boxplot(residuals(sem.B));



Online demos of RWebFriend for WordPress (the demo on yo2.cn, and a temporary demo on Qixianglu)

[update 2009.07.11] The experimental MediaWiki-R integration platform mentioned in this post is no longer open for editing; the original examples have moved to another platform (whose interface is in Spanish; I could not find a platform with an English interface). I hope someone can help by providing a Linux server so that we can build our own platform.

## “Confidence interval of R-square”, but, which one?

In linear regression, the confidence interval (CI) for the population mean of the DV is narrower than that for a newly predicted DV. Under the assumption of generalizability, the CI of $\tilde{Y}_{\left[1\times1\right]}$ at $x_{\left[1\times p\right]}$ is

$\;\hat{Y}\pm\left(x\left(X_{\left[N\times p\right]}^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p}$,

while the CI of $Y\left(x\right)=\tilde{Y}\left(x\right)+\varepsilon$ is

$\;\hat{Y}\pm\left(1+x\left(X_{\left[N\times p\right]}^{\tau}X_{\left[N\times p\right]}\right)^{-1}x^{\tau}\right)^{\frac{1}{2}}\hat{\sigma}t_{\frac{\alpha}{2},N-p}$.

The pivotal quantities behind the two are quite similar:

$\;\frac{\hat{Y}-\tilde{Y}}{s_{\hat{Y}}}\sim t_{df=N-p}$ ,

so $\tilde{Y}_{critical}=\hat{Y}-s_{\hat{Y}}\times t_{critical}$ .

$\;\frac{\hat{Y}-Y}{s_{\left(\hat{Y}-Y\right)}}\sim t_{df=N-p}$,

so $Y_{critical}=\hat{Y}-s_{\left(\hat{Y}-Y\right)}\times t_{critical}=\hat{Y}-s_{\left(\hat{Y}-\tilde{Y}-\varepsilon\right)}\times t_{critical}$
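These two intervals correspond to what R's predict.lm returns with interval = "confidence" (for the mean response) and interval = "prediction" (for a new observation); the data below are simulated purely for illustration.

```r
## Simulated illustration: the prediction interval is wider than the
## CI of the mean response at the same x.
set.seed(3)
X <- runif(40); Y <- 1 + 2 * X + rnorm(40)
fit.lm <- lm(Y ~ X)
new.x  <- data.frame(X = 0.5)
ci  <- predict(fit.lm, new.x, interval = "confidence")  ## for E[Y|x]
pri <- predict(fit.lm, new.x, interval = "prediction")  ## for a new Y at x
print(ci); print(pri)
```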

$R^{2}$ of linear regression is the point estimate of

$\;\eta^{2}\equiv\frac{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)}{SS\left(\tilde{Y}_{\left[N\times1\right]}\right)+N\sigma^{2}}$

for the fixed-IV(s) model. Alternatively, it is the point estimate of $\rho^{2}$, where $\rho$ denotes the correlation between Y and $X\beta$, the linear combination of random IV(s). The CI of $\rho^{2}$ is wider than that of $\eta^{2}$ given the same $R^{2}$ and confidence level.

[update] Obviously the CI of $\rho^{2}$ relies on the distributional assumption about the IV(s) and DV, since fixed IV(s) are just a special case of generally random IV(s). Usually the assumption is that all IV(s) and the DV come from a multivariate normal distribution.

In the bivariate normal case with a single random IV, a CI of the re-sampled $R^{\prime2}=r^{\prime2}$ can also be constructed through Fisher's z-transform of Pearson's r. Intuitively, it should be wider than the CI of $\rho^{2}$.

$\;\tanh^{-1}\left(r\right)\equiv\frac{1}{2}\log\frac{1+r}{1-r}\;\overset{\mathrm{appr}}{\sim}\;N\left(\tanh^{-1}\left(\rho\right),\frac{1}{N-3}\right)$

Thus,

$\;\tanh^{-1}\left(r^{\prime}\right)-\tanh^{-1}\left(r\right)\;\overset{\mathrm{appr}}{\sim}\;N\left(0,\frac{2}{N-3}\right)$

The CI of $\tanh^{-1}\left(r^{\prime}\right)$ can be constructed as $\tanh^{-1}\left(r\right)\pm\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}$. With the inverse transform $\tanh\left(\cdot\right)$, the CI bounds of $R^{\prime2}$ are

$\;\left(\max\left(0,\tanh\left(\tanh^{-1}\left(R\right)-\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)\right)^{2}$

and

$\;\left(\tanh\left(\tanh^{-1}\left(R\right)+\sqrt{\frac{2}{N-3}}z_{1-\frac{\alpha}{2}}\right)\right)^{2}$.
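The two bounds translate directly into base R; the R = 0.5 and N = 100 inputs below are just for illustration.

```r
## CI bounds of the re-sampled R'^2 via Fisher's z, as derived above.
ci.Rprime2 <- function(R, N, alpha = 0.05) {
  z  <- atanh(R)                                  ## Fisher's z of R
  hw <- sqrt(2 / (N - 3)) * qnorm(1 - alpha / 2)  ## half-width; variance doubled
  c(lower = max(0, tanh(z - hw))^2,               ## truncate at 0 before squaring
    upper = tanh(z + hw)^2)
}
print(ci.Rprime2(R = 0.5, N = 100))
```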

In the case of multiple (p) IVs, Fisher's z-transform generalizes to

$\;\left(N-2-p\right)\left(\tanh^{-1}\left(R\right)\right)^{2}\;\overset{\mathrm{appr}}{\sim}\;\chi_{df=p,\,ncp=\left(N-2-p\right)\left(\tanh^{-1}\left(\rho\right)\right)^{2}}^{2}$ .

Although it could also be used to construct a CI of $\rho^{2}$, it is inferior to the noncentral F approximation of R (Lee, 1971). The latter is the algorithm adopted by the MS-DOS program R2 (Steiger & Fouladi, 1992) and by the R function ci.R2(...) in the MBESS package (Kelley, 2008).

In the literature, "CI(s) of R-square" are hardly ever the literal CIs of $R^{2}$ under one more replication. Most of them actually refer to the CI of $\rho^{2}$. Authors in the social sciences unfamiliar with LaTeX hate to type $\rho$ when they find it convenient to type r or R. Users of experimentally designed fixed IV(s) should have reported the CI of $\eta^{2}$. However, if they were so accustomed to Steiger's program R2 that they overlooked his series of papers on CIs of effect sizes, there would be a fair chance of their reporting the looser CI of $\rho^{2}$, under the even looser name "CI of $R^{2}$".

----

Lee, Y. S. (1971). Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society, B, 33, 117–130.

Kelley, K. (2008). MBESS: Methods for the Behavioral, Educational, and Social Sciences. R package version 1.0.1. [Computer software]. Available from http://www.indiana.edu/~kenkel

Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, & Computers, 24(4), 581–582.

R Code of Part I:

R Code of Part II:

## WordPress (and WPMU) Plugin for R Web Interface

Download: RwebFriend.zip ([Update] including a Chinese UTF-8 version)
Plugin Name: RwebFriend
Plugin URL: http://xiaoxu.lxxm.com/RwebFriend
Description: Sets Rweb URL options and transforms [rcode]...[/rcode] or ... tag-pairs into a TEXTAREA that supports direct submission to a web interface of R. *Credit notes: the code of two related plugins was studied and imported; one deals with auto HTML tags within the TEXTAREA tag-pair, the other stops WordPress from auto-transforming quotation marks.
Version: 1.0
Author: Xiaoxu LI
Author URI: http://xiaoxu.lxxm.com/
Setup: WordPress 3.5, WordPress 3.4
Usage:

[update] The free Chinese wordpress platform yo2.cn has installed this plugin.  See my demo.

More online demos -- http://wiki.qixianglu.cn/rwebfriendttest/

[update, June 2009] 72pines.com (here!) installed this plugin. Try

[update2009JUL18]Test installed packages of Rweb:
http://pbil.univ-lyon1.fr/cgi-bin/Rweb/Rweb.cgi
https://rweb.stat.umn.edu/cgi-bin/Rweb/Rweb.cgi

## Type III ANOVA in R

The Type III ANOVA SS for factor A in the presence of its interaction with factor B is defined as $SS_{A:B+A+B}-SS_{A:B+B}$, where A:B is the pure interaction effect orthogonal to the main effects of A and B and to the intercept. There are some details to mind in R to obtain pure-interaction dummy IV(s).

The data are from the SAS example PROC GLM, Example 30.3: Unbalanced ANOVA for Two-Way Design with Interaction.
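As a sketch of the computation (on synthetic unbalanced data, not the SAS example's): with sum-to-zero contrasts, each main-effect column is orthogonal to the pure interaction as the definition above requires, and drop1 against the full model then yields the Type III F tests in base R.

```r
## Synthetic unbalanced two-way data; contr.sum makes main effects
## orthogonal to the pure interaction, as Type III requires.
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(7, 5))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(4, 3, 2, 3)))
)
d$y <- rnorm(nrow(d), mean = as.numeric(d$A) + as.numeric(d$B))
op  <- options(contrasts = c("contr.sum", "contr.poly"))
fit <- lm(y ~ A * B, data = d)
print(drop1(fit, scope = ~., test = "F"))  ## Type III F tests for A, B, A:B
options(op)  ## restore the default contrasts
```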

## Understanding QQ plots

##Try distributions like rchisq, rt, runif, rf to view their heavy or
##light, left or right tails.
n <- 30; ry <- rnorm(n);
qqnorm(ry); qqline(ry);
max(ry); min(ry);
##view and guess what the x(s) and y(s) are
I <- rep(1,n);
qr <- ((ry%*%t(I) > I%*%t(ry)) + .5*(ry%*%t(I) == I%*%t(ry)))%*%I * (1/n); ##qr are the sample quantiles
points(qr,ry,col="blue");
##to view the fact, try the following
points(qr,qr*0,col="green",pch="|");
rx <- qnorm(qr);
points(rx,ry,col="red",pch="O"); ##Red O(s) circle the black o(s) exactly.

## 03DEC2007 R workshop sponsored by the Dept. of Psychology, ZSU (=SYSU, Guangzhou)

Here is the updated PPT for the talk in the afternoon; its 3rd page includes the zipped example code and the set-up steps for the workshop in the evening. The anonymous online test (with result statistics) on p-value interpretation listed there was cited indirectly from Gigerenzer, Krauss, & Vitouch (2004).

There is an announcement at http://www.psy.sysu.edu.cn/detail_news.asp?id=258, and a formal CV of the speaker is available at http://lixiaoxu.googlepages.com

## Classic Neyman-Pearson approach demo

Note that the N-P approach does not use the information in the exact p value. Indeed, when the N-P approach was first devised, exact p values were usually not available. Now that almost all statistical software reports exact p values, the strict N-P approach has become obsolete. Wilkinson and the APA Task Force on Statistical Inference (1999) recommended reporting the exact p value rather than mere significance/nonsignificance, unless p is smaller than any meaningful precision.
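A minimal sketch of the contrast (an illustrative one-sample t test on simulated data): the N-P rule compares the statistic to a critical value fixed by alpha in advance and reports only the decision, while the TFSI recommendation is to report the exact p itself.

```r
## N-P: fix alpha in advance, then report reject/retain only.
set.seed(2)
x <- rnorm(25, mean = 0.3)  ## simulated sample; H0: mu = 0
alpha  <- 0.05
t.stat <- mean(x) / (sd(x) / sqrt(length(x)))
t.crit <- qt(1 - alpha / 2, df = length(x) - 1)
decision <- if (abs(t.stat) > t.crit) "reject H0" else "retain H0"
print(decision)
## TFSI style instead reports the exact two-sided p value.
p.value <- 2 * pt(-abs(t.stat), df = length(x) - 1)
print(p.value)
```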

--

Wilkinson, L. & APA TFSI (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.