## Understanding QQ plots

    ## Try distributions like rchisq, rt, runif, rf to see heavy or light, left or right tails.
    n <- 30
    ry <- rnorm(n)
    qqnorm(ry); qqline(ry)
    max(ry); min(ry)
    ## View the plot and guess what the x(s) and y(s) are.
    I <- rep(1, n)
    ## qr are the sample quantile levels: (rank - 0.5) / n
    qr <- ((ry %*% t(I) > I %*% t(ry)) + .5 * (ry %*% t(I) == I %*% t(ry))) %*% I * (1/n)
    points(qr, ry, col = "blue")
    ## To see the fact, try the following:
    points(qr, qr * 0, col = "green", pch = "|")
    rx <- qnorm(qr)
    points(rx, ry, col = "red", pch = "O")
    ## The red O(s) circle the black o(s) exactly.
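The claim that the red O(s) land exactly on the black o(s) can also be checked numerically instead of by eye. The following is only a sketch: the seed and the names `qr_mat` and `qq` are assumptions introduced here, and the exact match relies on `qqnorm` using the plotting positions (i - 0.5)/n, which `ppoints` does only for n > 10 (for smaller n the agreement is approximate).

```r
set.seed(1)                       # arbitrary seed, only for reproducibility
n  <- 30
ry <- rnorm(n)

## Quantile levels from ranks: equivalent to the matrix expression for qr
qr <- (rank(ry) - 0.5) / n
I  <- rep(1, n)
qr_mat <- ((ry %*% t(I) > I %*% t(ry)) + .5 * (ry %*% t(I) == I %*% t(ry))) %*% I * (1/n)
print(all.equal(as.vector(qr_mat), qr))

## Theoretical quantiles: the x coordinates that qqnorm() plots
rx <- qnorm(qr)
qq <- qqnorm(ry, plot.it = FALSE) # list(x, y) of plotted coordinates
print(all.equal(rx, qq$x))
print(all.equal(ry, qq$y))
```

So the x(s) are standard normal quantiles at the levels (rank - 0.5)/n and the y(s) are the raw data.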

## 03DEC2007 R-workshop sponsored by dept of psy, ZSU(=SYSU, Guang-Zhou)

Here is the updated PPT for the afternoon talk, which includes the zipped example code and the set-up steps for the evening workshop on the 3rd page. The listed anonymous online test (result statistics) on p-value interpretation was cited indirectly from Gigerenzer, Krauss, & Vitouch (2004).

An announcement is posted at http://www.psy.sysu.edu.cn/detail_news.asp?id=258 and a formal CV of the speaker is available at http://lixiaoxu.googlepageS.com

## Classic Neyman-Pearson approach demo

Note that the N-P approach does not use the information in the exact p value. In fact, when the N-P approach was first devised, the exact p value was usually not available. Now almost all statistical software reports exact p values, and the N-P approach becomes obsolete. Wilkinson & APA TFSI (1999) recommended reporting the exact p value rather than just significance/insignificance, unless p is smaller than any meaningful precision.
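A minimal sketch of the point, under assumed numbers: the two z statistics below are made up for illustration, and the test is a plain two-sided z test at the 0.05 level. Both hypothetical studies receive the same N-P decision, although their exact p values differ by orders of magnitude.

```r
## Hypothetical two-sided z tests at alpha = 0.05
alpha <- 0.05
z <- c(study_A = 2.0, study_B = 4.0)  # made-up test statistics

crit   <- qnorm(1 - alpha / 2)        # N-P critical value
reject <- abs(z) > crit               # N-P decision: reject H0 or not
p      <- 2 * pnorm(-abs(z))          # exact two-sided p values

print(data.frame(z, reject, p = signif(p, 3)))
```

The `reject` column is all the N-P approach reports; the `p` column carries the extra information.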

--

Wilkinson, L. & APA TFSI (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

## Different corr(s) of different IV scopes with same regression coef

$Y=\alpha+\beta X+\varepsilon,\;\varepsilon\sim N\left(0,\sigma^{2}\right)$

With $\alpha,\beta, \sigma$ known in the linear relationship, can the correlation in the scatter plot of Y against X be estimated from the linear formula?

You may recall from the Hierarchical Linear Model class that the scope (range) of W dramatically affects the regression coefficients of F~W in the following R demo (hlm.jpg). This time, however, the regression coefficient is fixed at a known $\beta$, so the scope of X can never affect it. Nevertheless, the correlation r can range from zero to one (or to -1 when $\beta<0$), depending on the variance of X. Since $\mbox{Cov}\left(X,Y\right)=\beta\mbox{Var}\left(X\right)$ and $\mbox{Var}\left(Y\right)=\beta^{2}\mbox{Var}\left(X\right)+\sigma^{2}$, the closed form is $r=\frac{\beta\mbox{Var}\left(X\right)}{\mbox{Std}\left(Y\right)\mbox{Std}\left(X\right)}=\beta\frac{\mbox{Std}\left(X\right)}{\sqrt{\beta^{2}\mbox{Var}\left(X\right)+\sigma^{2}}}$.
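The closed form can be checked by simulation. This is only a sketch under assumed values: `alpha`, `beta`, `sigma`, the three standard deviations of X, the seed, and the helper name `r_closed` are all chosen here for illustration.

```r
## Closed-form correlation implied by Y = alpha + beta * X + eps
r_closed <- function(beta, sd_x, sigma) {
  beta * sd_x / sqrt(beta^2 * sd_x^2 + sigma^2)
}

set.seed(2007)                       # arbitrary seed
alpha <- 1; beta <- 0.5; sigma <- 1  # assumed "known" parameters
n <- 1e5
sd_x_values <- c(0.1, 1, 10)         # narrow, medium, wide scope of X

res <- t(sapply(sd_x_values, function(sd_x) {
  x <- rnorm(n, sd = sd_x)
  y <- alpha + beta * x + rnorm(n, sd = sigma)
  c(sd_x     = sd_x,
    slope    = unname(coef(lm(y ~ x))[2]),  # stays near beta
    r_sample = cor(x, y),                   # changes with sd_x
    r_closed = r_closed(beta, sd_x, sigma))
}))
print(round(res, 3))
```

The fitted slope stays near the same $\beta$ in every row, while the sample correlation follows the closed form from near zero to near one as the scope of X widens.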

Let me close with a quote from Cohen (1994, p. 1001; there the role of the IV is taken by the DV, as in typical contexts like ANOVA) --

... standardized effect size measures, such as d and f, developed in power analysis (Cohen, 1988) are, like correlations, also dependent on population variability of the dependent variable and are properly used only when that fact is kept in mind.

--

Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49, 997-1003.

--

Compare to the following case: different corr(s) of different IV scopes with hierarchical regression coefficients --

## “Effect Size” — same data, different interpretations

Just a short R-script note to illustrate the 3-page paper of Rosenthal & Rubin (1982).

Table 1 (p. 167) lists a simple set-up with a between-subject treatment. The control group includes 34 alive cases and 66 dead cases. The treatment group includes 66 alive cases and 34 dead cases. The question is: what percentage of the variance is explained by the nominal IV indicating the group?

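The authors pointed out that one reader may interpret the result as the death rate being reduced by 32 percentage points (from 66% to 34%), while another may interpret the same data as only 10.24% of the variance being explained ($r=.32$, $r^{2}=.1024$). To make the demo more dramatic, imagine that just 4% of explained variance ($r=.20$) corresponds to a reduction of 20 percentage points in the death rate.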
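A minimal sketch of the computation, assuming the 2 x 2 counts given above. The helper name `besd_r` and the second table (40/60 against 60/40) are introduced here for illustration and are not taken from the paper.

```r
## Correlation between group (0 = control, 1 = treatment) and
## survival (1 = alive, 0 = dead), rebuilt from the four cell counts
besd_r <- function(alive_control, dead_control, alive_treat, dead_treat) {
  group <- rep(c(0, 1), times = c(alive_control + dead_control,
                                  alive_treat + dead_treat))
  alive <- rep(c(1, 0, 1, 0), times = c(alive_control, dead_control,
                                        alive_treat, dead_treat))
  cor(group, alive)
}

## Table 1 set-up: the death rate drops from 66% to 34%
r1 <- besd_r(34, 66, 66, 34)
print(c(r = r1, r_squared = r1^2, death_rate_drop = 66/100 - 34/100))

## Hypothetical table: the death rate drops from 60% to 40%
r2 <- besd_r(40, 60, 60, 40)
print(c(r = r2, r_squared = r2^2, death_rate_drop = 60/100 - 40/100))
```

In both tables the drop in death rate equals r itself, so squaring r makes the same effect look much smaller.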

--

Rosenthal, R. & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.

## Anscombe’s 4 Regressions — A Trivially Updated Demo

    ##----------
    ## This is a trivially updated version based on the R document "?anscombe".
    require(stats); require(graphics)
    anscombe

    ##-- now some "magic" to do the 4 regressions in a loop:
    ff <- y ~ x
    for (i in 1:4) {
      ff[2:3] <- lapply(paste(c("y", "x"), i, sep = ""), as.name)
      assign(paste("lm.", i, sep = ""), lmi <- lm(ff, data = anscombe))
    }

    ## See how close they are (numerically!)
    sapply(objects(pattern = "lm\\.[1-4]$"), function(n) coef(get(n)))
    lapply(objects(pattern = "lm\\.[1-4]$"), function(n) coef(summary(get(n))))

    ## Now, do what you should have done in the first place: PLOTS
    ## Each row: scatter plot with fitted line, residuals vs fitted, normal Q-Q
    op <- par(mfrow = c(4, 3), mar = .1 + c(4, 4, 1, 1), oma = c(0, 0, 2, 0))
    for (i in 1:4) {
      ff[2:3] <- lapply(paste(c("y", "x"), i, sep = ""), as.name)
      plot(ff, data = anscombe, col = "red", pch = 21, bg = "orange", cex = 1.2,
           xlim = c(3, 19), ylim = c(3, 13))
      abline(get(paste("lm.", i, sep = "")), col = "blue")
      plot(lm(ff, data = anscombe), which = 1, col = "red", pch = 21, bg = "orange",
           cex = 1.2, sub.caption = "", caption = "")
      plot(lm(ff, data = anscombe), which = 2, col = "red", pch = 21, bg = "orange",
           cex = 1.2, sub.caption = "", caption = "")
    }
    mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex = 1.5)
    par(op)

    ## Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21.
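As a compact numerical companion to the plots, the four fits can be summarized in one small table. This is a sketch only; the names `fits` and `stats4` are introduced here, and `anscombe` is the data set shipped with R.

```r
## Fit y_i ~ x_i for the four Anscombe data sets
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

## Intercept, slope and correlation for each data set
stats4 <- t(sapply(1:4, function(i) {
  b <- coef(fits[[i]])
  c(intercept = unname(b[1]),
    slope     = unname(b[2]),
    r         = cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))
}))
print(round(stats4, 3))
```

The rows are nearly identical, which is why the residual and Q-Q panels are needed to tell the four data sets apart.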

## Thoughts and readings prompted by Wang Dingding (汪丁丁), "Finding a Reason for China's Stock Investors" (《为中国股民找一个理由》)

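...Contemporary Chinese people live within an upheaval unseen in a thousand years and are going through a stage in which three transitions happen at the same time, so the future of every Chinese person is full of uncertainty in Knight's sense. This kind of uncertainty can be neither anticipated nor repeated; contemporary experimental economists call it "ambiguity" to distinguish it from "risk".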

...

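Knight's original work is not easy to read. Even just looking up the Uncertainty and Knight entries in the 1996 Chinese translation of the 1987 edition of The New Palgrave: A Dictionary of Economics leaves one in a fog. The Knight entry was written by G. J. Stigler, who is mildly critical of Knight's "contribution" on Uncertainty. Note 1 of Chapter 7 of Knight's original also carefully points out that he intends to avoid any discussion of epistemology (the theory of knowledge). To me this feels like discussing an object that is defined as "an object that in essence cannot be discussed". Bear in mind that the only connotation of Uncertainty in Knight's original is that it cannot be measured, so every reduction (eliminate) of it is a negation of it. As soon as one compares how "unmeasurable" it is, one negates the essence of being "unmeasurable". Given the experience of Russell's paradox, I really suspect that comparing degrees of "unmeasurability" is bound to lead to a paradox.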

--

Knight, F. H. (1921). Risk, Uncertainty, and Profit. Boston, MA: Hart, Schaffner & Marx.