Understanding QQ plots

## Try other distributions such as rchisq, rt, runif, or rf to see heavy or light tails on the left or right.


n <- 30;
ry <- rnorm(n);
qqnorm(ry);qqline(ry);
max(ry)
min(ry)
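## Look at the plot and guess what the x and y coordinates are.
I <- rep(1, n);
## qr[i] = (rank of ry[i] - 0.5)/n: the cumulative probabilities (plotting positions) of the sample quantiles ry
qr <- ((ry %*% t(I) > I %*% t(ry)) + .5 * (ry %*% t(I) == I %*% t(ry))) %*% I * (1/n);
points(qr, ry, col="blue"); ## to see this, try the following
points(qr, qr*0, col="green", pch="|");
rx <- qnorm(qr); ## theoretical normal quantiles at those probabilities
points(rx, ry, col="red", pch="O");
## The red O's circle the black o's exactly (for n > 10, where qqnorm uses (i - 0.5)/n).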
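
The matrix expression above can be cross-checked against base R. The following is a minimal sketch, not part of the original script: the names `pos_matrix`, `pos_rank`, and `qq` are illustrative, and the seed is arbitrary. It assumes n > 10, where `ppoints()` uses (i - 0.5)/n.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n  <- 30
ry <- rnorm(n)
I  <- rep(1, n)

# The matrix expression from the script: (number of smaller values + 0.5) / n
pos_matrix <- as.vector(((ry %*% t(I) > I %*% t(ry)) +
                         .5 * (ry %*% t(I) == I %*% t(ry))) %*% I / n)

# The same positions computed from ranks
pos_rank <- (rank(ry) - 0.5) / n

# x coordinates that qqnorm() would plot, without drawing anything
qq <- qqnorm(ry, plot.it = FALSE)

all.equal(pos_matrix, pos_rank)         # the two constructions agree
all.equal(sort(pos_rank), ppoints(n))   # they are the ppoints() positions
all.equal(qnorm(pos_rank), qq$x)        # and they reproduce qqnorm's x values
```

Using `rank()` avoids building two n-by-n matrices, which matters only for large n.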

03DEC2007 R workshop sponsored by the Dept. of Psychology, ZSU (= SYSU, Guangzhou)

Here is the updated PPT for the afternoon talk. Its third page includes the zipped example code and the set-up steps for the evening workshop. The anonymous online test on p-value interpretation listed there (with result statistics) is cited indirectly from Gigerenzer, Krauss, & Vitouch (2004).

There is an announcement at http://www.psy.sysu.edu.cn/detail_news.asp?id=258 and a formal CV of the speaker is available at http://lixiaoxu.googlepageS.com

Classic Neyman-Pearson approach demo

Note that the N-P approach does not use the information carried by the exact p value. In fact, when the N-P approach was first devised, the exact p value was usually not available. Now almost all statistical software reports exact p values, and the N-P approach has become obsolete. Wilkinson & APA TFSI (1999) recommended reporting the exact p value rather than just significance/insignificance, unless p is smaller than any meaningful precision.
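
To make the point concrete, here is a minimal R sketch (not the original demo). The two z statistics are made-up illustrative values, and a two-sided z test at alpha = .05 is assumed. Both results lead to the same N-P decision, while the exact p values differ by orders of magnitude.

```r
alpha <- 0.05
z <- c(barely = 1.97, clearly = 4.00)     # hypothetical observed z statistics

# N-P style decision: compare the statistic with the fixed critical value
critical  <- qnorm(1 - alpha / 2)
np_reject <- abs(z) > critical

# Exact two-sided p values for the same statistics
p_exact <- 2 * pnorm(-abs(z))

data.frame(z = z, np_reject = np_reject, p_exact = signif(p_exact, 3))
```

The `np_reject` column is identical for both rows, which is exactly the information loss described above.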




--

Wilkinson, L. & APA TFSI (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.

Different correlations for different IV ranges with the same regression coefficient

Y=\alpha+\beta X+\varepsilon,\;\varepsilon\sim N\left(0,\sigma^{2}\right)

With \alpha, \beta, and \sigma known in this linear relationship, can the correlation in the scatter plot of Y against X be determined from the linear formula alone?

You may recall from the Hierarchical Linear Model class that the range of W dramatically affects the regression coefficient of F~W in the following R demo (hlm.jpg). This time, however, the regression coefficient is fixed at a known \beta, so the range of X can never affect it. Nevertheless, the correlation r can range from zero to one (or to -1 when \beta is negative) depending on the variance of X. Because \varepsilon is independent of X, \mbox{Cov}\left(X,Y\right)=\beta\mbox{Var}\left(X\right), which gives the closed form r=\frac{\beta\mbox{Var}\left(X\right)}{\mbox{Std}\left(Y\right)\mbox{Std}\left(X\right)}=\beta\frac{\mbox{Std}\left(X\right)}{\sqrt{\beta^{2}\mbox{Var}\left(X\right)+\sigma^{2}}}.
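
The closed form can be checked by simulation. This is a minimal sketch under stated assumptions: the parameter values, the seed, and the names `r_theory` and `r_sim` are illustrative and not from the original post, and X is drawn from a normal distribution only for convenience.

```r
# Correlation implied by Y = alpha + beta * X + eps, eps ~ N(0, sigma^2)
r_theory <- function(beta, sd_x, sigma) {
  beta * sd_x / sqrt(beta^2 * sd_x^2 + sigma^2)
}

set.seed(2007)                          # arbitrary seed
alpha <- 1; beta <- 0.5; sigma <- 1     # illustrative known parameters
n <- 1e5
sd_x_values <- c(0.1, 1, 10)            # narrow, medium, and wide range of X

# Simulate one large sample per range of X and record the sample correlation
r_sim <- sapply(sd_x_values, function(sd_x) {
  x <- rnorm(n, sd = sd_x)
  y <- alpha + beta * x + rnorm(n, sd = sigma)
  cor(x, y)
})

cbind(sd_x = sd_x_values,
      r_sim = round(r_sim, 3),
      r_theory = round(r_theory(beta, sd_x_values, sigma), 3))
```

The slope is the same in all three samples, yet the correlation moves from near zero toward one as the spread of X grows.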

As a final word, let me quote Cohen (1994, p. 1001), where the role of the IV is taken by that of the DV in typical contexts such as ANOVA:

... standardized effect size measures, such as d and f, developed in power analysis (Cohen, 1988) are, like correlations, also dependent on population variability of the dependent variable and are properly used only when that fact is kept in mind.

--

Cohen, J. (1994). The earth is round (p<.05). American Psychologist, 49, 997-1003.

--

Compare with the following case: different correlations for different IV ranges with hierarchical regression coefficients --



“Effect Size” — same data, different interpretations



Just a short R-script note to illustrate the three-page paper by Rosenthal & Rubin (1982).

Table 1 (p. 167) lists a simple set-up with a between-subject treatment. The control group has 34 alive cases and 66 dead cases; the treatment group has 66 alive cases and 34 dead cases. The question is: what percentage of the variance is explained by the nominal IV that indicates the group?

The authors point out that one reader may interpret the result as a reduction of the death rate by 32 percentage points (from 66% to 34%), while another may interpret the same data as only 10.24% of the variance explained (r = .32). To make it more dramatic, imagine that just 4% explained variance (r = .20) corresponds to a reduction of the death rate by 20 percentage points (from 60% to 40%).
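
The arithmetic can be reproduced in a few lines of R. This is a minimal sketch rather than the original script: the helper `phi_from_counts` is a hypothetical name, and it simply computes the Pearson correlation between a 0/1 group indicator and a 0/1 survival indicator.

```r
# Pearson r (phi) between group membership and survival, from the four cell counts
phi_from_counts <- function(alive_control, dead_control,
                            alive_treatment, dead_treatment) {
  group <- rep(c(0, 1), times = c(alive_control + dead_control,
                                  alive_treatment + dead_treatment))
  alive <- c(rep(c(1, 0), times = c(alive_control, dead_control)),
             rep(c(1, 0), times = c(alive_treatment, dead_treatment)))
  cor(group, alive)
}

r_table1 <- phi_from_counts(34, 66, 66, 34)   # Table 1 of Rosenthal & Rubin (1982)
r_small  <- phi_from_counts(40, 60, 60, 40)   # the more dramatic 60% vs 40% case

rbind(table1 = c(r = r_table1, r_squared = r_table1^2),
      small  = c(r = r_small,  r_squared = r_small^2))
```

With equal group sizes and symmetric rates, r equals the difference between the two survival rates, which is why a 20-point difference pairs with r^2 = .04.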

--

Rosenthal, R. & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.

Anscombe’s 4 Regressions — A Trivially Updated Demo

##----------
## This is a trivially updated version based on the R document "?anscombe".
require(stats); require(graphics)
anscombe

##-- now some "magic" to do the 4 regressions in a loop:
ff <- y ~ x
for(i in 1:4) {
  ff[2:3] <- lapply(paste(c("y","x"), i, sep=""), as.name)
  assign(paste("lm.",i,sep=""), lmi <- lm(ff, data= anscombe))
}

## See how close they are (numerically!)
sapply(objects(pattern="lm\\.[1-4]$"), function(n) coef(get(n)))
lapply(objects(pattern="lm\\.[1-4]$"),
       function(n) coef(summary(get(n))))

## Now, do what you should have done in the first place: PLOTS
## One row per data set: scatter plot with fitted line, residuals vs fitted, normal Q-Q
op <- par(mfrow=c(4,3), mar=.1+c(4,4,1,1), oma=c(0,0,2,0))
for(i in 1:4) {
  ff[2:3] <- lapply(paste(c("y","x"), i, sep=""), as.name)
  plot(ff, data=anscombe, col="red", pch=21, bg="orange", cex=1.2,
       xlim=c(3,19), ylim=c(3,13))
  abline(get(paste("lm.",i,sep="")), col="blue")
  plot(lm(ff, data=anscombe), which=1, col="red", pch=21, bg="orange", cex=1.2,
       sub.caption="", caption="")
  plot(lm(ff, data=anscombe), which=2, col="red", pch=21, bg="orange", cex=1.2,
       sub.caption="", caption="")
}
mtext("Anscombe's 4 Regression data sets", outer = TRUE, cex=1.5)
par(op)

##

## Anscombe, F. J. (1973). Graphs in statistical analysis. American Statistician, 27, 17–21.
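
As a complement to the plots, the following minimal sketch tabulates how close the four data sets are numerically. The table name `summ` and its row labels are illustrative, and the sketch relies only on the built-in `anscombe` data frame.

```r
# Summary statistics and fitted line for each of the four x/y pairs
summ <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(mean_x = mean(x), mean_y = mean(y),
    var_x = var(x), var_y = var(y),
    r = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope = unname(coef(fit)[2]))
})
colnames(summ) <- paste0("set", 1:4)
round(summ, 2)
```

The columns of the rounded table are nearly identical, which is why the plots are needed to tell the four data sets apart.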

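The geometry of degrees of freedom: the squared length of the residual vector after projecting onto the intercept

This is the diagram from "The geometry of the correlation coefficient: the cosine of the angle between residual vectors after projecting onto the intercept". It also happens to explain why the \chi^2 distribution followed by \sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2} has df n-1 rather than n.

Here X_{i}\equiv\mu+\varepsilon_{i}, and \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right] is a standard normal random vector in n-dimensional space. It is then easy to see that \sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}=\sum_{i=1}^{n}\left(\varepsilon_{i}-\bar{\varepsilon}\right)^{2}. This expression is the squared length of the vector \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right]-\left[\begin{array}{c}\bar{\varepsilon}\\\bar{\varepsilon}\\\vdots\\\bar{\varepsilon}\end{array}\right]. We already know that \left[\begin{array}{c}\bar{\varepsilon}\\\bar{\varepsilon}\\\vdots\\\bar{\varepsilon}\end{array}\right] is the projection of \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right] onto the intercept vector (the gnomon of the sundial) \left[\begin{array}{c}1\\1\\\vdots\\1\end{array}\right]. Naturally, \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\vdots\\\varepsilon_{n}\end{array}\right]-\left[\begin{array}{c}\bar{\varepsilon}\\\bar{\varepsilon}\\\vdots\\\bar{\varepsilon}\end{array}\right] is the residual vector after projecting onto the intercept, that is, the projection onto the dial plate of the sundial.

The sundial lives in a space with n equal to 3. If we sample \left[\begin{array}{c}\varepsilon_{1}\\\varepsilon_{2}\\\varepsilon_{3}\end{array}\right] many times, we see a scatter plot of the standard normal distribution in three-dimensional space, symmetric in every direction. The projection of this scatter onto the dial plate is a scatter plot of the two-dimensional standard normal distribution. The squared lengths of the vectors for these points on the dial plate are therefore draws from \chi^2_{df=2}.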
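
The n = 3 sundial case can be simulated directly. This is a minimal sketch with an arbitrary seed and illustrative names (`eps`, `resid`, `len2`); it checks only the first two moments against those of a \chi^2 distribution with df = 2, which has mean 2 and variance 4.

```r
set.seed(3)                              # arbitrary seed
n <- 3                                   # dimension of the sundial space
N <- 1e5                                 # number of sampled vectors

eps   <- matrix(rnorm(N * n), nrow = N)  # each row is one standard normal vector
resid <- eps - rowMeans(eps)             # remove the projection onto (1, 1, 1)
len2  <- rowSums(resid^2)                # squared length on the dial plate

c(mean = mean(len2), var = var(len2))    # compare with 2 and 4 for chi-square, df = 2

# Every residual is orthogonal to the intercept vector (sums to zero up to rounding)
max(abs(rowSums(resid)))

# Optional visual check against chi-square quantiles with df = n - 1
# qqplot(qchisq(ppoints(N), df = n - 1), len2); abline(0, 1)
```

Changing `n` lets the same sketch illustrate df = n - 1 in higher dimensions.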

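Exercise: inflating the Type I error

A researcher always looks at the computed statistic first and only then decides whether to test the null hypothesis \mu=0 with a one-tailed or a two-tailed test. If the statistic \bar{X}>0, he sets the alternative hypothesis to \mu>0; if \bar{X}<0, he sets the alternative hypothesis to \mu<0. Suppose his \alpha=0.05. What is his true Type I error rate? Specifically, over many experiments in which the true state is always \mu=0, what value will the proportion of significant rejections approach?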
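
An answer to the exercise can be checked empirically with a simulation along these lines. This is a minimal sketch under assumptions that are not in the exercise itself: a z test with known \sigma=1, samples of size 10, and an arbitrary seed; the variable names are illustrative.

```r
set.seed(4)                              # arbitrary seed
alpha <- 0.05
N <- 1e5                                 # number of simulated experiments
n <- 10                                  # assumed sample size per experiment

# All experiments are generated under the null hypothesis mu = 0
xbar <- rowMeans(matrix(rnorm(N * n), nrow = N))
z    <- xbar * sqrt(n)                   # z statistic with known sigma = 1

# The direction of the one-tailed test is chosen after seeing the sign of xbar
reject <- ifelse(xbar > 0,
                 z > qnorm(1 - alpha),   # alternative mu > 0
                 z < qnorm(alpha))       # alternative mu < 0

mean(reject)                             # empirical Type I error rate
```

Comparing `mean(reject)` with the nominal `alpha` shows how much the data-dependent choice of direction changes the error rate.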

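Thoughts and readings prompted by Wang Dingding's "Finding a Reason for China's Stock Investors" (为中国股民找一个理由)

Wang's article notes that it first appeared in IT经理世界 (IT Manager World); I read it on CCER News. Selected excerpts:

...Contemporary Chinese live within an upheaval unseen in a thousand years and are going through a stage in which three transitions happen at once, so the future of every Chinese person is full of uncertainty in Knight's sense. This kind of uncertainty is unpredictable and unrepeatable; contemporary experimental economists call it "ambiguity" to distinguish it from "risk".

The stock market is certainly high-risk, but when we survey the many kinds of life outside the stock market, is the risk any lower? Rather than walk out of the stock market into a life of disillusion, why not walk into the stock market and gamble on a disillusioned life? Or, to say it again in the terms of economics: the high-risk life outside the stock market does not bring a correspondingly high return. People pour into the stock market because they know that bearing risk there at least carries the possibility of a corresponding return. In other words, rather than keeping their savings in the bank for a lifetime, hoping faintly that the ever-rising costs of pensions, medical care, housing, education, and daily life will not completely erode those meagre savings, using the same meagre savings as capital for the stock market is the more rational choice.

...

This touches on two topics that, while preparing my course, I had not realized might be closely related. The first is the distinction drawn by Knight (1921) between measurable Risk and unmeasurable Uncertainty. The "risk" in the second paragraph of Wang's text is clearly Uncertainty rather than Risk. Interestingly, unmeasurable Uncertainty can still be compared as higher or lower (this is not Wang's own insight but the view of Knight's original work). In psychometric terms, Uncertainty is not a scale variable, but it is an ordinal variable, and quite possibly a continuous ordinal variable.

Knight's original work is not easy to read. Even just looking up the Uncertainty and Knight entries in the 1996 Chinese translation of the 1987 edition of The New Palgrave: A Dictionary of Economics leaves one lost in the fog. The Knight entry was written by G. J. Stigler, who is mildly critical of Knight's "contribution" on Uncertainty. Note 1 of chapter 7 of Knight's original work also carefully points out that he intends to avoid any discussion of epistemology. This gives me the feeling of discussing an object that is defined as "an object that in essence cannot be discussed". Note that the only content of Uncertainty in Knight's original work is that it is unmeasurable, so every reduction of it (eliminate) is a negation of it. As soon as we compare how "unmeasurable" it is, we negate the essence of "unmeasurable". Given the experience of Russell's paradox, I strongly suspect that comparing degrees of "unmeasurability" is bound to lead to a paradox.

This brings up the second, related topic: "subjective probability". In the Uncertainty entry Knight's role is mentioned only in passing, and subjective probability is the more substantive keyword. A question that seems fundamental is: if we "completely and essentially" do not know a random distribution, to what extent can we, or can we not, build a subjective probability distribution with general meaning? -- Perhaps once the entry is understood, the beginner's question will dissolve by itself.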

--

Knight, F. H. (1921). Risk, Uncertainty, and Profit. Boston, MA: Hart, Schaffner & Marx.

Understanding the nominal IV