## R: Simulating a multivariate normal distribution with any given correlation matrix

For example, we have a correlation matrix for five standardized factors $\left[\begin{array}{ccccc}1.00 & 0.42 & 0.41 & 0.55 & 0.42\\ 0.42 & 1.00 & 0.48 & 0.47 & 0.46\\ 0.41 & 0.48 & 1.00 & 0.48 & 0.44\\ 0.55 & 0.47 & 0.48 & 1.00 & 0.50\\ 0.42 & 0.46 & 0.44 & 0.50 & 1.00\end{array}\right]$ (Hau, Chinese Textbook, pp. 49-50).
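One common way to draw such data, sketched here under the assumption that the matrix is positive definite, is to multiply independent standard normal draws by a Cholesky factor of the correlation matrix. The names `R`, `U`, `Z`, `X`, the sample size `n` and the seed are illustrative choices, not taken from the textbook.

```r
# Correlation matrix of the five standardized factors (values from the text)
R <- matrix(c(1.00, 0.42, 0.41, 0.55, 0.42,
              0.42, 1.00, 0.48, 0.47, 0.46,
              0.41, 0.48, 1.00, 0.48, 0.44,
              0.55, 0.47, 0.48, 1.00, 0.50,
              0.42, 0.46, 0.44, 0.50, 1.00),
            nrow = 5, byrow = TRUE)

set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 10000                           # illustrative sample size
U <- chol(R)                         # upper-triangular factor with t(U) %*% U == R
Z <- matrix(rnorm(n * 5), nrow = n)  # independent N(0, 1) draws
X <- Z %*% U                         # rows of X have covariance t(U) %*% U = R

round(cor(X), 2)                     # sample correlations, close to R for large n
```

If the MASS package is available, `MASS::mvrnorm(n, mu = rep(0, 5), Sigma = R)` should give draws from the same distribution.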

## The tail(s) of p value

For any given $H_0$ vs $H_1$, the p value of any given point $x$ is $\sup_{\theta\in H_{0}}P\left(\left\{ z|L\left(z\right)\ge L\left(x\right)\right\} |\theta\right)$, where $L\left(x\right)\equiv\frac{\sup_{\theta\in H_{1}}f\left(x|\theta\right)}{\sup_{\theta\in H_{0}}f\left(x|\theta\right)}$.

-- See R. Weber's Statistics Note (Chap 6.2 & 7.1)

I made a mistaken comment on the PDF of The Null Ritual (Gigerenzer, Krauss, & Vitouch, 2004), where three types of significance level (rather than p value) were discussed. I wrote the comment to note that the chapter ignored the role of $H_1$ in the definition of the p value. Almost every textbook distinguishes the two-tailed p from the one-tailed p. Usually, the two-tailed p is defined by an $H_1$ such as $\mu\neq0$.

Here I demonstrate a three-tailed p value case in R.

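$H_0$ is $\chi^2(5)$ * binomial(-1 vs 1):

    z <- (-1000:1000) * 0.02           # grid on [-20, 20]
    f <- 0.5 * dchisq(abs(z), df = 5)  # H0 density: chi-square(5) with a random sign
    h <- 0.5 * dchisq(10, df = 5)      # H0 density at the point 10
    plot(z, f, type = "h", col = c("black", "grey")[1 + (f > h)])  # black where f <= h
    lines(c(-20, 20), c(h, h))         # horizontal line at height h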

Do you agree that the region near zero under the "V" curve (the part below the horizontal line) should be the third tail? I think so, as long as $H_1$ includes all other possible distributions of the same shape.
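As a rough sketch of how the p value of this three-tail case could be computed: assume the rejection region is every point whose $H_0$ density is no higher than the density at the observed point. The third tail is then the interval around zero whose edge has the same density as that point. The observed point `x = 10` matches the horizontal line in the plot; `a`, `h`, `p_outer`, `p_third_tail` and `p_three_tail` are names introduced here for illustration.

```r
df <- 5
x  <- 10                 # observed point (same as the cut in the plot)
h  <- dchisq(x, df)      # chi-square density at x (the 0.5 factor cancels out)

# The chi-square(5) density rises on (0, df - 2), so it crosses h exactly once there
a <- uniroot(function(z) dchisq(z, df) - h,
             lower = 1e-8, upper = df - 2, tol = 1e-10)$root

# P(|Z| <= a) + P(|Z| >= x) under H0; the random sign does not affect |Z|
p_outer      <- pchisq(x, df, lower.tail = FALSE)  # the two outer tails together
p_third_tail <- pchisq(a, df)                      # the region near zero
p_three_tail <- p_outer + p_third_tail
c(a = a, p_outer = p_outer, p_third_tail = p_third_tail, p_three_tail = p_three_tail)
```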

You'll also agree that there will be two asymmetric tails if $H_1$ includes just two asymmetric alternatives, for example $\mu=-2$ and $\mu=1$ ($\sigma^2\equiv1$), while $H_0$ is the standard normal distribution.
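A minimal sketch of this asymmetric case, applying the likelihood-ratio definition of the p value given earlier. For a normal alternative with mean $\mu$ and unit variance, the log likelihood ratio against $N(0,1)$ is $\mu x-\mu^{2}/2$, so the two alternatives give $-2x-2$ and $x-0.5$. The function names `log_lr` and `p_value` are hypothetical.

```r
# Log of L(x): the larger of the two log likelihood ratios against N(0, 1)
log_lr <- function(x) pmax(-2 * x - 2, x - 0.5)

# p value of x under H0: P(L(Z) >= L(x)) with Z ~ N(0, 1).
# L(z) >= c holds when z <= -(log(c) + 2) / 2 or z >= log(c) + 0.5.
p_value <- function(x) {
  lc <- log_lr(x)
  pnorm(-(lc + 2) / 2) + pnorm(lc + 0.5, lower.tail = FALSE)
}

# Left and right cutoffs for an observed x = 2; they are not mirror images
c(left = -(log_lr(2) + 2) / 2, right = log_lr(2) + 0.5)
p_value(2)
```

The left cutoff lies closer to zero than the right one because the alternative $\mu=-2$ is farther from $H_0$ than $\mu=1$.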

## The Popperian falsifiability behind Regression Discontinuity Design (RDD)

Figure linked from http://www.socialresearchmethods.net/kb/statrd.php (Trochim, 2006, Figure 2). The red line is the fallacious treatment effect.

Causal analysis entails a counterfactual comparison between the treatment and the control conditions (Mark, 2003; Maris, 1998). To define a causal effect, two imaginary latent groups are introduced, one for each actual group. The comparison is between identical subjects in the actual treatment group and in an imaginary control group, or vice versa. For example, student A registered her RSS online and missed the group entertainment these days. Student B did not bother to register her RSS and took part in the group entertainment. To ask whether RSS attendance caused skipping the entertainment, the causal statement compares the actual A with RSS attendance to an imaginary A without RSS attendance, rather than the actual A to the actual B.

A full experimental design with randomization ensures that the two actual groups are identical in population before their treatment. The identity covers both the pretest and the relationship between post-test and pretest, so the mean post-test of the imaginary control group can be estimated without bias from, and then replaced by, that of the actual observed control group, or vice versa.

RDD, by contrast, assumes only that the two actual groups are identical in the relationship between post-test and pretest, and that this relationship is modeled appropriately. It usually also assumes that the two groups are divided by a cutoff on the pretest, although this is not necessary. In my opinion, RDD is a special instance of two-group analysis. A typical RDD context is teaching students in accordance with their aptitude (in Chinese, 因材施教).

The critical difference between a full experimental design and RDD is that, in RDD, the identity of the pre-post relationship across the two actual groups and the model chosen for it are only hypotheses to be tested by Popperian falsification, while in a full experimental design the population identity between groups is free of uncertainty thanks to manipulated randomization. If the relationship between pretest and post-test is curvilinear or otherwise non-linear, a linear regression analysis would report a fallacious treatment effect (Trochim, 2006, Figure 2).
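The fallacious effect can be reproduced with simulated data. The sketch below assumes a cubic pre-post relationship, a cutoff at zero and a true treatment effect of exactly zero. All of these, along with the variable names, the noise level and the seed, are illustrative assumptions rather than Trochim's example.

```r
set.seed(1)                          # arbitrary seed
n     <- 2000
pre   <- rnorm(n)                    # pretest
treat <- as.numeric(pre < 0)         # assignment by a cutoff at 0 on the pretest
post  <- pre^3 + rnorm(n, sd = 0.5)  # cubic pre-post relationship, true effect = 0

fit_linear <- lm(post ~ pre + treat)                        # misspecified: linear
fit_cubic  <- lm(post ~ pre + I(pre^2) + I(pre^3) + treat)  # contains the true curve

b_linear <- unname(coef(fit_linear)["treat"])  # spurious "treatment effect"
b_cubic  <- unname(coef(fit_cubic)["treat"])   # should be much closer to zero
c(linear = b_linear, cubic = b_cubic)
```

The linear model attributes the curvature to the treatment dummy, while the model that includes the true curve leaves little for the dummy to explain.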

If we had precision comparable to classic physics experiments, the relationship between pretest and post-test would be shown with high Popperian falsifiability. The true model would then be recognized without uncertainty, and statistical hypothesis tests would be superfluous. In practice, we have only a typical reliability of .7 or .8 in social science measurement, and an approximation to the true model (as with RMSEA in SEM) is usually necessary. An RDD conclusion therefore relies critically on the assumption that the relationship is modeled appropriately.

There are two conventional models for comparing two groups: gain scores (Gain) and residuals with covariate adjustment (Cov. Adj.). Maris (1998) discussed them in depth. The difference between them in the context of Lord's paradox is well known to researchers. However, a lot of confusion remains, some of which Maris cleared up or tried to clear up. He asserted that regression toward the mean (RTM) and bias in the Gain model do not imply one another, and that measurement error need not be the reason for bias in the Gain model. Note that Maris explicitly stated that his definition of RTM differs from some versions in the earlier literature (p. 322). If ubiquity should be a feature of RTM, Maris's definition does not meet this criterion.

Maris pointed out that a sufficient condition for the Gain model to be unbiased is that the gain scores are independent of the groups (p. 320). A stronger version is that the gain scores (posttest - pretest) are independent of the pretest. Graphically, this amounts to a slope of one for each regression line. Such a relationship between posttest and pretest is more constrained than the general linear relationship of Cov. Adj., just as the latter is more constrained than a curvilinear relationship. Given the low level of Popperian falsifiability in the modeling, these constraints on the relationship will be a source of controversy among researchers.
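A small simulation can show how the two models depend on the slope. It is only a sketch: it assumes two groups that differ by one unit on the pretest and share the same linear pre-post line, and it takes no position on which estimate is the right one. The function name `compare_models`, the sample size and the noise level are made up for the illustration.

```r
set.seed(1)                          # arbitrary seed
n     <- 5000                        # subjects per group
group <- rep(0:1, each = n)
pre   <- rnorm(2 * n, mean = group)  # groups differ by 1 on the pretest

compare_models <- function(slope) {
  post <- slope * pre + rnorm(2 * n, sd = 0.5)  # same pre-post line in both groups
  gain <- post - pre
  c(gain    = unname(coef(lm(gain ~ group))["group"]),        # Gain model
    cov_adj = unname(coef(lm(post ~ pre + group))["group"]))  # Cov. Adj. model
}

est_unit <- compare_models(1)    # unit slope: the two estimates should agree
est_half <- compare_models(0.5)  # slope below one: the two estimates diverge
rbind(unit_slope = est_unit, half_slope = est_half)
```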

--

Maris, E. (1998). Covariance Adjustment Versus Gain Scores – Revisited. Psychological Methods, 3, 309-327.

Mark, M. M. (2003). Program evaluation. In Schinka, J. A. & Velicer, W. F. (Eds.), Handbook of psychology. Vol. 2: Research methods in psychology. (pp. 323-347). New York: Wiley.

Trochim, W. (2006). Regression-Discontinuity Analysis. Retrieved Sep. 15, 2007, from http://www.socialresearchmethods.net/kb/statrd.php

## A pleasant surprise: wordpress.com supports LaTeX by default

$n\left(X_{\left[p\times1\right]}|\mu_{\left[p\times1\right]},\Sigma_{\left[p\times p\right]}\right)$ $\equiv\left(\sqrt{2\pi}\right)^{-p}\left|\Sigma\right|^{-\frac{1}{2}}\exp\left[-\frac{1}{2}\left(X-\mu\right)^{\top}\Sigma^{-1}\left(X-\mu\right)\right]$

--

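On yo2.cn, displaying formulas requires enabling the $L^{A}T_{E}X$ plugin in the admin backend. You can see the result here after I enabled it; I wrote the formula in $L_{Y}X$ first and then copied it over.

lxxm.com is based on the WordPress MU platform, where the plugins enabled by default can be customized. This WordPress MU plugin is based on John Forkosh's mimetex cgi.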

## Reply: On footnote 2 of "the false law of small numbers"

---

P(mean of a sample of ten > (1/SQRT(10))*NORMINV(0.95,0,1) | true $\mu$ = $\frac{2.23}{\sqrt{20}}$, true standard deviation of the sampling distribution of the mean of ten = $\frac{1}{\sqrt{10}}$), computed in Excel as =1-NORMDIST(NORMINV(1-0.05,0,1)/sqrt(10),2.23/SQRT(20),1/sqrt(10),TRUE)
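The same probability can be computed in R, where `qnorm` plays the role of NORMINV and `pnorm` that of NORMDIST with the cumulative flag set to TRUE. The variable names below are illustrative.

```r
n_rep    <- 10                             # size of the second sample
crit     <- qnorm(1 - 0.05) / sqrt(n_rep)  # one-tailed 5% cutoff for the sample mean
mu_true  <- 2.23 / sqrt(20)                # assumed true mean
se_mean  <- 1 / sqrt(n_rep)                # true SD of the mean of ten observations
p_exceed <- 1 - pnorm(crit, mean = mu_true, sd = se_mean)
p_exceed                                   # probability that the mean exceeds the cutoff
```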

--
Gigerenzer, G., Krauss, S., & Vitouch, O. (2004). The null ritual: What you always wanted to know about significance testing but were afraid to ask. In D. Kaplan (Ed.), The Sage handbook of quantitative methodology for the social sciences (pp. 391–408). Thousand Oaks, CA: Sage.

Tversky, A. & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76, 105-110.

Tversky, A. & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185, 1124-1131.

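## The wisdom of "no debate"

"No debate" is not only academic wisdom but also political wisdom. The picture below shows the author of the quotation "No debate is an invention of mine" --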

--
Neapolitan, R. E., & Morris, S. (2004). Probabilistic modeling with Bayesian networks. In D. Kaplan (Ed.), The Sage Handbook of Quantitative Methodology for the Social Sciences (pp. 371-390). Thousand Oaks, CA: Sage.

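## The cognitive bias behind RTMA

[The horizontal axis is the predictor and the vertical axis is the predicted variable; the predictor is known to fall at the positions of the blue, red and green lines. The blue line plus the red line equals the green line. The tip of the red arrow is the statistically unbiased estimate of the predicted variable; the tail of the red arrow is the instinctive, biased prediction, and the red arrow shows the degree of regression toward the mean. Figure taken from the slides of a lecture at Beijing Normal University, October 2006.]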

Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237-251.

Kahneman, D., Slovic, P. and Tversky, A. (1982). Judgment under uncertainty: heuristics and biases. New York: Cambridge University Press.

Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Washington, DC: American Council on Education and National Council on Measurement in Education.

Li, X., Hau, K. & Marsh, H. W. (2006, Apr). Comparison of strategies for value-added analyses: problems of Regression Toward the Mean artifact and Matthew effect. Paper Presented at American Educational Research Association Annual Meeting, San Francisco, CA.

Maris, E. (1998). Covariance Adjustment Versus Gain Scores - Revisited. Psychological Methods, 3, 309-327.

Marsh, H. W. & Hau, K. (2002). Multilevel modeling of longitudinal growth and change: substantive effects or Regression Toward the Mean Artifacts? Multivariate Behavioral Research, 37, 245-282.

Pedhazur, E. J. & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum Associates.

Salsburg, D. (2001). The lady tasting tea: How statistics revolutionized science in the twentieth century. New York: Henry Holt & Company.

Wainer, H. & Robinson, D. H. (2003). Shaping Up the Practice of Null Hypothesis Significance Testing. Educational Researcher, 32(7), 22-30.

P.S. I found that every "Kahneman" in my earlier teaching notes had been misspelled as "Khaneman".