For example, we have a correlation matrix for five standardized factors \left[\begin{array}{ccccc}1.00 & 0.42 & 0.41 & 0.55 & 0.42\\ 0.42 & 1.00 & 0.48 & 0.47 & 0.46\\ 0.41 & 0.48 & 1.00 & 0.48 & 0.44\\ 0.55 & 0.47 & 0.48 & 1.00 & 0.50\\ 0.42 & 0.46 & 0.44 & 0.50 & 1.00\end{array}\right] (Hau, Chinese Textbook, pp. 49-50).

# Month: October 2007

## The tail(s) of p value

For any given H_0 vs H_1, the p value of an observed point x is \underset{\theta\in H_{0}}{\sup}P\left(\left\{ z \mid L\left(z\right)\ge L\left(x\right)\right\} \mid \theta\right), where L\left(x\right)\equiv\frac{\underset{\theta\in H_{1}}{\sup}\left[f\left(x\mid\theta\right)\right]}{\underset{\theta\in H_{0}}{\sup}\left[f\left(x\mid\theta\right)\right]} is the likelihood ratio.

-- See R. Weber's Statistics Note (Chap 6.2 & 7.1)

I made a mistaken comment on the PDF of "Null Ritual" (Gigerenzer, Krauss, & Vitouch, 2004), where three types of significance level (rather than p value) were discussed. I wrote the comment to point out that the chapter had ignored the role of H_1 in the definition of the p value. In almost every textbook, two-tailed and one-tailed p values are distinguished. Usually, the two-tailed p is defined by an H_1 such as \mu\neq0.

Here I demonstrate a case of a three-tailed p value in R.

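    # H_0: |z| ~ \chi^2(5), with the sign of z being -1 or +1 with equal probability
    z <- (-1000:1000) * 0.02             # grid on [-20, 20]
    f <- 0.5 * dchisq(abs(z), df = 5)    # density of z under H_0
    h <- 0.5 * dchisq(10, df = 5)        # density at |z| = 10, used as the cutoff height
    # grey bars: density above h; black bars: density at or below h (the tails)
    plot(z, f, type = "h", col = c("black", "grey")[1 + (f > h)])
    lines(c(-20, 20), c(h, h))           # horizontal line at height h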

Do you agree that the region near zero under the "V" of the curve (which lies below the horizontal line) should be the third tail? I think so, provided that H_1 includes all other possible distributions of the same shape.
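To put a number on the three tails, here is a minimal sketch in R. It assumes that the rejection region is the low-density set \{z : f(z)\le f(x)\}, which is what the horizontal line in the plot suggests; the function name `three_tail_p` and the observed point 10 are chosen only for illustration.

```r
# Assumption: the p value is the H_0 probability of the set {z : f(z) <= f(x)},
# where f is the H_0 density 0.5 * dchisq(|z|, 5) (chi-square magnitude, random sign).
three_tail_p <- function(x) {
  stopifnot(abs(x) > 3)   # sketch only handles points beyond the mode (df - 2 = 3)
  target <- dchisq(abs(x), df = 5)
  # inner boundary a in (0, 3): the point below the mode with the same density as |x|
  a <- uniroot(function(u) dchisq(u, df = 5) - target,
               interval = c(0, 3), tol = 1e-10)$root
  outer <- pchisq(abs(x), df = 5, lower.tail = FALSE)  # the two outer tails together
  inner <- pchisq(a, df = 5)                           # the third tail around zero
  c(inner_boundary = a, outer = outer, inner = inner, p = outer + inner)
}

res <- three_tail_p(10)
print(res)
```

Because |z| follows \chi^2(5) under H_0 regardless of the sign, the two outer tails together carry the usual upper-tail probability, and the third tail adds the mass of |z| below the inner boundary.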

You will also agree that there are two asymmetrical tails if H_1 includes just two asymmetrically placed curves, for example \mu=-2 and \mu=1 (\sigma^2\equiv1), while H_0 is the standard normal distribution.
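The asymmetry can be checked directly from the likelihood-ratio definition of the p value. The following R sketch is only an illustration under the stated hypotheses (H_0: N(0,1); H_1: N(-2,1) or N(1,1)); the function names are hypothetical.

```r
# Likelihood ratio L(x) = max over H_1 of f(x | mu) divided by f(x | 0)
lik_ratio <- function(x) pmax(dnorm(x, mean = -2), dnorm(x, mean = 1)) / dnorm(x)

lr_tails <- function(x) {
  log_c <- log(lik_ratio(x))
  # L(z) = max(exp(-2z - 2), exp(z - 1/2)), so the set {L(z) >= c} is
  # z <= -(log c + 2) / 2  or  z >= log c + 1/2
  c(lower = -(log_c + 2) / 2, upper = log_c + 0.5)
}

lr_p_value <- function(x) {
  bounds <- lr_tails(x)
  # probability of both tails under H_0, capped at 1
  min(1, pnorm(bounds[["lower"]]) + pnorm(bounds[["upper"]], lower.tail = FALSE))
}

print(lr_tails(2))     # lower = -1.75, upper = 2: the two tails are not mirror images
print(lr_p_value(2))
```

For an observed x = 2 the rejection region is z \le -1.75 or z \ge 2, so the left tail starts closer to zero than the right one.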

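## The geometry of the correlation coefficient: the cosine of the angle between residual vectors after projection onto the intercept

I had always carelessly assumed that the inner product of two column vectors is the correlation coefficient of the variables they represent. As a result, I embarrassed myself in front of my students today by trying to make the correlation between a column of constants and another column vector come close to 1. As everyone knows, that correlation can never be close to 1: the covariance between a column of constants and any other column vector is necessarily zero.

My mistake was forgetting that, in the expression for covariance, the column vectors are centered before the inner product is taken: the column mean is subtracted. What is subtracted is actually a vector, equal to the column mean \bar{x} times the vector \left[\begin{array}{c}1\\1\\1\\\vdots\\1\end{array}\right], that is, the projection of \left[\begin{array}{c}x_{1}\\x_{2}\\x_{3}\\\vdots\\x_{n}\end{array}\right] onto the intercept vector, the direction of the "diagonal" axis. Subtracting this projection from \left[\begin{array}{c}x_{1}\\x_{2}\\x_{3}\\\vdots\\x_{n}\end{array}\right] gives the regression residual when there is no explanatory variable, only an intercept term. This residual vector \left[\begin{array}{c}x_{1}-\bar{x}\\x_{2}-\bar{x}\\x_{3}-\bar{x}\\\vdots\\x_{n}-\bar{x}\end{array}\right] is perpendicular to the intercept direction, so it lies in the linear subspace (the face of the sundial) that is orthogonal to the "diagonal" intercept vector (the gnomon of the sundial). The covariance is, up to the factor 1/(n-1), the inner product of two such residual vectors, and the correlation coefficient is the cosine of the angle between the two residual vectors.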
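The geometric statement can be checked numerically. This is a minimal R sketch on simulated data; the data, the seed, and the helper `proj_on_ones` are hypothetical and serve only as an illustration.

```r
set.seed(1)                      # hypothetical data, for illustration only
n <- 50
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)
ones <- rep(1, n)                # the intercept ("diagonal") vector

# projection of v onto the intercept vector: (v'1 / 1'1) * 1 = mean(v) * 1
proj_on_ones <- function(v) sum(v * ones) / sum(ones * ones) * ones

rx <- x - proj_on_ones(x)        # residual of the intercept-only regression of x
ry <- y - proj_on_ones(y)        # residual of the intercept-only regression of y

cos_raw      <- sum(x * y) / sqrt(sum(x^2) * sum(y^2))      # cosine without centering
cos_centered <- sum(rx * ry) / sqrt(sum(rx^2) * sum(ry^2))  # cosine between residuals

print(c(cos_centered = cos_centered, cor = cor(x, y), cos_raw = cos_raw))
print(sum(rx * ones))                                # approximately 0: orthogonal to the intercept
print(all.equal(sum(rx * ry) / (n - 1), cov(x, y)))  # covariance as a scaled inner product
```

Only the centered cosine agrees with `cor(x, y)`; the uncentered cosine generally does not, which is exactly the slip described above.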

## The Popperian falsifiability behind Regression Discontinuity Design (RDD)

*Figure linked from http://www.socialresearchmethods.net/kb/statrd.php (Trochim, W., 2006, Figure 2). The red line is the fallacious treatment effect.*

Causal analysis entails a counterfactual comparison between the treatment and the control conditions (Mark, 2003; Maris, 1998). To define a causal effect, two imaginary latent groups are introduced, one for each actual group. The comparison is between identical subjects in the actual treatment group and in an imaginary control group, or vice versa. For example, student A registered for her RSS online and has missed the collective entertainment these days. Student B did not bother to register for her RSS and took part in the collective entertainment. To ask whether RSS attendance caused A to skip the entertainment, the causal statement compares the actual A with RSS attendance to an imaginary A without RSS attendance, rather than the actual A to the actual B.

A full experimental design with randomization ensures that the two actual groups are identical in population before treatment. The identity covers both the pretest and the relationship between post-test and pretest, so the mean post-test of the imaginary control group can be estimated without bias from, and then replaced by, that of the actually observed control group, or vice versa.

In contrast, RDD only assumes that the two actual groups are identical in the relationship between post-test and pretest, and that this relationship is modeled appropriately. It usually also assumes that the two groups are divided by a cutoff on the pretest, although this is not necessary. In my opinion, RDD is a special instance of two-group analysis. A typical RDD context is *to teach students in accordance with their aptitude* (in Chinese 因材施教).

The critical difference between a full experimental design and RDD is that, in RDD, the identity of the pre-post relationship across the two actual groups and the model for that relationship are merely hypotheses to be tested in the sense of Popperian falsifiability, whereas in a full experimental design the population identity between groups is free of such uncertainty because of manipulated randomization. If the relationship between pretest and post-test is curvilinear or otherwise non-linear, a linear regression analysis would report a fallacious treatment effect (Trochim, 2006, Figure 2).
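A small simulation can illustrate how a misspecified pre-post model produces a treatment effect that is not there. This is only a sketch under invented assumptions: the cubic curve, the cutoff at zero, the sample size, and the noise level are all hypothetical and are not taken from Trochim's figure.

```r
set.seed(2007)                       # all numbers below are hypothetical
n <- 2000
x <- rnorm(n)                        # pretest, centered at the cutoff (0)
treated <- as.numeric(x < 0)         # subjects below the cutoff receive the program
post <- x + x^3 + rnorm(n)           # curvilinear pre-post relation, NO treatment effect

fit_linear <- lm(post ~ x + treated)                    # misspecified: straight line
fit_cubic  <- lm(post ~ x + I(x^2) + I(x^3) + treated)  # matches the simulated curve

effect_linear <- coef(fit_linear)[["treated"]]
effect_cubic  <- coef(fit_cubic)[["treated"]]
print(c(linear = effect_linear, cubic = effect_cubic))
```

Under these assumptions the straight-line model should report a clearly nonzero "effect" at the cutoff, while the model that matches the simulated curve should report an effect near zero. In real data the true curve is unknown, which is why the modeling assumption has to carry so much weight.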

If we had precision comparable to that of classical physics experiments, the relationship between pretest and post-test could be established with high Popperian falsifiability. The true model would then be recognized without uncertainty, and statistical hypothesis tests would be superfluous. In practice, our social science measurements typically have a reliability of only .7 or .8, and an approximation to the true model (like RMSEA in SEM) is usually necessary. An RDD conclusion therefore relies critically on the assumption that the relationship is modeled appropriately.

There are two conventional models for comparing two groups: gain scores (Gain) and residuals with covariate adjustment (Cov. Adj.). Maris discussed them in depth (Maris, 1998). The difference between them in the context of Lord's paradox is well known to researchers. However, a good deal of confusion remains, some of which Maris cleared up or tried to clear up. He asserted that regression toward the mean (RTM) and bias in the Gain model do not imply one another, and that measurement error need not be the reason for bias in the Gain model. Note that Maris explicitly stated that his definition of RTM differs from some versions in the earlier literature (p. 322). If ubiquity should be a feature of RTM, Maris's definition does not meet this criterion.

Maris pointed out that a sufficient condition for the Gain model to be unbiased is that the gain scores are independent of the groups (p. 320). A stronger sufficient condition is that the gain scores (= post-test - pretest) are independent of the pretest. Graphically, this amounts to a constant unit slope for each regression line. Such a relationship between post-test and pretest is more constrained than the general linear relationship assumed by Cov. Adj., just as the latter is more constrained than a curvilinear relationship. Given the low level of Popperian falsifiability in the modeling, these constraints on the relationship will be a source of controversy among researchers.
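The role of the unit slope can be illustrated with a simulation. The sketch below assumes, as Cov. Adj. does, that both groups share one linear pre-post relationship and that there is no treatment effect; the group means, slopes, noise level, and the function name `simulate_estimates` are all hypothetical.

```r
set.seed(1998)   # all numbers are hypothetical
simulate_estimates <- function(slope, n = 1000) {
  treated <- rep(c(0, 1), each = n)
  pre  <- rnorm(2 * n, mean = 50 - 10 * treated, sd = 10)  # groups differ at pretest
  post <- 50 + slope * (pre - 50) + rnorm(2 * n, sd = 5)   # same line in both groups, no effect
  gain <- post - pre
  c(gain    = mean(gain[treated == 1]) - mean(gain[treated == 0]),  # Gain estimate
    cov_adj = coef(lm(post ~ pre + treated))[["treated"]])          # Cov. Adj. estimate
}

est_unit <- simulate_estimates(slope = 1)     # gain is independent of the pretest
est_half <- simulate_estimates(slope = 0.5)   # gain depends on the pretest
print(rbind(unit_slope = est_unit, half_slope = est_half))
```

With a unit slope both estimates should be close to zero. With a slope of 0.5 the Gain estimate should drift toward 5 (half of the 10-point pretest difference), while Cov. Adj. stays near zero, because the simulated data satisfy its assumption rather than that of the Gain model.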

--

Maris, E. (1998). Covariance Adjustment Versus Gain Scores – Revisited. *Psychological Methods*, *3*, 309-327.

Mark, M. M. (2003). Program evaluation. In Schinka, J. A. & Velicer, W. F. (Eds.), *Handbook of psychology. Vol. 2: Research methods in psychology*. (pp. 323-347). New York: Wiley.

Trochim, W. (2006). Regression-Discontinuity Analysis. Retrieved Sep. 15, 2007, from http://www.socialresearchmethods.net/kb/statrd.php