## 10^6 digits of π & e vs quantum random numbers

Photo from Wikipedia: Pi/Pie

10^6 digits of π & e : pi&e.csv

If you prefer quantum randomness to pseudo-randomness, run the following code, powered by the ANU Quantum Random Numbers Server.
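The original snippet is not reproduced here; below is a minimal sketch, assuming the server still exposes its public JSON endpoint without an API key (the URL, query parameters, and the `success`/`data` response fields are assumptions):

```
# Query the ANU Quantum Random Numbers Server (endpoint and fields assumed)
library(jsonlite)
qrn <- fromJSON("https://qrng.anu.edu.au/API/jsonI.php?length=10&type=uint16")
if (isTRUE(qrn$success)) print(qrn$data / 65535)   # rescale 16-bit integers to [0, 1]
```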

## 2015/05/02 Fu Jen Univ. Talk

PPT for Fu Jen Univ. Talk:  20150502.FJU

Related materials:

Abstract

In his seminal book Flow, Csikszentmihalyi described the experience of close interaction with some Other. Its Chinese translator rendered it as the Big Self, which is also a Buddhist term. This early concept eventually developed into the cosmic self / quantum self in Csikszentmihalyi's later volume The Evolving Self. Inspired by it, flow can be understood as the lasting self beyond the span of short-term memory. Daniel Dennett's Multiple Drafts Model regarded the self as the derivative of the center of gravity of parallel narrative contents. This talk tried to extend that model to consider the stable, lasting center of gravity of lengthy narratives, and conscious contents that themselves exceed the limits of the body. Furthermore, the three selves defined on the current moment / flow / eternity turn out to be isomorphic to Seligman's three happy lives of pleasure / engagement / meaningfulness in positive psychology.

## Coerce RStudio to output plots in a new window

Update: It is easier. Credit: John K. Kruschke

PC:

```
windows()
```

Mac:

```
quartz()
```

* Powered by XQuartz (dmg). X11() on Mac does not support BetterTouchTool.

Linux:

```
X11()
```

==The following is obsolete==
The trick is to use Rcmdr to create the new graphics window: `library(Rcmdr); plot(1, type='n')`. Subsequent plots from R code in RStudio will then be coerced to output in that new window. Both RStudio for Mac and RStudio for Windows support the trick.

## sem::cfa(…) is elegant

Prof. John Fox's sem package provides an elegant interface for defining CFA and general SEM models. The following is an example from Dr. K-T Hau's SEM course on Coursera and his textbook (chap1_2).

```
library(sem) # R v3.2.2; sem v3.1-6
options(fit.indices=c("RMSEA","CFI","SRMR","AIC","BIC"))

# First of all, do not forget to provide the sample size
N <- 100
# Then the correlation matrix (lower triangle, diagonal included)
cor.Item <- readMoments(names=paste0("X",1:9), diag=TRUE, text="
1.00
0.12 	1.00
0.08 	0.08 	1.00
0.50 	0.11 	0.08 	1.00
0.48 	0.03 	0.12 	0.45 	1.00
0.07 	0.46 	0.15 	0.08 	0.11 	1.00
0.05 	0.44 	0.15 	0.12 	0.12 	0.44 	1.00
0.14 	0.17 	0.53 	0.14 	0.08 	0.10 	0.06 	1.00
0.16 	0.05 	0.43 	0.10 	0.06 	0.08 	0.10 	0.54 	1.00
")

# Fixed factor variances with reference.indicators=F
model.cfa <- cfa(reference.indicators=F, text="
F1: X1, X4, X5
F2: X2, X6, X7
F3: X3, X8, X9
")

# Results
sem.cfa <- sem(model.cfa,cor.Item,N=N)
summary(sem.cfa)

# More details
## The model and the data
model.cfa
print.table(cor.Item)

## Histogram & density of the correlation values
hist(cor.lower <- cor.Item[row(cor.Item) > col(cor.Item)],freq=F,breaks=10,
xlab='Correlation',col='grey');
rug(jitter(cor.lower),col='red')
lines(density(cor.lower),col='blue',lty=2)

## Highlight those  > 0.4 correlations
library(ellipse)
plotcorr(cor.Item,col=grey(1-abs(cor.Item)),type='lower',diag=T);

## In a grouped order
idx <- c(1,4,5,2,6,7,3,8,9)
plotcorr(cor.Item[idx,idx],col=grey(1-abs(cor.Item[idx,idx])),diag=T);

```

One more example (Chap3_1_2 Models A, B, & C), to demonstrate modification indices and GraphViz --

```
library(sem) # R v3.2.2; sem v3.1-6
options(fit.indices=c("RMSEA","CFI","SRMR","AIC","BIC"))

# 17 x 17 correlation matrix (lower triangle, diagonal included)
cor.17 <- readMoments(names=paste0("X",1:17), diag=TRUE, text="
1
.34 1
.38 .35 1
.02 .03 .04  1
.15 .19 .14 .02  1
.17 .15 .20 .01 .42  1
.20 .13 .12 .00 .40 .21  1
.32 .32 .21 .03 .10 .10 .07 1
.10 .17 .12 .02 .15 .18 .23 .13  1
.14 .16 .15 .03 .14 .19 .18 .18 .37  1
.14 .15 .19 .01 .18 .30 .13 .08 .38 .38  1
.18 .16 .24 .02 .14 .21 .21 .22 .06 .23 .18  1
.19 .20 .15 .01 .14 .24 .09 .24 .15 .21 .21 .45  1
.18 .21 .18 .03 .25 .18 .18 .18 .22 .12 .24 .28 .35  1
.08 .18 .16 .01 .22 .20 .22 .12 .12 .16 .21 .25 .20 .26  1
.12 .16 .25 .02 .15 .12 .20 .14 .17 .20 .14 .20 .15 .20 .50  1
.20 .16 .18 .04 .25 .14 .21 .17 .21 .21 .23 .15 .21 .22 .29 .41  1
")

# Free factor variances with reference.indicators=T (the first loading of each factor is fixed as the reference indicator)

Chap.3_1_2_A <- cfa(reference.indicators=T,text="
eta1:X1,X2,X3,X4
eta2:X5,X6,X7,X8
eta3:X9,X10,X11
eta4:X12,X13,X14
eta5:X15,X16,X17
")
Chap.3_1_2_A # Check whether the model fixes the first loadings and frees the factor variances
#
(sem.3_1_2_A <- sem(Chap.3_1_2_A,cor.17,N=350))
# What attributes does the sem.3_1_2_A object have?
(sum.3_1_2_A <- summary(sem.3_1_2_A))
# What attributes does the sum.3_1_2_A object have?

# modification indices
modIndices(sem.3_1_2_A)
summary(modIndices(sem.3_1_2_A))

# Find the small parameters
sum.3_1_2_A$coeff[order(abs(sum.3_1_2_A$coeff[,1])),]
sdc.3_1_2_A <- standardizedCoefficients(sem.3_1_2_A)
sdc.3_1_2_A[order(abs(sdc.3_1_2_A$`Std. Estimate`)),]
# Highlight the non-significant parameters
sum.3_1_2_A$coeff[sum.3_1_2_A$coeff[,4] >=.05,]

# Alternative models: delete X4, and load X8 on eta1 rather than on eta2
Chap.3_1_2_B <- cfa(reference.indicators=T,text="
eta1: X1, X2, X3, X8
eta2: X5, X6, X7
eta3: X9, X10,X11
eta4: X12,X13,X14
eta5: X15,X16,X17
")

Chap.3_1_2_C <- cfa(reference.indicators=T, text="
eta1: X1, X2, X3, X8
eta2: X5, X6, X7, X8
eta3: X9, X10,X11
eta4: X12,X13,X14
eta5: X15,X16,X17
")
cor.16 <- cor.17[rownames(cor.17)!='X4',colnames(cor.17)!='X4']

# Results
sem.3_1_2_B <- sem(Chap.3_1_2_B,cor.16,N=350)
(sum.3_1_2_B <- summary(sem.3_1_2_B))

modIndices(sem.3_1_2_B)

sem.3_1_2_C <- sem(Chap.3_1_2_C,cor.16,N=350)
(sum.3_1_2_C <- summary(sem.3_1_2_C))
modIndices(sem.3_1_2_C)

# Model Comparison
## Nested models
anova(sem.3_1_2_B,sem.3_1_2_C)
## Non-nested models
AIC(sem.3_1_2_B) - AIC(sem.3_1_2_A)
sum.3_1_2_B$AIC - sum.3_1_2_A$AIC

# Path diagram, ignoring double-headed arrows
# View the diagram in RStudio Viewer
pathDiagram(sem.3_1_2_B,ignore.double=T)

# RMSEA
sum.3_1_2_B <- summary(sem.3_1_2_B)
sum.3_1_2_B$RMSEA
(I.16 <- diag(diag(cor.16)))
summary(sem(Chap.3_1_2_B,cor.16+0.02*I.16,350))$RMSEA

## Histogram & density of the correlation values
hist(cor.lower <- cor.17[row(cor.17) > col(cor.17)],freq=F,breaks=10,
xlab='Correlation',col='grey');
rug(jitter(cor.lower),col='red')
lines(density(cor.lower),col='blue',lty=2)

## Fixed Var vs Fixed Weight
(Chap.3_1_2_B.V <-cfa(reference.indicators=F,text="
eta1: X1, X2, X3, X8
eta2: X5, X6, X7
eta3: X9, X10,X11
eta4: X12,X13,X14
eta5: X15,X16,X17
"))
sem.3_1_2_B.V <- sem(Chap.3_1_2_B.V,cor.16,350)
sum.3_1_2_B.V <- summary(sem.3_1_2_B.V)
sum.3_1_2_B.V$chisq
sum.3_1_2_B$chisq

## Highlight those >= 0.3 correlations
library(ellipse)
plotcorr(cor.17,col=grey((cor.17<.3)),type='lower',diag=T);

## In a grouped order, in darker colors
idx <- c(c(1,2,3,8),5:7,9:17,4)
cor.17.FU <- (cor.17-diag(diag(cor.17))+t(cor.17))[idx,idx]
plotcorr(cor.17.FU,col=grey((1-abs(cor.17.FU))^2),diag=T);
abline(v=(vh <- c(4,7,10,13,16)+.5),lty=2,col='red')
abline(h=18-vh,lty=2,col='red')

#####################
#####################

# A 5 x 5 correlation matrix (lower triangle, diagonal omitted)
cor.5 <- readMoments(diag=FALSE, text="
0.418
0.407	0.48
0.547	0.469	0.481
0.422	0.458	0.443	0.496
")
(pc.5 <- princomp(covmat =cor.5 ))
screeplot(pc.5)
abline(h=1,lty=2)

(Chap.3_4_2<- cfa(reference.indicators=T,covs = paste('eta',1:5,sep = ''),text="
eta1: X1, X2, X3, X8
eta2: X5, X6, X7
eta3: X9, X10,X11
eta4: X12,X13,X14
eta5: X15,X16,X17
xi:eta1,eta2,eta3,eta4,eta5
"))

summary(sem.3_4_2 <- sem(Chap.3_4_2,cor.16,N = 350))
pathDiagram(sem.3_4_2,edge.labels = 'v')
anova(sem.3_4_2,sem.3_1_2_B)

###############
###############

cor.18 <- readMoments(names = c(paste("y",1:9,sep=""),paste("x",1:9,sep = "")),diag = F,text = "
0.68
0.60	0.58
0.01	0.10	0.07
0.12	0.04	0.06	0.29
0.06	0.06	0.01	0.35	0.24
0.09	0.13	0.10	0.05	0.03	0.07
0.04	0.08	0.16	0.10	0.12	0.06	0.25
0.06	0.09	0.02	0.02	0.09	0.16	0.29	0.36
0.23	0.26	0.19	0.05	0.04	0.04	0.08	0.09	0.09
0.11	0.13	0.12	0.03	0.05	0.03	0.02	0.06	0.06	0.40
0.16	0.09	0.09	0.10	0.10	0.02	0.04	0.12	0.15	0.29	0.20
0.24	0.26	0.22	0.14	0.06	0.10	0.06	0.07	0.08	0.03	0.04	0.02
0.21	0.22	0.29	0.07	0.05	0.17	0.12	0.06	0.06	0.03	0.12	0.04	0.55
0.29	0.28	0.26	0.06	0.07	0.05	0.06	0.15	0.2	0.1	0.03	0.12	0.64	0.61
0.15	0.16	0.19	0.18	0.08	0.07	0.08	0.1	0.06	0.15	0.16	0.07	0.25	0.25	0.16
0.24	0.20	0.16	0.13	0.15	0.18	0.19	0.18	0.14	0.11	0.07	0.16	0.19	0.21	0.22	0.35
0.14	0.25	0.12	0.09	0.11	0.09	0.09	0.11	0.21	0.17	0.09	0.05	0.21	0.23	0.18	0.39	0.48
")
str(cor.18)
library(ellipse)
plotcorr(cor.18,type = 'l',col=grey(1-abs(cor.18)))

# Without initial values, there will be some trouble
(M0 <- cfa(reference.indicators = T,text = "
eta1:y1,y2,y3
eta2:y4,y5,y6
eta3:y7,y8,y9
xi1:x1,x2,x3
xi2:x4,x5,x6
xi3:x7,x8,x9
"))
(sem.0 <- sem(M0,cor.18,500))
# (sum.0 <- summary(sem.0))
# Error in summary.objectiveML(sem.0) :
#   coefficient covariances cannot be computed

# With initial values, M0 is OK
(M0 <- cfa(reference.indicators = T,text = "
eta1:y1,y2(1),y3(1)
eta2:y4,y5(1),y6(1)
eta3:y7,y8(1),y9(1)
xi1:x1,x2(1),x3(1)
xi2:x4,x5(1),x6(1)
xi3:x7,x8(1),x9(1)
"))
sem.0 <- sem(M0,S = cor.18,N = 500)
(sum.0 <- summary(sem.0))

(M1 <- cfa(covs = c("xi1,xi2,xi3","eta2,eta3"),reference.indicators = T,text = "
eta1:y1,y2(1),y3(1),eta2
eta2:y4,y5(1),y6(1)
eta3:y7,y8(1),y9(1)
xi1:x1,x2(1),x3(1),eta1,eta3
xi2:x4,x5(1),x6(1),eta1
xi3:x7,x8(1),x9(1),eta1,eta2
"))
sem.1 <- sem(M1,S = cor.18,N = 500)
(sum.1 <- summary(sem.1))
anova(sem.1,sem.0)
sum.1$RMSEA
sum.0$RMSEA

pathDiagram(sem.1)
modIndices(sem.1)
(M5 <- update(M1,text="
delete,eta1 -> eta2
delete,xi1 -> eta3
delete,eta2 <-> eta3"))
sem.5 <- sem(M5,S = cor.18,N = 500)
(sum.5 <- summary(sem.5))
sum.5$RMSEA
standardizedCoefficients(sem.5)

```

## Quiz: Correlation of dummy variables

One-way ANOVA with a balanced K-group design is equivalent to a regression on the intercept and K-1 dummy variables. The quiz: is the correlation between any two dummy variables positive, zero, or negative? One may think it is zero because any two dummy variables appear orthogonal to each other. The answer will emerge after your click.
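As a quick check, here is a minimal sketch; the values K = 3 and n = 10 per group are arbitrary assumptions:

```
# Dummies of a balanced K-group factor: their pairwise correlation is -1/(K-1)
K <- 3; n <- 10
g  <- gl(K, n)               # balanced grouping factor
D2 <- as.numeric(g == 2)     # dummy for group 2
D3 <- as.numeric(g == 3)     # dummy for group 3
cor(D2, D3)                  # -1/(K-1), i.e. not zero
```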

## Understanding Tukey’s q & p

For a planned SINGLE comparison, the CI of $\mu_2-\mu_1$ can be interpreted as $(M_2-M_1)\pm qt_{1-\frac{\alpha}{2},df}\times SE$, in which $SE=s\times \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$. The $df$ and $s$ should be defined according to the denominator of $F$.

For post hoc multiple comparisons, the CI of $\mu_2-\mu_1$ can still be interpreted as $(M_2-M_1)\pm \frac{qtukey_{1-\alpha,nmeans,df}}{\sqrt{2}}\times SE$, wherein $nmeans$ indicates the number of sub-groups whose means are compared. Note that qtukey(...) and ptukey(...) are defined with a two-tailed probability while qt(...) and pt(...) are defined with a single-tailed one.

To verify it with a single click
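For readers without the click-through, a minimal sketch of the verification; α = .05, df = 20, and nmeans = 5 are arbitrary assumptions:

```
# For nmeans = 2, Tukey's critical value / sqrt(2) reduces to the single-comparison t value
alpha <- .05; df <- 20
qtukey(1 - alpha, nmeans = 2, df = df) / sqrt(2)   # equals ...
qt(1 - alpha/2, df)                                # ... this
# With more means the multiplier grows, so the post hoc CIs widen
qtukey(1 - alpha, nmeans = 5, df = df) / sqrt(2)
```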

## Mann-Whitney-Wilcoxon Test vs. Z-test, for non-normal samples

The control group includes 150 Poisson ($\lambda$=2.3) cases;
the experimental group includes 200 cases, with a fixed effect of +1, which is unknown but of interest.

Researcher A chose the Z-test, considering the large sample sizes.
Researcher B chose the Mann-Whitney-Wilcoxon test, considering the non-normal distributions.
Whose confidence interval will win?

Researcher A might complain: "Your CI width actually comes from the Poisson distribution; most of the time it is actually zero. Let's try $\chi^2_{df=1}$. It would be fairer to the Z-test to use a continuous distribution."

The Mann-Whitney-Wilcoxon test performs better than his expectation even in the continuous case.
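A minimal sketch of the comparison described above; the number of replications, the 95% level, and the use of the large-sample normal approximation for the Z-test CI are assumptions:

```
set.seed(1)
n1 <- 150; n2 <- 200; lambda <- 2.3; shift <- 1
width <- replicate(1000, {
  x <- rpois(n1, lambda)                     # control group
  y <- rpois(n2, lambda) + shift             # experimental group, fixed effect +1
  se   <- sqrt(var(x)/n1 + var(y)/n2)
  z.ci <- (mean(y) - mean(x)) + c(-1, 1) * qnorm(.975) * se
  w.ci <- suppressWarnings(wilcox.test(y, x, conf.int = TRUE)$conf.int)
  c(z = diff(z.ci), w = diff(w.ci))
})
rowMeans(width)   # average CI widths: the narrower one wins
# For the follow-up continuous case, replace rpois(n, lambda) with rchisq(n, df = 1)
```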

## CI of Non-Centrality Parameter (ncp)

RMSEA=$\sqrt{\frac{ncp_{\chi^2}}{(N-1)df}}$

Cohen's $\delta$=$\frac{ncp_t}{SD/SE}$

Cohen's $\tilde{f}^2$=$\frac{ncp_F}{N}$

$\eta^2$=$\frac{ncp_F}{ncp_F+N}$

* Steiger (2004, Eq. 48-51) and Smithson (2003, Eq. 5.17-5.19) used different definitions for Cohen's partial $\tilde{f}^2$ ( and partial $\eta^2$). This post adopted Steiger's definition divided by the total sample size $N$, rather than by Smithson's denominator $(df_1 + df_2 + 1)$.
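As a minimal sketch of one route to such CIs (the observed t value, the group sizes, and the root-search interval below are illustrative assumptions): invert the non-central t CDF to bound $ncp_t$, then rescale by $\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$ to obtain a CI for Cohen's $\delta$, consistent with the formula above.

```
# CI for the non-centrality parameter of a t statistic, by inverting pt()
ncp.ci.t <- function(tobs, df, conf = .95) {
  alpha <- 1 - conf
  lo <- uniroot(function(ncp) pt(tobs, df, ncp) - (1 - alpha/2),
                c(-50, 50), extendInt = "yes")$root
  hi <- uniroot(function(ncp) pt(tobs, df, ncp) - alpha/2,
                c(-50, 50), extendInt = "yes")$root
  c(lower = lo, upper = hi)
}
# Two-sample example (n1 = n2 = 30): delta = ncp_t * sqrt(1/n1 + 1/n2)
n1 <- n2 <- 30
ncp.ci.t(tobs = 2.5, df = n1 + n2 - 2) * sqrt(1/n1 + 1/n2)
```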

To view density plots

References:

• Smithson, M. (2003). Confidence Intervals. Thousand Oaks, CA: Sage Publications.
• Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164-182.

## Why practitioners discretize their continuous data

Yihui asked this question yesterday. My supervisor, Dr. Hau, has also criticized routine discretization into groups. I encountered two plausible reasons in classes in 2007, one negative, the other at least conditionally positive.

The first is a variant of the old Golden Hammer law: if the only tool is ANOVA, every continuous predictor needs discretization. The second reason is empirical: ANOVA with discretization steals df(s). Let's demo it with a diagram.
The red points are the population, and the black points are the samples. Which predicts the population better: the green continuous line, or the discretized blue dashes? R simulation code is given below.
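A minimal sketch of such a simulation; the curvilinear population trend, the sample size of 50, and the five equal-width bins are assumptions rather than the original figure's settings:

```
set.seed(1)
x.pop <- seq(0, 10, length.out = 500)
y.pop <- sin(x.pop / 2) + x.pop / 5                    # red: population trend
i <- sample(length(x.pop), 50)
x <- x.pop[i]; y <- y.pop[i] + rnorm(50, sd = .4)      # black: noisy sample

fit.lin <- lm(y ~ x)                                   # green: continuous line (1 df)
brk   <- seq(0, 10, length.out = 6)                    # discretize into 5 bins
grp   <- cut(x, brk, include.lowest = TRUE)
m.grp <- tapply(y, grp, mean)                          # blue: one mean per bin (4 df)

plot(x.pop, y.pop, type = "l", col = "red", xlab = "x", ylab = "y")
points(x, y, pch = 16)
abline(fit.lin, col = "green")
segments(brk[-6], m.grp, brk[-1], m.grp, col = "blue", lty = 2)

# Mean squared error of each fit against the population trend
mean((predict(fit.lin, data.frame(x = x.pop)) - y.pop)^2)
mean((m.grp[cut(x.pop, brk, include.lowest = TRUE)] - y.pop)^2)
```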

## Misunderstanding of Eq. 4 in Singer’s (1998) SAS PROC MIXED paper

Singer (1998, p. 327, Eq. 4) gave a big covariance matrix, as follows --

...if we combine the variance components for the two random effects together into a single matrix, we would find a highly structured block diagonal matrix. For example, if there were three students in each class, we would have:

If the number of students per class varied, the size of each of these submatrices would also vary, although they would still have this common structure. The variance in MATHACH for any given student is assumed to be $\tau_{00}+\sigma^{2}$ ...
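For three students per class, the 3 × 3 block for one class, reconstructed here from the variances and covariances described in the quoted passage, is

$$\begin{pmatrix} \tau_{00}+\sigma^{2} & \tau_{00} & \tau_{00}\\ \tau_{00} & \tau_{00}+\sigma^{2} & \tau_{00}\\ \tau_{00} & \tau_{00} & \tau_{00}+\sigma^{2} \end{pmatrix},$$

and the full matrix repeats one such block per class down the diagonal, with zeros everywhere else.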

Quiz:

1. Sampling K repetitions with replacement, $Y_{k}$ denotes MATHACH of the k-th random student within a given class, say, the first class. What is the population variance of the Y series? A: $\sigma^{2}$, B: $\tau_{00}+\sigma^{2}$.

2. Sampling $2\times K$ repetitions with replacement, $\left(Y_{1st,k},Y_{2nd,k}\right)$ denotes the MATHACH values of the k-th pair of random students within a given class, say, the first class. Sometimes the 2nd sampled student happens to be the 1st one by chance. What is the population covariance of the $Y_{1st}$ and $Y_{2nd}$ series? A: 0, B: $\tau_{00}$.

3. Sampling K repetitions with replacement, $Y_{k}$ denotes MATHACH of the k-th random student within one randomly sampled class for each repetition. What is the population variance of the Y series? A: $\sigma^{2}$, B: $\tau_{00}+\sigma^{2}$.

4. Sampling $2\times K$ repetitions with replacement, $\left(Y_{1st,k},Y_{2nd,k}\right)$ denotes the MATHACH values of the k-th pair of random students within one random class sampled anew for each pair of students. Sometimes the 2nd sampled student happens to be the 1st one by chance. What is the population covariance of the $Y_{1st}$ and $Y_{2nd}$ series? A: 0, B: $\tau_{00}$.

Correct answers: A A B B.

Plausible incorrect answers to Quizzes 1 and 2 show exactly how the matrix misled me the first time I read the paper. It is difficult to attach names to the randomly sampled classes, and hence to the matrix rows and columns.
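A minimal sketch (not from the paper) that checks the four answers by simulation; the values of $\tau_{00}$, $\sigma^{2}$, the number of classes, and the number of repetitions are arbitrary assumptions, and each class is treated as having an unlimited pool of students:

```
set.seed(2015)
tau00 <- 4; sigma2 <- 9; J <- 1000; K <- 1e5
u <- rnorm(J, 0, sqrt(tau00))                    # class-level random intercepts

# Quiz 1: students resampled within ONE given class -> Var = sigma2
var(u[1] + rnorm(K, 0, sqrt(sigma2)))
# Quiz 2: pairs of students within ONE given class -> Cov = 0
cov(u[1] + rnorm(K, 0, sqrt(sigma2)), u[1] + rnorm(K, 0, sqrt(sigma2)))
# Quiz 3: a fresh random class for EACH repetition -> Var = tau00 + sigma2
j <- sample(J, K, replace = TRUE)
var(u[j] + rnorm(K, 0, sqrt(sigma2)))
# Quiz 4: a fresh random class for EACH pair -> Cov = tau00
j <- sample(J, K, replace = TRUE)
cov(u[j] + rnorm(K, 0, sqrt(sigma2)), u[j] + rnorm(K, 0, sqrt(sigma2)))
```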

--
Singer, J. D. (1998). Using SAS PROC MIXED to fit multilevel models, hierarchical models, and individual growth models. Journal of Educational and Behavioral Statistics, 24, 323-355.