第 5 章 尺度参数
5.1 5.1 两独立样本的Siegel-Tukey方差检验
- 假设两总体的位置参数相等,实践中若中位数不等,做平移使之相等
- 取秩时,“首尾交替”
- 与Wilcoxon秩和统计量有关
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];x=x[x[,2]==1,1];
x1=x-median(outer(x,y,"-")) ##平移
xy=cbind(c(x1,y),c(rep(1,length(x)),rep(2,length(y))))
xy1=xy[order(xy[,1]),]
z=xy[,1] ##平移后的数据
n=length(z)
a1=2:3;b=2:3;
for(i in seq(1,n,2)){b=b+4;a1=c(a1,b)}
a2=c(1,a1+2);z=NULL;
for(i in 1:n) z=c(z,(i-floor(i/2)))
b=1:2;
for(i in seq(1,(n+2-2),2)){
if(z[i]/2!=floor(z[i]/2)){
z[i:(i+1)]=b;
b=b+2
}
}
zz=cbind(c(0,0,z[1:(n-2)]),z[1:n])
if(n==1) R=1;
if(n==2) R=c(1,2);
if(n>2) R=c(a2[1:zz[n,1]],rev(a1[1:zz[n,2]]))
xy2=cbind(xy1,R);
Wx=sum(xy2[xy2[,2]==1,3]);
Wy=sum(xy2[xy2[,2]==2,3]);
nx=length(x);ny=length(y);
Wxy=Wy-0.5*ny*(ny+1);
Wyx=Wx-0.5*nx*(nx+1)
(pvalue=pwilcox(Wyx,nx,ny))## [1] 0.02428558
| sample | region | rank |
|---|---|---|
| 9343 | 1 | 1 |
| 9783 | 1 | 4 |
| 9956 | 1 | 5 |
| 10258 | 1 | 8 |
| 10276 | 2 | 9 |
| 10374 | 1 | 12 |
| 10533 | 2 | 13 |
| 10633 | 2 | 16 |
| 10827 | 1 | 17 |
| 10837 | 2 | 20 |
| 10940 | 1 | 21 |
| 11209 | 2 | 24 |
| 11393 | 2 | 25 |
| 11864 | 2 | 28 |
| 12032 | 1 | 29 |
| 12040 | 2 | 32 |
| 12398 | 1 | 31 |
| 12552 | 1 | 30 |
| 12642 | 2 | 27 |
| 12675 | 2 | 26 |
| 12749 | 1 | 23 |
| 13199 | 2 | 22 |
| 13683 | 2 | 19 |
| 14049 | 2 | 18 |
| 14060 | 1 | 15 |
| 14061 | 2 | 14 |
| 15951 | 1 | 11 |
| 16079 | 1 | 10 |
| 16079 | 2 | 7 |
| 16441 | 1 | 6 |
| 17498 | 1 | 3 |
| 19723 | 1 | 2 |
5.2 5.2 两样本尺度参数的Mood检验
- 假设两总体的位置参数相等,实践中若中位数不等,做平移使之相等
- 取秩时,1234…按顺序取秩
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];
x=x[x[,2]==1,1];
m=length(x);n=length(y)
x1=x-median(outer(x,y,"-"))
xy=cbind(c(x1,y),c(rep(1,length(x)),rep(2,length(y))))
N=nrow(xy);xy1=cbind(xy[order(xy[,1]),],1:N)| sample | region | rank |
|---|---|---|
| 9343 | 1 | 1 |
| 9783 | 1 | 2 |
| 9956 | 1 | 3 |
| 10258 | 1 | 4 |
| 10276 | 2 | 5 |
| 10374 | 1 | 6 |
| 10533 | 2 | 7 |
| 10633 | 2 | 8 |
| 10827 | 1 | 9 |
| 10837 | 2 | 10 |
| 10940 | 1 | 11 |
| 11209 | 2 | 12 |
| 11393 | 2 | 13 |
| 11864 | 2 | 14 |
| 12032 | 1 | 15 |
| 12040 | 2 | 16 |
| 12398 | 1 | 17 |
| 12552 | 1 | 18 |
| 12642 | 2 | 19 |
| 12675 | 2 | 20 |
| 12749 | 1 | 21 |
| 13199 | 2 | 22 |
| 13683 | 2 | 23 |
| 14049 | 2 | 24 |
| 14060 | 1 | 25 |
| 14061 | 2 | 26 |
| 15951 | 1 | 27 |
| 16079 | 1 | 28 |
| 16079 | 2 | 29 |
| 16441 | 1 | 30 |
| 17498 | 1 | 31 |
| 19723 | 1 | 32 |
R1=xy1[xy1[,2]==1,3];
M=sum((R1-(N+1)/2)^2)
E1=m*(N^2-1)/12;
s=sqrt(m*n*(N+1)*(N^2-4)/180)
(Z=(M-E1)/s);## [1] 2.330917
(pvalue=pnorm(Z,low=F)) #单边## [1] 0.009878857
2*min(pnorm(Z,low=F),pnorm(Z)) #双边## [1] 0.01975771
5.3 5.3 Ansari-Bradley检验
- 假设两总体的位置参数相等,实践中若中位数不等,做平移使之相等
- 用X或Y在混合样本中的秩到两个极端值中最近的一个的秩的距离来度量
x=read.table("data/salary.txt");
y=x[x[,2]==2,1]
x=x[x[,2]==1,1];
x1=x-median(outer(x,y,"-"))
ansari.test(x1,y,alt="greater")## Warning in ansari.test.default(x1, y, alt = "greater"): cannot compute exact p-
## value with ties
##
## Ansari-Bradley test
##
## data: x1 and y
## AB = 118.5, p-value = 0.02458
## alternative hypothesis: true ratio of scales is greater than 1
5.4 5.4 Fligner-Killeen检验
5.4.1 两样本尺度的精确检验
- 计算|Xij-M|,后求秩
- 考虑Wilcoxon统计量
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];x=x[x[,2]==1,1];
m=length(x);n=length(y)
y1=y+median(outer(x,y,"-"));
M=median(c(x,y1))
xy=cbind(c(abs(x-M),abs(y1-M)),c(rep(1,length(x)),rep(2,length(y))))
N=nrow(xy);
xy=xy[order(xy[,1]),]
xy1=cbind(xy,rank(xy[,1],ties.method="average"))
Wx=sum(xy1[xy1[,2]==1,3]);
Wyx=Wx-0.5*m*(m+1);
Wxy=m*n-Wyx;
Wy=sum(xy1[xy1[,2]==2,3]);
(pvalue=pwilcox(Wxy,m,n))## [1] 0.03474059
##课本原结果有误| |Xij-M| | region | rank |
|---|---|---|
| 179 | 1 | 1.5 |
| 179 | 2 | 1.5 |
| 187 | 1 | 3.0 |
| 333 | 1 | 4.0 |
| 355 | 2 | 5.0 |
| 423 | 2 | 6.0 |
| 456 | 2 | 7.0 |
| 530 | 1 | 8.0 |
| 826 | 2 | 9.0 |
| 980 | 2 | 10.0 |
| 1010 | 2 | 11.0 |
| 1279 | 1 | 12.0 |
| 1382 | 2 | 13.0 |
| 1392 | 1 | 14.0 |
| 1464 | 2 | 15.0 |
| 1586 | 2 | 16.0 |
| 1686 | 2 | 17.0 |
| 1830 | 2 | 18.0 |
| 1841 | 1 | 19.0 |
| 1842 | 2 | 20.0 |
| 1845 | 1 | 21.0 |
| 1943 | 2 | 22.0 |
| 1961 | 1 | 23.0 |
| 2263 | 1 | 24.0 |
| 2436 | 1 | 25.0 |
| 2876 | 1 | 26.0 |
| 3732 | 1 | 27.0 |
| 3860 | 1 | 28.5 |
| 3860 | 2 | 28.5 |
| 4222 | 1 | 30.0 |
| 5279 | 1 | 31.0 |
| 7504 | 1 | 32.0 |
5.5 5.5 两样本尺度的平方秩检验
- 绝对离差的秩的平方和
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];x=x[x[,2]==1,1];
m=length(x);n=length(y)
x1=abs(x-mean(x));y1=abs(y-mean(y));
xy1=c(x1,y1);xy0=c(x,y)
xyi=c(rep(1,m),rep(2,n));
xy=cbind(xy1,xy0,xyi)
xy2=cbind(xy[order(xy[,1]),],1:(m+n),(1:(m+n))^2)
T1=sum(xy2[xy2[,3]==1,5]);
T2=sum(xy2[xy2[,3]==2,5])
R=xy2[,5];meanR=mean(R);
S=sqrt(m*n*(sum(R^2)-(m+n)*meanR^2)/(m+n)/(m+n-1))
Zx=(T1-m*meanR)/S;Zy=(T2-n*meanR)/S;
(pvalue=min(pnorm(Zx),pnorm(Zy)))## [1] 0.006255918
| 绝对离差 | 原样本 | 地区 | 离差秩 | 秩的平方 |
|---|---|---|---|---|
| 248.8824 | 10270 | 1 | 1 | 1 |
| 297.1333 | 12642 | 2 | 2 | 4 |
| 304.8667 | 12040 | 2 | 3 | 9 |
| 330.1333 | 12675 | 2 | 4 | 16 |
| 445.8824 | 10073 | 1 | 5 | 25 |
| 480.8667 | 11864 | 2 | 6 | 36 |
| 599.8824 | 9919 | 1 | 7 | 49 |
| 854.1333 | 13199 | 2 | 8 | 64 |
| 951.8667 | 11393 | 2 | 9 | 81 |
| 965.8824 | 9553 | 1 | 10 | 100 |
| 1062.1176 | 11581 | 1 | 11 | 121 |
| 1135.8667 | 11209 | 2 | 12 | 144 |
| 1338.1333 | 13683 | 2 | 13 | 169 |
| 1507.8667 | 10837 | 2 | 14 | 196 |
| 1704.1333 | 14049 | 2 | 15 | 225 |
| 1711.8667 | 10633 | 2 | 16 | 256 |
| 1716.1333 | 14061 | 2 | 17 | 289 |
| 1811.8667 | 10533 | 2 | 18 | 324 |
| 2057.8824 | 8461 | 1 | 19 | 361 |
| 2068.8667 | 10276 | 2 | 20 | 400 |
| 2170.8824 | 8348 | 1 | 21 | 441 |
| 2623.8824 | 7895 | 1 | 22 | 484 |
| 2739.8824 | 7779 | 1 | 23 | 529 |
| 2953.1176 | 13472 | 1 | 24 | 576 |
| 3041.8824 | 7477 | 1 | 25 | 625 |
| 3081.1176 | 13600 | 1 | 26 | 676 |
| 3214.8824 | 7304 | 1 | 27 | 729 |
| 3443.1176 | 13962 | 1 | 28 | 784 |
| 3654.8824 | 6864 | 1 | 29 | 841 |
| 3734.1333 | 16079 | 2 | 30 | 900 |
| 4500.1176 | 15019 | 1 | 31 | 961 |
| 6725.1176 | 17244 | 1 | 32 | 1024 |
5.6 5.6 多样本尺度的平方秩检验
d=read.table("data/wtloss.txt");
N=nrow(d);
k=max(d[,2])
d2=NULL;
for (i in 1:k) d2=rbind(d2,cbind(abs(d[d[,2]==i,1]-mean(d[d[,2]==i,1])),d[d[,2]==i,1],i))
d3=cbind(d2[order(d2[,1]),],1:N,(1:N)^2)
Ti=NULL;
for(i in 1:k) Ti=c(Ti,sum(d3[d3[,3]==i,5]))
ni=NULL;
for(i in 1:k) ni=c(ni,nrow(d3[d3[,3]==i,]))
T=(N-1)*(sum(Ti^2/ni)-sum(Ti)^2/N)/(sum(d3[,5]^2)-sum(Ti)^2/N)
(pvalue=pchisq(T,k-1,low=F))## [1] 0.07693023
| 绝对离差 | 原样本 | 生活方式 | 秩 | 秩的平方 |
|---|---|---|---|---|
| 0.300 | 5.7 | 2 | 1 | 1 |
| 0.300 | 3.7 | 1 | 2 | 4 |
| 0.300 | 3.7 | 1 | 3 | 9 |
| 0.325 | 7.1 | 3 | 4 | 16 |
| 0.400 | 3.0 | 1 | 5 | 25 |
| 0.500 | 3.9 | 1 | 6 | 36 |
| 0.500 | 6.5 | 2 | 7 | 49 |
| 0.700 | 2.7 | 1 | 8 | 64 |
| 0.700 | 5.3 | 2 | 9 | 81 |
| 0.800 | 5.2 | 2 | 10 | 100 |
| 1.275 | 8.7 | 3 | 11 | 121 |
| 1.300 | 7.3 | 2 | 12 | 144 |
| 1.575 | 9.0 | 3 | 13 | 169 |
| 2.525 | 4.9 | 3 | 14 | 196 |