第 5 章 尺度参数

5.1 5.1 两独立样本的Siegel-Tukey方差检验

  • 假设两总体的位置参数相等,实践中若中位数不等,做平移使之相等
  • 取秩时,“首尾交替”
  • 与Wilcoxon秩和统计量有关
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];x=x[x[,2]==1,1];
x1=x-median(outer(x,y,"-")) ##平移
xy=cbind(c(x1,y),c(rep(1,length(x)),rep(2,length(y))))
xy1=xy[order(xy[,1]),] 
z=xy[,1] ##平移后的数据
n=length(z)
a1=2:3;b=2:3;
for(i in seq(1,n,2)){b=b+4;a1=c(a1,b)}
a2=c(1,a1+2);z=NULL;
for(i in 1:n) z=c(z,(i-floor(i/2)))
b=1:2;
for(i in seq(1,(n+2-2),2)){
  if(z[i]/2!=floor(z[i]/2)){
    z[i:(i+1)]=b;
    b=b+2
  }
}
zz=cbind(c(0,0,z[1:(n-2)]),z[1:n])
if(n==1) R=1;
if(n==2) R=c(1,2);
if(n>2) R=c(a2[1:zz[n,1]],rev(a1[1:zz[n,2]]))
xy2=cbind(xy1,R);
Wx=sum(xy2[xy2[,2]==1,3]);
Wy=sum(xy2[xy2[,2]==2,3]);
nx=length(x);ny=length(y);
Wxy=Wy-0.5*ny*(ny+1);
Wyx=Wx-0.5*nx*(nx+1)
(pvalue=pwilcox(Wyx,nx,ny))
## [1] 0.02428558
sample region rank
9343 1 1
9783 1 4
9956 1 5
10258 1 8
10276 2 9
10374 1 12
10533 2 13
10633 2 16
10827 1 17
10837 2 20
10940 1 21
11209 2 24
11393 2 25
11864 2 28
12032 1 29
12040 2 32
12398 1 31
12552 1 30
12642 2 27
12675 2 26
12749 1 23
13199 2 22
13683 2 19
14049 2 18
14060 1 15
14061 2 14
15951 1 11
16079 1 10
16079 2 7
16441 1 6
17498 1 3
19723 1 2

5.2 5.2 两样本尺度参数的Mood检验

  • 假设两总体的位置参数相等,实践中若中位数不等,做平移使之相等
  • 取秩时,1234…按顺序取秩
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];
x=x[x[,2]==1,1];
m=length(x);n=length(y)
x1=x-median(outer(x,y,"-"))
xy=cbind(c(x1,y),c(rep(1,length(x)),rep(2,length(y))))
N=nrow(xy);xy1=cbind(xy[order(xy[,1]),],1:N)
sample region rank
9343 1 1
9783 1 2
9956 1 3
10258 1 4
10276 2 5
10374 1 6
10533 2 7
10633 2 8
10827 1 9
10837 2 10
10940 1 11
11209 2 12
11393 2 13
11864 2 14
12032 1 15
12040 2 16
12398 1 17
12552 1 18
12642 2 19
12675 2 20
12749 1 21
13199 2 22
13683 2 23
14049 2 24
14060 1 25
14061 2 26
15951 1 27
16079 1 28
16079 2 29
16441 1 30
17498 1 31
19723 1 32
R1=xy1[xy1[,2]==1,3];
M=sum((R1-(N+1)/2)^2)
E1=m*(N^2-1)/12;
s=sqrt(m*n*(N+1)*(N^2-4)/180)
(Z=(M-E1)/s);
## [1] 2.330917
(pvalue=pnorm(Z,low=F)) #单边
## [1] 0.009878857
2*min(pnorm(Z,low=F),pnorm(Z)) #双边
## [1] 0.01975771

5.3 5.3 Ansari-Bradley检验

  • 假设两总体的位置参数相等,实践中若中位数不等,做平移使之相等
  • 用X或Y在混合样本中的秩到两个极端值中最近的一个的秩的距离来度量
x=read.table("data/salary.txt");
y=x[x[,2]==2,1]
x=x[x[,2]==1,1];
x1=x-median(outer(x,y,"-"))
ansari.test(x1,y,alt="greater")
## Warning in ansari.test.default(x1, y, alt = "greater"): cannot compute exact p-
## value with ties
## 
##  Ansari-Bradley test
## 
## data:  x1 and y
## AB = 118.5, p-value = 0.02458
## alternative hypothesis: true ratio of scales is greater than 1

5.4 5.4 Fligner-Killeen检验

5.4.1 两样本尺度的精确检验

  • 计算|Xij-M|,后求秩
  • 考虑Wilcoxon统计量
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];x=x[x[,2]==1,1];
m=length(x);n=length(y)
y1=y+median(outer(x,y,"-"));
M=median(c(x,y1))
xy=cbind(c(abs(x-M),abs(y1-M)),c(rep(1,length(x)),rep(2,length(y))))
N=nrow(xy);
xy=xy[order(xy[,1]),]
xy1=cbind(xy,rank(xy[,1],ties.method="average"))
Wx=sum(xy1[xy1[,2]==1,3]);
Wyx=Wx-0.5*m*(m+1);
Wxy=m*n-Wyx;
Wy=sum(xy1[xy1[,2]==2,3]);
(pvalue=pwilcox(Wxy,m,n))
## [1] 0.03474059
##课本原结果有误
|Xij-M| region rank
179 1 1.5
179 2 1.5
187 1 3.0
333 1 4.0
355 2 5.0
423 2 6.0
456 2 7.0
530 1 8.0
826 2 9.0
980 2 10.0
1010 2 11.0
1279 1 12.0
1382 2 13.0
1392 1 14.0
1464 2 15.0
1586 2 16.0
1686 2 17.0
1830 2 18.0
1841 1 19.0
1842 2 20.0
1845 1 21.0
1943 2 22.0
1961 1 23.0
2263 1 24.0
2436 1 25.0
2876 1 26.0
3732 1 27.0
3860 1 28.5
3860 2 28.5
4222 1 30.0
5279 1 31.0
7504 1 32.0

5.4.2 多样本尺度的大样本近似

x=read.table("data/salary.txt");
fligner.test(x[,1],x[,2])
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  x[, 1] and x[, 2]
## Fligner-Killeen:med chi-squared = 4.4713, df = 1, p-value = 0.03447

5.5 5.5 两样本尺度的平方秩检验

  • 绝对离差的秩的平方和
x=read.table("data/salary.txt")
y=x[x[,2]==2,1];x=x[x[,2]==1,1];
m=length(x);n=length(y)
x1=abs(x-mean(x));y1=abs(y-mean(y));
xy1=c(x1,y1);xy0=c(x,y)
xyi=c(rep(1,m),rep(2,n));
xy=cbind(xy1,xy0,xyi)
xy2=cbind(xy[order(xy[,1]),],1:(m+n),(1:(m+n))^2)
T1=sum(xy2[xy2[,3]==1,5]);
T2=sum(xy2[xy2[,3]==2,5])
R=xy2[,5];meanR=mean(R);
S=sqrt(m*n*(sum(R^2)-(m+n)*meanR^2)/(m+n)/(m+n-1))
Zx=(T1-m*meanR)/S;Zy=(T2-n*meanR)/S;
(pvalue=min(pnorm(Zx),pnorm(Zy)))
## [1] 0.006255918
绝对离差 原样本 地区 离差秩 秩的平方
248.8824 10270 1 1 1
297.1333 12642 2 2 4
304.8667 12040 2 3 9
330.1333 12675 2 4 16
445.8824 10073 1 5 25
480.8667 11864 2 6 36
599.8824 9919 1 7 49
854.1333 13199 2 8 64
951.8667 11393 2 9 81
965.8824 9553 1 10 100
1062.1176 11581 1 11 121
1135.8667 11209 2 12 144
1338.1333 13683 2 13 169
1507.8667 10837 2 14 196
1704.1333 14049 2 15 225
1711.8667 10633 2 16 256
1716.1333 14061 2 17 289
1811.8667 10533 2 18 324
2057.8824 8461 1 19 361
2068.8667 10276 2 20 400
2170.8824 8348 1 21 441
2623.8824 7895 1 22 484
2739.8824 7779 1 23 529
2953.1176 13472 1 24 576
3041.8824 7477 1 25 625
3081.1176 13600 1 26 676
3214.8824 7304 1 27 729
3443.1176 13962 1 28 784
3654.8824 6864 1 29 841
3734.1333 16079 2 30 900
4500.1176 15019 1 31 961
6725.1176 17244 1 32 1024

5.6 5.6 多样本尺度的平方秩检验

d=read.table("data/wtloss.txt");
N=nrow(d);
k=max(d[,2])
d2=NULL;
for (i in 1:k) d2=rbind(d2,cbind(abs(d[d[,2]==i,1]-mean(d[d[,2]==i,1])),d[d[,2]==i,1],i))
d3=cbind(d2[order(d2[,1]),],1:N,(1:N)^2)
Ti=NULL;
for(i in 1:k) Ti=c(Ti,sum(d3[d3[,3]==i,5]))
ni=NULL;
for(i in 1:k) ni=c(ni,nrow(d3[d3[,3]==i,]))
T=(N-1)*(sum(Ti^2/ni)-sum(Ti)^2/N)/(sum(d3[,5]^2)-sum(Ti)^2/N)
(pvalue=pchisq(T,k-1,low=F))
## [1] 0.07693023
绝对离差 原样本 生活方式 秩的平方
0.300 5.7 2 1 1
0.300 3.7 1 2 4
0.300 3.7 1 3 9
0.325 7.1 3 4 16
0.400 3.0 1 5 25
0.500 3.9 1 6 36
0.500 6.5 2 7 49
0.700 2.7 1 8 64
0.700 5.3 2 9 81
0.800 5.2 2 10 100
1.275 8.7 3 11 121
1.300 7.3 2 12 144
1.575 9.0 3 13 169
2.525 4.9 3 14 196