Sample Size Calculation (Revised Draft)

1. Factors That Determine the Estimated Sample Size

1.1 Nature of the data
For measurement (continuous) data, if the design is balanced and error is well controlled, fewer than 30 cases may suffice; for enumeration (count) data, even with strict error control and a balanced design, a larger sample of roughly 30-100 cases is needed.
1.2 Incidence of the study event
The study event is the expected outcome (disease or death). The higher the incidence of the disease, the smaller the required sample size; the lower the incidence, the larger the sample must be.
1.3 Effectiveness of the study factor
The higher the effectiveness, that is, the larger the difference between the experimental and control groups, the smaller the sample can be, since even a small sample will reach statistical significance; conversely, a larger sample is needed.
1.4 Significance level
This is the probability of a Type I (α) error in hypothesis testing, i.e., the probability of a false positive. The smaller α is, the larger the required sample size, and vice versa. The α level is chosen by the investigator according to the circumstances, usually 0.05 or 0.01.
1.5 Power
Power, also called the degree of assurance, is 1−β, where β is the probability of a Type II error in hypothesis testing, i.e., a false negative. Power is the probability that, at a given α level, the study will detect a difference between population parameters when such a difference truly exists; in other words, it is the ability to avoid a false negative. The smaller β, the higher the power and the larger the required sample size, and vice versa. The β level is chosen by the investigator, usually 0.2, 0.1, or 0.05, giving 1−β = 0.8, 0.9, or 0.95, i.e., 80%, 90%, or 95% power.
1.6 Allowable error (δ)
When a mean is being surveyed, first decide the maximum allowable error between the sample mean (x̄) and the population mean (μ). The smaller the allowable error, the larger the required sample size. It is usually taken as half the width of the (1−α) confidence interval for the population mean.
1.7 Population standard deviation (σ)
The population standard deviation is generally unknown, so the sample standard deviation s is used in its place.
1.8 Two-sided versus one-sided tests
If results either above or below the boundary of the effect indicator would both be meaningful, a two-sided test should be chosen, and the required sample size is larger; if only results above (or only below) the boundary are meaningful, a one-sided test should be chosen, and the required sample size is smaller. For either kind of test, the critical values u_α and u_β are obtained from a table of quantiles of the standard normal distribution.
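In practice the table lookup can be replaced by a short computation. A minimal sketch (assuming Python 3.8+, whose standard library exposes the inverse normal CDF as `statistics.NormalDist().inv_cdf`; the function names are my own):

```python
from statistics import NormalDist

def u_alpha(alpha: float, two_sided: bool = True) -> float:
    """Critical value u_alpha of the standard normal distribution."""
    q = 1 - alpha / 2 if two_sided else 1 - alpha
    return NormalDist().inv_cdf(q)

def u_beta(beta: float) -> float:
    """Critical value u_beta (power is always treated one-sided)."""
    return NormalDist().inv_cdf(1 - beta)

# Common values: two-sided alpha = 0.05 -> 1.96; beta = 0.20 -> 0.84
print(round(u_alpha(0.05), 2), round(u_beta(0.20), 2))
```

For a one-sided test, `u_alpha(0.05, two_sided=False)` gives the familiar 1.64.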
2. Estimating the Sample Size
Because different variables and data types call for different statistical tests, each specific design has its own sample-size formula. Only after the parameters above have been determined, by consulting the literature, drawing on others' experience, or running a pilot study, can the sample size be estimated.
Estimating Sample Size for Clinical Trials

Sample-size estimation requires fixing several parameters. The hardest to obtain is the expected or known effect size (the rate difference for count data, the difference in means for measurement data), together with the variance (for measurement data) or the pooled rate across groups (for count data). These are normally obtained from a pilot trial or from historical data and the literature, but they are often unavailable or of poor reliability.
Sample-size estimation is therefore sometimes not something you can do just because you want to. The SFDA requirements are mainly about safety, guaranteeing that adverse reactions above a certain rate will be detected; the statistical calculation is mainly about power, guaranteeing a certain probability of demonstrating significance. But given the realities in China, how many manufacturers are willing to enroll more patients? One suggested wording for the protocol: from the safety standpoint, in accordance with SFDA regulation ××, 100 evaluable pairs will be completed; allowing for dropouts, this is expanded by 20% to 120 pairs, i.e., 240 cases.
Alternatively: this study is a randomized, double-blind, placebo-controlled parallel-group trial; the test drug is considered effective only if it is shown to be superior to placebo. Based on pilot results, the response rates in the test and control groups are 65.0% and 42.9%, so each treatment group must contain 114 evaluable patients (228 in total) to demonstrate superiority of the test drug at a one-sided significance level of 5% with 90% power. Assuming up to 10% of cases are lost when the intention-to-treat population is adjusted, a total of 250 patients must be enrolled.
For a non-inferiority trial (α = 0.05, β = 0.2):
- Count data, with mean effective rate P and equivalence margin δ: N = 12.365 × P(1−P)/δ²
- Measurement data, with common standard deviation S: N = 12.365 × (S/δ)²

For an equivalence trial (α = 0.05, β = 0.2):
- Count data: N = 17.127 × P(1−P)/δ²
- Measurement data: N = 17.127 × (S/δ)²

Notes on these formulas:
1) They come from a paper published by Professor Zheng Qingshan.
2) N is the estimated number of cases per group (N1 = N2, where N1 and N2 are the numbers of cases on the test drug and the comparator).
3) P is the mean effective rate.
4) S is the estimated common standard deviation.
5) δ is the equivalence margin.
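These formulas are easy to mechanize. A minimal sketch in Python (the constants 12.365 and 17.127 and both formulas are taken directly from the text above; the function name and rounding up to whole cases are my own choices):

```python
import math

def n_per_group(p=None, s=None, delta=0.1, design="noninferiority"):
    """Per-group N for a non-inferiority (12.365) or equivalence (17.127)
    design at alpha = 0.05, beta = 0.2.  Pass p (mean effective rate) for
    count data, or s (common SD) for measurement data; delta is the
    equivalence margin."""
    k = 12.365 if design == "noninferiority" else 17.127
    if p is not None:                       # count data
        return math.ceil(k * p * (1 - p) / delta ** 2)
    return math.ceil(k * (s / delta) ** 2)  # measurement data

# e.g. mean effective rate 0.8, margin 0.1, non-inferiority design:
print(n_per_group(p=0.8, delta=0.1))  # 198 cases per group
```

Doubling this gives the two-group total before any dropout inflation.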
Determining Sample Size in Survey Research

In social research, investigators constantly face questions such as: "How large a sample do I need to characterize the population?" or "I want the survey precision to reach 95%; how many respondents does that take?" I often find these hard to answer in the abstract, because many factors bear on them: the study population, the main purpose of the study, the sampling method, the survey budget, and so on. Drawing on my own experience, this article discusses some basic methods for determining the sample size needed in survey research; they should also be of use in other social investigations.
Basic formula for the sample size. Under simple random sampling, statistics textbooks give a simple formula for the required sample size:

n = Z²S²/d²    (1)

where n is the required sample size; Z is the Z statistic for the chosen confidence level (1.96 for 95% confidence, 2.58 for 99%); S is the population standard deviation; and d is half the width of the confidence interval, which in practice is the allowable (survey) error.

For a proportion-type variable, the formula is:

n = Z²p(1−p)/d²    (2)

where n is the required sample size; Z is as above; p is the expected proportion in the target population; and d is the half-width of the confidence interval.

On survey precision: the precision of a survey is usually stated in one of two ways, as an absolute error or as a relative error.
For example, in an income survey of a city's residents, requiring that the estimated per-capita income be in error by no more than 50 yuan in either direction is an absolute statement; that absolute error is exactly the half-width d in formula (1). The relative error is the ratio of the absolute error to the sample mean; we might instead require the surveyed income to be within 1% of the true value. If the city's true per-capita income is 10,000 yuan, the corresponding absolute error is 100 yuan.
Applying the formulas: some of the parameters can be fixed in advance. Z depends on the confidence level, usually 95% (Z = 1.96) or 99% (Z = 2.58). The allowable error d (the precision), i.e., the half-width of the confidence interval, is then set according to the practical requirements of the study.
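Formulas (1) and (2) can be tried out directly in code. A minimal sketch (the illustrative values of S = 500 yuan and the proportion inputs are assumptions for the example, not figures from the text):

```python
import math

def n_mean(z, s, d):
    """Formula (1): n = Z^2 * S^2 / d^2, for estimating a mean."""
    return math.ceil(z**2 * s**2 / d**2)

def n_proportion(z, p, d):
    """Formula (2): n = Z^2 * p(1-p) / d^2, for estimating a proportion."""
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Income survey: 95% confidence (Z = 1.96), assumed S = 500 yuan,
# allowable error d = 50 yuan:
print(n_mean(1.96, 500, 50))          # 385
# Proportion near 0.5 (the most conservative case), d = 0.03:
print(n_proportion(1.96, 0.5, 0.03))  # 1068
```

Note how halving d quadruples n in both formulas, since d enters squared in the denominator.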
Principles of Sample Size Calculation in Scientific Research

In scientific research, calculating the sample size is a crucial step. Determining it correctly helps the investigator obtain accurate, reliable results and so improves the scientific quality and credibility of the study. This article introduces the principles of sample-size calculation and discusses their importance and application.
I. The importance of sample-size calculation
The size of the sample directly affects the validity of a study's results and conclusions. If the sample is too small, the results may not be representative and no accurate inference about the population can be drawn. If it is too large, resources and time are wasted. Calculating the sample size correctly is therefore essential to the reliability of scientific research.
II. Principles of sample-size calculation
1. Effect size: the first thing to consider is the effect size, i.e., the actual magnitude of the difference between the variables of interest. In general, the smaller the effect size, the larger the required sample; the investigator therefore needs to estimate the effect size realistically.
2. Significance level: the significance level is the standard used to judge whether a difference is statistically significant. The common choices are 0.05 and 0.01, i.e., 5% and 1%. A lower significance level demands a larger sample, because stronger evidence is needed to support the existence of a difference.
3. Power: power is the probability that the study will detect an effect that truly exists. Investigators generally want power to be high, so that real differences are easy to find. A power of 0.8 is typical, meaning an 80% chance of detecting a true difference. The required sample size follows from the power the investigator demands.
III. Methods of sample-size calculation
Sample size can be computed with statistical software or estimated from formulas. Common approaches include:
1. Parameter-estimation method: the sample size is computed from estimates of the population parameters and the required confidence level. This method suits situations where prior studies are plentiful and reliable estimates of the population parameters exist.
2. Effect-size method: the sample size is computed from the effect size of interest together with the significance level, using a formula. This method suits more complex settings where reliable parameter estimates are lacking.
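The three ingredients just discussed (effect size, significance level, power) combine into the textbook normal-approximation formula for comparing two means; this formula is standard statistics, not something stated explicitly in the text above. A minimal sketch:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Per-group n for a two-sample comparison of means, normal
    approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / ES^2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2)

# Smaller effects need disproportionately more subjects:
for es in (0.8, 0.5, 0.2):
    print(es, n_per_group(es))   # 0.8 -> 25, 0.5 -> 63, 0.2 -> 393
```

Halving the effect size quadruples the required n, which is why an honest estimate of the effect size matters so much.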
Sample Size Calculation and Determination in Survey Projects

When running a research project, choosing an appropriate sample size is essential, because it directly determines how reliable and generalizable the results will be. The sample size should be calculated on the basis of the study's purpose, its design, and the statistical methods to be used. This article covers the calculation methods and the points to watch.
First, be clear about the purpose of the study and the research question. The sample size should follow from the main objective: the population characteristics to be described, whether a population difference exists, or the precision required of a parameter estimate. Different objectives impose different demands; some require a large sample to guarantee reliable results, while others can be met with a small one.
Next, consider the study design and statistical method, since these also shape the required sample size. For a questionnaire survey, for instance, the sample size can be computed from the formula

n = Z² × p × (1−p) / d²

where n is the sample size, Z is the Z value for the chosen confidence level, p is the estimated population proportion, and d is the allowable error. Given the study's goals, we fix the confidence level, the proportion estimate, and the allowable error, and then compute n.
For experimental studies, analysis of variance (ANOVA) can be used to determine the sample size: given the design and the expected effect size, simulation runs in statistical software can establish the sample size that meets the study's aims.
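The simulation idea just described can be sketched with nothing but the standard library: simulate many trials at a candidate sample size, count how often the test comes out significant, and raise n until the simulated power reaches the target. (Everything here, including the assumed effect of 0.5 SD and the step size of 5, is an illustrative assumption; a real study would simulate the actual planned design and analysis.)

```python
import random
from statistics import NormalDist, mean, stdev

def simulated_power(n, effect=0.5, alpha=0.05, reps=2000, seed=1):
    """Monte Carlo power of a two-sample z-style test for a
    standardized mean difference `effect`, with n subjects per group."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(effect, 1) for _ in range(n)]
        se = ((stdev(a) ** 2 + stdev(b) ** 2) / n) ** 0.5
        if abs((mean(b) - mean(a)) / se) > crit:
            hits += 1
    return hits / reps

# Increase n until simulated power reaches the 0.8 target:
n = 40
while simulated_power(n) < 0.8:
    n += 5
print(n)  # lands near the analytic answer of ~63 per group
```

The same loop works for any design you can simulate, which is exactly why simulation is the fallback when no closed-form formula fits.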
Practical constraints also matter: the study's time limits, budget, and feasibility. If time or money is short, the sample may have to be reduced somewhat, but it must still be large enough to serve the study's purpose. Representativeness must be considered as well: the sample should reflect the characteristics of the population so that the conclusions generalize. Selection bias should therefore be avoided when drawing the sample; methods such as simple random sampling or stratified sampling help obtain a more representative sample.
Once the sample size is fixed, a sensitivity analysis should be run. Sensitivity analysis shows whether the chosen sample size is reasonable and robust: compute the statistics of interest at different sample sizes and watch how they change. If the results are insensitive to sample size, i.e., the statistics barely change as the sample grows, the chosen sample size can be confirmed as adequate.
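One simple form of such a sensitivity analysis is to compute the precision of the survey estimate at several candidate sample sizes and watch the gains flatten out. A minimal sketch for a proportion (the 95% level and the conservative p = 0.5 are assumptions for illustration):

```python
from statistics import NormalDist

def half_width(n, p=0.5, conf=0.95):
    """Confidence-interval half-width for a proportion at sample size n."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return z * (p * (1 - p) / n) ** 0.5

for n in (400, 800, 1200, 1600):
    print(n, round(half_width(n), 4))
```

Going from 400 to 800 respondents shrinks the half-width by about 1.4 percentage points, while going from 1200 to 1600 gains only about 0.4; once the curve flattens like this, the extra cases are not buying much precision.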
How to Calculate Sample Size

(Shared) Methods for calculating sample size. I posted earlier about a sports-nutrition website that my supervisor has bookmarked. It is a personal site, but its statistics section is remarkably detailed and rich; not many people seemed to take notice, which is a pity, because the site really is good. In the site's screenshot, click the "statistics" link and you will find a wealth of statistical material. Since questions about how to determine sample size come up on this board all the time, and the required sample size depends on your experimental design, I am posting the site's sample-size material below for reference.
Please choose the calculation method that fits your own experimental design. It is all in English, but I doubt that will stump anyone here.

WHAT DETERMINES SAMPLE SIZE?

The traditional approach to estimation of sample size is based on statistical significance of your outcome measure. You have to specify the smallest effect you want to detect, the Type I and Type II error rates, and the design of the study. I present here new formulae for the resulting estimates of sample size. I also include new ways to adjust for validity and reliability, and I finish with sample sizes required for several complex cross-sectional designs. I also advocate a new approach to sample-size estimation based on width of the confidence interval of your outcome measure. In this new approach, your concern is with the precision of your estimate of the effect, not with the statistical significance of the effect. The formulae on these pages still apply, but you halve the sample sizes.

The Smallest Effect Worth Detecting

I've already spent a whole page on magnitudes of effects. You should go back and make sure you understand it before proceeding. Or take a risk and read on! Let's look at a simple example of the smallest effect worth detecting. Your research project includes the question of differences in height of adults in two regions. This sounds like a trivial project, but hey, the difference might be caused by a nutritional deficit, environmental toxin, level of physical activity, or whatever. OK, what difference in height would you consider to be the smallest difference worth noticing or commenting on? Almost everyone reading this paragraph will automatically start thinking either in inches or centimeters. So what's your choice? An inch, or 2.5 cm? Sounds like a nice round figure! Let's go with it for now. To use my approach to sample-size estimation, you convert this difference into a value for the effect-size statistic. To do that, you divide it by the standard deviation, expressed in the same units. The standard deviation here is just the usual measure of spread, except that we have two groups. So let's assume we have an average of the standard deviation in both groups. Let's say it is 2 inches, or 5 cm. So, if you want to detect 2.5 cm, and the standard deviation is 5.0 cm, the smallest effect worth detecting is 2.5/5.0, or 0.5. I'll talk about what I mean by detecting in a minute. First, more about the smallest effect. You'll discover shortly that the required number of subjects is quite sensitive to the magnitude of the smallest worthwhile effect. In fact, halving the magnitude quadruples the number of subjects required to detect it. So the way you decide on the smallest effect is important. How did we arrive at that minimum difference of 2.5 cm? In my experience, most researchers dream up a number that sounds plausible, just like we did here. Well, sorry, but you just can't do it like that. In fact, you don't have the freedom to choose the minimum effect. In all but a few special cases, it's the threshold for small effects on the scale of magnitudes: 0.2 for the Cohen effect-size statistic, 10% for a frequency difference, and 0.1 for a correlation. You need the same sample size to detect each of these effects, and as we'll see, it's 800 subjects for a simple cross-sectional study in the old-fashioned way of doing the figuring. It's even more than 800 when you factor in the validity of your variables. But don't panic. We'll also see that there are ways of reducing this number, sometimes drastically.

Type I and II Error Rates

Now, what do I mean by detecting? Simply that if the real difference between the two groups in the population is 2.5 cm (an effect size of 0.5), you want to be sure that it will turn up as statistically significant in the sample that you draw for your study. If it doesn't turn up as statistically significant, you have failed to detect something that you were interested in. Make sense? So our definition of statistical significance, and our idea of what it means to be sure that it will turn up, both impact on the required sample size. First, statistical significance. The difference is statistically significant, by definition, if the 95% confidence interval does not overlap zero, or if the p value for the effect is less than 0.05. Values of 95% or 0.05 are also equivalent to a Type I error rate of 5%: in other words, the rate of false alarms in the absence of any population effect will be 5%. We don't have any choice here. It has to be 5%, or less preferably, but most researchers opt for 5%. If you want a lower rate of false alarms, say 1%, you will need more subjects. Now, what about being sure that the effect will turn up? In other words, if the effect really is 2.5 cm in the populations, how sure do we want to be that the difference observed in our sample will be statistically significant? We don't have any choice here, either. We have to be at least 80% sure of detecting the smallest effect. To put it another way, the power of the study to detect the smallest effect has to be at least 80%. Or to put it yet one more way, the Type II error rate -- the rate of failed alarms for the smallest effect -- is set at 20% or less. That's one chance in five of missing the thing you're looking for!?! Sounds a bit high, but keep in mind that it is the rate for the smallest worthwhile effect. The chance of missing larger effects is smaller. Once again, if you want to make the error rate lower, say 10%, you will need more subjects.

Research Design

We're stuck with having to detect 0.2 for the effect-size statistic, 10% for a frequency difference, or 0.1 for a correlation. And we're stuck with false and failed alarms of 5% and 20%. All that's left now is how we're going to go about it: the research design. When it comes to sample sizes, there are only two sorts of research design: cross-sectional and longitudinal. Cross-sectional designs include correlational, case-control, and any other design with single observations for each subject. Some so-called prospective designs, where subjects are followed up over time, are cross-sectional if there is only one value for each variable for each subject. Cross-sectional studies need heaps of subjects, and the number is affected by the validity of the variables. Longitudinal designs include time series, experiments, controlled trials, crossovers, and anything else where the dependent variable is measured twice or more. The data have to be subjected to repeated-measures analysis. The usual thing with these designs is a measurement before and after you do something, to see if what you do has any effect. Whether or not you have a control group, it is always the case that subjects "act as their own controls", because there are always pre and post measurements on the subjects. Longitudinal designs generally need far fewer subjects than cross-sectional designs, depending on the reliability of the dependent variable.

Sample Size for Cross-Sectional Studies

For variables with perfect validity, you can now look up tables or run special software to see how many subjects you need. (G*power is a great little free program for the purpose.) Or use the following simple formula I have worked out. For Type I and II errors of 5% and 20%, the total number of subjects N is given by:

N = 32/ES², where ES is the smallest effect size worth detecting.

Example: for ES = 0.2, the total N is 800, which means 400 in each group for a case-control study or a study comparing males and females.
So for our study of differences in height, we'd need 400 in each group. What about if the outcome is a difference in the frequency of something in the two groups, for example the frequency of clinical obesity? The minimum worthwhile difference is 10% (e.g. 25% in one group and 35% in the other). You just think about that difference as being equivalent to an effect size of 0.2, and plug it into the formula: 400 in each group again. And finally, what about sample size to detect a correlation, for example the correlation between physical activity and body fat? Same story: 800 subjects to detect the minimum worthwhile correlation of 0.1, because a correlation of 0.1 is equivalent to an effect size of 0.2. For larger correlations use the scale of magnitudes to convert the correlation to an equivalent effect size, then plug it into the formula. For the rare cases where you have the luxury of Type I and II errors of 1% and 10% respectively, the number is nearly double: N = 60/ES².

Validity of the variables can have a major impact on sample size in cross-sectional studies. The lower the validity, the more the "noise in the signal", so the more subjects you need to detect the signal. If the validity correlation of the dependent variable is v (Pearson, intraclass, or kappa), the number of subjects increases to N/v². To detect a correlation between variables with validities v and w, the number is N/(v²w²). Sample sizes may therefore have to be doubled or quadrupled when effects are represented by psychometric or other variables that have modest (~0.7) validity.

Sample Size for Longitudinal Studies

In our first example on this page, we had a cross-sectional design in which we were interested in the difference in height between people in two regions. Now, in a longitudinal design, we might want to know whether a stretching exercise makes people taller. Can you see that the same concept of minimum effect size still holds here? If we thought one inch was the smallest difference worth detecting between groups, then it has to be the smallest difference we would like to see as a result of our stretching exercise. (It might need a medieval rack to make people a whole inch taller!) Once again we don't have a choice about that minimum effect: it's still an effect size of 0.2 standard deviations, and the standard deviation is still the usual standard deviation of the subjects. At the moment we have only one group of subjects, and the standard deviation before we put people on the rack is usually about the same as after the rack. So you can think about the minimum effect size as a fraction of either standard deviation. But note well: do not use the standard deviation of the before-after difference score. Reliability of the dependent variable is the final piece of the jigsaw. The higher the reliability, the more reproducible are the values for each subject when you retest them, which makes it more likely you will detect a change in their values. So the higher the reliability, the fewer subjects you need to detect the minimum effect. Read the earlier section on sample size for an experiment for an overview of the role of typical error in sample-size estimation, and for an important detail about the conditions in a reliability study aimed at estimating sample size. The rest of this section contains details of formulae that you may not need to worry about. You can use two forms of reliability in the formulae: retest correlation and within-subject variation.

Using the Retest Correlation

First, a couple of cautions. The retest correlation is for retests with the same time between the tests as you intend to have in your experiment. For example, if you are doing an intervention that lasts 2 months, you need a 2-month retest correlation. Don't use a 1-day retest correlation unless you have good grounds for believing that it will be the same as a 2-month retest correlation. Also, the spread between the subjects in your study has to be similar to the spread between the subjects in the reliability study. If the spread is different, the value of the retest correlation coefficient will be inappropriate. In that case you will need to calculate the appropriate value by combining the within-subject (s) and between-subject (S) standard deviations for your subjects using this formula:

retest correlation r = (S² − s²)/S²

Right, here's the strategy for working out the required sample size when you know the retest correlation. Work out the sample size of an equivalent cross-sectional study, N, as shown above. It's 800 in the traditional approach using statistical significance, or 400 using my new approach of adequate precision of estimation for trivial effects. Determine the reliability r of the outcome measure by consulting the literature or doing a separate study. For a simple design consisting of a single pre and post measurement on each subject, and no control group, the number of subjects is:

n = (1 − r)N/2

This formula applies also to simple crossover designs, in which subjects receive an experimental treatment and a control treatment. (One half get the experimental treatment first; the other half get the control treatment first.) If there is a control group, the total number of subjects required is:

n = 2(1 − r)N

Yes, you need four times the number of subjects when there is a control group, not twice the number. Hard to accept, I know. To take into account the validity of the outcome measure, multiply the above formulae by 1/v², where v is the concurrent validity correlation (the correlation between the observed value and the true value of the variable). The simplest estimate of the concurrent validity is the square root of the concurrent reliability correlation for the outcome measure, so you simply divide the above formulae by the concurrent reliability correlation. In general, the concurrent reliability will be greater than the retest reliability.

Using the Within-Subject Variation

You can also think about the difference between the post and pre means in terms of the within-subject variation (standard deviation). For example, if the performance of an individual athlete varies by 1% (the within-subject standard deviation expressed as a coefficient of variation), how many athletes should you test to detect a 1% change in performance, or a 2% change, or a 0.5% change? Here is the formula. To detect a fraction f of a within-subject standard deviation with 5% false alarms and 20% failed alarms:

n = 64/f² with a full control group
n = 16/f² for crossovers or experiments without a control group.

Another way to represent the same formulae is to replace f with d/s, where d is the smallest worthwhile post-pre difference you want to detect, and s is the within-subject standard deviation:

n = 64s²/d² with a full control group
n = 16s²/d² for crossovers or experiments without a control group.

Remember to halve these numbers when you justify sample size using the new approach based on acceptable precision of the outcome. Example: You want to detect (p=0.05, 80% power) a 2% change in performance when the coefficient of variation is 2%. The corresponding value of f is 1.0, which means you'd need to test 16 athletes in a crossover design, or 32 in each of a control and experimental group. Or it's 8 or 16+16, if you justify sample size using precision of estimation. What's the smallest value of f worth detecting? Is it 1.0? Not an easy question! To answer it, you usually have to bring in the between-subject variation one way or another. Why? Because you can't get away from the fact that the magnitude of a change in the value of a variable usually has to be thought about in terms of the variation in the values of that variable between subjects. That's what minimum worthwhile effect sizes are all about. For example, if the between-subject variation is 5%, the smallest difference worth detecting is 0.2 × 5%, or 1%. So, if your within-subject variation is 2%, you have to chase an f of 0.5. But if the between-subject variation is 10%, the smallest worthwhile effect is 0.2 × 10%, or 2%, so you chase an f of 1.0. Once you bring the between-subject variation back into the picture, you have all the ingredients for expressing the reliability as a retest correlation, so you can use the formulae with the retest correlation. For example, a within of 2% and a between of 5% implies a retest correlation of (5² − 2²)/5², or (25 − 4)/25, or 0.84. A within of 2% and a between of 10% implies a correlation of (100 − 4)/100, or 0.96. Use these correlations in the formulae for sample size and you'll get the same answers as in the formulae using f. But if you have a reasonable notion of the smallest worthwhile change in a variable without explicitly knowing the between-subject standard deviation or the correlation, use the formula with d and s (or f). There is certainly one situation where it's better to use the within-subject variation: estimation of sample size in studies of athletic performance. When athletes are subjects and competitive performance is the outcome, the smallest worthwhile effect is an enhancement that increases the medal prospects of a top athlete, not the average athlete. For sports like track and field, this minimum effect is about 0.5 of the typical variation in a top athlete's performance between events. For example, if the typical variation between events is 1.0%, then you're interested in enhancements of about 0.5%. So if you use a lab test with the same typical error as the competitive event, f in the above formulae is simply 0.5, so you would need 64/0.5², or 256 subjects for a fully controlled study. That's bad enough, but if your lab test has a typical variation of 2.0%, f is 0.5/2.0, which means 1024 subjects! Oh no!
Clearly you need very reliable lab tests if you want to detect the smallest effects that matter to top athletes. See this Sportscience article for more information: Hopkins WG, Hawley JA, Burke LM (1999). Researching worthwhile performance enhancements. Sportscience 3, /jour/9901/wghnews.html

Sample Size for Complex Cross-Sectional Studies

I'll deal with two groups of unequal size, more than two groups, and more than one independent variable. Anything else requires simulation.

Two Groups of Unequal Size

Up to this point I have assumed equal numbers in each group, because that gives the most power to detect a difference between the groups. But sometimes unequal numbers are justified. The simplest case is where you have far more in one group than another. For example, you already have the heights for thousands of control subjects from all over the country, and you want to compare these with the heights of people from a particular region you are interested in. So, how many subjects do you need in that particular group? And the answer is... as few as one-quarter the usual number! But you will need to test, or have the data for, an "infinite" number of subjects in the other group for the number to be that low. How big is infinite? For the purposes of statistical power, about 5 times as many as in the special-interest group is close enough. I have a formula, but to understand how to apply it will need a lot of thought. If you have samples of size n1 and n2, then your study will have the power equivalent to a study with a sample size of N equally divided between two groups, where:

N = 4n1n2/(n1 + n2)

For example, if you have data for 1000 controls (= n1), and 800 (= N) is the number you would normally require for equal-sized groups, then the above formula shows that you need to test only 250 cases (= n2). If you make n1 very large, the formula simplifies to N = 4n2, or n2 = N/4, which is one-quarter the usual total number.

More Than Two Groups

Suppose we wanted to compare the heights of people in more than two regions. What should we do about the sample size? Do we need more than 400 in each region, less than 400, or just 400? And the answer is... it depends on what estimates or contrasts you want to perform. If you are interested in comparing one particular region with another particular region, you will still need 400 in each of those regions to keep the same power to detect a difference. The fact that you have all those other regions in the analysis matters not a jot, I'm afraid. They don't increase the power of the design unless the number in each region is about 10 or less, which it never should be! If you are interested in comparing one particular region with the mean of every other, you've got the usual two-group design, but with 400 subjects in the region of interest and 400 divided up equally into the other regions. If you want to do every possible comparison between pairs of regions, or between pairs of groups of regions, things start to get complicated. As far as I can see, with six regions, say, only five completely independent comparisons are possible. So if you are concerned about inflation of the Type I error, you will need to apply Bonferroni's correction by reducing the p value to 0.05/5, or 0.01. Alas, a smaller p value means a bigger sample size. It's difficult to work out exactly what it should go up to, because somehow or other the inflated Type II error should also be taken into account. Certainly, nearly doubling the group size from the usual 400 would be a good start in this example, because as we've already seen on this page, that would be equivalent to a p value of 0.01 and a Type II error of 10%, instead of the usual 0.05 and 20%.

More Than One Independent Variable

Suppose you intend to measure half a dozen things like age, sex, body fat, whatever, and you want to know the effect of each of them on severity of injury in a particular sport. How many subjects do you need? Before we get clever with complex models for this question, let's take in the big view. If we treat each variable as a separate issue, it should be obvious that there will be a problem with inflation of the Type I error: none of the variables you've measured might predict severity of injury in the population, but if you have enough variables, there's a good chance one will predict injury in your sample. So you'll need to reduce your p value using Bonferroni's 0.05/n, where n is the number of independent variables. This correction will be too severe if the independent variables are correlated, but I don't know how to adjust for that. When you analyze the data, you should look at the effect of the independent variables separately to start with, but you will also end up using multiple linear regression, analysis of covariance, or some other complex model, with all the independent variables on the right-hand side of the model. As I explained on the first page devoted to complex models, you are now asking a question about how much each variable contributes to the severity of injury in the presence of (when you control for) the others. How many subjects do you need to answer this question? Theoretically the extra independent variables shouldn't make much difference, but I've checked by simulation to make sure. You need one extra subject for each extra independent variable. With five extra variables, that makes five extra subjects. Forget it. With a thousand or so subjects, five won't make any difference. Here's a different problem involving more than one independent variable, where you don't have to worry about increasing the sample size to reduce the Type I error. Suppose you are currently predicting competitive performance from four lab and field tests, and you want to know whether it's worth adding an expensive fifth test to the test battery. For this sort of problem, you would model the data by doing a multiple linear regression, with the expensive test as the last independent variable in the model. So, how many subjects? It's a specific extra variable in this case, so there is no inflation of the Type I error, so the sample size is still about 800. But if all the field tests were in there on an equal footing, and you wanted to know which ones to drop out of the test battery, then it's back to the bigger sample size of the previous example. In this case you'd use stepwise regression with a reduced p value for entry of variables into the model.
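The rules of thumb above reduce to a few lines of code. A minimal sketch (the function names are mine; the constants 32 and 60 and the longitudinal adjustments are the ones given in the text):

```python
from math import ceil

def n_cross_sectional(es, strict=False):
    """Total N for a cross-sectional design: 32/ES^2 at 5%/20% error
    rates, or 60/ES^2 at the stricter 1%/10% rates."""
    return ceil(round((60 if strict else 32) / es ** 2, 6))

def n_longitudinal(N, r, control_group=True):
    """Subjects for a longitudinal design given retest correlation r and
    the equivalent cross-sectional N: 2(1-r)N with a control group,
    (1-r)N/2 for a crossover / no-control design."""
    n = 2 * (1 - r) * N if control_group else (1 - r) * N / 2
    return ceil(round(n, 6))

N = n_cross_sectional(0.2)   # the canonical 800
print(N, n_longitudinal(N, 0.96), n_longitudinal(N, 0.96, False))
```

With ES = 0.2 this reproduces the canonical 800, and with a retest correlation of 0.96 (within 2%, between 10%) it gives 16 subjects for the crossover and 64 for the fully controlled design, matching the worked example in the text. The `round(..., 6)` guards against floating-point crumbs tipping `ceil` up by one.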
护理中的量性研究可以分为3种类型:①描述性研究:如横断面调查,目的是描述疾病的分布情况或现况调查;②分析性研究:其目的是分析比较发病的相关因素或影响因素;③实验性研究:即队列研究或干预实验。
研究的类型不同,则样本量也有所不同。
2.1 描述性研究
护理研究中的描述性研究多为横断面研究,横断面研究的抽样方法主要包括单纯随机抽样、系统抽样、分层抽样和整群抽样。
分层抽样的样本量大小取决于作者选用的对象是用均数还是率进行抽样调查。
例.要做一项有关北京城区护士参与继续教育的学习动机和学习障碍的现状调查,采用分层多级抽样,选用的是均数抽样的公式,Uα为检验水准α对应的U值,σ为总体标准差,δ为容许误差。根据预实验得出标准差σ=1.09,取α=0.05,δ=0.1,样本量算得520例,考虑到10%-15%的失访率和抽样误差,样本扩展到690例。
2.2 分析性研究
2.2.1 探索有关变量的影响因素
有关变量影响因素研究的样本量大多是根据统计学变量分析的要求确定,样本数至少是变量数的5-10倍。
样本量的计算
在市场研究中,常常有客户和研究者询问:“要掌握市场总体情况,到底需要多少样本量?”或者“我要求调查精度达到95%,需要多少样本量?”
对此往往难以直接回答,因为要解决这个问题,需要考虑的因素是多方面的:研究的对象、研究的主要目的、抽样方法、调查经费等。
有人说,北京这么大,上千万人口,我们怎么也得做一万人的访问才能代表北京市吧。
根据统计学原理,完全不必。
只要在500-1000左右就够了。
当然前提是,我们要按照科学的方法去抽样。
根据市场调查的经验,市场潜力等涉及量比较严格的调查所需样本量较大,而产品测试,产品定价,广告效果等人们间彼此差异不是特别大或对量的要求不严格的调查所需样本量较小些。
样本量的大小涉及到调研中所要包括的人数或单元数。
确定样本量的大小是比较复杂的问题,既要有定性的考虑也要有定量的考虑。
从定性的方面考虑样本量的大小,其考虑因素有:决策的重要性,调研的性质,变量个数,数据分析的性质,同类研究中所用的样本量,发生率,完成率,资源限制等。
具体地说,更重要的决策,需要更多的信息和更准确的信息,这就需要较大的样本;探索性研究,样本量一般较小,而结论性研究如描述性的调查,就需要较大的样本;收集有关许多变量的数据,样本量就要大一些,以减少抽样误差的累积效应;如果需要采用多元统计方法对数据进行复杂的高级分析,样本量就应当较大;如果需要特别详细的分析,如做许多分类等,也需要大样本。
针对子样本分析比只限于对总样本分析,所需样本量要大得多。
具体确定样本量还有相应的统计学公式,根据样本量计算公式,我们知道,样本量的大小不取决于总体的多少,而取决于(1) 研究对象的变动程度;(2) 所要求或允许的误差大小;(3) 要求推断的置信程度。
也就是说,当所研究的现象越复杂,差异越大时,样本量要求越大;当要求的精度越高,可推断性要求越高时,样本量越大。
因此,如果不同城市分别进行推断时,"大城市多抽,小城市少抽"这种说法原则上是不对的。
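上文“样本量的大小不取决于总体的多少”的结论,可以用估计总体比例的样本量公式 n=z²p(1-p)/e² 粗略验证。下面是一个示意草稿(其中 p=0.5、容许误差 5% 与 3% 均为演示用的假设值):

```python
# 估计总体比例所需样本量的示意计算:n = z^2 * p * (1-p) / e^2
# 注意公式中不含总体规模 N——这正是"大城市多抽、小城市少抽"不成立的原因
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p: float, e: float, alpha: float = 0.05) -> int:
    """p: 预期比例(最保守取0.5); e: 容许误差; alpha: 显著性水平(双侧)。"""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # alpha=0.05 时 z≈1.96
    return ceil(z ** 2 * p * (1 - p) / e ** 2)

print(sample_size_proportion(0.5, 0.05))  # → 385
print(sample_size_proportion(0.5, 0.03))  # → 1068
```

可见在最保守的 p=0.5 下,容许误差取 3%-5% 时样本量落在几百到一千左右,与正文“500-1000左右就够了”的经验值一致。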
例如,如果研究肺结核患者生存质量及影响因素,首先要考虑影响因素有几个,然后通过文献回顾,可知约有12个预测影响变量,如年龄、性别、婚姻、文化程度、家庭月收入、医疗付费方式、病程、排菌、咯血、结核中毒症状、心理健康、社会支持,那么样本量就可以定在60-120例。
这是一种较为简便的估算样本量的方法。在获得相关文献支持下,最好根据公式计算:计量资料的样本量估算可用均数抽样公式 n=(Uα×S/δ)²,根据预实验中的数据(也可以依据其他文献的结果)得出标准差S和容许误差δ,代入公式最终计算出样本量;计数资料可用率的公式 n=(Uα/δ)²×P(1-P),P为样本率。
2.2.2 研究某变量对另一变量的影响
对于研究某变量对另一变量的影响来说,样本量可以根据直线相关的公式 n=[(μα+μβ)/C]² 获得,其中 C=0.5×ln[(1+ρ)/(1-ρ)],μα与μβ分别为检验水准α和第Ⅱ类错误的概率β相对应的U值,ρ为总体相关系数。
例.要做一项血透患者自我管理水平对其健康状况影响的研究,若取α=0.05(双侧)、power=0.90,查表得μα=1.96,μβ=1.282;总体相关系数可选用文献报道中血液透析患者自我管理水平与健康调查简表得分的相关系数(约0.31),代入公式就可算出所需样本量为103例。
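上述基于相关系数的样本量公式可以写成如下草稿(其中 α=0.05 双侧、power=0.90、ρ=0.31 为笔者代入的假设值,仅用于演示量级):

```python
# 基于 Fisher z 变换的相关系数样本量估算(示意)
# n = [(U_alpha + U_beta) / C]^2, 其中 C = 0.5*ln[(1+rho)/(1-rho)]
from math import ceil, log
from statistics import NormalDist

def sample_size_correlation(rho: float, alpha: float = 0.05, power: float = 0.90) -> int:
    nd = NormalDist()
    u_alpha = nd.inv_cdf(1 - alpha / 2)   # 双侧检验的 U 值
    u_beta = nd.inv_cdf(power)            # 把握度对应的 U 值
    c = 0.5 * log((1 + rho) / (1 - rho))  # Fisher z 变换
    return ceil(((u_alpha + u_beta) / c) ** 2)

print(sample_size_correlation(0.31))  # → 103
```

相关系数越小,检测它所需的样本量越大,这与正文“有效率(效应)越大、样本量越小”的一般规律一致。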
2.2.3 两变量或多变量的相关研究
对于两变量或多变量相关的研究,样本量与自变量的多少有关,一般是其10倍,也可以采用公式计算:Uα为检验水准α相对应的U值,S为标准差,δ为容许误差。
例.研究慢性腰背痛患者认知-情感应对、自我和谐与适应水平的关系,设定显著性水平α并查得相应的Uα,标准差和容许误差可从预实验中获得,根据预实验的S和δ值算出S/δ=5,样本量则为99例。
2.3 实验性研究
实验性研究样本量的估算公式,也分计量资料和计数资料两种。
计量资料可采用两样本均数比较的公式 N1=N2=2[(tα/2+tβ)S/δ]²,计数资料可采用率的计算公式。式中N1、N2分别为两样本含量,一般要求相等;S为两总体标准差的估计值,一般假设其相等或取合并方差的平方根;δ为两均数之差值(若为自身对照,δ也可以写为d);tα/2和tβ分别为检验水准α和第Ⅱ类错误概率β相对应的t值。α有单双侧之分,而β只取单侧。
例.一项心肌梗死患者院外自助式心脏康复的效果研究,可以采用此公式计算:其中的d可以选取文献报道的自助式康复手册随机对照研究中,干预组和对照组在普通健康问卷GHQ得分之差,并据此估计合并标准差Sc,设定双侧α和β后查表得相应的tα/2和tβ,代入公式得出两组各需样本56例。
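上面的两样本均数公式在大样本下常用正态近似(以U值代替t值)来计算,下面是一个示意草稿(S=1.0、δ=0.5、α=0.05、β=0.20 均为演示用的假设值):

```python
# 两组均数比较的每组样本量(正态近似,示意):
# N1 = N2 = 2 * [(U_{alpha/2} + U_beta) * S / delta]^2
from math import ceil
from statistics import NormalDist

def sample_size_two_means(s: float, delta: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    nd = NormalDist()
    u_a = nd.inv_cdf(1 - alpha / 2)  # 双侧 alpha 的 U 值
    u_b = nd.inv_cdf(1 - beta)       # beta 只取单侧
    return ceil(2 * ((u_a + u_b) * s / delta) ** 2)

print(sample_size_two_means(1.0, 0.5))  # → 63(每组)
```

严格按公式用t值时需以自由度迭代求解,结果会略大于正态近似值。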
附:临床研究样本量的估计
1.计量资料
对总体平均数m做估计调查的样本估计公式:n=(Uα×s/δ)²。式中:n为所需样本大小;Uα为双侧检验中检验水准α对应的U界值,当α=0.05时,Uα=1.96;当α=0.01时,Uα=2.58;s为总体标准差;δ为容许的误差。
例1:某学校有学生3500人,用单纯随机抽样调查学生的白细胞水平,根据预查标准差为950个/mm³,允许误差不超过100个/mm³,应调查多少人?
N=3500,δ=100个/mm³,s=950个/mm³,α=0.05(双侧),Uα=1.96,n=(1.96×950/100)²≈347。
对样本均数与总体均数的差别做显著性检验时,所需样本的估计:
单侧检验用:n=[(Uα+Uβ)s/δ]²;双侧检验用:n=[(Uα/2+Uβ)s/δ]²。式中:α与β分别为第一类错误及第二类错误出现的概率,Uα、Uα/2、Uβ分别为相应检验水准的U值。
2.计数资料
对总体率π做估计调查的样本大小公式:n=(Uα/δ)²×P(1-P)。式中:δ为容许的误差,即允许样本率(p)和总体率(P)的最大容许误差;P为样本率。
例2:对某地HBsAg阳性率进行调查,希望所得的样本率(p)和总体率(P)之差不超过2%,基于小规模预调查样本率P=14%,应调查多少人?(规定α=0.05)
已知:δ=0.02,P=0.14,α=0.05,Uα=1.96,n=(1.96/0.02)²×0.14×(1-0.14)≈1156,需调查约1160人。
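附录中例1、例2的算式可以按下列草稿复核(s=950、δ=100、P=0.14、δ=0.02 取自文中,Uα=1.96 为 α=0.05 时的双侧界值):

```python
# 例1:估计均数 n = (U_alpha * s / delta)^2
# 例2:估计率   n = (U_alpha / delta)^2 * P * (1 - P)
from math import ceil

U_ALPHA = 1.96  # alpha = 0.05,双侧

n_mean = ceil((U_ALPHA * 950 / 100) ** 2)            # 例1:s=950, delta=100
n_rate = round((U_ALPHA / 0.02) ** 2 * 0.14 * 0.86)  # 例2:delta=0.02, P=0.14

print(n_mean)  # → 347
print(n_rate)  # → 1156
```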
对样本率与总体率的差别做显著性检验时,所需样本的估计:
单侧检验用:n=(Uα+Uβ)²×P(1-P)/δ²;双侧检验用:n=(Uα/2+Uβ)²×P(1-P)/δ²。式中:α与β分别为第一类错误及第二类错误出现的概率,Uα、Uα/2、Uβ分别为相应检验水准的U值。
3.病例对照研究的样本量估计
选择患有特定疾病的人群作为病例组,未患这种疾病的人群作为对照组,调查两组人群过去暴露于某种(些)可能危险因素的比例,判断暴露危险因素是否与疾病有关联及其关联程度大小,是一种观察性研究。
设置估算样本量的相关值:①人群中研究因素的暴露率(对照组在目标人群中估计的暴露率);②比值比(odds ratio,OR),即估计出的各研究因素的相对危险度或暴露的比值比(RR或OR);③α值,检验的显著性水平,通常取α=0.05或0.01;④期望的把握度(1-β),通常取β=0.10或0.20,即把握度为90%或80%。
根据以上有关参数查表或代入公式计算,公式为:n=2p̄q̄(Uα+Uβ)²/(p1-p0)²,其中p1=OR×p0/[1+(OR-1)×p0],p̄=(p1+p0)/2为两组暴露率的平均值,q̄=1-p̄;p0与p1分别为对照组及病例组人群估计的暴露率,q0=1-p0,q1=1-p1;OR为主要暴露因子的相对危险度或暴露的比值比(RR或OR)。
例:拟用病例对照研究法调查孕妇暴露于某因子与婴儿先天性心脏病的关系。估计孕妇有30%暴露于此因子。现要求在暴露造成相对危险度为2时,即能在95%的显著性水平以90%的把握度查出,病例组和对照组各需多少例?
p0=0.30,OR=2,设α=0.05(双侧),β=0.10,查表得Uα=1.96,Uβ=1.282;p1=(0.30×2)/[1+(2-1)×0.30]≈0.46,p̄=(0.30+0.46)/2=0.38,q̄=0.62,代入公式得n≈192,即病例组与对照组各需约192人。
4.实验研究的样本量计算
计量资料:计量资料指身高、体重、血压、血脂和胆固醇等数值变量。
估计公式为:n=2δ²(Uα+Uβ)²/d²。n为计算所得一个组的样本人数,如果两组的人数相等,则全部试验所需的样本大小为2n;Uα为显著性水平相应的标准正态差;Uβ为β相应的标准正态差;δ为估计的标准差,δ²=(δ1²+δ2²)/2;d为两组数值变量均值之差。
例题:某新药治疗高血压,将研究对象随机分为治疗组和对照组:设定α与β,估计两组血压的标准差与两组血压差值d,查表得相应的Zα与Zβ(双侧检验),代入公式即可算得每组所需样本量。
计数资料:即非连续变量资料,如发病率、感染率、阳性率、死亡率、病死率、治愈率、有效率等。当现场试验的评价指标是非连续变量时,按下式计算样本大小:n=2P̄(1-P̄)(Uα+Uβ)²/(P1-P2)²。式中:P1为对照组发生率,P2为实验组发生率,P̄=(P1+P2)/2。
5.诊断试验的样本量估计
设置估算样本量的相关值:①灵敏度60%;②特异度60%;③α值,检验的显著性水平,通常取α=0.05或0.01;④期望的把握度(1-β),通常取β=0.10或0.20,即把握度为90%或80%。
计算公式:n=(Uα/δ)²×P(1-P)。式中:Uα为显著性水平相应的U值,通常取α=0.05或0.01;δ为容许的误差,即允许样本率(p)和总体率(P)的最大容许误差;P为诊断试验的灵敏度或特异度。
例:预计所评价的诊断试验的灵敏度为90%,特异度为85%,δ=0.025,规定α=0.05,病例组和对照组应调查多少人?
已知:δ=0.025,α=0.05,Uα=1.96。
对照组(按特异度):n=(1.96/0.025)²×0.85×(1-0.85)≈783;病例组(按灵敏度):n=(1.96/0.025)²×0.90×(1-0.90)≈553。
即对照组需783人,病例组需553人。