Any research group will sooner or later want to base its conclusions on some kind of statistical analysis. When planning such experiments or studies, an overarching question is how to design the study so that a proper statistical analysis is possible.

Without a sufficient sample size, comparisons between groups may be inconclusive. With too many samples, biologically insignificant differences may show up as statistically significant.

 Before starting any calculations on sample size, the researcher has to:

  1. Formulate a precise research question to be addressed
  2. Describe the aims of the study
  3. Decide which study type to go for (Observational/ epidemiological study or controlled/ experimental study)
  4. State the present level of knowledge in the area
  5. Define variables to be studied - OUTCOME variable(s) and explanatory/ experimental variables
  6. Look into available resources, possibly reducing the number of questions to be addressed in the study

 

Sample size and power calculations, technical aspects 

Normally, sample size calculations are undertaken after steps 1-6 in the preceding list, and are only one part of the overall process of designing a study and writing a protocol. Only after carefully considering the research question and deciding upon the type of study can the formal calculations be done. The design may be anything from a comparison of two samples from a population, through a case-control study, to complex study designs.

 The theory behind sample size calculations is presented in almost any textbook in statistics or epidemiology. The EpiCentre recommends:

  • Altman. Practical Statistics for Medical Research for a general introduction to the topic
  • A simpler alternative: Petrie/ Watson: Statistics for Veterinary and Animal Science
  • Dohoo et al. Veterinary Epidemiologic Research for epidemiological studies
  • Web resources such as Wikipedia and a wide range of others

A standard part of sample size calculations is to perform a power analysis, that is, to find how likely you are to detect a difference of a given size with your sample size. Please also notice that power analyses often need to be performed after the study as well, as a part of the discussion of results.
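As a rough illustration of what a power analysis computes, the following Python sketch estimates the power of a two-sided comparison of two group means using the normal approximation. The function name `approx_power` and its parameters are hypothetical and chosen for illustration only. Equal group sizes and a common standard deviation are assumed, and exact methods based on the t distribution give slightly lower power for small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    effect_size_sd: expected difference divided by the common SD (assumed known).
    n_per_group: number of samples in each group (equal group sizes assumed).
    alpha: two-sided significance level.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the standardised test statistic under the alternative.
    shift = abs(effect_size_sd) * sqrt(n_per_group / 2)
    # Probability of rejecting the null hypothesis in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Power to detect a difference of one SD with different group sizes.
    for n in (10, 22, 50):
        print(n, round(approx_power(1.0, n), 3))
```

When the effect size is zero the function returns the significance level itself, which is a convenient sanity check on the approximation.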

Sample size calculations require that you know some basics about the expected difference, the variance and so on. In many situations these are not known, and the sample size will have to be based on explicit assumptions. Please remember that statistical interpretation is related to your level of knowledge before the study: the more precise the question, the more precise the design, the easier the statistical analyses and the more easily interpretable the results.

Examples and illustrations

Comparing two groups on a continuous variable is a classic situation. The graph (from Isogenic) shows the sample size as a function of effect size in standard deviations, assuming 90% power, a 5% significance level and a two-sided test. Other combinations of significance level and power give other numbers. As can be seen, sample size depends mainly on the difference (effect size) to be detected: the smaller the difference relative to the SD, the more samples you need. This method is often referred to as the power method.
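The relationship between effect size and sample size can be sketched in a few lines of Python. This is a minimal sketch using the common normal approximation, not the calculation behind the Isogenic graph. The function name `n_per_group` is hypothetical, the defaults mirror the 90% power and 5% two-sided significance level mentioned above, and equal group sizes are assumed. Exact t-based calculations typically add one or two samples per group.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_sd, power=0.90, alpha=0.05):
    """Approximate samples needed in each of two groups (normal approximation).

    effect_size_sd: difference to detect divided by the common SD.
    power: desired probability of detecting that difference.
    alpha: two-sided significance level.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size_sd ** 2
    return ceil(n)  # round up to a whole number of samples


if __name__ == "__main__":
    # Halving the effect size roughly quadruples the required sample size.
    for d in (2.0, 1.0, 0.5):
        print(d, n_per_group(d))
```

Because the effect size enters the formula squared, halving the difference to detect roughly quadruples the number of samples needed per group.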

[Figure: sample size as a function of effect size in standard deviations (graph from Isogenic)]

The resource equation method is an elegant approach for experiments that can be analysed using the analysis of variance, such as exploratory experiments, complex biological experiments with several factors and treatments, and any experiment where the power analysis method is not possible or practicable. This is well explained in Isogenic and boils down to a very simple equation: E = N - B - T, where E is the error df and should be between 10 and 20, N is the total df, B is the blocks df, and T is the treatments df. In a non-blocked design the equation reduces to E = N - T, which should likewise be between 10 and 20. Put simply: the total number of animals minus the number of treatments should be between ten and twenty.
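The resource equation is simple enough to check by hand, but a small Python sketch makes the bookkeeping of the degrees of freedom explicit. The function names are hypothetical. The sketch assumes the usual analysis of variance conventions: total df is the number of animals minus one, treatments df is the number of treatment groups minus one, and blocks df is the number of blocks minus one.

```python
def error_df(n_animals, n_treatments, n_blocks=1):
    """Error degrees of freedom E = N - B - T.

    Assumes N = n_animals - 1, B = n_blocks - 1 and T = n_treatments - 1.
    With n_blocks=1 (no blocking) this reduces to n_animals - n_treatments.
    """
    total_df = n_animals - 1
    block_df = n_blocks - 1
    treatment_df = n_treatments - 1
    return total_df - block_df - treatment_df


def is_adequate(e, low=10, high=20):
    """True if the error df falls within the suggested range of 10 to 20."""
    return low <= e <= high


if __name__ == "__main__":
    # Four treatment groups of six animals each, without and with three blocks.
    for blocks in (1, 3):
        e = error_df(n_animals=24, n_treatments=4, n_blocks=blocks)
        print(blocks, e, is_adequate(e))
```

In the non-blocked case the result equals the number of animals minus the number of treatments, in line with the rule of thumb stated above.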

Comparing two proportions is another classic example. Here the easiest approach is to study the tables below, where the expected proportion in each group is the key to reading the table. Comparing groups with expected proportions of 0.20 and 0.10 gives n=219 in each group.
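For proportions not covered by a table, the calculation can be sketched in Python. The sketch below uses the normal approximation with a Fleiss-type continuity correction and assumes 80% power and a 5% two-sided significance level. These settings are assumptions and are not stated in the text above. Under them the sketch gives 219 per group for 0.20 versus 0.10, which suggests, but does not confirm, that the quoted figure rests on similar assumptions. The function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_proportions(p1, p2, power=0.80, alpha=0.05, continuity=True):
    """Approximate samples per group for comparing two proportions.

    p1, p2: expected proportions in the two groups (must differ).
    power, alpha: assumed defaults of 80% power and 5% two-sided significance.
    continuity: apply a Fleiss-type continuity correction if True.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2   # pooled proportion under the null hypothesis
    diff = abs(p1 - p2)
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # The correction inflates n, most noticeably for small differences.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)


if __name__ == "__main__":
    print(n_per_group_proportions(0.20, 0.10))
    print(n_per_group_proportions(0.20, 0.10, continuity=False))
```

Different packages use slightly different formulas and corrections, so results a few samples apart from a published table are to be expected.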

 

Software

Technical sample size and power calculations may be carried out on most software platforms, and also using a wide range of web-based resources. We recommend that the research team is familiar with one or more of the following:

  • Statistical packages such as JMP (general purpose) and Stata (epidemiological studies), with SAS as an alternative to Stata
  • Web-based resources such as AusVet and OpenEpi
  • For experimental animals, the Isogenic page is an excellent option

 

The following list illustrates the software you may need, with red colour indicating recommended software/ platform:

Research question                            Stata   JMP   AusVet   OpenEpi
Two group comparisons (means/proportions)    +       ++    +        ++
Multiple group comparison                    -       ++    -        -
Simple precision, power                      +       ++    +        ++
Randomized trials                            -       ++    -        +++
Cross-sectional/cohort studies               -       ++    -        ++
Case-control studies                         -       -     -        +++
Survival studies                             ++      -     -        -
Survey/disease surveillance                  -       -     +++      -
Freedom of disease                           -       -     +++      -


Examples of sample size and power calculations on the different recommended platforms are found in the enclosed pdf file.

For more information about sample size and power analyses, consult the staff at the EpiCentre.