This is a broader explanation which includes components that address the question.
First consider the reasons for choosing a particular type of epidemiological study.
These are the broad types of epidemiological studies:
To help with making a selection, this diagram summarises the basis of each type of study:
Reference URL: courses.lumenlearning.com
A prospective cohort study or, better yet, a randomised controlled trial might provide the strongest evidence for the hypothesis, but waiting 10-20 years for results may not be appropriate.
A cross sectional observational study is simply a snapshot in the present of everyone in the study, so there’ll be a range of values for each variable (i.e. a range of different ages, levels of red meat consumption and absent/weak/strong family history of bowel cancer). Multiple regression analysis is used to determine how each independent variable (age, family history of bowel cancer, red meat consumption, alcohol consumption) correlates with the dependent variable (bowel cancer). Regression analyses are also used to examine shared variance between the independent variables (for example, a lot of people drink red wine when they eat red meat, so the shared variance of these two variables is likely to be large) to determine how much each factor correlates with the outcome independently of the other factors.
Cross sectional studies only show correlation, not causation. For example, both red meat and red wine might correlate with incidence of bowel cancer, but we don’t know if both contribute to causation or if one factor is the main causative variable and the other is just very highly correlated with it. Hence, whilst cross sectional studies are the quickest and easiest studies to perform and they can suggest factors that are linked to the disease of interest, they don’t provide any evidence of causation.
A retrospective case-control study is somewhat of a compromise between the above study designs, and is the study design I’d choose to test the given hypothesis.
In a case-control study, patients who have the disease (bowel cancer) are selected as cases, and their past exposure to the suspected causative factor of interest (alcohol) is compared with that of individuals who do not have the disease, who are selected as controls. Where possible, cases are matched to controls for possible confounders (for example, matching each case with one or more controls of the same age, and selecting case and control groups with approximately the same proportion of vegetarians, pescatarians and red meat eaters; a questionnaire could be used to estimate how much and how often red meat is consumed in a week). The ratio of controls to cases is often based on the expected odds ratio of the condition in cases vs controls. Where it is impossible to eliminate confounding by matching cases and controls (for example, if family history is a strong factor, the proportion of cases with a positive family history may be larger than can be matched in a control group), the impact of confounding factors can be estimated by logistic regression and taken into account when estimating the residual effect due to alcohol.
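As a rough illustration of how the case-control comparison turns into numbers, the sketch below computes a crude odds ratio with an approximate 95% confidence interval (Woolf's log method), and then a Mantel-Haenszel odds ratio pooled across strata of a confounder. Mantel-Haenszel stratification is used here as a simpler alternative to the logistic regression mentioned above, not as a replacement for it. All counts are invented for illustration, and the function names are my own.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d):
    """a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval on the log-odds scale (Woolf)."""
    log_or = log(odds_ratio(a, b, c, d))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or - z * se), exp(log_or + z * se)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts: exposure = alcohol, strata = red meat intake.
strata = [
    (30, 10, 15, 5),   # high red meat: drinkers/non-drinkers in cases, controls
    (5, 20, 10, 40),   # low red meat
]
# Collapse the strata into one 2x2 table for the crude (unadjusted) estimate.
crude_table = tuple(sum(column) for column in zip(*strata))

print("crude OR:", odds_ratio(*crude_table))
print("approx. 95% CI:", odds_ratio_ci(*crude_table))
print("Mantel-Haenszel OR:", mantel_haenszel_or(strata))
```

With these invented counts the crude odds ratio for alcohol is 2.1, but within each red meat stratum it is 1.0, so the pooled estimate is also 1.0. That is the pattern you would expect if red meat, not alcohol, were driving the association.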
In the study design, estimation of the sample size required to reach the desired statistical power depends on:
(1) Expected odds ratio (of the disease) between exposed and non-exposed groups: the required sample size falls as the odds ratio moves further from 1. If alcohol consumption is strongly associated with bowel cancer, fewer subjects are required in both the case and control groups to demonstrate the link.
(2) The probability of exposure in cases and in controls - if alcohol consumption were generally uncommon, then larger sample sizes would be required in both groups.
(3) Statistical power - increasing the sample size increases the probability of finding an effect if it is real. Studies are commonly designed to achieve statistical power of 0.80 - 0.95.
(4) Alpha: the probability of a false positive (detecting an effect that is not real), usually set at 0.05.
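The four inputs above can be combined in a short calculation. This is a minimal sketch, assuming an unmatched case-control design, a two-sided test and the standard normal-approximation formula for comparing two proportions without continuity correction. The function name and parameters are my own, and online calculators may use slightly different formulas, so expect small differences in the result.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(odds_ratio, p_exposed_controls, alpha=0.05, power=0.80,
                 controls_per_case=1.0):
    """Approximate number of cases for an unmatched case-control study.

    The number of controls is controls_per_case times the returned value.
    """
    p0 = p_exposed_controls
    # Exposure probability among cases implied by the expected odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    r = controls_per_case
    p_bar = (p1 + r * p0) / (1 + r)  # pooled exposure probability
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p0 * (1 - p0) / r)) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Hypothetical inputs: 30% of controls drink alcohol, expected odds ratio of 2.
print("cases needed (OR 2, power 0.80):", cases_needed(2.0, 0.30))
print("cases needed (OR 3, power 0.80):", cases_needed(3.0, 0.30))
print("cases needed (OR 2, power 0.95):", cases_needed(2.0, 0.30, power=0.95))
print("cases needed (OR 2, rare exposure):", cases_needed(2.0, 0.05))
```

Running it with different inputs shows the behaviour described in the list: a larger odds ratio reduces the required sample size, while higher power or a rarer exposure increases it.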
There are a number of free sample size calculators; JavaStat is quite easy to use and popular for a range of statistical analyses (https://statpages.info/proppowr.html).