Using statistics in research
Statistical methods provide a formal way of accounting for sources of variability in patients’ responses to treatment.
The use of statistics allows the clinical researcher to form reasonable and accurate inferences from collected information, and make sound decisions in the presence of uncertainty.
Statistics are key to preventing errors and biases in medical research.
A hypothesis is an assumption, or set of assumptions, that either:
- asserts something on a provisional basis with a view to guiding scientific investigation; or
- confirms something as highly probable in light of established facts.
Suppose we have a hypothesis that asserts something, for example, that a new treatment for a disease is better than the existing standard of care. If the new treatment is ‘B’ and the standard of care treatment is ‘A’, then the hypothesis states that ‘B’ is better than ‘A’.
Rather than trying to prove directly that ‘B’ is better than ‘A’, the scientific method starts by assuming that there is no difference between the standard of care and the new treatment. This assumption is known as the ‘Null’ hypothesis.
Scientists then try to disprove the Null hypothesis. This is also known as rejecting the Null hypothesis. If the collected data would be very unlikely were there truly no difference between the treatments, the Null hypothesis is rejected, and this is taken as evidence for the alternative hypothesis: that the new treatment ‘B’ is better than the standard treatment ‘A’.
So, why is the Null hypothesis tested? The quotation below suggests that trying to prove the Null hypothesis false is a more rigorous and achievable objective than trying to prove that the alternative hypothesis is right.
The quotation does not fully explain why science adopts this approach, but perhaps it can help us to comprehend and accept a tricky concept more easily!
‘No amount of experimentation can ever prove me right; a single experiment can prove me wrong.’ (A. Einstein)
Type I and Type II errors
A Type I error incorrectly detects an effect that is not present (a ‘false positive’). Type I errors could kill a patient: imagine a study that incorrectly found the new treatment to be better than the standard of care, so that the new treatment was given to people with catastrophic results.
A Type II error fails to detect an effect that is present (a ‘false negative’). Type II errors mean that potentially valuable research goes to waste. Perhaps this research could have been really useful, but no harm is done to patients.
Type I errors are more serious than Type II errors when it comes to patients.
Significance Level and Statistical Power
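The significance level (often written as alpha) is the probability of committing a Type I error that the researcher is prepared to accept. It is chosen before the trial begins, commonly at 0.05 (5%), and does not depend on the size of the sample. The size of the sample instead affects the statistical power of the test.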
The ‘power’ of a statistical test is the probability that it will correctly lead to the rejection of a Null hypothesis, or in other words, the ability of the test to detect an effect (if that effect actually exists).
Another way of describing this is to say that the ‘power’ of a test is the probability of not making a Type II error.
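The relationship between significance level, Type I error and power can be illustrated by simulating many trials. The following is a minimal Python sketch, not a recommended analysis method. It assumes a simple two-sided z-test with a known standard deviation and normally distributed outcomes; the function names, the true difference of 0.5 and the sample sizes are illustrative assumptions and do not come from any real trial.

```python
import random
from statistics import NormalDist, mean


def z_test_rejects(arm_a, arm_b, sigma, alpha=0.05):
    """Two-sided z-test for a difference in means, assuming sigma is known."""
    standard_error = sigma * (1 / len(arm_a) + 1 / len(arm_b)) ** 0.5
    z = (mean(arm_b) - mean(arm_a)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha


def rejection_rate(true_difference, n_per_arm, trials=2000,
                   sigma=1.0, alpha=0.05, seed=1):
    """Fraction of simulated trials in which the Null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        arm_a = [rng.gauss(0.0, sigma) for _ in range(n_per_arm)]
        arm_b = [rng.gauss(true_difference, sigma) for _ in range(n_per_arm)]
        if z_test_rejects(arm_a, arm_b, sigma, alpha):
            rejections += 1
    return rejections / trials


# No true difference: every rejection is a Type I error.
type_i_rate = rejection_rate(true_difference=0.0, n_per_arm=30)

# A true difference exists: the rejection rate estimates the power.
power_small = rejection_rate(true_difference=0.5, n_per_arm=30)
power_large = rejection_rate(true_difference=0.5, n_per_arm=100)

print(f"Type I error rate (no effect):   {type_i_rate:.3f}")
print(f"Power with 30 patients per arm:  {power_small:.3f}")
print(f"Power with 100 patients per arm: {power_large:.3f}")
```

Under these assumptions, the Type I error rate should stay close to the chosen significance level whatever the sample size, while the power should rise as more patients are included.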
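P-values, or ‘probability’ values, weigh the strength of the evidence against the Null hypothesis on a scale between 0 and 1. A p-value is the probability of obtaining results at least as extreme as those observed, if the Null hypothesis were true.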
A small p-value (typically less than 0.05, or 5%) indicates that there is strong evidence against the Null hypothesis, which might lead you to reject the Null hypothesis.
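A large p-value (greater than 0.05) indicates that the evidence against the Null hypothesis is weak, so the Null hypothesis is not rejected. This does not show that the Null hypothesis is true, only that the data do not provide enough evidence against it.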
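As an illustration of how a p-value might be calculated, here is a minimal Python sketch for comparing response rates in two treatment arms. It assumes a two-sided two-proportion z-test, which is only one of several possible tests and relies on a normal approximation that needs reasonably large groups. The function name and the patient counts are hypothetical.

```python
from statistics import NormalDist


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for the Null hypothesis that both arms
    share the same underlying response rate (normal approximation)."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Under the Null hypothesis the best estimate of the common rate is the pooled rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (rate_b - rate_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: 40 of 100 patients respond on 'A', 55 of 100 respond on 'B'.
p_value = two_proportion_p_value(40, 100, 55, 100)
print(f"p-value: {p_value:.4f}")
print("Reject the Null hypothesis at the 5% level" if p_value < 0.05
      else "Do not reject the Null hypothesis at the 5% level")
```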
Correlation versus Causation
When analysing the results from a trial, it is important to remember that correlation is not the same thing as causation. Correlation is when two variables are linked in some way, so that there is an association between them.
This does not mean that one causes the other.
An example of this involves hormone replacement therapy (HRT) and coronary heart disease (CHD).
- Women taking HRT were found to be at lower risk of CHD.
- This was not due to HRT itself.
- The group of women receiving HRT tended to belong to a higher socio-economic group, with better-than-average diets and exercise regimes, and was therefore at lower risk of CHD.
Causation is when a factor directly brings about an outcome. A causal factor is often only a partial cause of an outcome.
To differentiate between correlation and causation it is important to record as much information as possible about the participants in trials. It is also necessary to apply scientific methodology carefully in clinical trial design and to assess possible bias in the trial.
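The HRT example can be mimicked with a small simulation, which also shows why recording extra information about participants matters. The Python sketch below is purely illustrative: the probabilities are invented, the function names are hypothetical, and the treatment is deliberately given no effect at all on the disease. Socio-economic status influences both who is treated and who becomes ill.

```python
import random


def simulate_cohort(n=100_000, seed=7):
    """Simulate (high_ses, treated, diseased) for n people.
    The treatment has NO effect on disease in this simulation."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        high_ses = rng.random() < 0.5
        # Higher socio-economic status makes treatment more likely...
        treated = rng.random() < (0.6 if high_ses else 0.2)
        # ...and, separately, lowers the risk of disease.
        diseased = rng.random() < (0.05 if high_ses else 0.15)
        cohort.append((high_ses, treated, diseased))
    return cohort


def disease_risk(cohort, treated, high_ses=None):
    """Proportion with disease for a given treatment status,
    optionally restricted to one socio-economic group."""
    outcomes = [
        diseased
        for ses, was_treated, diseased in cohort
        if was_treated == treated and (high_ses is None or ses == high_ses)
    ]
    return sum(outcomes) / len(outcomes)


cohort = simulate_cohort()

# Comparing everyone: the treated appear to be at lower risk (correlation).
crude_difference = disease_risk(cohort, True) - disease_risk(cohort, False)

# Comparing within each socio-economic group: the difference largely vanishes.
stratum_differences = [
    disease_risk(cohort, True, ses) - disease_risk(cohort, False, ses)
    for ses in (True, False)
]

print(f"Crude risk difference (treated - untreated): {crude_difference:+.3f}")
for label, diff in zip(("higher", "lower"), stratum_differences):
    print(f"Risk difference within {label} socio-economic group: {diff:+.3f}")
```

The within-group comparison is only possible because socio-economic status was recorded for every participant.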
The term ‘data manipulation’ is used to describe the transformation of data and/or the application of certain statistical methods. It is also used to describe the malicious changing or misrepresentation of data.
- An example (good) is when a researcher removes outliers (results that are very much bigger or smaller than the next nearest result) from the results. It is important to verify that these are truly outliers and not just results that differ from the expected or wanted results.
- An example (bad) is when data that disagree with the expected result are intentionally discarded to increase the proportion of results that would confirm the stated hypothesis.
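One way of making outlier handling transparent is to apply a rule that is fixed in advance and to flag values for review rather than silently deleting them. The Python sketch below assumes the commonly used ‘1.5 times the interquartile range’ rule, which is a convention and not the only valid choice; the function name and the readings are hypothetical.

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].
    Flagged values should be checked, not automatically discarded."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical systolic blood pressure readings (mmHg) from one trial arm.
readings = [118, 121, 119, 124, 122, 120, 117, 123, 190]
flagged = flag_outliers(readings)
print("Values flagged for review:", flagged)
```

Because the rule depends only on the spread of the data, and not on which results the researcher hoped to see, it helps to separate legitimate outlier handling from the selective discarding described in the ‘bad’ example.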
Data transformation is the application of a mathematical formula to some data gained through a trial.
This is often used to make the presentation of data clearer or easier to understand. For example, when measuring fuel efficiency for cars, it is natural to measure efficiency in the form of ‘kilometres per litre’. However, if you were assessing how much additional fuel would be required to increase the distance travelled, it would be expressed as ‘litres per kilometre’.
Applying an incorrect formula to obtain the new data in this case would affect the overall results of the trial.
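The fuel efficiency example can be worked through in a few lines. The Python sketch below uses two hypothetical cars and shows one way an incorrect formula can creep in: averaging first and transforming afterwards gives a different answer from transforming each measurement and then averaging.

```python
from statistics import mean


def litres_per_km(km_per_litre):
    """Transform 'kilometres per litre' into 'litres per kilometre'."""
    return 1.0 / km_per_litre


# Hypothetical measurements for two cars, in kilometres per litre.
cars_km_per_litre = [10.0, 20.0]

# Correct: transform each measurement, then average the transformed values.
mean_litres_per_km = mean(litres_per_km(x) for x in cars_km_per_litre)

# Incorrect: average the original values, then transform the average.
naive_litres_per_km = litres_per_km(mean(cars_km_per_litre))

print(f"Average fuel needed per km (correct): {mean_litres_per_km:.4f} litres")
print(f"Average fuel needed per km (naive):   {naive_litres_per_km:.4f} litres")
```

In this made-up case the naive calculation understates the fuel the two cars need to cover the same distance.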
Data merging is the act of combining data from multiple studies in order to gain a better understanding of the situation. One of the most common forms of this is meta-analysis where the results from several published trials are put together and compared.
It is important whilst performing a meta-analysis to check that the trial methodologies are the same or comparable. Any differences in design need to be taken into account, so that the combined result is not distorted by underlying variables that differ between the trials (confounding variables). An example of incorrect data merging might be aggregating data from several trials that used different species of mice during animal testing.
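To give a flavour of how results from several trials can be combined, the Python sketch below pools effect estimates using inverse-variance weights (a ‘fixed-effect’ model) and computes Cochran’s Q as a rough check on whether the trials agree with one another. This is only one of several meta-analysis approaches, and it assumes the trials are estimating the same underlying effect. The function names, effect estimates and standard errors are hypothetical.

```python
def pooled_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of trial estimates and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se


def cochran_q(estimates, standard_errors):
    """Weighted sum of squared deviations from the pooled estimate.
    Values much larger than (number of trials - 1) suggest the trials disagree."""
    pooled, _ = pooled_fixed_effect(estimates, standard_errors)
    return sum((e - pooled) ** 2 / se ** 2
               for e, se in zip(estimates, standard_errors))


# Hypothetical treatment effects (e.g. mean differences) from three trials.
trial_estimates = [0.30, 0.10, 0.25]
trial_standard_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pooled_fixed_effect(trial_estimates, trial_standard_errors)
q = cochran_q(trial_estimates, trial_standard_errors)

print(f"Pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
print(f"Cochran's Q: {q:.2f} on {len(trial_estimates) - 1} degrees of freedom")
```

More precise trials (those with smaller standard errors) receive more weight, and the pooled estimate is more precise than any single trial. A large Q would be a prompt to look again at differences in trial design before trusting the combined result.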