The different lines of evidence in epidemiology

Epidemiology is the study of diseases in populations. For example, John Snow, a pioneer of epidemiology, understood that an outbreak of cholera in London was caused by contaminated water.

However, epidemiology is not limited to infectious diseases. For example, the study of type 1 diabetes (T1D) in the human population also falls within epidemiology. T1D is an autoimmune disease that mostly affects children and results in the destruction of the insulin-producing beta cells of the pancreas. The treatment is to inject insulin several times a day for the rest of the patient’s life. The first things epidemiologists study are the incidence (number of new cases in a population per unit of time) and the prevalence (total number of cases in the population at a given time) of a disease. For T1D in France, incidence is 13.5 new cases per 100 000 children under 15 per year, and prevalence is around 2 per 1000 people. Incidence after age 15 is not zero, but it is about an order of magnitude lower.
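To make the two definitions concrete, here is a minimal Python sketch. The helper names `incidence_rate` and `prevalence` are my own, and the population of one million children is a hypothetical round number; only the 13.5 per 100 000 and 2 per 1000 figures come from the text.

```python
# Minimal sketch of the two basic measures described above.
# The T1D figures are the French ones quoted in the text; the population
# size (one million children) is a hypothetical round number.

def incidence_rate(new_cases, person_years, per=100_000):
    """New cases per `per` person-years of observation."""
    return new_cases * per / person_years

def prevalence(existing_cases, population, per=1_000):
    """People currently living with the disease per `per` people."""
    return existing_cases * per / population

children_under_15 = 1_000_000  # hypothetical population, followed for one year
new_cases = 13.5 * children_under_15 / 100_000

print(new_cases)                                     # 135.0 expected new cases in a year
print(incidence_rate(new_cases, children_under_15))  # 13.5 per 100 000 per year
print(prevalence(2, 1_000))                          # 2.0 per 1 000
```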

A second question that interests epidemiologists is the causes of diseases. Genetic causes have been investigated with the genome-wide association study design (see an earlier post). Here, I will present the different kinds of study that can be used to understand the environmental determinants of a disease. I will go from the study design that provides the weakest evidence but is the least expensive to the one that provides the strongest evidence but is the most expensive.

Ecological study

An ecological study uses variables defined at the group level to explain differences between groups. The classic example is Durkheim’s study of suicide and religion: he linked the difference in suicide rates between France and Germany to the different proportions of Catholics and Protestants in the two countries. This kind of reasoning has been criticized as the “ecological fallacy”: because the variables are measured at the group level and not at the individual level, an association observed between groups need not hold between individuals.
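The ecological fallacy can be reproduced with a few made-up numbers. In this sketch (hypothetical regions A, B and C), regions with more exposed people have higher overall rates, so the group-level correlation is strongly positive, yet inside every region the exposed individuals have a lower risk than the unexposed.

```python
# Hypothetical regions (made-up numbers). Each entry is
# (proportion exposed, outcome risk if exposed, outcome risk if unexposed).
REGIONS = {
    "A": (0.2, 0.008, 0.010),
    "B": (0.5, 0.018, 0.020),
    "C": (0.8, 0.028, 0.030),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Group level: only the regional exposure proportion and the overall rate are observed.
proportions = [p for p, _, _ in REGIONS.values()]
regional_rates = [p * r_exp + (1 - p) * r_unexp for p, r_exp, r_unexp in REGIONS.values()]
print(pearson(proportions, regional_rates))  # strongly positive: exposure "looks" harmful

# Individual level: within every region, exposed people have the LOWER risk.
for name, (p, r_exp, r_unexp) in REGIONS.items():
    print(name, round(r_exp / r_unexp, 2))  # risk ratio below 1 in each region
```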

For type 1 diabetes, there is ecological evidence for the hygiene hypothesis: the idea that the increasing incidence of allergies and autoimmune diseases is linked to a lack of proper immune stimulation (an overly hygienic environment). Indeed, the incidence of many infectious diseases has decreased while the incidence of allergies and autoimmune diseases has increased (see this article, behind a paywall; if you want to read it nonetheless, you should not use Sci-Hub because that’s illegal. Have you thought of the publisher’s 40% profit margins?). While this kind of evidence is interesting, it is very weak, and many spurious correlations can be obtained in the same way. For example, by the same logic, cancer causes mobile phones.
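Spurious correlations of this kind are easy to manufacture: any two quantities that both trend over time will be correlated. The sketch below simulates two unrelated, hypothetical indicators with seeded random noise. Their raw correlation is close to 1, and it usually weakens a lot once the shared trend is removed by looking at year-to-year changes.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)
years = range(30)
# Two hypothetical indicators that both drift upwards for unrelated reasons.
indicator_a = [100 + 3 * t + rng.uniform(-2, 2) for t in years]
indicator_b = [5 + 0.5 * t + rng.uniform(-0.3, 0.3) for t in years]
print(pearson(indicator_a, indicator_b))  # close to 1 despite no causal link

# Year-to-year changes remove the shared trend; what is left is unrelated noise.
diff_a = [later - earlier for earlier, later in zip(indicator_a, indicator_a[1:])]
diff_b = [later - earlier for earlier, later in zip(indicator_b, indicator_b[1:])]
print(pearson(diff_a, diff_b))  # usually much weaker
```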

Case-control study

A case-control study consists of collecting information at the individual level for cases (patients) and (healthy) controls and comparing the two groups. For example, my INSERM research team is conducting a case-control study on T1D in France. The information we collect comes mainly from environmental questionnaires filled in by patients and controls. An important question is how to choose the controls. The patients are well defined: they are recruited by the participating doctors when they are diagnosed. But whom should we compare them to? My team chose matched controls: each patient who answered the questionnaire was asked to give two more questionnaires to friends of the same age, who then filled them in. Look here for the results of the study.
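In a case-control study, the numbers of cases and controls are fixed by design, so disease risks cannot be computed directly; the usual measure of association is the odds ratio. The sketch below computes it from a 2×2 table with an approximate (Woolf, log-scale) 95% confidence interval. The counts are made up, and the sketch ignores matching: a matched design like the one described above would normally be analysed with conditional logistic regression.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Made-up counts: 100 cases and 200 controls (two controls per case).
print(odds_ratio(a=30, b=70, c=40, d=160))
```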

A limitation of case-control studies based on questionnaires is that they depend on our fallible memory. Random recall errors simply add noise or missing values to the data. A more troubling problem is recall bias, in which awareness of the disease affects the answers of cases and controls differently. For example, the mother of a T1D patient might be inclined to underestimate the amount of sugar her child was eating before diagnosis, because sugar and T1D are linked in her mind.
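Recall bias can be quantified with a small, hypothetical example. Suppose high sugar intake is equally common among cases and controls (true odds ratio of 1), but mothers of cases report only 70% of the actual high-sugar exposure while mothers of controls report it perfectly. The `reported` helper and all numbers are assumptions for illustration. The observed odds ratio then falls to about 0.58, a spurious "protective" effect of sugar.

```python
def odds_ratio(a, b, c, d):
    """a/b: exposed/unexposed cases, c/d: exposed/unexposed controls."""
    return (a * d) / (b * c)

def reported(exposed, unexposed, sensitivity):
    """Reported (exposed, unexposed) counts when only a fraction `sensitivity`
    of truly exposed people recall the exposure (no false positives assumed)."""
    remembered = exposed * sensitivity
    return remembered, unexposed + (exposed - remembered)

# True situation (made-up numbers): 40% of cases and 40% of controls ate a lot of sugar.
true_or = odds_ratio(40, 60, 80, 120)

# Hypothetical differential recall: cases under-report, controls do not.
case_exp, case_unexp = reported(40, 60, sensitivity=0.7)
ctrl_exp, ctrl_unexp = reported(80, 120, sensitivity=1.0)
biased_or = odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp)

print(true_or, round(biased_or, 2))
```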

Prospective cohort studies

One way to avoid concerns about recall is a prospective study: you start from a healthy population and measure its exposure to environmental factors. You then follow the population for a long period to find out who becomes sick. An important advantage is that you can collect biological samples before disease onset and therefore measure variables that are unavailable in a case-control study.
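Because a cohort starts from healthy people, it can measure the risk of disease directly (the cumulative incidence) in exposed and unexposed groups and compare them with a relative risk, something a case-control study cannot do. A minimal sketch with made-up counts, which give a relative risk of about 2:

```python
def cumulative_incidence(cases, at_risk):
    """Proportion of initially healthy people who develop the disease during follow-up."""
    return cases / at_risk

def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return cumulative_incidence(cases_exp, n_exp) / cumulative_incidence(cases_unexp, n_unexp)

# Made-up cohort: exposure measured at enrolment, cases counted at the end of follow-up.
print(relative_risk(cases_exp=12, n_exp=1_000, cases_unexp=18, n_unexp=3_000))
```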

For a rare disease like T1D, a prospective study faces a difficulty: if we enroll 10 000 children, only about 20 will have developed T1D after 15 years. The solution is to screen the population for elevated genetic risk. For T1D, this is done by HLA typing (Human Leucocyte Antigen, a group of immune-system genes that is the largest genetic influence on T1D), and prospective studies are then conducted on the children who screen positive. There have been a number of such studies: DAISY, BABYDIAB... An ambitious one is underway: the TEDDY study.
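The arithmetic behind this difficulty can be sketched as follows, assuming the French incidence of 13.5 per 100 000 children per year stays constant over 15 years and nobody drops out (both simplifications). The `risk_multiplier` of 10 for screened children is a hypothetical value chosen for illustration, not the actual risk associated with HLA genotypes.

```python
import math

def expected_cases(n_children, incidence_per_100k, years, risk_multiplier=1.0):
    """Expected new cases, assuming constant yearly incidence and no drop-out."""
    return n_children * incidence_per_100k * years * risk_multiplier / 100_000

def children_needed(target_cases, incidence_per_100k, years, risk_multiplier=1.0):
    """Smallest cohort expected to yield `target_cases` cases under the same assumptions."""
    return math.ceil(target_cases * 100_000 / (incidence_per_100k * years * risk_multiplier))

print(expected_cases(10_000, 13.5, 15))                       # 20.25
print(children_needed(200, 13.5, 15))                         # 98766 unscreened children
print(children_needed(200, 13.5, 15, risk_multiplier=10))     # 9877 with a hypothetical 10x risk
```

Screening does not change the disease, only who is followed: by enrolling children at higher risk, the same number of cases is reached with a much smaller (and cheaper) cohort.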

Correlation is not causation

The case-control study and the prospective cohort study are both observational designs. As such, they can show an association between an environmental factor and a disease, but not causation. The main problem is confounding: a third factor that causes both the exposure and the disease can create an association between them. For example, suppose we are interested in the relation between lung cancer and yellow fingertips. If we do not take smoking status into account, we will find an association between lung cancer and yellow fingertips; but if we control (in the statistical sense) for smoking, the association disappears, since both lung cancer and yellow fingertips are caused by smoking.
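Confounding by smoking can be reproduced with expected counts in a hypothetical population of 100 000 people. In this model (all probabilities made up), lung cancer depends only on smoking, yet the crude risk ratio for yellow fingertips comes out at about 6, while within smokers and within non-smokers it is 1.

```python
# Hypothetical probabilities, chosen only for illustration.
P_SMOKER = 0.3
P_YELLOW = {True: 0.8, False: 0.05}   # P(yellow fingertips | smoker?)
P_CANCER = {True: 0.15, False: 0.01}  # P(lung cancer | smoker?); fingertips play no role

def expected_counts(n=100_000):
    """Expected (people, cancers) for each (smoker, yellow) cell of a population of n."""
    cells = {}
    for smoker in (True, False):
        n_smoker = n * (P_SMOKER if smoker else 1 - P_SMOKER)
        for yellow in (True, False):
            p_yellow = P_YELLOW[smoker] if yellow else 1 - P_YELLOW[smoker]
            people = n_smoker * p_yellow
            cells[(smoker, yellow)] = (people, people * P_CANCER[smoker])
    return cells

def risk(cells):
    """Cancer risk pooled over a list of (people, cancers) cells."""
    return sum(c for _, c in cells) / sum(p for p, _ in cells)

cells = expected_counts()

# Crude analysis, ignoring smoking: yellow fingertips look strongly "harmful".
crude_rr = (risk([cells[(s, True)] for s in (True, False)])
            / risk([cells[(s, False)] for s in (True, False)]))
print(f"crude risk ratio: {crude_rr:.1f}")

# Stratified by smoking status (controlling for it): the association disappears.
for smoker in (True, False):
    rr = risk([cells[(smoker, True)]]) / risk([cells[(smoker, False)]])
    print(f"smoker={smoker}: risk ratio {rr:.1f}")
```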

We are interested in causation rather than mere association because we want to find interventions that change the outcome. If, to protect people with yellow fingertips from lung cancer, we painted their fingertips white or some other color, their risk of lung cancer would remain the same.
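In the do-notation of Judea Pearl’s framework, painting fingertips is an intervention do(Y = y) on the fingertip colour Y, which is not the same as observing Y. Taking smoking S as the only confounder and C as lung cancer, the adjustment formula gives the following; the numbers P(S=1) = 0.3, P(C=1 | S=1) = 0.15 and P(C=1 | S=0) = 0.01 are hypothetical, chosen for illustration.

```latex
\begin{aligned}
P(C=1 \mid \mathrm{do}(Y=y))
  &= \sum_{s \in \{0,1\}} P(C=1 \mid Y=y, S=s)\, P(S=s) \\
  &= \sum_{s \in \{0,1\}} P(C=1 \mid S=s)\, P(S=s)
     \quad \text{(colour has no effect once $S$ is fixed)} \\
  &= 0.15 \times 0.3 + 0.01 \times 0.7 = 0.052
     \quad \text{for both } y = 0 \text{ and } y = 1.
\end{aligned}
```

Conditioning on yellow fingertips raises the observed cancer risk, but intervening on the colour leaves it unchanged.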

What can we do if we want to make a causal claim? We have to move from observational studies to interventional studies.

Randomized controlled trials (RCT)

The gold standard of evidence is the randomized controlled trial. Before a drug can be approved, pharmaceutical companies need to conduct randomized controlled trials showing that their drug is more effective than the current best practice (or than a placebo if no best practice exists). Patients are randomly assigned to two groups: one receives the new drug and the other the old drug or a placebo. Ideally, neither the patient nor the doctor knows which arm of the study the patient is in, a design called double-blind. This is of course not always possible if the treatment is something other than a pill.
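Random allocation and blinding can be sketched in a few lines. This is a simplified, hypothetical scheme (the function `randomize_blinded` and the `KIT-` codes are my own): arms are assigned by shuffling a balanced list of labels, and each patient receives an opaque kit code, so patients and doctors cannot tell the arms apart; the code-to-arm key is kept by someone outside the care team. Real trials typically rely on dedicated randomization systems and often use block or stratified randomization.

```python
import random

def randomize_blinded(patient_ids, seed=None):
    """Hypothetical 1:1 allocation with blinded kit codes.

    Returns (blinded, key):
      blinded: patient id -> kit code (all that patients and doctors see)
      key:     kit code -> arm (held by an independent party until unblinding)
    """
    rng = random.Random(seed)
    n = len(patient_ids)
    arms = ["new drug"] * (n // 2) + ["placebo"] * (n - n // 2)
    rng.shuffle(arms)                                 # random but balanced allocation
    codes = rng.sample(range(100_000, 1_000_000), n)  # unique six-digit codes
    blinded, key = {}, {}
    for pid, arm, code in zip(patient_ids, arms, codes):
        kit = f"KIT-{code}"
        blinded[pid] = kit
        key[kit] = arm
    return blinded, key

blinded, key = randomize_blinded([f"patient-{i}" for i in range(8)], seed=1)
print(blinded)  # identical-looking codes, no arm visible
```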

For T1D, prevention RCTs have been conducted with the same screening step as in prospective studies. They have tested different formulations of insulin, delayed introduction of gluten, nicotinamide supplementation and the replacement of cow’s milk by modified milk. Unfortunately, none of these attempts has succeeded so far.

The main reason RCTs are superior to observational studies is that randomization makes the two groups comparable, on average, for all possible confounders, including those nobody has thought of or measured.
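A quick simulation illustrates the point. With a hypothetical confounder carried by 30% of participants, random allocation leaves the two arms with nearly the same share of high-risk patients, whereas a hypothetical allocation "by choice", in which doctors favour the new drug for high-risk patients, produces very unbalanced arms.

```python
import random

def share_high_risk(people, arm):
    """Fraction of people in `arm` who carry the hypothetical high-risk confounder."""
    group = [p for p in people if p["arm"] == arm]
    return sum(p["high_risk"] for p in group) / len(group)

def allocate(n=2_000, seed=0):
    rng = random.Random(seed)
    # Hypothetical confounder: about 30% of participants are at high risk.
    people = [{"high_risk": rng.random() < 0.3} for _ in range(n)]

    # Randomized: shuffle a balanced list of arm labels (n assumed even).
    arms = ["treated", "control"] * (n // 2)
    rng.shuffle(arms)
    randomized = [dict(p, arm=arm) for p, arm in zip(people, arms)]

    # By choice: hypothetical doctors give the new drug mostly to high-risk patients.
    chosen = [
        dict(p, arm="treated" if rng.random() < (0.8 if p["high_risk"] else 0.2) else "control")
        for p in people
    ]
    return randomized, chosen

randomized, chosen = allocate()
for label, people in (("randomized", randomized), ("by choice", chosen)):
    gap = share_high_risk(people, "treated") - share_high_risk(people, "control")
    print(f"{label}: difference in share of high-risk patients = {gap:+.2f}")
```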

More on causation

Are we always forced to run an RCT to prove causality? Sometimes it is impossible, for ethical, economic or practical reasons. To prove that smoking causes cancer, we would need to assign people to smoker and non-smoker groups and force them to comply, which seems both difficult in practice and unethical. The causal link between smoking and lung cancer has in fact never been tested in an RCT in humans. Although it is based on observational studies, it is nevertheless the most famous epidemiological result and is beyond doubt.

So how do we go from observational studies to a causal claim? In 1965, Bradford Hill put forward a list of criteria to consider before accepting a causal link, including replication of the association in different populations. Building on the potential outcomes framework developed by Rubin in the 1970s, Rosenbaum and Rubin introduced the propensity score in 1983; matching on it is one way to try to mimic an RCT with an observational study. More recently, Judea Pearl has made important contributions based on graphical models. I had the pleasure of hearing Marloes Maathuis talk in Montpellier about causality in high-dimensional settings, using Pearl’s formalism. In short, causality is an active field of research in mathematics and statistics.
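The propensity score is the probability of receiving the treatment given the observed covariates. Below is a minimal sketch on simulated data with a single discrete confounder; the score is estimated simply as the treated fraction in each stratum (real analyses usually fit a logistic regression on many covariates), and each treated unit is matched, with replacement, to a random control with the nearest score. All names and numbers are hypothetical. Unlike randomization, matching can only remove confounding by the covariates that were measured.

```python
import random
from collections import defaultdict

def simulate(n=3_000, seed=42):
    """Hypothetical observational data: confounder x, treatment t, outcome y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randrange(3)                   # confounder, e.g. a risk group 0, 1 or 2
        t = rng.random() < (0.2, 0.5, 0.8)[x]  # treatment more common in higher groups
        y = 10 * x + 2 * t + rng.gauss(0, 1)   # true treatment effect is 2
        data.append((x, t, y))
    return data

def propensity_scores(data):
    """Estimate P(t = 1 | x) as the treated fraction in each stratum of x."""
    treated, total = defaultdict(int), defaultdict(int)
    for x, t, _ in data:
        total[x] += 1
        treated[x] += t
    return {x: treated[x] / total[x] for x in total}

def naive_difference(data):
    treated = [y for _, t, y in data if t]
    control = [y for _, t, y in data if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def matched_difference(data, seed=0):
    """Average outcome difference between treated units and their matched controls."""
    rng = random.Random(seed)
    ps = propensity_scores(data)
    pool = defaultdict(list)  # propensity score -> outcomes of controls
    for x, t, y in data:
        if not t:
            pool[ps[x]].append(y)
    diffs = []
    for x, t, y in data:
        if t:
            nearest = min(pool, key=lambda score: abs(score - ps[x]))
            diffs.append(y - rng.choice(pool[nearest]))
    return sum(diffs) / len(diffs)

data = simulate()
print(f"naive difference:   {naive_difference(data):.2f}")    # inflated by confounding
print(f"matched difference: {matched_difference(data):.2f}")  # close to the true effect of 2
```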

A word on animal studies

Many experiments are done on mice or other animals to understand biology, including the biology of diseases. Some strains of mice and rats have even been bred as models of a disease, such as NOD (Non-Obese Diabetic) mice for T1D. The results of an animal study should not, however, be taken as directly translatable to humans: we know hundreds of ways to prevent diabetes in NOD mice but none in humans. Animal studies are one line of evidence among others.

Take-home message

Next time you read an article on a new study, try to find out what kind of study it is and weigh the article’s claims accordingly. Was it interventional? Was it retrospective or prospective? Was it done in an animal model?

Oh, and by the way, GMOs are safe for your health. Get over it.



Filed under introductory, Review
