To mark International Women’s Day, Hartree Centre Data Scientist Simon Goodchild writes a blog post celebrating the work of pioneering epidemiologist and doctor Janet Lane-Claypon. At the time of writing, Simon was studying medical statistics for the first time as part of a statistical society diploma, and was surprised never to have heard of a woman who had invented two of the key techniques he was learning about!
How do you know that your treatment actually works?
How do you know whether something in the environment may impact upon your health?
These are some of the most basic and most important questions in medicine and epidemiology. Getting good answers is vital, and nowadays there are established procedures for finding sensible answers. Several of these can be traced back to the under-recognised work of Janet Lane-Claypon in the early part of the 20th century.
In 1907, Lane-Claypon was working at the Institute of Preventative Medicine in London, investigating the growth of babies. She was ideally placed to carry out this kind of work. She had been a brilliant student at the London School of Medicine for Women (the name itself is rather a sign of the times), which she entered in 1898, and she earned academic distinction that included both M.D. and Ph.D. degrees. She set out to determine whether cows’ milk or human milk was better for a baby’s growth. Lane-Claypon approached the question by finding comparable groups of infants, some who had been fed cows’ milk and some who had been breast-fed, and studying the differences in their weight.
This may seem obvious to us now, but at the time it was a new way of solving problems. Lane-Claypon’s study is one of the very first examples of a cohort study – comparing reasonably sized groups of similar people to try to determine the size of an effect. In this case, she travelled to Berlin, where a charitable fund was paying for consultations for newborn babies. She obtained data on the weights of 300 babies who had been breast-fed and 204 who had been fed cows’ milk, and analysed these data to determine whether either feeding method led to better growth.
Her final report was a careful, detailed examination of the data that was pioneering in a number of ways. A simple plot of the mean weight of the two groups over time suggested that breast-fed babies gained weight more quickly, but she was careful to investigate all the possible causes of this observation. First, she looked at whether the result was simply due to chance, which she described as sampling error. To do this she used what is now known as a two-sample z-test, which compares the difference between the two means with the variation expected from chance alone, estimated from the standard deviations and sizes of the two samples. If the difference is much larger than this chance variation, it is unlikely to be a fluke of sampling, and the difference is described as statistically significant.
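As a rough sketch of what a two-sample z-test involves, the Python below computes the z statistic from each group’s mean, standard deviation and size, then converts it into a two-sided p-value using the standard normal distribution. The function name and the weight-gain figures are hypothetical and purely illustrative: only the group sizes (300 and 204) come from the study, and this is not a reconstruction of Lane-Claypon’s own calculation.

```python
import math


def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (z, two-sided p-value) for the difference between two sample means.

    Assumes both samples are large enough that their standard deviations are
    good estimates of the population values (the usual z-test assumption).
    """
    # Standard error of the difference between the two means
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / se
    # Two-sided p-value from the standard normal distribution: P(|Z| >= |z|)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical illustrative figures (NOT Lane-Claypon's data): mean weight gain in grams
z, p = two_sample_z_test(mean1=620.0, sd1=150.0, n1=300,
                         mean2=580.0, sd2=160.0, n2=204)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

A small p-value says that a difference this large would rarely arise from sampling error alone; with these made-up numbers the difference comes out as significant at the conventional 5% level.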
In one small part of the data, however, the general conclusion did not seem to hold: for babies in their first eight days, cows’ milk appeared to be more effective. Lane-Claypon analysed this using Student’s t-test, which had been published only a few years earlier, in 1908. At the time it was the statistical state of the art, used only by experts, and she concluded that this result was probably not statistically significant.
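The t-test plays the same role as the z-test but is designed for small samples, where the standard deviations are themselves uncertain. The sketch below uses the classic equal-variance (pooled) form of Student’s test; the source does not say which variant Lane-Claypon applied, so treat that as an assumption. To avoid third-party dependencies, the p-value is obtained by numerically integrating the t density with Simpson’s rule. The function names and the small weight-gain samples are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming equal variances in both groups."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pool the two sample variances (statistics.variance uses the n - 1 denominator)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value: integrate the t density from 0 to |t| with Simpson's rule."""
    # Log of the normalising constant of Student's t density
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


# Hypothetical weight gains (arbitrary units) for a handful of babies in their first eight days
cows_milk = [12.0, 15.0, 9.0, 14.0, 11.0]
breast_fed = [10.0, 13.0, 8.0, 12.0, 9.0]

t, df = pooled_t_statistic(cows_milk, breast_fed)
p = t_two_sided_p(t, df)
print(f"t = {t:.2f} on {df} degrees of freedom, p = {p:.2f}")
```

Here the cows’ milk group looks better on average, but with so few babies the p-value is large, so the difference could easily be sampling error. In practice a library routine such as SciPy’s `scipy.stats.ttest_ind` would normally be used instead of hand-rolled integration.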
Finally, and just as importantly, she investigated whether the effect might be due not to the type of milk but to other causes. These are called confounding factors, and it is critical to check whether they are having an effect if you’re trying to decide whether your data really show what they seem to. Lane-Claypon was concerned that the effects she saw might be due to social class, so she calculated the correlation between the babies’ weights and their fathers’ wages while controlling for the method of feeding. At the time this was another piece of cutting-edge statistics, using a method published by Pearson in 1909. The correlation turned out to be 0.026±0.036, effectively zero within experimental error.
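A correlation that “controls for” a third variable is what is now called a partial correlation. The sketch below uses the standard first-order formula, which removes the part of each variable’s relationship that runs through the control variable (here, feeding method). This is a modern textbook formulation, not necessarily the exact form of Pearson’s 1909 method, and the function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical data: baby's weight (kg), father's weekly wage, feeding (1 = breast, 0 = cows' milk)
weight = [3.9, 4.2, 3.6, 4.8, 4.1, 3.5, 4.4, 3.8]
wage = [20, 25, 18, 30, 22, 19, 27, 21]
feeding = [1, 1, 0, 1, 0, 0, 1, 0]

r = partial_correlation(weight, wage, feeding)
print(f"partial correlation of weight and wage, controlling for feeding: {r:.3f}")
```

Equivalently, one can regress both weight and wage on feeding method and correlate the residuals; a value close to zero, as in Lane-Claypon’s result, suggests that wages (and so social class) were not driving the difference.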
Having analysed the data so carefully and ruled out likely confounding factors, Lane-Claypon could be confident in her conclusion that “the evidence dealt with throughout this report emphasises very forcibly the importance of breast-feeding for the young of all species.” Almost all the features of a modern study are already here – data collection, sound statistical analysis to check that the conclusions are not just down to chance, and an investigation of whether other factors might be causing the result. Today this is simply standard procedure, but in 1912 it was highly innovative. Not only had Lane-Claypon created a new form of study, she had also carried it out rigorously and used the latest statistical methods to analyse her data.
Later in her career, she moved to the Ministry of Health and began studying breast cancer. In 1926 she followed her cohort study with one of the first case-control studies, looking for the causes of breast cancer. In this study, she compared 500 women who had breast cancer with 500 controls, women free of breast cancer, and used a detailed 50-question survey to collect as much information as she could about their life histories. From this she found that women who had more children, who started having children earlier and who breastfed for longer were less likely to develop breast cancer, conclusions that were confirmed by a 2010 re-analysis of her data using modern statistical methods.
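Where a cohort study follows exposed and unexposed groups forward, a case-control study starts from the outcome and looks back at exposures. Today the usual summary of such a study is the odds ratio: the odds of exposure among cases divided by the odds of exposure among controls. The source does not say that Lane-Claypon computed odds ratios, so the sketch below is only an illustration of the modern approach, using Woolf’s approximate 95% confidence interval (one common choice among several). The function name, the exposure and the counts are all hypothetical; only the 500/500 split mirrors her design.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio from a 2x2 case-control table, with Woolf's approximate 95% CI."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) using Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts for a yes/no exposure (e.g. "never breastfed"): 500 cases, 500 controls
or_, lo, hi = odds_ratio_with_ci(exposed_cases=150, unexposed_cases=350,
                                 exposed_controls=100, unexposed_controls=400)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An odds ratio above 1 whose interval excludes 1 suggests the exposure is associated with a higher risk of disease; as with the cohort study, confounding factors still have to be ruled out before reading it as a cause.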
Lane-Claypon’s career was brought to a premature end in 1929 when she married, because the Civil Service did not allow married women to continue working there. She retired to the countryside and lived to the age of 90. In her career she had pioneered two of the most important methods in modern epidemiology, and it is hard not to agree with Katherine Nightingale, writing in the MRC’s Insight, when she asks:
“Who knows what she could have achieved if she’d carried on?”