Monthly Archives: November 2015

The sleeping beauty in the random forest: T-Trees

A sleeping beauty is an article that goes undervalued and rarely cited for a long time, then awakens and is recognised as important. The most famous example is the work of Mendel, which was published in 1865 and rediscovered 34 years later. To learn more about this concept, you can read this or that.

Today, I want to talk about a paper that is a bit too young to be a sleeping beauty but that seems undervalued:

Exploiting SNP Correlations within Random Forest for Genome-Wide Association Studies (2014) by Vincent Botta, Gilles Louppe, Pierre Geurts, Louis Wehenkel

The four authors are from Liège in Belgium. Their team is known for its work on random forests (if you do not know what that is, you can read my earlier blogpost on the subject). Geurts and Wehenkel have proposed a variant of random forests called extremely randomized trees. Gilles Louppe is the one who implemented random forests for the scikit-learn package for Python (which I use. Thanks!). The first author, Vincent Botta, left academia after his PhD and went to work for a company (a start-up in newspeak).


The idea of the paper is to use biological structure in order to increase prediction accuracy. The additional structure used here is chromosomal distance: a SNP is located on a chromosome and it has neighbours. This information can be useful in several ways.
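To make the notion of neighbouring SNPs concrete, here is a minimal sketch of one way chromosomal position could be turned into structure: sorting SNPs along each chromosome and grouping them into blocks of contiguous neighbours. This is only an illustration under my own assumptions, not the authors' implementation. The `Snp` record, the `contiguous_blocks` helper, the `block_size` parameter and the SNP names are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Snp(NamedTuple):
    """Hypothetical SNP record: identifier, chromosome and base-pair position."""
    name: str
    chromosome: str
    position: int


def contiguous_blocks(snps, block_size):
    """Group SNP names into blocks of neighbours along each chromosome.

    Blocks never span two chromosomes; the last block of a chromosome
    may be shorter than block_size.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")

    # Bucket the SNPs by chromosome first.
    by_chromosome = defaultdict(list)
    for snp in snps:
        by_chromosome[snp.chromosome].append(snp)

    blocks = []
    for chromosome in sorted(by_chromosome):
        # Order SNPs by position so that list neighbours are physical neighbours.
        ordered = sorted(by_chromosome[chromosome], key=lambda s: s.position)
        for start in range(0, len(ordered), block_size):
            blocks.append([s.name for s in ordered[start:start + block_size]])
    return blocks


# Made-up SNPs, deliberately given out of order.
snps = [
    Snp("rs_c", "1", 300),
    Snp("rs_a", "1", 100),
    Snp("rs_x", "2", 50),
    Snp("rs_b", "1", 200),
    Snp("rs_y", "2", 75),
]
blocks = contiguous_blocks(snps, block_size=2)
print(blocks)
```

Each block could then be handed to a learning algorithm as a unit instead of treating every SNP as an isolated feature.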




Transcriptomics and the stochasticity of biological cells

I just came home from a two-day conference in Evry, 30 km south of Paris. Evry hosts a biocluster centred on genetics. The first genetic map, which inspired the human genome project, was produced there in the 90s, and it is also where the French contribution to the human genome project (the sequencing of chromosome 14) took place.

I won’t be as rigorous as I usually aim to be; I will just try to give you a flavour of some of the talks.

Transcriptome from micro-arrays to RNA-seq by François Cambien

The transcriptome is the set of RNA transcripts present in a cell, and transcriptomics is the study of the expression of genes. According to the central dogma of molecular biology, DNA is transcribed into RNA in the nucleus. The RNA then moves to the cytoplasm, where it is translated into proteins. Transcriptomics is therefore the study of RNA abundance.
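As a rough illustration of what "RNA abundance" can mean in practice, here is a minimal sketch that turns raw read counts per gene into counts per million, a common way to make samples of different sequencing depth comparable. The gene names and counts are made up, and the `counts_per_million` helper is my own hypothetical function, not something presented in the talk.

```python
def counts_per_million(read_counts):
    """Scale raw read counts so that they sum to one million per sample."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("the sample contains no reads")
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


# Made-up read counts for one sample.
read_counts = {"gene_a": 500, "gene_b": 1500, "gene_c": 8000}
cpm = counts_per_million(read_counts)
print(cpm)
```

Here `gene_c` accounts for 80% of the reads, so it gets 800,000 of the million.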

François Cambien works on heart diseases.

