Data and Code

All data and code can be downloaded from here. For detailed instructions on their use, see the panel to the right.

CGP Website

Visit the CGP website, where all expression and phenotype data can be downloaded for the cell lines. This site also contains an array of tools for visualizing and analysing the data.

Code and instructions for in vivo drug sensitivity prediction

This page contains all of the code and instructions needed to fully reproduce the analysis from our paper "Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines" (Genome Biology, 2014). Annotated R code is available in Sweave format. To reproduce the analysis, first download all scripts and data. Unzip this file, navigate to the relevant folder (see below) and run the Sweave file from the R prompt. For example, to run the entire docetaxel in breast cancer analysis (on Linux):

$ wget
$ unzip
$ cd paper/docetaxelAnalysis/
$ R
> Sweave("docetaxelBreastCancer.Snw")

NB: All required R packages must be installed first (see Sweave files)!
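One way to find out which packages a given Sweave file needs is to scan it for library() and require() calls. The following is a minimal sketch, not part of the distributed scripts: the function name list_r_packages is hypothetical, and it assumes packages are loaded with plain library(name) or require(name) calls, so packages referenced in other ways (e.g. pkg::fun) would be missed. It relies on grep -o, which is available in GNU and BSD grep.

```shell
#!/bin/sh
# list_r_packages FILE (hypothetical helper): print the unique package names
# that FILE loads through library(...) or require(...) calls, one per line.
list_r_packages() {
    # Keep only the "library(name" / "require(name" fragments,
    # then strip the call prefix, quotes and spaces, and de-duplicate.
    grep -oE '(library|require)\([^),]+' "$1" \
        | sed -E "s/^(library|require)\(//; s/[\"' ]//g" \
        | sort -u
}
```

For example, running list_r_packages docetaxelBreastCancer.Snw from inside docetaxelAnalysis/ should print the package names to install before calling Sweave(). Whether each package comes from CRAN or Bioconductor still has to be checked by hand.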

Locations of various scripts

The zip file contains folders with the code to re-create the analysis for the four different drugs. Each folder contains a Sweave file (.Snw), an .R file and a PDF file, which together contain all of the code and relevant annotation. Note that the PDF was created by running the .Snw file (as above).

Analysis Scripts

The R/Sweave code to analyse the data is contained in the folders docetaxelAnalysis/, bortezomibAnalysis/, cisplatinAnalysis/ and cerlotonibAndSorafenib/.

Scripts to acquire and preprocess data

The code to preprocess all raw data is contained in the folders processDocetaxelData/, processRawCisplatinData/, processRawErlotinibData/ and processRawGdscData/. The PDF files in these folders (generated from the .Snw files) contain instructions on where to acquire all of the raw data (i.e. .CEL files and phenotype data). If you do not wish to download and preprocess all of the data yourself, the preprocessed data is already contained in the Data/ folder (see below).
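After unzipping, it can be useful to confirm that the folders named above are actually present before running anything. The following is a small sketch under stated assumptions: the function name check_folders is hypothetical, and the usage line assumes that the folders sit directly under the paper/ directory used in the example above, which the text does not state explicitly.

```shell
#!/bin/sh
# check_folders ROOT NAME... (hypothetical helper): report every NAME that is
# not a directory under ROOT. Returns 0 if all exist, 1 otherwise.
check_folders() {
    root=$1
    shift
    missing=0
    for name in "$@"; do
        if [ ! -d "$root/$name" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (the location under paper/ is an assumption):
# check_folders paper processDocetaxelData processRawCisplatinData \
#     processRawErlotinibData processRawGdscData Data
```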


The Data/ folder contains the data that was created by the scripts above (i.e. all of the preprocessed data).


This folder contains the script to create figure 2 (i.e. the PCA plots).


This folder contains figure 1 and some of the supplementary figures.


This folder contains some of the functions that are called by the scripts which analyse the data.