
Hypothesis-free NGS data analysis

Daniel Gautheret (Institute for Integrative Biology of the Cell, CNRS UMR 9198, Université Paris-Sud), invited by the LBPA laboratory, will give a lecture entitled "Hypothesis-free NGS data analysis" on Friday, 2 February.
IDA Building, Chemla amphitheater
Friday, 2 February 2018, 11:00 am to 12:30 pm (Europe/Paris)

Computational pipelines for NGS data analysis rely on multiple hypotheses and simplifications, which lead to a substantial loss of information. A major limiting factor is the mapping step, in which NGS reads are aligned to a reference genome or transcriptome. In RNA-seq analysis, relying on a reference transcriptome amounts to ignoring novel genes, alternative transcripts, and transcripts that originate from repeats or carry high levels of mutation or editing. Hundreds of dedicated software tools have been developed to bypass these limitations and retrieve specific event types, but their results diverge widely.

Our lab has developed a method for RNA-seq data analysis, DE-kupl (1), in which NGS data are analyzed at the level of the raw sequence: reads are decomposed into k-mers (i.e. subsequences of length k, typically with k = 31), and the k-mer counts are then subjected to differential expression analysis. Only k-mers that are differentially represented between two sets of libraries are extracted and analyzed.

Therefore, all biological variation present in the original NGS dataset is in principle captured, with no prior hypothesis about its origin.
We will show how DE-kupl can be applied to various experimental settings and present our plans for future developments, including application to the discovery of novel biomarkers based on clinically annotated DNA-seq or RNA-seq data.
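
To make the k-mer idea described above concrete, here is a minimal Python sketch. It is not the DE-kupl implementation: the function names (`count_kmers`, `differential_kmers`), the fold-change threshold and the pseudocount are illustrative assumptions. A real analysis would normalize for library size and apply a proper statistical test rather than a simple fold-change filter.

```python
from collections import Counter


def count_kmers(reads, k=31):
    """Count every subsequence of length k across a list of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def differential_kmers(group_a, group_b, k=31, min_fold=2.0, pseudocount=1.0):
    """Return k-mers whose mean count differs between two sets of libraries.

    group_a, group_b: lists of libraries, each library being a list of reads.
    The fold-change filter is a hypothetical stand-in for a statistical test.
    """
    def mean_counts(libraries):
        total = Counter()
        for library in libraries:
            total.update(count_kmers(library, k))
        return {kmer: c / len(libraries) for kmer, c in total.items()}

    mean_a = mean_counts(group_a)
    mean_b = mean_counts(group_b)

    selected = {}
    for kmer in set(mean_a) | set(mean_b):
        # The pseudocount avoids division by zero for k-mers absent from a group.
        a = mean_a.get(kmer, 0.0) + pseudocount
        b = mean_b.get(kmer, 0.0) + pseudocount
        fold = b / a
        if fold >= min_fold or fold <= 1.0 / min_fold:
            selected[kmer] = fold
    return selected


if __name__ == "__main__":
    # Toy libraries with k=4 so the example stays readable.
    control = [["ACGT"], ["ACGT"]]
    treated = [["ACGT", "TTTT", "TTTT", "TTTT"], ["ACGT", "TTTT", "TTTT"]]
    print(differential_kmers(control, treated, k=4))
```

No reference genome or transcriptome appears anywhere in the sketch: the selected k-mers are defined purely by their counts in the libraries. For simplicity the sketch also treats a k-mer and its reverse complement as distinct sequences.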

(1) Audoux J, Philippe N, Chikhi R, Salson M, Gallopin M, Gabriel M, Le Coz J, Commes T, Gautheret D. (2017) DE-kupl: Exhaustive capture of biological variation in RNA-seq data through k-mer decomposition. Genome Biol. 18: 243.