Biochemistry

Abstract

CS-16-6 - Leveraging More from de novo Transcriptome Assemblies Using Machine Learning

Monday, July 16
4:58 PM - 5:00 PM

Plant genomes are often large and costly to sequence compared with those of other eukaryotic systems. For many research applications, working with transcriptomes rather than full genomes is a much more cost-effective alternative. However, in the absence of a reference genome, de novo assembly of RNA-seq data is a considerable computational challenge: differences in gene expression produce transcript abundance levels that vary by several orders of magnitude, complicating error detection, and alternative splice isoforms are very difficult to resolve. Here, I will present a classification method based on machine learning to distinguish genuine paralogous gene copies from either splice isoforms or variant assemblies of the same gene. Features used in classification include pairwise alignment details and BLAST statistics, which are easy to obtain. This method is applicable to both single species and groups of related species, with a more robust feature set available in the latter case. I will show how this method has been used successfully to obtain primary coding sequences for the transcriptomes of two related species of Atriplex without the aid of a reference genome.
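As a rough illustration of what "pairwise alignment details" might look like as classifier inputs, the sketch below computes a few summary features from two already-aligned sequences. The abstract does not specify the feature definitions or the classifier, so the names `PairFeatures` and `alignment_features` and the three features chosen here are hypothetical. The intuition assumed is that splice isoforms tend to align with near-perfect identity interrupted by long contiguous gaps, whereas paralogs tend to show mismatches scattered along the alignment.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    """Hypothetical feature set; not the feature set used in the talk."""
    identity: float       # matching columns / columns without a gap
    gap_fraction: float   # columns containing a gap / all columns
    longest_gap_run: int  # length of the longest contiguous gap block


def alignment_features(aligned_a: str, aligned_b: str) -> PairFeatures:
    """Summarise a pairwise alignment given as two gapped strings ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = ungapped = gaps = run = longest = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            gaps += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
            ungapped += 1
            if x == y:
                matches += 1

    identity = matches / ungapped if ungapped else 0.0
    gap_fraction = gaps / len(aligned_a) if aligned_a else 0.0
    return PairFeatures(identity, gap_fraction, longest)


# Isoform-like pair: identical sequence apart from one contiguous gap block.
isoform_like = alignment_features("ATGGCC----TTAA", "ATGGCCGGGGTTAA")
# Paralog-like pair: no gaps, but substitutions scattered along the alignment.
paralog_like = alignment_features("ATGGCCTTAA", "ATGACCTTGA")
```

Feature values like these, together with BLAST statistics such as bit score and E-value, could be assembled into one row per contig pair and passed to any supervised classifier trained on labelled pairs.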


 

Co-Authors

Tammy Sage – University of Toronto; Rowan Sage – University of Toronto

Matt Stata

PhD Candidate
University of Toronto
