Transcriptome Assembly and Annotation

de novo Assembly

Sequencing reads from the Silene vulgaris transcriptome project were assembled into contigs using 454’s de novo Assembler v2.3 (“newbler”). One of the challenges of working with transcriptome sequence data (particularly from pooled samples of multiple individuals) is dealing with structural variation that can arise from alternative intron splicing and/or allelic variation. de novo Assembler addresses this issue by organizing related contigs into collections referred to as “isogroups”. In principle, the contigs from each isogroup should correspond to a single gene or locus (although grouping of closely related paralogs is also likely). Within each isogroup, contigs are connected in various combinations to form “isotigs”, which can loosely be thought of as individual transcripts or splice variants. Because most of the search features on this site are based on isotig sequences, it is important to recognize that each isogroup can have many associated isotigs, and that these isotigs likely do not represent distinct loci within the genome. More information on isogroups, isotigs and related assembly methods is available here.
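The relationship between isogroups and isotigs can be illustrated with a short Python sketch that groups isotig IDs by their parent isogroup. The header layout used here (an isotig ID followed by a gene=isogroup tag) is an assumption made for illustration, and the function name is hypothetical; check the headers in your own assembly output before relying on it.

```python
import re
from collections import defaultdict

# Assumed header layout (illustrative only):
#   >isotig00001 gene=isogroup00001 length=1250
HEADER_RE = re.compile(r"^>(\S+)\s+.*?gene=(\S+)")

def group_isotigs_by_isogroup(fasta_lines):
    """Map each isogroup ID to the list of isotig IDs seen in FASTA headers."""
    groups = defaultdict(list)
    for line in fasta_lines:
        match = HEADER_RE.match(line)
        if match:  # sequence lines and unrecognized headers are skipped
            isotig_id, isogroup_id = match.groups()
            groups[isogroup_id].append(isotig_id)
    return dict(groups)

# Made-up records: two isotigs from one isogroup, one from another.
example = [
    ">isotig00001 gene=isogroup00001 length=1250",
    "ACGT",
    ">isotig00002 gene=isogroup00001 length=980",
    "ACGT",
    ">isotig00003 gene=isogroup00002 length=1500",
    "ACGT",
]
groups = group_isotigs_by_isogroup(example)
```

In this toy input, the first isogroup has two isotigs. Both would appear as separate search results even though they presumably derive from the same locus.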

Assembly Results

Approximately 87% of the 959,520 sequencing reads were assembled into contigs with a total length of 25.4 Mb. These contigs were organized into 18,178 isogroups, representing a total of 37,976 isotigs. The average isotig size is 1.3 kb.
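As a quick check on what these figures imply, the arithmetic below derives the approximate number of assembled reads and the mean number of isotigs per isogroup. Only the input values come from the text above; the variable names are illustrative, and the 87% figure is itself approximate, so the read count is an estimate.

```python
total_reads = 959_520
assembled_fraction = 0.87   # "approximately 87%" in the text
isogroups = 18_178
isotigs = 37_976

# Estimated number of reads incorporated into contigs.
assembled_reads = round(total_reads * assembled_fraction)

# Mean number of isotigs (putative transcript variants) per isogroup.
isotigs_per_isogroup = isotigs / isogroups

print(f"Assembled reads (approx.): {assembled_reads:,}")
print(f"Isotigs per isogroup (mean): {isotigs_per_isogroup:.2f}")
```

This works out to roughly 835,000 assembled reads and an average of about two isotigs per isogroup.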

Transcriptome Annotation

Assembled transcriptome sequences were annotated by performing BLAST searches against multiple public databases. BLAST results were parsed with custom BioPerl scripts, which were modified from annotation scripts developed by Eli Meyer. Based on the top BLAST hits, transcriptome sequences were associated with gene names, protein domains, and gene ontology (GO) annotations.
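The annotation scripts used for this project were written in BioPerl and are not reproduced here. As a rough illustration of the "top BLAST hit" step only, the Python sketch below keeps the highest-scoring hit per query from BLAST's standard 12-column tabular output. The choice of tabular output, the bit-score criterion, the function name, and the example IDs are all assumptions, not details of the original pipeline.

```python
import csv
import io

def top_hits(tabular_text):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit per query.

    Expects BLAST tabular output with the standard 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best

# Made-up hits for two hypothetical isotigs.
example = (
    "isotig00001\tsubjA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t180\n"
    "isotig00001\tsubjB\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-60\t210\n"
    "isotig00002\tsubjC\t70.0\t150\t45\t0\t1\t150\t1\t150\t1e-20\t95\n"
)
hits = top_hits(example)
```

The subject ID of each top hit could then be used to look up gene names, protein domains, and GO terms in the corresponding database.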