Computational pipelines and knowledge-bases
Pipelines for the analysis of genomic data are an essential part of what we do. While much useful material is already available, our foray into complex biological systems and their interrogation via metagenomic sequencing have prompted the development of new kinds of analyses. One example comes from the work of Steven Quistad on microbial communities, where pipelines allow identification of DNA sequences that move via selfish genetic elements. Central to this project is so-called barcode sequencing to track gene transfer (project lead Andy Farr). Our pipelines can also be applied to track barcoded lineages over time and to measure genomic biodiversity, mean fitness, and selection coefficients, as well as the likelihood of adaptation in growth experiments on SBW25 in quasi-static cultures (project lead Loukas Theodosiou). As these pipelines are produced in a user-friendly format, they will be made available here.
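To give a feel for the quantities named above, here is a minimal sketch of how barcode read counts could be turned into lineage frequencies, a diversity index, and a selection coefficient. It is an illustration only and not the group's pipeline: the function names, the choice of the Shannon index as the diversity measure, and the estimate of the selection coefficient as the slope of ln(focal/reference) against time are all assumptions made for this example.

```python
import math

def frequencies(counts):
    """Convert raw barcode read counts (hypothetical dict: barcode -> reads)
    into relative frequencies."""
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}

def shannon_diversity(freqs):
    """Shannon index H = -sum(p * ln p), skipping barcodes with p == 0."""
    return -sum(p * math.log(p) for p in freqs.values() if p > 0)

def selection_coefficient(times, focal_freqs, ref_freqs):
    """Least-squares slope of ln(focal / reference) against time.

    Assumes all frequencies are strictly positive; a barcode that drops
    to zero reads would need separate handling (not shown here).
    """
    ys = [math.log(f / r) for f, r in zip(focal_freqs, ref_freqs)]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Made-up read counts for two barcodes at three time points (generations).
times = [0, 10, 20]
samples = [
    {"bc_focal": 500, "bc_ref": 500},
    {"bc_focal": 650, "bc_ref": 350},
    {"bc_focal": 800, "bc_ref": 200},
]
freqs = [frequencies(s) for s in samples]
s_hat = selection_coefficient(
    times,
    [f["bc_focal"] for f in freqs],
    [f["bc_ref"] for f in freqs],
)
print("Shannon diversity at t=0:", shannon_diversity(freqs[0]))
print("Estimated selection coefficient per generation:", s_hat)
```

The slope is expressed per unit of whatever the time axis measures (generations in this made-up example). A real analysis would also need to account for sequencing noise and sampling depth, which this sketch ignores.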
Beyond specific computational pipelines, Carsten Fortmann-Grote has established a knowledge-base focused on the SBW25 genome. Too often genomes, once sequenced and annotated, are left untouched and never updated despite the availability of new information. This is a travesty! Via the SBW25 knowledge-base it is possible to update information concerning the genome, for example, to add new findings on gene function and to correct or revise annotations. Also available within the knowledge-base is the usual suite of tools for comparative genome analysis, primer design, DNA sequence extraction, survey of interactions among proteins, and ortholog analysis. Connections to third-party *omics databases, as well as to scientific literature and other resources trawled from the web that are relevant to SBW25 and its particular genes, make the knowledge-base a one-stop shop for all SBW25-related information.
Additionally, the knowledge-base allows data from genomic analyses to be archived in a single place and made available to all users for further analysis. This includes data from re-sequencing projects, RNA-Seq, Tn-Seq, and barcode sequencing.
The resources will be made publicly available in the near future.