SimpleSynteny: a web-based tool for visualization of microsynteny across multiple species.
ABSTRACT: Defining syntenic relationships among orthologous gene clusters is a frequent undertaking of biologists studying organismal evolution through comparative genomic approaches. With the increasing availability of genome data made possible through next-generation sequencing technology, there is a growing need for user-friendly tools capable of assessing synteny. Here we present SimpleSynteny, a new web-based platform capable of directly interrogating collinearity of local genomic neighbors across multiple species in a targeted manner. SimpleSynteny provides a pipeline for evaluating the synteny of a preselected set of gene targets across multiple organismal genomes. An emphasis has been placed on ease-of-use, and users are only required to submit FASTA files for their genomes and genes of interest. SimpleSynteny then guides the user through an iterative process of exploring and customizing genomes individually before combining them into a final high-resolution figure. Because the process is iterative, it allows the user to customize the organization of multiple contigs and incorporate knowledge from additional sources, rather than forcing complete dependence on the computational predictions. Additional tools are provided to help the user identify which contigs in a genome assembly contain gene targets and to optimize analyses of circular genomes. SimpleSynteny is freely available at: http://www.SimpleSynteny.com.
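The collinearity question that SimpleSynteny addresses can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names, the gene labels and the rule used here (shared genes must appear in the same order, or in fully reversed order, relative to a reference genome) are assumptions made purely for illustration.

```python
from typing import Dict, List


def shared_order(reference: List[str], other: List[str]) -> List[int]:
    """Map each gene of `other` that also occurs in `reference` to its reference rank."""
    ref_rank = {gene: i for i, gene in enumerate(reference)}
    return [ref_rank[g] for g in other if g in ref_rank]


def is_collinear(reference: List[str], other: List[str]) -> bool:
    """True if the shared genes keep the reference order, or its exact reverse."""
    ranks = shared_order(reference, other)
    if len(ranks) < 2:
        return False  # fewer than two shared genes: order is undefined
    ascending = all(a < b for a, b in zip(ranks, ranks[1:]))
    descending = all(a > b for a, b in zip(ranks, ranks[1:]))
    return ascending or descending


# Hypothetical gene orders along one contig per species.
reference_order = ["geneA", "geneB", "geneC", "geneD"]
genomes: Dict[str, List[str]] = {
    "species1": ["geneA", "geneB", "geneC", "geneD"],  # same order
    "species2": ["geneD", "geneC", "geneA"],           # reversed, geneB missing
    "species3": ["geneB", "geneA", "geneC"],           # local rearrangement
}
collinear = {name: is_collinear(reference_order, order) for name, order in genomes.items()}
```

A reversed order is treated as collinear here because a contig may simply be reported on the opposite strand; a stricter analysis could also track the orientation of each gene.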
Project description:Insect genomes contain larger blocks of conserved gene order (microsynteny) than would be expected under a random breakage model of chromosome evolution. We present evidence that microsynteny has been retained to keep large arrays of highly conserved noncoding elements (HCNEs) intact. These arrays span key developmental regulatory genes, forming genomic regulatory blocks (GRBs). We recently described GRBs in vertebrates, where most HCNEs function as enhancers and HCNE arrays specify complex expression programs of their target genes. Here we present a comparison of five Drosophila genomes showing that HCNE density peaks centrally in large synteny blocks containing multiple genes. Besides developmental regulators that are likely targets of HCNE enhancers, HCNE arrays often span unrelated neighboring genes. We describe differences in core promoters between the target genes and the unrelated genes that offer an explanation for the differences in their responsiveness to enhancers. We show examples of a striking correspondence between boundaries of synteny blocks, HCNE arrays, and Polycomb binding regions, confirming that the synteny blocks correspond to regulatory domains. Although few noncoding elements are highly conserved between Drosophila and the malaria mosquito Anopheles gambiae, we find that A. gambiae regions orthologous to Drosophila GRBs contain an equivalent distribution of noncoding elements highly conserved in the yellow fever mosquito Aëdes aegypti and coincide with regions of ancient microsynteny between Drosophila and mosquitoes. The structural and functional equivalence between insect and vertebrate GRBs marks them as an ancient feature of metazoan genomes and as a key to future studies of development and gene regulation.
Project description:Nematodes are an attractive group of organisms for studying the evolution of developmental processes. Pristionchus pacificus was established as a satellite organism for comparing vulva development and other processes to Caenorhabditis elegans. The generation of a genetic linkage map of P. pacificus has provided a first insight into the structure and organization of the genome of this species. Pristionchus pacificus and C. elegans are separated from one another by >100 000 000 years such that the structure of the genomes of these two nematodes might differ substantially. To evaluate the amount of synteny between the two genomes, we have obtained 126 kb of continuous genomic sequence of P. pacificus, flanking the developmental patterning gene pal-1. Of the 20 predicted open reading frames in this interval, 11 have C. elegans orthologs. Ten of these 11 orthologs are located on C. elegans chromosome III, indicating the existence of synteny. However, most of these genes are distributed over a 12 Mb interval of the C. elegans genome and only three pairs of genes show microsynteny. Thus, intrachromosomal rearrangements occur frequently in nematodes, limiting the likelihood of identifying orthologous genes of P. pacificus and C. elegans based on positional information within the two genomes.
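The distinction drawn above between synteny (orthologs on the same chromosome) and microsynteny (neighboring genes whose orthologs are also close together) can be sketched in Python. The abstract does not state how microsynteny was scored, so the distance threshold, the function name and the coordinates below are hypothetical and serve only as an illustration.

```python
from typing import List, Tuple

# (gene name, chromosome of the ortholog, ortholog position in bp),
# listed in the order the genes occur along the query interval.
Ortholog = Tuple[str, str, int]


def microsyntenic_pairs(orthologs: List[Ortholog], max_gap_bp: int) -> List[Tuple[str, str]]:
    """Return neighboring query genes whose orthologs lie on the same
    chromosome within `max_gap_bp` of each other."""
    pairs = []
    for (g1, chr1, pos1), (g2, chr2, pos2) in zip(orthologs, orthologs[1:]):
        if chr1 == chr2 and abs(pos1 - pos2) <= max_gap_bp:
            pairs.append((g1, g2))
    return pairs


# Hypothetical data: four neighboring genes and their ortholog locations.
example = [
    ("orth1", "III", 1_000_000),
    ("orth2", "III", 1_020_000),  # close to orth1: microsyntenic pair
    ("orth3", "III", 9_500_000),  # same chromosome, far away: synteny only
    ("orth4", "X", 9_510_000),    # different chromosome: no synteny
]
pairs = microsyntenic_pairs(example, max_gap_bp=50_000)
```

With this definition, genes can share a chromosome (synteny) without forming microsyntenic pairs, which is the pattern the abstract reports.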
Project description:SyMAP (Synteny Mapping and Analysis Program) was originally developed to compute synteny blocks between a sequenced genome and a FPC map, and has been extended to support pairs of sequenced genomes. SyMAP uses MUMmer to compute the raw hits between the two genomes, which are then clustered and filtered using the optional gene annotation. The filtered hits are input to the synteny algorithm, which was designed to discover duplicated regions and form larger-scale synteny blocks, where intervening micro-rearrangements are allowed. SyMAP provides extensive interactive Java displays at all levels of resolution along with simultaneous displays of multiple aligned pairs. The synteny blocks from multiple chromosomes may be displayed in a high-level dot plot or three-dimensional view, and the user may then drill down to see the details of a region, including the alignments of the hits to the gene annotation. These capabilities are illustrated by showing their application to the study of genome duplication, differential gene loss and transitive homology between sorghum, maize and rice. The software may be used from a website or standalone for the best performance. A project manager is provided to organize and automate the analysis of multi-genome groups. The software is freely distributed at http://www.agcol.arizona.edu/software/symap.
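The general idea of grouping raw hits into synteny blocks can be illustrated with a deliberately simplified Python sketch. This is not SyMAP's algorithm: it is a greedy, forward-strand-only chaining that does not tolerate intervening micro-rearrangements, and the function name, the `max_gap` parameter and the coordinates are assumptions for illustration.

```python
from typing import List, Tuple

Hit = Tuple[int, int]  # (position in genome 1, position in genome 2)


def chain_hits(hits: List[Hit], max_gap: int) -> List[List[Hit]]:
    """Greedily group hits into blocks: after sorting by genome-1 position,
    a hit extends the current block if it advances in both genomes by at
    most `max_gap`; otherwise it starts a new block."""
    blocks: List[List[Hit]] = []
    for pos1, pos2 in sorted(hits):
        if blocks:
            last1, last2 = blocks[-1][-1]
            if pos1 - last1 <= max_gap and 0 < pos2 - last2 <= max_gap:
                blocks[-1].append((pos1, pos2))
                continue
        blocks.append([(pos1, pos2)])
    return blocks


# Hypothetical hits forming two separate collinear runs.
hits = [(100, 1000), (300, 1200), (200, 1100), (5000, 90000), (5100, 90100)]
blocks = chain_hits(hits, max_gap=500)
```

A production method would also handle inverted blocks, filter hits using gene annotation and merge blocks across small rearrangements, as the abstract describes for SyMAP.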
Project description:Many genomes display high levels of heterozygosity (i.e. presence of different alleles at the same loci in homologous chromosomes), with those of hybrid organisms being an extreme case. The assembly of highly heterozygous genomes from short sequencing reads is a challenging task because it is difficult to accurately recover the different haplotypes. When confronted with highly heterozygous genomes, the standard assembly process tends to collapse homozygous regions and report heterozygous regions in alternative contigs. The boundaries between homozygous and heterozygous regions result in multiple assembly paths that are hard to resolve, which leads to highly fragmented assemblies with a total size larger than expected. This, in turn, causes numerous problems in downstream analyses such as fragmented gene models, wrong gene copy number, or broken synteny. To address these issues we have developed a pipeline that specifically deals with the assembly of heterozygous genomes by introducing a step to recognise and selectively remove alternative heterozygous contigs. We tested our pipeline on simulated and naturally-occurring heterozygous genomes and compared its accuracy to other existing tools. Our method is freely available at https://github.com/Gabaldonlab/redundans.
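The step of recognising and removing alternative heterozygous contigs can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the input format (precomputed pairwise alignments), the function name and both thresholds are hypothetical.

```python
from typing import Dict, List, Tuple

# (contig a, contig b, alignment identity in 0..1, aligned length in bp)
Alignment = Tuple[str, str, float, int]


def remove_redundant(lengths: Dict[str, int], alignments: List[Alignment],
                     min_identity: float = 0.9, min_overlap: float = 0.8) -> List[str]:
    """Drop the shorter contig of a pair when it is similar enough to, and
    mostly covered by, a longer contig that is itself still retained."""
    removed = set()
    for a, b, identity, aligned_bp in alignments:
        shorter, longer = sorted((a, b), key=lambda c: lengths[c])
        if shorter in removed or longer in removed:
            continue  # only compare contigs that are both still in the assembly
        if identity >= min_identity and aligned_bp / lengths[shorter] >= min_overlap:
            removed.add(shorter)
    return [c for c in lengths if c not in removed]


# Hypothetical contigs: c2 is an alternative copy of part of c1, c3 is not.
contig_lengths = {"c1": 10_000, "c2": 4_000, "c3": 3_000}
pairwise = [("c1", "c2", 0.95, 3_800), ("c1", "c3", 0.95, 1_000)]
kept = remove_redundant(contig_lengths, pairwise)
```

Keeping the longer contig of each redundant pair reduces the inflated assembly size described above while preserving one representative of each heterozygous region.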
Project description:BACKGROUND: The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. RESULTS: Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better able to produce (nearly) complete bacterial genomes. CONCLUSIONS: The current work describes our SSPACE-LongRead software, which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner.
Project description:Web-based synteny visualization tools are important for sharing data and revealing patterns of complicated genome conservation and rearrangements. Such tools should allow biologists to upload genomic data for their own analysis. This requirement is critical because individual biologists are generating large amounts of genomic sequences that quickly overwhelm the capacity of any centralized web resource to collect and display all those data. Recently, we published a web-based synteny viewer, GSV, which was designed to satisfy the above requirement. However, GSV can only compare two genomes at a given time. Extending the functionality of GSV to visualize multiple genomes is important to meet the increasing demand of the research community. We have developed a multi-Genome Synteny Viewer (mGSV). Similar to GSV, mGSV is a web-based tool that allows users to upload their own genomic data files for visualization. Multiple genomes can be presented in a single integrated view with an enhanced user interface. Users can navigate through all the selected genomes in either pairwise or multiple viewing mode to examine conserved genomic regions as well as the accompanying genome annotations. Besides serving users who manually interact with the web server, mGSV also provides Web Services for machine-to-machine communication to accept data sent by other remote resources. The entire mGSV package can also be downloaded for easy local installation. mGSV significantly enhances the original functionalities of GSV. A web server hosting mGSV is provided at http://cas-bioinfo.cas.unt.edu/mgsv.
Project description:Rapid identification of non-human sequences (RINS) is an intersection-based pathogen detection workflow that utilizes a user-provided custom reference genome set for identification of non-human sequences in deep sequencing datasets. In <2 h, RINS correctly identified the known virus in the dataset SRR73726 and is compatible with any computer capable of running the prerequisite alignment and assembly programs. RINS accurately identifies sequencing reads from intact or mutated non-human genomes in a dataset and robustly generates contigs with these non-human sequences (Supplementary Material). RINS is available for free download at http://khavarilab.stanford.edu/resources.html.
Project description:BACKGROUND: Fragaria belongs to the Rosaceae, an economically important family that includes a number of important fruit producing genera such as Malus and Prunus. Using genomic sequences from 50 Fragaria fosmids, we have examined the microsynteny between Fragaria and other plant models. RESULTS: In more than half of the strawberry fosmids, we found syntenic regions that are conserved in Populus, Vitis, Medicago and/or Arabidopsis with Populus containing the greatest number of syntenic regions with Fragaria. The longest syntenic region was between LG VIII of the poplar genome and the strawberry fosmid 72E18, where seven out of twelve predicted genes were collinear. We also observed an unexpectedly high level of conserved synteny between Fragaria (rosid I) and Vitis (basal rosid). One of the strawberry fosmids, 34E24, contained a cluster of R gene analogs (RGAs) with NBS and LRR domains. We detected clusters of RGAs with high sequence similarity to those in 34E24 in all the genomes compared. In the phylogenetic tree we have generated, all the NBS-LRR genes grouped together with Arabidopsis CNL-A type NBS-LRR genes. The Fragaria RGA grouped together with those of Vitis and Populus in the phylogenetic tree. CONCLUSIONS: Our analysis shows considerable microsynteny between Fragaria and other plant genomes such as Populus, Medicago, Vitis, and Arabidopsis to a lesser degree. We also detected a cluster of NBS-LRR type genes that are conserved in all the genomes compared.
Project description:BACKGROUND: It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances. RESULTS: After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on-the-fly and display the distribution of synteny blocks which are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene, to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Webstart at http://www.synteview.u-psud.fr. CONCLUSION: SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way which provides a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.
Project description:The reconstruction of the complete genome sequence of an organism is an important point for comparative, functional and evolutionary genomics. Nevertheless, overcoming the problems encountered while completing the sequence of an entire genome can still be demanding in terms of time and resources. We have developed Enly, a simple tool based on the iterative mapping of sequence reads at contig edges, capable of extending the genomic contigs derived from high-throughput sequencing, especially those derived from Newbler-like assemblies. Testing it on a set of de novo draft genomes led to the closure of up to 20% of the gaps originally present. Enly is cross-platform and most of the steps of its pipeline are parallelizable, making it easy and fast to improve a draft genome resulting from a de novo assembly.