Dataset Information

Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy.


ABSTRACT: Improvements in genome sequencing and assembly are enabling high-quality reference genomes for all species. However, the assembly process is still laborious, computationally and technically demanding, lacks standards for reproducibility, and is not readily scalable. Here we present the latest Vertebrate Genomes Project assembly pipeline and demonstrate that it delivers high-quality reference genomes at scale across a set of vertebrate species arising over the last ~500 million years. The pipeline is versatile and combines PacBio HiFi long reads and Hi-C-based haplotype phasing in a new graph-based paradigm. Standardized quality control is performed automatically to troubleshoot assembly issues and assess biological complexities. We make the pipeline freely accessible through Galaxy, accommodating researchers even without local computational resources, and enhance reproducibility by democratizing the training and assembly process. We demonstrate the flexibility and reliability of the pipeline by assembling reference genomes for 51 vertebrate species from major taxonomic groups (fish, amphibians, reptiles, birds, and mammals).

SUBMITTER: Lariviere D 

PROVIDER: S-EPMC10327048 | biostudies-literature | 2023 Jun

REPOSITORIES: biostudies-literature

Publications

Scalable, accessible, and reproducible reference genome assembly and evaluation in Galaxy.

Larivière Delphine D, Abueg Linelle L, Brajuka Nadolina N, Gallardo-Alba Cristóbal C, Grüning Bjorn B, Ko Byung June BJ, Ostrovsky Alex A, Palmada-Flores Marc M, Pickett Brandon D BD, Rabbani Keon K, Balacco Jennifer R JR, Chaisson Mark M, Cheng Haoyu H, Collins Joanna J, Denisova Alexandra A, Fedrigo Olivier O, Gallo Guido Roberto GR, Giani Alice Maria AM, Gooder Grenville MacDonald GM, Jain Nivesh N, Johnson Cassidy C, Kim Heebal H, Lee Chul C, Marques-Bonet Tomas T, O'Toole Brian B, Rhie Arang A, Secomandi Simona S, Sozzoni Marcella M, Tilley Tatiana T, Uliano-Silva Marcela M, van den Beek Marius M, Waterhouse Robert M RM, Phillippy Adam M AM, Jarvis Erich D ED, Schatz Michael C MC, Nekrutenko Anton A, Formenti Giulio G

bioRxiv: the preprint server for biology, 2023-06-30


Similar Datasets

| S-EPMC11462542 | biostudies-literature
| S-EPMC8213174 | biostudies-literature
| S-EPMC6901077 | biostudies-literature
| S-EPMC6030816 | biostudies-literature
| S-EPMC7319590 | biostudies-literature
| S-EPMC8178898 | biostudies-literature
| S-EPMC4410580 | biostudies-literature
| S-EPMC3878063 | biostudies-literature
| S-EPMC7488338 | biostudies-literature
| S-EPMC3753567 | biostudies-literature