Project description:We present a draft genome assembly that includes 200 Gb of Illumina reads, 4 Gb of Moleculo synthetic long-reads and 108 Gb of Chicago libraries, with a final size matching the estimated genome size of 2.7 Gb, and a scaffold N50 of 4.8 Mb. We also present an alternative assembly including 27 Gb of raw reads generated using the Pacific Biosciences platform. In addition, we sequenced the proteome of the same individual and RNA from three different tissue types from three other squid species (Onychoteuthis banksii, Dosidicus gigas, and Sthenoteuthis oualaniensis) to assist genome annotation. We annotated 33,406 protein-coding genes supported by evidence, and the genome completeness estimated by BUSCO reached 92%. Repetitive regions cover 49.17% of the genome.
Project description:Insect-derived cell lines from Spodoptera frugiperda (Sf21) and Trichoplusia ni (High Five) are among the most widely used systems for recombinant protein expression in the Baculoviral Expression Vector System (BEVS). Genomic sequences and annotations are still incomplete for Sf21 and absent for High Five. In this study we present an approach that combines short-read sequencing, long synthetic reads, and Oxford Nanopore reads to build genomes at an unprecedented resolution. The Sf21 assembly contains 4,020 scaffolds totaling 463 Mb with an N50 of 364 Kb, and the High Five assembly contains 2,954 scaffolds totaling 332 Mb with an N50 of 326 Kb. Furthermore, we build a new gene prediction workflow that integrates transcriptome and proteome information using pre-existing tools. Using this approach, we could predict 21,506 Sf21 genes and 14,159 High Five genes, which were then functionally annotated. Finally, we also generate and integrate proteomic datasets to validate predicted genes. This integrative approach could in principle be applied to any uncharacterized genome and yield valuable new resources. With this information available, Sf21 and High Five cells will become even better tools for protein expression and could be used in a wider range of applications, from promoter identification to genome engineering and editing.
Project description:Macaque species share over 93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. To close this gap and enhance functional genomics approaches, we employed a combination of de novo linked-read assembly and scaffolding using proximity ligation assay (HiC) to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded large chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells (iPSCs) derived from the same animal. Reconstruction of the evolutionary tree using whole genome annotation and orthologous comparisons among three macaque species, human and mouse genomes revealed extensive homology between human and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques exhibit a closer evolutionary distance to each other than either species exhibits to humans or pig-tailed macaques. These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
Project description:We used an approach combining PacBio data and published Illumina reads to de novo assemble D. busckii contigs. We generated Hi-C data from D. busckii embryos to order these contigs into chromosome-length scaffolds. For D. virilis we generated Hi-C data to order and orient the published Dvir_caf1 scaffolds into chromosome-length assemblies. Furthermore, we compared Hi-C matrices from these two new assemblies with D. melanogaster with respect to synteny blocks and dosage compensation as a chromosome-wide gene-regulatory mechanism.
Project description:The domestic goat, Capra hircus (2n=60), is one of the most important domestic livestock species in the world. Here we report its high-quality reference genome, generated by combining Illumina short-read sequencing with a new automated, high-throughput whole genome mapping system based on optical mapping technology, which was used to generate extremely long super-scaffolds. The N50 sizes of contigs, scaffolds, and super-scaffolds for the sequence assembly reported herein are 18.7 kb, 3.06 Mb, and 18.2 Mb, respectively. Almost 95% of the super-scaffolds are anchored on chromosomes based on conserved syntenic information with cattle. The assembly is strongly supported by the RH map of goat chromosome 1. We annotated 22,175 protein-coding genes, most of which are recovered by RNA-seq data of ten tissues. Rapidly evolving genes and gene families are enriched in metabolism and immune systems, consistent with the fact that the goat is one of the most adaptable and geographically widespread livestock species. Comparative transcriptomic analysis of the primary and secondary follicles of a cashmere goat revealed 51 genes that were significantly differentially expressed between the two types of hair follicles. This study not only provides a high-quality reference genome for an important livestock species, but also shows that the new automated optical mapping technology can be used in the de novo assembly of large genomes. Corresponding whole genome sequencing is available in NCBI BioProject PRJNA158393. We sequenced a 3-year-old female Yunnan black goat and constructed a reference sequence for this breed. To improve the quality of gene models, RNA samples from ten tissues (Bladder, Brain, Heart, Kidney, Liver, Lung, Lymph, Muscle, Ovary, Spleen) were extracted from the same goat that was sequenced.
To investigate the genetic basis underlying the development of cashmere fibers using the goat reference genome assembly and annotated genes, we extracted RNA samples from primary and secondary hair follicles of three Inner Mongolia cashmere goats and conducted transcriptome sequencing and DGE analysis. This submission represents the RNA-Seq component of the study.
Project description:Singapore grouper iridovirus (SGIV), one of the nucleocytoviricota viruses (NCVs), is a highly pathogenic iridovirid. SGIV infection results in massive economic losses to the aquaculture industry and significantly threatens global biodiversity. In recent years, high morbidity and mortality in aquatic animals have been caused by iridovirid infections worldwide. Effective control and prevention strategies are urgently needed. Here, we present a near-atomic architecture of the SGIV capsid and identify eight types of capsid proteins. The viral inner membrane-integrated anchor protein colocalizes with the endoplasmic reticulum (ER), supporting the hypothesis that the biogenesis of the inner membrane is associated with the ER. Additionally, immunofluorescence assays indicate minor capsid proteins (mCPs) could form various building blocks with major capsid proteins (MCPs) before the formation of a viral factory (VF). These results expand our understanding of the capsid assembly of NCVs and provide more targets for vaccine and drug design to fight iridovirid infections.
Project description:The ratmouth barbel (Ptychidio jordani) is a freshwater fish of the family Cyprinidae that is critically endangered, primarily due to overfishing and habitat disruption. To address the challenges of its shrinking wild populations and the difficulties in artificial reproduction, we sequenced, assembled, and annotated a high-quality chromosome-level genome of P. jordani using next-generation short-read sequencing, third-generation long-read sequencing, and Hi-C sequencing. The final genome assembly was 1.14 Gb, consisting of 25 chromosomes with a contig N50 of 25.14 Mb and a scaffold N50 of 42.91 Mb. We identified 25,183 protein-coding genes, 751.75 Mb of repeats, and 19,373 ncRNAs. Methylation loci on most chromosomes ranged from 1,000 to 3,000 per 100 kb window. Gene expression levels across various tissues were analyzed, revealing 12,135 (caudal fin), 11,465 (liver), 14,438 (gill), 12,413 (heart), 8,301 (spleen), and 3,578 (kidney) differentially expressed genes compared to muscle. The comprehensive genomic and transcriptomic resources generated here will aid in understanding the ecology, adaptation, and environmental responses of P. jordani, supporting future research and conservation efforts.
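
Several of the descriptions above report contig or scaffold N50 values. As a reading aid, the following is a minimal sketch of how an N50 is conventionally computed from a list of sequence lengths: sort the lengths from longest to shortest and return the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from any of the projects described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running total
        # to at least half of the assembly size.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in arbitrary units), total = 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 80 + 70 = 150 reaches half of 300, so N50 is 70
```

In practice the lengths would be read from an assembly FASTA or its index; the same calculation applies whether the input sequences are contigs, scaffolds, or super-scaffolds.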