Generation of genome-scale gene-associated SNPs in catfish for the construction of a high-density SNP array.
ABSTRACT: BACKGROUND: Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies. In order to provide the best genome coverage for the analysis of performance and production traits, a large number of relatively evenly distributed SNPs are needed. Gene-associated SNPs may fulfill these requirements of large numbers and genome wide distribution. In addition, gene-associated SNPs could themselves be causative SNPs for traits. The objective of this project was to identify large numbers of gene-associated SNPs using high-throughput next generation sequencing. RESULTS: Transcriptome sequencing was conducted for channel catfish and blue catfish using Illumina next generation sequencing technology. Approximately 220 million reads (15.6 Gb) for channel catfish and 280 million reads (19.6 Gb) for blue catfish were obtained by sequencing gene transcripts derived from various tissues of multiple individuals from a diverse genetic background. A total of over 35 billion base pairs of expressed short read sequences were generated. Over two million putative SNPs were identified from channel catfish and almost 2.5 million putative SNPs were identified from blue catfish. Of these putative SNPs, a set of filtered SNPs were identified including 342,104 intra-specific SNPs for channel catfish, 366,269 intra-specific SNPs for blue catfish, and 420,727 inter-specific SNPs between channel catfish and blue catfish. These filtered SNPs are distributed within 16,562 unique genes in channel catfish and 17,423 unique genes in blue catfish. CONCLUSIONS: For aquaculture species, transcriptome analysis of pooled RNA samples from multiple individuals using Illumina sequencing technology is both technically efficient and cost-effective for generating expressed sequences. 
Such an approach is most effective when coupled to existing EST resources generated using traditional sequencing approaches because the reference ESTs facilitate effective assembly of the expressed short reads. When multiple individuals with different genetic backgrounds are used, RNA-Seq is very effective for the identification of SNPs. The SNPs identified in this report will provide a much needed resource for genetic studies in catfish and will contribute to the development of a high-density SNP array. Validation and testing of these SNPs using SNP arrays will form the material basis for genome association studies and whole genome-based selection in catfish.
Project description:Background: The yellow catfish, Pelteobagrus fulvidraco, belonging to the Siluriformes order, is an economically important freshwater aquaculture fish species in Asia, especially in Southern China. The aquaculture industry has recently been facing tremendous challenges in germplasm degeneration and poor disease resistance. As the yellow catfish exhibits notable sex dimorphism in growth, with adult males about two- to three-fold bigger than females, the way in which the aquaculture industry takes advantage of such sex dimorphism is another challenge. To address these issues, a high-quality reference genome of the yellow catfish would be a very useful resource. Findings: To construct a high-quality reference genome for the yellow catfish, we generated 51.2 Gb short reads and 38.9 Gb long reads using Illumina and Pacific Biosciences (PacBio) sequencing platforms, respectively. The sequencing data were assembled into a 732.8 Mb genome assembly with a contig N50 length of 1.1 Mb. Additionally, we applied Hi-C technology to identify contacts among contigs, which were then used to assemble contigs into scaffolds, resulting in a genome assembly with 26 chromosomes and a scaffold N50 length of 25.8 Mb. Using 24,552 protein-coding genes annotated in the yellow catfish genome, the phylogenetic relationships of the yellow catfish with other teleosts showed that yellow catfish separated from the common ancestor of channel catfish ~81.9 million years ago. We identified 1,717 gene families to be expanded in the yellow catfish, and those gene families are mainly enriched in the immune system, signal transduction, glycosphingolipid biosynthesis, and fatty acid biosynthesis. Conclusions: Taking advantage of Illumina, PacBio, and Hi-C technologies, we constructed the first high-quality chromosome-level genome assembly for the yellow catfish P. fulvidraco. 
The genomic resources generated in this work not only offer a valuable reference genome for functional genomics studies of yellow catfish to decipher the economic traits and sex determination but also provide important chromosome information for genome comparisons in the wider evolutionary research community.
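The contig and scaffold N50 values quoted above are summary statistics over sequence lengths. A minimal sketch of how an N50 is conventionally computed is given below; the function name `n50` and the toy lengths are illustrative assumptions, not data or code from the study.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not from the yellow catfish genome).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

Applied to real assemblies, the same function would take the list of contig (or scaffold) lengths read from a FASTA index.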
Project description:Domestication and selection for important performance traits can impact the genome, which is most often reflected by reduced heterozygosity in and surrounding genes related to traits affected by selection. In this study, analysis of the genomic impact caused by domestication and artificial selection was conducted by investigating the signatures of selection using single nucleotide polymorphisms (SNPs) in channel catfish (Ictalurus punctatus). A total of 8.4 million candidate SNPs were identified by using next generation sequencing. On average, the channel catfish genome harbors one SNP per 116 bp. Approximately 6.6 million, 5.3 million, 4.9 million, 7.1 million and 6.7 million SNPs were detected in the Marion, Thompson, USDA103, Hatchery strain, and wild population, respectively. The allele frequencies of 407,861 SNPs differed significantly between the domestic and wild populations. With these SNPs, 23 genomic regions with putative selective sweeps were identified that included 11 genes. Although the function of the majority of the genes remains unknown in catfish, several genes with known function related to aquaculture performance traits were included in the regions with selective sweeps. These included a hypoxia-inducible factor 1 (HIF1) gene and the transporter gene ATP-binding cassette sub-family B member 5 (ABCB5). HIF1 is important for the response to hypoxia, and tolerance to low oxygen levels is a critical aquaculture trait. The large numbers of SNPs identified from this study are valuable for the development of high-density SNP arrays for genetic and genomic studies of performance traits in catfish.
Project description:BACKGROUND: EST sequencing is one of the most efficient means for gene discovery and molecular marker development, and can be additionally utilized in both comparative genome analysis and evaluation of gene duplications. While much progress has been made in catfish genomics, large-scale EST resources have been lacking. The objectives of this project were to construct primary cDNA libraries, to conduct initial EST sequencing to generate catfish EST resources, and to obtain baseline information about highly expressed genes in various catfish organs to provide a guide for the production of normalized and subtracted cDNA libraries for large-scale transcriptome analysis in catfish. RESULTS: A total of 17 cDNA libraries were constructed including 12 from channel catfish (Ictalurus punctatus) and 5 from blue catfish (I. furcatus). A total of 31,215 ESTs, with average length of 778 bp, were generated including 20,451 from the channel catfish and 10,764 from blue catfish. Cluster analysis indicated that 73% of channel catfish and 67% of blue catfish ESTs were unique within the project. Over 53% and 50% of the channel catfish and blue catfish ESTs, respectively, had significant similarities to known genes. All ESTs have been deposited in GenBank. Evaluation of the catfish EST resources demonstrated their potential for molecular marker development, comparative genome analysis, and evaluation of ancient and recent gene duplications. Subtraction of abundantly expressed genes in a variety of catfish tissues, identified here, will allow the production of low-redundancy libraries for in-depth sequencing. CONCLUSION: The sequencing of 31,215 ESTs from channel catfish and blue catfish has significantly increased the EST resources in catfish. The EST resources should provide the potential for microarray development, polymorphic marker identification, mapping, and comparative genome analysis.
Project description:BACKGROUND: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. RESULTS: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. CONCLUSIONS: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
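The high-quality SNP criterion stated above (at least four sequences, with the minor allele present in at least two) can be sketched as a simple filter. This is an illustrative assumption of how such a rule might be applied to one aligned column of a contig, not the pipeline used in the project; the function name and thresholds as parameters are hypothetical.

```python
from collections import Counter

MIN_DEPTH = 4  # at least four sequences cover the position
MIN_MINOR = 2  # minor allele seen in at least two sequences


def is_high_quality_snp(bases, min_depth=MIN_DEPTH, min_minor=MIN_MINOR):
    """Apply the depth and minor-allele rule to one alignment column.

    `bases` is a string with one base call per EST at a contig position;
    calls other than A, C, G or T are ignored.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    if sum(counts.values()) < min_depth or len(counts) < 2:
        return False
    # The second most frequent base is treated as the minor allele.
    minor_count = counts.most_common(2)[1][1]
    return minor_count >= min_minor


print(is_high_quality_snp("AAGG"))  # two alleles, each seen twice
print(is_high_quality_snp("AAAG"))  # minor allele seen only once
```

Requiring the minor allele in two independent sequences is a common way to reduce false positives caused by single-read sequencing errors.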
Project description:BACKGROUND:Genome annotation projects, gene functional studies, and phylogenetic analyses for a given organism all greatly benefit from access to a validated full-length cDNA resource. While increasingly common in model species, full-length cDNA resources in aquaculture species are scarce. METHODOLOGY AND PRINCIPAL FINDINGS:Through in silico analysis of catfish (Ictalurus spp.) ESTs, a total of 10,037 channel catfish and 7,382 blue catfish cDNA clones were identified as potentially encoding full-length cDNAs. Of this set, a total of 1,169 channel catfish and 933 blue catfish full-length cDNA clones were selected for re-sequencing to provide additional coverage and ensure sequence accuracy. A total of 1,745 unique gene transcripts were identified from the full-length cDNA set, including 1,064 gene transcripts from channel catfish and 681 gene transcripts from blue catfish, with 416 transcripts shared between the two closely related species. Full-length sequence characteristics (ortholog conservation, UTR length, Kozak sequence, and conserved motifs) of the channel and blue catfish were examined in detail. Comparison of gene ontology composition between full-length cDNAs and all catfish ESTs revealed that the full-length cDNA set is representative of the gene diversity encoded in the catfish transcriptome. CONCLUSIONS:This study describes the first catfish full-length cDNA set constructed from several cDNA libraries. The catfish full-length cDNA sequences, and data gleaned from sequence characteristics analysis, will be a valuable resource for ongoing catfish whole-genome sequencing and future gene-based studies of function and evolution in teleost fishes.
Project description:BACKGROUND: Upon the completion of whole genome sequencing, thorough genome annotation that associates genome sequences with biological meanings is essential. Genome annotation depends on the availability of transcript information as well as orthology information. In teleost fish, genome annotation is seriously hindered by genome duplication. Because of gene duplications, one cannot establish orthologies simply by homology comparisons. Rather, intensive phylogenetic analysis or structural analysis of orthologies is required for the identification of genes. To conduct phylogenetic analysis and orthology analysis, full-length transcripts are essential. Generation of large numbers of full-length transcripts using traditional transcript sequencing is very difficult and extremely costly. RESULTS: In this work, we took advantage of a doubled haploid catfish, which has two identical sets of chromosomes and should therefore, in theory, have no allelic variation. As such, transcript sequences generated from next-generation sequencing can be favorably assembled into full-length transcripts. Deep sequencing of the doubled haploid channel catfish transcriptome was performed using the Illumina HiSeq 2000 platform, yielding over 300 million high-quality trimmed reads totaling 27 Gbp. Assembly of these reads generated 370,798 non-redundant transcript-derived contigs. Functional annotation of the assembly allowed identification of 25,144 unique protein-encoding genes. A total of 2,659 unique genes were identified as putative duplicated genes in the catfish genome because the assembly of the corresponding transcripts harbored PSVs or MSVs (in the form of pseudo-SNPs in the assembly). Of the 25,144 contigs with unique protein hits, around 20,000 contigs matched 50% length of reference proteins, and over 14,000 transcripts were identified as full-length with complete open reading frames. 
The characterization of consensus sequences surrounding the start codon and the stop codon confirmed the correct assembly of the full-length transcripts. CONCLUSIONS: The large set of transcripts assembled in this study is the most comprehensive set of genome resources ever developed from catfish, which will provide the much needed resources for functional genome research in catfish, serving as a reference transcriptome for genome annotation, analysis of gene duplication, gene family structures, and digital gene expression analysis. The putative set of duplicated genes provides a starting point for genome scale analysis of gene duplication in the catfish genome, and should be a valuable resource for comparative genome analysis, genome evolution, and genome function studies.
Project description:As an economically important fish in the southeastern Himalayas, the giant devil catfish (Bagarius yarrelli) is known for its extraordinarily large body size. It can grow up to 2 m, whereas the non-Bagarius sisorids only reach 10-30 cm. Another outstanding characteristic of Bagarius species is their salmonid-like reddish flesh color. Both body size and flesh color are scientifically interesting and are valuable features in aquaculture that are worthy of in-depth investigation. Bagarius species are therefore ideal materials for studying body size evolution and color deposition in fish muscle, and are also potential candidates for extensive utilization in Asian freshwater aquaculture. Using a combination of Illumina and PacBio sequencing technologies, we de novo assembled a 571-Mb genome for the giant devil catfish from a total of 153.4 Gb of clean reads. The scaffold and contig N50 values are 3.1 and 1.6 Mb, respectively. The genome assembly showed 93.4% Benchmarking Universal Single-Copy Orthologs completeness and 98% transcript coverage, and was highly homologous to a chromosome-level genome assembly of channel catfish (Ictalurus punctatus). We found that 35.26% of the genome assembly is composed of repetitive elements. Employing homology, de novo, and transcriptome-based annotations, we annotated a total of 19,027 protein-coding genes for further use. In summary, we generated the first high-quality genome assembly of the giant devil catfish, which provides an important genomic resource for future studies of traits such as body size and flesh color, and for facilitating the conservation and utilization of this valuable catfish.
Project description:This study examined differentially expressed (DE) gene transcripts and regulated pathways of two geographically distinct channel catfish (Ictalurus punctatus) strains and one hybrid catfish (I. punctatus x [blue catfish] I. furcatus) strain to test whether one particular catfish type handled thermal stress better. Following a six-week growth experiment in which fish were subjected to daily cycling temperatures of either 27-31°C or 32-36°C, mimicking pond fluctuations, we sequenced 18 cDNA libraries of liver samples to obtain 61 million reads per library. There were 5,443 DE transcripts and 41,689 regulated pathways. Northern channel catfish had the highest number of DE transcripts (48.6%), 5 times that of southern channel catfish, and the greatest number of transcripts with fold changes ≥ 2. The overall number of temperature-induced DE transcripts in southern hybrid and southern channel catfish was fairly comparable relative to that of northern channel catfish; however, more transcripts were up- or downregulated with ≥ 2-fold changes in the channel catfish strains than in the southern hybrid catfish. Results from this study strongly suggest that genetic differences between geographic catfish types affect physiological responses to thermal stress. Furthermore, a number of genes were linked to thermal stress tolerance, which may be beneficial for understanding geographic differences in thermal stress tolerance in ectotherms and for strain development of catfish. Hepatic mRNA profiles of three fingerling catfish types following a six week growth experiment of daily cycling temperatures of either 27-31°C or 32-36°C, mimicking pond fluctuations.
Project description:Channel catfish (Ictalurus spp.) is an economically important species in freshwater aquaculture around the world and occupies a prominent position in the aquaculture industry of the United States. MicroRNAs (miRNAs) play important roles in the regulation of almost every biological process in eukaryotes; however, there is little information available concerning miRNAs in channel catfish. In this study, a small-RNA cDNA library was constructed from 10 tissues of channel catfish, and Solexa sequencing technology was used to perform high-throughput sequencing of the library. A total of 14,919,026 raw reads, representing 161,288 unique sequences, were obtained from the small-cDNA library. After comparing the small RNA sequences with the RFam database, 4,542,396 reads that represent 25,538 unique sequences were mapped to the genome sequence of zebrafish to perform distribution analysis and to screen for candidate miRNA genes. Subsequent bioinformatic analysis identified 237 conserved miRNAs and 45 novel miRNAs in the channel catfish. Stem-loop RT-PCR was applied to validate and profile the expression of the novel miRNAs in 10 tissues. Some novel miRNAs, such as ipu-miR-129b, ipu-miR-7562 and ipu-miR-7553, were expressed in all tissues examined. However, some novel miRNAs appear to be tissue specific. Ipu-miR-7575 is predominantly expressed in stomach. Ipu-miR-7147 and ipu-miR-203c are highly expressed in heart, but are relatively weakly expressed in other tissues. Based on sequence complementarity between miRNAs and mRNA targets, potential target sequences for the 45 novel miRNAs were identified by searching for antisense hits in the reference RNA sequences of the channel catfish. These potential target sequences are involved in immune regulation, transcriptional regulation, metabolism and many other biological functions. 
The discovery of miRNAs in the channel catfish genome by this study contributes to a better understanding of the role miRNAs play in regulating diverse biological processes in fish and vertebrates.
Project description:Tra catfish (Pangasianodon hypophthalmus), also known as striped catfish, is a facultative air-breather that uses its swim bladder as an air-breathing organ (ABO). A related species in the same order (Siluriformes), channel catfish (Ictalurus punctatus), does not possess an ABO and thus cannot breathe in the air. Tra and channel catfish serve as great comparative models for investigating possible genetic underpinnings of aquatic to land transitions, as well as for understanding genes that are crucial for the development of the swim bladder and the function of air-breathing in tra catfish. In this study, hypoxia challenge and microtomy experiments collectively revealed critical time points for the development of the air-breathing function and swim bladder in tra catfish. Seven developmental stages in tra catfish were selected for RNA-seq analysis based on their transition to a stage that could live at 0 ppm oxygen. More than 587 million sequencing clean reads were generated, and a total of 21,448 unique genes were detected. A comparative genomic analysis between channel catfish and tra catfish revealed 76 genes that were present in tra catfish, but absent from channel catfish. In order to further narrow down the list of these candidate genes, gene expression analysis was performed for these tra catfish-specific genes. Fourteen genes were inferred to be important for air-breathing. Of these, HRG, GRP, and CX3CL1 were identified to be the most likely genes related to air-breathing ability in tra catfish. This study provides a foundational data resource for functional genomic studies in air-breathing function in tra catfish and sheds light on the adaptation of aquatic organisms to the terrestrial environment.