Towards the ictalurid catfish transcriptome: generation and analysis of 31,215 catfish ESTs.
ABSTRACT: BACKGROUND: EST sequencing is one of the most efficient means for gene discovery and molecular marker development, and can be additionally utilized in both comparative genome analysis and evaluation of gene duplications. While much progress has been made in catfish genomics, large-scale EST resources have been lacking. The objectives of this project were to construct primary cDNA libraries, to conduct initial EST sequencing to generate catfish EST resources, and to obtain baseline information about highly expressed genes in various catfish organs to provide a guide for the production of normalized and subtracted cDNA libraries for large-scale transcriptome analysis in catfish. RESULTS: A total of 17 cDNA libraries were constructed including 12 from channel catfish (Ictalurus punctatus) and 5 from blue catfish (I. furcatus). A total of 31,215 ESTs, with average length of 778 bp, were generated including 20,451 from the channel catfish and 10,764 from blue catfish. Cluster analysis indicated that 73% of channel catfish and 67% of blue catfish ESTs were unique within the project. Over 53% and 50% of the channel catfish and blue catfish ESTs, respectively, had significant similarities to known genes. All ESTs have been deposited in GenBank. Evaluation of the catfish EST resources demonstrated their potential for molecular marker development, comparative genome analysis, and evaluation of ancient and recent gene duplications. Subtraction of abundantly expressed genes in a variety of catfish tissues, identified here, will allow the production of low-redundancy libraries for in-depth sequencing. CONCLUSION: The sequencing of 31,215 ESTs from channel catfish and blue catfish has significantly increased the EST resources in catfish. The EST resources should provide the potential for microarray development, polymorphic marker identification, mapping, and comparative genome analysis.
Project description:BACKGROUND: Genome annotation projects, gene functional studies, and phylogenetic analyses for a given organism all greatly benefit from access to a validated full-length cDNA resource. While increasingly common in model species, full-length cDNA resources in aquaculture species are scarce. METHODOLOGY AND PRINCIPAL FINDINGS: Through in silico analysis of catfish (Ictalurus spp.) ESTs, a total of 10,037 channel catfish and 7,382 blue catfish cDNA clones were identified as potentially encoding full-length cDNAs. Of this set, a total of 1,169 channel catfish and 933 blue catfish full-length cDNA clones were selected for re-sequencing to provide additional coverage and ensure sequence accuracy. A total of 1,745 unique gene transcripts were identified from the full-length cDNA set, including 1,064 gene transcripts from channel catfish and 681 gene transcripts from blue catfish, with 416 transcripts shared between the two closely related species. Full-length sequence characteristics (ortholog conservation, UTR length, Kozak sequence, and conserved motifs) of the channel and blue catfish were examined in detail. Comparison of gene ontology composition between full-length cDNAs and all catfish ESTs revealed that the full-length cDNA set is representative of the gene diversity encoded in the catfish transcriptome. CONCLUSIONS: This study describes the first catfish full-length cDNA set constructed from several cDNA libraries. The catfish full-length cDNA sequences, and data gleaned from sequence characteristics analysis, will be a valuable resource for ongoing catfish whole-genome sequencing and future gene-based studies of function and evolution in teleost fishes.
Project description:BACKGROUND: Through the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification. RESULTS: A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and with the minor allele present in at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis. CONCLUSIONS: This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related ictalurid catfishes should also provide a powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the assembly of the complete catfish EST dataset will greatly benefit the catfish introgression breeding program and whole genome association studies.
Project description:BACKGROUND: Single nucleotide polymorphisms (SNPs) have become the marker of choice for genome-wide association studies. In order to provide the best genome coverage for the analysis of performance and production traits, a large number of relatively evenly distributed SNPs are needed. Gene-associated SNPs may fulfill these requirements of large numbers and genome wide distribution. In addition, gene-associated SNPs could themselves be causative SNPs for traits. The objective of this project was to identify large numbers of gene-associated SNPs using high-throughput next generation sequencing. RESULTS: Transcriptome sequencing was conducted for channel catfish and blue catfish using Illumina next generation sequencing technology. Approximately 220 million reads (15.6 Gb) for channel catfish and 280 million reads (19.6 Gb) for blue catfish were obtained by sequencing gene transcripts derived from various tissues of multiple individuals from a diverse genetic background. A total of over 35 billion base pairs of expressed short read sequences were generated. Over two million putative SNPs were identified from channel catfish and almost 2.5 million putative SNPs were identified from blue catfish. Of these putative SNPs, a set of filtered SNPs were identified including 342,104 intra-specific SNPs for channel catfish, 366,269 intra-specific SNPs for blue catfish, and 420,727 inter-specific SNPs between channel catfish and blue catfish. These filtered SNPs are distributed within 16,562 unique genes in channel catfish and 17,423 unique genes in blue catfish. CONCLUSIONS: For aquaculture species, transcriptome analysis of pooled RNA samples from multiple individuals using Illumina sequencing technology is both technically efficient and cost-effective for generating expressed sequences. 
Such an approach is most effective when coupled to existing EST resources generated using traditional sequencing approaches because the reference ESTs facilitate effective assembly of the expressed short reads. When multiple individuals with different genetic backgrounds are used, RNA-Seq is very effective for the identification of SNPs. The SNPs identified in this report will provide a much needed resource for genetic studies in catfish and will contribute to the development of a high-density SNP array. Validation and testing of these SNPs using SNP arrays will form the material basis for genome association studies and whole genome-based selection in catfish.
Project description:BACKGROUND: SNPs are abundant, codominantly inherited, and sequence-tagged markers. They are highly adaptable to large-scale automated genotyping, and therefore, are most suitable for association studies and applicable to comparative genome analysis. However, discovery of SNPs requires genome sequencing efforts through whole genome sequencing or deep sequencing of reduced representation libraries. Such genome resources are not yet available for many species including catfish. A large resource of ESTs is to become available in catfish, allowing identification of large numbers of SNPs, but the reliability of EST-derived SNPs is relatively low because of sequencing errors. This project was designed to answer some of the questions relevant to quality assessment of EST-derived SNPs. RESULTS: Two factors were found to be most significant for validation of EST-derived SNPs: the contig size (number of sequences in the contig) and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate, although the validation rate was reasonably high when the contigs contained four or more EST sequences with the minor allele sequence being represented at least twice in the contigs. Sequence quality surrounding the SNP under test is also crucially important. PCR extension appeared to be limited to a very short distance, prohibiting successful genotyping when an intron was present, a surprising finding. CONCLUSION: Stringent quality assessment measures should be used when working with EST-derived SNPs. In particular, contigs containing four or more ESTs should be used and the minor allele sequence should be represented at least twice. Genotyping primers should be designed from a single exon, completely avoiding introns. Application of such quality assessment measures, along with large resources of ESTs, should provide effective means for SNP identification in species where genome sequence resources are lacking.
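The two quality criteria recommended in the abstract above (a contig of four or more ESTs, with the minor allele seen at least twice) can be sketched as a simple filter. This is only an illustrative sketch: the function name, the argument layout, and the choice to treat the second most frequent base as the minor allele are assumptions, not the authors' pipeline, and the sequence-quality and intron criteria are not modeled.

```python
def passes_snp_filter(contig_size, allele_counts,
                      min_contig_size=4, min_minor_allele=2):
    """Hypothetical filter for one putative EST-derived SNP.

    contig_size   -- number of EST sequences in the contig
    allele_counts -- dict mapping each observed base to the number of
                     ESTs carrying it at the SNP position
    """
    if contig_size < min_contig_size:
        return False
    # Sort allele counts from most to least frequent.
    counts = sorted(allele_counts.values(), reverse=True)
    if len(counts) < 2:
        return False  # only one allele observed, so not a SNP
    # Assumption: the second most frequent base is the minor allele.
    return counts[1] >= min_minor_allele


candidates = [
    (4, {"A": 2, "G": 2}),  # meets both thresholds
    (3, {"A": 2, "G": 1}),  # contig too small
    (6, {"A": 5, "G": 1}),  # minor allele seen only once
]
kept = [c for c in candidates if passes_snp_filter(*c)]
```

The default thresholds mirror the numbers stated in the abstract; they are exposed as parameters so that a stricter cutoff can be tried without changing the function.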
Project description:BACKGROUND: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags (ESTs), and then assembled into contigs representative of putative genes. Usually, to be of value, ESTs and contigs must be associated with meaningful annotations, and made available to end-users. RESULTS: A web application, Expressed Sequence Tag Information Management and Annotation (ESTIMA), has been created to meet the EST annotation and data management requirements of multiple high-throughput EST sequencing projects. It is anchored on individual ESTs and organized around different properties of ESTs including chromatograms, base-calling quality scores, structure of assembled transcripts, and multiple sources of comparison to infer functional annotation, Gene Ontology associations, and cDNA library information. ESTIMA consists of a relational database schema and a set of interactive query interfaces. These are integrated with a suite of web-based tools that allow a user to query and retrieve information. Further, query results are interconnected among the various EST properties. ESTIMA has several unique features. Users may run their own EST processing pipeline, search against preferred reference genomes, and use any clustering and assembly algorithm. The ESTIMA database schema is very flexible and accepts output from any EST processing and assembly pipeline. ESTIMA has been used for the management of EST projects of many species, including honeybee (Apis mellifera), cattle (Bos taurus), songbird (Taeniopygia guttata), corn rootworm (Diabrotica virgifera), catfish (Ictalurus punctatus, Ictalurus furcatus), and apple (Malus x domestica). The entire resource may be downloaded and used as is, or readily adapted to fit the unique needs of other cDNA sequencing projects. 
CONCLUSIONS: The scripts used to create the ESTIMA interface are freely available to academic users in an archived format from http://titan.biotec.uiuc.edu/ESTIMA/. The entity-relationship (E-R) diagrams and the programs used to generate the Oracle database tables are also available. We have also provided detailed installation instructions and a tutorial at the same website. Presently the chromatograms, EST databases and their annotations have been made available for cattle and honeybee brain EST projects. Non-academic users need to contact the W.M. Keck Center for Functional and Comparative Genomics, University of Illinois at Urbana-Champaign, Urbana, IL, for licensing information.
Project description:The zebrafish is a powerful system for understanding the vertebrate genome, allowing the combination of genetic, molecular, and embryological analysis. Expressed sequence tags (ESTs) provide a rapid means of identifying an organism's genes for further analysis, but any EST project is limited by the availability of suitable libraries. Such cDNA libraries must be of high quality and provide a high rate of gene discovery. However, commonly used normalization and subtraction procedures tend to select for shorter, truncated, and internally primed inserts, seriously affecting library quality. An alternative procedure is to use oligonucleotide fingerprinting (OFP) to precluster clones before EST sequencing, thereby reducing the re-sequencing of common transcripts. Here, we describe the use of OFP to normalize and subtract 75,000 clones from two cDNA libraries, to a minimal set of 25,102 clones. We generated 25,788 ESTs (11,380 3' and 14,408 5') from over 16,000 of these clones. Clustering of 10,654 high-quality 3' ESTs from this set identified 7232 clusters (likely genes), corresponding to a 68% gene diversity rate, comparable to what has been reported for the best normalized human cDNA libraries, and indicating that the complete set of 25,102 clones contains as many as 17,000 genes. Yet, the library quality remains high. The complete set of 25,102 clones is available for researchers as glycerol stocks, filters sets, and as individual EST clones. These resources have been used for radiation hybrid, genetic, and physical mapping of the zebrafish genome, as well as positional cloning and candidate gene identification, molecular marker, and microarray development.
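The gene diversity rate and the extrapolated gene count quoted in the zebrafish abstract above can be checked with a few lines of arithmetic. This is a sketch under a loudly stated assumption: the diversity rate measured on the 3' EST sample is applied linearly to the whole clone set, which gives an upper-end estimate (consistent with the abstract's "as many as" wording) because diversity normally declines as more clones are sampled. The variable names are illustrative.

```python
# Figures taken from the abstract above.
clusters = 7232           # clusters (likely genes) found
high_quality_3p = 10654   # high-quality 3' ESTs that were clustered
total_clones = 25102      # clones in the minimal OFP-selected set

# Gene diversity rate: clusters found per EST clustered.
diversity_rate = clusters / high_quality_3p

# Assumption: the same rate holds across the full clone set.
estimated_genes = diversity_rate * total_clones

print(f"gene diversity rate: {diversity_rate:.0%}")
print(f"estimated genes in full set: {estimated_genes:,.0f}")
```

The computed rate rounds to the 68% reported, and the linear extrapolation lands close to the 17,000 genes cited.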
Project description:The hybrid between female channel catfish (Ictalurus punctatus) and male blue catfish (Ictalurus furcatus) is superior in feed conversion, disease resistance, carcass yield, and harvestability compared to both parental species. However, heterosis and heterobeltiosis only occur in pond culture, and channel catfish grow much faster than the other genetic types in small culture units. This environment-dependent heterosis is intriguing, but the underlying genetic mechanisms are not well understood. In this study, phenotypic characterization and transcriptomic analyses were performed in the channel catfish, blue catfish, and their reciprocal F1s reared in tanks. The results showed that the channel catfish is superior in growth-related morphometrics, presumably due to significantly lower innate immune function, as indicated by reduced lysozyme activity and alternative complement activity. RNA-seq analysis revealed that genes involved in fatty acid metabolism/transport are significantly upregulated in channel catfish compared to blue catfish and hybrids, which also contributes to the growth phenotype. Interestingly, hybrids have 40-80% higher blood glucose than the parental species, which can be explained by a phenomenon called transgressive expression (overexpression or underexpression in F1s relative to the parental species). A total of 1,140 transgressive genes were identified in F1 hybrids, indicating that 8.5% of the transcriptome displayed transgressive expression. Transgressive genes upregulated in F1s are enriched for glycan degradation function, directly related to the increase in blood glucose level. This study is the first to explore molecular mechanisms of environment-dependent heterosis/heterobeltiosis in a vertebrate species and sheds light on the regulation and evolution of heterosis vs. hybrid incompatibility. 
Overall design: Profile of transcriptome-wide gene expression levels in the liver in the channel catfish, the blue catfish, and their reciprocal F1 hybrids.
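As a minimal illustration of the transgressive expression concept described above, a gene can be labeled by comparing its F1 expression level with the range spanned by the two parental species. This sketch is an assumption-laden simplification: it compares single mean expression values with a hypothetical fold threshold, whereas a real RNA-seq analysis would rely on replicate-based statistical testing. The function name, labels, and example values are invented for illustration.

```python
def classify_transgressive(f1, channel, blue, min_fold=1.0):
    """Label one gene from mean expression values (hypothetical units).

    Returns "over" if the F1 value exceeds the higher parent,
    "under" if it falls below the lower parent, else "within".
    min_fold (an assumed parameter) demands a minimum fold difference.
    """
    high_parent = max(channel, blue)
    low_parent = min(channel, blue)
    if f1 > high_parent * min_fold:
        return "over"
    if f1 < low_parent / min_fold:
        return "under"
    return "within"


# Invented example values, not data from the study.
examples = {
    "gene_a": (30.0, 10.0, 12.0),  # F1 above both parents
    "gene_b": (2.0, 10.0, 12.0),   # F1 below both parents
    "gene_c": (11.0, 10.0, 12.0),  # F1 between the parents
}
labels = {name: classify_transgressive(*vals) for name, vals in examples.items()}
```

Genes labeled "over" or "under" correspond to the overexpression and underexpression cases in the abstract's definition; "within" covers everything inside the parental range.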
Project description:The microbiota of teleost fish has gained a great deal of research attention within the past decade, with experiments suggesting that both host-genetics and environment are strong ecological forces shaping the bacterial assemblages of fish microbiomes. Despite representing great commercial and scientific importance, the catfish within the family Ictaluridae, specifically the blue and channel catfish, have received very little research attention directed toward their gut-associated microbiota using 16S rRNA gene sequencing. Within this study we utilize multiple genetically distinct strains of blue and channel catfish, verified via microsatellite genotyping, to further quantify the role of host-genetics in shaping the bacterial communities in the fish gut, while maintaining environmental and husbandry parameters constant. Comparisons of the gut microbiota among the two catfish species showed no differences in bacterial species richness (observed and Chao1) or overall composition (weighted and unweighted UniFrac) and UniFrac distances showed no correlation with host genetic distances (Rst) according to Mantel tests. The microbiota of environmental samples (diet and water) were found to be significantly more diverse than that of the catfish gut associated samples, suggesting that factors within the host were further regulating the bacterial communities, despite the lack of a clear connection between microbiota composition and host genotype. The catfish gut communities were dominated by the phyla Fusobacteria, Proteobacteria, and Firmicutes; however, differential abundance analysis between the two catfish species using analysis of composition of microbiomes detected two differential genera, Cetobacterium and Clostridium XI. 
The metagenomic pathway features inferred from our dataset suggest the catfish gut bacterial communities possess pathways beneficial to their host such as those involved in nutrient metabolism and antimicrobial biosynthesis, while also containing pathways involved in virulence factors of pathogens. Testing of the inferred KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways by DESeq2 revealed minor differences in microbiota function, with only two metagenomic pathways detected as differentially abundant between the two catfish species. As the first study to characterize the gut microbiota of blue catfish, our study results have direct implications for future ictalurid catfish research. Additionally, our insight into the intrinsic factors driving microbiota structure has basic implications for the future study of fish gut microbiota.
Project description:BACKGROUND: Within the framework of a genomics project on livestock species (AGENAE), we initiated a high-throughput DNA sequencing program of Expressed Sequence Tags (ESTs) in rainbow trout, Oncorhynchus mykiss. RESULTS: We constructed three cDNA libraries including one highly complex pooled-tissue library. These libraries were normalized and subtracted to reduce clone redundancy. EST sequences were produced, and 96,472 ESTs corresponding to high quality sequence reads were released to the international database, currently representing 42.5% of the overall sequence knowledge in this species. All these EST sequences and other publicly available ESTs in rainbow trout have been included on a publicly available website (SIGENAE) and have been clustered into a total of 52,930 clusters of putative transcript groups, including 24,616 singletons. 57.1% of these 52,930 clusters are represented by at least one AGENAE EST and 14,343 clusters (27.1%) are composed only of AGENAE ESTs. Sequence analysis also reveals that normalization and especially subtraction were effective in decreasing redundancy, and that the pooled-tissue library was representative of the initial tissue complexity. CONCLUSION: Owing to the present work on the construction of rainbow trout normalized cDNA libraries and their extensive sequencing, along with other large-scale sequencing programs, rainbow trout is now one of the major fish models in terms of EST sequences available in a public database, just after zebrafish, Danio rerio. This information is now used for the selection of a non-redundant set of clones for producing DNA microarrays in order to examine global gene expression.
Project description:Channel catfish (Ictalurus punctatus) is an economically important species in freshwater aquaculture around the world and occupies a prominent position in the aquaculture industry of the United States. MicroRNAs (miRNAs) play important roles in the regulation of almost every biological process in eukaryotes; however, there is little information available concerning miRNAs in channel catfish. In this study, a small-RNA cDNA library was constructed from 10 tissues of channel catfish, and Solexa sequencing technology was used to perform high-throughput sequencing of the library. A total of 14,919,026 raw reads, representing 161,288 unique sequences, were obtained from the small-cDNA library. After comparing the small RNA sequences with the RFam database, 4,542,396 reads that represent 25,538 unique sequences were mapped to the genome sequence of zebrafish to perform distribution analysis and to screen for candidate miRNA genes. Subsequent bioinformatic analysis identified 237 conserved miRNAs and 45 novel miRNAs in the channel catfish. Stem-loop RT-PCR was applied to validate and profile the expression of the novel miRNAs in 10 tissues. Some novel miRNAs, such as ipu-miR-129b, ipu-miR-7562 and ipu-miR-7553, were expressed in all tissues examined. However, some novel miRNAs appear to be tissue specific. Ipu-miR-7575 is predominantly expressed in stomach. Ipu-miR-7147 and ipu-miR-203c are highly expressed in heart, but are relatively weakly expressed in other tissues. Based on sequence complementarity between miRNAs and mRNA targets, potential target sequences for the 45 novel miRNAs were identified by searching for antisense hits in the reference RNA sequences of the channel catfish. These potential target sequences are involved in immune regulation, transcriptional regulation, metabolism and many other biological functions. 
The discovery of miRNAs in the channel catfish genome by this study contributes to a better understanding of the role miRNAs play in regulating diverse biological processes in fish and vertebrates.