Project description:More than 2 x 10^9 sequences generated on the Illumina platform from the genome of E14 embryonic stem cells cultured in our laboratory were used to build a database of about 2.7 x 10^6 single nucleotide variants. The database was validated against two sequencing datasets from other laboratories, and high overlap was observed. The identified variants are enriched in intergenic regions, but several thousand reside in gene exons and regulatory regions, such as promoters, enhancers, splice sites and untranslated regions of RNA, indicating a high probability of an important functional impact on the molecular biology of these cells. We created a new E14 genome assembly incorporating the newly identified variants and used it to map reads from next-generation sequencing data generated on the E14 cell line in our laboratory or in others. We observed an increase of about 5% in the number of mapped reads. CpG dinucleotides showed the highest variation frequency, probably because they can be targets of DNA methylation. We performed reduced representation bisulfite sequencing on the E14 cell line to test our new genome assembly against the mm9 reference genome. After mapping and methylation status calling, we obtained about 120,000 additional called CpGs and avoided about 20,000 erroneous CpG calls. Genotyping of E14 embryonic stem cells (ESCs) and reduced representation bisulfite sequencing (RRBS) of E14 ESCs.
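The idea of folding strain-specific variants into a reference and then checking which CpG sites appear or disappear can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequence and the variant dictionary are all hypothetical, and only single-nucleotide substitutions on one strand are handled.

```python
def apply_snvs(reference, snvs):
    """Return a copy of `reference` with single-nucleotide substitutions applied.

    `snvs` maps 0-based positions to (ref_base, alt_base) pairs (hypothetical format).
    """
    seq = list(reference)
    for pos, (ref_base, alt_base) in snvs.items():
        # Guard against variants that do not match the reference they were called on.
        if seq[pos] != ref_base:
            raise ValueError(f"reference mismatch at position {pos}")
        seq[pos] = alt_base
    return "".join(seq)


def cpg_positions(sequence):
    """Return the 0-based positions of the C in every CpG dinucleotide."""
    return {i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"}


# Toy data, invented for illustration only.
reference = "ACGTTCATCGGA"
snvs = {1: ("C", "T"), 6: ("A", "G")}

strain = apply_snvs(reference, snvs)
lost = cpg_positions(reference) - cpg_positions(strain)    # CpGs present only in the reference
gained = cpg_positions(strain) - cpg_positions(reference)  # CpGs present only in the strain
print(strain, sorted(lost), sorted(gained))
```

In this toy case one CpG is destroyed and one is created, which mirrors the two effects described above: reference CpGs that would be called wrongly in the strain, and strain CpGs that a reference-only analysis would miss.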
Project description:Porcine 60K BeadChip genotyping arrays (Illumina) are increasingly being applied in pig genomics to validate SNPs identified by re-sequencing or assembly-versus-assembly methods. Here we report that more than 98% of the SNPs identified with the porcine 60K BeadChip genotyping array (Illumina) were consistent with the SNPs identified by the assembly-based method. This result demonstrates that whole-genome de novo assembly is a reliable approach to deriving accurate maps of SNPs. To compare SNPs identified by genotyping arrays and by the de novo assembly method, we genotyped 10 pig breeds with the porcine 60K BeadChip genotyping array (Illumina), including 1 Berkshire pig, 1 Hampshire pig, 1 Landrace pig, 1 Large White pig, 1 Piétrain pig, 1 Bamei pig, 1 Jinhua pig, 1 Meishan pig, 1 Rongchang pig and 1 Tibetan wild boar.
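A consistency figure such as the one reported above is typically a concordance rate over sites called by both methods. The sketch below shows one way to compute it, assuming genotypes are stored as strings keyed by a site identifier; the data layout, the function name and the example calls are hypothetical and not taken from the study.

```python
def concordance(array_calls, assembly_calls):
    """Fraction of shared sites at which both methods report the same genotype."""
    shared = array_calls.keys() & assembly_calls.keys()
    if not shared:
        return 0.0
    matching = sum(1 for site in shared if array_calls[site] == assembly_calls[site])
    return matching / len(shared)


# Invented example: four sites are shared, three of them agree.
array_calls = {"chr1:100": "AG", "chr1:250": "CC", "chr2:40": "TT", "chr3:7": "AC"}
assembly_calls = {"chr1:100": "AG", "chr1:250": "CT", "chr2:40": "TT", "chr3:7": "AC", "chr4:9": "GG"}

rate = concordance(array_calls, assembly_calls)
print(f"{rate:.2%}")
```

Sites seen by only one method (here `chr4:9`) are ignored, so the rate measures agreement rather than sensitivity.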
Project description:Using RNA sequencing and de novo transcript assembly, we identified 4516 lncRNAs expressed in 8 different stages of B cell development and activation. Chromatin immunoprecipitation sequencing was used to classify a substantial fraction (38%) of these lncRNAs as enhancer-associated or promoter-associated RNAs (eRNAs or pRNAs). A catalogue of lncRNAs expressed in eight murine B cell populations.
Project description:Using an integrated systems approach, the expressed proteome of B. diazoefficiens strain 110spc4 was measured under i) normal, oxic growth conditions and ii) microoxic growth conditions. This included, as a first step, the sequencing and de novo assembly of the genome of this widely used rhizobial model strain, which turned out to harbor several deletions and insertions compared to the B. diazoefficiens USDA 110 NCBI reference genome. With this optimal basis in hand, a shotgun proteomics approach relying on a slightly adapted FASP protocol was carried out, allowing the identification of 2900 (oxia) and 2826 (microoxia) proteins, respectively, thereby largely expanding the proteome known to be expressed under microoxic conditions.
Project description:PAPD5 is one of the seven non-canonical poly(A) polymerases in human cells. Previous reports describe polyadenylation-dependent degradation of pre-ribosomal RNAs and uridylation-dependent degradation of histone mRNAs in vivo. In this study, we observed polyadenylation but not polyuridylation activity of PAPD5 in in vitro assays. To identify the genome-wide targets of PAPD5, we used PAR-CLIP and deep sequencing. A recombinant version of PAPD5 was expressed in the HEK293 human cell line, and its genome-wide targets were obtained with PAR-CLIP and deep sequencing in two replicate experiments. The short reads in the deep sequencing libraries of the PAPD5 replicates and of IGF2BP1, a protein unrelated to polymerization taken from a previous study, were aligned to the hg18 human genome assembly. The biological variance of the read counts in overlapping 100-nucleotide-long windows was estimated between the PAPD5 replicates and then used to estimate differential expression between the 100-nucleotide windows in the PAPD5 replicates and IGF2BP1. The top differentially expressed windows in PAPD5 and IGF2BP1 were further annotated using gene and repeat tracks from UCSC.
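Counting reads in overlapping fixed-width windows, as described above, can be sketched as follows. The description gives the 100-nucleotide window width but not the overlap, so the 50-nucleotide step is an assumption, as is assigning each read to windows by its 0-based start coordinate; the function name and the example coordinates are hypothetical.

```python
from collections import Counter


def window_counts(read_starts, window=100, step=50):
    """Count reads per overlapping window, keyed by window start coordinate.

    A read is assigned to every window [w, w + window) that contains its
    0-based start position, where w is a multiple of `step` (assumed layout).
    """
    counts = Counter()
    for start in read_starts:
        # Smallest multiple of `step` whose window still covers `start`.
        first = max(0, (start - window) // step + 1) * step
        for w in range(first, start + 1, step):
            counts[w] += 1
    return counts


# Invented read start positions on a single chromosome.
counts = window_counts([30, 120, 100])
print(sorted(counts.items()))
```

Per-window counts from the two replicates could then be fed into a count-based differential analysis, with the replicate-to-replicate spread serving as the variance estimate mentioned above.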
Project description:Super-enhancers (SEs) are large clusters of transcriptional enhancers that are co-occupied by multiple lineage-specific transcription factors driving expression of genes that define cell identity. In embryonic stem cells (ESCs), SEs are highly enriched for Oct4, Sox2, and Nanog in the enhanceosome assembly and express enhancer RNAs (eRNAs). We sought to dissect the molecular control mechanism of SE activity and eRNA transcription for pluripotency and reprogramming. Starting from a protein interaction network surrounding Sox2, a key pluripotency and reprogramming factor that guides the ESC-specific enhanceosome assembly and orchestrates the hierarchical transcriptional activation during the final stage of reprogramming, we discovered Tex10 as a novel pluripotency factor that is evolutionarily conserved and functionally significant in ESC self-renewal, early embryo development, and reprogramming. Tex10 is enriched at SEs in a Sox2-dependent manner and coordinates histone acetylation and DNA demethylation of SEs. Our study sheds new light on epigenetic control of SE activity for cell fate determination. Genome binding/occupancy profiling of Tex10 was performed in mouse embryonic stem cells by ChIP sequencing.
Project description:Vongsangnak2008 - Genome-scale metabolic
network of Aspergillus oryzae (iWV1314)
This model is described in the article:
Improved annotation through
genome-scale metabolic modeling of Aspergillus oryzae.
Vongsangnak W, Olsen P, Hansen K,
Krogsgaard S, Nielsen J.
BMC Genomics 2008; 9: 245
BACKGROUND: Since ancient times the filamentous fungus
Aspergillus oryzae has been used in the fermentation industry
for the production of fermented sauces and the production of
industrial enzymes. Recently, the genome sequence of A. oryzae
with 12,074 annotated genes was released but the number of
hypothetical proteins accounted for more than 50% of the
annotated genes. Considering the industrial importance of this
fungus, it is therefore valuable to improve the annotation and
further integrate genomic information with biochemical and
physiological information available for this microorganism and
other related fungi. Here we propose gene prediction through
construction of an A. oryzae Expressed Sequence Tag (EST)
library, sequencing and assembly. We enhanced function
assignment using an annotation strategy that we developed. The resulting
better annotation was used to reconstruct the metabolic network
leading to a genome scale metabolic model of A. oryzae.
RESULTS: Using our assembled EST sequences, we identified 1,046 newly
predicted genes in the A. oryzae genome. Furthermore, it was
possible to assign putative protein functions to 398 of the
newly predicted genes. Notably, our annotation strategy
resulted in assignment of new putative functions to 1,469
hypothetical proteins already present in the A. oryzae genome
database. Using the substantially improved annotated genome we
reconstructed the metabolic network of A. oryzae. This network
contains 729 enzymes, 1,314 enzyme-encoding genes, 1,073
metabolites and 1,846 (1,053 unique) biochemical reactions. The
metabolic reactions are compartmentalized into the cytosol, the
mitochondria, the peroxisome and the extracellular space.
Transport steps between the compartments and the extracellular
space represent 281 reactions, of which 161 are unique. The
metabolic model was validated and shown to correctly describe
the phenotypic behavior of A. oryzae grown on different carbon
sources. CONCLUSION: A much enhanced annotation of the A.
oryzae genome was performed and a genome-scale metabolic model
of A. oryzae was reconstructed. The model accurately predicted
the growth and biomass yield on different carbon sources. The
model serves as an important resource for gaining further
insight into our understanding of A. oryzae physiology.
To cite BioModels Database, please use:
An enhanced, curated and annotated resource for published
quantitative kinetic models.
To the extent possible under law, all copyright and related or
neighbouring rights to this encoded model have been dedicated to
the public domain worldwide. Please refer to
Public Domain Dedication for more information.
Project description:We report the first use of next-generation massively parallel sequencing technologies and de novo transcriptome assembly to gain insight into the transcriptome of Hevea brasiliensis. More than 12 million sequence reads with an average length of 90 nt were generated. In total, 48,768 unigenes (mean size = 488 bp) were assembled de novo, representing more than 3-fold the number of Hevea brasiliensis sequences deposited in GenBank. Assembled sequences were annotated with gene descriptions, gene ontology and clusters of orthologous group terms. In total, 37,373 unigenes were successfully annotated, and more than 10% of the unigenes were aligned to known proteins of Euphorbiaceae. The unigenes contain a nearly complete collection of known rubber-synthesis-related genes. Our data provide the most comprehensive sequence resource available for studying the rubber tree and demonstrate the utility of Illumina sequencing and de novo transcriptome assembly in a species lacking genome information. The transcriptome of latex and leaf in Hevea brasiliensis.
Project description:Due to the large size, complex splicing and wide dynamic range of eukaryotic transcriptomes, RNA sequencing samples the majority of expressed genes infrequently, resulting in sparse sequencing coverage that can hinder robust isoform assembly and quantification. Targeted RNA sequencing addresses this challenge by using oligonucleotide probes to capture selected genes or regions of interest for focused sequencing. This enhanced sequencing coverage confers sensitive gene discovery, robust transcript assembly and accurate gene quantification. Here we describe a detailed protocol for all stages of targeted RNA sequencing, from initial probe design considerations and capture of targeted genes to final assembly and quantification of captured transcripts. Initial probe design and final analysis can take less than a day, while the central experimental capture stage requires ~7 days. Targeted RNA sequencing of long noncoding RNAs.
Project description:The domestic goat, Capra hircus (2n=60), is one of the most important domestic livestock species in the world. Here we report its high-quality reference genome, generated by combining Illumina short-read sequencing with a new automated, high-throughput whole-genome mapping system based on optical mapping technology, which was used to generate extremely long super-scaffolds. The N50 sizes of contigs, scaffolds, and super-scaffolds for the sequence assembly reported herein are 18.7 kb, 3.06 Mb, and 18.2 Mb, respectively. Almost 95% of the super-scaffolds are anchored on chromosomes based on conserved syntenic information with cattle. The assembly is strongly supported by the RH map of goat chromosome 1. We annotated 22,175 protein-coding genes, most of which are recovered by RNA-seq data from ten tissues. Rapidly evolving genes and gene families are enriched in metabolism and immune systems, consistent with the fact that the goat is one of the most adaptable and geographically widespread livestock species. Comparative transcriptomic analysis of the primary and secondary follicles of a cashmere goat revealed 51 genes that were significantly differentially expressed between the two types of hair follicles. This study not only provides a high-quality reference genome for an important livestock species, but also shows that the new automated optical mapping technology can be used in the de novo assembly of large genomes. The corresponding whole-genome sequencing data are available in NCBI BioProject PRJNA158393. We sequenced a 3-year-old female Yunnan black goat and constructed a reference sequence for this breed. To improve the quality of the gene models, RNA samples from ten tissues (Bladder, Brain, Heart, Kidney, Liver, Lung, Lymph, Muscle, Ovary, Spleen) were extracted from the same goat that was sequenced.
To investigate the genetic basis underlying the development of cashmere fibers using the goat reference genome assembly and annotated genes, we extracted RNA samples from primary and secondary hair follicles of three Inner Mongolia cashmere goats and conducted transcriptome sequencing and DGE analysis. This submission represents the RNA-Seq component of the study.
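The contig, scaffold and super-scaffold N50 sizes quoted in the goat assembly description follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition is shown below; the function name and the example lengths are illustrative and unrelated to the goat data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Invented lengths: total is 300, and 100 + 80 first reaches the halfway mark of 150.
example = n50([100, 80, 60, 40, 20])
print(example)
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the description above also cites synteny with cattle and the RH map as independent support.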