Project description:Human endogenous retroviruses (HERVs) are a family of endogenous retroviruses that comprise ~8.93% of the human genome sequence, with a high proportion being human specific. The recent expansion of repeated HERV sequences has offered a framework for genetic and epigenetic innovation. In the current report, a systematic approach is implemented to catalogue regulatory elements within HERVs, as a roadmap to potential functions of HERV sequences in gene networks. The ENCODE Project has offered a wealth of epigenetic data based on omics technologies. I analyzed the presence of HERV sequences on candidate cis-regulatory elements (cCREs) from ENCODE data. On the one hand, HERVs are in 1 out of 9 cCREs (>100,000 cCREs in total), dispersed within the genome and present in cis-regulatory regions of ~81% of human genes, as calculated following gene enrichment analysis. On the other hand, promoter-associated HERV cCREs are present adjacent to (in a 200 bp window) the transcription start sites of 256 human genes. Regulatory network construction, followed by centrality analysis, led to the discovery of 90 core genes containing HERV-associated promoters. Pathway analysis on the core network genes and their immediate neighbors revealed a regulatory footprint that is associated with, among others, inflammation, chemokine signaling and response to viral infection. Collectively, these results support the concept that the expansion of regulatory sequences derived from HERVs is critical for epigenetic innovation that may have wired together genes into novel transcriptional networks with critical roles in cellular physiology and pathology.
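The two overlap steps described above (finding cCREs that contain HERV sequence, then finding genes whose transcription start site lies near such a cCRE) can be illustrated with a minimal Python sketch. This is not the author's pipeline: the function names, the tuple layout, the half-open coordinate convention, and the reading of "200 bp window" as 200 bp on either side of the TSS are all assumptions made for illustration.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def herv_overlapping_ccres(ccres, hervs):
    """Return the cCREs that overlap any HERV annotation on the same chromosome.

    Both inputs are lists of (chrom, start, end, name) tuples (assumed layout).
    """
    hervs_by_chrom = defaultdict(list)
    for chrom, start, end, _name in hervs:
        hervs_by_chrom[chrom].append((start, end))
    return [
        ccre for ccre in ccres
        if any(overlaps(ccre[1], ccre[2], s, e) for s, e in hervs_by_chrom[ccre[0]])
    ]

def genes_near_ccres(tss_list, ccres, window=200):
    """Return sorted gene names whose TSS lies within `window` bp of a cCRE.

    tss_list holds (chrom, position, gene) tuples. The symmetric window
    around the TSS is an assumption of this sketch.
    """
    hits = set()
    for chrom, pos, gene in tss_list:
        for c_chrom, start, end, _name in ccres:
            if c_chrom == chrom and overlaps(pos - window, pos + window + 1, start, end):
                hits.add(gene)
    return sorted(hits)
```

The nested loops are quadratic and only suitable for toy inputs; at genome scale an interval tree or a sorted sweep would be the usual choice.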
Project description:Investigating the molecular evolution of the human genome has paved the way to understanding the genetic adaptation of humans to environmental changes and the corresponding complex diseases. In this review, we discussed the historical origin of genetic diversity among human populations, the evolutionary driving forces that can affect genetic diversity among populations, and the effects of human movement into new environments and gene flow on population genetic diversity. Furthermore, we presented the role of natural selection in genetic diversity and complex diseases. Then we reviewed the disadvantageous consequences of historical selection events in modern times and their relation to the development of complex diseases. In addition, we discussed the effect of consanguinity on the incidence of complex diseases in human populations. Finally, we presented the latest information about the role of ancient genes acquired from interbreeding with ancient hominids in the development of complex diseases.
Project description:The demographic history of anatomically modern humans (AMH) involves multiple migration events, population extinctions and genetic adaptations. As genome-wide data from complete genome sequencing become increasingly abundant and available even from extinct hominins, new insights into the evolutionary history of our species are discovered. It is currently known that AMH interbred with archaic hominins once they left the African continent. Current non-African human genomes carry fragments of archaic origin. This review focuses on the fitness consequences of archaic interbreeding in current human populations. We discuss new insights and challenges that researchers face when interpreting the potential impact of introgression on fitness and testing hypotheses about the role of selection within the context of health and disease.
Project description:Background: The data from high throughput genomics technologies provide unique opportunities for studies of complex biological systems, but also pose many new challenges. The shift to the genome scale in evolutionary biology, for example, has led to many interesting, but often controversial studies. It has been suggested that part of the conflict may be due to errors in the initial sequences. Most gene sequences are predicted by bioinformatics programs and a number of quality issues have been raised, concerning DNA sequencing errors or badly predicted coding regions, particularly in eukaryotes. Results: We investigated the impact of these errors on evolutionary studies and specifically on the identification of important genetic events. We focused on the detection of asymmetric evolution after duplication, which has been the subject of controversy recently. Using the human genome as a reference, we established a reliable set of 688 duplicated genes in 13 complete vertebrate genomes, where significantly different evolutionary rates are observed. We estimated the rates at which protein sequence errors occur and are accumulated in the higher-level analyses. We showed that the majority of the detected events (57%) are in fact artifacts due to the putative erroneous sequences and that these artifacts are sufficient to mask the true functional significance of the events. Conclusions: Initial errors are accumulated throughout the evolutionary analysis, generating artificially high rates of event predictions and leading to substantial uncertainty in the conclusions. This study emphasizes the urgent need for error detection and quality control strategies in order to efficiently extract knowledge from the new genome data.
Project description:Infectious diseases exert a constant evolutionary pressure on the genetic makeup of our innate immune system. Polymorphisms in Toll-like receptor 4 (TLR4) have been related to susceptibility to Gram-negative infections and septic shock. Here we show that two polymorphisms of TLR4, Asp299Gly and Thr399Ile, have unique distributions in populations from Africa, Asia, and Europe. Genetic and functional studies are compatible with a model in which the nonsynonymous polymorphism Asp299Gly has evolved as a protective allele against malaria, explaining its high prevalence in sub-Saharan Africa. However, the same allele could have been disadvantageous after migration of modern humans into Eurasia, putatively because of increased susceptibility to severe bacterial infections. In contrast, the Asp299Gly allele, when present in cosegregation with Thr399Ile to form the Asp299Gly/Thr399Ile haplotype, shows selective neutrality. Polymorphisms in TLR4 exemplify how the interaction between our innate immune system and the infectious pressures in particular environments may have shaped the genetic variations and function of our immune system during the out-of-Africa migration of modern humans.
Project description:Genotyping arrays are by far the most widely used genetic tests but are not generally utilized for diagnostic purposes in a medical context. In the present study, we examined the diagnostic value of a standard genotyping array (Illumina Global Screening Array) for a range of indications. Applications included stand-alone testing for specific variants (32 variants in 10 genes), first-tier array variant screening for monogenic conditions (10 different autosomal recessive metabolic diseases), and diagnostic workup for specific conditions caused by variants in multiple genes (suspected familial breast and ovarian cancer, and hypercholesterolemia). Our analyses showed a high analytical sensitivity and specificity of array-based analyses for validated and non-validated variants, and identified pitfalls that require attention. Ethical-legal assessment highlighted the need for a software solution that allows for individual indication-based consent and the reliable exclusion of non-consented results. Cost/time assessment revealed excellent performance of diagnostic array analyses, depending on indication, proband data, and array design. We have implemented some analyses in our diagnostic portfolio, but array optimization is required for the implementation of other indications.
Project description:Complex diseases are caused by a combination of genetic and environmental factors. Uncovering the molecular pathways through which genetic factors affect a phenotype is always difficult, but in the case of complex diseases this is further complicated since genetic factors in affected individuals might be different. In recent years, systems biology approaches and, more specifically, network based approaches emerged as powerful tools for studying complex diseases. These approaches are often built on the knowledge of physical or functional interactions between molecules which are usually represented as an interaction network. An interaction network not only reports the binary relationships between individual nodes but also encodes hidden higher level organization of cellular communication. Computational biologists were challenged with the task of uncovering this organization and utilizing it for the understanding of disease complexity, which prompted rich and diverse algorithmic approaches to be proposed. We start this chapter with a description of the general characteristics of complex diseases followed by a brief introduction to physical and functional networks. Next we will show how these networks are used to leverage genotype, gene expression, and other types of data to identify dysregulated pathways, infer the relationships between genotype and phenotype, and explain disease heterogeneity. We group the methods by common underlying principles and first provide a high level description of the principles followed by more specific examples. We hope that this chapter will give readers an appreciation for the wealth of algorithmic techniques that have been developed for the purpose of studying complex diseases as well as insight into their strengths and limitations.
Project description:This submission corresponds to peptides identified from proteomics analysis of tryptic digests of lung homogenates from C57Bl/6 mice infected with avian influenza (A/VN/1203/04). Mice were infected at 10e2, 10e3, or 10e4 pfu (n = 5 each) and sacrificed at 1, 2, 4, and 7 d post-infection. Aliquots of lung proteins were combined to create two pools corresponding to early (1 and 2 d post-infection) and late (4 and 7 d post-infection) infection, and each pool was then subjected to tryptic digestion, followed by strong-cation exchange fractionation, resulting in 24 fractions each. Capillary LC-MS/MS analysis was then performed on each fraction. This submission corresponds to proteomics data collected from capillary LC-MS/MS analysis of the early infection pool. The MS/MS datasets were searched with SEQUEST using no enzyme search rules, and the peptide identification results were filtered using the following cutoffs: XCorr >= 1.6, 2.4, or 3.2 for the 1+, 2+, or 3+ charge states of fully tryptic peptides, and >= 4.3 or 4.7 for the 2+ or 3+ charge states of partially tryptic or non-tryptic protein terminal peptides. Additionally, to reduce the total data size to a reasonable level, each spectrum was filtered to only include the top 100 most abundant ions in each 100 m/z-wide window. This submission is the aggregation of 24 individual LC-MS/MS analyses into one pseudo dataset. More information can be found at the following web sites: https://www.systemsvirology.org/project/home/begin.view? http://omics.pnl.gov
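The two filters described above (charge-dependent XCorr cutoffs, and keeping the most abundant ions per m/z window) can be sketched in Python as follows. This is an illustrative reading of the description, not the submitters' code. The class labels "full" and "partial", the function names, the rejection of combinations with no stated cutoff (for example 1+ partially tryptic peptides), and the alignment of windows to multiples of 100 m/z are assumptions.

```python
from collections import defaultdict

# XCorr cutoffs quoted in the description, keyed by (tryptic class, charge).
# "full" and "partial" are hypothetical labels used only in this sketch.
XCORR_CUTOFFS = {
    ("full", 1): 1.6,
    ("full", 2): 2.4,
    ("full", 3): 3.2,
    ("partial", 2): 4.3,
    ("partial", 3): 4.7,
}

def passes_xcorr(tryptic_class, charge, xcorr):
    """Accept an identification only if a cutoff exists and is met.

    Combinations without a stated cutoff are rejected (an assumption).
    """
    cutoff = XCORR_CUTOFFS.get((tryptic_class, charge))
    return cutoff is not None and xcorr >= cutoff

def top_ions_per_window(peaks, window=100.0, keep=100):
    """Keep the `keep` most intense peaks in each `window`-wide m/z bin.

    peaks is a list of (mz, intensity) tuples; bins start at multiples of
    `window` (an assumption). Returns the surviving peaks sorted by m/z.
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))
    kept = []
    for members in bins.values():
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:keep])
    return sorted(kept)
```

With the defaults, `top_ions_per_window(peaks)` mirrors the stated "top 100 ions per 100 m/z-wide window"; smaller `keep` values are convenient for checking the behaviour on toy spectra.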
Project description:Motivated by growing evidence for pathway heterogeneity and alternative functions of molecular machines, we demonstrate a computational approach for investigating two questions: (1) Are there multiple mechanisms (state-space pathways) by which a machine can perform a given function, such as cotransport across a membrane? (2) How can additional functionality, such as proofreading/error-correction, be built into machine function using standard biochemical processes? Answers to these questions will aid both the understanding of molecular-scale cell biology and the design of synthetic machines. Focusing on transport in this initial study, we sample a variety of mechanisms by employing Metropolis Markov chain Monte Carlo. Trial moves adjust transition rates among an automatically generated set of conformational and binding states while maintaining fidelity to thermodynamic principles and a user-supplied fitness/functionality goal. Each accepted move generates a new model. The simulations yield both single and mixed reaction pathways for cotransport in a simple environment with a single substrate along with a driving ion. In a "competitive" environment including an additional decoy substrate, several qualitatively distinct reaction pathways are found which are capable of extremely high discrimination coupled to a leak of the driving ion, akin to proofreading. The array of functional models would be difficult to find by intuition alone in the complex state-spaces of interest.
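The sampling idea described above can be illustrated with a small Metropolis sketch in Python. It is not the authors' implementation. To keep trial moves thermodynamically consistent, this sketch parameterizes rates by state energies `G` and barrier energies `B` (in units of kT), so that every forward/reverse rate pair satisfies k_ij / k_ji = exp(-(G_j - G_i)). That parameterization, the function names, the uniform perturbation, and the acceptance rule on a user-supplied `fitness` function (higher is better) are assumptions. A driven system such as ion-coupled transport would need additional terms that are omitted here.

```python
import math
import random

def rates_from_energies(G, B):
    """Build rates k[(i, j)] = exp(-(B[(i, j)] - G[i])) for each connected pair.

    G maps state -> energy; B maps an (i, j) pair -> barrier energy shared by
    both directions, which guarantees k_ij / k_ji = exp(-(G[j] - G[i])).
    """
    rates = {}
    for (i, j), barrier in B.items():
        rates[(i, j)] = math.exp(-(barrier - G[i]))
        rates[(j, i)] = math.exp(-(barrier - G[j]))
    return rates

def metropolis(G, B, fitness, steps=1000, step_size=0.5, temperature=1.0, seed=0):
    """Sample models by perturbing one energy per trial move.

    Returns a list of accepted (G, B, fitness) models, one per accepted move.
    """
    rng = random.Random(seed)
    G, B = dict(G), dict(B)
    current = fitness(rates_from_energies(G, B))
    accepted = []
    for _ in range(steps):
        trial_G, trial_B = dict(G), dict(B)
        # Trial move: perturb either one state energy or one barrier energy.
        if rng.random() < 0.5:
            key = rng.choice(sorted(trial_G))
            trial_G[key] += rng.uniform(-step_size, step_size)
        else:
            key = rng.choice(sorted(trial_B))
            trial_B[key] += rng.uniform(-step_size, step_size)
        trial = fitness(rates_from_energies(trial_G, trial_B))
        # Metropolis criterion: always accept improvements, otherwise accept
        # with probability exp((trial - current) / temperature).
        if trial >= current or rng.random() < math.exp((trial - current) / temperature):
            G, B, current = trial_G, trial_B, trial
            accepted.append((dict(G), dict(B), current))
    return accepted
```

Because each accepted move is stored as a separate model, the returned list plays the role of the "array of functional models" that can be inspected afterwards for distinct pathways.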
Project description:Congenital hypothyroidism (CH) is the most common neonatal endocrine disorder and one of the most common preventable causes of intellectual disability in the world. CH may be due to developmental or functional thyroid defects (primary or peripheral CH) or be hypothalamic-pituitary in origin (central CH). In most cases, primary CH is caused by a developmental malformation of the gland (thyroid dysgenesis, TD) or by a defect in thyroid hormone synthesis (dyshormonogenesis, DH). TD represents about 65% of CH cases, and a genetic cause is currently identified in fewer than 5% of patients. The remaining 35% are cases of DH and are explained with certainty at the molecular level in more than 50% of cases. The etiology of CH is mostly unknown and may include contributions from individual and environmental factors. In recent years, the detailed phenotypic description of patients, high-throughput sequencing technologies, and the use of animal models have made it possible to discover new genes involved in the development or function of the thyroid gland. This paper reviews all the genetic causes of CH. The modes by which CH is transmitted will also be discussed, including a new oligogenic model. CH is no longer simply a dominant disease for cases of CH due to TD and recessive for cases of CH due to DH, but a far more complex disorder.