Project description:Nanopore sequencing for forensic short tandem repeat (STR) genotyping offers the advantages of massively parallel sequencing (MPS) without a high up-front device cost, but genotyping is inaccurate, partially due to the occurrence of homopolymers in STR loci. The goal of this study was to apply the latest progress in nanopore sequencing by Oxford Nanopore Technologies to the field of STR genotyping. The experiments were performed using the state-of-the-art R9.4 flow cell and the most recent R10 flow cell, which was specifically designed to improve consensus accuracy of homopolymers. Two single-contributor samples and one mixture sample were genotyped using Illumina sequencing, Nanopore R9.4 sequencing, and Nanopore R10 sequencing. The accuracy of genotyping was comparable for both types of flow cells, although the R10 flow cell provided improved data quality for loci characterized by the presence of homopolymers. We identify locus-dependent characteristics hindering accurate STR genotyping, providing insights for the design of a panel of STR loci suited for nanopore sequencing. Repeat number, the number of different reference alleles for the locus, repeat pattern complexity, flanking region complexity, and the presence of homopolymers are identified as unfavorable locus characteristics. For single-contributor samples and for a limited set of the commonly used STR loci, nanopore sequencing could be applied. However, the technology is not yet mature enough for implementation in routine forensic workflows.
Project description:Enterohaemorrhagic E. coli (EHEC) is a significant human pathogen that causes outbreaks of haemorrhagic colitis and haemolytic uremic syndrome. During infection, pathogens compete for iron with the host, and one mechanism by which EHEC obtains iron is through haem uptake and utilisation, which is encoded by the chu operon. We have demonstrated that the haem receptor chuA is regulated by the Crp-cAMP-dependent sRNA CyaR. We further demonstrate that activation of chuA by CyaR is independent of the chuA RNA-thermometer and termination by Rho. These results highlight the ability of regulatory sRNAs to integrate multiple environmental signals into a layered hierarchy of signal input.
Project description:Short tandem repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across more than 1000 individuals in Phase 1 of the 1000 Genomes Project. Extensive quality controls show that reliable allelic spectra can be obtained for close to 90% of the STR loci in the genome. We utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations.
Project description:The Qiang ethnic group is one of the oldest ethnic groups in China and is the most active ethnic group among all the populations along the Tibetan-Yi corridor. They have had a profound impact nationally and internationally. The paternal and maternal genetic features of the Qiang ethnic group have been revealed, leaving the question of the genetic characteristics of autosomes and the X chromosome unanswered. The aim of this study was to explore the potential of 36 A-STR (Microreader™ 36A ID System) and 19 X-STR (Microreader™ 19X System) for application in the Qiang population and to elucidate their genetic diversity in southwest China. The cumulative probability of exclusion (CPE) for autosomal STRs is 1 − 1.3814 × 10⁻¹⁵ and the mean paternity exclusion chance (MEC) for X-STRs is 1 − 1.7323 × 10⁻⁶. Forensic parameters suggest that the STRs analyzed here are well-suited for forensic applications. The results of phylogenetic, interpopulation differentiation, and principal coordinates analysis (PCoA) indicate that the Qiang people have extensive connections with ethnic minorities in China, supporting the view that the Qiang people are the oldest group in the entire Sino-Tibetan language family. The Qiang appeared genetically closer to most ethnic groups in China, especially the Han. The calculation of random matching probability (RMP) was improved by Fst correction of allele frequencies to make RMP more accurate and reasonable. This study can fill in the gaps in the Qiang STR reference database, providing valuable frequency data for forensic applications and evidence for the Qiang's important ancestral position among the Sino-Tibetan populations.
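The two forensic quantities named in this abstract can be illustrated with a minimal Python sketch. It assumes independent loci for the combined exclusion probability and uses the widely cited Balding-Nichols form of the Fst (theta) correction; the abstract does not state which formulas or theta value the authors used, so the function names, the default theta of 0.01, and all input values here are hypothetical.

```python
from math import prod


def non_exclusion_probability(per_locus_pe):
    """Return 1 - CPE, the product of per-locus non-exclusion chances.

    Assumes the loci are independent. Returning the complement keeps
    precision when CPE is extremely close to 1 (e.g. 1 - 1.4e-15),
    where computing 1 - product directly would lose digits.
    """
    return prod(1.0 - pe for pe in per_locus_pe)


def theta_corrected_match_probability(p, q=None, theta=0.01):
    """Genotype match probability with an Fst (theta) correction.

    p, q  : allele frequencies; pass q=None for a homozygote.
    theta : assumed coancestry coefficient (hypothetical default).
    With theta = 0 this reduces to the Hardy-Weinberg values p*p and 2*p*q.
    """
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if q is None:
        # Homozygote: both alleles are copies of the same allele.
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # Heterozygote: one copy of each allele.
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


# Hypothetical per-locus values, for illustration only.
complement = non_exclusion_probability([0.6, 0.7, 0.55])
print(f"CPE = 1 - {complement:.4g}")
print(theta_corrected_match_probability(0.1, theta=0.01))
```

The per-locus match probabilities from the second function would be multiplied across loci to give a profile-level RMP, again under the assumption of independence between loci.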
Project description:Life span is a complex and multifactorial trait, which is shaped by genetic, epigenetic, environmental, and stochastic factors. The possibility that hypervariable short tandem repeats (STRs) are associated with longevity has been widely explored by comparing the genotypic pools of long-lived and younger individuals, but results so far have been contradictory. In view of these contradictory findings, the present study aims to investigate whether the HUMTHO1 and HUMCSF1PO STRs, previously associated with longevity, act as modulators of life expectancy, as well as to assess the extent to which other autosomal STR markers are associated with human longevity in a population from northern Spain. To that end, 21 autosomal microsatellite markers were studied in 304 nonagenarian individuals (more than 90 years old) and 516 younger controls of European descent. Our results do not confirm the association found in previous studies between longevity and the THO1 and CSF1PO loci. However, a significant association between longevity and the autosomal STR markers D12S391, D22S1045, and DS441 was observed. Moreover, when we compared the allelic frequency distributions of the 21 STR markers between cases and controls, we found that 6 of the 21 STRs studied showed different allelic frequencies, suggesting that the genomic portrait of human longevity is highly complex and probably shaped by a large number of genomic loci.
Project description:Background: In order to isolate an individual's genotype from a sample of biological material, most laboratories use PCR and Capillary Electrophoresis (CE) to construct a genetic profile based on polymorphic loci known as Short Tandem Repeats (STRs). The resulting profile consists of a CE signal that contains information about the length and number of STR units amplified. For samples collected from the environment, interpretation of the signal can be challenging given that information regarding the quality and quantity of the DNA is often limited. Interpretation can be further complicated by the presence of noise and PCR artifacts such as stutter, which can mask or mimic biological alleles. Because manual interpretation methods cannot comprehensively account for such nuances, it would be valuable to develop a signal model that can effectively characterize the various components of STR signal independent of a priori knowledge of the quantity or quality of DNA. Results: First, we seek to mathematically characterize the quality of the profile by measuring changes in the signal with respect to amplicon size. Next, we examine the noise, allele, and stutter components of the signal and develop distinct models for each. Using cross-validation and model selection, we identify a model that can be effectively utilized for downstream interpretation. Finally, we show an implementation of the model in NOCIt, a software system that calculates the a posteriori probability distribution on the number of contributors. Conclusion: The model was selected using a large, diverse set of DNA samples obtained from 144 different laboratory conditions, with DNA amounts ranging from a single copy of DNA to hundreds of copies, and the quality of the profiles ranging from pristine to highly degraded. Implemented in NOCIt, the model enables a probabilistic approach to estimating the number of contributors to complex, environmental samples.
Project description:Eritrea is a multi-ethnic country of over 3 million people, consisting of different ethnic groups, each with its own language and cultural tradition. Due to the lack of population genetic data for markers of forensic interest, in this study we analyzed the genetic polymorphisms of 23 Y-chromosome STR loci and of 12 X-chromosome STR loci in a sample of 255 unrelated individuals from 8 Eritrean ethnic groups, with the aim of generating a reference haplotype database for anthropological and forensic applications. X- and Y-chromosome markers can be informative, especially in personal identification and kinship testing, provided that large local population datasets are available to derive sufficiently accurate frequency estimates. The population genetic analyses in the Eritrean sample showed a high power of discrimination for both the Y- and X-STR marker sets, at both the country and population levels. Population comparison results highlight the importance of considering the ethnic composition within the analyzed country and the necessity of increasing available data, especially when referring to heterogeneous populations such as the African ones.
Project description:Oxford Nanopore Technology (ONT) sequencing is a third-generation sequencing technology that enables cost-effective long-read sequencing, with broad applications in biological research. However, its high sequencing error rate in low-complexity regions hampers its applications in short tandem repeat (STR)-related research. To address this, we generated a comprehensive STR error profile of ONT by analyzing publicly available Nanopore sequencing datasets. We show that the sequencing error rate is influenced not only by STR length but also by the repeat unit and the flanking sequences of STR regions. Interestingly, certain flanking sequences were associated with higher sequencing accuracy, suggesting that certain STR loci are more suitable for Nanopore sequencing compared to other loci. While base quality scores of substitution errors within the STR regions were lower than those of correctly sequenced bases, such patterns were not observed for indel errors. Furthermore, choosing the most recent basecaller version and using the super accuracy model significantly improved STR sequencing accuracy. Finally, we present NanoMnT, a lightweight Python tool that corrects STR sequencing errors in sequencing data and estimates STR allele sizes. NanoMnT leverages the characteristics of ONT when estimating STR allele size and exhibits superior results for 1-bp- and 2-bp repeat STR compared to existing tools. By integrating our findings, we improved STR allele estimation accuracy for Ax10 repeats from 55% to 78% and up to 85% when excluding loci with unfavorable flanking sequences. Using NanoMnT, we present the utility of our findings by identifying microsatellite instability status in cancer sequencing data. NanoMnT is publicly available at https://github.com/18parkky/NanoMnT.
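To make the idea of estimating an STR allele size from reads concrete, here is a deliberately naive Python sketch. It is not NanoMnT's algorithm, which the abstract does not describe in detail; it simply counts the longest run of the repeat unit between two exact-match flanking sequences and takes the most common count across reads. The function names, flanking sequences, and reads are all hypothetical.

```python
import re
from collections import Counter


def count_repeat_units(read, left_flank, right_flank, unit):
    """Longest run of `unit` between the flanks, or None if a flank is missing.

    Flanks are matched exactly, so reads with errors in the flanks are
    skipped; a real tool would align the flanks instead.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    region = read[start:end]
    # Find every maximal run of consecutive repeat units in the region.
    runs = re.findall(f"(?:{re.escape(unit)})+", region)
    return max((len(run) // len(unit) for run in runs), default=0)


def consensus_allele_size(reads, left_flank, right_flank, unit):
    """Most frequently observed repeat count across reads (simple majority)."""
    counts = [count_repeat_units(r, left_flank, right_flank, unit) for r in reads]
    counts = [c for c in counts if c is not None]
    if not counts:
        return None
    return Counter(counts).most_common(1)[0][0]


# Hypothetical reads covering an A-homopolymer locus with one deletion error.
reads = [
    "ACGT" + "A" * 10 + "CCGG",
    "ACGT" + "A" * 10 + "CCGG",
    "ACGT" + "A" * 9 + "CCGG",
]
print(consensus_allele_size(reads, "ACGT", "CCGG", "A"))
```

A majority vote like this treats all reads equally; the error profile described in the abstract suggests that weighting by flanking sequence or repeat unit could matter, which this sketch does not attempt.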
Project description:Genotyping of highly polymorphic autosomal short tandem repeat (STR) markers is a potent tool for elucidating genetic diversity. In the present study, fifteen autosomal STR markers were analyzed in unrelated healthy male Gorkha individuals (n = 98) serving in the Indian Army by using the AmpFlSTR Identifiler Plus PCR Amplification Kit. In total, 138 alleles were observed, with corresponding allele frequencies ranging from 0.005 to 0.469. The studied loci were in Hardy-Weinberg Equilibrium (HWE). Heterozygosity ranged from 0.602 to 0.867. The most polymorphic locus was the Fibrinogen Alpha chain (FGA) locus, which, as expected, was also the most discriminating. The Neighbor Joining (NJ) tree and principal component analysis (PCA) plot clustered the Gorkhas with the populations of Nepal and other Tibeto-Burman populations, while lowlander Indian populations formed a separate cluster, substantiating the closeness of the Gorkhas to the Tibeto-Burman linguistic phyla. Furthermore, the dataset of STR markers obtained in the study is a valuable source of STR DNA profiles from personnel for use in disaster victim identification in military exigencies, and adds to the Indian database of military soldiers and the military hospital repository.
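The per-locus summary statistics reported here (allele frequencies and heterozygosity) can be sketched in a few lines of Python. This is a minimal illustration with made-up genotypes, not the study's data or software; the function names are hypothetical, and expected heterozygosity is computed as one minus the sum of squared allele frequencies, without a small-sample correction.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus from diploid genotypes (allele pairs)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(freqs):
    """Heterozygosity expected under Hardy-Weinberg proportions: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


# Hypothetical genotypes (repeat numbers) for four individuals at one locus.
genotypes = [(10, 11), (11, 11), (10, 12), (12, 13)]
freqs = allele_frequencies(genotypes)
print(freqs)
print(observed_heterozygosity(genotypes), expected_heterozygosity(freqs))
```

Comparing observed with expected heterozygosity is the intuition behind a Hardy-Weinberg check; a formal test (for example an exact test) would be needed to reproduce the HWE result stated in the abstract.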
Project description:The forensic genetics field is generating extensive population data on the polymorphism of short tandem repeat (STR) markers in globally distributed samples. In this study we explored and quantified the informative power of these datasets to address issues related to human evolution and diversity, by using two online resources: an allele frequency dataset representing 141 populations and almost 26 thousand individuals, and a genotype dataset consisting of 42 populations and more than 11 thousand individuals. We show that the genetic relationships between populations based on forensic STRs are best explained by geography, as observed when analysing other worldwide datasets generated specifically to study human diversity. However, the global level of genetic differentiation between populations (as measured by a fixation index) is about half the value estimated with those other datasets, which contain a much higher number of markers but far fewer individuals. We suggest that the main factor explaining this difference is an ascertainment bias in forensic data resulting from the choice of markers for individual identification. We show that this choice results, on average, in low variance of heterozygosity across world regions, and hence in low differentiation among populations. Thus, the forensic genetic markers currently produced for the purpose of individual assignment and identification allow the detection of the patterns of neutral genetic structure that characterize the human population, but they underestimate the levels of this genetic structure compared to the datasets of STRs (or other kinds of markers) generated specifically to study the diversity of human populations.
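The fixation index mentioned above can be illustrated with the classical heterozygosity-based definition, Fst = (HT − HS) / HT, where HS is the mean within-population expected heterozygosity and HT is the expected heterozygosity of the pooled population. The following Python sketch assumes equally weighted populations and a single locus; the abstract does not specify which estimator the authors used, and the function names and frequencies below are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity, 1 - sum(p^2), for one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(populations):
    """Heterozygosity-based Fst for one locus across equally weighted populations.

    populations: list of dicts mapping allele -> frequency.
    """
    n = len(populations)
    # Mean within-population heterozygosity (H_S).
    hs = sum(expected_het(freqs) for freqs in populations) / n
    # Heterozygosity of the pooled population with averaged frequencies (H_T).
    alleles = set().union(*populations)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in populations) / n for a in alleles}
    ht = expected_het(mean_freqs)
    return (ht - hs) / ht if ht > 0 else 0.0


# Hypothetical allele frequencies at one locus in two populations.
pops = [{"A": 0.5, "B": 0.5}, {"A": 0.9, "B": 0.1}]
print(round(fst(pops), 4))
```

In this form the link drawn in the abstract is visible directly: when every population has similarly high heterozygosity at a marker, HS stays close to HT and the resulting Fst is small.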