Statistical Detection of Relatives Typed with Disjoint Forensic and Biomedical Loci.
ABSTRACT: In familial searching in forensic genetics, a query DNA profile is tested against a database to determine whether it represents a relative of a database entrant. We examine the potential for using linkage disequilibrium to identify pairs of profiles as belonging to relatives when the query and database rely on nonoverlapping genetic markers. Considering data on individuals genotyped with both the microsatellites used in forensic applications and genome-wide SNPs, we find that approximately 30%-32% of parent-offspring pairs and approximately 35%-36% of sib pairs can be identified from the SNPs of one member of the pair and the microsatellites of the other. The method suggests the possibility of performing familial searches of microsatellite databases using query SNP profiles, or vice versa. It also reveals that computations across databases that share no genetic markers pose privacy risks not only for database entrants but also for their close relatives.
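The per-locus calculation behind such cross-marker matching can be sketched as follows. This is a minimal Python sketch, not the authors' code, assuming (hypothetically) that one person's microsatellite (STR) genotype has already been imputed from their SNPs as a posterior distribution over genotypes. The parent-offspring likelihood ratio then averages the standard transmission probability over that distribution. The function names and the `imputed` input format are illustrative; the sketch ignores mutation and any extra dependence between relatives' SNPs and STRs carried on the same transmitted haplotype.

```python
import math

# Genotypes are unordered pairs of STR allele labels, stored as sorted tuples.

def hwe_prob(g, freqs):
    """P(g) for an unrelated individual under Hardy-Weinberg equilibrium."""
    x, y = g
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]

def transmission_prob(child, parent, freqs):
    """P(child | parent) for a parent-offspring pair: the parent passes on one of
    its two alleles (probability 1/2 each); the other allele is drawn from the
    population allele frequencies."""
    x, y = child
    total = 0.0
    for t in parent:
        if x == y:
            total += 0.5 * (t == x) * freqs[x]
        else:
            total += 0.5 * ((t == x) * freqs[y] + (t == y) * freqs[x])
    return total

def po_lr_from_imputed(imputed, observed, freqs):
    """Parent-offspring vs unrelated LR at one STR locus.

    imputed:  {genotype: posterior probability} for person 1 (hypothetical
              output of imputing the STR from that person's SNPs)
    observed: person 2's typed STR genotype
    """
    observed = tuple(sorted(observed))
    numerator = sum(p * transmission_prob(observed, tuple(sorted(g)), freqs)
                    for g, p in imputed.items())
    return numerator / hwe_prob(observed, freqs)

def combined_lr(per_locus_lrs):
    """Multiply per-locus LRs, assuming the STR loci are unlinked."""
    return math.prod(per_locus_lrs)
```

Because P(g1)P(g2 | g1) = P(g2)P(g1 | g2) for a parent-offspring pair, it does not matter which member is treated as the parent. A pair would be flagged when `combined_lr` exceeds some chosen threshold.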
Project description: In forensic familial search methods, a query DNA profile is tested against a database to determine whether the query profile represents a close relative of a database entrant. One challenge for familial search is that the calculations may require specifying allele frequencies for the unknown population from which the query profile originated. The choice of allele frequencies affects the rate at which non-relatives are erroneously classified as relatives, and allele-frequency misspecification can substantially inflate false positive rates compared with the use of allele frequencies drawn from the same population as the query profile. Here, we perform ancestry inference on the query profile and use allele frequencies based on its inferred genetic ancestry, circumventing the high false positive rates that result from highly misspecified allele frequencies. In a test for sibling matches on profiles that represent unrelated individuals, we demonstrate that false positive rates for familial search with ancestry inference used to specify the allele frequencies are similar to those seen when allele frequencies align with the population of origin of a profile. Because ancestry inference can be performed on query profiles, the extreme allele-frequency misspecifications that produce the highest false positive rates can be avoided. We discuss the implications of the results in the context of concerns about the forensic use of familial searching.
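The two-step procedure (infer ancestry, then score a sibling match with that population's frequencies) can be sketched as below. This is an illustrative Python sketch, not the study's pipeline: ancestry is assigned here by a simple maximum-likelihood rule under Hardy-Weinberg equilibrium, which is an assumption, and `FLOOR`, the panel layout, and all function names are hypothetical. The sibling likelihood ratio uses the standard IBD probabilities k = (1/4, 1/2, 1/4).

```python
import math

FLOOR = 1e-3  # hypothetical frequency for alleles absent from a reference panel

def f(freqs, allele):
    return freqs.get(allele, FLOOR)

def genotype_prob(g, freqs):
    """Hardy-Weinberg genotype probability."""
    x, y = g
    return f(freqs, x) ** 2 if x == y else 2 * f(freqs, x) * f(freqs, y)

def infer_population(profile, panels):
    """Pick the panel under which the query's multilocus genotype is most probable."""
    def loglik(pop):
        return sum(math.log(genotype_prob(g, panels[pop][locus]))
                   for locus, g in profile.items())
    return max(panels, key=loglik)

def one_ibd_prob(gb, ga, freqs):
    """P(gb | ga) when exactly one allele is identical by descent."""
    x, y = gb
    total = 0.0
    for t in ga:  # each allele of ga is the shared copy with probability 1/2
        if x == y:
            total += 0.5 * (t == x) * f(freqs, x)
        else:
            total += 0.5 * ((t == x) * f(freqs, y) + (t == y) * f(freqs, x))
    return total

def sib_lr(ga, gb, freqs, k=(0.25, 0.5, 0.25)):
    """Full-sibling vs unrelated LR at one locus; k = P(0, 1, 2 alleles IBD)."""
    ga, gb = tuple(sorted(ga)), tuple(sorted(gb))
    pb = genotype_prob(gb, freqs)
    k0, k1, k2 = k
    return k0 + k1 * one_ibd_prob(gb, ga, freqs) / pb + k2 * (ga == gb) / pb

def familial_search_lr(query, candidate, panels):
    """Infer the query's ancestry, then score the candidate with that panel."""
    pop = infer_population(query, panels)
    lr = math.prod(sib_lr(query[locus], candidate[locus], panels[pop][locus])
                   for locus in query)
    return pop, lr
```

The design point is that the frequencies used in `sib_lr` come from the inferred panel rather than a single fixed database, which is what limits the extreme misspecification described above.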
Project description: With the expansion of offender/arrestee DNA profile databases, genetic forensic identification has become commonplace in the United States criminal justice system. Implementation of familial searching has been proposed to extend forensic identification to family members of individuals with profiles in offender/arrestee DNA databases. In familial searching, a partial genetic profile match between a database entrant and a crime scene sample is used to implicate genetic relatives of the database entrant as potential sources of the crime scene sample. In addition to raising civil liberties concerns, familial searching poses unanswered statistical questions. In this study, we define confidence intervals on estimated likelihood ratios for familial identification. Using these confidence intervals, we consider familial searching in a structured population. We show that relatives and unrelated individuals from population samples with lower gene diversity over the loci considered are less distinguishable. We also consider cases where the most appropriate population sample for the individuals considered is unknown. We find that the less appropriate the assumed population sample (and thus the assumed allele frequency distribution), the more difficult it becomes to distinguish relatives from unrelated individuals. In addition, we show that relationship distinguishability increases with the number of markers considered but decreases for more distant genetic relationships. All of these results indicate that caution is warranted in applying familial searching in structured populations, such as that of the United States.
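One way to attach a confidence interval to an estimated likelihood ratio is to propagate sampling uncertainty in the allele frequencies. The sketch below uses a percentile bootstrap over the reference population sample; this is an assumed construction for illustration, not necessarily the one used in the study. It also computes gene diversity as 1 - Σp², and accepts IBD probabilities k = (k0, k1, k2) so that relationships of different degree can be compared: parent-offspring (0, 1, 0), full siblings (1/4, 1/2, 1/4), half siblings (1/2, 1/2, 0), first cousins (3/4, 1/4, 0). `FLOOR` and all function names are hypothetical.

```python
import random
from collections import Counter

FLOOR = 1e-3  # hypothetical frequency for alleles missing from a resample

def allele_freqs(sample):
    n = len(sample)
    return {a: c / n for a, c in Counter(sample).items()}

def gene_diversity(sample):
    """Expected heterozygosity 1 - sum(p_i^2), without the n/(n-1) correction."""
    return 1 - sum(p * p for p in allele_freqs(sample).values())

def kinship_lr(ga, gb, freqs, k):
    """LR for a relationship with IBD probabilities k = (k0, k1, k2) vs unrelated."""
    def p(a):
        return freqs.get(a, FLOOR)
    ga, gb = tuple(sorted(ga)), tuple(sorted(gb))
    x, y = gb
    pb = p(x) ** 2 if x == y else 2 * p(x) * p(y)
    one_ibd = sum(0.5 * ((t == x) * p(x) if x == y else (t == x) * p(y) + (t == y) * p(x))
                  for t in ga)
    k0, k1, k2 = k
    return k0 + k1 * one_ibd / pb + k2 * (ga == gb) / pb

def bootstrap_lr_ci(pair, samples, k, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for a multilocus LR.

    pair:    {locus: (genotype_a, genotype_b)}
    samples: {locus: list of alleles in the reference population sample}
    """
    rng = random.Random(seed)
    lrs = []
    for _ in range(n_boot):
        lr = 1.0
        for locus, (ga, gb) in pair.items():
            resample = rng.choices(samples[locus], k=len(samples[locus]))
            lr *= kinship_lr(ga, gb, allele_freqs(resample), k)
        lrs.append(lr)
    lrs.sort()
    return lrs[int(alpha / 2 * n_boot)], lrs[int((1 - alpha / 2) * n_boot) - 1]
```

Percentiles are taken on the raw LR values; since ranking is unchanged by a log transform, the interval endpoints would be the same on a log scale.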
Project description: Distinguishing between maternal relatives through mitochondrial (mt) DNA sequence analysis has been a longstanding desire of the forensic community. Using a deep-coverage, massively parallel sequencing (DCMPS) approach, we studied the pattern of mtDNA heteroplasmy across the mtgenomes of 39 mother-child pairs of European descent (haplogroups H, J, K, R, T, U, and X). Both shared and differentiating heteroplasmy were observed frequently in these closely related maternal relatives, with the minor variant often present in 2%-10% of the sequencing reads. A total of 17 pairs (44%) exhibited differentiating heteroplasmy, with the majority of sites (76%, 16 of 21) occurring in the coding region, further illustrating the value of conducting sequence analysis on the entire mtgenome. A number of the sites of differentiating heteroplasmy resulted in non-synonymous changes in protein sequence (5 of 21) and in changes to transfer or ribosomal RNA sequences (5 of 21), highlighting the potentially deleterious nature of these heteroplasmic states. Shared heteroplasmy was observed in 12 of the 39 mother-child pairs (31%). No site of differentiating or shared heteroplasmy was duplicated within either data set; a single nucleotide position (16093) appeared in both data sets. Finally, rates of heteroplasmy in blood and buccal cells were compared, because rates are known to vary across tissue types; the current study made similar observations. Our data support the view that differentiating heteroplasmy across the mtgenome can frequently be used to distinguish maternal relatives and could be of interest to both the medical genetics and forensic communities.
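The distinction between shared and differentiating heteroplasmy can be made concrete with per-position read counts. The following is a simplified Python sketch, assuming a hypothetical 2% minimum minor-variant fraction for calling heteroplasmy (the lower end of the range quoted above, not a threshold stated by the study). A site counts as shared when both relatives are heteroplasmic for the same two bases, and as differentiating when heteroplasmy is called in only one of them or involves different bases. Real pipelines would also filter on coverage, base quality, and strand balance, which this sketch omits.

```python
THRESHOLD = 0.02  # assumed minimum minor-variant fraction for a heteroplasmy call

def minor_fraction(counts):
    """Return (top-two bases, minor-variant fraction) for one position's read counts."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    if total == 0 or len(ranked) < 2:
        return frozenset(base for base, _ in ranked[:1]), 0.0
    return frozenset(base for base, _ in ranked[:2]), ranked[1][1] / total

def classify_site(mother_counts, child_counts, threshold=THRESHOLD):
    m_bases, m_frac = minor_fraction(mother_counts)
    c_bases, c_frac = minor_fraction(child_counts)
    m_het, c_het = m_frac >= threshold, c_frac >= threshold
    if m_het and c_het and m_bases == c_bases:
        return "shared"
    if m_het or c_het:
        return "differentiating"
    return "homoplasmic"

def compare_pair(mother, child, threshold=THRESHOLD):
    """mother/child: {position: {base: read count}}; returns heteroplasmic sites only."""
    out = {}
    for pos in mother.keys() & child.keys():
        label = classify_site(mother[pos], child[pos], threshold)
        if label != "homoplasmic":
            out[pos] = label
    return out
```

For example, with hypothetical counts of 950 A and 50 G reads in the mother and 1000 A reads in the child at the same position, the site would be labeled differentiating.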
Project description: As an increasing number of reliable protein-protein interactions (PPIs) become available and high-throughput experimental methods provide systematic identification of PPIs, there is a growing need for fast and accurate methods for discovering homologous PPIs of a newly determined PPI. PPISearch is a web server that rapidly identifies homologous PPIs (called a PPI family) and infers the transferability of interacting domains and functions for a query protein pair. The server first identifies the homologous family of each protein in the query pair by using BLASTP to scan an annotated PPI database (290 137 PPIs in 576 species) compiled from five public databases. We determined homologous PPIs from protein pairs of these homologous families when the protein pairs were in the annotated database and had significant joint sequence similarity (E ≤ 10^-40) with the query. Using these homologous PPIs across multiple species, the server infers the conserved domain-domain pairs (Pfam and InterPro domains) and function pairs (Gene Ontology annotations). Our results demonstrate that the transferability of conserved domain-domain pairs between homologous PPIs and query pairs is 88% using 103 762 PPI queries, and the transferability of conserved function pairs is 69% based on 106 997 PPI queries. The PPISearch server should be useful for searching homologous PPIs and PPI families across multiple species. The PPISearch server is available at http://gemdock.life.nctu.edu.tw/ppisearch/.
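The search logic described above (two BLASTP hit lists, a lookup in an annotated interaction set, and a joint E-value cutoff) can be sketched as follows. This is an illustrative Python sketch, not PPISearch's implementation: it assumes the joint E-value of a candidate pair is the product of the two proteins' BLASTP E-values, and the data layouts and function names are hypothetical. Conserved domain pairs are then tallied across the homologous PPIs that pass the cutoff.

```python
from collections import Counter

JOINT_E_CUTOFF = 1e-40

def homologous_ppis(hits_a, hits_b, ppi_db, cutoff=JOINT_E_CUTOFF):
    """hits_a / hits_b: {protein_id: BLASTP E-value} for homologs of query A / B.
    ppi_db: iterable of (protein_x, protein_y) annotated interactions.
    Joint E-value is assumed here to be the product of the two E-values."""
    found = []
    for x, y in ppi_db:
        for p, q in {(x, y), (y, x)}:  # interactions are unordered
            if p in hits_a and q in hits_b:
                joint = hits_a[p] * hits_b[q]
                if joint <= cutoff:
                    found.append((p, q, joint))
    return sorted(found, key=lambda t: t[2])

def conserved_domain_pairs(homologs, domains):
    """domains: {protein_id: set of domain IDs, e.g. Pfam}.
    Count how many homologous PPIs support each (domain_a, domain_b) pair."""
    counts = Counter()
    for p, q, _ in homologs:
        for d1 in domains.get(p, ()):
            for d2 in domains.get(q, ()):
                counts[(d1, d2)] += 1
    return counts
```

A domain pair supported by many homologous PPIs across species would be a stronger candidate for transfer to the query pair; the same tally could be run over Gene Ontology terms for function pairs.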
Project description: To determine the extent of an inherited contribution to amyotrophic lateral sclerosis (ALS) mortality. Death certificates (DCs) from 1904 to 2009 were analyzed from patients with at least 3 generations recorded in the Utah Population Database, a genealogic and medical database of more than 2 million Utah residents. Among probands whose DCs listed ALS, the relative risk (RR) of death with ALS was determined among spouses and first- through fifth-degree relatives, using birth year-, sex-, and birthplace-matched cohorts. Eight hundred seventy-three patients with ALS met the inclusion criteria. Among 3,531 deceased first-degree relatives of probands, the RR of dying with ALS was increased compared with control cohorts (RR = 4.91, 95% confidence interval 3.36-6.94). The RR of dying with ALS was also increased among 9,386 deceased second-degree relatives (RR = 2.85, 95% confidence interval 2.06-3.84). The RR of dying with ALS was not increased among third- through fifth-degree relatives. More affected first-degree relatives were male (p = 0.014). No cases of conjugal ALS were observed. This study suggests familial clustering of ALS in excess of that expected. Our results confirm the results of prior studies of familial ALS, suggesting that our findings apply to other mixed European populations. Furthermore, this work expands on previous studies by quantifying the RR of ALS among more distant relatives. The use of mortality data obtained from DCs reduces the ascertainment and recall bias of many previous studies. Finally, the excess of ALS among second-degree relatives and the lack of conjugal ALS strongly support a genetic contribution.
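A relative risk of this kind can be expressed as observed cases among relatives divided by the number expected from matched controls. The sketch below assumes that framing and uses Byar's approximation to the exact Poisson confidence interval, a common choice for observed/expected ratios; the abstract does not state which interval method the study used. The counts in the test are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

def expected_cases(n_relatives, control_cases, control_n):
    """Expected ALS deaths among relatives if they had the matched controls' rate."""
    return n_relatives * control_cases / control_n

def relative_risk_ci(observed, expected, level=0.95):
    """RR = observed / expected with Byar's approximation to the exact Poisson CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rr = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * observed ** 0.5)) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * o1 ** 0.5)) ** 3 / expected
    return rr, lower, upper
```

With a hypothetical 10 observed deaths against 2.0 expected, the RR is 5.0 and the interval is roughly 2.4 to 9.2; the interval is asymmetric because the Poisson count cannot fall below zero.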
Project description: Since Kidd proposed the concept of microhaplotypes in 2013, microhaplotype markers have been investigated for various forensic purposes, such as individual identification, deconvolution of DNA mixtures, and forensic ancestry inference. We regard compound markers as generalized microhaplotypes that encompass two or more variants in a short segment of DNA (e.g., 200 bp). That is, a set of variants within a certain length (referred to herein as multi-variants) may include single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (Indels), or short tandem repeat polymorphisms (STRs). At present, multi-variant markers are mainly multi-SNPs. However, haplotype genotyping of multi-variants relies on single-strand analysis, mainly using massively parallel sequencing (MPS). Here, we describe a method based on a capillary electrophoresis (CE) platform that can directly obtain the haplotypes of individuals. We screened several microhaplotypes consisting of three or more Indels with different insertion or deletion lengths within a span of less than 200 bp, each of which had at least three haplotypes. Because the Indels differ in length, each haplotype of an individual is reflected in the length of its amplicon. Finally, we established a multiplex amplification system containing 18 multi-Indel markers that could identify the haplotypes on each chromosome of an individual. The combined power of discrimination (CPD) and the cumulative probability of exclusion (CPE) were 0.999999999997234 and 0.9984, respectively.
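Two calculations underlie this design and can be sketched in Python. First, fragment length identifies a haplotype only if different combinations of Indel alleles give different amplicon lengths; the check below models each Indel as contributing either 0 or its length (a hypothetical simplification) and tests all combinations, which is stricter than needed when only some haplotypes occur in practice. Second, per-marker power of discrimination is PD = 1 - Σ(genotype frequency)², with genotype frequencies assumed here to follow Hardy-Weinberg equilibrium, and independent markers combine as 1 - ∏(1 - PD_i). The same product rule combines per-marker probabilities of exclusion into a CPE, but the per-marker PE formula depends on the kinship scenario and is not sketched. `base_length` and all function names are illustrative.

```python
import math
from itertools import combinations_with_replacement, product

def haplotype_lengths(base_length, indel_lengths):
    """Map each haplotype (0 = short allele, 1 = long allele at each Indel)
    to its total amplicon length."""
    return {h: base_length + sum(l for l, present in zip(indel_lengths, h) if present)
            for h in product((0, 1), repeat=len(indel_lengths))}

def lengths_resolve_haplotypes(base_length, indel_lengths):
    """True if every Indel-allele combination gives a distinct fragment length."""
    lengths = haplotype_lengths(base_length, indel_lengths)
    return len(set(lengths.values())) == len(lengths)

def power_of_discrimination(freqs):
    """PD = 1 - sum of squared genotype frequencies (Hardy-Weinberg genotypes)."""
    total = 0.0
    for a, b in combinations_with_replacement(list(freqs), 2):
        g = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        total += g * g
    return 1 - total

def combine(per_marker):
    """Combine independent per-marker PD (or PE) values: 1 - prod(1 - x_i)."""
    return 1 - math.prod(1 - x for x in per_marker)
```

For instance, Indels of 1, 2, and 4 bp give eight distinct lengths, whereas Indels of 2, 3, and 5 bp do not, because 2 + 3 = 5.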
Project description: Footwear outsole images were obtained from 150 pairs of used shoes. The motivation for constructing the database was to enable a statistical analysis of two-dimensional (2D) images of shoe outsoles, to understand within-shoe (between replicate images of the same shoe) and between-shoe variability, and to develop methods for the evaluation of forensic pattern evidence of shoeprints. Because we scanned the outsoles of used shoes, the images capture not only the outsole pattern design but also the marks that arise from wear and tear and that may help identify the shoe that made an impression. Each shoe in a pair was scanned five times, so that replicate images can be used to estimate within-shoe variability. In total, there are 1500 2D images in the database (300 shoes × 5 scans). The EverOS footwear scanner was used to capture the outsole of each shoe. The scanner detects the weight distribution of the person wearing the shoe when he or she steps on the scanning surface, and it images the portions of the outsole that make contact with the scanning surface. The database is a useful resource for forensic scientists and for anyone else with an interest in image comparison. The database was constructed by researchers in the Center for Statistics and Applications in Forensic Evidence (CSAFE) at Iowa State University.
Project description: This research examined the familial aggregation of migraine, depression, and their co-occurrence. Diagnoses of migraine and depression were determined in a sample of 5,319 Australian twins. Migraine was diagnosed by self-report, the ID migraine™ Screener, or International Headache Society (IHS) criteria. Depression was defined as fulfilling the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria for either major depressive disorder (MDD) or minor depressive disorder (MiDD). The relative risks (RR) for migraine and depression were estimated in co-twins of twin probands reporting migraine or depression to evaluate their familial aggregation and co-occurrence. An increased RR of both migraine and depression was observed in co-twins of probands with the same trait, with significantly higher estimates within monozygotic (MZ) twin pairs than within dizygotic (DZ) twin pairs. For cross-trait analysis, the RR for migraine in co-twins of probands reporting depression was 1.36 (95% CI: 1.24-1.48) in MZ pairs and 1.04 (95% CI: 0.95-1.14) in DZ pairs; the RR for depression in co-twins of probands reporting migraine was 1.26 (95% CI: 1.14-1.38) in MZ pairs and 1.02 (95% CI: 0.94-1.11) in DZ pairs. The RR for strict IHS migraine in co-twins of probands reporting MDD was 2.23 (95% CI: 1.81-2.75) in MZ pairs and 1.55 (95% CI: 1.34-1.79) in DZ pairs; the RR for MDD in co-twins of probands reporting IHS migraine was 1.35 (95% CI: 1.13-1.62) in MZ pairs and 1.06 (95% CI: 0.93-1.22) in DZ pairs. We observed significant evidence for a genetic contribution to the familial aggregation of migraine and depression. Our findings suggest a bidirectional association between migraine and depression, with an increased risk for depression in relatives of probands reporting migraine, and vice versa. However, the observed risk for migraine in relatives of probands reporting depression was considerably higher than the reverse. These results add further support to previous studies suggesting that patients with comorbid migraine and depression are genetically more similar to patients with only depression than to patients with only migraine.
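A cross-trait relative risk in twins can be illustrated as the probability that a co-twin has trait B given that the proband has trait A, divided by the population prevalence of B. The Python sketch below assumes proband-wise counting (every twin with trait A is a proband, so a pair concordant for A contributes two probands); the study's exact estimator and its confidence-interval method are not given in the abstract, and the function name and data layout are hypothetical.

```python
def cross_trait_rr(pairs, trait_a, trait_b):
    """pairs: list of (twin1, twin2), each twin a dict {trait: bool}.
    RR = P(co-twin has trait_b | proband has trait_a) / prevalence of trait_b."""
    probands = affected = 0
    for t1, t2 in pairs:
        for proband, cotwin in ((t1, t2), (t2, t1)):
            if proband[trait_a]:
                probands += 1
                affected += cotwin[trait_b]
    individuals = [t for pair in pairs for t in pair]
    prevalence = sum(t[trait_b] for t in individuals) / len(individuals)
    return (affected / probands) / prevalence
```

Calling it separately on MZ and DZ pairs gives the kind of comparison reported above, where a higher MZ than DZ estimate points toward a genetic contribution.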
Project description: Applications of genomic studies are spreading rapidly in many domains of science and technology, such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic investigation. However, a number of obstacles make it hard to access and process a big genomic database for these applications. First, genome sequencing is a time-consuming and expensive process. Second, processing genomic sequences requires large-scale computation and storage systems. Third, genomic databases are often owned by different organizations and thus are not available for public use. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider, which may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection for genomic databases. The privacy of individuals is guaranteed by permuting the database and adding fake genomic records to it. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count query and a top-k query over 40 single nucleotide polymorphisms (SNPs) in a database of 20 000 records take around 100 s and 150 s, respectively.
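The two ingredients named above, permutation and fake records, can be illustrated with a toy count query. This Python sketch is not the proposed model and provides no cryptographic protection. It assumes, hypothetically, that the data owner keeps its fake records locally and subtracts their matches from the cloud's answer; the abstract does not describe how the actual scheme excludes fake records. Function names, the uniform fake-record generator, and the 0/1/2 genotype encoding are all illustrative.

```python
import random

def outsource(real_records, n_fake, seed=0):
    """Owner side: add random fake records and shuffle (permute) the row order.
    Returns the table sent to the cloud and the fake rows the owner keeps."""
    rng = random.Random(seed)
    n_snps = len(real_records[0])
    fakes = [tuple(rng.randint(0, 2) for _ in range(n_snps)) for _ in range(n_fake)]
    table = list(real_records) + fakes
    rng.shuffle(table)
    return table, fakes

def count_query(table, query):
    """Count rows matching query = {snp_index: genotype (0/1/2)}.
    The cloud runs this without knowing which rows are fake."""
    return sum(all(row[i] == g for i, g in query.items()) for row in table)

def true_count(cloud_answer, fakes, query):
    """Owner side: remove the fake rows' contribution from the cloud's answer."""
    return cloud_answer - count_query(fakes, query)
```

The cloud sees only a shuffled table whose answers are inflated by an unknown number of fake matches; in this toy, uniformly random fake rows would be statistically distinguishable from real genomes, which is one reason a practical scheme needs more than this.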
Project description: Nucleotide sequence differences on the whole-genome scale have been computed for 1,092 people from 14 populations, made publicly available by the 1000 Genomes Project. The total number of differences in genetic variants was calculated for 96,464 human pairs. The distributions of these differences for individuals of European, Asian, or African origin were characterized by narrow unimodal peaks with mean values of 3.8, 3.5, and 5.1 million, respectively, and standard deviations of 0.03-0.1 million. The total numbers of genomic differences between pairs of all known relatives were significantly lower than their respective population means, and the reduction became smaller as the degree of relationship became more distant. By counting the total number of genomic differences, it is possible to infer familial relations for people who share as little as 6% of their loci identical-by-descent. Detection of familial relations can be radically improved when only very rare genetic variants are taken into account. Counting the total number of shared very rare single nucleotide polymorphisms (SNPs) in whole-genome sequences makes it possible to establish distant familial relations of the eighth and ninth degrees. Using this analysis, we predicted 271 distant pairwise familial relations among the 1,092 individuals that had not been declared by the 1000 Genomes Project. In particular, among 89 British and 97 Chinese individuals, we found three British-Chinese pairs with distant genetic relationships. Individuals in these pairs share identical-by-descent DNA fragments that represent 0.001%, 0.004%, and 0.01% of their genomes. With affordable whole-genome sequencing techniques, very rare SNPs should become important genetic markers for familial relationships and population stratification.
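The two statistics described above can be sketched on genotype vectors coded as alternate-allele counts (0, 1, 2). This Python sketch makes two assumptions not stated in the abstract: a "difference" is counted as any site where two genotypes differ (counting allele-level differences would be an alternative), and "very rare" means a cohort alternate-allele frequency at or below a hypothetical `max_freq` cutoff. Function names are illustrative.

```python
def alt_allele_freqs(cohort):
    """Alternate-allele frequency per site; cohort is a list of 0/1/2 genotype vectors."""
    n = len(cohort)
    return [sum(ind[s] for ind in cohort) / (2 * n) for s in range(len(cohort[0]))]

def genotype_differences(g1, g2):
    """Number of sites at which two individuals' genotypes differ."""
    return sum(a != b for a, b in zip(g1, g2))

def shared_rare_variants(g1, g2, freqs, max_freq=0.01):
    """Sites where both individuals carry the alternate allele and its
    cohort frequency is at most max_freq (a hypothetical rarity cutoff)."""
    return sum(1 for a, b, p in zip(g1, g2, freqs)
               if 0 < p <= max_freq and a > 0 and b > 0)
```

Relatives are expected to show fewer genotype differences than the population mean, while sharing a very rare variant is unlikely between unrelated people, which is why counts of shared rare variants can reach more distant relationships.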