Principal component analysis reveals the 1000 Genomes Project does not sufficiently cover the human genetic diversity in Asia.
ABSTRACT: The 1000 Genomes Project (1KG) aims to provide a comprehensive resource on human genetic variation. By sequencing 2,500 individuals, 1KG is expected to cover the majority of human genetic diversity worldwide. In this study, using population structure analysis based on genome-wide single nucleotide polymorphism (SNP) data, we evaluated how well the 1KG samples cover genetic diversity by comparing them with available genome-wide SNP data from 3,831 individuals representing 140 population samples worldwide. We developed a method to quantitatively measure and evaluate the genetic diversity revealed by population structure analysis. Our results showed that 1KG does not sufficiently cover human genetic diversity in Asia, especially in Southeast Asia. We suggest that better coverage of Southeast Asian populations be considered in 1KG, or that a regional effort be initiated, to provide a more comprehensive characterization of human genetic diversity in Asia, which is important for both evolutionary and medical studies in the future.
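As a rough illustration of the kind of population structure analysis named in the title, the following sketch runs a principal component analysis on a toy genotype matrix (individuals by SNPs, coded 0/1/2). It is not the authors' method or their diversity measure. The function names, the toy data, and the use of power iteration are assumptions made for illustration only.

```python
import math


def center_columns(genotypes):
    """Subtract each SNP's mean genotype so every column has mean zero."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200):
    """Return each individual's (unit-normalised) score on the first principal component."""
    x = center_columns(genotypes)
    n = len(x)
    # Individual-by-individual Gram matrix; its leading eigenvector
    # gives the PC1 scores up to scale and sign.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):  # power iteration
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical toy data: two groups of three individuals, five SNPs each.
toy_genotypes = [
    [0, 0, 1, 2, 2], [0, 1, 0, 2, 2], [0, 0, 0, 2, 1],  # group A
    [2, 2, 1, 0, 0], [2, 1, 2, 0, 0], [2, 2, 2, 0, 1],  # group B
]
scores = first_pc_scores(toy_genotypes)
print([round(s, 3) for s in scores])
```

In this toy case the two groups fall on opposite sides of zero on PC1. A population that a reference panel does not sample would show up as a region of PC space with no reference individuals nearby.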
Project description:BACKGROUND: The wild boar, Sus scrofa, is the extant wild ancestor of the domestic pig, an agro-economically important mammal. Wild boar has a worldwide distribution with its geographic origin in Southeast Asia, but the genetic diversity and genetic structure of wild boar in East Asia are poorly understood. To characterize the pattern and amount of genetic variation and the population structure of wild boar in East Asia, we genotyped and analyzed microsatellite loci for a total of 238 wild boar specimens from ten locations across six countries in East and Southeast Asia. RESULTS: Our data indicated that wild boar populations in East Asia are genetically diverse and structured, showing a significant correlation of genetic distance with geographic distance and implying a low level of gene flow at a regional scale. Bayesian clustering analysis indicated seven inferred genetic clusters into which wild boars in East Asia are geographically structured. The level of genetic diversity was relatively high in wild boars from Southeast Asia compared with those from Northeast Asia. This gradient in genetic diversity is consistent with an assumed ancestral population of wild boar in Southeast Asia. Genetic evidence from a relationship tree and structure analysis suggests that wild boar on Jeju Island, South Korea, have a genetic background distinct from those in mainland Korea. CONCLUSIONS: Our results reveal a diverse pattern of genetic diversity and the existence of genetic differentiation among wild boar populations inhabiting East Asia. This study highlights the potential contribution of genetic variation in wild boar to the high genetic diversity of local domestic pigs during domestication in East Asia.
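The abstract reports a significant correlation between genetic and geographic distance but does not name the test used. One common choice for this kind of isolation-by-distance question is a Mantel permutation test, sketched below under that assumption. The function names, the toy site positions, and the distance values are hypothetical and not taken from the study.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def upper_triangle(matrix, order):
    """Pairwise entries of a square matrix after relabelling rows and columns by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def mantel_test(genetic, geographic, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    n = len(genetic)
    identity = list(range(n))
    geo = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix only
        if pearson(upper_triangle(genetic, order), geo) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy example: five sites along a transect, with genetic
# distance exactly proportional to geographic distance.
positions = [0.0, 1.0, 2.5, 4.0, 7.0]
geo_dist = [[abs(a - b) for b in positions] for a in positions]
gen_dist = [[0.01 * abs(a - b) for b in positions] for a in positions]
r, p = mantel_test(gen_dist, geo_dist, permutations=199, seed=1)
print(round(r, 3), p)
```

Permuting the population labels, rather than the individual pairwise distances, preserves the dependence among entries that share a population. This is why the test works on whole distance matrices.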
Project description:Does genotype imputation with public reference panels identify variants contributing to disease? Genotype imputation using the 1000 Genomes Project (1KG; 2504 individuals) displayed poor coverage at the causal cystic fibrosis (CF) transmembrane conductance regulator (CFTR) locus for the International CF Gene Modifier Consortium. Imputation with the larger Haplotype Reference Consortium (HRC; 32,470 individuals) displayed improved coverage but low sensitivity for variants clinically relevant for CF. A hybrid reference that combined whole genome sequencing (WGS) from 101 CF individuals with the 1KG imputed a greater number of single-nucleotide variants (SNVs) that would be analyzed in a genetic association study (r2 ≥ 0.3 and MAF ≥ 0.5%) than imputation with the HRC, while the HRC excelled in the lower frequency spectrum. Using the 1KG or HRC as reference panels missed the most common CF-causing variants or displayed low imputation accuracy. Designs that incorporate population-specific WGS can improve imputation accuracy at disease-specific loci, while imputation using public data sets can omit disease-relevant genotypes.
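The inclusion criteria quoted in parentheses can be read as a post-imputation filter. The sketch below assumes both thresholds are lower bounds (imputation quality r2 of at least 0.3 and minor allele frequency of at least 0.5%). The record layout, field names, and variant identifiers are hypothetical and do not come from any particular imputation tool.

```python
from dataclasses import dataclass


@dataclass
class ImputedVariant:
    variant_id: str   # hypothetical identifier
    r2: float         # imputation quality score
    alt_freq: float   # frequency of the alternate allele


def minor_allele_frequency(alt_freq):
    """MAF is the frequency of whichever allele is rarer."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(variant, min_r2=0.3, min_maf=0.005):
    """Keep variants that are both well imputed and common enough to test."""
    return (variant.r2 >= min_r2
            and minor_allele_frequency(variant.alt_freq) >= min_maf)


# Hypothetical example records.
variants = [
    ImputedVariant("var_a", r2=0.92, alt_freq=0.20),   # kept
    ImputedVariant("var_b", r2=0.15, alt_freq=0.20),   # poorly imputed
    ImputedVariant("var_c", r2=0.80, alt_freq=0.002),  # too rare
    ImputedVariant("var_d", r2=0.45, alt_freq=0.98),   # kept: MAF is 0.02
]
kept = [v.variant_id for v in variants if passes_filters(v)]
print(kept)
```

Counting how many variants survive this filter under each reference panel is the kind of comparison the abstract describes between the hybrid panel and the HRC.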
Project description:Myanmar is located at the crossroads of South Asia, Southeast Asia, and East Asia, and is known for the high cultural diversity of its ethnic groups. It is considered important for understanding human evolutionary history and genetic diversity in East Eurasia. However, relatively few studies have examined the population structure and demographic history of Myanmar to date. In this study, we analyzed more than 220,000 genome-wide SNPs in 175 new samples from five ethnic groups in Myanmar and compared them with published data. Our results showed that the Myanmar population is intricately substructured, with the main observed clusters corresponding roughly to western/northern highlanders (Chin, Naga, and Jingpo) and central/southern lowlanders (Bamar and Rakhine). Gene flow inferred from South Asia has a substantial influence (~11%) on the gene pool of central/southern lowlanders rather than western/northern highlanders. The genetic admixture is dated to around 650 years ago. These findings suggest that genome-wide variation in Myanmar was likely shaped by linguistic, cultural, and historical changes. Overall design: To investigate the genetic structure of Myanmar populations, we genotyped 240 samples from four ethnic groups in Myanmar (Bamar, also known as Burman; Chin; Naga; and Rakhine), a neighboring Chinese ethnic minority group (Jingpo), and Africans from Nigeria on the Illumina HumanOmniZhongHua-8 BeadChip. Three Africans were used as an outgroup. The sample type is normal blood. For general population genetics analysis, we filtered out 4 samples for duplication, 2 for incorrect gender information, and 56 for close genetic relatedness (using an IBD test). Finally, data from 175 genetically unrelated individuals were used to reflect the genetic structure of Myanmar.
Project description:BACKGROUND: The oriental fruit fly, Bactrocera dorsalis s.s., is one of the most important quarantine pests in many countries, including China. Although the oriental fruit fly has been investigated extensively, its origins and genetic structure remain disputed. In this study, the NADH dehydrogenase subunit 1 (ND1) gene was used as a genetic marker to examine the genetic diversity, population structure, and gene flow of B. dorsalis s.s. throughout its range in China and Southeast Asia. RESULTS: Haplotype networks and phylogenetic analysis indicated two distinguishable lineages of the fly population but provided no strong support for geographical subdivision in B. philippinensis. Demographic analysis revealed rapid expansion of B. dorsalis s.s. populations in China and Southeast Asia in recent years. The greatest genetic diversity was observed in Manila, Pattaya, and Bangkok, and asymmetric migration patterns were observed in different parts of China. The data collected here further show that B. dorsalis s.s. in Yunnan, Guangdong, and Fujian Provinces, and in Taiwan, might have different origins within Southeast Asia. CONCLUSIONS: Using the mitochondrial ND1 gene, the present study showed that B. dorsalis s.s. from different parts of China have different genetic structures and origins. B. dorsalis s.s. in China and Southeast Asia was found to have experienced rapid expansion in recent years. The data further support the existence of two distinguishable lineages of B. dorsalis s.s. in China and indicate genetic diversity and gene flow from multiple origins. The sequences in this paper have been deposited in GenBank/NCBI under accession numbers KC413034-KC413367.
Project description:Pteropine orthoreovirus (PRV; Reoviridae: Spinareovirinae) is an emerging bat-borne zoonotic virus that causes influenza-like illness (ILI). PRV has thus far been found only in Australia and Asia, where diverse old-world fruit bats (Pteropodidae) serve as hosts. In this study, we report the discovery of PRV in Africa, in an Angolan soft-furred fruit bat (Lissonycteris angolensis ruwenzorii) from Bundibugyo District, Uganda. Metagenomic characterization of a rectal swab yielded 10 dsRNA genome segments, revealing this virus to cluster within the known diversity of PRV variants detected in bats and humans in Southeast Asia. Phylogeographic analyses revealed a correlation between geographic distance and genetic divergence of PRVs globally, which suggests a geographic continuum of PRV diversity spanning Southeast Asia to sub-Saharan Africa. The discovery of PRV in an African bat dramatically expands the geographic range of this zoonotic virus and warrants further surveillance for PRVs outside of Southeast Asia.
Project description:Antimalarial drugs are a key tool in malaria elimination programs. With the emergence of artemisinin resistance in southeast Asia, an effort to identify molecular markers for surveillance of resistant malaria parasites is underway. Non-synonymous mutations in the kelch propeller domain (K13-propeller) in Plasmodium falciparum have been associated with artemisinin resistance in samples from southeast Asia, but additional studies are needed to characterize this locus in other P. falciparum populations with different levels of artemisinin use. Here, we sequenced the K13-propeller locus in 82 samples from Haiti, where limited government oversight of non-governmental organizations may have resulted in low-level use of artemisinin-based combination therapies. We detected a single-nucleotide polymorphism (SNP) at nucleotide 1,359 in a single isolate. Our results contribute to our understanding of the global genomic diversity of the K13-propeller locus in P. falciparum populations.
Project description:The Austronesian expansion, one of the last major human migrations, influenced regions as distant as tropical Asia, Remote Oceania and Madagascar, off the east coast of Africa. The identity of the Asian groups that settled Madagascar is particularly mysterious. While language connects Madagascar to the Ma'anyan of southern Borneo, haploid genetic data are more ambiguous. Here, we screened genome-wide diversity in 211 individuals from the Ma'anyan and surrounding groups in southern Borneo. Surprisingly, the Ma'anyan are characterized by a distinct, high frequency genomic component that is not found in Malagasy. This novel genetic layer occurs at low levels across Island Southeast Asia and hints at a more complex model for the Austronesian expansion in this region. In contrast, Malagasy show genomic links to a range of Island Southeast Asian groups, particularly from southern Borneo, but do not have a clear genetic connection with the Ma'anyan despite the obvious linguistic association.
Project description:The 1000 Genomes (1KG) Project provides a near-comprehensive resource on human genetic variation in worldwide reference populations. 1KG variants can be accessed through a browser and through the raw and annotated data that are regularly released on an ftp server. We developed Ferret, a user-friendly Java tool, to easily extract genetic variation information from these large and complex data files. From a locus, gene(s) or SNP(s) of interest, Ferret retrieves genotype data for 1KG SNPs and indels, and computes allelic frequencies for 1KG populations and, optionally, for the Exome Sequencing Project populations. By converting the 1KG data into files that can be imported into popular pre-existing tools (e.g. PLINK and HaploView), Ferret offers a straightforward way, even for non-bioinformatics specialists, to manipulate, explore and merge 1KG data with the user's dataset, as well as to visualize linkage disequilibrium patterns, infer haplotypes and design tagSNPs. The Ferret tool and source code are publicly available at http://limousophie35.github.io/Ferret. Supplementary data are available at Bioinformatics online.
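To make the "computes allelic frequencies" step concrete, here is a minimal sketch of per-population allele frequency calculation for a single biallelic variant. It is not Ferret's code (Ferret is a Java tool). The VCF-style genotype strings, the sample names, and the population labels are assumptions chosen for illustration.

```python
from collections import defaultdict


def allele_frequencies(genotypes, sample_population):
    """Alternate-allele frequency per population for one biallelic variant.

    genotypes: sample -> VCF-style genotype string such as "0|1" or "1/1".
    sample_population: sample -> population label.
    Missing alleles (".") are skipped rather than counted as reference.
    """
    alt_count = defaultdict(int)
    total_count = defaultdict(int)
    for sample, genotype in genotypes.items():
        population = sample_population[sample]
        for allele in genotype.replace("/", "|").split("|"):
            if allele == ".":
                continue
            total_count[population] += 1
            if allele != "0":
                alt_count[population] += 1
    return {pop: alt_count[pop] / total
            for pop, total in total_count.items() if total > 0}


# Hypothetical samples and population labels.
genotypes = {"s1": "0|1", "s2": "1|1", "s3": "0|0", "s4": "0|."}
sample_population = {"s1": "POP_A", "s2": "POP_A",
                     "s3": "POP_B", "s4": "POP_B"}
freqs = allele_frequencies(genotypes, sample_population)
print(freqs)  # {'POP_A': 0.75, 'POP_B': 0.0}
```

Skipping missing alleles keeps the denominator equal to the number of observed chromosomes, so a population with partial missing data is not biased toward the reference allele.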
Project description:Hanging Coffin is a unique and ancient burial custom that has been practiced in southern China, Southeast Asia, and Near Oceania for more than 3,000 years. Here, we conducted mitochondrial whole-genome analyses of 41 human remains sampled from 13 Hanging Coffin sites in southern China and northern Thailand, which were dated between ~2,500 and 660 years before present. We found genetic connections between the Hanging Coffin people living in different geographic regions. Notably, the matrilineal genetic diversity of the Hanging Coffin people from southern China is much higher than that of those from northern Thailand. This is consistent with the hypothesized single origin of the Hanging Coffin custom in southern China about 3,600 years ago, followed by its dispersal within southern China through demic diffusion, whereas the major dispersal pattern in Southeast Asia over the past 2,000 years was cultural assimilation.
Project description:The history of human settlement in Southeast Asia has been complex and involved several distinct dispersal events. Here we report the analyses of 1825 individuals from Southeast Asia including new genome-wide genotype data for 146 individuals from three Mainland Southeast Asian (Burmese, Malay and Vietnamese) and four Island Southeast Asian (Dusun, Filipino, Kankanaey and Murut) populations. While confirming the presence of previously recognized major ancestry components in the Southeast Asian population structure, we highlight the Kankanaey Igorots from the highlands of the Philippine Mountain Province as likely the closest living representatives of the source population that may have given rise to the Austronesian expansion. This conclusion rests on independent evidence from various analyses of autosomal data and uniparental markers. Overall design: 196 samples were analysed with the Illumina platform OmniExpress BeadChips and are described herein.