Population Stratification and Underrepresentation of Indian Subcontinent Genetic Diversity in the 1000 Genomes Project Dataset.
ABSTRACT: Genomic variation in Indian populations is of great interest due to the diversity of ancestral components, social stratification, endogamy and complex admixture patterns. With an expanding population of 1.2 billion, India is also a treasure trove to catalogue innocuous as well as clinically relevant rare mutations. Recent studies have revealed four dominant ancestries in populations from mainland India: Ancestral North-Indian (ANI), Ancestral South-Indian (ASI), Ancestral Tibeto-Burman (ATB) and Ancestral Austro-Asiatic (AAA). The 1000 Genomes Project (KGP) Phase-3 data include about 500 genomes from five linguistically defined Indian-Subcontinent (IS) populations (Punjabi, Gujarati, Bengali, Telugu and Tamil), some of whom are recent migrants to the USA or UK. Comparative analyses show that despite the distinct geographic origins of the KGP-IS populations, the ANI component is predominantly represented in this dataset. Previous studies demonstrated population substructure in the HapMap Gujarati population, and we found evidence for additional substructure in the Punjabi and Telugu populations. These substructured populations show characteristic and significant differences in heterozygosity and inbreeding coefficients. Moreover, we demonstrate that the substructure is better explained by differences in the proportion of ancestral components and by endogamy-driven social structure than by invoking a novel ancestral component. Therefore, using language and/or geography as a proxy for an ethnic unit is inadequate for many of the IS populations. This highlights the necessity for more nuanced sampling strategies or corrective statistical approaches, particularly for biomedical and population genetics research in India.
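The heterozygosity and inbreeding coefficients mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a single biallelic locus and the common estimator F = 1 - H_obs / H_exp, where H_exp is the Hardy-Weinberg expectation. The abstract does not state which estimator or software the authors used, so the function name and inputs here are hypothetical.

```python
def inbreeding_coefficient(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return F = 1 - H_obs / H_exp for one biallelic locus.

    n_aa, n_ab, n_bb are genotype counts (hypothetical inputs for
    illustration; they are not taken from the KGP data).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Frequency of allele A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    # Expected heterozygosity under Hardy-Weinberg equilibrium.
    h_exp = 2 * p * (1 - p)
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    # Observed heterozygosity is the fraction of heterozygous genotypes.
    h_obs = n_ab / n
    return 1 - h_obs / h_exp


if __name__ == "__main__":
    # Genotype counts in Hardy-Weinberg proportions (p = 0.5).
    print(inbreeding_coefficient(25, 50, 25))
    # A heterozygote deficit at the same allele frequency gives F > 0.
    print(inbreeding_coefficient(40, 20, 40))
```

A positive F indicates fewer heterozygotes than expected, which is the pattern that endogamy or hidden substructure would tend to produce. Genome-wide estimates are usually averaged over many loci rather than computed from one.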
Project description:India has been underrepresented in genome-wide surveys of human variation. We analyse 25 diverse groups in India to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most Indians today. One, the 'Ancestral North Indians' (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, whereas the other, the 'Ancestral South Indians' (ASI), is as distinct from ANI and East Asians as they are from each other. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39-71% in most Indian groups, and is higher in traditionally upper caste and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the indigenous Andaman Islanders are unique in being ASI-related groups without ANI ancestry. Allele frequency differences between groups in India are larger than in Europe, reflecting strong founder effects whose signatures have been maintained for thousands of years owing to endogamy. We therefore predict that there will be an excess of recessive diseases in India, which should be possible to screen and map genetically.
Project description:Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. These results show that India experienced a demographic transformation several thousand years ago, from a region in which major population mixture was common to one in which mixture even between closely related groups became rare because of a shift to endogamy.
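The idea of dating mixture from linkage disequilibrium, as described in the abstract above, can be sketched as follows. The sketch assumes the standard model in which admixture LD decays as exp(-n * d), with d the genetic distance in Morgans and n the number of generations since mixture. The function names, the synthetic input and the generation time of 29 years are assumptions for illustration and are not stated in the abstract.

```python
import math


def fit_admixture_generations(distances_morgans, weighted_ld):
    """Estimate generations since admixture from an LD decay curve.

    Fits log(LD) = log(A) - n * d by ordinary least squares and
    returns n. All LD values must be positive.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in weighted_ld]
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope


def generations_to_years(generations, years_per_generation=29.0):
    """Convert generations to years; the generation time is an assumption."""
    return generations * years_per_generation


if __name__ == "__main__":
    # Synthetic, noise-free decay curve with n = 100 generations.
    d = [0.005 * i for i in range(1, 21)]
    ld = [0.02 * math.exp(-100 * x) for x in d]
    n_hat = fit_admixture_generations(d, ld)
    print(round(n_hat, 3), generations_to_years(n_hat))
```

Real data are noisy, so published methods fit the exponential directly with appropriate weighting and report standard errors. The log-linear fit here is only the simplest way to show the relationship between the decay rate and the date.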
Project description:The Indus Valley has been the backdrop for several historic and prehistoric population movements between South Asia and West Eurasia. However, the genetic structure of present-day populations from Northwest India is poorly characterized. Here we report new genome-wide genotype data for 45 modern individuals from four Northwest Indian populations, including the Ror, whose long-term occupation of the region can be traced back to the early Vedic scriptures. Our results suggest that although the genetic architecture of most Northwest Indian populations fits well on the broader North-South Indian genetic cline, culturally distinct groups such as the Ror stand out by being genetically more akin to populations living west of India; such populations include prehistorical and early historical ancient individuals from the Swat Valley near the Indus Valley. We argue that this affinity is more likely a result of genetic continuity since the Bronze Age migrations from the Steppe Belt than a result of recent admixture. The observed patterns of genetic relationships both with modern and ancient West Eurasians suggest that the Ror can be used as a proxy for a population descended from the Ancestral North Indian (ANI) population. Collectively, our results show that the Indus Valley populations are characterized by considerable genetic heterogeneity that has persisted over thousands of years.
Project description:The Bene Israel Jewish community from West India is a unique population whose history before the 18th century remains largely unknown. Bene Israel members consider themselves as descendants of Jews, yet the identity of Jewish ancestors and their arrival time to India are unknown, with speculations on arrival time varying between the 8th century BCE and the 6th century CE. Here, we characterize the genetic history of Bene Israel by collecting and genotyping 18 Bene Israel individuals. Combining these with 486 individuals from 41 other Jewish, Indian and Pakistani populations, and additional individuals from worldwide populations, we conducted comprehensive genome-wide analyses based on FST, principal component analysis, ADMIXTURE, identity-by-descent sharing, admixture linkage disequilibrium decay, haplotype sharing and allele sharing autocorrelation decay, as well as contrasted patterns between the X chromosome and the autosomes. The genetics of Bene Israel individuals resemble local Indian populations, while at the same time constituting a clearly separated and unique population in India. They are unique among the Indian and Pakistani populations we analyzed in sharing considerable genetic ancestry with other Jewish populations. Taken together, the results from all analyses point to Bene Israel being an admixed population with both Jewish and Indian ancestry, with the genetic contribution of each of these ancestral populations being substantial. The admixture took place in the last millennium, about 19-33 generations ago. It involved Middle-Eastern Jews and was sex-biased, with more male Jewish and local female contribution. It was followed by a population bottleneck and high endogamy, which can lead to increased prevalence of recessive diseases in this population. This study provides an example of how genetic analysis advances our knowledge of human history in cases where other disciplines lack the relevant data to do so.
Project description:Oral health related quality of life research among children in India is still nascent and no measures have been validated to date. Although CPQ11-14 has been previously used in studies from the Indian sub-continent, the instrument has never been tested for cross-cultural adaptability. This study aimed to assess the validity and reliability of CPQ11-14 in Telugu speaking Indian school children. Primary school children of Medak district, Telangana State, India, were recruited by a multi-stage probability sampling method. The translated questionnaire was initially pilot tested on a small subset of children (n = 40). Children with informed consent from parents (N = 1342) were then provided with questionnaires containing the Telugu translation of CPQ11-14, followed by a clinical examination conducted by a single examiner, using Basic WHO survey methods for dental caries, malocclusion, and Dean's Fluorosis index. Children (n = 161) in randomly chosen schools were re-administered the same questionnaire after a two week interval to test reliability of CPQ11-14 on repeated administrations. Internal consistency and test-retest reliability as determined by Cronbach's alpha and Intra-class correlation coefficient for overall CPQ11-14 scale were 0.925 and 0.923, respectively. CPQ11-14 discriminated between the categories of fluorosis and malocclusion while its discriminant validity with respect to dental caries was limited. CPQ11-14 also demonstrated good construct validity with both overall CPQ11-14 and its subscales having significant positive correlation with global ratings of oral health and overall wellbeing, even after adjusting for confounding variables. CPQ11-14 had a correlation of 0.405 with self-evaluated oral health and 0.407 with self-evaluated impact of oral health on overall wellbeing. In conclusion, Telugu translation of CPQ11-14 demonstrated good internal consistency and excellent reliability on repeated administrations after two weeks. 
It also exhibited good discriminant and construct validity.
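The internal-consistency statistic reported in the abstract above, Cronbach's alpha, can be computed as in the following minimal sketch. It uses the textbook formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy scores are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items score table.

    scores is a list of rows, one per respondent, each holding the
    same number k of item scores.
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least two items and equal-length rows")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Toy data: four respondents answering two items.
    toy = [[1, 2], [2, 1], [3, 3], [4, 4]]
    print(cronbach_alpha(toy))
```

Values close to 1 mean the items vary together across respondents. The test-retest figure in the abstract is a separate statistic (an intra-class correlation) and is not covered by this sketch.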
Project description:Helicoverpa armigera is an important pest of cotton and other agricultural crops in the Old World. Its wide host range, high mobility and fecundity, and the ability to adapt and develop resistance against all common groups of insecticides used for its management have exacerbated its pest status. An understanding of the population genetic structure in H. armigera under Indian agricultural conditions will help ascertain gene flow patterns across different agricultural zones. This study inferred the population genetic structure of Indian H. armigera using five Exon-Primed Intron-Crossing (EPIC)-PCR markers. Nested alternative EPIC markers detected moderate null allele frequencies (4.3% to 9.4%) in loci used to infer population genetic structure, but the apparently genome-wide heterozygote deficit suggests inbreeding or a Wahlund effect rather than a null allele effect. Population genetic analysis of the 26 populations suggested significant genetic differentiation within India, especially in cotton-feeding populations in the 2006-07 cropping season. In contrast, overall pairwise F(ST) estimates from populations feeding on food crops indicated no significant population substructure irrespective of cropping seasons. A Bayesian cluster analysis was used to assign the genetic make-up of individuals to likely membership of population clusters. Some evidence was found for four major clusters with individuals in two populations from cotton in one year (from two populations in northern India) showing especially high homogeneity. Taken as a whole, this study found evidence of population substructure at host crop, temporal and spatial levels in Indian H. armigera, without, however, a clear biological rationale for these structures being evident.
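The Wahlund effect named in the abstract above can be shown with a small worked calculation. The sketch assumes two equally sized subpopulations, each in Hardy-Weinberg equilibrium at one biallelic locus, that are pooled and analysed as a single population. The function name and allele frequencies are hypothetical and do not come from the H. armigera data.

```python
def wahlund_deficit(p1: float, p2: float):
    """Heterozygote deficit from pooling two equal-sized subpopulations.

    p1 and p2 are the frequencies of the same allele in each
    subpopulation. Returns (H_T, H_S, F_ST), where H_T is the
    heterozygosity expected from the pooled allele frequency, H_S is
    the mean within-subpopulation heterozygosity (what is actually
    observed in the pooled sample), and F_ST = 1 - H_S / H_T.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("pooled sample is monomorphic; F_ST is undefined")
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return h_t, h_s, 1 - h_s / h_t


if __name__ == "__main__":
    h_t, h_s, fst = wahlund_deficit(0.2, 0.8)
    print(h_t, h_s, fst)
```

Whenever the two frequencies differ, H_S falls below H_T, so a pooled sample shows fewer heterozygotes than expected even though neither subpopulation is inbred. This is why an apparent deficit across many loci can point to hidden substructure as well as to inbreeding.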
Project description:India, occupying the center stage of Paleolithic and Neolithic migrations, has been underrepresented in genome-wide studies of variation. Systematic analysis of genome-wide data, using multiple robust statistical methods, on (i) 367 unrelated individuals drawn from 18 mainland and 2 island (Andaman and Nicobar Islands) populations selected to represent geographic, linguistic, and ethnic diversities, and (ii) individuals from populations represented in the Human Genome Diversity Panel (HGDP), reveals four major ancestries in mainland India. This contrasts with an earlier inference of two ancestries based on limited population sampling. A distinct ancestry of the populations of the Andaman archipelago was identified and found to be coancestral to Oceanic populations. Analysis of ancestral haplotype blocks revealed that (i) extant mainland populations admixed widely irrespective of ancestry, although admixture between populations was not always symmetric, and (ii) this practice was rapidly replaced by endogamy about 70 generations ago, predominantly among upper castes and Indo-European speakers. This estimated time coincides with the historical period of formulation and adoption of sociocultural norms restricting intermarriage in large social strata. A similar replacement observed among tribal populations was temporally less uniform.
Project description:Zoroastrianism is one of the oldest extant religions in the world, originating in Persia (present-day Iran) during the second millennium BCE. Historical records indicate that migrants from Persia brought Zoroastrianism to India, but there is debate over the timing of these migrations. Here we present genome-wide autosomal, Y chromosome, and mitochondrial DNA data from Iranian and Indian Zoroastrians and neighboring modern-day Indian and Iranian populations and conduct a comprehensive genome-wide genetic analysis in these groups. Using powerful haplotype-based techniques, we find that Zoroastrians in Iran and India have increased genetic homogeneity relative to other sampled groups in their respective countries, consistent with their current practices of endogamy. Despite this, we infer that Indian Zoroastrians (Parsis) intermixed with local groups sometime after their arrival in India, dating this mixture to 690-1390 CE and providing strong evidence that Iranian Zoroastrian ancestry was maintained primarily through the male line. By making use of the rich information in DNA from ancient human remains, we also highlight admixture in the ancestors of Iranian Zoroastrians dated to 570 BCE-746 CE, older than admixture seen in any other sampled Iranian group, consistent with a long-standing isolation of Zoroastrians from outside groups. Finally, we report results, and challenges, from a genome-wide scan to identify genomic regions showing signatures of positive selection in present-day Zoroastrians that might correlate to the prevalence of particular diseases among these communities.
Project description:BACKGROUND: Major population movements, social structure, and caste endogamy have influenced the genetic structure of Indian populations. An understanding of these influences is increasingly important as gene mapping and case-control studies are initiated in South Indian populations. RESULTS: We report new data on 155 individuals from four Tamil caste populations of South India and perform comparative analyses with caste populations from the neighboring state of Andhra Pradesh. Genetic differentiation among Tamil castes is low (RST = 0.96% for 45 autosomal short tandem repeat (STR) markers), reflecting a largely common origin. Nonetheless, caste- and continent-specific patterns are evident. For 32 lineage-defining Y-chromosome SNPs, Tamil castes show higher affinity to Europeans than to eastern Asians, and genetic distance estimates to the Europeans are ordered by caste rank. For 32 lineage-defining mitochondrial SNPs and hypervariable sequence (HVS) 1, Tamil castes have higher affinity to eastern Asians than to Europeans. For 45 autosomal STRs, upper and middle rank castes show higher affinity to Europeans than do lower rank castes from either Tamil Nadu or Andhra Pradesh. Local between-caste variation (Tamil Nadu RST = 0.96%, Andhra Pradesh RST = 0.77%) exceeds the estimate of variation between these geographically separated groups (RST = 0.12%). Low, but statistically significant, correlations between caste rank distance and genetic distance are demonstrated for Tamil castes using Y-chromosome, mtDNA, and autosomal data. CONCLUSION: Genetic data from Y-chromosome, mtDNA, and autosomal STRs are in accord with historical accounts of northwest to southeast population movements in India. The influence of ancient and historical population movements and caste social structure can be detected and replicated in South Indian caste populations from two different geographic regions.
Project description:BACKGROUND: Human genetic diversity observed in the Indian subcontinent is second only to that of Africa. This implies an early settlement and demographic growth soon after the first 'Out-of-Africa' dispersal of anatomically modern humans in the Late Pleistocene. In contrast to this perspective, linguistic diversity in India has been thought to derive from more recent population movements and episodes of contact. With the exception of Dravidian, whose origin and relatedness to other language phyla are obscure, all the language families in India can be linked to language families spoken in different regions of Eurasia. Mitochondrial DNA and Y chromosome evidence has supported largely local evolution of the genetic lineages of the majority of Dravidian and Indo-European speaking populations, but there is no consensus yet on the question of whether the Munda (Austro-Asiatic) speaking populations originated in India or derive from a relatively recent migration from further East. RESULTS: Here, we report the analysis of 35 novel complete mtDNA sequences from India which refine the structure of Indian-specific varieties of haplogroup R. Detailed analysis of haplogroup R7, coupled with a survey of approximately 12,000 mtDNAs from caste and tribal groups over the entire Indian subcontinent, reveals that one of its more recently derived branches (R7a1) is particularly frequent among Munda-speaking tribal groups. This branch is nested within diverse R7 lineages found among Dravidian and Indo-European speakers of India. We have inferred from this that a subset of Munda-speaking groups have acquired R7 relatively recently. Furthermore, we find that the distribution of R7a1 within the Munda-speakers is largely restricted to one of the sub-branches (Kherwari) of northern Munda languages. This evidence does not support the hypothesis that the Austro-Asiatic speakers are the primary source of the R7 variation.
Statistical analyses suggest a significant correlation between genetic variation and geography, rather than between genes and languages. CONCLUSION: Our high-resolution phylogeographic study, involving diverse linguistic groups in India, suggests that the high frequency of mtDNA haplogroup R7 among Munda-speaking populations of India can be explained best by gene flow from linguistically different populations of the Indian subcontinent. The conclusion is based on the observation that among Indo-Europeans, and particularly in Dravidians, the haplogroup is, despite its lower frequency, phylogenetically more divergent, while among the Munda speakers only one sub-clade of R7, i.e. R7a1, can be observed. It is noteworthy that though R7 is autochthonous to India, and arises from the root of hg R, its distribution and phylogeography in India are not uniform. This suggests the more ancient establishment of an autochthonous matrilineal genetic structure, and that isolation in the Pleistocene, lineage loss through drift, and endogamy of prehistoric and historic groups have greatly inhibited genetic homogenization and geographical uniformity.