What makes krakkkaz krakkk?
The next time you fight off the flu, you might want to thank your ancestors for flirting with the Neanderthal down the way. According to a pair of new studies, interbreeding between several early human species may have given us a key ingredient in fighting disease.
While scientists once scoffed at the idea that our ancestors may have mated with their “cousins,” over the last six years a growing body of evidence drawn from several large genetic sequencing projects says otherwise. Not only did our ancient ancestors interbreed with Neanderthals, but recent finds indicate they likely mated with a third ancient human species called the Denisovans as well.
And this wasn’t just a one-time thing. Studies indicate that our ancestors got it on with these other ancient humans often enough that us modern humans have inherited about 1 to 2 percent of our DNA from them, Sarah Kaplan reports for the Washington Post.
Now, scientists working on two independent studies have come to similar conclusions. Some of this DNA left over from liasons with Neanderthals and Denisovans play a big role in strengthening our immune systems to fight off infection and disease.
"At some point in history it might have been an advantage to have these Neanderthal genes in terms of fighting off infections or lethal pathogens from 10,000 years ago,” study co-author Michael Dannemann of the Max Planck Institute for Evolutionary Anthropology tells Helen Briggs for the BBC.
Dannemann and his colleagues analyzed genes from both modern humans and ancient Homo sapiens to see how our immune systems changed over the millennia. When they looked closely, they discovered several fragments of Neanderthal DNA in modern humans that are tied to our hardy immune systems. At the same time, researchers working on another, separate project at the Pasteur Institute in Paris came to similar conclusions while scanning the modern human genome for similarities to Neanderthal and Denisovan DNA, Ian Sample reports for The Guardian. Both studies were published this week in the American Journal of Human Genetics.
“A small group of modern humans leaving Africa would not carry much genetic variation,” Janet Kelso, co-author of the study from the Max Planck Institute, tells Sample. “You can adapt through mutations, but if you interbreed with the local population who are already there, you can get some of these adaptations for free.”
The findings indicate that modern humans inherited three genes in three waves, depending on when their ancestors interacted with Neanderthals and Denisovans—two from Neanderthals and one from the Denisovans. According to Lluis Quintana-Murci, who co-authored the study at the Pasteur Institute, these three genes are some of the most common Neanderthal or Denisovan DNA found in modern human beings, Sample reports.
While these genes may have helped our ancestors fight off disease, they are also responsible for a more unpleasant side effect: allergies. When these three genes gave our ancestors more protection from pathogens, they also made it likely that harmless things like pollen and grass could set off their burgeoning immune system. Sadly, that overactive immune response has been passed down along with the added protection, Megan Thielking writes for STAT.
“We see it as a trade-off,” Kelso tells Thielking.
Read more: https://www.smithsonianmag.com/smart-news/thank-neanderthals-your-immune-system-180957761/#07sd3HiTc1ut8T9k.99
Give the gift of Smithsonian magazine for only $12! http://bit.ly/1cGUiGv
Follow us: @SmithsonianMag on Twitter
Call it humanity’s unexpected U-turn. One of the biggest events in the history of our species is the exodus out of Africa some 65,000 years ago, the start of Homo sapiens‘ long march across the world. Now a study of southern African genes shows that, unexpectedly, another migration took western Eurasian DNA back to the very southern tip of the continent 3000 years ago.
According to conventional thinking, the Khoisan tribes of southern Africa, have lived in near-isolation from the rest of humanity for thousands of years. In fact, the study shows that some of their DNA matches most closely people from modern-day southern Europe, including Spain and Italy.
Because Eurasian people also carry traces of Neanderthal DNA, the finding also shows – for the first time – that genetic material from our extinct cousin may be widespread in African populations.
The Khoisan tribes of southern Africa are hunter-gatherers and pastoralists who speak unique click languages. Their extraordinarily diverse gene pool split from everyone else’s before the African exodus.
“These are very special, isolated populations, carrying what are probably the most ancient lineages in human populations today,” says David Reich of Harvard University. “For a lot of our genetic studies we had treated them as groups that had split from all other present-day humans before they had split from each other.”
So he and his colleagues were not expecting to find signs of western Eurasian genes in 32 individuals belonging to a variety of Khoisan tribes. “I think we were shocked,” says Reich.
The unexpected snippets of DNA most resembled sequences from southern Europeans, including Sardinians, Italians and people from the Basque region (see “Back to Africa – but from where?“). Dating methods suggested they made their way into the Khoisan DNA sometime between 900 and 1800 years ago – well before known European contact with southern Africa (see map).
Archaeological and linguistic studies of the region can make sense of the discovery. They suggest that a subset of the Khoisan, known as the Khoe-Kwadi speakers, arrived in southern Africa from east Africa around 2200 years ago. Khoe-Kwadi speakers were – and remain – pastoralists who make their living from herding cows and sheep. The suggestion is that they introduced herding to a region that was otherwise dominated by hunter-gatherers.
Reich and his team found that the proportion of Eurasian DNA was highest in Khoe-Kwadi tribes, who have up to 14 per cent of western Eurasian ancestry. What is more, when they looked at the east African tribes from which the Khoe-Kwadi descended, they found a much stronger proportion of Eurasian DNA – up to 50 per cent.
That result confirms a 2012 study by Luca Pagani of the Wellcome Trust Sanger Institute in Hinxton, UK, which found non-African genes in people living in Ethiopia. Both the 2012 study and this week’s new results show that the Eurasian genes made their way into east African genomes around 3000 years ago. About a millennium later, the ancestors of the Khoe-Kwadi headed south, carrying a weaker signal of the Eurasian DNA into southern Africa.
The cultural implications are complex and potentially uncomfortably close to European colonial themes. “I actually am not sure there’s any population that doesn’t have west Eurasian [DNA],” says Reich.
“These populations were always thought to be pristine hunter-gatherers who had not interacted with anyone for millennia,” says Reich’s collaborator, linguist Brigitte Pakendorf of the University of Lyon in France. “Well, no. Just like the rest of the world, Africa had population movements too. There was simply no writing, no Romans or Greeks to document it.”
Twist in tale
There’s one more twist to the tale. In 2010 a research team – including Reich – published the first draft genome of a Neanderthal. Comparisons with living humans revealed traces of Neanderthal DNA in all humans with one notable exception: sub-Saharan peoples like the Yoruba and Khoisan.
That made sense. After early humans migrated out of Africa around 60,000 years ago, they bumped into Neanderthals somewhere in what is now the Middle East. Some got rather cosy with each other. As their descendants spread across the world to Europe, Asia and eventually the Americas, they spread bits of Neanderthal DNA along with their own genes. But because those descendants did not move back into Africa until historical times, most of this continent remained a Neanderthal DNA-free zone.
Or so it seemed at the time. Now it appears that the Back to Africa migration 3000 years ago carried a weak Neanderthal genetic signal deep into the homeland. Indeed one of Reich’s analyses, published last month, found Neanderthal traces in Yoruba DNA (Nature, DOI: 10.1038/nature12886).
In other words, not only is western Eurasian DNA ancestry a global phenomenon, so is having a bit of Neanderthal living on inside you.
Journal reference: PNAS, DOI: 10.1073/pnas.1313787111
Back to Africa – but from where?
Reich and his colleagues found that DNA sequences in the Khoisan people most closely resemble some found in people who today live in southern Europe. That, however, does not mean the migration back to Africa started in Italy or Spain. More likely, the migration began in what is now the Middle East.
We know that southern Europeans can trace their ancestry to the Middle East. However, in the thousands of years since they – and the ancestors of the Khoisan – left the region, it has experienced several waves of immigration. These waves have had a significant effect on the genes of people living in the Middle East today, and and means southern Europeans are much closer to the original inhabitants of the Levant than modern-day Middle Easterners.
Skin pigmentation is one of the most variable phenotypic traits in humans. A non-synonymous substitution (rs1426654) in the third exon of SLC24A5 accounts for lighter skin in Europeans but not in East Asians. A previous genome-wide association study carried out in a heterogeneous sample of UK immigrants of South Asian descent suggested that this gene also contributes significantly to skin pigmentation variation among South Asians. In the present study, we have quantitatively assessed skin pigmentation for a largely homogeneous cohort of 1228 individuals from the Southern region of the Indian subcontinent. Our data confirm significant association of rs1426654 SNP with skin pigmentation, explaining about 27% of total phenotypic variation in the cohort studied. Our extensive survey of the polymorphism in 1573 individuals from 54 ethnic populations across the Indian subcontinent reveals wide presence of the derived-A allele, although the frequencies vary substantially among populations. We also show that the geospatial pattern of this allele is complex, but most importantly, reflects strong influence of language, geography and demographic history of the populations. Sequencing 11.74 kb of SLC24A5 in 95 individuals worldwide reveals that the rs1426654-A alleles in South Asian and West Eurasian populations are monophyletic and occur on the background of a common haplotype that is characterized by low genetic diversity. We date the coalescence of the light skin associated allele at 22–28 KYA. Both our sequence and genome-wide genotype data confirm that this gene has been a target for positive selection among Europeans. However, the latter also shows additional evidence of selection in populations of the Middle East, Central Asia, Pakistan and North India but not in South India.
Human skin color is one of the most visible aspects of human diversity. The genetic basis of pigmentation in Europeans has been understood to some extent, but our knowledge about South Asians has been restricted to a handful of studies. It has been suggested that a single nucleotide difference in SLC24A5 accounts for 25–38% European-African pigmentation differences and correlates with lighter skin. This genetic variant has also been associated with skin color variation among South Asians living in the UK. Here, we report a study based on a homogenous cohort of South India. Our results confirm that SLC24A5 plays a key role in pigmentation diversity of South Asians. Country-wide screening of the variant reveals that the light skin associated allele is widespread in the Indian subcontinent and its complex patterning is shaped by a combination of processes involving selection and demographic history of the populations. By studying the variation of SLC24A5 sequences among a diverse set of individuals, we show that the light skin associated allele in South Asians is identical by descent to that found in Europeans. Our study also provides new insights into positive selection acting on the gene and the evolutionary history of light skin in humans.
Citation: Basu Mallick C, Iliescu FM, Möls M, Hill S, Tamang R, Chaubey G, et al. (2013) The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. PLoS Genet 9(11): e1003912. https://doi.org/10.1371/journal.pgen.1003912
Editor: Joshua M. Akey, University of Washington, United States of America
Received: April 22, 2013; Accepted: September 7, 2013; Published: November 7, 2013
Copyright: © 2013 Basu Mallick et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This study was mainly supported by Tartu University grant (PBGMR06901) to TK and European Union European Regional Development Fund through the Centre of Excellence in Genomics to Estonian Biocentre and University of Tartu to RV, CBM, MMe, GH and GC. Other supporting funding sources were ERC Starting Investigator grant (FP7 - 261213) to TK; Estonian Basic Research grant SF0270177As08 to RV; European Commission grant (ECOGENE 205419) to MMe, GH and RV; Estonian Science Foundation grant (8973) to MMe and UK-India Education Research Initiative (RG47772) grant to MML, KT, TK, FMI, IGR and FC. SYWH was supported by the Australian Research Council. LS and KT were supported by the Council of Scientific and Industrial Research, Government of India. FMI was supported by a studentship from the Darwin Trust of Edinburgh. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Human skin color varies widely among and within populations and is a classic example of adaptive evolution. Skin pigmentation in humans is largely determined by the quantity and distribution of the pigment melanin, which is packed in melanosomes and then transferred from melanocytes (melanin-forming cells) to the surrounding epidermal keratinocytes . Human melanin is primarily composed of two distinct polymers: eumelanin (brown/black) and pheomelanin (yellow/red), which differ in their physical properties and chemical composition . In addition to the amount and type of melanin, other factors such as the size, shape, number, and cellular distribution of melanosomes also contribute to the variation in skin color . Comparative studies of model organisms, pigmentation disorders and genome-wide studies have played a key role in the identification of human pigmentation genes –. A total of 378 candidate loci, including 171 cloned genes, are currently recorded in the Color Genes database ( http://www.espcr.org/micemut/ ), yet only a few of them have been confirmed to have potentially function-altering polymorphisms in humans.
A significant correlation between skin color and ultraviolet radiation (UVR) levels observed at the global scale suggests that natural selection plays an important role in determining the distribution of this phenotypic trait . The evolution of dark skin at low latitudes has been mainly accredited to the requirement of photo-protection against UVR which causes sunburn and skin cancer, whereas the evolution of light skin has been most commonly associated with vitamin D deficiency , . It has been proposed that as humans started to colonize higher latitudes, where UVR levels were lower, dark skin could not absorb sufficient UVR for efficient vitamin D synthesis, hence natural selection favored the evolution of light skin , . This is indirectly supported by the observation that candidate pigmentation genes are collectively enriched by high-FST single-nucleotide polymorphisms (SNP) –. Furthermore, data mining of publicly available datasets, such as HapMap, Perlegen and Human Genome Diversity Project (HGDP), has provided evidence of selection signals in pigmentation-related genes in one or more populations (see  and references therein),  thus elucidating the history of human adaptation to local environments for this complex trait.
One of the key pigmentation genes in humans is SLC24A5 (OMIM 609802). It is located on chromosome 15q21.1 and encodes a protein called NCKX5. The association of this gene with lighter pigmentation was initially discovered in zebrafish . Using admixed populations, it was further demonstrated in this study  that a non-synonymous variant (ref SNP ID: rs1426654) in the third exon of this gene explains 25–38% of the skin color variation between Europeans and West Africans. The ancestral (G) allele of the SNP predominates in African and East Asian populations (93–100%), whereas the derived (A) allele is almost fixed in Europe (98.7–100%) . Functional assays of this gene suggested its direct involvement in human melanogenesis through cation-exchange activity , . However, the fact that the ancestral (G) allele is virtually fixed not only in Africans but also in East Asians suggests that light skin at high latitudes evolved independently in East and West Eurasia . Genome-wide scans have also identified SLC24A5 as one of the most important “hot spots” for positive selection in Europeans, thereby supporting the role of natural selection acting on this gene , , .
Populations of South Asia live at lower latitudes than would be expected to require selection for lighter skin color on the basis of improved vitamin D synthesis . Nevertheless, South Asians do exhibit a wide variation in skin color . Two previous studies have assessed the genetics of skin pigmentation variation in expatriates from South Asia. The first of these  concluded that non-synonymous variants at three genes, SLC24A5, SLC45A2 (OMIM 606202), and TYR (OMIM 606933), collectively contribute to variation in skin pigmentation in South Asians, with SLC24A5 showing the largest effect. The second study on common disease variants suggested high prevalence of the light skin associated allele of SLC24A5 in Asian Indians . Nevertheless, both the studies involved populations that were structured and represented only a small range of the vast ethnic and genetic landscape of South Asia. Hence, comprehensive assessment of this phenotypic trait in native populations of South Asia has been lacking so far.
Therefore, in the present study, we sought to address the following objectives. First, we aimed to quantify the amount of skin pigmentation variation that can be explained by the rs1426654 SNP of SLC24A5 in a homogeneous cohort of 1228 individuals from South Asia. Second, we studied the geospatial pattern of rs1426654-A allele in the Indian subcontinent using 1573 individuals from 54 populations and investigated how various factors influence its distribution. Third, we aimed to uncover the fine-scale genetic variation of SLC24A5 and determined the coalescence age of rs1426654 by resequencing 11.74 kb in a diverse set of 95 individuals. Lastly, we assessed whether SLC24A5 resequencing data and genome-wide genotype data were in concordance with the earlier reported evidence of positive selection in Europeans, and tested for any further evidence of selection among the studied populations. Our results confirm that rs1426654 plays a key role in pigmentation variation, while in-depth study of the light skin associated allele (rs1426654-A) among Indian populations reveals that the genetic architecture of skin pigmentation in South Asia is quite complex. The present study also provides important insights on evidence of positive selection and the evolutionary history of this light skin associated allele.
Variation of melanin index in South Asia and its association with rs1426654 SNP
Phenotypic assessment of melanin index (MI) across 1674 individuals from two distinct cohorts, Cohort A and Cohort B (see Materials and Methods; Tables 1, S1 and S2) demonstrated a wide variation in skin color (MI 28–79) in South Asia. Comparison with published datasets for the regions of the world revealed that the observed range in South Asians was three times greater than that in East Asians and Europeans and comparable to that of Southeast Asians (Table 1). Notably, Cohort A (n = 1228) which included individuals from three closely related agricultural castes of Andhra Pradesh in South India, shows remarkable variation in skin color (MI 30–64), similar to heterogeneous pool of samples in Cohort B (MI 28–79).
We tested the association of the rs1426654 SNP with pigmentation differences between the low (MI<38) and high (MI>50) MI groups of Cohort A (Figure 1A), using a logistic regression model. A likelihood-ratio test to discern the association of the rs1426654 SNP to skin pigmentation, in addition to the influence of sex and population (caste), showed a highly significant effect of rs1426654 genotype on skin pigmentation (p = 2.4×10−31) with an odds ratio of 26.2 (95% CI 12–67.5) for the A allele. Furthermore, the cross-validated Area Under the Curve (AUC) score of 0.83 suggested that this model has a high discrimination power between the low and high MI groups. In summary, most of the pigmentation differences observed between the low and high MI groups could be explained by the rs1426654 SNP.
(A) Distribution of melanin index (MI) in 1228 individuals of Cohort A. The two dotted black lines represent approximately 10% thresholds for the low (MI<38) and high (MI>50) MI groups, which were used to assess genotype-phenotype association using a logistic regression model. (B) Distribution of mean melanin index for the genotypes of rs1426654. The mean melanin indices for each genotype, as obtained separetely for males and females are shown together with their 95% confidence intervals, as estimated by multiple imputation model (Table S3A).
We further aimed to estimate the effect size of the SNP. However, direct estimation of the effect size based on the samples genotyped from high or low MI group of Cohort A would only allow us to assess the effect of genotype for the extremes of pigmentation phenotype rather than for the whole distribution. Therefore, to estimate how much variation in MI could be explained by the rs1426654 SNP if all 1228 individuals in Cohort A had been genotyped, we used a multiple imputation approach based on simulations. The distribution of estimated mean MI across the genotypes, as obtained separately for males and females from the imputed dataset, is presented in Figure 1B and Table S3A. We observed that the estimated mean MI for each genotype in females was lower than that of males (Table S3A). Analysis of the imputed datasets using a General Linear Model (GLM) revealed that the effect of genotype was highly significant (p<1×10−16). Notably, the total variation in pigmentation (R2) that can be explained by the full model (including sex and genotype) was calculated to be 29% (95% CI, 24–34), while that by the SNP alone was 27% (95% CI, 22–32).
Besides the quantitative assessment of the effect size, we found that the effect of the SNP was not exactly additive. Individuals with GG genotypes were darker than expected under the additive model (Table S3B). This result is consistent with the similar mode of inheritance observed in SLC24A5 by Lamason  and in other pigmentation genes, such as KITLG (OMIM 184745) and SLC45A2 , .
Similar to Cohort A, our genotype-phenotype association tests on heterogeneous populations of Cohort B (Table S2), using a GLM after adjusting for sex and population, revealed that the effect of genotype was significant (p = 3.24×10−8). However, unlike Cohort A, where we did not observe any significant difference in mean MI of three castes (p = 0.65), the effect of population in Cohort B was highly significant (p<2.2×10−16).
Geospatial distribution of rs1426654-A allele and its correlation with geography, language and ancestry component
In an attempt to map the geospatial pattern of rs1426654-A allele frequencies across South Asia, we genotyped 1054 individuals across 43 ethnic groups including major language groups and geographic regions (see Materials and Methods, Cohort C) from the Indian subcontinent. In summary, 1573 individuals from 54 distinct tribal and caste populations from all the three cohorts (A, B and C) were assessed for this polymorphism (Table S4; Figure S1). We found that the rs1426654-A allele is widely present throughout the subcontinent, although its frequency varies substantially among populations (0.03 to 1) with an average frequency of 0.53±0.32 (Table S4). To explain how the various genetic and non-genetic factors affect the geospatial distribution of the rs1426654-A allele in the Indian subcontinent, we assessed the correlation of rs1426654-A allele frequency with major geographical divisions, language families and the ancestry component detected in previous studies of Indian populations , . However, to avoid bias due to low sample sizes in some of the populations, only data from 1446 individuals representing 40 populations were used (Table S5).
Although we observe a considerable local heterogeneity, there is a general trend of rs1426654-A allele frequency being higher in the Northern (0.70±0.18) and Northwestern regions (0.87±0.13), moderate in the Southern (0.55±0.22), and very low or virtually absent in Northeastern populations of the Indian subcontinent (Figure 2, Table S6). Notably, the Onge and the Great Andamanese populations of Andaman Islands also showed absence of the derived-A allele. Given the fact that one can observe a pronounced latitudinal cline for skin pigmentation across world populations, we also sought to test the observed derived-A allele frequencies in terms of absolute latitude and longitude in South Asia. We found that the rs1426654-A allele frequency in South Asia does not significantly correlate with latitude (r = 0.23, p = 0.15). However, a significant negative correlation with longitude (r = −0.49; p = 0.002) was observed.
The map has been drawn based on rs1426654-A allele frequencies of 2763 subjects obtained from published datasets (Table S7) and 1446 individuals from the present study (Table S5). Red dots correspond to the sampling locations.
We found that the Tibeto-Burman and the Austroasiatic language families have the lowest frequencies of the A allele ( and Table S6). The rs1426654-A allele frequency was significantly higher in Indo-European speakers than in other language groups (Table S6). In particular, there was a significant difference (p<0.001) between the A allele frequencies of the Indo-European and the Dravidian speaking groups. We found that both language and geography have a significant influence on rs1426654-A allele frequency, as revealed by Mantel tests (p<0.001).
We also studied the geospatial pattern of rs1426654-A allele frequencies at the global level using 2763 subjects from previously published data (Table S7) and 1446 individuals from the present study (Table S5). The isofrequency map illustrates high frequencies of the rs1426654-A allele in Europe, Middle East, Pakistan, moderate to high frequencies in Northwest and Central Asia, while being almost absent in East Asians and Africans with notable exceptions in Bantu (Southwest), San, Mandeka, and Ethiopians (Table S7, Figure 2). As rs1426654-A allele frequency was found to be higher in West Eurasian populations that are known to share one of the genome-wide ancestry components of South Asia , , we sought to test the correlation between the derived-A allele frequency and the proportion of the West Eurasian ancestry component (as depicted by the “light green component” in ) for the studied populations. For this, we used the genome-wide information available on Indian populations from literature – (Table S8) and relevant global reference populations to perform the ADMIXTURE run. Population structure as inferred by ADMIXTURE analysis at K = 7 is shown in Figure S2A. The proportions of k5 light green ancestry component obtained at K = 7 for the populations studied were plotted against the rs1426654-A allele frequency available for all populations and South Asia in particular (Figure S2B). As shown in Figure S2B, we obtained a significant positive correlation for South Asian populations (r = 0.90, p<0.0001) but a weak, although significant correlation when all populations sharing the k5 component (r = 0.64, p = 0.04) were considered.
Fine-scale genetic variation of SLC24A5
We resequenced 11.74 kb of SLC24A5 (Figure 3), covering all the nine exons (1617 bp), introns (5797 bp), 5′ flanking (4150 bp), and 3′ flanking (177 bp) regions (Figure 3) in a global sample set of 95 individuals (see Materials and Methods) grouped into 8 broad geographic regions. A total of 60 variable sites (including 23 singletons), one insertion, and one tetranucleotide repeat were identified with derived allele frequencies ranging from 0.005 to 0.39. Results of the resequencing study for these variable sites are presented in Table S9. According to dbSNP ( http://www.ncbi.nlm.nih.gov/projects/SNP/ ) build 137 (June 2012), 21 of these 62 identified variants were novel. The insertion present in the 5′ flanking region (position 48411803) was confined to two San individuals (San 15 and San 17). Comparison of polymorphic sites across different regions revealed that the exons of SLC24A5 are highly conserved in humans. We detected only two variable positions within exons, with rs1426654 being the only non-synonymous SNP. The other variant, a synonymous (Ser-Ser) mutation identified at exon 7 at position 48431227, was shared by four Africans. In contrast to low variation in the exonic region, a highly polymorphic tetranucleotide repeat (GAAA) was observed in the 5′ flanking region (GAAA-GA-GAAA-GAAAAA-(GAAA)n-GAAAAA-GAAAA) at position 48412029. These repeats varied from 3 to 12 copies. A detailed analysis of the repeats did not reveal any correlation with the geographical origin of the samples or the haplogroups studied, in general (Table S10). However, chromosomes belonging to haplogroup H (Figure S3), defined by the rs1426654-A allele, were associated with larger repeat lengths (7–13), albeit this association was not restricted only to them (Table S10).
Exons of the gene are shown in yellow, introns in blue and 5′ flanking region in pink. The black lines underneath the gene show the regions resequenced in this study (total of 11741 bp) spanning 25674 bp. rs1426654 is the functional SNP located in the third exon.
The nucleotide diversity estimated for the consensus resequenced region (11741 bp) was observed to be 0.00042±0.00004 (with Jukes-Cantor correction), which is low compared to the average of 0.00071±0.00042 for 647 genes resequenced in the NIEHS SNP database ( http://egp.gs.washington.edu/ ). A sliding window approach based on similar measures (window size = 100 bp, step size = 25 bp) for the 5′ flanking region (4150 bp) sequenced revealed that the 2726–2875 region demonstrates the highest nucleotide diversity of 0.00651 (Figure S4). Various molecular diversity indices studied for the eight geographical groups are presented in Table S11 and Figure S5. Average pairwise differences observed among and within 8 different geographical regions using 11741 bp sequence data are summarized in Figure 4. Populations from regions previously reported to exhibit a high frequency of the rs1426654-A allele (North Africa and Middle East, Central Asia, South Asia and Europe; see Figure 2) show low levels of intra- and inter-population diversity in the resequenced region (Figure 4, Table S11).
The upper triangle of the matrix (green) shows average pairwise differences between populations (PiXY). The average number of pairwise differences (PiX) within each population is shown along the diagonal (orange). The lower triangle of the matrix (blue) shows differences between populations based on Nei's distance, i.e., corrected average pairwise differences (PiXY−(PiX+PiY)/2).
Evidence for positive selection
We tested if our sequence data supports the well-documented evidence of positive selection for SLC24A5 in previous studies , , , , ,  and whether it provides any additional evidence of selection. None of the populations tested showed significant departure from neutrality, except for Europeans, who had negative Tajima's D (p = 0.02) and Fu and Li's F* (p = 0.04) as estimated from calibrated population genetic models using COSI (Table S12).
Hence, these observations confirm that SLC24A5 has been under strong selective pressure in Europeans. In addition to this, we also performed haplotype-based selection tests based on genome-wide data (see Materials and Methods) of 1035 individuals including 145 Indians. XP-EHH scores demonstrated that SLC24A5 ranks among the top 10 candidate genes for positive selection in Europe, Middle East and Pakistan, and among the top 1% in Central Asia, Iran and North India (Table S13). Likewise, scores from our iHS analysis had significant empirical p-values for Central Asia and North India (Table S13). It is interesting to note that both of our haplotype-based selection tests demonstrated evidence of positive selection in North Indians, but no such evidence of positive selection was found in South Indians (Table S13). The difference in detecting selection signals from genotype and sequence data has also been pinpointed in a previous study .
Phylogenetic analysis and coalescence age estimates
Firstly, a phylogenetic tree was drawn on the basis of common variants observed in our worldwide resequencing data (11.74 kb) of 95 individuals. The schematic tree representing the 8 most common haplogroups is shown in Figure S3. Haplogroup G was the most common and geographically widely spread clade, being found in 7 of the 8 geographical groups examined. Haplogroup C was confined to sub-Saharan Africans only, while the rest of the observed haplogroups were shared between African and non-African populations. We conclude that all of the 73 phased chromosomes (from Europe, sub-Saharan Africa, Middle East, South Asia, North and Central Asia) with the rs1426654-A allele form a monophyletic group because they share the same haplotype background regardless of their geographic origin. In other words, all carriers of the mutation in our global sample share it by descent. The presence of the derived A allele in sub-Saharan Africa, although in low frequencies (2/73 - one heterozygous Mandeka and one heterozygous San individual) (Figure S3) is consistent with earlier findings .
We estimated the coalescence time of the rs1426654 mutation at 28,100 years (95% CI - 4,900 to 58,400 years) using BEAST. Using the same mutation rate, the coalescent age estimated by rho statistics was 21,702 years ±10,282 years. Despite the different assumptions used in the two coalescent age estimation methods, both the age estimates show substantial overlap.
Effect size of the rs1426654 SNP and its association with pigmentation variation in South Asia
A number of previous studies have focused on admixed populations in the search for genes that determine skin pigmentation variation in humans , , –. Our formal tests for association, using a large homogenous population from South India (Cohort A) as well as a heterogeneous pool of samples across India (Cohort B), demonstrated a highly significant effect of SLC24A5 on skin pigmentation. Further analysis of Cohort A revealed that this SNP determines most of the variation between the pigmentation extremes and contributes about 22–32% of the total skin color variation, thus suggesting that SLC24A5 plays a key role in the pigmentation diversity observed among South Asians.
Furthermore, confounding effect of population structure on the genetics of skin pigmentation, evident in Cohort B suggests that the marked population substructure of South Asians must be taken into account when genetic association studies are conducted in these populations.
The spread of rs1426654-A allele in South Asia
Our extensive survey of rs1426654-A allele frequency in the Indian subcontinent reveals an average frequency of 0.53 with a substantial variation among populations, ranging from 0.03 to 1 (Table S4). This finding stands in contrast to the previous understanding of the spread of this allele, where a study  based on a cohort of 15 Indian ethnic groups sampled in the US (n = 576), estimated the average A allele frequency at 0.86, with a relatively low level of variation among populations (observed range 0.70 to 1). The most plausible cause of this discordance might be that fewer populations were included in the former study and the groups were defined by their generic linguistic affiliation in major branches of the Dravidian and Indo-European languages, rather than by finer resolution of the endogamous units. Notably, in the subset of 8 populations that could be characterized on a similar basis in both studies, the estimates of A allele frequencies did not diverge significantly in their combined averages (Table S14). Therefore, these comparisons suggest that sampling strategies are pivotal in determining the extent of genetic diversity observed in Indian populations and that sampling of expatriates may have a homogenizing effect. Moreover, the expatriates are known to represent mainly urban populations of India, which constitute only 30% (Census 2011; http://censusindia.gov.in/ ) of the total population of the subcontinent, and therefore are unlikely to be representative of the wealth of genetic variation harbored within the subcontinent.
Factors shaping the complex pattern of the rs1426654-A allele in South Asia
Our quest to determine whether and to what extent the distribution of the rs1426654 derived- A allele frequency in South Asian populations correlates with language and/or geography revealed that both of these variables have a significant predictive value on allele frequencies. In particular, we found that although frequencies among populations studied vary considerably, this polymorphism has an evident geographic structure with higher frequencies of the derived allele in North and Northwest regions and a declining pattern as one moves further South and East (Table S5, Figure 2). However, when we plotted the rs1426654-A allele frequency against the geographical coordinates of our sampled populations, we found a significant correlation with longitude but not with latitude. The lack of a clear latitudinal (North-South) cline in the A allele frequency, which would have been expected under the model of natural selection, could be partly explained by the complexity of the South Asian genetic landscape, influenced by differences in population histories shaped by various micro-level migrations within the subcontinent, strict endogamy and social barriers. For example, Saurashtrians, who migrated from “Saurashtra” region of Gujarat to South India (Madurai) for work, have a relatively high rs1426654-A allele frequency of 0.70. It is believed that those Saurashtrians presently dwelling in Madurai were invited by Nayak kings for their expertise in silk-weaving . Similarly, Toda have higher A allele frequency (0.86) compared to Kurumba (0.20), their geographical neighbors, most likely due to their higher proportion of West Eurasian ancestry which is supported by Y chromosome evidence . Notably, Brahmins, irrespective of their geographic source (North, Central or South India) have higher A allele frequency (Table S5). Conversely, the higher longitudinal correlation could be due to the fact that Tibeto-Burman and Austroasiatic speakers are characterized by very low A allele frequency (Table S6) because of their East Asian ancestry , . Therefore, their inclusion in our sampling might have resulted in the inflation of the longitudinal correlation coefficient.
Coalescence age estimate of the rs1426654-A allele
Although the last decade has witnessed significant improvement in the understanding of the genetic basis of skin pigmentation, our knowledge about the exact mechanisms behind the evolution of light skin in humans is still incomplete. The genetic evidence that has accumulated till date suggests a complex evolutionary history for skin pigmentation. It has been argued that natural selection in response to UVR had a causative role in the evolution of light skin color at high latitudes , , . Evidence of population-specific signatures of selection of pigmentation genes at different timescales suggests that the evolution of light skin was not a one-step process ,  but a consequence of multiple events or episodes during human evolution. It appears that some of the mutations which have been associated with light skin started to accumulate relatively early in modern human history in the proto-Eurasian populations following the Out-of-Africa expansion, whereas other mutations arose after the divergence of East and West Eurasian populations , , , .
Hence, studies focusing on the time-scale of genetic changes in pigmentation genes are vital for understanding the complex evolutionary history of human skin pigmentation. Therefore, in this study, we focused on providing an age estimate of the rs1426654 mutation, which has a major effect on skin pigmentation in West Eurasian and South Asian populations. Notably, previous studies providing age estimates for this locus have been mostly confined to the estimation of onset of selective sweep rather than the coalescence time of the mutation. A study of extended haplotype homozygosity in HapMap populations estimated that the most intense signals of selection detected in European and East Asian populations are found in haplotypes which extend 0.52 cM on average in length . Assuming a star-shaped genealogy and a generation time of 25 years, the authors dated the peak of these signals to ∼6.6 KYA . They also observed that the second-longest haplotype (1.15 cM) in Europe includes SLC24A5, where rs1426654-A was found to be fixed. Using the same formula used by Voight  to date the average peaks of selection signals in Europe and East Asia, the selective sweep specifically at SLC24A5 in the HapMap European sample can be dated to ∼3 KYA. Besides this, a recent study by Beleza , focusing on analyses of diversity in microsatellite loci, estimated that the selective sweep at SLC24A5 occurred around 11.3 KYA (95% CI, 1–55.8 KYA) and 18.7 KYA (5.8–38.3 KYA) under additive and dominant models, respectively .
Our Bayesian coalescent age estimate of the rs1426654-A allele at ∼28 KYA (95% HPD, 5–58 KYA), as well as the rho-based estimate at 21.7 (±10.3) KYA, are older in their point estimates than both of the above selective sweep date estimates, although these age estimates have broad and overlapping error margins. This finding is not surprising because sweeps can also operate on standing variation. Besides this, both our rho-based point estimate and Bayesian mean age estimate postdate the estimated time of the split between Europeans and Asians calculated by Scally  using a similar mutation rate. Although our confidence intervals cannot rule out entirely the possibility of older dates (>28 KYA), our findings are broadly consistent with the evolutionary model of skin pigmentation proposed in earlier studies , , . It appears that the most plausible scenario is that light skin evolved as an adaptation to local environmental conditions as humans started moving to northerly latitudes, with the initial phase of skin lightening occurring in proto Eurasian populations, while genetic variation in SLC24A5 formed the later phase which led to lighter skin in Europeans and South Asians, but not East Asians. This was followed by a European-specific selective sweep, which favored the rapid spread of this mutation in these populations. Our coalescence age estimates of 28 KYA (95% HPD 5–58 KYA) show wide margins, also evident in the earlier sweep date estimates for the gene . This can be due to the fact that the power of our analysis was limited by the need to reduce our sequence range to a subset of sites from a region with sufficiently high LD around the rs1426654-A allele and very low level of sequence variation. Therefore, we speculate that narrowing down the coalescence age estimates and specifying the geographic source of the rs1426654-A allele will depend rather on the success of ancient DNA studies than on more extensive sequencing.
Evidence for positive selection
Earlier studies have highlighted SLC24A5 as one of the top candidate genes demonstrating evidence for positive selection in Europeans , , , , ,  and in Middle Eastern and Pakistani populations from South Asia ,  on the basis of either FST or extended haplotype homozygosity from genotype data. Here, relying on our previous scans of extended haplotype homozygosity on Indian populations , we note that both XP-EHH and iHS suggest that positive selection has occurred in North Indian (within top 5% and top 1% respectively) but not in South Indian populations. One possible explanation for the regional differences in empirical ranks of the SLC24A5 in India could be the “melanin threshold” hypothesis . According to this hypothesis, natural selection affects the variation in pigmentation phenotype only up to a certain adaptive optimum, beyond which individuals may show variation that is subject to other factors such as admixture, genetic drift etc. However, differently from the expectations of this hypothesis, we do observe high range of melanin indices both in North and South Indian populations of Cohort B (Table S2). Furthermore, the high positive correlation of rs1426654-A allele with the light-green South Asian ancestry component (Figure S2A) advocates that the rs1422654-A allele frequency patterns in India could be also explained by demographic history of the populations in addition to selection. It is also possible that while XP-EHH and iHS tests have increased power to detect selection signatures associated with high allele frequencies, the low ranking position of SLC24A5 in selection scans of South Indians is due to the overall lower frequency of the rs1422654-A allele.
Therefore, the complex patterning of light skin allele in India and its correlation with geography, language, and ancestry component observed in the present study, portrays an interesting interplay between selection and demographic history of the populations. This stands in contrast to Europe where the frequency of the light skin associated allele of SLC24A5 has almost reached to fixation and seems to be attributable solely to natural selection. This aspect of skin pigmentation variation observed in South Asians is pivotal in understanding the different mechanisms that contribute to the global skin pigmentation variation and in further understanding of this complex phenotypic trait.
To summarize, we have provided evidence using a homogeneous cohort that the rs1426654 SNP plays a key role in skin pigmentation variation in South Asia. We have shown that the rs1426654-A allele is widespread in the Indian subcontinent and its complex pattern is a result of combination of processes involving selection and demographic history of populations, influenced by their linguistic and geographic affiliations. Phylogenetic analyses of resequencing data confirm that the rs1426654-A allele in West Eurasian and South Asian populations occurs on the same haplotype background. Both sequence and genome-wide genotype data confirm evidence of positive selection in Europeans, while the latter supports further evidence of selection in populations of Middle East, Pakistan, Central Asia and North India but not in South India. We date the coalescence of the light skin allele (rs1426654-A) to 22–28 KYA (95% CI, 5–58 KYA). However, since this allele has become fixed in many populations across its current distribution, we propose that ancient DNA research might have greater potential to improve our understanding of when and where it first appeared.
Materials and Methods
This study was approved by the Research Ethics Committee of the Estonian Biocentre, Tartu, Estonia and the Institutional Ethical Committee (IEC) of the Centre for Cellular and Molecular Biology, Hyderabad, India. All recruited individuals were >18 years of age and their ethnic origin was determined via personal interviews. Written informed consent was obtained from all participants.
Skin pigmentation was measured using DermaSpectrometer (Cortex Technology, Hadsund, Denmark). Erythema (E) and melanin index (MI) readings were taken from the upper inner arm (medial aspect) . For a subset of individuals, additional measurements were taken from the forehead, representing the most tanned or sun-exposed region of the skin. However, only MI readings from the upper inner arm were used for association analyses. DNA was isolated either from blood or saliva (using Oragene DNA kits, Canada). The study involved three distinct cohorts, A, B and C. Sampling locations of these cohorts are shown in Figure S1.
Cohort A included 1228 randomly recruited individuals from three major agricultural castes (Kapu, Naidu and Reddy) of Andhra Pradesh, India. For all the above individuals, MI readings were taken from the right and left upper inner arm and their mean was calculated to determine each individual's MI. Following the phenotypic screening, thresholds were set for the “low” (MI<38) and “high” (MI>50) MI groups respectively, representing approximately the top and bottom 10% of the MI distribution, for collection of DNA samples (Figure 1A). Eighty-four out of 120 individuals from the low MI group and 102 out of 127 individuals from the high MI group were genotyped successfully. The 10% threshold was implemented after an initial pilot study, following which the values were continuously redefined as the sample collection progressed. Consequently, during the fieldwork, DNA from 56 individuals was collected outside the determined thresholds (MI 38–50). Therefore, in summary, 242 individuals (189 males, 53 females) from this cohort were genotyped for the rs1426654 SNP (Table S1).
Cohort B comprised of 446 individuals, including 10 caste and tribal populations of Tamil Nadu, Maharashtra and Haryana states of India. For each individual, three readings of MI were taken from the right upper inner arm and the values were averaged. Out of these, 277 individuals (246 males and 31 females) were genotyped (Table S2).
Cohort C included 1054 individuals, representing 43 endogamous populations from different ethnic backgrounds, language families (Dravidian, Indo-European, Austroasiatic, Tibeto-Burman speakers), castes, tribes, with their geographical locations covering most of the states. No records for MI were available for this cohort.
In summary, 1573 individuals from 54 distinct tribal and caste populations including all the three cohorts (A, B and C) were assessed for the rs1426654 polymorphism (Table S4 and Figure S1). A detailed description of the geographic location, linguistic affiliation and socio-cultural background of each cohort is given in Tables S1, S2 and S4S4. Populations from Cohort A and Cohort B with MI readings were used for genotype-phenotype analyses and genotyping results from all three cohorts (A, B and C) were used to map the spread of rs1426654-A allele and test its correlation with language, geography and ancestry component.
For the resequencing study, we designed a global panel comprising of 95 individuals. This included 70 subjects from HGDP-Centre d'Étude du Polymorphisme Humain (HGDP-CEPH) worldwide panel , and additionally 3 Europeans, 18 Indians, and 4 Central Asians to cover the underrepresented regions of the CEPH panel. For population-level analyses, these 95 individuals were broadly classified into 8 major groups based on their geography and ethnicity: sub-Saharan Africa (n = 22), North Africa/Middle East (n = 7), Europe (n = 11), North and Central Asia (n = 7), South Asia (n = 23), East Asia (n = 14), Native Americans (n = 4) and Melanesia (n = 7). List of the populations included in the resequencing project, representing these regions is given in Table S9.
Genotyping of the rs1426654 SNP
A 443 bp region of SLC24A5 flanking the rs1426654 SNP was amplified by PCR using s.E3,4F and s.E3,4R primers (Table S15). The cycling protocol consisted of 96°C for 3 min, 32 cycles of 96°C for 30 s, 57°C for 30 s, 72°C for 1 min and final extension at 72°C for 5 min. The PCR product was then either directly sequenced or digested overnight at 37°C using Hin6I restriction endonuclease enzyme. All digested products were run on a 3% agarose gel. The products for sequencing were run on 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA) using Big Dye Terminator sequencing kit (v3.1 Applied Biosystems).
Genotype-phenotype association analyses
The effect of the functional SLC24A5 SNP (rs1426654) on skin pigmentation differences between low (<38 MI) and high (>50 MI) MI groups of Cohort A was tested using a logistic regression model. For this, we compared a model that included sex and population (caste) as predictors to a model in which the genotype was added as an independent variable. An association between SNP and melanin index was tested using a likelihood-ratio test after adjusting for sex and population and, assuming additivity, odds ratio was calculated for the rs1426654-A allele. Furthermore, we calculated the cross-validated Area Under the Curve (AUC) value to quantify how accurately this polymorphism predicts the occurrence of an individual in the low or high MI group, using the R package caret .
To estimate the effect size of the SNP, we used a simulation-based approach known as multiple imputation . This method uses regression models and Bayesian sampling to impute missing values conditional on other predictors. Using random imputations, 1000 complete datasets were generated. The desired analysis was performed on each dataset using methods based on complete data. Results were pooled to derive corrected point estimates and inference , . Using this methodology, we estimated the mean MI for each genotype separately for males and females. We also estimated the coefficient of determination (R2) for the full model which included sex and genotype, and the variation of melanin index that can be explained by rs1426654 SNP alone. We tested the effect of genotypes on melanin index using a generalized linear model (GLM). All the above stated analyses were performed using the R package MICE 2.9 .
For randomly collected samples (Cohort B), similarly, the effect of rs1426654 genotypes on melanin index was assessed using a GLM. Furthermore, the effect of the genotype in the cohort studied was tested using an additive model. All statistical analyses were performed using the R computing package (version 220.127.116.11) ( http://www.r-project.org/ ).
Geospatial distribution of the rs1426654-A allele
To visualize the geospatial pattern of the rs1426654 SNP in South Asia and to compare it with other populations across the world, an isofrequency map was generated using 1446 individuals genotyped across all three cohorts (Table S5) and 2763 subjects from previously published datasets (Table S7). The isofrequency map was drawn using Surfer 8.0 (Golden Software Inc, Golden, Colorado).
To test the distribution of the rs1426654-A allele across different language families and geographical coordinates, all of the individuals genotyped under the three cohorts were grouped into 7 geographical zones and 4 major language families pertinent to India (Table S6). Some populations were regrouped with their geographical neighbors of same ethnicity (Table S5). Populations that could not be grouped and had low sample size (n<15) were excluded. Therefore, data from 1446 individuals representing 40 populations were used for the linguistic and geographical analyses (Table S5 and S7). The rs1426654-A allele frequency was also assayed across the geographical coordinates (absolute latitude and longitude) using Pearson's correlation test. A Mantel test was used to examine the interaction of the allele frequencies with geography and language. For this, the genetic distance matrix (based on FST) was generated in Arlequin 18.104.22.168  and the geographical matrix was calculated from geographic coordinates. For the language matrix, we used the binary approach by coding populations speaking a language from the same language family as 0 and different language family as 1. A Mantel test was performed using Arlequin with 10,000 permutations.
Correlation between rs1426654-A allele frequency and South Asian ancestry proportion
We tested the correlation between the derived rs1426654-A allele frequency and the proportion of the ancestry component that South Asian populations share with West Eurasians (as depicted by the “light green component” in ). For this, genome-wide datasets on Indian populations available from literature – and relevant global reference populations were combined and subjected to structure-like analysis using ADMIXTURE  to determine the proportions of the hypothetical ancestral populations using the methods described by Metspalu . A list of the populations included in the run and their source from the literature is given in Table S8. We ran ADMIXTURE 100 times from K = 2 to K = 9 to monitor convergence between individual runs. Log-likelihood scores suggested that the global maximum was reached at K = 7. Population structure of the studied populations as inferred by ADMIXTURE analysis at K = 7 using 98,189 SNPs is shown in Figure S2A. The proportions of the k5 light green ancestry component (Figure S2A) at K = 7 were then extracted and compared with rs1426654-A allele frequency, for those world and South Asian populations for which the rs1426654 frequency was available, using Pearson's correlation test.
A total of 11.74 kb region of SLC24A5 comprising exons (1617 bp), introns (5797 bp), 5′ flanking (4150 bp), and 3′flanking (177 bp) regions spanning over 25.6 kb (48409019–48434692) was resequenced (Figure 3) in 95 multiethnic individuals using 31 pairs of validated primers (Table S15). PCR products were purified with Exo-SAP prior to sequencing. Bidirectional sequencing for each fragment was performed using Big Dye Terminator sequencing kit (v3.1 Applied Biosystems) and run on 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA). The sequences were then assembled and analyzed by Seqscape ver 2.5 (Applied Biosystems). BIOEDIT 7.1.3 was used to align the sequences to the NCBI Reference Sequence (NG_011500.1; 28421 bp). Variants were annotated with SNPs included in dbSNP build 137, June 2012. All of the variants were confirmed by manual inspection. The sequences were phased using PHASE 2.1.1 implemented in DnaSP 5.10.01 . Sequence diversity measures (π and θ) were computed using DnaSP  and Arlequin 3.5  was used to perform the interpopulation and intrapopulation analyses.
Tests of selection
For resequenced data, we tested for the effects of selection using Tajima's D , Fu and Li's D* and F*  statistics, calculated in DnaSP 5.10.01 . All the tests were performed under the standard assumption of constant population size. However, since these tests are known to be strongly influenced by population history, the significance of the results was also estimated by means of coalescent simulations using the COSI 1.2.1 software with the best-fit population model . We performed 10,000 replicates. Coalescent simulations were conditioned on a specific mutation and recombination rate. We used a mutation rate of 5×10−10 substitutions/site/year, as reported by Scally and Durbin . Estimates for the local SLC24A5 recombination rate were obtained from HapMap Build 37  and the length of simulated sequence matched that of the resequenced region (11741 bp). In the absence of an appropriate demographic model and empirical distribution, we have used the evolutionarily closest population implemented in COSI to assess the significance.
For selection analyses based on genome-wide genotype data, we used a merged data set of Illumina Infinium 650K, 610K and 660K available for 145 Indians and worldwide samples including Bantu (n = 19), Middle East (n = 133), Europe (n = 100), Central Asia (n = 77), Iran (n = 20), Pakistan (n = 165), East Asia (n = 211), Oceania (n = 27) from published datasets. Two haplotype-based selection tests, Cross-Population Extended Haplotype Homozygosity (XP-EHH) and Integrated Haplotype Scores (iHS), were used to assess the empirical rank of the SLC24A5 in the haplotype homozygosity scans performed across the genome in each of the 8 world regions. iHS and XP-EHH statistics were calculated using code by Joseph Pickrell, available at hgdp selection browser ( http://hgdp.uchicago.edu/ ). The analyses were based on a genome scan of 13,274 windows of size 200 kb each. Unphased SNP data were retrieved for the genomic window containing SLC24A5 (chromosome 15:46.2–46.4 Mb (Build 36/hg18) and compared to the empirical distribution of other windows across the genome. Yoruba was used as the reference population in XP-EHH analyses. Data were phased using Beagle 3.1. .
Phylogenetic analysis and coalescent age estimates
We estimated the phylogeny of SLC24A5 haplotypes based on sequences of 11.74 kb for our diverse set of 95 individuals. For this, haplotypes were inferred from the genotype data using PHASE v.2.1.1 . A neighbor-joining phylogenetic tree was constructed from these data using MEGA 5 . A schematic tree representing the eight most common branches of the haplotype tree is shown in Figure S3.
We estimated the age of the rs1426654 mutation using 8837 bp of the SLC24A5 gene. This region was determined by the largest linkage-disequilibrium block identified by the four-gamete rule algorithm, using a minimum D' value of 0.8, as implemented in Haploview 4.2 . Coalescence times were estimated using Bayesian phylogenetic analysis in BEAST 1.7.0 . The analysis was conducted on a dataset of 73 sequences carrying the rs1426654-A allele. We further restricted our dataset to 7837 bp comprising of third codon sites, introns and flanking regions. The F81  nucleotide substitution model was selected as the best-fit model using the Bayesian information criterion in Modelgenerator . The analysis was performed using a strict molecular clock and the Bayesian skyride coalescent model . The molecular clock was calibrated using the mutation rate reported by Scally and Durbin , with a mean of 5×10−10 mutations/site/year and a standard deviation of 5.1×10−11. Posterior distributions of parameters were estimated by Markov chain Monte Carlo simulation, with samples drawn every 1000 steps over a total of 10,000,000 steps. Three independent runs were conducted to check for convergence to the stationary distribution and the first 1000 samples were discarded as burn-in. Sufficient sampling of parameters was evaluated using Tracer 1.5  and samples from independent runs were combined. Sampled posterior trees were summarized to generate a maximum-clade-credibility tree. Statistical uncertainty in age estimates is reflected by the 95% credibility intervals.
We also estimated the coalescent times using the rho statistics  in Network 4.6 ( http://www.fluxus-engineering.com/sharenet.htm ) assuming a rate of 5×10−10 substitutions/site/year  and using sequence length of 8837 bp. The standard deviation was calculated according to Saillard .
Sampling locations for the present study. Map represents location of samples collected from different parts of Indian subcontinent encompassing populations of different ethnic background, language families, castes and tribes. Populations from cohorts A and B, shown in brackets, were assessed for melanin index, while the rest from Cohort C have only genotype information.
Correlation between rs1426654 A allele frequency and ancestry component. (A) Population structure inferred by ADMIXTURE analysis at K = 7. (B) Graphs showing correlation between rs1426654-A allele frequencies and light green (k5) ancestry component of the above analysis using all (North Africa/Middle East, Europe, Caucasus, Central Asia) populations in the left panel, and 27 ethnic groups from South Asia in the right panel (Hazara, Pathan, Burusho, Balochi, Brahui, Makrani, Sindhi, Gujaratis, Bhil and Meghawal, Kashmiri Pandits, Uttar Pradesh (UP) Brahmins, Kshtriya, Chamar, Dharkar, Dusadh, Kanjar, Kol, Uttar Pradesh (UP) low caste, Tharu, Gond, Naidu, Kurumba, Paniya and Malayan, Asur and Ho, Gadaba and Savara, Garo and Naga and Khasi).
Schematic tree representing the phylogenetic relationships among the samples studied in resequencing project, with haplogroup H being defined by the non-synonymous SNP rs1426654. The numbers denote the frequencies of the chromosomes in each haplogroup by the 8 geographical regions studied.
Graph showing nucleotide diversity pi (π) in the 5′ flanking region (4150 bp). Sliding window with length = 100 bp and step size = 25 bp was used to generate the graph.
Haplotype diversity indices (θ) of 8 geographical regions included in the study. The solid lines represent values of diversity indices (θ) according to Table S11. The dashed lines of the same color show standard deviations for the respective estimates.
Sample description of individuals in Cohort A with rs1426654 genotyping results.
Sample description for individuals in Cohort B with rs1426654 genotyping results.
Effect of rs1426654 genotypes on skin pigmentation variation among individuals of Cohort A. (A) Estimated average melanin index (MI) for SLC24A5 rs1426654 genotypes. (B) Difference in estimated mean melanin index (in melanin units) for the rs1426654 genotypes.
Sample description of populations under all the three cohorts (A–C) and their average rs1426654-A allele frequency.
rs1426654-A allele frequency of populations included in geographic and linguistic analyses.
Average rs1426654-A allele frequency according to their linguistic and geographic divisions.
rs1426654-A allele frequency across the world populations based on published datasets.
List of populations included in the ADMIXTURE run along with their geographic region and source of study.
Description of the variants identified in the SLC24A5 resequencing project.
Variation among the resequencing project samples, for a tetranucleotide (GAAA) repeat at genomic position 48412029 in 5′ flanking region of the SLC24A5 gene.
Estimates of SLC24A5 nucleotide diversity measures among and within 8 geographic regions of the world.
Tests of neutrality for 8 geographic regions based on resequencing data.
Genome-wide rankings of the SLC24A5 gene in haplotype homozygosity tests across world populations.
Comparison of the rs1426654 A allele frequencies of the current study with the study published by Pemberton et al. (Pemberton et al. 2008) .
We are extremely grateful to all the participants of the study and to those who assisted us in the field-work and collection of the samples. We would like to thank Prof. K.M.R. Nambiar and Anurag Kadiyan for their kind and crucial assistance in sample collection. We would also like to thank A. Kushniarevich, J. Parik, T. Reisberg, I. Hilpus, G. Arun Kumar, A. Khatri, M. Singh and A. Shah for their technical assistance; G. Jacobs, N. Sharma and B. Yunusbayev for discussion. ADMIXTURE run was carried out in the High Performance Computing Centre, University of Tartu.
Conceived and designed the experiments: CBM FMI TK. Performed the experiments: CBM FMI RT. Analyzed the data: CBM FMI MMö RG GC SYWH TK. Contributed reagents/materials/analysis tools: IGR FC NR MMe CGNMT RP MML LS KT RV. Wrote the paper: CBM FMI TK. Supervised the statistical analyses related to genotype-phenotype association study: MMö. Gave suggestions and advice for the logistic regression analysis: RG CGNMT. Helped in performing the selection tests for sequence data using COSI package: GH. Supervised results and discussion related to BEAST analysis: SYWH. Critical editing of the manuscript: SH SYWH RV.
- 1.Rees JL (2004) The genetics of sun sensitivity in humans. Am J Hum Genet 75: 739–751.
- 2.Thody AJ, Higgins EM, Wakamatsu K, Ito S, Burchill SA, et al. (1991) Pheomelanin as well as eumelanin is present in human epidermis. J Invest Dermatol 97: 340–344.
- 3.Barsh GS (2003) What controls variation in human skin color. PLoS Biol 1: e27.
- 4.Lamason RL, Mohideen MAPK, Mest JR, Wong AC, Norton HL, et al. (2005) SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 310: 1782.
- 5.Candille SI, Absher DM, Beleza S, Bauchet M, McEvoy B, et al. (2012) Genome-Wide Association Studies of Quantitatively Measured Skin, Hair, and Eye Pigmentation in Four European Populations. PloS One 7: e48294.
- 6.Stokowski RP, Pant P, Dadd T, Fereday A, Hinds DA, et al. (2007) A genomewide association study of skin pigmentation in a South Asian population. Am J Hum Genet 81: 1119–1132.
- 7.Miller CT, Beleza S, Pollen AA, Schluter D, Kittles RA, et al. (2007) cis-Regulatory changes in Kit ligand expression and parallel evolution of pigmentation in sticklebacks and humans. Cell 131: 1179–1189.
- 8.Jablonski NG, Chaplin G (2000) The evolution of human skin coloration. J Hum Evol 39: 57–106.
- 9.Loomis WF (1967) Skin-pigment regulation of vitamin-D biosynthesis in man. Science 157: 501.
- 10.Robins AH (2005) Biological perspectives on human pigmentation. Cambridge University Press.
- 11.Juzeniene A, Setlow R, Porojnicu A, Steindal A, Moan J (2009) Development of different human skin colors: a review highlighting photobiological and photobiophysical aspects. J Photochem Photobiol B 96: 93.
- 12.Myles S, Somel M, Tang K, Kelso J, Stoneking M (2007) Identifying genes underlying skin pigmentation differences among human populations. Hum Genet 120: 613–621.
- 13.Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, et al. (2009) Signals of recent positive selection in a worldwide sample of human populations. Genome Res 19: 826–837.
- 14.Novembre J, Di Rienzo A (2009) Spatial patterns of variation due to natural selection in humans. Nat Rev Genet 10: 745–755.
- 15.Anno S, Ohshima K, Abe T (2010) Approaches to understanding adaptations of skin color variation by detecting geneenvironment interactions. Expert Rev Mol Diagn 10: 987–991.
- 16.Tang K, Thornton KR, Stoneking M (2007) A new approach for using genome scans to detect recent positive selection in the human genome. PLoS Biol 5: e171.
- 17.Ginger RS, Askew SE, Ogborne RM, Wilson S, Ferdinando D, et al. (2008) SLC24A5 encodes a trans-Golgi network protein with potassium-dependent sodium-calcium exchange activity that regulates human epidermal melanogenesis. J Biol Chem 283: 5486.
- 18.Tsetskhladze ZR, Canfield VA, Ang KC, Wentzel SM, Reid KP, et al. (2012) Functional Assessment of Human Coding Mutations Affecting Skin Pigmentation Using Zebrafish. PloS One 7: e47398.
- 19.Norton HL, Kittles RA, Parra E, McKeigue P, Mao X, et al. (2007) Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol Biol Evol 24: 710.
- 20.Voight BF, Kudaravalli S, Wen X, Pritchard JK (2006) A map of recent positive selection in the human genome. PLoS Biol 4: e72.
- 21.Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913–918.
- 22.Jaswal I (1983) Pigmentary variation in Indian populations. Acta Anthropogenet 7: 75.
- 23.Pemberton TJ, Mehta NU, Witonsky D, Di Rienzo A, Allayee H, et al. (2008) Prevalence of common disease-associated variants in Asian Indians. BMC Genet 9: 13.
- 24.Metspalu M, Romero IG, Yunusbayev B, Chaubey G, Mallick CB, et al. (2011) Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia. Am J Hum Genet 89: 731–744.
- 25.Reich D, Thangaraj K, Patterson N, Price AL, Singh L (2009) Reconstructing Indian population history. Nature 461: 489–494.
- 26.Chaubey G, Metspalu M, Choi Y, Mägi R, Romero IG, et al. (2011) Population genetic structure in Indian Austroasiatic speakers: the role of landscape barriers and sex-specific admixture. Mol Biol Evol 28: 1013–1024.
- 27.Yunusbayev B, Metspalu M, Järve M, Kutuev I, Rootsi S, et al. (2012) The Caucasus as an asymmetric semipermeable barrier to ancient human migrations. Mol Biol Evol 29: 359–365.
- 28.Behar DM, Yunusbayev B, Metspalu M, Metspalu E, Rosset S, et al. (2010) The genome-wide structure of the Jewish people. Nature 466: 238–242.
- 29.Coop G, Pickrell JK, Novembre J, Kudaravalli S, Li J, et al. (2009) The role of geography in human adaptation. PLoS Genet 5: e1000500.
- 30.Izagirre N, García I, Junquera C, De La Rúa C, Alonso S (2006) A scan for signatures of positive selection in candidate loci for skin pigmentation in humans. Mol Biol Evol 23: 1697.
- 31.De Gruijter JM, Lao O, Vermeulen M, Xue Y, Woodwark C, et al. (2011) Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing. Investig Genet 2: 24.
- 32.Norton HL, Kittles RA, Parra E, McKeigue P, Mao X, et al. (2007) Genetic evidence for the convergent evolution of light skin in Europeans and East Asians. Mol Biol Evol 24: 710.
- 33.Ang KC, Ngu MS, Reid KP, Teh MS, Aida ZS, et al. (2012) Skin Color Variation in Orang Asli Tribes of Peninsular Malaysia. PloS One 7: e42752.
- 34.Quillen EE, Bauchet M, Bigham AW, Delgado-Burbano ME, Faust FX, et al. (2012) OPRM1 and EGFR contribute to skin pigmentation differences between Indigenous Americans and Europeans. Hum Genet 131: 1073–1080.
- 35.Beleza S, Johnson NA, Candille SI, Absher DM, Coram MA, et al. (2013) Genetic Architecture of Skin and Eye Color in an African-European Admixed Population. PLoS Genet 9: e1003372.
- 36.Sapovadia V (2012) Saurashtra: A Language, Region, Culture & Community. Available: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2033685 .
- 37.ArunKumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, et al. (2012) Population differentiation of Southern Indian male lineages correlates with agricultural expansions predating the caste system. PloS One 7: e50269.
- 38.Cordaux R, Saha N, Bentley GR, Aunger R, Sirajuddin S, et al. (2003) Mitochondrial DNA analysis reveals diverse histories of tribal populations from India. Eur J Hum Genet 11: 253–264.
- 39.Jablonski NG (2004) The evolution of human skin and skin color. Annu Rev Anthr 33: 585–623.
- 40.Jablonski NG, Chaplin G (2010) Human skin pigmentation as an adaptation to UV radiation. Proc Natl Acad Sci 107: 8962–8968.
- 41.McEvoy B, Beleza S, Shriver MD (2006) The genetic architecture of normal variation in human pigmentation: an evolutionary perspective and model. Hum Mol Genet 15: R176–R181.
- 42.Beleza S, dos Santos AM, McEvoy B, Alves I, Martinho C, et al. (2012) The timing of pigmentation lightening in Europeans. Mol Biol Evol 30 (1) 24–35.
- 43.Scally A, Durbin R (2012) Revising the human mutation rate: implications for understanding human evolution. Nat Rev Genet 13: 745–753.
- 44.Quillen EE, Shriver MD (2011) Milestone 2. Nat Milestones E5–E7.
- 45.Norton HL, Friedlaender JS, Merriwether DA, Koki G, Mgone CS, et al. (2006) Skin and hair pigmentation variation in Island Melanesia. Am J Phys Anthropol 130: 254–268.
- 46.Shriver MD, Parra EJ, Dios S, Bonilla C, Norton H, et al. (2003) Skin pigmentation, biogeographical ancestry and admixture mapping. Hum Genet 112: 387–399.
- 47.Cann HM, de Toma C, Cazes L, Legrand M-F, Morel V, et al. (2002) A human genome diversity cell line panel. Sci New York NY 296: 261.
- 48.Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28: 1–26.
- 49.Rubin DB (1987) Multiple imputation for non-response in surveys. New York: John Wiley & Sons.
- 50.Buuren S, Groothuis-Oudshoorn K (2011) MICE: Multivariate imputation by chained equations in R. J Stat Softw 45.
- 51.Excoffier L, Laval G, Schneider S (2005) Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinforma Online 1: 47.
- 52.Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19: 1655–1664.
- 53.Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25: 1451–1452.
- 54.Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123: 585–595.
- 55.Fu Y-X, Li W-H (1993) Statistical tests of neutrality of mutations. Genetics 133: 693–709.
- 56.Schaffner SF, Foo C, Gabriel S, Reich D, Daly MJ, et al. (2005) Calibrating a coalescent simulation of human genome sequence variation. Genome Res 15: 1576–1583.
- 57.Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851–861.
- 58.Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet 81: 1084.
- 59.Stephens M, Donnelly P (2003) A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73: 1162–1169.
- 60.Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28: 2731–2739.
- 61.Barrett J, Fry B, Maller J, Daly M (2005) Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21: 263–265.
- 62.Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7: 214.
- 63.Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17: 368–376.
- 64.Keane TM, Creevey CJ, Pentony MM, Naughton TJ, Mclnerney JO (2006) Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evol Biol 6: 29.
- 65.Minin VN, Bloomquist EW, Suchard MA (2008) Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol 25: 1459–1471.
- 66.Rambaut A, Drummond A (2007) Tracer version 1.5. Available: http://beast . bio. ed. ac. uk.
- 67.Forster P, Harding R, Torroni A, Bandelt H-J (1996) Origin and evolution of Native American mtDNA variation: a reappraisal. Am J Hum Genet 59: 935.
- 68.Saillard J, Forster P, Lynnerup N, Bandelt H-J, Nørby S (2000) mtDNA Variation among Greenland Eskimos: The Edge of theBeringian Expansion. Am J Hum Genet 67: 718.
- 69.Parra EJ, Kittles RA, Shriver MD (2004) Implications of correlations between skin color and genetic ancestry for biomedical research. Nat Genet 36: S54–S60.
- 70.Shriver MD, Parra EJ (2000) Comparison of narrow-band reflectance spectroscopy and tristimulus colorimetry for measurements of skin and hair color in persons of different biological ancestry. Am J Phys Anthropol 112: 17.
Exome Sequencing Identifies SLC24A5 as a Candidate Gene for Nonsyndromic Oculocutaneous Albinism
Oculocutaneous albinism (OCA) is a heterogeneous and autosomal recessive disorder. Its worldwide prevalence is approximately 1:17,000 (Witkop, 1979), indicating that about 1 in 65 people carries the OCA gene. It manifests as a reduction or complete loss of melanin in the skin, hair, and eyes, often accompanied with eye symptoms such as photophobia, strabismus, moderate-to-severe visual impairment, and nystagmus (Gronskov et al., 2007). The characterization of OCA subtypes previously based upon clinical manifestations has evolved into a molecular classification on the basis of the identification of the causative genes (Wei et al., 2010). OCA has been identified as arising from mutations in nonsyndromic OCA genes (TYR, OCA2, TYRP1, and SLC45A2) or syndromic OCA genes (HPS1, AP3B1, HPS3, HPS4, HPS5, HPS6, DTNBP1, BLOC1S3, PLDN, LYST, MYO5A, RAB27A, and MLPH; Li et al., 2006; Wei and Li, 2012). Although several melanosomal protein encoding genes such as RAB38, TYRP2, PMEL, and SLC24A5 are suggestive of candidate OCA genes in the mouse or other species (Sitaram and Marks, 2012), no pathological mutations of these genes have been reported in human OCA patients (Suzuki et al., 2003; Hutton and Spritz, 2008; Gronskov et al., 2009; Sengupta et al., 2010). More candidate OCA genes in humans are likely to be identified as suggested by additional mouse OCA genes (Bennett and Lamoreux, 2003; Li et al., 2006). In addition, unknown loci are implicated in human OCA patients. Recently, a region on 4q24 has been identified as an additional locus (OCA5) for nonsyndromic OCA (Kausar et al., 2012). However, its genetic identity has not been characterized. It is plausible to predict that unknown OCA genes will be uncovered.
We performed molecular screening of 179 OCA patients in the Chinese Albinism Registry by implementing the optimized strategy (Wei et al., 2010, 2011). We did not find any mutation in the commonly occurring OCA genes (TYR, OCA2, TYRP1, SLC45A2, HPS1, and HPS4) in 11 patients after sequencing all the exons and the adjacent exon-intron boundaries of these genes. The percentage of unidentified mutations (6.2%, 11/179) is similar to that observed in other studies (Hutton and Spritz, 2008; Rooryck et al., 2008). This suggests that additional unknown OCA genes may exist. In this study, we adopted the next-generation sequencing method and identified SLC24A5 as the putative causative gene in one OCA6 (named subsequent to the OCA5 locus) family.
Clinical features of the OCA6 patient
In this OCA6 family (Supplementary Figure S1 online), the proband (IV2 in the pedigree) was clinically diagnosed as OCA2 at the age of 3 years (Patient# 3 in Supplementary Table S1 online). Her hair at birth was blonde but darkens to brown at a later age (Figure 1a). Her skin was white and her iris was transparent and brownish (Figure 1b). Nystagmus and photophobia were mild in this proband. Her visual acuity was 20/100. Color fundus photography showed hypopigmentation and underdeveloped macula (Figure 1c). Optical coherence tomography showed disappearance of central fovea of macula (Figure 1d) when compared with her brother. This patient has no history of bleeding tendency or clinical episodes for hemorrhagic diathesis including purpura. The counts for platelets, red blood cells, and white blood cells were within the normal ranges. The percentages of granulocytes, lymphocytes, and monocytes were within normal ranges as well. Measurements of serum IgG, IgA, IgM, C3, and C4 were within the normal ranges. All the coagulation tests including activated partial thromboplastin time, prothrombin time, thrombin time, D-dimer, and fiborinogen were within normal ranges. Trace metals (Cu, Zn, Ca, Mg, and Fe) in peripheral blood were within the normal ranges. Other serum and urine metabolites were within the normal ranges. No other apparent malformation in physical examinations was found. Normal images were shown by chest X-ray, head computerized tomography scan, and abdomen B-ultrasound scan. No neurological symptoms were found. Taken together, this proband has symptoms of nonsyndromic OCA. No consanguineous marriage was claimed by her parents (Supplementary Figure S1 online). OCA symptoms were not apparent in her parents and brother (Figure 1).
Two pathological alleles of the SLC24A5 gene were found in the OCA6 patient
We performed exome sequencing of the patient (or proband) together with her parents. The called variants include single-nucleotide polymorphisms (SNPs) and Indels. SNPs are called by the SOAPsnp program (Li et al., 2009). Indels are called by the Genome Analysis Toolkit (McKenna et al., 2010). The conflicted sites between SNPs and Indels are filtered out. We exclude the variants with quality score less than 20. In our highest priority, the SNPs/Indels in coding regions or splicing sites are put into the next filtering pipelines to exclude variants with synonymous mutations or those in 3′-untranslated region, 5′-untranslated region, intronic, and intergenic regions implemented by the predictions of the SIFT program ( http://sift.jcvi.org/ ). The remaining SNPs/Indels are then filtered against three databases: dbSNP (v130), 1000 Genomes Project database, and HapMap to exclude the variants with an allelic frequency larger than 0.5%. Finally, the filtered targeted SNPs/Indels are classified according to their genotypes for subsequent analyses. The results are summarized in the Supplementary Tables S2 and S3 online.
With this candidate gene list (30 genes in total after filtering by our pipeline as shown in Supplementary Table S3 online), we first focused on whether there are any known pigment genes (96 selected genes from the literature or database) that have been described in human and mice (Bennett and Lamoreux, 2003; Li et al., 2006; Sturm, 2009; Yamaguchi and Hearing, 2009; Ito and Wakamatsu, 2010, 2011; Sitaram and Marks, 2012). That is, we checked whether any of the 96 pigmentary genes appears either in the compound heterozygous state (23 genes) or in the homozygous state (7 genes; Supplementary Table S3 online). We found that only one known pigmentary gene, SLC24A5, appeared at the heterozygous state in this paired match. Two variations in this patient, c.591G>A and c.1361insT, were shown in our exome sequencing data. We further verified these two previously unreported mutations by Sanger sequencing. In this family, the proband inherited the c.1361insT mutation from her father, and the c.591G>A mutation from her mother. Her brother is a heterozygote of the c.591G>A mutation (Figure 2). We also detected the c.591G>A mutation in the maternal side of this family (Supplementary Figure S1 online). We found that III3 is the carrier of the c.591G>A mutation, but II2 and III2 do not carry this mutation. We did not detect the c.1361insT mutation in the remaining paternal family members because of unavailability of samples (Supplementary Figure S1 online). These two mutations of SLC24A5 were not found in 120 unaffected individuals. No SLC24A5 mutation was found in the remaining 10 unidentified OCA patients as shown in Supplementary Table S1 online. This suggests that OCA6 occurs at a relatively lower frequency (0.56%, 1/179) in Chinese OCA patients compared with OCA1, OCA2, OCA4, and HPS1. In a total of 179 OCA patients from the Chinese Han population, OCA1, OCA2, OCA4, and HPS1 account for 64.3%, 11.7%, 15.6%, and 2.2%, respectively (Wei et al., 2010, 2011). The c.591G>A mutation predicts a nonsense mutation, p.W197X, on exon 6. The c.1361insT mutation leads to a truncated protein (p.L454Ffs31X) due to a premature stop codon at 486X on exon 9, instead of the wild-type protein at 501X. By using the PROVEAN algorithm ( http://provean.jcvi.org ), both truncated proteins are predicted to be deleterious to the function of the SLC24A5 transporter.
Less mature melanosomes were present in the skin melanocytes of the OCA6 patient
To investigate whether the SLC24A5 mutations affect pigment synthesis, we examined the morphology of melanosomes in epidermal melanocytes by electron microscope (EM). We observed less mature melanosomes (stage IV) but more immature melanosomes (stages I, II, and III) in both the cell body and bulges extending to keratinocytes (Figure 3a and c) compared with the control sample (Figure 3b and d). This suggests that the SLC24A5 protein is required for the maturation of melanosmes or for the production of pigment in mature melanosomes. The less mature melanosomes likely produce less pigment in the skin and hair. We then measured eumelanin content in the patient’s hair by HPLC. Indeed, the hair eumelanin content is significantly lower in the patient compared with her brother and parents (Figure 3e). This suggests that the brown hair of the patient is likely due to the loss of eumelanin rather than the production of more pheomelanin, a feature existed in the clinically diagnosed OCA2 patients with mutations of two other melanosomal transporters, OCA2 and SLC45A2 (Ito et al., 2011).
The defects in OCA6 melanosomes phenocopy the disruption of melanosome biogenesis in Hermansky–Pudlak syndrome (HPS; Nguyen et al., 2002). HPS, a group of syndromic OCA, affects a group of lysosome-related organelles. The gold standard in diagnosing HPS is the lack of platelet dense granules by EM (Wei and Li, 2012). Our results showed that the number and size of platelet dense granules in the OCA6 patient were normal. In addition, no apparent changes in the dermal structure were noted in the patient (Supplementary Figure S2 online). This further excludes OCA6 as a subtype of HPS.
Steady-state levels of SLC24A5 were reduced in the skin of HPS mouse mutants
SLC24A5 has been reported as a melanosomal protein, which regulates pigment synthesis (Lamason et al., 2005). How this protein is transported into mature melaonsome to function in pigmentation remains unknown. Several melanosomal proteins such as tyrosinase, TYRP1, OCA2, and ATP7A are transported into mature melanosomes mediated by HPS protein–associated complexes such as AP-3, BLOC-1, and BLOC-2 as summarized in a recent review (Wei and Li, 2012). We measured the steady-state levels of SLC24A5 in mouse mutants of BLOC-1 (pallid, pa), BLOC-2 (ruby-eye, ru), BLOC-3 (pale ear, ep), and AP-3 (pearl, pe). Interestingly, the steady-state levels of SLC24A5 are reduced in HPS mutants pa and ru but are not significantly changed in ep and pe, although a reduction tendency was seen in ep (Figure 4). This suggests that BLOC-1 and BLOC-2 may function in mediating the melanosomal targeting of SLC24A5. Deficiency in any of these complexes may lead to the mistargeting of SLC24A5, which may be degraded by lysosomes or proteosomes. Clearly, the underlying trafficking mechanisms of SLC24A5 warrant further investigation.
SLC24A5 (solute carrier family 24, member 5), a putative potassium-dependent sodium/calcium exchanger 5 (NCKX5), has been implicated in hypopigmentation. It may function in the calcium homeostasis of melanosomes or trans-Golgi network, which is required for melanin biosynthesis or melanosomal protein trafficking (Lamason et al., 2005; Ginger et al., 2008). Genome-wide association study of skin pigmentation variation in individuals of the South Asian ancestry has revealed that SLC24A5 SNP rs1426654 (Thr111Ala) is associated with skin pigmentation (Stokowski et al., 2007). The allele p.Ala111Thr of the SLC24A5 gene is nearly fixed in light-skinned Europeans and can be considered as an ancestry informative marker (Soejima and Koda, 2007). Switching Thr for Ala at residue 111 of SLC24A5 reduces its exchange activity and melanogenesis (Ginger et al., 2008). Furthermore, SLC24A5 knockdown reduces melanin production in mouse B16 melanocytes. This suggests that the activity of this transporter regulates melanin synthesis and human skin/hair color with an undefined mechanism.
In Slc24a5-knockout mice, ocular albinism and hypopigmentation features have been reported (Vogel et al., 2008). Similarly, dysmorphic melanosomes are characterized in golden zebrafish mutant (Lamason et al., 2005). These results implicate that the SLC24A5 mutation may cause OCA symptoms in humans. However, analyses of SLC24A5 did not reveal pathological mutations in OCA patients (Gronskov et al., 2009; Sengupta et al., 2010). Recently, in an Indian patient with extreme cutaneous hypopigmentation, a homozygous four-nucleotide insertion (c.569_570insATTA) was found in the SLC24A5 gene. The insertion leads to a premature stop codon at 193X, suggesting a deleterious mutation of the SLC24A5 gene. However, this individual has pinkish-white skin, but dark brown hair and no apparent ocular defects (Mondal et al., 2012), excluding it from a typical form of OCA. A genotype–phenotype effect (i.e., different genotypes of SLC24A5 may produce variation of phenotypes) may exist or other pigmentary genes may influence its manifestations on the pigmentation of hair and eye in this case or in the Indian population. In addition, it requires detail ophthalmological examinations of this Indian patient to see whether ocular albinism occurs. In this study, the antibody used in detecting mouse skin SLC24A5 and several other commercial anti-human SLC24A5 antibodies did not produce a specific signal in normal human skin tissues. This precluded us from evaluating SLC24A5 protein changes in our OCA6 patient. Nevertheless, considering the functional studies and the phenotypes in the Slc24a5-KO mice (Vogel et al., 2008) and golden zebrafish (Lamason et al., 2005), it is likely that the ultrastructural changes of melanosomes in our OCA6 patient result from the observed SLC24A5 mutations in a compound heterozygous state. Therefore, to our knowledge, the deleterious mutation of SLC24A5 causes a previously unreported subtype of nonsyndromic OCA, OCA6.
In our OCA6 patient, more premature melanosomes were noted in epidermal melanocytes (Figure 3), implicating that SLC24A5 is involved in the maturation of melanosomes, mimicking a typical feature of the HPS and the Chediak–Higashi syndrome (Huizing et al., 2008; Wei and Li, 2012). Our results suggest that several HPS protein complexes (BLOC-1 and BLOC-2) may be involved in SLC24A5 trafficking into mature melanosomes. This is a well-defined pathway to mediate the trafficking of TYRP1, OCA2, and ATP7A into mature melanosomes, in which BLOC-2 functions downstream of BLOC-1 (Setty et al., 2007, 2008; Bultema et al., 2012). The assembly of SLC24A5 into melanosomes appears important for melanosomal architecture to ensure that melanin is synthesized properly. Lack of SLC24A5 may lead to the disruption of melanosomal maturation and melanin synthesis. In contrast, deficiency of HPS protein complexes (e.g., BLOC-1 and BLOC-2) may disrupt the targeting of SLC24A5 into melanosomes, a possible explanation to the OCA symptoms in HPS. These findings provide insights into the pathogenesis of nonsyndromic OCA and HPS. Mutational screen of SLC24A5 is recommended in the genetic analyses of OCA patients.
Materials and Methods
We recruited 11 molecularly uncharacterized OCA patients from the 179 unrelated OCA patients in the Chinese Albinism Registry (Wei et al., 2010, 2011) and 120 unaffected subjects from the Chinese Han population. Typically, among clinically diagnosed OCA1 patients, OCA1A presents with white hair that does not change with age, and OCA1B is characterized by white hair at birth that later becomes light yellow or darker. OCA2 is characterized by yellow, brown, or blonde hair at birth that may darken at a later age (Oetting et al., 2003). In all 11 OCA patients, white skin, blue or brown irises, and mild-to-severe nystagmus were observed (Supplementary Table S1 online). Only Patient #1 has consanguineous parents. Patient #3 is described in more detail in the Results. Blood samples were collected from these patients and their family members. This study was approved by the internal review board of the Bioethics Committees of Beijing Tongren Affiliated Hospital of Capital Medical University and the Institute of Genetics and Development Biology (IGDB), Chinese Academy of Sciences (CAS). The study was conducted according to the Declaration of Helsinki Principles. Written informed consent was obtained and 8 ml peripheral blood samples were collected from all subjects participating in this study.
The HPS mouse mutants, pallid (pa, BLOC-1 deficiency), ruby eye (ru, BLOC-2 deficiency), pale ear (ep, BLOC-3 deficiency) and pearl (pe, AP-3 deficiency), and wild-type control C57BL/6J (B6) were originally from the Jackson Laboratory (Bar Harbor, ME) and maintained in Dr Richard Swank’s laboratory at Roswell Park Cancer Institute (Buffalo, New York) and are now maintained at IGDB, CAS. All procedures were approved by the Institutional Animal Care and Use Committee of IGDB. To ensure the genotypes of the mouse mutants, we developed PCR methods for genotyping on the basis of the mutational nature of these mutants (Li et al., 2006).
Total genomic DNA was extracted from blood samples by the routine proteinase K/SDS method. DNA concentration of each sample is more than 200 ng μl−1 in a total of at least 15 μg with OD260/280 from 1.8 to 2.0 for quality assurance. Exons are captured with a specificity over 95% by following the procedures of the Agilent SureSelect Human All Exon Kits (Agilent, Santa Clara, CA). Sequencing was performed on an Illumina Solexa HiSeq 2000 instrument according to Illumina’s protocol (Illumina, San Diego, CA). The total targeted region of each sample is around 50 Mb. The mean depth of the target region is 40–50X. The average read length is around 90 bp. Reads were aligned to the human reference genome sequence UCSC Santa Cruz hg18, build 36.1 via the Burrows–Wheeler Aligner program (Li and Durbin, 2010). We implemented a data analysis framework with the family-based autosomal recessive mutation model, which is described in detail in Results.
Sanger sequencing of PCR products and validation of previously unidentified mutations
Standard PCR amplification procedures were conducted as previously described (Wei et al., 2010, 2011). All PCR product sizes were verified by 1.0–1.2% agarose gel electrophoresis. The primer sequences for the amplification of nine exons and exon/intron boundaries of the human SLC24A5 gene are available upon request. For Sanger sequencing, all purified PCR products were sequenced using the ABI PRISM 3700 automated sequencer (Applied Biosystems, Foster City, CA).
The biopsy samples of skin tissue from abdomen were fixed with 10% neutral-buffered formalin for histological observation or with 2.5% glutaraldehyde in phosphate buffer (pH7.4) for ultrastructural observation (Yang et al., 2012). The formalin-fixed samples were dehydrated, embedded in paraffin, sectioned at 5 μm thickness with a Leica microtome (Leica, Hubloch, Germany). Slides were stained with hematoxylin and eosin, and then observed under a light microscope (Nikon, Tokyo, Japan).
For EM, the glutaraldehyde-fixed samples of skin tissue were postfixed in 1% OsO4 for 2 hours at room temperature. These samples were dehydrated through a graded series of acetone incubations and embedded in Spurr’s resin, polymerized at 70 °C for 12 hours. Tissues were sectioned to 65 nm with a diamond knife (Diatome, Biel, Switzerland) on a Leica UC7 ultramicrotome (Leica) and mounted on Formvar-coated copper grids. After sequential staining with uranyl acetate and lead citrate, the ultrastructure was examined on a transmission EM (JEM-1400, JEOL, Tokyo, Japan) operated at 80 KV and digital images were captured with a 4 k × 2.7 k pixels CCD camera (Gatan, Tokyo, Japan).
To observe the dense granules in platelets, venous blood samples from the patient and her brother were collected with vacuum blood collection tubes containing EDTA-K2. The blood samples were centrifuged at 150 g for 10 minutes at room temperature to enrich platelets. A drop of platelet-rich plasma was placed on a carbon-coated grid and, 30 seconds later, the liquid was blotted carefully with Whatman#3 filter paper (PGC Scientifics, Palm Desert, CA). Grids were viewed without staining on the JEM-1400 EM.
Hair eumelanin measurement by HPLC
The quantification of eumelanin was based on the formation of the specific degradation product, pyrrole-2,3,5-tricarboxylic acid by alkaline H2O2 oxidation of eumelanin (Ito et al., 2011; Yang et al., 2012). In brief, 1.0–1.5 mg of hair cut at the length of 2–3 mm was placed in 2 ml screw-capped test tubes, to which 375 μl of 1 mol l−1 K2CO3 and 25 μl of 30% H2O2 (final concentration: 1.5%) were added. Samples were mixed at room temperature for 20 hours. Residual H2O2 was decomposed by the addition of 50 μl of 10% Na2SO3 and the mixtures were then acidified with 140 μl of 6 mol l−1 HCl. After vortex-mixing, the reaction mixtures were centrifuged at 4,000 g for 1 minutes and aliquots (20 μl) of the supernatant fluids were directly injected into the HPLC system (Waters 1525, Waters, Dublin, Ireland) using a Symmetry C18 column (4.6 mm × 250 mm, 5 μm, Waters). The mobile phase was 0.1 mol l−1 potassium phosphate buffer (pH 1.9): MeOH, 99: 1 (v/v). Analyses were performed at 35 °C at a flow rate of 0.7 ml min−1. Absorbance of the eluent was monitored at 269 nm with a UV detector. The pyrrole-2,3,5-tricarboxylic acid standard was a gift from Dr. Kazumasa Wakamatsu.
Steady-state level of protein by immunoblotting
Tissues from wild-type or mutant mice at a 3-month age were homogenized in a lysis buffer containing 50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 1% Triton X-100, supplemented with proteinase inhibitor cocktail (Sigma-Aldrich, St Louis, MO). The extracts were centrifuged at 15,000 g for 15 minutes, and the concentration of the solublized protein was determined with the Protein Assay (Bio-Rad, Hercules, CA). For detection of various proteins in tissue homogenates, equal amounts of tissue homogenates were separated in SDS-PAGE gels and immunoblotted by a standard procedure, probed with goat anti-mouse SLC24A5 antibody (Abcam, Cambridge, MA), and visualized by ECL+ staining (Amersham Biosciences, Piscataway, NJ). Loading controls were probed with anti-β-actin antibody (Sigma-Aldrich).
For melanin measurements, statistical significance was determined by performing the Student’s t-test. For semiquantitative analysis of immunoblots, protein bands detected by ECL+ were scanned into Adobe Photoshop, and intensities were analyzed using NIH Image J and the Student’s t-test. P values <0.05 were considered as significant.
We thank Yueming Liu for his critical review of our ophthalmological data. We thank Richard Swank for providing the HPS mice and for his critical reviewing and proofreading this manuscript. This work was partially supported by grants from the National Natural Science Foundation of China (31230046, 81101182, 31071252), from the National Basic Research Program of China (2013CB530605), and from The State Key Laboratory of Molecular Developmental Biology, China.
Supplementary material is linked to the online version of the paper at http://www.nature.com/jid
The authors state no conflict of interest.
These authors contributed equally to this work.
Why Am I Neanderthal?
When our ancestors first migrated out of Africa around 70,000 years ago, they were not alone. At that time, at least two other species of hominid cousins walked the Eurasian landmass—Neanderthals and Denisovans. As our modern human ancestors migrated through Eurasia, they encountered the Neanderthals and interbred. Because of this, a small amount of Neanderthal DNA was introduced into the modern human gene pool.
Everyone living outside of Africa today has a small amount of Neanderthal in them, carried as a living relic of these ancient encounters. A team of scientists comparing the full genomes of the two species concluded that most Europeans and Asians have approximately 2 percent Neanderthal DNA. Indigenous sub-Saharan Africans have none, or very little Neanderthal DNA because their ancestors did not migrate through Eurasia.
On one level, it’s not surprising that modern humans were able to interbreed with their close cousins. According to one theory, Neanderthals, Denisovans, and all modern humans are all descended from the ancient human Homo heidelbergensis. Between 500,000 to 600,000 years ago, an ancestral group of H. heidelbergensis left Africa and then split shortly after. One branch ventured northwestward into West Asia and Europe and became the Neanderthals. The other branch moved east, becoming Denisovans. By 250,000 years ago H. heidelbergensis in Africa had become Homo sapiens. Our modern human ancestors did not begin their own exodus from Africa until about 70,000 years ago, when they expanded into Eurasia and encountered their ancient cousins.
The revelation that our ancient ancestors mated with one another could help explain one of the great mysteries in anthropology: Why did the Neanderthals disappear? After first venturing out of Africa, Neanderthals thrived in Europe for several hundred thousand years. But they mysteriously died out about 30,000 years ago, roughly around the same time that modern humans arrived in Europe.
Some scientists have suggested modern humans out-competed or outright killed the Neanderthals. But the new genetic evidence provides support for another theory: Perhaps our ancestors made love, not war, with their European cousins, and the Neanderthal lineage disappeared because it was absorbed into the much larger human population.
Even though Neanderthals and Denisovans are both extinct, modern humanity may owe them a debt of gratitude. A 2011 study by Stanford University researchers concluded that many of us carry ancient variants of immune system genes involved in destroying pathogens that arose after we left Africa. One possibility is that these gene variants came from other archaic humans.
These kinds of studies worry me a bit. All this studying whites are doing to trace some European within the DNA of Afrikan's makes me think they are trying to find a means to stack claim over Afrika even more, through their "scientific findings". Even though they have already colonized the continent multiple times over, when we finally decide to rise up and let it be known that their time within the boarders of our homeland needs to come to an end ; do you think they might come back to studies like this one and say they have every reason to be in Afrika? They already claim "everyone is Afrikan" but would they use this to further their arguments? I also hate how they say we "cozied" up to these other non hue-man species. if the euarsian is naturally violent I'm certain no cozying up to them took place but rather the same forceful brutality and rape we have always known them to display towards us.
- Agreed. I also feel like they enjoy the "fake it till you make it" strategy. Why we've had African scholars who've debunked their scientific rhetoric that's attempted to classify Kemet as non African for instance. But I definitely agree, they can't have the power to define everything.