I wish there were a book of fun facts about genetics. Actually, there might be one; out of laziness, I have not even looked. Nevertheless, this is still a good reason to talk about fun facts. Let me start from the beginning.
Since the completion of the Human Genome Project, every once in a while the Genome Reference Consortium updates what is called the human reference genome. Each one of us carries two sets of chromosomes, each made of about 3.2 billion nucleotides. Now, if you wanted to store that information, you could do so by writing down all the nucleotides. You probably know, though, that most of the genome is the same across people, and that it is mostly the same even between humans and the other great apes. On average, about one nucleotide out of 100 differs between a chimpanzee and a human being. So it is much more efficient to have a consensus reference sequence and to store, for each one of us, only the differences between our genome and that reference.
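To make the storage idea concrete, here is a minimal Python sketch. It is only a toy: the two short sequences are made up, the function names are mine, and it assumes that the only differences are single-nucleotide substitutions (real genomes also have insertions, deletions and rearrangements, which this ignores).

```python
def diff_against_reference(reference, sample):
    """Return {position: nucleotide} for every site where sample differs.

    Assumption of this sketch: substitutions only, so both sequences
    must have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles substitutions")
    return {i: s for i, (r, s) in enumerate(zip(reference, sample)) if r != s}


def rebuild_from_reference(reference, differences):
    """Reconstruct the full sequence from the reference plus the differences."""
    seq = list(reference)
    for position, nucleotide in differences.items():
        seq[position] = nucleotide
    return "".join(seq)


# Made-up toy sequences, 20 nucleotides each (0-based positions).
reference = "ACGTACGTACGTACGTACGT"
sample = "ACGTACGAACGTACGTACCT"

differences = diff_against_reference(reference, sample)
restored = rebuild_from_reference(reference, differences)

print(differences)         # {7: 'A', 18: 'C'}
print(restored == sample)  # True
```

Instead of 20 letters, the sample is stored as two entries. On a real genome the saving is far larger, because the differences are a small fraction of the total.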
A consensus reference sequence was the goal of the Human Genome Project and, even though it is mostly complete, it gets updated every now and then. In the last update, hg19, nine regions have been marked as special, because they have what is called an alternate assembly. What happened is that, for different reasons, nine long sequences have diverged so much that they have irrevocably parted and cannot recombine with each other anymore. So, at these nine loci, the reference genome allows for these alternative sequences.
One of these caught my interest today. It is a sequence two million nucleotides long on chromosome 17. It is thought that at some point, around two million years ago, an inversion event took place; that is, these two million base pairs were flipped in direction. This does not affect the functionality of the sequence, since cells have no predilection for sequence direction. It does, however, affect the ability of the inverted sequence to recombine with the non-inverted version. Among the genes in this sequence, the most famous is MAPT. It is known that mutations in this gene are responsible for some neurological disorders.
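If you want to picture what an inversion does to the letters, here is a small Python sketch. Because DNA is double-stranded, flipping a segment means that, read along the same strand, the segment shows up as its reverse complement. The toy sequence and the function names are made up for illustration.

```python
# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read in its own direction."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Invert seq[start:end] (0-based, end exclusive), leaving the flanks untouched."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


# Toy sequence: the middle segment "CCGT" gets inverted.
original = "AAAACCGTTTTT"
inverted = invert_segment(original, 4, 8)

print(inverted)                                   # AAAAACGGTTTT
print(invert_segment(inverted, 4, 8) == original)  # True
```

Inverting the same segment a second time gives back the original sequence.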
The interesting part of the story is that the inverted version of the sequence is present only in Europeans, with a prevalence between 20% and 30%. It is not clear why, and some hypothesize that it might be the legacy of the Neanderthals, who, before going extinct, managed to interbreed with Homo sapiens. That would explain why the two versions of the sequence are so different from each other and have a coalescence time, that is, an expected time back to their most recent common ancestor, of about two million years.
Some of you might already understand why this is a touchy subject. It has all the ingredients for racial discrimination. The sequence is known to influence the brain, in some unknown way, and one version of it is present only among Europeans. I do not want to debate this aspect, although I expect that scientists will eventually perform the due experiments to test every possible hypothesis. In the meantime, be assured that the Neanderthal project will indeed reveal whether or not the alternative sequence on chromosome 17 was contributed by Neanderthals. Even if it was not, though, that will not rule out that it was instead contributed by Homo erectus in Asia, or by some other unknown extinct hominid for which no fossils are known.
In the meantime, as a European, I had to know: which one do I have? It turns out that with 23andMe you can check, if you look at the right SNP. In particular, SNP rs1864325 is a predictor for the two sequences. A C at the SNP corresponds to the typical African haplotype, while a T at the SNP corresponds to the mysterious European haplotype. It turns out that I have a C and a T. So, whatever it means, I know that I have one of each sequence in my pair of chromosomes 17.
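For the curious, the lookup itself is trivial once you have your two letters. Here is a minimal Python sketch of it. The allele meanings are simply the ones stated above (C for the typical African haplotype, T for the European one); the function name and the two-letter genotype format such as "CT" are my own assumptions, not anything taken from the 23andMe site.

```python
# Allele meanings for rs1864325, exactly as described in this post.
ALLELE_TO_HAPLOTYPE = {
    "C": "typical African haplotype",
    "T": "European haplotype",
}


def interpret_rs1864325(genotype):
    """Map a two-letter genotype (assumed format, e.g. "CT") to one
    haplotype label per chromosome 17 copy."""
    genotype = genotype.strip().upper()
    if len(genotype) != 2 or any(a not in ALLELE_TO_HAPLOTYPE for a in genotype):
        raise ValueError(f"unexpected genotype for rs1864325: {genotype!r}")
    # Sort so that "CT" and "TC" give the same answer.
    return [ALLELE_TO_HAPLOTYPE[allele] for allele in sorted(genotype)]


my_haplotypes = interpret_rs1864325("CT")
print(my_haplotypes)
# ['typical African haplotype', 'European haplotype']
```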
In a more romantic way, I can say that I am one of those many millions of Europeans in whom the two sequences, after millions of years of separation, have come together to shape me. Whatever that means. But I am confident I will not have to wait long to find out.