And this is the looong explanation:
It is now well known that human chromosome 2, the second-largest human chromosome, arose from the end-to-end fusion of two ancestral chromosomes, which in chimpanzees remain separate as chromosomes 2A and 2B. However, this conclusion has been challenged by the Young Earth Creationist Jeffrey Tomkins in an article he wrote for Answers in Genesis.
The article can be accessed here. Most of the points he raises in his analysis are easily rebutted, but in researching one of them I found that it actually rebounds against him in a curious way and ends up being evidence for evolutionary theory, in particular for the generation of new functionality. I want to discuss these ideas in this thread.
Tomkins makes several points, but his central claim (encapsulated in the title of his article) is that the fusion site lies inside an active and functional gene. How is it possible, Tomkins asks, for a gene to span the fusion site, since that would imply that different parts of the gene lay on different chromosomes in the ancestral state? Furthermore, he claims that there is evidence of transcription factor binding at the fusion site itself. (This last point is a somewhat separate issue, but the fusion site does have the DNA motifs expected under the fusion hypothesis.)
The gene in question belongs to a family of genes known as DEAD/H-box helicase-like genes (helicases remodel DNA or RNA strands, for example by separating the two strands of DNA during transcription). The specific gene is DDX11L2, and it is one of several highly homologous copies of the gene found near the ends of chromosomes in humans. In addition to the DDX11 gene itself, there are as many as 16-18 DDX-like genes in humans. DDX11L2 is an expressed pseudogene, which means that it is transcribed to mRNA and spliced but is not translated to protein.
DDX11L2 is also a gene with alternative splices. What does this mean? A gene is made up of several exons (the coding part of the gene), separated by introns (sequences which do not code for proteins). Splicing is the process by which the introns are excised, and the exons are spliced together to form a single functional sequence. A gene which has alternative splices is one in which some parts of the gene are spliced into the sequence in some variants and not in others. For a protein coding gene this means that one gene can make several different proteins. Although DDX11L2 is not a protein coding gene, it can be transcribed to mRNA with two distinct alternatively spliced sequences. Let’s consider them.
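To make the idea of alternative splicing concrete, here is a toy sketch. The sequence and the exon positions are invented purely for illustration and have nothing to do with the real DDX11L2 sequence; the helper name `splice` is my own.

```python
# Toy illustration of alternative splicing. The sequence and exon
# coordinates are invented; they are NOT taken from DDX11L2.

def splice(transcript, exons):
    """Join the given exons (0-based, end-exclusive) in order,
    dropping everything that lies between them (the introns)."""
    return "".join(transcript[start:end] for start, end in exons)

# Hypothetical primary transcript: upper case = exon, lower case = intron.
transcript = "AAAAggggCCCCttttGGGG"

# Same transcript, two different choices of exons.
variant_a = splice(transcript, [(0, 4), (8, 12), (16, 20)])  # all three exons
variant_b = splice(transcript, [(0, 4), (16, 20)])           # middle exon skipped

print(variant_a)  # AAAACCCCGGGG
print(variant_b)  # AAAAGGGG
```

The point is only that one stretch of DNA can give rise to more than one spliced product, depending on which pieces are kept.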
DDX11L2 variant 2 is a three-exon transcript which spans 2549bp and which creates a 1668bp spliced product. Don’t take my word for it: the NCBI accession of this gene variant is NR_024005.2, so you can look it up yourself. DDX11L2 variant 2 does not cover the fusion site (the span of the transcript is 113,599,028 to 113,601,576, and the fusion site lies at, or close to, 113,602,932). So, this is not the variant that Tomkins claims spans the fusion site.
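The arithmetic behind that statement can be checked in a few lines. This is only a sketch: it uses the coordinates quoted above and assumes they are 1-based and inclusive (which is what reproduces the 2549bp span); the function names are my own.

```python
# Coordinates as quoted in the post, assumed 1-based and inclusive.
V2_START = 113_599_028
V2_END = 113_601_576
FUSION_SITE = 113_602_932  # approximate ("at, or close to")

def span_length(start, end):
    """Number of bases in an inclusive interval."""
    return end - start + 1

def contains(start, end, pos):
    """True if pos lies inside the inclusive interval [start, end]."""
    return start <= pos <= end

v2_span = span_length(V2_START, V2_END)
v2_covers_fusion = contains(V2_START, V2_END, FUSION_SITE)
gap_to_fusion = FUSION_SITE - V2_END

print(v2_span)           # 2549
print(v2_covers_fusion)  # False
print(gap_to_fusion)     # 1356
```

So on these numbers the fusion site sits roughly 1.4kb beyond the end of variant 2.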
DDX11L2 variant 1 (NCBI accession NR_024004.1) is also a three-exon transcript where exon 3 consists of exons 2 and 3 of variant 2 (including the intron between variant 2’s exons 2 and 3) and a bit more besides; exon 2 is part of variant 2’s exon 1; and exon 1 lies some 2,300bp beyond exon 2. Variant 1 spans 4690bp and splices to 2158bp. This is the variant that is grist for Tomkins’ mill, as the fusion site lies within this variant in the intron between exons 1 and 2.
Now Tomkins’ claim is only valid if DDX11L2 variant 1 can be shown to have predated the fusion event. So, I did BLAST alignments of the two variants of the gene against the whole human genome. The BLAST tool compares a particular sequence against part or all of its own genome or the genome of another species. The results were unmistakeable. Variant 2 (the variant which does not span the fusion site) aligns with multiple sequences in the human genome that lie on other human chromosomes. Take for example DDX11L1, which is on chromosome 1. It has the same three exons as the gene we are discussing, and the sequence similarity is 99%. Another example is DDX11L9, which lies on chromosome 15 with a sequence similarity of 98%. DDX11L10 lies on chromosome 16 and has 98% similarity. And so on. All these paralogous genes align with all three exons of variant 2 and are almost identical.
What of variant 1, the variant in which Tomkins claims the fusion site lies? Variant 1 also aligns with genes in the human genome, but, critically, there is no alignment with its exon 1. The alignment is restricted to its exons 2 and 3. Variant 1 also incorporates the intron between variant 2’s exons 2 and 3, and there is no alignment for this sequence in other genes either.
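For anyone who wants to repeat this kind of check, the logic of "which exons have an alignment and which do not" can be sketched as follows. Everything here is hypothetical apart from the 2158bp spliced length: the exon boundaries and the hit ranges are invented placeholders, not the real BLAST output, and the function names are my own. You would substitute the query ranges reported by your own BLAST run.

```python
def overlap(a, b):
    """Length of the overlap between two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def covered_exons(exons, hits):
    """Map each exon name to True if at least one hit overlaps it."""
    return {
        name: any(overlap(interval, hit) > 0 for hit in hits)
        for name, interval in exons.items()
    }

# HYPOTHETICAL exon boundaries within the 2158bp spliced variant 1
# (only the total length comes from the post).
exons = {"exon1": (1, 300), "exon2": (301, 700), "exon3": (701, 2158)}

# HYPOTHETICAL query ranges of hits against a paralogue on another chromosome.
hits = [(320, 690), (705, 2100)]

result = covered_exons(exons, hits)
print(result)  # {'exon1': False, 'exon2': True, 'exon3': True}
```

A pattern like this one, where exons 2 and 3 are hit and exon 1 never is, is what I am describing above.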
How can we interpret this? Variant 2 is clearly a member of a gene family which arose as a result of gene duplication. Much of that duplication must have pre-dated the chromosome fusion, because the other almost identical genes are located in the sub-telomeres of other chromosomes. DDX11L2 variant 2 lies near the fusion site on chromosome 2, in what was the sub-telomere of one of the ancestral chromosomes before they fused.
Exon 1 of variant 1, however, the variant which Tomkins claims spans the fusion site, has no homologous sequence elsewhere in the human genome. It is clear then that variant 1 arose de novo after the fusion event, by incorporating the entire variant 2 sequence and adding the sequence of exon 1, which lies on the far side of the fusion site from the rest of the gene. (It also incorporates the intron between exons 2 and 3 of variant 2.)
So, not only is Tomkins’ objection to the fusion hypothesis answered, but this is also an example of previously non-functional sequence being co-opted to create new functionality, something that creationists claim cannot happen.