A more plausible explanation for the telomere-like features present at the putative fusion site is that they may represent some form of distinct genomic motif. To test this idea, a 798-bp fragment (figure 2) encompassing the fusion site and the region where the telomeric motifs are more densely populated was used as the query in a BLAT17 search of the most recent build of the human genome (v 37.1; genome.ucsc.edu) with masking disabled. The results revealed a total of 159 significantly placed hits throughout the genome on human chromosomes 1–11, 15, 18–20, X and Y. The homologous regions for these hits included areas near telomeres, pericentric areas, and a wide variety of internal euchromatic sites. Identity values ranged from 80.5 to 100%, supporting the conclusion that the telomere fusion site core sequence is not unique to its pericentric location on chromosome 2 and instead represents a sequence feature (motif) scattered throughout the human genome.
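As a rough illustration of how downloaded BLAT results can be tallied by chromosome, the sketch below parses PSL-format text and reports hits per target sequence together with a simple identity range. It is not the script used in this study. The function name `tally_psl_hits`, the assumption that the results were saved in PSL format, and the simplified identity formula (matching bases over aligned bases, ignoring gaps) are all illustrative choices, and the identity shown by the genome browser may be computed differently.

```python
from collections import Counter


def tally_psl_hits(psl_text):
    """Count hits per target sequence and return a simple identity range.

    Assumes standard 21-column, tab-separated PSL rows. Header lines and
    anything that does not look like a data row are skipped.
    """
    per_target = Counter()
    identities = []
    for line in psl_text.splitlines():
        fields = line.split("\t")
        # Data rows have 21 columns and begin with an integer match count.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        aligned = matches + mismatches + rep_matches
        if aligned == 0:
            continue
        per_target[fields[13]] += 1  # column 14 holds the target name
        # Simplified percent identity: gaps are not penalized here.
        identities.append(100.0 * (matches + rep_matches) / aligned)
    if not identities:
        return per_target, None
    return per_target, (min(identities), max(identities))
```

Calling `tally_psl_hits(open("hits.psl").read())` on a saved results file (the file name is hypothetical) would return a per-chromosome counter and the lowest and highest identity among the hits.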
Figure 3. BLASTN results against the most recent builds of the human and chimpanzee genomes using a 798 bp human query sequence representing the core of the chromosome 2 fusion region.
To verify the BLAT results and to identify homologous sites in the chimpanzee genome, the BLASTN algorithm was used (with no masking or gap extension) to compare the 798-bp core 2qfus sequence with the most recent builds of the human (v 37.1) and chimp (v 2.1) genomes maintained at NCBI (www.ncbi.nlm.nih.gov/). Although the BLASTN query against the human genome was more data-intensive than the index-based BLAT search, it produced a total of 85 significantly placed hits on all human chromosomes except chromosomes 13, 16 and 17 (1–12, 14, 15, 18–22, X and Y). Although the number of hits was lower than with BLAT, the BLASTN search identified more chromosomes with homologous sites because of the more direct nature of the algorithm (figure 3). Interestingly, human chromosomes 2, 16, 21 and 22 were peppered with the ‘fusion site’ sequence over the length of their entire euchromatic landscape (figure 3).
When the 798-bp core fusion sequence was queried against the chimpanzee genome with BLASTN, the number of significantly placed hits fell to 19, only 22% of the number observed in the human genome (19 of 85). This is a startling finding in light of the widespread claims that the human and chimpanzee genomes contain DNA sequence that is supposedly 96 to 98% similar, a claim perhaps related to the fact that the human genome was used as a scaffold to build the chimpanzee genome.8 In addition, the human-chimp hit locations did not show strong synteny, as only 13 of the 19 hits (68%) shared visually similar locations in the genome (on chimpanzee chromosomes 1, 2B, 8, 9, 12, 14, 15, 18, 20 and 22).
The most startling outcome of this analysis is that the fusion site did not align with chimp chromosome 2A, one of the supposed pre-fusion precursors. Furthermore, the alignments at two locations on chromosome 2B, an internal euchromatic site and the telomere region of its long arm, did not match the locations predicted by the fusion model. If the fusion model were credible, the query should have aligned with the telomeric region on the short arm of chimpanzee 2B.
There is, therefore, no real evidence for DNA homology between human and chimpanzee for the 798-bp core fusion sequence. The alignment data also severely call into question claims of high overall sequence similarity of 96 to 98% between the genomes. Our results are indirectly supported by the exceptionally high levels of dissimilarity observed in a recent study of a section of the Y chromosome landscape between human and chimpanzee.
Examining DNA sequence for a cryptic centromere
Following the supposed head-to-head telomere-based fusion of two smaller chromosomes, two centromeres would have had to exist in the newly formed chimeric chromosome, one from each of the two fused chromosomes. According to the evolutionary model, sequence degeneration plus selection would continue until the second centromere was completely non-functional. The DNA evidence in question is based on the fact that human, great-ape, and other mammalian centromeres are composed of a highly variable class of DNA sequence, repeated over and over, called alpha-satellite or alphoid DNA.18 Alphoid DNA, although found in centromeric areas, is not unique to centromeres and is even highly variable between homologous regions throughout the same mammalian genome.18
Figure 4. PhyML result with tree rendering by TreeDyn involving the nine chromosome 2 alphoid sequences (prefix = AF) identified by accession numbers submitted to GenBank by Lonoce et al. (2000; unpublished; see GenBank accessions at www.ncbi.nlm.nih.gov). The 171-bp consensus alphoid is included as a monomer and as repeats (2X, 3X, 4X). Two human alphoid sequences representing functional centromeric fragments identified by accession number are also included (prefix = M).
The basic human alphoid monomer is a 171-base motif represented by a patented synthetic consensus sequence in GenBank (Acc. # CS444613). There also exist two small sequenced clones representing alphoid repeats with proven cellular centromere function.19 Nine different alphoid fragments in the cryptic centromere site associated with the purported chromosome 2 fusion event were also sequenced and submitted to GenBank by an Italian laboratory (see figure 4 for accession numbers). In total, we downloaded and analyzed all 12 of these sequences for similarity to each other and individually for genome-wide homology.
When queried with BLAT against the most recent version (v 37.1) of the human genome assembly, all nine alphoid sequences from the Italian laboratory produced their strongest hits at the putative cryptic centromere site on chromosome 2. This confirmed that they were cloned from this region of the genome. The consensus 171-bp alphoid sequence aligned at the cryptic centromere site with 90.6% identity, supporting the conclusion that the site contains alphoid-like sequences.
However, the question is not whether this location contains alphoid sequences, which are known to be ubiquitous in the human genome, but how similar these sequences are to each other and to known functional centromeric alphoid repeats. Alphoid sequences located at centromeres form long series of repeat patterns that are very homogeneous in their repetitive structure, producing distinctive higher-order patterns. Alphoid regions that are non-centromeric are more diverse in their monomer content and form higher-order patterns with different characteristics compared to centromeres.20 At present, there are five known supraclasses of human alphoid monomers that combine in various combinations.21 There is also evidence from research in progress that alphoid monomer classes themselves can be broken down further into specific subfragments that may be present in the genome by themselves or as a subfragment in an alphoid repeat region (Tomkins, unpublished data).
In a human alphoid multiple-sequence alignment analysis, we combined the two functional centromeric alphoid sequences with the set of nine Italian alphoid sequences along with the consensus 171-base alphoid sequence in our data set (figure 4). We also created tandem repeats of the consensus 171-base alphoid sequence representing repeats of 2X to 4X in length as individual sequences. Alignments were conducted using the MUSCLE software package22 and then refined using the Gblocks program.23
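The construction of the 2X to 4X tandem-repeat entries can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function names, the record-naming scheme, and the short placeholder monomer in the usage note are hypothetical, and the real 171-base consensus would have to be loaded from its GenBank record.

```python
def tandem_repeats(monomer, copies=(1, 2, 3, 4), prefix="consensus"):
    """Return named sequences made of 1..n head-to-tail copies of a monomer."""
    monomer = monomer.upper()
    return {f"{prefix}_{n}X": monomer * n for n in copies}


def to_fasta(records, width=60):
    """Format a {name: sequence} mapping as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        # Wrap each sequence to a fixed line width.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

For example, `to_fasta(tandem_repeats("ACGTT"))` (a placeholder monomer, not the alphoid consensus) yields four FASTA records that could be appended to the other sequences before alignment.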
The human alphoid alignments clearly revealed dissimilarity between alphoid sequences and distinct patterns of clustering. Patterns of similarity were computationally evaluated using PhyML24 with tree rendering performed by TreeDyn (figure 4).25 Four major groups were distinguished by the PhyML analysis, with the functional centromere sequences clustering by themselves and not with the alphoid sequences located at the purported cryptic centromere site on chromosome 2. The sequences at the cryptic centromere site are clearly a diverse mixture of alphoid monomers, forming three separate groups and not distinctly representative of functional centromeric DNA. In a structural comparison of both the functional centromere and cryptic centromere sites on chromosome 2 with the genome visualization tool Skittle,26 the putative cryptic centromere site was considerably more sequence-diverse and structurally unordered compared to the functional centromere on chromosome 2 (data not shown). The complex higher-order architecture of this alphoid-diverse site is clearly unique and not characteristic of a silenced degenerate centromere.
Multiple reports involving both hybridization and sequence-based research of alphoid/centromere similarity between humans and apes have found virtually no apparent evolutionary homology, except for moderate similarity on the X-chromosome centromere.20,27 Baldini et al. found that the “highest sequence similarity between human and great ape alphoid sequences is 91%, much lower than the expected similarity for selectively neutral sequences.”28 Alphoid regions, in contrast to many classes of DNA sequences, are not well-conserved among taxa and even show high levels of diversity between chromosomes in the same genome.18 When the human alphoid sequences in our data set were queried against the chimpanzee genome using both BLAT and BLASTN, we were unable to obtain a single significant hit, verifying the extreme dissimilarity observed in alphoid motifs between taxa. These data corresponded well with several decades of previous research by multiple labs, discussed above.
Summarized findings
- The reputed fusion site is located in a pericentric region with suppressed recombination and should exhibit a reasonable degree of tandem telomere motif conservation. Instead, the region is highly degenerate, a notable feature reported by a previous investigation.
- In a 30 kb region surrounding the fusion site, there exists a paucity of intact telomere motifs (forward and reverse) and very few of them are in tandem or in frame.
- Telomere motifs, both forward and reverse (TTAGGG and CCCTAA), populate both sides of the purported fusion site. Forward motifs should only be found on the left side of the fusion site and reverse motifs on the right side.
- The 798-base core fusion-site sequence is not unique to the purported fusion site, but is found throughout the genome with 80% or greater identity internally on nearly every chromosome, indicating that it is some type of ubiquitous higher-order repeat.
- No evidence of synteny with chimp for the purported fusion site was found. The 798-base core fusion-site sequence does not align to its predicted orthologous telomeric regions in the chimp genome on chromosomes 2A and 2B.
- Queries against the chimp genome with the human alphoid sequences found at the purported cryptic centromere site on human 2qfus produced no homologous hits using two different algorithms (BLAT and BLASTN).
- Alphoid sequences at the putative cryptic centromere site are diverse, form three separate sub-groups in alignment analyses, and do not cluster with known functional human centromeric alphoid elements.
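The motif-orientation and tandem-repeat points in this summary rest on simple counts that can be reproduced with a short script. The sketch below is an illustration only, not the script used in the study: the function names and the `site` parameter (a 0-based index of the presumed fusion point within the supplied sequence) are assumptions. The reverse motif is derived as the reverse complement of TTAGGG.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


FORWARD = "TTAGGG"
REVERSE = reverse_complement(FORWARD)  # evaluates to "CCCTAA"


def motif_starts(seq, motif):
    """Start positions of every exact occurrence of motif in seq."""
    # A lookahead keeps the scan from skipping adjacent occurrences.
    return [m.start() for m in re.finditer("(?=%s)" % motif, seq.upper())]


def counts_by_side(seq, site):
    """Count forward and reverse motifs starting left or right of `site`."""
    result = {}
    for label, motif in (("forward", FORWARD), ("reverse", REVERSE)):
        starts = motif_starts(seq, motif)
        result[label] = {
            "left": sum(s < site for s in starts),
            "right": sum(s >= site for s in starts),
        }
    return result


def tandem_runs(seq, motif, min_copies=2):
    """Copy counts of each uninterrupted head-to-tail run of the motif."""
    pattern = "(?:%s){%d,}" % (motif, min_copies)
    return [len(m.group(0)) // len(motif)
            for m in re.finditer(pattern, seq.upper())]
```

Under a clean head-to-head fusion, `counts_by_side` would be expected to show forward motifs only on the left and reverse motifs only on the right, and `tandem_runs` would return long runs. Mixed counts and short runs correspond to the degenerate pattern described above.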
Materials and Methods
DNA sequences described in this paper were downloaded from the National Center for Biotechnology Information (NCBI) web site as FASTA-format text files.29 Results from online BLAT (Blast-Like Alignment Tool)17 searches were downloaded from the Genome Browser at the UCSC Genome Bioinformatics web site (genome.ucsc.edu/) as plain text files and parsed using a POSIX shell script written by J.P. Tomkins. Analyses for telomere motif occurrence and GC content were performed using a Perl script written by J.P. Tomkins. Bioinformatic scripts developed and utilized in this study may be requested by contacting author Tomkins at jtomkins@icr.org. Figures depicting genome-view BLASTN (nucleotide BLAST) alignments were obtained using online software available at NCBI. For alphoid sequence alignments, the MUSCLE (Multiple Sequence Comparison by Log-Expectation)22 program (v 3.7; www.ebi.ac.uk/Tools/muscle/index.html) followed by curation with Gblocks (v 0.91b; molevol.cmima.csic.es/castresana/Gblocks.html)23 was used to evaluate alignments and select conserved blocks for analysis with PhyML (v 3.0; atgc.lirmm.fr/phyml/).24 Tree data from PhyML were rendered with TreeDyn (v 198; www.treedyn.org/).25 Sequence visualization of repeats and motif patterns was performed using the genome viewer software program Skittle26 and the entire consensus sequence of human chromosome 2, downloaded as a compressed FASTA file from NCBI.
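For readers who want to reproduce the GC-content calculation without the original Perl script, a minimal Python sketch is given below. It is not the author's script. The function names, the default window size, and the choice to ignore non-ACGT characters (such as N) are assumptions made for illustration.

```python
def gc_content(seq):
    """Percent G+C among unambiguous bases, or None if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq, window=1000, step=1000):
    """GC content for successive full-length windows as (start, percent)."""
    last_start = max(len(seq) - window, 0)
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]
```

Calling `windowed_gc(region, window=1000, step=1000)` on a sequence string such as the region surrounding the fusion site would give a per-kilobase GC profile.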