Single-base mismatches: median 3, mean 4.0 per 300 bp slice. Based on a mean substitution divergence of 1.3%, you'd expect a mean of 3.9.
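For anyone who wants to check the arithmetic behind that expected value, here is a minimal sketch. The variable names are just illustrative. It assumes substitutions are spread uniformly, so the expected count is simply slice length times divergence.

```python
# Expected single-base mismatches in one slice, assuming substitutions
# are spread uniformly at the quoted mean divergence.
slice_length_bp = 300            # slice size quoted above
substitution_divergence = 0.013  # 1.3% mean substitution divergence

expected_mismatches = slice_length_bp * substitution_divergence
print(f"Expected mismatches per slice: {expected_mismatches:.1f}")  # 3.9
```

That lines up with the observed mean of 4.0 per slice.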
That's about what I was seeing doing some random manual searches. The reason I ask is because of another article written by Tomkins found here:
Genome-Wide DNA Alignment Similarity (Identity) for 40,000 Chimpanzee DNA Sequences Queried against the Human Genome is 86–89% - Answers in Genesis
In it he claims:
"Depending on the BLASTN parameter combination, average sequence identity for the 30 separate experiments between human and chimp varied between 86 and 89%."
Once again, the difference is due to not allowing for gaps. As Tomkins explains:
"Gapping was disallowed for a variety of reasons. First, Altschul et al. (1990) determined that the addition of gapping strategies for alignments designed to locate regions of local similarity using BLAST was negligible. Secondly, an objective comparison among all queries negates the use of gapping with the algorithm. Finally, the top local pair-wise alignments that were obtained involved a variety of very liberal to very stringent matching parameters for word size and e-value."
Those are very, very poor arguments for excluding gapping strategies, as I am sure you are aware.
So when gapped strategies are used, the identity goes from the high 80s to the high 90s, as reported by the chimp genome paper. Go figure. What Tomkins was trying to do is find a strategy that returned the numbers he wanted, no matter how dishonest the comparison really was.
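To show why disallowing gaps drags identity down, here is a toy sketch. It is not BLAST's algorithm and not Tomkins' pipeline. The sequences, function names, and scoring values are all made up for illustration. It compares two sequences that differ by a single one-base deletion, first position by position with no gaps, then with a simple Needleman-Wunsch global alignment that allows gaps.

```python
def ungapped_identity(a, b):
    """Fraction of matching bases when compared position by position."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


def gapped_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Identity (matches / alignment columns) from a Needleman-Wunsch alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, counting matches and columns.
    i, j = len(a), len(b)
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
    return matches / columns


# Hypothetical 32-base sequence and a copy with one base deleted.
seq_a = "ACGTTGCAAGCTTAGGCTAACCGTATCGGATC"
seq_b = seq_a[:10] + seq_a[11:]

print(f"ungapped identity: {ungapped_identity(seq_a, seq_b):.3f}")
print(f"gapped identity:   {gapped_identity(seq_a, seq_b):.3f}")
```

Without gaps, everything downstream of the deletion is shifted by one base and only matches by chance. With gaps, the alignment pays for one gap column and recovers the other 31 bases as matches.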
Looking more closely at the results, I see another three matches with good scores but large numbers (> 40) of mismatches. These are probably bogus matches as well -- probably matching to the wrong repetitive sequence. The fact that the analysis is done with unmasked sequence (i.e. no filtering to remove repetitive elements) means that everything should be taken with a large grain of salt. But if you really want to do the job right, you don't use BLAST at all, but instead use alignment code tuned for the purpose, and you do a lot of work to make sure you're making sensible comparisons -- which is of course what was done in the chimpanzee genome paper. Reading the AiG paper reminds me of judging high school science fair projects. They really have no clue what they're doing.
That was my assessment as well. My one-by-one manual blastn searches did find regions of homology between multiple chromosomes, so I assumed they were repeats or other conserved features. Obviously, the honest approach is to compare orthologous DNA, or at least try to determine where recombination events have happened. I doubt that Tomkins is interested in such an honest approach.