This is wrong -- Mark has misunderstood something he read. Within the modern human population, one particular class of indels (transposons) accounts for 10-20% as many variable bases as single base substitutions (SNPs). Transposons are, however, only 0.5% of all indels. Combined, all indels account for many more variable bases than do SNPs.
As for the total number of bases involved in indels, we still do not have a very good estimate.
The chimpanzee genome paper found that indels accounted for 2.3 - 2.6 times as many bases (different between the two species) as single-base substitutions.
The indel map cited above found in modern human variation that indels accounted for 1.3 times as many bases as single-base substitutions. That study was limited to indels shorter than 10,000 base pairs, however, and is therefore a lower limit on the total number. Indels are known to be as long as hundreds of thousands of base pairs; even small numbers of these large indels contribute enormously to the total number of bases affected by indels. When a complete survey can be done, finding that indels contribute at least 2.5 times the variation (in base pairs) as SNPs will be consistent with everything we've seen so far. (In fact, I suspect we will find that indels contribute a larger fraction to modern variation than they do to between-species differences, since they are somewhat more likely to be deleterious and thus not to stick around.)
In any case, it is clear that the number of bases that vary between humans as a result of indels is larger than the number that vary because of SNPs. In short, Mark's entire line of argument here is wrong.
mark kennedy said:
I just love how I'm always misunderstanding this stuff when it's not really all that hard:

I find it quite sad, myself.

mark kennedy said:
"Transposons and transposon-like repetitive elements collectively occupy 44% of the human genome sequence."

Yes. And . . . what?

mark kennedy said:
In terms of base pairs SNPs amount to much more of the variation.
"Therefore, with 10 million bases of variation, SNPs account for the majority of common human genetic variation, followed by indels and then transposon insertion polymorphisms."
That is not ambiguous and it does not take a PhD to understand it. SNPs are the majority of variation.

It's in fact quite ambiguous, and badly written. As the group's own numbers show, SNPs account for the majority of genetic variation as measured by events, not as measured by base pairs.

mark kennedy said:
"On the other hand, if we assume that the average transposon polymorphism in humans is ~500–1000 bp in length, then the total amount of variation caused by common transposon insertions is 1–2 million base pairs (equivalent to 10–20% of the base pair variation caused by SNPs)."
10-20% of the base pair variation the 1-2 million base pair insertions.

10-20% of the base pair variation is caused by transposons. Transposons are only one half of one percent of all indels. How many base pairs do the other 99.5% of indels represent, Mark? Just answer that question.
To answer it, you should look at the paper I quoted. It's from the same group that produced the transposon paper you're quoting from.
First they wrote a paper on the transposons, which contribute 10-20% as much as SNPs to base pair variation, and then they wrote a second paper about the other 99.5% of indels. The second paper makes it clear that indels taken as a whole contribute more base pairs of variation than do SNPs. Those are the facts. Interpret them however you wish, but stop getting the facts wrong.

mark kennedy said:
The paper suggests otherwise:
"Our method was highly efficient and led to the identification of 605 nonredundant transposon insertion polymorphisms in 36 diverse humans. We estimate that this represents 25–35% of ~2075 common transposon polymorphisms in human populations."

Since transposons continue to be only a tiny fraction of all indels, your comment does not respond to my point.
Transposons have been well measured. Long indels (which are much longer than transposons) have not been well measured. Therefore the total amount of base pairs introduced by indels has not been well measured. Even with the measurements that we do have, however, we can set a lower limit on it that is more than ten times higher than you claim is the case.

mark kennedy said:
I don't know that I am totally convinced of this any more than I am that mutation rates are measured in terms of base pairs. I'm not convinced that there is a point being made here; looking at the paper itself would be the best advice I could give the casual reader.

I don't give a fig whether you are convinced by this or by anything else.
Believe that your head is a turnip, for all I care.
I only respond to you so that readers won't be confused by the falsehoods you post.
My advice as a geneticist is for the casual reader to ignore anything and everything that Mark says about genetics.

Quote:
So, to summarize what we have so far:
1. The team searched out all indels with length more than (not sure about "and equal to") 16bp.
2. They found 606,093 such indels.

You've gone off the track here. They started with an algorithm that could detect indels 16 bp long or shorter, and then added a second algorithm to find longer ones. The total that they found from both methods was 600,000.
mark kennedy said:
The 1-2 million bases out of 10 million is proportion of indels to SNPs. If you have another 99% somewhere else I would love to hear about them because I know how this 99% could mean just about anything.

No, the 1-2 million bases is the proportion of bases due to transposons, not the proportion due to indels. It says so right in the part of the paper you keep quoting. Just read what you quote. The other 99% of indels are all of the classes I listed from their second paper (single-base indels, multimeric repeat expansions and "other"), which together occur 100 times as often.

mark kennedy said:
So even though they are looking at indels that are 1/10 the size of SNPs, they are less than 1% of all the indels. Gotcha

They are looking at one small class of indels that are (summed up) 1/10 the size of SNPs. They are not (in this paper) looking at the other indels.

mark kennedy said:
Transposons are close to half the genome but a small fraction of the indels contributing to genetic variation. This would seem to be due to the fact that the transposons did their thing 35 million years ago and somehow stopped working.

Mostly it's because transposons are fifty or 100 times as big as the average indel. But it is also true that there have been relatively few active transposons in recent human evolutionary history.
I was going to wait until Mark does his "expositive review of six papers" argument and go through them one by one but since sfs jumped the gun I might as well jump in. If you take a careful look at the paper it's obvious that in terms of base pairs, indels outnumber SNPs in their contribution to human natural genetic variability.
First, the section actually cited by Mark:
Together with previous studies, our analysis indicates that SNPs, indels, and transposon insertion polymorphisms represent significant sources of genetic variation in humans. Human populations are estimated to harbor ∼10 million common SNPs (Judson et al. 2002), ∼2 million common indels (our unpublished data), and ∼2000 common transposon insertion polymorphisms (this study). Therefore, with 10 million bases of variation, SNPs account for the majority of common human genetic variation, followed by indels and then transposon insertion polymorphisms. On the other hand, if we assume that the average transposon polymorphism in humans is ∼500–1000 bp in length, then the total amount of variation caused by common transposon insertions is 1–2 million base pairs (equivalent to 10–20% of the base pair variation caused by SNPs). Thus, in terms of the number of base pairs, common transposon insertions cause significant levels of human genetic variation. Moreover, humans also are likely to harbor >10 million rare private transposon insertions (cases in which only one or a few individuals have the insertion). Therefore, transposon insertion polymorphisms cause significant levels of human variation.
(emphasis added)
Pay attention to the bolded phrase: it signals that the argument has changed. "In terms of the number of base pairs", common transposon insertions cause significant levels of human genetic variation - even though SNPs outnumber transposon insertions roughly 5,000 to 1. Clearly this means that the preceding analysis ("SNPs account for the majority of common human genetic variation" etc.) was not in terms of the number of base pairs but in terms of the number of events.
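To make the events-versus-base-pairs distinction concrete, here is a rough sketch using only the round figures from the quoted conclusion (about 10 million SNPs, about 2000 common transposon insertions, and an assumed average length of 500 to 1000 bp). The variable names are mine, and the numbers are the paper's estimates, not measurements I have checked.

```python
# Round figures taken from the quoted conclusion; names are illustrative only.
common_snps = 10_000_000          # ~10 million common SNPs, 1 bp each
common_transposons = 2_000        # ~2000 common transposon insertion polymorphisms
transposon_len_bp = (500, 1_000)  # average length range the paper assumes

# Counting by events: how many SNPs there are per transposon insertion.
events_ratio = common_snps // common_transposons

# Counting by base pairs: each SNP is 1 bp, each transposon is hundreds of bp.
snp_bp = common_snps * 1
transposon_bp = tuple(common_transposons * n for n in transposon_len_bp)
bp_share = tuple(100 * bp // snp_bp for bp in transposon_bp)

print(f"SNP events per transposon event: {events_ratio}")
print(f"Transposon base pairs: {transposon_bp[0]:,} to {transposon_bp[1]:,}")
print(f"As a share of SNP base pairs: {bp_share[0]}% to {bp_share[1]}%")
```

By events the transposons are a rounding error; by base pairs they come to a tenth or a fifth of the SNP total. Same data, two different yardsticks.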
This is semantics, Mark, so criticize it as semantics if you will, but there's more. This requires a close look at the methodology section (in which I get lost too). But first, a look at the abstract:
We began by identifying 606,093 insertion and deletion (indel) polymorphisms in the genomes of diverse humans. We then screened these polymorphisms to detect indels that were caused by de novo transposon insertions.
"606,093 indel polymorphisms", aight? Just how long were those indels?
After the traces were successfully mapped to unique genomic locations, they were unmasked and aligned to their assigned genomic locations using the Bl2Seq program (NCBI). The Bl2Seq program allowed for as much as a 16-base gap in the alignments and led to identification of indels as large as 16 bases in length.
A new algorithm also was developed to identify indels that were >16 bp in length. Our strategy was designed to split trace data into two blocks upon encountering a region in the pairwise alignment that no longer matched the query. The first block of sequence that matched was maintained in the correct position, and the nonmatching sequence was moved over as a block, 1 base at a time, until a match was obtained. The Perl program that was developed to accomplish this task moved the nonmatching block until it detected either a perfect alignment or a distance of 10,000 bases (the maximum distance allowed by the program). The 5 bases on each side of an indel candidate were required to have Phred scores of ≥20 to ensure that high-quality bases were being used to locate the indel junctions. Indel candidates were deposited into dbSNP under accession nos. ss8029278–ss8176133, ss8475737–ss8484870, ss14926095–ss15354938, and ss15357378–ss15378640.
Now, I'm a bit fuzzy on the Bl2Seq program (I'm an armchair geneticist) but clearly their settings enabled them to identify indels with length 16 base pairs and above. The candidate indels were further winnowed with the Perl program mentioned in the second paragraph to find genuine and extremely convincing indels - this means that conceivably there were more indels longer than 16bp than detected. The indel sequences were catalogued, and if you look at the accession numbers you will note that 606,093 entries were added to the database - matching the number of indels noted in the abstract.
So, to summarize what we have so far:
1. The team searched out all indels with length more than (not sure about "and equal to") 16bp.
2. They found 606,093 such indels.
What does this tell us? 606,093 indels x 16 bp/indel gives us a minimum of 9.7 million base pairs. In other words, indels contribute at least 9.7 million base pairs of variation to the human genome. The ~10 million SNPs only contribute ~10 million base pairs, so this leaves indels and SNPs almost neck and neck.
Is this over? No way. Jumping down to the conclusion:Together with previous studies, our analysis indicates that SNPs, indels, and transposon insertion polymorphisms represent significant sources of genetic variation in humans. Human populations are estimated to harbor ∼10 million common SNPs (Judson et al. 2002), ∼2 million common indels (our unpublished data), and ∼2000 common transposon insertion polymorphisms (this study). Therefore, with 10 million bases of variation, SNPs account for the majority of common human genetic variation, followed by indels and then transposon insertion polymorphisms.
Note ~2 million common indels, out of which this study publishes the identification of roughly 600k. This means that there are still 1.4 million common indels with length < 16bp. And that's a lot. Assuming that the mean length of these indels is 8bp, the 1.4 million indels add up to 11.2 million base pairs - meaning that indels, giving in total 20.9 million base pairs, outnumber SNPs' 10 million base pairs in terms of base pairs 2 to 1.
And 20.9 million base pairs is something like an absolute minimum. Remember that for the 600k indels mentioned in the study, 16bp is their minimum length, not their mean length, which means that the overall indel contribution is likely far higher.
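For anyone who wants to check the arithmetic, here is the estimate above as a short sketch. It takes the post's own assumptions at face value: a 16 bp minimum for the 606,093 indels in the study, and a guessed 8 bp mean for the remaining ~1.4 million. Both are assumptions of this back-of-envelope estimate, not figures from the paper, and the variable names are illustrative.

```python
# Back-of-envelope lower bound, following the post's own assumptions.
indels_in_study = 606_093        # indels reported in the abstract
assumed_min_len_bp = 16          # ASSUMPTION of the post: minimum length of those indels
remaining_indels = 1_400_000     # ~2 million common indels minus roughly 600k, as rounded in the post
assumed_mean_short_bp = 8        # ASSUMPTION of the post: mean length of the remaining indels
snp_bp = 10_000_000              # ~10 million SNPs at 1 bp each

long_bp = indels_in_study * assumed_min_len_bp
short_bp = remaining_indels * assumed_mean_short_bp
total_bp = long_bp + short_bp

print(f"Indels in study:   {long_bp:,} bp")
print(f"Remaining indels:  {short_bp:,} bp")
print(f"Total indel bp:    {total_bp:,} bp")
print(f"Indel:SNP ratio:   {total_bp / snp_bp:.1f} to 1")
```

Change either assumed length and the ratio moves with it, so treat the 2:1 as only as good as those two inputs.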
2:1 is still relatively low compared to the chimp/human divergence ratio of ~4:1. But it's very different from the 1:10 that you've been trying to show and it completely invalidates this particular line of argument against human evolution.
mark kennedy said:
So taking an explicit statement at face value is now a lie?

I didn't say you were telling lies. I said you were repeating falsehoods. I have no doubt that you think your misunderstandings are true. Just because it's an explicit statement doesn't mean you can't misunderstand it. Take the following:

mark kennedy said:
Tell me Steve, does the bp in 2 x 10^-9/bp/generation mean base pair yet or is that just insane?

Of course bp means base pair; I have never suggested otherwise. The misunderstanding on your part is in thinking that this means that mutations are measured in base pairs. They're not. The sequence that's doing the mutating is measured in base pairs.
mark kennedy said:
I had hoped that there was a way to separate the supposition from the actual science. I never really had a problem with someone coming to an informed opinion based on the actual evidence. The whole problem is that the measure of genetic variation is indicating that indels are about 1/10 of the human genetic variation. I was under the impression that some 10 million SNPs represented the majority of genetic variation between humans. Now there are these enigmatic indels being added to Hapmaps which should prove interesting over time.

Your impression is wrong. Well, sort of. sfs just explained HOW this is wrong, but you went on to repeat yourself (maybe you hadn't gotten to that post yet). I just want to emphasize this because I keep seeing you say this over and over. SNPs are the vast majority of mutations (events) and a minority in terms of differing base pairs. When they use the term "genetic variation" they're talking about EVENTS here. sfs (the resident geneticist) says this is poorly written, so I assume it's not some sort of field standard or anything, just a poor choice of words.
Since they gave units for each of their numbers, however, there's no reason to misunderstand this basic point based on this poorly written sentence!
I think sfs said it better in post 43:

sfs said:
It's in fact quite ambiguous, and badly written. As the group's own numbers show, SNPs account for the majority of genetic variation as measured by events, not as measured by base pairs.

And what's this obsession with the percentage of the genome? Everybody agrees that the magazine article you keep bringing up was poorly written in that they didn't give units to their 99%. You keep contrasting this 99% with the difference in terms of base pairs, but nobody here argues that the base pairs in chimps and humans ARE 99% the same. I doubt the magazine article you so hate said that either, but it's impossible to tell because they didn't give units.
I mean, you can keep bringing it up if it makes you feel better, but nobody's claiming that there's less than 1% divergence in terms of base pairs, so it's a bit of a straw man if you're trying to show that somebody on this board is wrong about something.

Given Mark's arguments here and his recent tirade about evolution in schools (see Creationist subforum), I have to ask: Mark, what is your formal education and training as far as genetics and evolution go?
mark kennedy said:
Now I would like you to do me the courtesy of considering this common measure of mutations as applied to the known divergence between chimpanzees and humans. Then I would like for you to give yourself a minute and think about 125 million base pairs in 5 million years because it comes to 25 per year for 5 million years or 500 base pairs per generation (estimated at 20 years).
This simply does not happen and what sfs is trying to do is convince everyone that the 90 million base pairs are the same as 5 million mutation events. If you do that then the formula I just gave you is absolutely meaningless.

If the average bp per mutation is 9 (didn't we calculate about 7.7 WITHOUT the largest?) then the average number of base pairs could easily be 90M with a NUMBER of mutations of 5M. What about this doesn't make sense?

mark kennedy said:
Now for the poorly written statement that comes right out and makes the comparison:
Therefore, with 10 million bases of variation, SNPs account for the majority of common human genetic variation, followed by indels and then transposon insertion polymorphisms. On the other hand, if we assume that the average transposon polymorphism in humans is ~500–1000 bp in length, then the total amount of variation caused by common transposon insertions is 1–2 million base pairs (equivalent to 10–20% of the base pair variation caused by SNPs).
There is a reason that the actual base pairs involved is critical. In the comparison of the Chimpanzee genome and the human genome the indels dwarfed the size of the single nucleotide insertions. In this comparison it was quite the opposite. I'm being told that I just misunderstood but I'm still not buying it.

Did you miss sfs repeatedly reminding you that transposons are a small fraction of all indels? He noted this in post 43 and 48 and at least one other place somewhere between there and here. Why do you keep claiming that this quote says that indels are more prominent (in terms of base pairs) when this quoted paper CLEARLY only addresses transposons (a small subset of indels)?
mark kennedy said:
Mutation rates are almost always given in terms of base pairs; I have yet to see an exception.
"RATES of spontaneous mutation per replication per measured target vary by many orders of magnitude depending on the mutational target size (from 1 to >10^10 b, where b stands for base or base pair as appropriate)
...With 6.4 x 10^9 base pairs in the diploid genome, a mutation rate of 10^-8 means that a zygote has 64 new mutations. It is hard to imagine that so many new deleterious mutations each generation is compatible with life, even with an efficient mechanism for mutation removal. Thus, the great majority of mutations in the noncoding DNA must be neutral."
Table 5. Mutation rates estimated from specific loci in higher eukaryotes
Rates of Spontaneous Mutation
Uh, mark, do you read what you cite? The abstract reads (formatted for clarity) :
Look at the units used for mutation rates (underlined) :
1 per genome per replication
0.1 per genome per replication
1/300 per genome per replication
0.1-100 per genome per sexual generation
1/300 per cell division per effective genome
None of these units for mutation rates involve base pairs in any way.
Mutation rates in higher eukaryotes are roughly 0.1-100 per genome per sexual generation but are currently indistinguishable from 1/300 per cell division per effective genome
RATES of spontaneous mutation per replication per measured target vary by many orders of magnitude depending on the mutational target size (from 1 to >10^10 b, where b stands for base or base pair as appropriate),
the average mutability per b (from 10^-4 to 10^-11 per b per replication),
and the specific mutability of a particular b (which can vary by >10^4-fold).
The only things described in terms of base pairs (italicized) are genome sizes and mutation lengths, not mutation rates or number of mutations.
Exceptions, right there.
Keep your eyes on that while I trot out an analogy.
Let's say the accident rate in KL is 2 x 10^-8 per car per day.
Now, if there are ten million cars in KL on any given day, and there are 365 days in a year, can I then say that only 73 cars will be affected by accidents every year? That's not true. What I know is that 73 accidents will happen every year:
(2 x 10^-8 / car / day) x (10 million cars) x (365 days)
= 73
The resultant "73" is unitless: it counts accidents, not cars.
In the same way, their calculation is:
No. of mutations =
(10^-8 / base pair / generation) x (6.4 x 10^9 base pairs) x (1 generation)
= 64.
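Here are both calculations side by side as a small sketch. The helper name expected_events is mine, and I'm using exact fractions only to avoid floating-point fuzz; the rates and sizes are the ones given above.

```python
from fractions import Fraction

def expected_events(rate_per_unit_per_period, units, periods):
    """Expected event count = rate x number of units x number of periods."""
    return rate_per_unit_per_period * units * periods

# Car analogy: 2 x 10^-8 accidents per car per day, 10 million cars, 365 days.
accidents = expected_events(Fraction(2, 10**8), 10_000_000, 365)

# Mutation case: 10^-8 mutations per base pair per generation,
# 6.4 x 10^9 base pairs, 1 generation.
mutations = expected_events(Fraction(1, 10**8), 64 * 10**8, 1)

print(f"Accidents per year: {accidents}")
print(f"Mutations per generation: {mutations}")
```

In both cases the "per car" or "per base pair" cancels against the number of cars or base pairs, so what comes out is a count of events. It says nothing about how many cars were in each accident or how many base pairs each mutation touched.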