Molecular biologist Zachary Williams of Tufts University says: “There are around 30 partial or full-length human-specific ERVs, and somewhere between 100 and 200 human-specific solo LTRs (ERVs where the flanking LTRs have recombined, excising the coding sequence in between). Almost all of these are in the HERV-K (HML-2) clade, so they are very similar in size and sequence; a solo LTR is about 1 kilobase long, and a full-length HML-2 provirus is about 10 kb. 30 × 10 kb + 200 × 1 kb = 500 kb, so there’s a max of about 500,000 base pairs of human-specific ERV sequence; the vast, vast majority of the rest is also found in chimps.
~8% of the human genome is composed of ERVs (and their relatives the LTR retrotransposons), which would be 240 million bp (8% of 3 billion bp). 500,000 bp is 0.2% of that, so about 99.8% of human ERVs are also found in chimps. We can round down to 99% if you want to be super conservative.
I don’t know how many individual ERVs are in the human genome, and I’m not sure it’s possible to accurately determine. It’s highly dependent on what counts as an individual ERV. Do fragments that originated from the same provirus count as separate fragments? Can you reliably determine which fragments belong together? These are non-trivial problems to overcome, and the answer itself is not particularly useful information.
We can guesstimate a rough minimum by assuming all ERVs are full length (~10kb) and dividing the total amount of ERV sequence (~240,000kb) by that, to get 24,000 ERVs. The vast majority of ERVs are much smaller than 10 kb, so the real number has to be significantly larger.”
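The arithmetic in the quote can be checked with a short Python sketch. It uses only the round figures quoted above (30 proviruses, the upper end of the 100–200 solo LTR range, ~10 kb and ~1 kb lengths, 8% of a 3 billion bp genome); the constant and function names are my own and are only illustrative.

```python
# Round figures taken from the quote; all names here are illustrative.
FULL_LENGTH_ERVS = 30        # human-specific partial or full-length ERVs
SOLO_LTRS = 200              # upper end of the quoted 100-200 range
PROVIRUS_BP = 10_000         # ~10 kb per full-length HML-2 provirus
SOLO_LTR_BP = 1_000          # ~1 kb per solo LTR
GENOME_BP = 3_000_000_000    # ~3 billion bp in the human genome
ERV_PERCENT = 8              # ~8% of the genome is ERV-derived


def human_specific_bp() -> int:
    """Upper bound on human-specific ERV sequence, in base pairs."""
    return FULL_LENGTH_ERVS * PROVIRUS_BP + SOLO_LTRS * SOLO_LTR_BP


def total_erv_bp() -> int:
    """Total ERV-derived sequence, in base pairs (integer arithmetic)."""
    return GENOME_BP * ERV_PERCENT // 100


def shared_fraction() -> float:
    """Fraction of ERV sequence that is not human-specific."""
    return 1 - human_specific_bp() / total_erv_bp()


def min_erv_count() -> int:
    """Rough minimum count if every ERV were a full-length provirus."""
    return total_erv_bp() // PROVIRUS_BP


if __name__ == "__main__":
    print("human-specific bp:", human_specific_bp())   # 500000
    print("total ERV bp:", total_erv_bp())             # 240000000
    print("shared fraction:", shared_fraction())
    print("minimum ERV count:", min_erv_count())       # 24000
```

Because the sketch takes the upper end of the solo LTR range, the human-specific total is a ceiling, so the shared fraction it yields (just under 99.8%) is a floor under these assumptions.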
So what we really know is that many of the alleged ERVs are SIMILAR (not the same). In some locations there are only small sections, and in these sections there are some differences in base pairs and function; some are located in different sections of each species’ respective genomes; and some may not actually be ERVs at all (though many others can be stated to be actual ERVs). To obtain a rough estimate (again, not exact), this molecular biologist admits we must “guesstimate” (use a line of best guess) from among possibilities.
Lines of best guess were often used in lab determinations when I worked in biotech. A number of different outcomes or probabilities are plotted, and then a midline or median is assumed. We see the same thing in the dating of the Boxgrove bones: different methods and different sizes and qualities of samples yielded different results, and the final age settled on conveniently (ahem!) fell in the range they presupposed to find, and this is what is taught in textbooks as the truth.