Now, returning to the most salient point, namely what genomes and proteomes we should expect if naturalistic processes were in fact responsible for their origins:
The genetic code allows making some theoretical predictions about average protein size and frequency distribution [1,2,3]. Since stop codons can appear stochastically after any start codon, then larger proteins should always be less frequent than smaller proteins. The most frequent protein sizes should be 1 aa in length [4].[5]
But this is not at all consistent with empirical data. Furthermore, calculations can be made of the expected amount of junk DNA within cells, IF naturalistic processes were the only ones involved in their genesis.
Where is the mechanism within cells to clean up the 99.9999% of junk that would naturally accompany a random search for necessarily lengthy genes? Do the math on ONE gene of 2500 codons in length. How much junk would occur under random naturalistic chemical processes? What amount of molecular resources do you think was really present in the hypothetical primordial ooze? It is irresponsible to assume unlimited resources.
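To make "do the math" concrete, here is a rough sketch of the stop-codon arithmetic behind the quoted prediction. It assumes, purely for illustration, that all 64 codons are equally likely and independent and that 3 of them are stop codons (the standard genetic code). The function names are my own, not taken from any of the cited papers. It only counts how long a reading frame runs before a stop codon appears, nothing more.

```python
import math

# Assumption (not from the cited papers): codons are drawn uniformly and
# independently, and 3 of the 64 codons are stop codons.
P_STOP = 3 / 64


def length_probability(n_aa: int) -> float:
    """Probability that a random reading frame codes for exactly n_aa
    residues (start codon included) before the first stop codon.
    This is a geometric distribution with success probability P_STOP."""
    if n_aa < 1:
        raise ValueError("n_aa must be >= 1")
    return (1 - P_STOP) ** (n_aa - 1) * P_STOP


def log10_survival(n_codons: int) -> float:
    """log10 of the probability that n_codons consecutive random codons
    contain no stop codon at all."""
    return n_codons * math.log10(1 - P_STOP)


# Mean length of a geometric distribution is 1 / P_STOP.
MEAN_LENGTH = 1 / P_STOP

if __name__ == "__main__":
    print("P(length = 1 aa):  ", length_probability(1))
    print("P(length = 100 aa):", length_probability(100))
    print("mean length (aa):  ", MEAN_LENGTH)
    print("log10 P(no stop in 2500 codons):", log10_survival(2500))
```

Under these assumptions the probabilities fall off with every added codon, so a length of 1 aa is the most likely outcome, which is the point of the quoted passage. Taking the reciprocal of the 2500-codon survival probability gives a rough count of how many random frames of that length would be needed, on average, to get one with no stop codon.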
1. Zhang JZ: Protein-length distributions for the three domains of life. Trends Genet 2000, 16(3):107-109.
2. Brocchieri L, Karlin S: Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res 2005, 33(10):3390-3400.
3. Jukes TH, Holmquist R, Moise H: Average proteins and genetic code. Science 1976, 194(4265):642-643.
4. Oliver JL, Marin A: A relationship between GC content and coding sequence length. J Mol Evol 1996, 43(3):216-223.
5. Tiessen et al.: Mathematical modeling and comparison of protein size distribution in different plant, animal, fungal and microbial species reveals a negative correlation between protein size and protein number, thus providing insight into the evolution of proteomes. BMC Research Notes 2012 5:85.