Come on, Barbarian. You know what else works? The internal combustion engine. But just like Shannon's theorems, it has nothing to do with information. Shannon is a red herring.
And that has nothing to do with the common understanding of information. I believe Shannon himself later wished he had used a different word than "information," because it's an inaccurate descriptor.
I humbly disagree, as would most communications engineers. Shannon information, for example, is one of the key ideas behind how bandwidth gets allocated to different Internet transmissions - and it doesn't get more "informational" than that.
Shannon information says that the more choices available to the "source", the more information will be contained in the "signal". That's actually fairly self-evident. It explains, for example, why we use English (and other languages) to communicate, and not binary.
Let's say you're reading an English book. Treat it as a signal from a source: an author pulling out sequences of English letters that s/he wishes to communicate to you. You know beforehand that (disregarding spaces and punctuation) the contents of this signal will be a series of English letters, of which there are 26.
Okay, enough mathnerding. Start reading. The first letter is "A". Wow! Before you read that, there were 26 possible choices that letter could have been, each with its own probability. But reading it narrowed the possibilities down from 26 to 1. The second letter is "n". Again, that letter cuts 26 possibilities down to 1. And so on.
Now let's say that, for some obscure reason, you're reading binary code. Brace yourself. The first character ... is 1! But that's not too exciting. It was only ever going to be 0 or 1 anyhow. You haven't really learned very much, as opposed to when you found that the first letter of the book was "A". The second character is also 1. So is the third. The fourth is a 0, but the fifth is a 1. None of those symbols reveals much on its own.
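To put rough numbers on that - a minimal sketch in Python, assuming every symbol in the alphabet is equally likely (real English letter frequencies are skewed, so the true per-letter figure is a bit lower):

```python
import math

# Information gained per symbol under a uniform distribution:
# narrowing N equally likely choices down to 1 yields log2(N) bits.
for name, n_symbols in [("English letters", 26), ("binary digits", 2)]:
    bits = math.log2(n_symbols)
    print(f"{name}: {n_symbols} choices -> {bits:.2f} bits per symbol")

# English letters: 26 choices -> 4.70 bits per symbol
# binary digits: 2 choices -> 1.00 bits per symbol
```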
Why English, not binary? Because a larger symbol set allows far more permutations, so any one word carries a lot of information. If I wanted to construct a language in binary, I would be hard-pressed: there are only 510 possible binary "words" of eight "letters" or fewer. In contrast, how many "words" can you form out of combinations of eight or fewer English letters? About 217 billion.
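You can check that count yourself; a quick sketch:

```python
# Count the nonempty "words" of length 1 through 8 over each alphabet.
def word_count(alphabet_size: int, max_length: int = 8) -> int:
    return sum(alphabet_size ** k for k in range(1, max_length + 1))

print(word_count(2))   # 510
print(word_count(26))  # 217180147158 -- about 217 billion
```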
The larger the set of symbols available to the source (and the more evenly it uses them), the more information it can convey in a signal of a given length.
(Though yes, Shannon information is quite a red herring in the evolution discussion. But not for the reasons you make out.)
A frameshift does not necessarily produce randomness. As with written language, DNA has built-in redundancy. In this case, the frameshift occurred in a repetitive section.
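A toy illustration of the point - this is a hypothetical sequence, not the actual gene in question: in a repetitive stretch, a one-base deletion shifts the reading frame yet can leave the spelled-out codons unchanged.

```python
# Hypothetical sequence, not any real gene: a 1-base deletion shifts
# the reading frame, but in a repetitive stretch the shifted frame
# can still spell out the same codons.
def codons(seq: str) -> list[str]:
    usable = len(seq) - len(seq) % 3  # drop any trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]

repetitive = "AAAAAAAAAAAA"   # a homopolymer run of A's
shifted = repetitive[1:]      # delete the first base -> frameshift

print(codons(repetitive))  # ['AAA', 'AAA', 'AAA', 'AAA'] -- all lysine
print(codons(shifted))     # ['AAA', 'AAA', 'AAA'] -- still all lysine
```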
What mechanisms, exactly, exist in DNA that allow it to retain function in the event of a frameshift? I want to have a clearer answer before I respond to this claim.