Lossless compression is proof that we live in the Matrix
- the Black Monarch
- Joined: Tue Jul 09, 2002 1:29 am
- Location: The Stellar Converter on Meklon IV
Lossless compression is proof that we live in the Matrix
Either I've been drinking WAY too many slurpees lately (most likely), or we all live in the Matrix. Here's why.
Let's say you have one byte (8 bits) of data that you want to compress. There are 256 different possible combinations of ones and zeroes that this byte could be. If you add up the number of different possible combinations of all smaller increments of data put together (7 bits + 6 bits + 5 bits + etc.), you'll find that there are only 254 combinations.
If you have a file that is exactly one gigabyte in size, there are 2^8589934592 different possible combinations for that file, yet the number of all smaller combinations put together is two less than 2^8589934592.
Thus, if you try to compress a file losslessly, half of the time it would only compress by 1 bit; 1/4 of the time, it would only compress by 2 bits; 1/8 of the time, by 3 bits; etc. until you're left with two combinations that cannot be compressed at all because every possible smaller combination has been taken.
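For what it's worth, here's a quick Python sanity check of that count (just a throwaway sketch, nothing rigorous):

Code:
# Distinct n-bit strings: 2**n. Strictly shorter non-empty strings:
# 2 + 4 + ... + 2**(n-1) = 2**n - 2, i.e. always two fewer.
n = 8
shorter = sum(2**k for k in range(1, n))
print(2**n, shorter)  # prints: 256 254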
So how the hell can you zip something to 3/4 of its original size reliably?
And why don't we have holes in our skulls where our ears are? And how are babies born with digestive bacteria in their gut?
:paranoid:
Ask me about my secret stash of videos that can't be found anywhere anymore.
-
- is
- Joined: Tue Jul 23, 2002 5:54 am
- Status: N͋̀͒̆ͣ͋ͤ̍ͮ͌ͭ̔̊͒ͧ̿
- Location: N????????????????
Re: Lossless compression is proof that we live in the Matrix
the Black Monarch wrote: So how the hell can you zip something to 3/4 of its original size reliably?
You're not considering how we can take advantage of redundancy in data.
Two examples:
Huffman encoding algorithm, from NIST
Shannon-Fano encoding algorithm, from the ICS department at UC Irvine
The gist of the Huffman algorithm is that frequently-occurring sequences in a file receive a short code; rarely-occurring sequences receive longer codes. It's obviously much more complicated in practice: for example, there's the problem of code collision -- say one code is 1001 and another code has 1001 as a proper prefix, so a decoder reading 1001 can't tell whether to stop there or keep reading.
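If you want to see roughly how the codes get built, here's a toy sketch in Python using a heap (my own illustration, not the NIST or UCI reference code):

Code:
import heapq
from collections import Counter

def huffman_codes(data):
    # One heap entry per symbol: [frequency, tiebreaker, {symbol: code-so-far}].
    heap = [[freq, i, {sym: ""}]
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        # Merge the two least-frequent subtrees; their symbols pick up one
        # more leading bit (0 for the lighter side, 1 for the heavier).
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("this is an example of a huffman tree")
for sym, code in sorted(codes.items(), key=lambda x: (len(x[1]), x[0])):
    print(repr(sym), code)

Because every symbol ends up at a leaf of the tree that the merging builds, no code is ever a prefix of another, which is what keeps the collision problem above from biting.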
- the Black Monarch
- Joined: Tue Jul 09, 2002 1:29 am
- Location: The Stellar Converter on Meklon IV
That sounds a lot like one of those ideas that's cool in theory but doesn't work so well in practice, like trying to lift yourself off the floor by tugging really hard on your shoelaces. But then again, I have absolutely no idea what I'm talking about, so you're probably right. I just don't see how it's mathematically possible.
Ask me about my secret stash of videos that can't be found anywhere anymore.
-
- is
- Joined: Tue Jul 23, 2002 5:54 am
- Status: N͋̀͒̆ͣ͋ͤ̍ͮ͌ͭ̔̊͒ͧ̿
- Location: N????????????????
the Black Monarch wrote: That sounds a lot like one of those ideas that's cool in theory but doesn't work so well in practice, like trying to lift yourself off the floor by tugging really hard on your shoelaces. But then again, I have absolutely no idea what I'm talking about, so you're probably right. I just don't see how it's mathematically possible.
It works. Really.
Tons of Web sites use the Lempel-Ziv algorithm to reduce the size of transmitted data (mod_gzip: freshmeat, sourceforge). animemusicvideos.org probably does that too, though I can't tell for sure without looking at the HTTP headers, which I'm too lazy to do at the moment.
http://www.ics.uci.edu/~dan/pubs/DC-Sec1.html
The text assumes some knowledge of discrete mathematics -- in particular, languages and strings -- but if you have that, it's quite readable. It also goes further into the implementation and theory behind the Huffman, Shannon-Fano, arithmetic, and Lempel-Ziv schemes.
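And if you want to see the kind of effect mod_gzip gets on a typical page, here's a toy example with Python's zlib (which uses the same DEFLATE combination of LZ77 and Huffman coding that gzip does); the HTML here is made up:

Code:
import zlib

# A made-up, repetitive chunk of forum HTML standing in for a real page.
page = ("<tr><td class='row1'>yet another post body</td></tr>\n" * 500).encode()

squashed = zlib.compress(page, 9)
print(len(page), "bytes raw")
print(len(squashed), "bytes after DEFLATE (LZ77 + Huffman)")

The long runs of near-identical markup are exactly the kind of redundancy LZ77 feeds on, which is why HTML shrinks so well on the wire.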
- klinky
- Joined: Mon Jul 23, 2001 12:23 am
- Location: Cookie College...
- Contact:
- Scintilla
- (for EXTREME)
- Joined: Mon Mar 31, 2003 8:47 pm
- Status: Quo
- Location: New Jersey
- Contact:
- Sir_Lagsalot
- Joined: Mon Jun 23, 2003 6:42 pm
You can compress something to 3/4 of its size reliably because meaningful files typically have lots of redundancy, i.e. byte sequences that repeat, bytes that appear more often than others, etc. The more random a file is, the worse it compresses (try compressing a zip file), and it is mathematically impossible to compress truly random data. Also, any lossless compression program must invariably expand certain files -- which is where the counting argument from the first post comes back in.
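A quick way to see both halves of that claim, using Python's zlib (just an illustration):

Code:
import os
import zlib

redundant = b"all work and no play makes jack a dull boy " * 1000
random_bytes = os.urandom(len(redundant))  # effectively incompressible

for label, data in (("redundant", redundant), ("random", random_bytes)):
    out = zlib.compress(data, 9)
    print(label, len(data), "->", len(out), "bytes")

# The repetitive text collapses to a small fraction of its size, while the
# random bytes come out a little *larger* once zlib adds its own header and
# bookkeeping -- the expansion mentioned above.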
- the Black Monarch
- Joined: Tue Jul 09, 2002 1:29 am
- Location: The Stellar Converter on Meklon IV