Lossless compression is proof that we live in the Matrix

the Black Monarch
Joined: Tue Jul 09, 2002 1:29 am
Location: The Stellar Converter on Meklon IV

Post by the Black Monarch » Fri Mar 05, 2004 8:13 pm

Either I've been drinking WAY too many slurpees lately (most likely), or we all live in the Matrix. Here's why.

Let's say you have one byte (8 bits) of data that you want to compress. There are 256 different possible combinations of ones and zeroes that this byte could be. If you add up the number of possible combinations of every smaller size put together (7 bits + 6 bits + 5 bits, and so on down to 1 bit), you'll find that there are only 254 combinations.

If you have a file that is exactly one gigabyte in size, there are 2^8589934592 different possible combinations for that file, yet the number of all smaller combinations put together is two less than 2^8589934592.

Thus, even in the best case, if you try to compress files losslessly, half of the possible files could only shrink by 1 bit; a quarter of them by 2 bits; an eighth by 3 bits; and so on, until you're left with two combinations that cannot be compressed at all, because every possible smaller combination has already been taken.
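
If you don't believe me, here's that count checked in a quick Python sketch (illustration only):

Code:
# Pigeonhole check: an n-bit file has 2**n possible contents,
# but all strictly shorter bitstrings (1 to n-1 bits) total 2**n - 2.
n = 8
longer = 2 ** n                             # 256 possible bytes
shorter = sum(2 ** k for k in range(1, n))  # 2 + 4 + ... + 128 = 254
print(longer, shorter)                      # 256 254
assert shorter == longer - 2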

So how the hell can you zip something to 3/4 of its original size reliably?

And why don't we have holes in our skulls where our ears are? And how are babies born with digestive bacteria in their gut?

:paranoid:
Ask me about my secret stash of videos that can't be found anywhere anymore.

trythil
is
Joined: Tue Jul 23, 2002 5:54 am
Status: N͋̀͒̆ͣ͋ͤ̍ͮ͌ͭ̔̊͒ͧ̿

Re: Lossless compression is proof that we live in the Matrix

Post by trythil » Fri Mar 05, 2004 8:40 pm

the Black Monarch wrote: So how the hell can you zip something to 3/4 of its original size reliably?
You're not considering how we can take advantage of redundancy in data.

Two examples:
Huffman encoding algorithm, from NIST
Shannon-Fano encoding algorithm, from the ICS department at UC Irvine

The gist of the Huffman algorithm is that frequently occurring sequences in a file receive a short code; rarely occurring sequences receive longer codes. It's obviously much more complicated in practice: for example, there's the problem of code collision -- say you have one code that's 1001, and another code that has 1001 as a proper prefix.
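
Here's a minimal sketch of the tree-building idea in Python -- not the NIST reference, just an illustration (it skips the single-symbol edge case). Because every symbol ends up at a leaf of the tree, no code is a proper prefix of another, which is exactly how the collision problem gets avoided:

Code:
# Minimal Huffman sketch: frequent symbols get short codes, rare ones long.
import heapq
from collections import Counter

def huffman_codes(data):
    # One weighted leaf per distinct symbol; the index breaks ties.
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing 0/1 onto their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, i, merged))
        i += 1
    return heap[0][2]

codes = huffman_codes("abracadabra")
bits = "".join(codes[s] for s in "abracadabra")
print(codes, len(bits), "bits instead of", 8 * len("abracadabra"))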

the Black Monarch
Joined: Tue Jul 09, 2002 1:29 am
Location: The Stellar Converter on Meklon IV

Post by the Black Monarch » Fri Mar 05, 2004 9:03 pm

That sounds a lot like one of those ideas that's cool in theory but doesn't work so well in practice, like trying to lift yourself off the floor by tugging really hard on your shoelaces. But then again, I have absolutely no idea what I'm talking about, so you're probably right. I just don't see how it's mathematically possible.
Ask me about my secret stash of videos that can't be found anywhere anymore.

trythil
is
Joined: Tue Jul 23, 2002 5:54 am
Status: N͋̀͒̆ͣ͋ͤ̍ͮ͌ͭ̔̊͒ͧ̿

Post by trythil » Fri Mar 05, 2004 9:16 pm

the Black Monarch wrote: That sounds a lot like one of those ideas that's cool in theory but doesn't work so well in practice, like trying to lift yourself off the floor by tugging really hard on your shoelaces. But then again, I have absolutely no idea what I'm talking about, so you're probably right. I just don't see how it's mathematically possible.
It works. Really.

Tons of Web sites use the Lempel-Ziv algorithm to reduce the size of transmitted data (mod_gzip: freshmeat, sourceforge). animemusicvideos.org probably does that too, though I can't tell for sure without looking at the HTTP headers, which I'm too lazy to do at the moment.
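
You can watch it work from Python's standard library, too -- zlib implements DEFLATE, the same LZ77-plus-Huffman combination that gzip and mod_gzip use (the sample page string here is made up, obviously):

Code:
# Redundant input (like markup-heavy HTML) shrinks dramatically.
import zlib
page = b"<tr><td>row</td></tr>" * 500      # 10,500 bytes of repetition
packed = zlib.compress(page)
print(len(page), "->", len(packed))        # prints something like 10500 -> 73
assert zlib.decompress(packed) == page     # lossless: exact round trip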


For a deeper treatment, there's this text: http://www.ics.uci.edu/~dan/pubs/DC-Sec1.html

The text assumes some knowledge of discrete mathematics -- in particular, languages and strings -- but if you have that, it's quite readable. It also goes further into the implementation and theory behind the Huffman, Shannon-Fano, arithmetic, and Lempel-Ziv schemes.

klinky
Joined: Mon Jul 23, 2001 12:23 am
Location: Cookie College...

Post by klinky » Fri Mar 05, 2004 9:48 pm

You could think of it like writing:

"OOOOOOOO" as "8 Os"

;D Look at that 50% compression!!

klinky
Joined: Mon Jul 23, 2001 12:23 am
Location: Cookie College...

Post by klinky » Fri Mar 05, 2004 9:52 pm

phpBB is being weird and not liking Os at all... so let's try Xs instead :/

write "XXXXXXXX" as "8 Xs"

Scintilla
(for EXTREME)
Joined: Mon Mar 31, 2003 8:47 pm
Status: Quo
Location: New Jersey

Post by Scintilla » Fri Mar 05, 2004 11:42 pm

klinky wrote: phpBB is being weird and not liking Os at all...
No, it's just this forum... as I think somebody mentioned in another thread just today.

Sir_Lagsalot
Joined: Mon Jun 23, 2003 6:42 pm

Post by Sir_Lagsalot » Sun Mar 07, 2004 10:43 pm

You can compress something to 3/4 of its size reliably because meaningful files typically have lots of redundancy, i.e. byte sequences that repeat, bytes that appear more often than others, etc. The more random a file is, the worse it compresses (try compressing a zip file), and it is mathematically impossible to compress truly random data. Also, any lossless compression program must invariably expand some files.
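
Both halves of that are easy to watch happen with Python's zlib (a sketch, not a benchmark):

Code:
# Redundant data shrinks; random data is incompressible and even grows
# a little (container overhead) -- some inputs MUST map to longer outputs.
import os, zlib
redundant = b"the same phrase over and over " * 300
random_data = os.urandom(len(redundant))   # 9,000 max-entropy bytes
for name, data in [("redundant", redundant), ("random", random_data)]:
    print(name, len(data), "->", len(zlib.compress(data, 9)))
# typical result: redundant 9000 -> ~70, random 9000 -> ~9010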

the Black Monarch
Joined: Tue Jul 09, 2002 1:29 am
Location: The Stellar Converter on Meklon IV

Post by the Black Monarch » Tue Mar 09, 2004 6:06 am

Aha! That is exactly the answer that I needed! Everything makes sense now...
Ask me about my secret stash of videos that can't be found anywhere anymore.
