Quantisation and Quantisation Matrices: Redux

AbsoluteDestiny
Joined: Wed Aug 15, 2001 1:56 pm
Location: Oxford, UK

Post by AbsoluteDestiny » Sun Feb 06, 2005 7:14 am

I've been digging around in the old Phorum tables for my old posts and here's one I wrote wayyyyy back about what quantisation is. Have fun:



Hungry? Well, it's time for a feast.

You might want to make some sandwiches and prepare a flask of tea or something, because this may take some time.


In order to explain how the quantiser matrices work, I need to explain how MPEG (and JPEG) images are compressed. The mathematical stages of compression are:

-Discrete Cosine Transform
-Quantisation
-Run-length encoding
-Binary encoding (Huffman, arithmetic)

The first two are the ones that are important for this discussion.


<b>Discrete Cosine Transform:</b>

The image is broken up into 8x8 blocks of pixels (four of these blocks will later be stored together as one macroblock, which you've probably heard of).

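In RGB, the red, green and blue components will be analysed individually and combined later, so imagine we are just looking at one component. (In practice JPEG and MPEG usually work on luma and chroma planes rather than RGB, but the principle is the same.)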

Now, a continuous signal would in general need an infinite number of cosine waves to describe it, similar to the Fourier series (if you know anything about that). An 8x8 block only has 64 samples, though, so a pre-set number of cos waves (in this case 64) is enough to describe it. So, each block can be recreated by combining a variety of cos waves (referred to from now on as spatial frequencies), each with its measured intensity.

Each spatial frequency under the inverse transform looks as follows:

Image

OK, all you have to know is that each image block will be made from these spatial frequencies, each with an intensity value.
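To make that concrete, here is a minimal sketch in plain Python of the 8x8 transform. The helper names (`dct_matrix`, `dct2`, `basis_pattern`) and the sample block are made up for illustration, and the orthonormal DCT-II scaling is an assumption; real encoders use fast integer approximations. It measures an intensity for each of the 64 spatial frequencies and then rebuilds the block by adding up the 64 patterns, each multiplied by its intensity.

```python
import math

N = 8  # block size used by JPEG and MPEG


def dct_matrix(n=N):
    """Row k holds the k-th cosine wave of the orthonormal DCT-II."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """Forward 2-D DCT: 64 pixel values in, 64 intensities out."""
    return matmul(matmul(C, block), CT)


def basis_pattern(v, u):
    """The 8x8 picture for the spatial frequency at row v, column u."""
    return [[C[v][y] * C[u][x] for x in range(N)] for y in range(N)]


# A made-up block: a gentle diagonal gradient.
block = [[100 + 5 * y + 2 * x for x in range(N)] for y in range(N)]
coeffs = dct2(block)

# Rebuild the block as a weighted sum of the 64 patterns.
rebuilt = [[0.0] * N for _ in range(N)]
for v in range(N):
    for u in range(N):
        pattern = basis_pattern(v, u)
        for y in range(N):
            for x in range(N):
                rebuilt[y][x] += coeffs[v][u] * pattern[y][x]

max_error = max(abs(rebuilt[y][x] - block[y][x])
                for y in range(N) for x in range(N))
print(f"largest round-trip error: {max_error:.2e}")
```

Before any quantisation the round trip only loses floating-point noise; the actual loss comes from the quantisation step.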


Now have a look at these tables:

Image

a) is the intensity values of the channel (from 0 to 255)

b) is the intensities of each spatial frequency, i.e. the Discrete Cosine Transform (DCT) coefficients. These numbers correspond to the pictures I've just shown you above: grid (0,0) in the DCT table relates to grid (0,0) in the pictures.

You will notice that the top-left value in b) is the highest. This is simple: the intensity values of the original image are all very similar, and because of this the majority of the image can be represented by the first spatial frequency (the top-left one) from the pictures I posted above, i.e. most of the image can be represented by flat colour.

To explain some of the other values: if you look at the number in grid (0,1) in b) and compare that to (0,1) in the frequency pictures, you will see that "-22.6" is telling you that there is a part of the image which can be represented by a negative version of that pattern. If you look at the original you can see that the image gets more intense at the bottom, which is the opposite of the spatial frequency in (0,1), hence the number -22.6.
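Both observations can be checked with a small sketch. The two test blocks are invented, the DCT uses orthonormal scaling (an assumption), and `coeffs[1][0]` (row 1, column 0) is my reading of what the post calls grid (0,1). A perfectly flat block should put everything into the top-left coefficient, and a block that gets brighter towards the bottom should produce a negative value for the first vertical frequency.

```python
import math

N = 8


def dct_matrix(n=N):
    """Row k holds the k-th cosine wave of the orthonormal DCT-II."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix()
CT = [list(row) for row in zip(*C)]


def dct2(block):
    return matmul(matmul(C, block), CT)


# Flat colour: every pixel is 100.
flat = [[100 for _ in range(N)] for _ in range(N)]
flat_coeffs = dct2(flat)
dc = flat_coeffs[0][0]
largest_other = max(abs(flat_coeffs[v][u])
                    for v in range(N) for u in range(N) if (v, u) != (0, 0))

# Brighter towards the bottom: intensity grows with the row index y.
brighter_at_bottom = [[100 + 10 * y for _ in range(N)] for y in range(N)]
grad_coeffs = dct2(brighter_at_bottom)
first_vertical = grad_coeffs[1][0]    # row 1, column 0
first_horizontal = grad_coeffs[0][1]  # row 0, column 1

print(f"flat block: DC = {dc:.1f}, largest other coefficient = {largest_other:.1e}")
print(f"gradient block: first vertical = {first_vertical:.1f}, "
      f"first horizontal = {first_horizontal:.1e}")
```

With this scaling the top-left coefficient of a flat block is 8 times the pixel value, and the sign of the vertical coefficient flips if the gradient is turned upside down.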

Do you follow so far?

OK, now we move on to the interesting part.


<b>Quantisation:</b>

The goal of quantisation is compression... and the way this is achieved is by expressing things as simply as possible.

It is known that the human visual system is generally not particularly sensitive to the components in the high spatial frequencies, so the goal is to rely on the low spatial frequencies (the first blocks in the picture grid) as much as possible.

Now look at the table and I'll talk you through it:

Image


In a) we have the original block,
in b) we have the spatial frequencies and their intensities (DCT coefficients),
and c) is the quantisation table.

The quantisation tables will be stored in the JPEG or MPEG file for the decoder to use, and they are the tool we use for simplifying the data.

What happens is that each value in b) is divided by the corresponding quantisation value from c) and then ROUNDED to a whole number (baseline JPEG rounds to the nearest integer) to produce table d)

Now, if we reverse this by multiplying everything in d) by the corresponding number in c) we get table e)

Table e) is the approximation of our DCT coefficients (table b) that survives compression, which we can feed to the inverse DCT to recreate the image in f)
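The a) to f) pipeline can be sketched in a few lines. The pictures from the post are not available here, so the pixel block below is invented and the matrix is the example luminance table from Annex K of the JPEG standard, which may or may not be the table c) shown in the figure. Rounding to the nearest integer and subtracting 128 before the transform are also assumptions, chosen to follow baseline JPEG.

```python
import math

N = 8

# Example luminance quantisation table from Annex K of the JPEG standard.
JPEG_LUMA_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_matrix(n=N):
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix()
CT = [list(row) for row in zip(*C)]


def dct2(block):
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    return matmul(matmul(CT, coeffs), C)


def quantise(coeffs, table):
    """b) -> d): divide by the table entry and round to the nearest integer."""
    return [[round(coeffs[v][u] / table[v][u]) for u in range(N)] for v in range(N)]


def dequantise(levels, table):
    """d) -> e): multiply back by the table entry."""
    return [[levels[v][u] * table[v][u] for u in range(N)] for v in range(N)]


# a) an invented block: a gradient with a faint one-pixel texture on top
a = [[90 + 12 * y + 3 * x + (7 if (x + y) % 2 else 0) for x in range(N)]
     for y in range(N)]
b = dct2([[p - 128 for p in row] for row in a])   # level shift, then DCT
d = quantise(b, JPEG_LUMA_TABLE)
e = dequantise(d, JPEG_LUMA_TABLE)
f = [[value + 128 for value in row] for row in idct2(e)]

zeros = sum(level == 0 for row in d for level in row)
worst_pixel_error = max(abs(f[y][x] - a[y][x]) for y in range(N) for x in range(N))
print(f"{zeros} of 64 quantised coefficients are zero")
print(f"worst pixel error after the round trip: {worst_pixel_error:.2f}")
```

Each value in e) can differ from b) by at most half the corresponding table entry, which is why the large entries at the bottom-right throw away the most detail.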

That's how the main compression works but you are probably wondering why those values are being used in the Quantisation Matrix. Well, it was your question ^_^

OK, well if you look at table c) you will see that high numbers have been given at the bottom-right... the reason for this is that it is usually good to get rid of as many high spatial frequencies as possible to get the best compression. So if you have a high quantisation value for those frequencies, the rounding effect will be more severe and they won't be used as much. This is clearly evident as most of the DCT coefficients become 0s at the bottom right of the block.

When you choose really high values, what you find is that you can get something like this example below, which will show you exactly how the spatial frequencies are working. This is an image of my friend Tom:

Image
This is the image after very high compression (mainly using high values in the quantisation table):

Image

Here you can actually see single spatial frequencies being used instead of a combination as you would get with normal compression. Cool huh?

Quantisation tables were chosen (for JPEG in particular) mainly by people deciding which looked the nicest with natural images. The quantisation table used above is quite typical.

Anime and Computer Graphics, however, have very sharp edges.

Sharp edges can be problematic because, in order to represent those, you need to use the high spatial frequencies as much as the low ones. To simplify, cos waves aren't very sharp so you need to combine high frequency cos waves together to create a 'sharp' wave.
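A quick way to see this is to compare a hard edge with a smooth ramp along a single row of 8 pixels. This is only a sketch: the two signals are made up, the 1-D orthonormal DCT-II is an assumed scaling, and calling the upper four frequencies "high" is an arbitrary cut-off.

```python
import math


def dct1(signal):
    """1-D orthonormal DCT-II of a short list of samples."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(s * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                               for x, s in enumerate(signal)))
    return out


def high_frequency_energy(signal, first_high=4):
    """Sum of squared coefficients from index first_high upwards."""
    return sum(c * c for c in dct1(signal)[first_high:])


edge = [0, 0, 0, 0, 255, 255, 255, 255]   # hard black-to-white edge
ramp = [32 * x for x in range(8)]         # smooth gradient from 0 to 224

edge_energy = high_frequency_energy(edge)
ramp_energy = high_frequency_energy(ramp)
print("edge coefficients:", [round(c, 1) for c in dct1(edge)])
print("ramp coefficients:", [round(c, 1) for c in dct1(ramp)])
print(f"high-frequency energy: edge = {edge_energy:.0f}, ramp = {ramp_energy:.0f}")
```

The edge keeps sizeable coefficients all the way up to the highest frequency, while the ramp's coefficients fall away quickly after the first one.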

So, the TMPGEnc quantisation matrices for animation/CG are all 16s for the keyframes and all 32s for the delta frames in order to treat the quantisation equally for all the possible spatial frequencies.

So that's why you have the different quantisation tables in TMPGEnc: it's so you can maximise the quality by having a bias towards the important spatial frequencies, compressing them with low quantisation values and using high values for the frequencies that aren't as important. As all the spatial frequencies are equally important in anime, the value is kept the same: 16 for high quality frames and 32 for lower quality frames.
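As a rough illustration of that trade-off, the sketch below quantises one hard-edged block twice: once with a flat matrix of 16s (the keyframe value mentioned above) and once with the example luminance table from Annex K of the JPEG standard, which stands in for a "natural image" matrix. TMPGEnc's own default matrix is not reproduced here, the block is invented, and rounding to the nearest integer is an assumption.

```python
import math

N = 8

FLAT_16 = [[16] * N for _ in range(N)]

# Example luminance table from Annex K of the JPEG standard (a stand-in).
WEIGHTED = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def dct_matrix(n=N):
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * x + 1) * k * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


C = dct_matrix()
CT = [list(row) for row in zip(*C)]


def round_trip_rms(block, table):
    """DCT, quantise, dequantise, inverse DCT; return the RMS pixel error."""
    coeffs = matmul(matmul(C, block), CT)
    approx = [[round(coeffs[v][u] / table[v][u]) * table[v][u]
               for u in range(N)] for v in range(N)]
    rebuilt = matmul(matmul(CT, approx), C)
    squared = sum((rebuilt[y][x] - block[y][x]) ** 2
                  for y in range(N) for x in range(N))
    return math.sqrt(squared / (N * N))


# A hard vertical edge (left half black, right half white), level-shifted by 128.
edge_block = [[(0 if x < 4 else 255) - 128 for x in range(N)] for _ in range(N)]

flat_rms = round_trip_rms(edge_block, FLAT_16)
weighted_rms = round_trip_rms(edge_block, WEIGHTED)
print(f"RMS error with flat 16s:       {flat_rms:.2f}")
print(f"RMS error with weighted table: {weighted_rms:.2f}")
```

On this particular block the flat matrix should come out with the smaller error, because the edge needs its high-frequency coefficients and the weighted table quantises those more coarsely. A single block proves little on its own, and the flat matrix pays for it in bits on natural footage.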

There you go... dead easy.

Scintilla
(for EXTREME)
Joined: Mon Mar 31, 2003 8:47 pm
Status: Quo
Location: New Jersey

Post by Scintilla » Sun Feb 06, 2005 11:40 am

Ahhhh, that explains why DCTFilter with settings like (1,1,1,1,1,0.75,0.25,0) is so godly for compressibility purposes... but also softens edges ever-so-unnoticeably :D

Looking at the highest spatial frequency, though, it looks almost like a perfect checkerboard pattern (except on the edges), such as results from dot crawl... is it possible that a filter could be developed that could hunt down and zero out those highest frequencies in areas of temporal fluctuation in the luma plane?
I tried a simple DCTFilterD(1) (before resizing) on the Azumanga Daioh movie, but it's not helping.

Or is the dot crawl checkerboard not made up of single pixels?...

Good read, though! I'd been wondering about some of this stuff myself. Thanks. :)

AbsoluteDestiny
Joined: Wed Aug 15, 2001 1:56 pm
Location: Oxford, UK

Post by AbsoluteDestiny » Sun Feb 06, 2005 12:20 pm

It also explains why blocking in dark areas is so common in low bitrate encodes. A blackish image would be transformed (mostly picking the pattern in grid 0,0) and then quantised. The rounding of this data, however, may end up with different results on a block-to-block basis if there is a certain amount of noise. Also, if the bitrate is variable, the rounding could change frame by frame as well, leaving nasty results :)
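A toy calculation shows how small the noise needs to be. For a flat 8x8 block only the top-left coefficient is non-zero, and with an orthonormal DCT it equals 8 times the pixel value, so the whole round trip reduces to one division and one rounding. The step size of 16 and the two brightness levels are purely illustrative; real MPEG encoders treat this coefficient with their own rules.

```python
DC_STEP = 16  # illustrative quantiser step, not a value taken from any encoder


def reconstruct_flat(value, step=DC_STEP):
    """Brightness of a flat block after quantising its top-left coefficient."""
    dc = 8 * value             # top-left coefficient of a flat 8x8 block
    level = round(dc / step)   # quantise
    return level * step / 8    # dequantise and convert back to a pixel value


# Two neighbouring dark blocks whose average brightness differs only by noise.
left = reconstruct_flat(2.9)
right = reconstruct_flat(3.1)
print(f"input difference:  {3.1 - 2.9:.1f}")
print(f"output difference: {right - left:.1f}")
```

The two blocks fall on either side of a rounding boundary, so a difference of 0.2 going in becomes a visible step of 2 levels coming out.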

Jnzk
Artsy Bastid
Joined: Tue Jan 28, 2003 5:30 pm
Location: Finland

Post by Jnzk » Sun Feb 06, 2005 1:06 pm

Hmm. I think I understood about half of that. :P

Kalium
Sir Bugsalot
Joined: Fri Oct 03, 2003 11:17 pm
Location: Plymouth, Michigan

Post by Kalium » Sun Feb 06, 2005 2:28 pm

The scary thing is that I think I understand much of the mathematics behind that...

NeoQuixotic
Master Procrastinator
Joined: Tue May 01, 2001 7:30 pm
Status: Lurking in the Ether
Location: Minnesota

Post by NeoQuixotic » Wed Feb 09, 2005 5:16 pm

That was awesome! I barely understood that and I've even read an entire book about video compression and it still blows my mind. Very nice work though :P .

mckeed
Joined: Tue May 15, 2001 1:02 pm
Location: Troy, NY

Post by mckeed » Wed Feb 09, 2005 5:57 pm

Make this sticky? Or maybe into a mini-guide?
"People can not gain anything without putting forth any effort. That is the absolute Truth" - Dante, Full Metal Alchemist

AbsoluteDestiny
Joined: Wed Aug 15, 2001 1:56 pm
Location: Oxford, UK

Post by AbsoluteDestiny » Wed Feb 09, 2005 8:36 pm

mckeed wrote: Make this sticky? Or maybe into a mini-guide?
but... it's of no practical use :)

Kalium
Sir Bugsalot
Joined: Fri Oct 03, 2003 11:17 pm
Location: Plymouth, Michigan

Post by Kalium » Wed Feb 09, 2005 9:23 pm

AbsoluteDestiny wrote:
mckeed wrote: Make this sticky? Or maybe into a mini-guide?
but... it's of no practical use :)
It's of mathematical significance. This is a geekier area of a geeky forum. What does practicality have to do with it?

Plus, once my differential equations class actually gets to Fourier transforms, I won't have to hunt this down!

downwithpants
BIG PICTURE person
Joined: Tue Dec 03, 2002 1:28 am
Status: out of service
Location: storrs, ct

Post by downwithpants » Thu Feb 10, 2005 1:23 am

make a link to this thread from the avtech guide then? specifically where you go into quantizer settings in the xvid export page.
