Hungry? Well, it's time for a feast.
You might want to make some sandwiches and prepare a flask of tea or something, because this may take some time.
In order to explain how the quantiser matrices work, I need to explain how MPEG (and JPEG) images are compressed. The mathematical stages of compression are:
-Discrete Cosine Transform
-Quantisation
-Run-length encoding
-Binary encoding (Huffman, arithmetic)
The first two are the ones that are important for this discussion.
<b>Discrete Cosine Transform:</b>
The image is broken up into 8x8 blocks of pixels (four of these blocks will later be grouped into one macroblock, which you've probably heard of).
In RGB, the Red, Green and Blue components are analysed individually and combined later, so imagine we are just looking at one component.
Now, this block of pixels can be described mathematically with an infinite number of cosine waves, similar to the Fourier series (if you know anything about that). However, you can get a pretty good approximation of the image using just a fixed number of cosine waves (in this case 64). So, each block can be recreated by combining a set of cosine waves (referred to from now on as spatial frequencies) with their measured intensities.
Each spatial frequency under the inverse transform looks as follows:

OK, all you have to know is that each block is built from these spatial frequencies, each with its own intensity value.
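(If you want to play with this yourself, here's a tiny Python/numpy sketch - my own illustration, not taken from any actual codec - that generates those 64 spatial-frequency patterns so you can plot them and compare with the picture above:)

import numpy as np

def dct_basis(u, v, size=8):
    # The 8x8 spatial-frequency pattern for DCT component (u, v):
    # one cosine along each axis; their outer product gives the 2-D pattern
    x = np.arange(size)
    col = np.cos((2 * x + 1) * u * np.pi / (2 * size))
    row = np.cos((2 * x + 1) * v * np.pi / (2 * size))
    return np.outer(col, row)

# basis[0][0] is the flat "DC" block (top-left picture),
# basis[7][7] is the finest checkerboard (bottom-right picture)
basis = [[dct_basis(u, v) for v in range(8)] for u in range(8)]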
Now have a look at these tables:

a) is the intensity values of the channel (from 0 to 255)
b) is the intensity of each spatial frequency, i.e. each Discrete Cosine Transform component. These numbers correspond to the pictures I've just shown you above - grid position (0,0) in the DCT table relates to grid position (0,0) in the pictures.
You will notice that the top-left value in b) is the largest. This is simple: the intensity values of the original image are all very similar, so the majority of the image can be represented by the first spatial frequency (the top-left one) from the pictures I posted above - i.e. most of the image can be represented by flat colour.
To explain some of the other values: if you look at the number in grid (0,1) in b) and compare that to (0,1) in the frequency pictures, the value -22.6 is telling you that part of the image can be represented by a negative version of that block. If you look at the original, you can see that the image gets more intense towards the bottom, which is the opposite of the spatial frequency in (0,1), hence the negative value -22.6.
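To make the step from a) to b) concrete, here's a rough Python sketch of the forward DCT using the standard textbook formula (my own code with made-up names - real encoders use much faster algorithms, and JPEG-style codecs subtract 128 first so the 0-255 values are centred around zero):

import numpy as np

def dct2d(block):
    # Forward 8x8 DCT: turns the pixel values of a) into the
    # spatial-frequency intensities (DCT coefficients) of b)
    N = 8
    shifted = block.astype(float) - 128  # centre the 0-255 values around zero
    coeffs = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (shifted[x, y]
                              * np.cos((2 * x + 1) * u * np.pi / (2 * N))
                              * np.cos((2 * y + 1) * v * np.pi / (2 * N)))
            coeffs[u, v] = 0.25 * cu * cv * total
    return coeffs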
Do you follow so far?
OK, now we move on to the interesting part.
<b>Quantisation:</b>
The goal of quantisation is compression... and the way this is achieved is by expressing things as simply as possible.
The human visual system is generally not very sensitive to the components at high spatial frequencies, so the goal is to describe the image mostly with the low spatial frequencies (the first blocks in the picture grid) and throw away as much of the high-frequency detail as possible.
Now look at the table and I'll talk you through it:

In a) we have the original block
in b) we have the spatial frequencies and their intensities (DCT coefficients)
and c) is the quantisation table
The quantisation tables are stored in the JPEG or MPEG file for the decoder to use, and they are the tool we use for simplifying the data.
What happens is that each value in b) is divided by the corresponding quantisation value from c) and then ROUNDED to the nearest whole number to produce table d).
Now, if we reverse this by multiplying everything in d) by the corresponding value in c), we get table e).
Table e) is the compressed version of our DCT coefficients (table b)), which we can feed through the inverse DCT to recreate the image in f).
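In code, the whole b) -> d) -> e) round trip is only a couple of lines (again a rough sketch of my own, assuming the tables are held in numpy arrays):

import numpy as np

def quantise(coeffs, qtable):
    # b) -> d): divide each DCT coefficient by its quantisation value and round
    return np.round(coeffs / qtable).astype(int)

def dequantise(quantised, qtable):
    # d) -> e): multiply back to get the lossy coefficients the decoder will see
    return quantised * qtable

# d = quantise(b, c)      # many of the values come out as 0
# e = dequantise(d, c)    # the approximation of b) that gets inverse-DCT'd into f)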
That's how the main compression works, but you are probably wondering why those particular values are used in the quantisation matrix. Well, it was your question ^_^
OK, well if you look at table c) you will see that the high numbers are in the bottom-right... the reason is that it is usually good to get rid of as many high spatial frequencies as possible to get the best compression. If you use a high quantisation value for those frequencies, the rounding effect is more severe and they mostly get rounded away. This is clearly evident in table d), where most of the DCT coefficients towards the bottom-right of the block become 0s.
When you choose really high quantisation values, you can end up with something like the example below, which shows you exactly how the spatial frequencies are working. This is an image of my friend Tom:

This is the image after very high compression (mainly using high values in the quantisation table):

Here you can actually see single spatial frequencies being used instead of a combination as you would get with normal compression. Cool huh?
Quantisation tables were chosen (particularly in the case of JPEG) mainly by people deciding which looked nicest on natural images. The quantisation table used above is quite typical.
Anime and Computer Graphics, however, have very sharp edges.
Sharp edges can be problematic because, in order to represent them, you need the high spatial frequencies just as much as the low ones. To put it simply, cosine waves aren't very sharp, so you have to add lots of high-frequency cosine waves together to build up a 'sharp' transition.
So, the TMPGEnc quantisation matrices for animation/CG are all 16s for the keyframes and all 32s for the delta frames, in order to treat all the possible spatial frequencies equally.
So that's why you have the different quantisation tables in TMPGenc: they let you maximise quality by biasing towards the important spatial frequencies, compressing them with low quantisation values while giving high values to the frequencies that matter less. Since in anime all the spatial frequencies matter roughly equally, the value is kept the same across the whole table - 16 for high quality frames and 32 for lower quality frames.
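As a rough illustration of the difference (a toy example of my own, not TMPGEnc's actual code or tables, apart from the flat 16s it describes): quantise a block containing a hard vertical edge once with a made-up "natural image" style table that grows towards the bottom-right, and once with the flat table of 16s, and compare how accurately the edge's coefficients survive.

import numpy as np
from scipy.fft import dctn   # same 8x8 DCT as the hand-written sketch further up

# A made-up stand-in for a "natural image" table: small top-left, big bottom-right
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
natural_q = 16 + 10 * (u + v)

flat_q = np.full((8, 8), 16)   # the anime/CG keyframe matrix (32s for delta frames)

# Hard vertical edge, typical of line art: left half dark, right half bright
edge = np.zeros((8, 8))
edge[:, 4:] = 255.0

coeffs = dctn(edge - 128, norm="ortho")   # DCT coefficients of the edge block

for name, q in (("natural-style", natural_q), ("flat 16s", flat_q)):
    rebuilt = np.round(coeffs / q) * q    # quantise then dequantise
    err = np.abs(coeffs - rebuilt).max()
    print(name, "worst coefficient error:", round(err, 1))
# The flat table reproduces the edge's high-frequency coefficients more accurately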
There you go... dead easy.