This wasn't as mysterious as I thought it might be. After a few minutes of detective work, I have come up with the following.
For starters, the video is going in the right direction at 45MB. The quantizer tends to average around Q20 but varies from 16 to 30 (this is purely an estimate from watching the frame mean quantizer vary throughout the video). What that means in English is that the overall quality is very good, but bad in complex places. Also, some frames are encoded at QP16, which is considered a waste of bitrate (it's like going beyond Q2 in XviD). In an ideal world you would have a video at an average of Q18 +/-1: the average QP is 18, but it's allowed to vary between 17 and 19 in parts for efficiency (using higher quantizers where it can get away with it in order to improve other scenes).
The video itself is fast moving (so not a lot of temporal redundancy), and some of the scenes (a good example is at 01m 00s - 01m 03s) are incredibly hard to motion-compensate. Instead of coding mainly residual (the difference between the previous frame and the new frame; think of it as a sort of overlay), the encoder ends up coding almost completely new textures. Despite only lasting 3 seconds, this bloats the file: at 20-30KB per frame, those 3 seconds could end up taking 1-2MB.
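As a rough check on that 1-2MB figure, here is a small sketch of the arithmetic. The 23.976 fps frame rate is my assumption (typical for anime sources); it is not something measured from the video, and the function name is just for illustration.

```python
def burst_size_mb(seconds, fps, kb_per_frame):
    """Size in MB of a run of frames that each cost kb_per_frame kilobytes."""
    frames = seconds * fps
    return frames * kb_per_frame / 1024

# Assumed frame rate (not stated in the post): 24000/1001, i.e. about 23.976 fps
FPS = 24000 / 1001

low = burst_size_mb(3, FPS, 20)   # 3 seconds at 20KB per frame
high = burst_size_mb(3, FPS, 30)  # 3 seconds at 30KB per frame
print(f"{low:.1f} MB to {high:.1f} MB")
```

At that frame rate the 3 seconds are about 72 frames, so the section lands at roughly 1.4 to 2.1MB, which is a noticeable chunk of a 45MB file.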
The quantizer during this 3 second section flies all the way out to around Q24-30, which is ugly and gives it a sort of blurred/muddy effect (actually that's the in-loop deblocking working overtime; otherwise it would simply look blocky as hell). To stop the quantizer flying out so wide, you can do one of two things (or both, if multipass alone does not fix it):
1) Do more passes. For example, if you did a two pass encode and have this problem, try a three pass. Three pass is generally not required, but some people have found it beneficial with AMVs, where you have a lot of change in a short period of time.
2) Increase the qcomp value (Quantizer Compression in MeGUI). Default is 0.6. Setting it low (eg. 0.1) gives a more constant bitrate with quantizers that vary a lot (this looks terrible, like a 1 pass encode or old MPEG-1); setting it high (eg. 0.9) means it will keep the quantizer closer to the average for the whole video. This is good but not always desirable: with a high value (closer to a constant quantizer), complex scenes that could have got away with higher quantizers take more of the bits, so other scenes may suffer. I'd recommend trying 0.7 or 0.8 if a three pass still has problems (but carry on reading before you do).
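To show why qcomp pulls the quantizers together, here is a minimal sketch. As far as I know, x264's rate control scales the quantizer with complexity raised to the power (1 - qcomp) before fitting the result to the target size; the sketch keeps only that one step. The scene complexity numbers are made up for illustration.

```python
def relative_qscale(complexity, qcomp):
    """Simplified rate-control curve: qscale is proportional to complexity ** (1 - qcomp)."""
    return complexity ** (1.0 - qcomp)

def qscale_spread(complexities, qcomp):
    """Ratio between the largest and smallest quantizer scale across scenes."""
    q = [relative_qscale(c, qcomp) for c in complexities]
    return max(q) / min(q)

# Hypothetical per-scene complexity scores: calm, average, busy
scenes = [1.0, 4.0, 16.0]

for qcomp in (0.1, 0.6, 0.9):
    print(qcomp, round(qscale_spread(scenes, qcomp), 2))
```

The spread shrinks as qcomp rises, and at qcomp 1.0 every scene gets the same quantizer. The real encoder also applies other limits and adjustments, so treat this as a picture of the trend and not of the actual numbers.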
What I also gathered from watching the video was that there were no B-frames. Having no B-frames causes a large loss of efficiency. You need to enable them ASAP.
I then went on to poke at the file itself and found that the settings used were pretty much default. I downloaded MeGUI and confirmed that they were more or less default. The good news is that we can improve the encode a fair bit. Some notable things to enable/use/change are:
AVC Profiles
Set this to high as it allows you to use an extra option I will cover later.
ME Range (--merange)
Currently set at 16 (default), I would suggest using 32. This increases the search area so the codec is able to make more matches and save bitrate, rather than encode new textures.
ME Algorithm (--me)
Default is Hexagon; again, I suggest changing this to Multi Hex (known as UMH in the command line).
Subpixel refinement (--subme)
Default is 5; I really suggest 6 or 7. This can make a nice difference.
Keyframe interval (--keyint)
Default is 250; you may change this to something higher but it's unlikely to benefit this AMV much. Basically it determines the maximum number of frames before a keyframe is forced. A higher value is good for long scenes without a real scene change since you can minimise large I-frames, but it can also affect seeking.
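To get a feel for what the keyframe interval means for seeking, here is a small sketch that converts it into seconds. The 23.976 fps frame rate is my assumption (typical for anime sources), not something taken from the video.

```python
def max_keyframe_gap_seconds(keyint, fps):
    """Longest possible stretch of video between two keyframes, in seconds."""
    return keyint / fps

# Assumed frame rate (not stated in the post): 24000/1001, i.e. about 23.976 fps
FPS = 24000 / 1001

gap = max_keyframe_gap_seconds(250, FPS)
print(f"{gap:.1f} seconds")
```

So with the default, a player seeking to an arbitrary point may have to start decoding from a keyframe a little over 10 seconds earlier in the worst case.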
Trellis (--trellis)
Default off. You should enable this also. If you have a good CPU, then use --trellis 2 (or Always in MeGUI), else use --trellis 1 (Final MB).
Reference frames (--ref)
Default is 1 (which is less than XviD uses). I recommend between 5 and 8 reference frames, or more if you don't mind the encode time; however, the benefit diminishes after 8 frames. On one video Streicher encoded, it was 31MB (without audio) with 1 reference frame, and 27MB with 8 references. That's around a 13% saving, but that video had static parts that really benefited from references. They are still good space savers; just don't expect the same saving on your video (they will certainly help a lot).
Mixed Reference Frames (--mixed-refs)
If you are using multiple references, then you should almost always use this, as it increases flexibility by allowing macroblock partitions to choose their own references rather than a whole macroblock using the same reference.
No Fast P-Skip (--no-fast-pskip)
Enabling or using this option disables early skip detection. Early skip detection can cause blocking in solid colours or gradients, which is bad news for anime. Definitely enable this option.
Minimum Quantizer (--qpmin)
To prevent wasting bitrate, you can set this to the lowest quantizer you are willing for a P-frame to be encoded at. Good values are between 16-20, but setting it too high will have an adverse effect on the quality. I would suggest 18. This will ensure that P-frames do not get lower quantizers than this which means you aren't wasting bitrate on quality you can't notice.
Factor between P and B-frame quants (--pbratio)
Default of 1.3 is good, but for extra compression you may use up to 1.5 safely. Do beware, however, that on a Q18 encode with a pbratio of 1.5, B-frames will get around Q21-22, which may look bad in some cases.
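The Q21-22 figure can be checked with a little arithmetic. In H.264 the quantizer step doubles every 6 QP, so a ratio between quantizer scales works out to a QP offset of 6*log2(ratio). The sketch below assumes pbratio is applied as that kind of plain scale ratio and ignores any other rate-control adjustments; the function name is only for illustration.

```python
import math

def b_frame_qp(p_qp, pbratio):
    """Approximate B-frame QP: the step size doubles every 6 QP, so a ratio adds 6*log2(ratio)."""
    return p_qp + 6.0 * math.log2(pbratio)

for ratio in (1.3, 1.5):
    print(ratio, round(b_frame_qp(18, ratio), 1))
```

With P-frames at Q18, a pbratio of 1.3 puts B-frames a little over Q20, and 1.5 puts them at about Q21.5.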
Macroblock options (--analyse)
Make sure you set the AVC Profile to "High", and then select All from Macroblock options. You should see Adaptive DCT, I4x4, P4x4, I8x8, P8x8 and B8x8 get checked. This allows the codec to be more flexible and make better choices for compensation (can be set in CLI as --analyse all).
B-frames (--bframes)
Pretty essential. In XviD you would tend not to use more than 2 due to flaws in the MPEG-4 ASP standard with DCT drift; however, it's safe to use multiple B-frames in H.264. I suggest 3, but you may use up to 5 or 6. Just remember that more B-frames add to encoding and decoding complexity, but at the same time they are a great space saver. B-frames can also be used as references, so it's doubly useful to use a good amount.
Adaptive B-frames
Should be enabled (a switch is not required to enable this in x264 CLI, just to disable). This decreases the number of B-frames where it makes sense, helping to increase the quality by using another frame type where a B-frame might not be optimal.
B-Pyramid (--b-pyramid)
This should also be enabled. It allows B-frames to be used as references, which improves efficiency.
RDO for B-frames (--b-rdo)
Improved motion estimation for B-frames (improving quality and efficiency) at the expense of encode time. Recommended. Requires Subpixel Refinement 6 or higher (eg --subme 6, --subme 7).
Weighted B prediction (--weightb)
Improves fades by letting B-frames weight their two reference frames unequally instead of simply averaging them. Recommended also.
Bidirectional ME (--bime)
Enables an additional search for forward and backward motion vectors when coding B-frames. You should enable this also as it improves the efficiency and quality of B-frames again.
B-frame mode: (--direct)
Defines the motion prediction used for direct macroblocks. Temporal is usually the better choice since it uses the following P-frame for motion prediction, as opposed to spatial, which relies on surrounding macroblocks and their motion. However, selecting Auto allows x264 to switch between the two modes and choose the best one on a per frame basis, which is better than fixing one or the other. Recommended to use auto.
Now that I have listed the main options that you should change to get a nice encode, I suggest you play around in MeGUI and change these. The encode may slow to a crawl compared to what you have been used to, but the quality/filesize will be better and worth it in my opinion. Obviously you can ease up on some of the options (fewer reference frames than I suggest, for example) if encoding is too slow for you, but bear with it if you can; this is what sets x264 apart from XviD. You can follow these screens, which basically follow what I just suggested, or you can go and grab some MeGUI profiles from Doom9, which should be pretty nice. I might also add that these settings are just suggestions; they are not optimal/special or anything, but they are a lot better than the defaults. Check out the guide I linked to earlier for more help on the other options.
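If you prefer the command line to MeGUI, the suggestions above can be collected into one argument list. This is only a sketch: build_args and the SUGGESTED table are hypothetical helpers, and the flag spellings follow this post, except that the B-frame count is written --bframes, which is how I believe the x264 CLI spells it. Newer builds may have renamed or removed some of these switches, so check x264 --help first. Input, output, bitrate and pass options are left out, as are the profile and Adaptive DCT switches, because the post does not give their CLI spelling.

```python
# Suggested values from the post; True means a bare switch with no value.
SUGGESTED = [
    ("--me", "umh"),
    ("--merange", 32),
    ("--subme", 6),
    ("--trellis", 1),
    ("--ref", 5),
    ("--mixed-refs", True),
    ("--no-fast-pskip", True),
    ("--qpmin", 18),
    ("--pbratio", 1.3),
    ("--analyse", "all"),
    ("--bframes", 3),
    ("--b-pyramid", True),
    ("--b-rdo", True),
    ("--weightb", True),
    ("--bime", True),
    ("--direct", "auto"),
]

def build_args(settings):
    """Turn (flag, value) pairs into a flat argument list, skipping disabled switches."""
    args = []
    for flag, value in settings:
        if value is True:
            args.append(flag)                 # bare switch
        elif value is False or value is None:
            continue                          # switch turned off: emit nothing
        else:
            args.extend([flag, str(value)])   # flag followed by its value
    return args

print(" ".join(["x264"] + build_args(SUGGESTED)))
```

Keeping the settings in a table like this makes it easy to ease off one option at a time (say, dropping --ref to 3) and compare encode time against file size.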
For the audio, you might want to look at Nero AAC at Q0.3-0.4 (which is VBR). It is much better than CBR 128kbps, although the filesize may vary a bit.