Multimedia Frameworks

Multimedia frameworks are a bit of an abstract concept, since we don't usually work with them directly. Still, it is important to at least know what they are and to be familiar with them. Here is Wikipedia's definition of a multimedia framework:

A multimedia framework (MMF) is a software framework that handles media on a computer and through a network. A good multimedia framework offers an intuitive API and a modular architecture to easily add support for new codecs, container formats and transmission protocols. It is meant to be used by applications such as media players and audio or video editors, but can also be used to build Videoconferencing applications, media converters and other multimedia tools.

Got that? No? Well, let's just grossly oversimplify it by saying that a multimedia framework is the low-level code that does the actual work of reading, decoding, and processing video. Rather than worrying too much about exactly what a multimedia framework is, let's just look at some examples.


Quicktime

Don't confuse this with the Quicktime container, which is implemented ON TOP of the Quicktime multimedia framework. Quicktime was one of the first multimedia frameworks to come out, way back in 1991. Despite being so old, it was quite powerful and well designed right from the beginning. Apple has continued to improve the framework since its initial release, so it's still a very good framework even today. For a long time, Macs were considered to be one of the best choices for a computer if you needed to edit video. This was largely based on the power and flexibility that the Quicktime multimedia framework offered.

Now, as awesome as Quicktime is, there is one very important thing that you need to know about it: Its implementation on Microsoft Windows really sucks.

Video for Windows

Microsoft released Video for Windows in 1992 as a response to Apple's Quicktime. Video for Windows, however, was not nearly as robust as Quicktime. Despite that, it has become a very important framework for dealing with AVI files. You see, AVI files are basically the only thing supported by the Video for Windows framework, and the framework only lets you do very basic things with them. While it's not such a great framework for playback, it gets the job done when it comes to things like encoding or basic editing functions. Some very important video tools such as VirtualDub and AviSynth are based heavily around the Video for Windows framework. For VirtualDub to be able to encode or decode an AVI file, you must have the Video for Windows codec used by that AVI installed on your computer. Just about any AVI codec will have a Video for Windows version, such as Xvid, HuffYUV, Lagarith, DivX, and others.


DirectShow

Microsoft created the DirectShow framework to replace the Video for Windows framework, though Video for Windows is still better than DirectShow in some regards. DirectShow is a very powerful framework for video playback; however, when it comes to editing, it leaves much to be desired. The primary reason for this is that DirectShow is not frame accurate. This means that if a program tells DirectShow, "alright, show me the frame that occurs at timecode 01:28.21 in this video", DirectShow might not always return the same frame!

Many media players, such as Windows Media Player, Media Player Classic, ZoomPlayer, and others, use DirectShow to play back videos. DirectShow is backwards compatible with Video for Windows, which means, for example, that if you have installed the Video for Windows Xvid codec, then you will be able to watch Xvid videos through DirectShow as well. However, the reverse is not true: if you install a DirectShow decoder for Xvid, you still cannot open an Xvid video in a Video for Windows based application like VirtualDub.

It's worth mentioning at this point that DirectShow doesn't really use codecs like Video for Windows does. While technically I suppose you can have a DirectShow codec, i.e. something that both compresses and decompresses video through DirectShow, we typically only see decompressors in DirectShow. We call these decoder filters. Since a decoder filter doesn't compress, it is technically not a codec, but most people refer to them as codecs anyway.

DirectShow works by creating what is called a filter graph. Now, this graph is usually built entirely in the background whenever you play a video file - it's not something that you are usually aware of. There is a nice little tool called GraphEdit, which allows you to display filter graphs so you can see what is going on. Let's take a look at one:


Now, what I have done is opened up an MP4 video file to display its filter graph. The first thing that happens is that the MP4 file is opened by a special filter called a splitter. The splitter takes the video and audio (and even other things like subtitles), splits them apart, and sends each stream to another filter. Now, the splitter can only do this if it actually understands the type of video file that you are trying to open. A fresh install of Microsoft Windows doesn't know how to read MP4 files, so it would not be able to render this graph, and thus would not be able to play the file. Installing a splitter such as Haali Media Splitter, which happens to be used here, adds support for MP4 and MKV files and allows them to be split and sent to the next stop in the filter chain.

Now, the video stream gets sent to a decoder filter which will decompress the video. As you can see here, the splitter has sent the video stream to the ffdshow Video Decoder. It does the same thing with the audio stream, sending it to an audio decoder. Finally, the decoded video stream is sent to the Video Renderer, which displays it on the screen, and the decoded audio stream is sent to the DirectSound device, which plays the sound through your speakers.
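To make the idea of a filter graph a little more concrete, here is a tiny Python model of the graph described above. This is only a sketch for illustration: it has nothing to do with the real DirectShow API, and the `Filter` class, the `connect` method, and the `paths` function are all made-up names. The audio decoder is simply called "Audio Decoder" because no specific one is named here.

```python
from dataclasses import dataclass, field


@dataclass
class Filter:
    """One box in the graph (hypothetical model, not a DirectShow type)."""
    name: str
    downstream: list = field(default_factory=list)

    def connect(self, other):
        """Send this filter's output to `other`; returns `other` for chaining."""
        self.downstream.append(other)
        return other


def paths(source):
    """Return every chain from `source` to a renderer as a list of names."""
    if not source.downstream:
        return [[source.name]]
    return [[source.name] + tail
            for nxt in source.downstream
            for tail in paths(nxt)]


# Rebuild the graph from the description above.
splitter = Filter("Haali Media Splitter")
video_decoder = Filter("ffdshow Video Decoder")
audio_decoder = Filter("Audio Decoder")  # generic name, see note above
video_renderer = Filter("Video Renderer")
sound_device = Filter("DirectSound Device")

splitter.connect(video_decoder).connect(video_renderer)
splitter.connect(audio_decoder).connect(sound_device)

for chain in paths(splitter):
    print(" -> ".join(chain))
```

Each printed line is one stream's trip through the graph: out of the splitter, through a decoder, and into a renderer.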

Now, while this filter graph is constructed completely automatically whenever you play a video file, it is precisely this fact that causes quite a lot of the playback problems that people have. The problem is that there can be multiple filters installed that are capable of doing the same thing. You might have 3 different splitters installed that can split MP4 files. You might have 5 or more decoders that are capable of decoding an Xvid video. So what happens is that you have a bunch of different filters fighting over the right to do the same thing. Some of these decoders might even have bugs in them that cause playback problems.
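Here is a rough sketch of how that fight gets settled. When DirectShow builds a graph automatically, it ranks the installed filters by a priority value called "merit" and tries the higher ones first. The Python below is a heavily simplified model of that idea, not the real selection logic: the filter names, the merit numbers, and the `InstalledFilter`, `candidates`, and `pick` names are all invented for the example.

```python
from typing import NamedTuple


class InstalledFilter(NamedTuple):
    name: str
    handles: str  # the format or codec this filter claims to support
    merit: int    # made-up priority; higher is tried first


def candidates(installed, fmt):
    """All filters claiming `fmt`, ordered from highest to lowest merit."""
    matching = [f for f in installed if f.handles == fmt]
    return sorted(matching, key=lambda f: f.merit, reverse=True)


def pick(installed, fmt):
    """The filter an automatic graph build would try first, or None."""
    ranked = candidates(installed, fmt)
    return ranked[0] if ranked else None


# Hypothetical system with two competing Xvid decoders.
installed = [
    InstalledFilter("Decoder A", "Xvid", 600),
    InstalledFilter("Decoder B (from a codec pack)", "Xvid", 800),
    InstalledFilter("Splitter A", "MP4", 600),
]

winner = pick(installed, "Xvid")
print(winner.name if winner else "nothing can decode this")
```

Notice that the decoder with the higher merit wins whether or not it is the better decoder. If it happens to be the buggy one, every Xvid video on the system is affected, even though a perfectly good decoder is also installed.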

What invariably happens is that someone has a problem playing back a certain video file, and the first thing they think is "I'll go and download a codec pack". Now, just about anyone who knows anything about video will tell you that codec packs are a *really* *really* *really* bad idea. Many playback problems come precisely from having too many decoders installed on your system, and yet what people want to do is go and install some huge pack FULL of decoders. Now, it's not too difficult to see why codec packs are popular - most people know nothing about DirectShow or these filter graphs; all they know is that they need something else installed in order to play a certain video. They don't know *what* they need to install, they just know they need *something*. So they go grab whichever codec pack looks like it has the most stuff in it. Perhaps worse, they may even try installing more than one codec pack. By this time their system is probably a complete mess, and they will likely have more trouble playing back videos than they did before they started.


FFMPEG

Now, the last multimedia framework to discuss is FFMPEG. It is an open source, cross-platform framework that is frequently updated. It's not nearly as robust as something like Quicktime or even DirectShow. While something like DirectShow can be extended in any way you want by installing new filters, FFMPEG is entirely self-contained. This is probably FFMPEG's greatest strength. With FFMPEG, there are no codecs or filters to install. Everything is just there. There are many media players that are based on FFMPEG, such as MPlayer and VLC. On Windows, many users go with an FFMPEG-based player like VLC instead of one based on DirectShow, since everything just works right out of the box, and you avoid the problems with all sorts of filters fighting with one another.

Key Concepts

- Tools such as AviSynth and VirtualDub are based on the Video for Windows framework.

- On Windows machines, playback is typically handled through the DirectShow framework, which uses many different filters to handle different aspects of a video file.

- Don't install "Codec Packs" on your system. They typically do more harm than good.

- Video players based on the FFMPEG framework don't require you to install any codecs for playback; everything they need is built in.