I've been looking over a video-analysis method I wrote up a year ago, and I think you might find it interesting. It's old and chock-full of techno-buzzwords ("analytic blah blah"), so I'll just sum it up for you.
I'm considering an analysis system that basically stores information in unbuilt form. It finds reference points in the frames and transforms them into a big "vector"-style map. It then uses some fancy stuff ("flow modeling") to guess how those points are moving, and finally tries to guess the 3-d objects and motion that made the 2-d stream of frames. In the case of anime, the cels and motions.
The decoding is done with commonly available algorithms, and may be assisted by a human operator. But for those anime companies doing their production electronically, might as well have them put out their source and compile it in realtime.
In true Linux style, your DVD player would have to render "RahXephon 17, Source Code" itself before you could see it!
This is on mechacker.com, and for those who don't like .doc files...
[quote="Daniel Wang"]
Stream Analysis through Flow Modeling
Daniel Wang
When faced with the task of developing an effective method of analyzing and compressing video, one must consider the properties of the media and the task involved. A video is simply a recording of an environment and objects, with some added effects. Even the medium of animation is principally the same – projecting a 3-dimensional field into a series of 2-dimensional views. The patterns that can be extrapolated from the media are ultimately what make analysis and compression possible.
Present-day analysis technologies use a “dictionary”, or mathematically generated table, to store information that is repeated within a frame or across frames. With conventional media this method works excellently, as seen with DivX and Windows Media. However, it cannot recognize information that has been transformed erratically or otherwise modified, and a new key must be created instead.
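The dictionary approach, and its weakness, can be shown with a toy sketch (my own illustration, not any real codec): identical blocks are stored once and referenced by index, but a block that has shifted by even one pixel is treated as brand-new.

```python
# Toy dictionary-style block coder: frames are flat pixel rows, split
# into fixed-size blocks; each unique block is stored once and frames
# are lists of dictionary indices.

def encode(frames, block_size=2):
    dictionary = []   # unique blocks seen so far
    index = {}        # block -> position in dictionary
    coded = []        # per frame: list of dictionary indices
    for frame in frames:
        codes = []
        for i in range(0, len(frame), block_size):
            block = tuple(frame[i:i + block_size])
            if block not in index:
                index[block] = len(dictionary)
                dictionary.append(block)
            codes.append(index[block])
        coded.append(codes)
    return dictionary, coded

# The second frame repeats one block exactly but shifts the other by a
# pixel -- the shifted block is not recognized and gets a new entry.
frames = [[1, 2, 3, 4], [1, 2, 4, 5]]
dictionary, coded = encode(frames)
print(dictionary)  # [(1, 2), (3, 4), (4, 5)]
print(coded)       # [[0, 1], [0, 2]]
```

Block `(4, 5)` carries almost the same information as `(3, 4)`, yet the dictionary cannot reuse it – which is exactly the limitation the essay is pointing at.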
I would like to propose a new method of video-stream analysis and storage that will not only detect and handle dynamic change, but also handle both frame-based sequences and infinite-resolution vector streams.
Reference Indexing
First, reference indices must be planted in the video stream so that changes in the stream can be modeled accurately. Robust references are found using a variety of pattern-search systems and characteristics, to ensure that a reference can still be found after a transform or an overlap.
For example, a reference class based on polygonal color or gradient may not be suited to animation or generated media, because transparent overlaps can cause discontinuities. Conversely, a darkness-based reference such as “Light, Dark, Light, and then Dark” may not be present in live-action environments, or may be produced by ambiguous shadows.
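A minimal sketch of the darkness-based reference search described above; the luminance threshold and the run-pattern encoding are my own illustrative choices, not values from the essay:

```python
# Search one scanline of luminance values for a "Light, Dark, Light,
# and then Dark" run pattern, usable as a reference index.

def luminance_runs(pixels, threshold=128):
    """Collapse a row of pixels into a Light/Dark run pattern."""
    runs = []
    for p in pixels:
        label = "L" if p >= threshold else "D"
        if not runs or runs[-1] != label:
            runs.append(label)
    return runs

def find_reference(pixels, pattern, threshold=128):
    """True if the scanline contains the given run pattern."""
    runs = luminance_runs(pixels, threshold)
    n = len(pattern)
    return any(runs[i:i + n] == pattern for i in range(len(runs) - n + 1))

scanline = [200, 210, 40, 50, 220, 30, 35]
print(find_reference(scanline, ["L", "D", "L", "D"]))  # True
```

Because the pattern is a sequence of run labels rather than raw pixels, it survives stretching and mild shading changes – the robustness property the essay asks of a reference.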
Reference Linking and Vectoring
After reference indices have been found in frames at various points in the stream, the reference characteristics are run through a table that classifies them into profiles and classes. Ambiguous references are grouped, and unique reference points are linked for vector creation.
The unique and grouped reference points are then compared across frames: reference positions and dimensions are compared to determine whether, and how, each reference point has been transformed and moved. Directional (location) and characteristic gradients are extrapolated and refined using references found in other frames.
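The linking step can be sketched as nearest-neighbor matching on point positions; this is a deliberate simplification – a real system would also compare the reference “characteristics” the essay describes, not just (x, y) coordinates:

```python
import math

def link_references(points_a, points_b):
    """For each reference point in frame A, find the closest point in
    frame B and return the implied motion vector (dx, dy)."""
    vectors = []
    for ax, ay in points_a:
        bx, by = min(points_b, key=lambda p: math.hypot(p[0] - ax, p[1] - ay))
        vectors.append((bx - ax, by - ay))
    return vectors

# Every reference point drifts 2 pixels right and 1 pixel down.
frame_a = [(10, 10), (50, 20)]
frame_b = [(12, 11), (52, 21)]
print(link_references(frame_a, frame_b))  # [(2, 1), (2, 1)]
```

The resulting vectors are the raw material for the directional gradients mentioned above: consistent vectors across many frames suggest a single moving object.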
Analysis and Flow Modeling
The references are grouped into shapes and planes, and the stream is sent through various “transformation extrapolators” to determine the manner in which each image was altered. For a live-action film, a 3-dimensional transform may be applied to objects to simulate a view change, with a few transparent/tint delta fields for a flame effect or the like. For animation, a polygonal analysis could be used to separate each object, with skew lines used to modify the shapes and shaped lighting compensation to equalize the shadowing effects.
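One very small “transformation extrapolator”, as an illustration under strong assumptions: given matched reference points, it recovers a uniform scale plus translation – a crude stand-in for the full view-change transforms the essay imagines:

```python
def fit_scale_translation(src, dst):
    """Assume dst = scale * src + (tx, ty); recover (scale, tx, ty)
    from the point spreads and centroids of the two frames."""
    n = len(src)
    cx_s = sum(x for x, _ in src) / n
    cy_s = sum(y for _, y in src) / n
    cx_d = sum(x for x, _ in dst) / n
    cy_d = sum(y for _, y in dst) / n
    # Total absolute spread around the centroid scales linearly with size.
    spread_s = sum(abs(x - cx_s) + abs(y - cy_s) for x, y in src)
    spread_d = sum(abs(x - cx_d) + abs(y - cy_d) for x, y in dst)
    scale = spread_d / spread_s
    return scale, cx_d - scale * cx_s, cy_d - scale * cy_s

# An object that doubled in size and moved 5 pixels to the right.
src = [(0, 0), (10, 0), (0, 10)]
dst = [(5, 0), (25, 0), (5, 20)]
scale, tx, ty = fit_scale_translation(src, dst)
print(round(scale, 6), round(tx, 6), round(ty, 6))
```

Having decomposed a frame-to-frame change into a small parameter set like this, only the parameters need be stored – which is the whole point of flow modeling as a compression scheme.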
In effect, what we are attempting to do is decompile the original source of the media. By first indexing references and linking them, we attempt to separate the stream into objects or layers rather than one large image. By sending the image through vector-dimensional transforms, we attempt to recover some of the information in the 3-dimensional objects on which the stream is based. With an analysis system based on the theory of rubber-sheet topology, animations can be isolated down to the very cel.
[/quote]
