OT: Hypothetical video codec

Locked
danielwang
Village Idiot
Joined: Fri May 03, 2002 12:17 am
Location: Denver, CO
Banned: Several times!
Contact:
Org Profile

OT: Hypothetical video codec

Post by danielwang » Wed Aug 06, 2003 2:20 am

I've been looking over a video analysis method I wrote a year ago, and I think you might find it interesting. It is old, and chock full of techno-buzzwords ("analytic blah blah") so I'll just sum it up for you.

I'm considering an analysis system that basically stores information in unbuilt form. It finds reference images and transforms them into a big "vector"-style map. It then uses some fancy stuff ("flow modeling") to guess how those points are moving, and from that it tries to guess the 3-D objects and motion that produced the 2-D stream of frames. In the case of anime, the cels and their motions.
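A minimal sketch of the point-tracking half of that "flow modeling" idea, assuming plain brute-force block matching (the post doesn't specify an algorithm; every name and number here is invented for illustration):

```python
# Hypothetical sketch: track one reference point between two frames
# with brute-force block matching. Frames are 2D lists of luminance.

def block_at(frame, y, x, size):
    """Extract a size x size block whose top-left corner is (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def estimate_motion(prev, curr, y, x, size, radius):
    """Search +/- radius pixels in `curr` for the block that best
    matches the reference block at (y, x) in `prev`; return (dy, dx)."""
    ref = block_at(prev, y, x, size)
    best, best_cost = (0, 0), float("inf")
    h, w = len(curr), len(curr[0])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny <= h - size and 0 <= nx <= w - size:
                cost = sad(ref, block_at(curr, ny, nx, size))
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
    return best

# A 2x2 bright patch moves from (2, 2) to (3, 4):
prev = [[0] * 8 for _ in range(8)]
curr = [[0] * 8 for _ in range(8)]
for r in range(2):
    for c in range(2):
        prev[2 + r][2 + c] = 9
        curr[3 + r][4 + c] = 9

print(estimate_motion(prev, curr, 2, 2, 2, 3))  # (1, 2)
```

A real codec would search hierarchically rather than exhaustively, but the idea — find where each reference point went — is the same.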

The decoding is done with commonly available algorithms, and may be assisted by a human operator. But for those anime companies doing their production electronically, they might as well put out their source and have it compiled in real time.
In true Linux style - your DVD player would have to render "RahXephon 17, Source Code" itself before you can see it!

This is on mechacker.com, and for those who don't like .doc files...

[quote="Daniel Wang"]
Stream Analysis through Flow Modeling
Daniel Wang

When faced with the task of developing an effective method of analyzing and compressing video, one must consider the properties of the media and task involved. The video is simply a recording of an environment and objects, with some effects. Even the medium of animation is principally the same – projecting a 3-dimensional field into a series of 2-dimensional views. The extrapolated patterns in the media are ultimately what make analysis and compression possible.

Present-day analysis technologies use a “dictionary”, or mathematically-generated table, for storing information that is repeated across a frame or frames. With conventional media, this method works excellently, as seen with DivX and Windows Media. However, this method cannot recognize information that has been transformed erratically or modified, requiring a new key to be used.
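A toy version of such a dictionary/table coder — and its weakness with transformed content — might look like this (the block values are invented stand-ins for pixel data):

```python
# Toy sketch of a "dictionary" (table-based) coder: repeated blocks
# are stored once and referenced by index.

def dictionary_encode(blocks):
    """Return (table, indices): each distinct block appears once in
    `table`, and `indices` replays the stream as references into it."""
    table, indices = [], []
    for b in blocks:
        if b not in table:
            table.append(b)
        indices.append(table.index(b))
    return table, indices

# Repeats compress well...
table, indices = dictionary_encode(["sky", "sky", "grass", "sky"])
print(table, indices)  # ['sky', 'grass'] [0, 0, 1, 0]

# ...but an erratically transformed block gets a whole new key, which
# is exactly the limitation of conventional dictionary methods.
table, indices = dictionary_encode(["sky", "sky-rotated"])
print(table)  # ['sky', 'sky-rotated']
```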

I would like to propose a new method of video-stream analysis and storage that not only detects and handles dynamic change, but also supports frame-based sequences as well as infinite-resolution vector streams.


Reference Indexing

First, reference indices must be planted into the video stream in order to accurately model the changes in the stream. Robust references are found using a variety of pattern-search systems and characteristics to ensure that the reference can be found even after a transform or overlap.

For example, a polygonal color or gradient-based reference class may not be suited for animation or generated media because of the possibility of transparent overlaps causing a discontinuity. Conversely, a darkness-based reference, such as “Light, Dark, Light, and then Dark”, may not be present in live-action environments, or may be spuriously produced by ambiguous shadows.
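The “Light, Dark, Light, and then Dark” reference could be sketched as a tiny luminance-signature extractor (a hypothetical illustration; the 0–255 pixel range and threshold of 128 are assumptions, not from the text):

```python
# Hypothetical "darkness-based reference": collapse a scanline into an
# alternating Light/Dark signature that survives mild transforms.

def luminance_signature(scanline, threshold=128):
    """Map pixel values to 'L'/'D' and merge consecutive repeats,
    e.g. [200, 210, 30, 20, 220, 10] -> ['L', 'D', 'L', 'D']."""
    sig = []
    for v in scanline:
        label = 'L' if v >= threshold else 'D'
        if not sig or sig[-1] != label:
            sig.append(label)
    return sig

print(luminance_signature([200, 210, 30, 20, 220, 10]))  # ['L', 'D', 'L', 'D']
```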

Reference Linking and Vectoring

After reference indices have been found for frames at various periods in the stream, reference characteristics are sent through a table to classify them into profiles and classes. Ambiguous references are grouped and unique reference points are linked for vector creation.

The unique and grouped reference points are compared to other frames, and reference positions and dimensions are compared to determine if and how each reference point is being transformed and moved. Directional (location) and characteristic gradients are extrapolated and refined using references found in other frames.
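One hypothetical way to extrapolate a directional gradient from a linked reference point is a least-squares fit of a constant per-frame velocity to its positions across frames (the setup and names are invented for illustration):

```python
# Hypothetical gradient extrapolation: least-squares fit of a constant
# velocity to one reference point's positions over successive frames.

def fit_linear_motion(positions):
    """Given [(x0, y0), (x1, y1), ...] at frames 0, 1, ..., return the
    fitted (vx, vy) velocity in pixels per frame."""
    n = len(positions)
    ts = range(n)
    mean_t = sum(ts) / n

    def slope(vals):
        mean_v = sum(vals) / n
        num = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vals))
        den = sum((t - mean_t) ** 2 for t in ts)
        return num / den

    return slope([p[0] for p in positions]), slope([p[1] for p in positions])

# A point drifting 2 px right and 1 px down per frame:
print(fit_linear_motion([(0, 0), (2, 1), (4, 2)]))  # (2.0, 1.0)
```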

Analysis and Flow Modeling

The references are grouped into shapes and planes, and the stream is sent through various “transformation extrapolators” to determine the manner in which each image was altered. For a live-action film, a 3-dimensional transform may be applied to objects to simulate a view change, with a few transparent/tint delta fields for a flame effect or such. For animation, a polygonal analysis could be used to separate each object, with skew lines used to modify the shapes and shaped lighting compensation to equalize the shadowing effects.
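A minimal "transformation extrapolator" could, for instance, fit a uniform scale and translation to matched reference points (a toy sketch under strong assumptions; real view changes would need affine or projective fits):

```python
# Toy "transformation extrapolator": fit a uniform scale s and
# translation t so that dst ~ s * src + t for matched reference points.

def fit_scale_translation(src, dst):
    """Least-squares scale and translation between matched 2D points."""
    n = len(src)
    cxs = sum(x for x, _ in src) / n
    cys = sum(y for _, y in src) / n
    cxd = sum(x for x, _ in dst) / n
    cyd = sum(y for _, y in dst) / n
    # Least-squares scale: project the centred dst onto the centred src.
    num = sum((x - cxs) * (u - cxd) + (y - cys) * (v - cyd)
              for (x, y), (u, v) in zip(src, dst))
    den = sum((x - cxs) ** 2 + (y - cys) ** 2 for x, y in src)
    s = num / den
    return s, (cxd - s * cxs, cyd - s * cys)

# A square doubled in size and shifted by (1, 1):
src = [(0, 0), (2, 0), (0, 2), (2, 2)]
dst = [(1, 1), (5, 1), (1, 5), (5, 5)]
print(fit_scale_translation(src, dst))  # (2.0, (1.0, 1.0))
```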


In effect, what we are attempting to do is decompile the original source of the media. By first indexing references and linking them, we are attempting to separate each item into objects or layers, instead of a large image. By sending the image through vector-dimensional transforms, we are attempting to recover some of the information about the 3-dimensional objects on which the stream is based. With an analysis system based on the theory of rubber-sheet topology, animations can be isolated to the very cel.
[/quote]
<a href="http://www.animetheory.com/" title="AnimeTheory" class="gensmall">AnimeTheory.</a>
<a href="http://www.animemusicvideos.org/search/ ... %20park%22" title="Search videos NOT by danielwang" class="gen">Make sure you don't download videos that suck!</a>

User avatar
klinky
Joined: Mon Jul 23, 2001 12:23 am
Location: Cookie College...
Contact:
Org Profile

Post by klinky » Wed Aug 06, 2003 5:31 am

Would be a pretty ugly video codec :shock:

I don't think vector or object compression is going to take off. Nor do I think companies want to give out their "video source code" for everyone to see how they made their animation.

The problem with vector or object compression is that there are things which just can't be made into 2D or 3D vectors properly. Take 3D animation rendered in 2D and composited onto a 3D surface with some gradient masks. Say the 3D object glows too, and the layers are constantly moving around. The only "object" that could be found would be the entire frame. There would be no single object that could be saved from that scene and replicated later to save bandwidth.

I've been thinking of a 2D vector-based codec for anime, since there are large areas of solid color. However, as more and more 3D & computer-aided animation is produced, that codec becomes less useful. Some things just could not be vectorized and would need to be stored and compressed as bitmaps (much like how Flash does its stuff today). The results would not be that great to look at, since vectors are limitless in their scalability, but bitmaps are not. Also, smoothing edges where bitmaps meet vectors would be an issue.
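Those large solid-color areas are exactly what run-length encoding exploits; a one-scanline sketch (illustrative only, not a real codec):

```python
# One-scanline run-length sketch: flat anime-style colour regions
# collapse to a handful of (value, run_length) pairs.

def rle_encode(row):
    """Encode a scanline as [(value, run_length), ...]."""
    runs = []
    for v in row:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]

print(rle_encode([5, 5, 5, 9, 9, 5]))  # [(5, 3), (9, 2), (5, 1)]
```

Noisy 3D-rendered gradients defeat this immediately — nearly every pixel starts a new run — which is klinky's point about computer-aided animation.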

Then you have the problem with playback speed and content creators' rights. All in all, it would not be viable.

Creators I am sure are more than happy currently with DVD and will probably be just as happy when HD DVDs start to appear. I don't see anything really replacing current compression techniques..

User avatar
Tab.
Joined: Tue May 13, 2003 10:36 pm
Status: SLP
Location: gayville
Org Profile

Post by Tab. » Wed Aug 06, 2003 7:38 am

HEXVID
hexagons bitches
make it. NOW.
◔ ◡ ◔

User avatar
koronoru
Joined: Mon Oct 21, 2002 10:03 am
Location: Waterloo, Ontario
Org Profile

Post by koronoru » Wed Aug 06, 2003 10:06 am

This sounds a whole lot like Flash, and I think it would likely have many of the same advantages and disadvantages of Flash.

User avatar
Zarxrax
Joined: Sun Apr 01, 2001 6:37 pm
Location: North Cackalacky
Contact:
Org Profile

Post by Zarxrax » Wed Aug 06, 2003 10:38 am

I've already heard of a codec like this... long time ago :\

danielwang
Village Idiot
Joined: Fri May 03, 2002 12:17 am
Location: Denver, CO
Banned: Several times!
Contact:
Org Profile

Post by danielwang » Wed Aug 06, 2003 1:49 pm

koronoru wrote:This sounds a whole lot like Flash, and I think it would likely have many of the same advantages and disadvantages of Flash.
Zarxrax wrote:I've already heard of a codec like this... long time ago :\
Flash is the production tool, format, and compiler in one system. The codec you are speaking of most likely utilizes region-based transform matching, which is also very effective when paired with a wavelet transform... but it splits the video up into parts and looks for similar objects.

This one should be feasible for high-action videos... however, there are scalability problems, I agree. But when you are doing live action, most of the information is recorded on analogue film or high-res digital, and most of that info is discarded. By storing only the peak amount of information needed to regenerate the image set, you can save space...

Here's an example:

Our camera is zooming in on a box while doing a cheated 360-degree turn around said box. To regenerate the sides of the box, you can simply store each side when it is closest to the camera, then do a 3D vector transform to make it look smaller.

The top of the box looks like a square when it is oriented isometrically, but when you pan down toward another side, the top of the box will appear as a rectangle or other polygon. By squishing the sides of the image, you can recreate the side view.
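That "squishing" step could be sketched as a nearest-neighbour horizontal resample of a stored face texture (a hypothetical illustration; real foreshortening would need a perspective warp, not a uniform squish):

```python
# Hypothetical "squish": nearest-neighbour horizontal resample of a
# stored box-face texture to fake the narrowed side view.

def squish_horizontal(image, new_width):
    """Resample each row of a 2D list to new_width columns."""
    old_width = len(image[0])
    return [[row[(x * old_width) // new_width] for x in range(new_width)]
            for row in image]

face = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(squish_horizontal(face, 2))  # [[1, 3], [5, 7]]
```

The same function stretches a face back out when the camera swings toward it, e.g. `squish_horizontal([[1, 2]], 4)` gives `[[1, 1, 2, 2]]`.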

Cheers,
Daniel Wang
<a href="http://www.animetheory.com/" title="AnimeTheory" class="gensmall">AnimeTheory.</a>
<a href="http://www.animemusicvideos.org/search/ ... %20park%22" title="Search videos NOT by danielwang" class="gen">Make sure you don't download videos that suck!</a>

User avatar
Tab.
Joined: Tue May 13, 2003 10:36 pm
Status: SLP
Location: gayville
Org Profile

Post by Tab. » Wed Aug 06, 2003 1:51 pm

stop hacking my mind
◔ ◡ ◔

User avatar
post-it
Joined: Wed Jul 17, 2002 5:21 am
Status: Hunting Tanks
Location: Chilliwack - Fishing
Org Profile

Post by post-it » Wed Aug 06, 2003 7:45 pm

? how long ago was this again ??

this almost sounds like the argument that took place between Amiga's Animation Studio and the Motion Pictures Group over the use and possible implementation of 3D-FX models becoming auto-downloadable designed clips for the scientific community via AutoDesk's Motion Studio and Model Player

. the theory was . . you could have as many models and scenes as you wished . . the Motion Studio would then re-create battle scenes, dancing, instructions, blah-blah in full-screen mode without the large video files of that day . . a script was also encoded describing how the backgrounds, set objects and actors were to move and at which angle they would be seen when played back within the file.

T_T come to think of it, shows like ReBoot and other head-aches started to be produced about a year after that Studio program was released by AutoDesk ~_~

this is the only reference that comes to mind via your description - to me.

zalas
Joined: Thu Dec 20, 2001 10:54 pm
Org Profile

Post by zalas » Wed Aug 06, 2003 10:29 pm

Well, having an object-descriptive video encoding *may* help compression ratios, but it will *kill* processing time. Consider the typical number of effects animation companies layer in _1_ scene... It's gonna be a few years before we can even render those effects in real time, and by then, they'd just stick more on.

User avatar
SS5_Majin_Bebi
Joined: Mon Jul 15, 2002 8:07 pm
Location: Why? So you can pretend you care? (Brisbane, Australia)
Org Profile

Post by SS5_Majin_Bebi » Thu Aug 07, 2003 2:26 am

Tab. wrote:HEXVID
hexagons bitches
make it. NOW.
Wooh, sexy six-sided macroblocking goodness.

/end
