I've listened to it several times and tried to map the layers by hand, without a computer algorithm to help, on a metric-ruled Pulp Pilot graph.
Lyrics are on a separate layer, with notes on their significance. Possible insertion points are noted...

Video: Vision of Escaflowne, Escaflowne the Movie: A Girl in Gaea
I've taken notes on the video... they're all LAME and GENERIC, just like ME. OK, so we've got some transition and neutral effects, some movement, some battles, some with mecha, the occasional character expression, conversations, etc.
Idea:
Like a hybrid of "Memoried Dance" and "Blue Mercury".
Combine linear blocks of scene classes with distributed theme segments to make a single event synced to both the verse and the storyline.
What I Need To Do:
I'm going to practically transcribe the video, taking care to avoid cheesy showcases (à la grouping all the mecha together, etc.).
Challenge:
Transition into and out of the scene classes without overusing digital effects...
I got an idea from clips in "Odorikuruu" (2m57s-3m0s) and "Memories Dance" (a lot of it, specifically 2m48s-2m52s-2m55s): group camera pans and/or character action movement together. Using back-to-back neutral scenes and themes (10 full seconds of movement) is not impressive!