Hahah, oh boy, you're in for a ride.
This source is truly fucked up; interlacing isn't the problem per se.
As it stands, it seems your source was resized, probably multiple times, both before and after it went through a 3:2 pulldown.
Load the sample, throw a bob() so each field becomes a frame for easier viewing and then go to frame 420 so you can follow.
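For reference, a minimal inspection script could look like this (FFVideoSource and the filename are placeholders; use whatever source filter you normally load the sample with):

```avisynth
# Minimal inspection script. Bob() doubles the frame rate so every field
# becomes its own frame, which makes the blend pattern easy to step through.
FFVideoSource("sample.mkv")   # hypothetical filename / source filter
AssumeBFF()                   # this sample is bottom-field-first
Bob()
```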
Frame 420: eyes completely open.
Frame 421: blend between open and closed.
Frames 422-426: closed.
Frame 427: blend between open and closed.
Frame 428: closed.
Frame 429: open.
It's not only the animation. If you watch it in playback, you'll also perceive a slight "back and forth" in the zoom instead of smooth motion. That looks like a wrong field order, but boy, that ain't it: if you try assumetff() (this sample is bff), you'll see the jerky motion actually gets worse. What likely happened is that they progressively resized the source after the pulldown, which blended the fields with bad weighting, so you see fields out of pattern; possibly they also got the field order wrong themselves somewhere along the line between the resizes.
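If you want to check the field-order behaviour yourself, something like this makes the comparison easy (source filter and filename are placeholders again):

```avisynth
# Field-order sanity check: bob under each assumption and compare motion
# in playback. With this source, TFF plays back worse, which is how you
# can tell BFF is correct despite the residual jerkiness.
FFVideoSource("sample.mkv")   # hypothetical source loading
AssumeTFF().Bob()             # try this...
# AssumeBFF().Bob()           # ...against this
```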
This is how I assume things went: before the pulldown, they probably vertically upscaled the source with something like nearest neighbour. That would explain the weird aliased/combed look you see on every frame, and you'll notice it's only vertical, not horizontal. After this upscale came the 3:2 pulldown, and after that a downscale back to the destination resolution with something like a bicubic resize. Since that downscale was done progressively on an interlaced source, it introduced the blending, and it introduced the jerky look because fields which shouldn't be visible yet got blended in before their supposed screen time.
Anyway, the TL;DR is that this is pretty much beyond saving. You can see what the pulldown pattern is supposed to be, but you CAN'T match the fields, nor can you do a proper bob, because the blending introduced slight jerky motion.
What I suggest you do is first and foremost use QTGMC. This will leave you with a nicely bobbed source, and QTGMC's postprocessing also deals with the aliased/combed look introduced by the bad resize.
After this, you'll need to decide whether to keep the odd or the even fields: keeping both will retain the jerky motion, while keeping only one will at least keep the motion smooth. Judging from the sample, selecteven() ends up with overall cleaner fields (the blends are rarer than with selectodd()). Once you've done that, you'll have to decide scene by scene whether to keep it 29.97 or attempt a decimation. Keep in mind you can't trust automatic decimation here; you'll have to specify a pattern yourself.
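Put together, the suggested chain would look roughly like this (QTGMC comes from an external script; the preset is just a sensible default, tune to taste):

```avisynth
# Suggested cleanup chain (filename and preset are assumptions).
FFVideoSource("sample.mkv")
AssumeBFF()
QTGMC(Preset="Slower")   # bobs to double rate and cleans the aliased look
SelectEven()             # back to single rate, keeping the cleaner fields
```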
I'll give you an example. With this sample at hand, after you do the QTGMC + selecteven() step above, go to frame 400.
You'll see frames 400, 401, and 402 are unique. Then when you reach frame 403, you'll see it's extremely similar to frame 402, albeit not identical. The slight difference is caused by QTGMC's processing (some temporal filters plus the antialiasing), but you'll realize that the motion difference between 402 and 403 is lower than between 402 and 401, as well as between 401 and 400. If from frame 403 you go to frame 404, you'll once again see the correct amount of motion. This means you could drop a frame between 402 and 403 to restore the original 23.976 pan. Generally speaking, you'd want to drop the first dup and keep the latter one (so drop 402 and keep 403): in normal situations the second dup is the progressive one and has more reference and bitrate (in a ccnnc matching pattern you'd do kkkdk, dropping the second n match since it's identical to the following c match). But this situation is abnormal, so just do what you think is better case by case.
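As a sketch, if the dup reliably landed in the same slot of each 5-frame cycle, you could decimate with SelectEvery. The offsets below are purely illustrative; on this source you'll have to verify the pattern scene by scene:

```avisynth
# Hand-specified decimation: keep frames 0,1,3,4 of every 5-frame cycle,
# dropping the first dup (e.g. frame 402 in the 400-404 run above).
SelectEvery(5, 0, 1, 3, 4)   # 29.97 -> 23.976 when the cycle holds
# For one-off dups, DeleteFrame(402) removes a single frame instead.
```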
Since it's AMVs we're talking about and you likely won't use every single scene and frame, I suggest keeping only the QTGMC in the avs, then manually cutting out the frames that fuck up the motion in your NLE. Ultimately, since you're editing, you have a lot of freedom over the source, which is a blessing: you can freely decide which frames to drop and keep in order to achieve a clean and smooth result. Basically, while editing you'll have to make an extra effort and manually keep the good frames while ensuring the motion is smooth in playback, but this will let the final AMV be completely clean, free of blended frames, and smooth, which is something you couldn't do if you had to keep the source length intact.

If this sounds like TOO much effort, then the next best thing is making two videos, one with selectodd and the other with selecteven (both after the QTGMC, obviously), and then picking the cleanest stream scene by scene. If you still see obvious dup frames making the motion jerky, you can limit yourself to editing just those out in the NLE, but at least part of the job will already be done.
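The two-stream fallback is just the same chain rendered twice with opposite field selections (filename and preset are assumptions as before):

```avisynth
# Render each of these as its own video, then pick per scene in the NLE.
src = FFVideoSource("sample.mkv").AssumeBFF().QTGMC(Preset="Slower")
src.SelectEven()   # stream A
# src.SelectOdd()  # stream B, rendered separately
```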
I know it's not a perfect solution, but considering what you have at hand, this is the best workflow for your AMV, I think.