I went to AWA last year, and it was the first time I'd been to a con in like six or seven years. In fact, it was my first meaningful interaction with people from this community in at least three or four years. It was awesome, and as a result I was sucked back into AMVs, a hobby I had been all but neglecting for a long, long time. It re-sparked a passion I've had ever since I discovered this community back in 2006.
On the drive back from Atlanta to Chicago, I had a lot of time to absorb the previous weekend, and to start thinking about getting seriously involved in the hobby again. An idea planted itself in my mind. Basically, I've always been fascinated by AMVs, not only as a form of entertainment and creative expression, but also as a fandom. I've always wished that some sort of substantial, in-depth analysis could be done on AMVs to track the fandom's development over the years. Within this community, I've noticed that assumptions get made, and certain videos and editors are credited with initiating trends or being the "first" to do something, but there's never been any hard data to back those claims up.
This, among other thoughts, sparked the idea to actually sit down and start recording data in a spreadsheet that could later be analyzed and eventually turned into a solid, quantitatively sound study that objectively shows trends and characteristics of how AMVs have developed over the past 14+ years.
I didn't do anything right away, because I realized the amount of work that would be involved to get anything approaching a useful pool of data entered into a spreadsheet. In fact that alone kept me from doing anything for a loooong time. I had also presented this idea to another .org member, who just kinda shrugged it off and said that not many people would be interested. This doesn't bother me much; this is something I would want to do for my own purposes anyway, as a way to help preserve the AMV fandom if and when this place disappears. Whether other people find it interesting is of little motivation to me.
A couple months ago, though, I decided to just go for it. If it leads nowhere, that's fine. If I drop it and decide not to pursue it anymore, that's fine too. But I've been working on this regularly for the past two months which is unusual for me...I typically get bored and give up on these kinds of projects after a few weeks. This tells me that this is something that I'm interested enough in to continue, at least for the foreseeable future. Which brings me to today.
So...what exactly are you rambling on about now?
If this has piqued your interest, there are a couple things you will need to download before we go any further. Nothing below here will make sense unless you have the following:
(1) This ZIP file which contains all the pertinent documents at this point. The main one I'll be focusing on is the spreadsheet entitled "AMV list.xls".
(2) Apache OpenOffice 4. I've used OpenOffice Calc to compose this spreadsheet, and as a result the formula syntax is all OpenOffice-based. I'm not sure if this will carry over into current versions of Excel (it definitely won't work in old versions of Excel prior to 2007, as I make heavy use of the COUNTIFS() function which is not available in Excel prior to the 2007 version). For best compatibility, I recommend just installing OpenOffice.
What I have to present to you at this point is a spreadsheet with a small number of videos entered -- around 250 -- which records a number of different types of data about each video. Many of the columns should be pretty self-explanatory, but a few probably need some explaining. Allow me to do so below:
- My rating: The decision to actually start doing this stemmed from my love of lists, and my love of rating things. I had originally decided to just start making a list of all the videos I have and rating them so that I had a kind of "Table of Contents" somewhere, but I quickly decided to turn this into the project I had come up with almost a year ago. The "My rating" column shows my personal rating (out of 10, gives me a little more room to work with and I only like to go in steps of .5). This is basically for me, probably will not contribute much to the final analysis except as a footnote.
- Star rating: With donator status I'm finally able to see the star rating for each video I download, which has been great. Everyone knows what the star rating is so I don't really need to explain it, but I'm adding it here to note that the star rating is ONLY entered if a video has at least 100 star ratings, as below that the average can change too much with even a few additional ratings. I do allow some leeway on this; if a video is old and has close to 100 (~90 or so) star ratings, it's a safe bet that nobody else will be downloading said video in the near future given the extremely lowered .org activity of late, so in those cases I'll go ahead and add the rating to the list. For all videos that have fewer than 90-100 star ratings or are not available for Local download on the .org (including YouTube-only videos), a "0" is entered here.
- AMV genre(s): For this column I'm just using the default genres provided in video profiles on the .org, with a couple exceptions: (1) "Dance" videos are those that actually contain dancing with the purpose of making the viewer want to dance, NOT videos that just use "dance" music. (2) The "Instrumental" genre is not included, but is represented in a more appropriate place in the spreadsheet ("General tags", see below). (3) I've added in "FX" as a genre to make it easier to identify those videos which use effects. Also, I have been entering genres according to my own standards, so a video's listed genres in this spreadsheet may not exactly match up with what is shown on the .org video profile.
- General tags: Inspired by the tag system over at AniDB, this is possibly the most useful part of the list in terms of future data analysis. The idea behind this is to, as objectively as possible, list out the characteristics of each and every video that I enter. This runs the gamut from the dominant type of sync used in a video to the feelings it is meant to evoke in the viewer. Now, admittedly, many of these are subjective in terms of how a given viewer will react to or interpret a certain video, but I've found that in most cases it's been pretty easy to discern the intent. So, even if a video didn't make me feel particularly sad, for example, I could tell that that's what the editor was going for. So I add that tag.
To keep it structured and consistent, there's a set list of 56 tags that can be added to any given video. After watching a video, I'll pull up the "Tags.txt" text document and go down it, adding each tag I feel applies. As I've gone through about 250 videos now, I've added a few tags to the list here and there, but I think the list is by now pretty robust and covers most important aspects of a video. Definitions of each and every tag can be found in the "Tag definitions.doc" document included in the archive posted above.
- FX tags: This column lists out the various kinds of effects that can be found in a video, assuming the video uses effects. This is a very...inconsistent and probably incomplete list. I do not have standardized definitions for each tag like I do with the General Tags. There are a couple reasons for this; first of all, effects are incredibly diverse, and I'm coming across new types of effects all the time. It's difficult to make a definitive list of effects because multiple effects can be combined to create new effects and it gets really difficult to track which effects are used and where. Also, sometimes it's very difficult to determine when effects are being used at all.
It's also hard to define the line between what constitutes an effect, and what does not. I've done what I can so far but I have ended up grouping certain things under a single tag; "color manipulation", for example, refers to any kind of editor-generated use of color that takes place in a video. This could be in the form of solid-color vector masks, to changing a scene to black-and-white, to using an invert filter. "Boxes" is another tag that comes up often, and simply refers to any sort of geometric overlay, regardless of shape. There are other such "umbrella" FX tags throughout the list.
As a result, I don't know how useful this column will be for any kind of meaningful analysis down the line. It's really difficult to keep track of each and every effect in a video, so this may be only useful for getting a general feel for what to expect in a certain video, rather than an objective listing that can be analyzed. If anyone has any ideas for making this better or more consistent, I'm all ears.
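Going back to the "Star rating" column for a second: the entry rule described above can be written out as a small Python sketch for anyone who wants to apply it outside of Calc. This is only an illustration under assumptions, not how the spreadsheet itself does it. The function name, the parameter names, and especially the cutoff year for what counts as an "old" video are made up here; the rule as stated only says "old" and "~90 or so".

```python
from typing import Optional

MIN_RATINGS = 100             # threshold stated in the column description
LEEWAY_RATINGS = 90           # the "~90 or so" leeway, taken here as exactly 90
OLD_VIDEO_CUTOFF_YEAR = 2010  # hypothetical: "old" is never defined precisely


def star_rating_entry(avg_stars: Optional[float],
                      num_ratings: int,
                      release_year: int,
                      local_download: bool = True) -> float:
    """Return the value to enter in the "Star rating" column."""
    # No Local download on the .org (e.g. YouTube-only): enter 0.
    if not local_download or avg_stars is None:
        return 0.0
    # Enough ratings for the average to be stable.
    if num_ratings >= MIN_RATINGS:
        return avg_stars
    # Leeway: an old video that is close to the threshold.
    if num_ratings >= LEEWAY_RATINGS and release_year <= OLD_VIDEO_CUTOFF_YEAR:
        return avg_stars
    return 0.0
```

Keeping the thresholds as named constants makes it easy to see how much the final numbers depend on that judgment call.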
Okay that's nice, but what's the purpose of all this?
Well, there are a couple reasons I'm collecting all this data. Most immediately, this will serve as a personal organizational tool for myself. I like the idea of having a massive list of AMVs with a bunch of different ways to sort and find videos based on whatever criteria I may be looking for at a given time. Also, through doing this, if I'm thorough I will end up watching a lot of videos by unknown editors. I hope to find a lot of great hidden gems over time as I do this.
But none of that really concerns you, dear reader, and probably only interests you insofar as you can do this yourself with the completed list, if it ever gets to a point of "completion". The bigger, farther-reaching reason I am doing this is, as has already been implied, to create a comprehensive AMV database that is more detail-oriented than what the .org already provides. Obviously, for the AMV fan, this could turn out to be an extremely useful source of information, not just for searching for and finding specific types of AMVs, but also for data analysis.
That's where I get most excited. My plan is to eventually get to a point where I have enough useful data in here to start setting different datasets next to each other, analyzing them, and hopefully identifying trends and such in order to give a more objective foundation to the history and development of this fandom.
Moreover, from here on out this data is going to be accessible to everyone. I will be updating the public spreadsheet I've posted above on a regular basis so that anyone interested can track the progress. Eventually, if this comes to full fruition, this data could be used by any other armchair AMV historians for their own purposes. I may be getting ahead of myself there, though. (I have no illusions about how few people might find this interesting...but this is for those few!)
Awesome! This sounds great so far, and it looks like you've already started organizing the data! Care to explain?
Those of you who have stuck with me so far and who have taken a peek at the spreadsheet may have noticed that there are a few different tabs on the bottom, each containing different types of data. Allow me to explain each one.
- List: This worksheet is the meat and potatoes of the database, and so far probably the most interesting one. This contains all the data entered on each video, from which all future analysis will be drawn.
- Refined data: This is the first of what will probably end up being multiple similar worksheets where the organization of data drawn from the list will be collected and maintained. Currently I've already provided a couple examples of what some of the data I'm interested in could look like. What's there right now should be pretty self-explanatory, but in case it isn't, I have two charts which show how my ratings and the star ratings are spread by year. I also have a small chart showing the genre distribution by actual number and percentage of the total entered into the list. There's also a small chart showing the breakdown of videos that make use of effects vs. those that do not. Since it's still very early in the process we can't really draw any conclusions yet, and because of the small amount of data I have and the inconsistent spread across the years, the numbers aren't very interesting or revealing.
- Tags: Here is a list of the tags, along with some corresponding numbers showing how many times a tag has been used, along with the percentage of the total number of videos on which a given tag appears. This actually might be interesting to some of you at this point, but again, still early.
- Data references: This is just a kind of data dump for any time I need a formula in a different worksheet to reference an unchanging set of numbers or dates in order to process the data I want. This is probably not going to be of much interest to most of you, unless you're particularly interested in the workings of the various formulas I'm using in the "Refined data" worksheet.
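For the curious, the kind of counting the "Tags" worksheet does (how many videos carry a tag, and what percentage of the total that is) can be sketched in Python like this. It is a rough equivalent, not the actual Calc formulas. It assumes the tags for one video sit in a single cell separated by commas, which is a guess about the layout, and the sample rows are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple


def tag_usage(tag_cells: List[str],
              separator: str = ",") -> Dict[str, Tuple[int, float]]:
    """Map each tag to (number of videos using it, percent of all videos).

    tag_cells holds one string per video, e.g. "lip sync, sad".
    The comma separator is an assumption about the cell format.
    """
    total = len(tag_cells)
    counts: Counter = Counter()
    for cell in tag_cells:
        # A set, so a tag repeated within one cell still counts once per video.
        tags = {t.strip().lower() for t in cell.split(separator) if t.strip()}
        counts.update(tags)
    return {tag: (n, 100.0 * n / total) for tag, n in counts.items()}


# Made-up sample rows, purely for illustration.
sample = ["lip sync, sad", "sad", "upbeat, lip sync"]
usage = tag_usage(sample)
for tag in sorted(usage):
    count, percent = usage[tag]
    print(f"{tag}: {count} videos ({percent:.1f}%)")
```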
Currently, visual representations of the data are not useful or enlightening because the numbers are so low and I do not have anything close to a representative sample of AMVs entered into the database. Eventually, though, there will be graphs. Many graphs. In fact I hope to do a formal and technical write-up of my findings, eventually. I'm talking years down the road, most likely.
Wait...so how long will this take?
Realistically, years. Or, at least a year or so. This is going to be an ongoing project, to be continued indefinitely, or at least until I get bored and no longer feel like doing it. But I want to at least get enough videos entered where I have a large enough source of data to start arriving at reliable conclusions. Problem is, because we're looking at 14+ years of videos, I need to make sure I have enough videos from each year to provide a substantial base off of which to work. I'm thinking 100 or so from each year might be a good starting point (emphasis on "starting point", as I realize just how small a number of videos that is for certain years). Entering 250 videos took me a couple months...getting to 1,400+ is about 12 months of work at the pace I'm going, and it's not even definite that I can maintain that pace.
Sooo, yeah, between my current work schedule and other responsibilities, it's going to take a while.
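The back-of-the-envelope math behind that estimate, as a quick Python sketch. It assumes "a couple months" means exactly two, and that the target is 100 videos for each of 14 years; both are round numbers, so treat the result as a ballpark.

```python
from typing import Tuple


def months_needed(target: int, entered: int, months_so_far: float) -> Tuple[float, float]:
    """Return (months for the whole target, months still remaining) at the current pace."""
    rate = entered / months_so_far  # videos entered per month
    return target / rate, (target - entered) / rate


target = 14 * 100  # 100 videos per year across 14 years
total, remaining = months_needed(target, entered=250, months_so_far=2)
print(f"Whole target: {total:.1f} months, remaining: {remaining:.1f} months")
```

At roughly 125 videos a month, the whole target works out to a bit over 11 months, which is where the "about 12 months" figure comes from; counting the 250 already entered, a little over 9 months would remain.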
Is there anything I can do to help?
Yes! Several things, actually:
- Tell me if this interests you in the least! Even if the answer is "No", feedback of any kind is appreciated! I'm doing this pretty much entirely for myself, because it's something I'm super passionate about and I want to see completed, but my motivation will increase, even if just a little, if I know others want to see it too. So let me know!
- Let me know if there's something else I should add in to the "General tags" section. Like I said, I feel like it's pretty comprehensive, but there might be something I'm missing. If there's a tag you'd like to see quantified, now's the time to tell me! At 250 videos, it's still doable to go back and amend for anything missing. Past that, it's going to be too much.
- Similarly, if there's anything else you'd like to see quantified or tracked, let me know. I can't make any guarantees, but I will take anything into consideration. I can't think of much else that I could add that's not there already, but that's why I'm bringing it to you.
- Let me know if you'd be interested in contributing yourself! This is still something I'm not 100% on -- I have many reasons for not wanting to open this project up to public modification, all of which have to do with keeping the data as pure and consistent as possible, but if there are a few people out there who might want to seriously contribute (by which I mean add in videos, video information and tags to help me out), message me and I'll consider it. I really am not sure about it though so again, no guarantees.
- Let me know if there are any particular things you want analyzed when all is said and done. I have my own ideas of things I want to check out, but if there are any particular trends you want explored, or any types of videos you want compared against other types of videos, or anything else you want examined in a quantitative way, tell me. This will help me build charts and tweak data entry if I need to in order to get the most accurate results.
- If there's anything I'm doing wrong that you can see, please tell me. This is the first time I've ever attempted anything like this, and I am by no means a statistician. I'm kind of doing this blind, so to speak. If there's anything you see that might indicate my data collection is sloppy or inaccurate, let me know and tell me how to fix it! I want this to be as accurate as possible.
Anything else I should know?
There are a few things I need to head off right from the start. First, don't take the provided example data as representative of anything. If you look at the videos I've entered now, you'll see a number of oddities, such as there being a comparatively massive number of videos entered from 2014 and only a few from 2012. This will not even out for a while, but over time, ideally, you'll begin to see a more even distribution across all the years.
Similarly, I've noticed that the star ratings are skewed high -- much closer to 4.0 than I was expecting. I'm wondering already if this is a result of people overestimating how much they like a video when they rate it, or if it's just because of the "7/10 is average" mentality that people seem to have these days. More than likely, it's because I think I've been purposely avoiding downloading videos with lower star ratings. I'm going to have to make sure I download some crap in order to get an honest sample, so, y'know, I've got that to look forward to...
Also, one of the most obvious pre-quantified pieces of information from the .org has not been included here: opinions. To be honest, I have no idea how I would integrate opinions into this database. Where star ratings only depend on one number to give you information, opinion scores really rely on two: the score given, and the number of opinions given. If a video is given 10s across the board, but only through one opinion, how useful is that information really? It doesn't tell me anything in terms of what a large number of people thought about that video, which is all I'm really interested in here. Also, opinions are much more naturally left when someone likes a video. You'll have very few people leave a negative op on a video they don't like, unless it's from an op exchange. Also, really, really popular videos can have...oh, let's say 100+ opinions. Even that is rare. Let's take AbsoluteDestiny's Do It Right (Shake It!), which has 142 opinions...compared to the 15,000+ star ratings it has. In every case, the number of star ratings far overshadows the ops, and so provides a much better and more accurate picture of what people think about a given video.
(As a side note, I know over the years people have complained about the necessity of leaving star ratings for every video they download...at this point I'm very grateful for that though, it's provided a lot of great raw data to work with).
One final note on the star ratings -- the data obtained through star ratings will be useful until about, oh, I'm guessing around 2010, 2011. In many cases, newer videos are not going to have a star rating entered at all, simply because not enough people are still around and downloading videos. It's unfortunate, but I do expect a lot of juicy stuff to come out of the mid-2000s specifically when it comes to analyzing star ratings.
I've tweaked the "Year" column so that the numbers entered into that column must be in a MM/YYYY format; however, it only displays the year on the list because in general that's all I care about. I could have made it the full DD/MM/YYYY, but I don't think the resolution for this particular piece of information needs to be that fine. Knowing the month/year a video was released should be more than sufficient for any kind of analysis.
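If you export the list and want to work with that column elsewhere, here is a minimal Python sketch of the same idea: accept only MM/YYYY on input, show only the year. The function names are made up for illustration, and this is not how Calc handles it internally.

```python
from datetime import datetime


def parse_release(value: str) -> datetime:
    """Parse a MM/YYYY string; raises ValueError for anything else."""
    return datetime.strptime(value.strip(), "%m/%Y")


def display_year(value: str) -> int:
    """What the list shows: just the year."""
    return parse_release(value).year


release = parse_release("09/2006")
print(release.month, release.year)
print(display_year("09/2006"))
```

The month is still stored, so it stays available for finer-grained analysis later even though the list hides it.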
In the "Anime" column, if there are more than four anime listed, "Various" is entered. If there are four or fewer anime used in a single video, the different anime are separated by "//". If the "anime" entered is in fact a video game, a "(VG)" will be appended to its title.
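The "Anime" column rule, written out as a small Python sketch. The function name and the (title, is_video_game) input shape are invented for the example, and the exact spacing around "//" and before "(VG)" is a guess.

```python
from typing import List, Tuple

MAX_LISTED = 4  # more sources than this collapses to "Various"


def anime_cell(sources: List[Tuple[str, bool]]) -> str:
    """Build the "Anime" cell from (title, is_video_game) pairs."""
    if len(sources) > MAX_LISTED:
        return "Various"
    # Video games get "(VG)" appended; the space before it is an assumption.
    titles = [f"{title} (VG)" if is_vg else title for title, is_vg in sources]
    return "//".join(titles)


# Placeholder titles, purely for illustration.
print(anime_cell([("Series A", False), ("Game B", True)]))
print(anime_cell([(f"Series {i}", False) for i in range(5)]))
```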
I've done my best so far at classifying the music into genres, although it's admittedly probably not as accurate as it could be in some cases. If you want a good list of musical genres and their descriptions, I would point you to this page on RateYourMusic.
Also, it might seem like there's not a whole lot of information being provided per video that hasn't already been provided on the .org, and that's partly true. I think, in general, most meaningful analysis is going to come from the tags provided for each video, and those contain a LOT of information about each video that is not explicitly on the .org. Beyond that, compiling the data into a spreadsheet makes comparisons and data analysis possible on a large and extremely customizable scale, which the .org is not currently capable of. Even if all I did was enter the info that's on the .org into the spreadsheet, it would be enough to provide a lot of pretty in-depth analysis.
If you ever want access to the spreadsheet and other documents, click the link beneath the new, pretty picture in my signature (the picture itself will take you to this post in this thread, so anyone who wants to know what it's about can reference the information provided here). It will take you right to the directory where all the documents can be accessed. For now, it will just lead to the bare FTP directory. Eventually, if I feel like it (and I do right now, just don't have the time), perhaps I will start up a Tumblr or other blog in order to track progress and provide observations, etc, that I come across as I'm working on this. But for now, you can always access the most up-to-date version of the spreadsheet by clicking the link in my signature. In that directory will be a .txt file as well with a date as a title; this will just serve to show when the spreadsheet was last updated. I plan to upload a new copy of the spreadsheet at least once a week, so check back if you want to watch the list grow! And if you're ever just looking for new videos, see what's in there! I don't know if your tastes necessarily line up with mine, so going by my ratings might not be helpful. But that's half of the reason the tags are there!
And yes, there is a lot about this that is imperfect...such is the nature of the beast, especially when (at least as of now) this is all being done by one person. There are going to be tags put on certain videos that may not be agreeable to every person. I ask that you bear with me, correct me if you see anything blatantly wrong, and accept any minor issues you may see. Hopefully they aren't prevalent enough to actually have a noticeable effect on the data.
Finally, if you want to play around with the spreadsheet in-depth but you don't know a lot about how OpenOffice Calc works, here are a few helpful hints:
- On the "List" worksheet, go to Data > Filter > AutoFilter. This will put a little drop-down menu in the corner of each of the headers, and when you click those you can filter out certain videos. For example, let's say you wanted to only see romance videos. You would go to the "AMV genre(s)" header, click the arrow, go to "Standard Filter", change the Condition field to "Contains", and then type "romance" into the Value field. Voila, now you see all romance videos on the list! You can mess around with these settings in each of the columns to filter the list down to super-specific videos. You can find all drama videos that use trance music from 2000-2003 that have lip sync in them, if you wanted. It's a very powerful tool for sorting and finding videos. To remove any custom-added filters, just go to each header that you're using as a filter (there'll be a little dot next to the drop-down arrow to mark those that are being used) and select "All".
- You can sort the way the list appears very simply. Just click on any cell in the List worksheet, go to Data > Sort, and OpenOffice will automatically select the entire list. You can now sort the data by any of the columns. By default, I sort them first by Editor, then by Video Title, but you can change this up however you want.
- If you don't know how formulas work in Excel-like programs, I wouldn't mess with anything on any of the other worksheets. All the numbers in the "Refined data" and "Tags" worksheets are dynamically generated based off of what is entered on the "List" worksheet, as well as the static numbers in the "Data references" worksheet. This means that as more data is entered on List, the numbers in the other worksheets (excluding "Data references") update accordingly. If you change anything on those worksheets, you risk messing everything up.
- That's all I can think of right now...but if you have any questions on how other features in OpenOffice work, ask! I'm still learning myself, so I may refer you to Google.
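If you would rather poke at an exported copy of the list in code, the "Contains" filter from the first hint can be approximated in Python like this. It is a sketch under assumptions: rows are plain dictionaries, "Music genre" is a hypothetical column name (the other headers are taken from the worksheet as described), and the sample rows are made up.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def filter_rows(rows: List[Row],
                contains: Dict[str, str],
                year_range: Optional[Tuple[int, int]] = None) -> List[Row]:
    """Keep rows where every listed column contains its search text
    (case-insensitive) and, optionally, "Year" falls within year_range."""
    kept = []
    for row in rows:
        if not all(text.lower() in str(row[col]).lower()
                   for col, text in contains.items()):
            continue
        if year_range is not None:
            first, last = year_range
            if not first <= int(row["Year"]) <= last:
                continue
        kept.append(row)
    return kept


# Made-up rows for illustration only.
rows = [
    {"Video Title": "Example A", "AMV genre(s)": "Drama, Romance",
     "Music genre": "Trance", "General tags": "lip sync, sad", "Year": 2002},
    {"Video Title": "Example B", "AMV genre(s)": "Action",
     "Music genre": "Rock", "General tags": "upbeat", "Year": 2002},
    {"Video Title": "Example C", "AMV genre(s)": "Drama",
     "Music genre": "Trance", "General tags": "lip sync", "Year": 2009},
]

matches = filter_rows(
    rows,
    {"AMV genre(s)": "drama", "Music genre": "trance", "General tags": "lip sync"},
    year_range=(2000, 2003),
)
print([row["Video Title"] for row in matches])
```

This mirrors the "drama videos that use trance music from 2000-2003 that have lip sync" example: each column condition narrows the list further, just like stacking filters on several headers.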
You seem to be taking this too seriously.
Maybe! But I love AMVs and I love this community and I love this fandom in general, and I want to contribute something that may someday be useful or memorable to someone who loves these things as much as I do. Besides...odd as it may sound, this is actually a lot of fun for me.
If you made it to the end of this, congratulations and thank you! I appreciate your interest. If you have questions, comments, observations, whatever, please post away!