Examination of removing opinion scoring

Kai Stromler
Joined: Fri Jul 12, 2002 9:35 am
Location: back in the USSA

Post by Kai Stromler » Mon Feb 21, 2005 2:08 pm

Proposition: that under the current conditions of a-m-v.org, opinion scoring is now superfluous and introduces an exploit vulnerability to the site's endorsement of good material via the top-10% list.

First:
Opinion scoring is broken.
Except for "reviewability", the global average score in every caption is between 8 and 9. At the time of this writing, "reviewability" was averaging 7.8. By the intention of those who established the system, the average video on a-m-v.org should receive straight 5s. Reasons why this is not so may be described as follows:
1: Rising tide effect - The baseline for 5 as the average is the "average" video of December 2000, the time of the site's establishment. The art and craft of AMV has progressed in the four years since, much as records are constantly being set and broken in elite sports. If the baseline floats with the contemporary median, then old videos which scored high are valued over equally good videos created at a later date, when the median benchmark line has moved up. If not, the result is rating inflation as has happened, which rapidly pushes all scores towards 10 and meaninglessness, as no separation is created.
2: Poor understanding of opinion score values - It has been speculated that the general public understands opinion scores as parallel to the A-B-C mappings of percentile grades currently in use in most American educational institutions. This would equate to the current range of averages as mapping to the B-/B/B+ range, which under the current level of grade inflation is widely perceived to be "average" (in more rigorous academic programs, "average" is still recognized as 75% or 2.5/4.0, which is still not a correct mapping for the opinion system).
3: Differential opinionating - Because the opinion process is arduous and complex in comparison to not rating the video or giving it a star rating, opinions in most cases are only given when the opinionator has a powerful motivation to do so. In most cases, this is because the video is, or is immediately perceived by the observer to be, extremely good, resulting in all-tens opinions. These float the average upwards and contribute to the prior cases. In any case, if the only videos reviewed are those that would average about 8 in all captions (within the standard deviation) under a non-broken system, the resulting figure is not an average across all videos listed.
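To make the differential-opinionating point concrete, here is a minimal sketch. The ten quality values and the review threshold are invented for illustration and are not drawn from site data; the only assumption is that viewers write opinions solely for videos at or above some quality level.

```python
from statistics import mean

# Hypothetical "true" quality of ten videos on the 1-10 opinion scale,
# chosen so that the whole catalog really does average 5.
true_quality = [2, 3, 4, 4, 5, 5, 5, 6, 7, 9]


def observed_average(qualities, review_threshold):
    """Average over only those videos that viewers bother to review.

    Assumption: an opinion is written only when the video's quality is at
    or above review_threshold.
    """
    reviewed = [q for q in qualities if q >= review_threshold]
    return mean(reviewed)


print(mean(true_quality))                  # average if every video were reviewed
print(observed_average(true_quality, 7))   # average if only standouts are reviewed
```

With these made-up numbers the full catalog averages 5 while the reviewed subset averages 8, even though no individual score was inflated.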

Second:
Opinion scoring is redundant as a selection measure for the "best" videos
This can be covered under several subpoints:
1: Opinion scoring does not cover as well as the star scale
The "Top 10%" list covers 282 items. This indicates that between 2820 and 2830 videos meet its selection criteria. By contrast, the star-scale admits nearly 25,900 videos under its criteria. Leaving aside that this is actually more than the number of videos currently in the database and some have surely been deleted for inadmissibility, it is readily seen that it is much more likely for a video to achieve 5 star ratings than to achieve seven opinions, and if the goal of opinion scoring is to establish a total order on the database, the star scale is preferable.
2: The Recommendations forum allows for stronger filtering of site-endorsed good videos
If the "Top 10%" or top star scale lists were immune from fandom-based overrating (artificially high ratings based more on source materials than on final product), this forum would not be necessary. However, mathematical gaming and other forms of self-promotion have apparently become so prevalent that this forum is advanced as a countermeasure to bring considered thought and human reactions back into the equation. While the recommendations are necessarily from a smaller sample and to a smaller sample (forum-goers to forum-goers), the oversight of the site's primary admin gives the forum a feeling of official endorsement on par with the canonical lists on the main site.
3: Every year the site's users vote on the best/most popular videos made in the last 12 months
The VCAs fill the gap between the pure-popularism and total-ordering of the star scale and the high standards of the Recommendations forum. Voting is done in multiple rounds to select the best (or most popular) video per caption per year, but there is plenty of recognition for nominees, a much larger population of videos, as well. There are many problems with the current VCA process, but the general consensus appears to be that it is less broken than the current opinion scoring system.
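For the coverage figures in subpoint 1, the arithmetic can be sketched as follows. It assumes the list length is the eligible count divided by ten and rounded down, which is a guess about how the site computes it; the function name is hypothetical.

```python
def catalog_size_range(list_length, denominator=10):
    """Smallest and largest catalog sizes whose top 1/denominator,
    rounded down, holds exactly list_length entries."""
    smallest = list_length * denominator
    largest = (list_length + 1) * denominator - 1
    return smallest, largest


low, high = catalog_size_range(282)
print(low, high)

# "Nearly 25,900" from the post, treated here as an exact figure.
star_eligible = 25_900
print(round(star_eligible / low, 1))  # star-scale pool relative to the opinion pool
```

Under that rounding assumption the star scale covers roughly nine times as many videos as the opinion-based list draws from.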

Third:
Opinion scoring is vulnerable to sockpuppet attacks
It is trivial to make one bogus sockpuppet account on a-m-v.org; it is only slightly less trivial to make ten and attack the "Top 10%" list with a marginal video. Because of the barriers to opinion scoring mentioned above, such a video will remain in some position on the list until it is similarly attacked by opinions given solely to remove it from its place. Once in the top 10%, the video, if made with source materials that induce fandom-based overrating, will also attract legitimate opinions from non-sockpuppet users whose comments and ratings may be difficult to separate from those entered by the video creator under false names and accounts. The ease with which star ratings are given makes this sort of gamesmanship very difficult under that system; it is not merely ten accounts but ten accounts per day that are required to maintain a video at a score level that it does not deserve.

Advantages of removing opinion scoring:
* near-removal of scores-based gamesmanship
* obviation of the rising-tide problem
* removal of redundancy in favor of existing superior methods
* re-imaging of the opinion as a venue for critique and comment outside mathematical systems

Disadvantages:
* major changes to main-site code
* major refactoring cost to the database
* loss of scoring to half the videos in the database

Proposed amelioration:
Allow the user to star-rate the video in place of the current opinion scoring. If the video is not locally hosted, track the "star rating" after the manner of current opinion scores and display it only on the video information page, not on the overall top star scale.

Costs of amelioration:
* additional code changes and refactoring costs
* development of plan to deal with legacy scores
* change in userbase culture to adapt to new opinionation system

Commentary:
open; fire away.



Though this has a slim chance of being accepted (see disadvantages above for a rough idea of how major a change this would be), with the recent concerns over people gaming the top 10% somebody had to at least float the idea.

--K
Shin Hatsubai is a Premiere-free studio. Insomni-Ack is habitually worthless.
CHOPWORK - abominations of maceration
skywide, armspread : forward, upward
Coelem - Tenebral Presence single now freely available

dwchang
Sad Boy on Site
Joined: Mon Mar 04, 2002 12:22 am
Location: Madison, WI

Re: Examination of removing opinion scoring

Post by dwchang » Mon Feb 21, 2005 3:55 pm

I will certainly not attempt to reply to this entire proposition since I'm at work, but I will admit you have some valid points which I'd like to address and to some degree, agree on. I will make it clear that I don't believe removing the scoring system is a good idea, but for different reasons:
Kai Stromler wrote:Opinion scoring is broken.
Agreed, but you more or less explained why in that people tend to only review videos they like and thus the averages are overly-inflated to said values.

This of course gives merit to your idea of the star-scale since viewers are more or less forced to give stars after watching 10 videos. Another positive is that people who just put the same score (or have a very small deviation) are not factored in and thus cannot hurt people.

At the same time, it does make "attacking" a video pretty easy since it's a lot easier for me to click a 1 than give an opinion. Also stars are anonymous making it that much easier.

I do agree that it's (to some degree) a moot point since there are so many stars for a video and one won't do that much damage.

Kai Stromler wrote:Opinion scoring is redundant as a selection measure for the "best" videos
I agree to some degree. I do see a lot of "great" videos near the top of the star scale, and I imagine the percentages are similar (i.e. top 10% list vs. top 10% in star scale). However, a big thing (to me) is that, well... I usually don't look beyond the first page on the star-scale and that's only 100 videos.

Yes I realize this is lazy, but assuming many people do this (or even just a few pages), you can see how this list isn't as effective. In fact, I'd be willing to bet that fewer people look at the star scale list than the top 10% list. Probably by a large margin. Granted, that could change if we only had one list.

Kai Stromler wrote:Opinion scoring is vulnerable to sockpuppet attacks
It is trivial to make one bogus sockpuppet account on a-m-v.org; it is only slightly less trivial to make ten and attack the "Top 10%" list with a marginal video. Because of the barriers to opinion scoring mentioned above, such a video will remain in some position on the list until it is similarly attacked by opinions given solely to remove it from its place. Once in the top 10%, the video, if made with source materials that induce fandom-based overrating, will also attract legitimate opinions from non-sockpuppet users whose comments and ratings may be difficult to separate from those entered by the video creator under false names and accounts.
100% agree.

In fact, I've seen this so much more in the last year than in the past. Perhaps people will call me "old school," but when the new top 10% list and Bayesian average were introduced, I agreed with a large % of the list. I may not have liked every video per se, but I could at least say it was well-done and also see why it was on the list. Lately, not so much.

Is it any coincidence that 6 of the top 8 videos "of all time" (as I imagine people see it) heavily use either Naruto or Full Metal Alchemist? I'll be honest in that the majority I don't like and some I find quite marginal (I will however state I *do* like some of them). I am by no means pointing fingers as to any sort of abuse, but I do think a lot of this score inflation is because people see a video using a popular show on the top 10% list, download it and then score it really high (as you outlined quite nicely).

As you said, with opinion abuse, one could easily get their video *on the list* and once on the list, other fans of the show will find the video and will probably be more lenient on the video and I imagine scores will tend to be all 10's or close to it.

I guess I originally found it peculiar since I started seeing so many new videos populate the top 10% list (when in fact, it should be a lot harder since Bayesian means you'd start at the bottom and *earn* your way to the top little by little), but then I looked at the shows used and well...it made sense given anime fans and tastes lately. And let's be honest, some of the videos even have DivX logos on their corners...:roll:
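For anyone unfamiliar with the term, a Bayesian average is commonly computed along these lines. This is only a rough sketch: the site's actual formula and constants are not stated in this thread, so `SITE_MEAN` and `PRIOR_WEIGHT` are made-up values and the function name is hypothetical.

```python
def bayesian_average(scores, prior_mean, prior_weight):
    """Shrink a video's raw mean toward prior_mean.

    prior_weight behaves like that many phantom opinions at prior_mean, so
    a video with few opinions stays close to the site-wide mean.
    """
    return (prior_weight * prior_mean + sum(scores)) / (prior_weight + len(scores))


SITE_MEAN = 8.5     # assumed; the thread only says averages sit between 8 and 9
PRIOR_WEIGHT = 20   # assumed; the real constant is not given here

few_tens = [10] * 10     # e.g. ten all-10 opinions on a brand-new video
many_nines = [9] * 500   # a long-established video with 500 opinions of 9

print(bayesian_average(few_tens, SITE_MEAN, PRIOR_WEIGHT))
print(bayesian_average(many_nines, SITE_MEAN, PRIOR_WEIGHT))
```

With these assumed constants the ten all-10 opinions come out at 9.0, slightly ahead of the 500 nines at about 8.98. When the site-wide mean is already that high, the "start at the bottom" effect has very little room to work.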

Perhaps this will sound overly mean, but I am just shocked that Phade originally implemented such a system to "right" the top 10% list (and IMO I felt it did) and yet people still find ways to abuse it and get around it.

At the same time, I will not deny the point that well "that's your opinion and the Naruto/DBZ/other popular anime fans have theirs." Perhaps I may not agree with the list, but obviously hundreds of fans of the show do and all I can honestly do is give my own *honest* opinion telling them what I feel. However, I guess the counter-argument is that this list is supposed to legitimately reward videos that are good and should, as best as possible, not have such flaws like show-favoritism, creator-favoritism or worse...outright abuse.

I guess that's the main reason I don't agree in that although I see abuse and wrongs in the system, obviously others find it fine and it's not just *my* site and my opinion of videos that matter (no matter how much I disagree with some of the videos).

Also, I actually do still look at the top 10% to find videos occasionally and I certainly have much higher success with it than other methods. However, obviously less and less success as shown by my reply. Also the list does still contain some of the "best videos" (IMO), just at lower ranks (which I imagine correlates to less hits/downloads).

In any case, you certainly bring up some valid points that I too have been thinking about, and if anything, I hope it will at least get the ball moving towards improvement, whether it be removal of the system or yet another iteration of changing the scoring system and specifically the top 10% list. Because let's be honest, most of us realize most of these abuses come from people putting way too much faith into the list and letting it define themselves as editors and thus their motivation.

*gets off soapbox*
-Daniel
Newest Video: Through the Years and Far Away aka Sad Girl in Space

Kai Stromler
Joined: Fri Jul 12, 2002 9:35 am
Location: back in the USSA

Re: Examination of removing opinion scoring

Post by Kai Stromler » Mon Feb 21, 2005 4:30 pm

dwchang wrote:I will certainly not attempt to reply to this entire proposition since I'm at work, but I will admit you have some valid points which I'd like to address and to some degree, agree on. I will make it clear that I don't believe removing the scoring system is a good idea, but for different reasons:
This is 100% positive; the idea as advanced is pretty damn radical and will require very significant changes to site coding and database content, so anything that can be suggested to fix the system as opposed to throwing it out entirely will be a positive result from kicking over this particular can.
dwchang wrote:
Kai Stromler wrote:Opinion scoring is broken.
Agreed, but you more or less explained why in that people tend to only review videos they like and thus the averages are overly-inflated to said values.

This of course gives merit to your idea of the star-scale since viewers are more or less forced to give stars after watching 10 videos. Another positive is that people who just put the same score (or have a very small deviation) are not factored in and thus cannot hurt people.

At the same time, it does make "attacking" a video pretty easy since it's a lot easier for me to click a 1 than give an opinion. Also stars are anonymous making it that much easier.

I do agree that it's (to some degree) a moot point since there are so many stars for a video and one won't do that much damage.
What I actually was most concerned about under this caption was the rising-tide effect, which I hadn't really thought about prior to articulating the ideas that went into the original post (I was on hold for a long time trying to get travel arrangements through to install some equipment at y'alls's Dresden fab). I don't know accurately how big a factor this is, but it could be an element in the surge of new videos that you noted. A lot of the users of this site, let's face it, are not operating with a great sense of historical perspective. They may have difficulty seeing a video that was made when they were in grammar school as great, because it doesn't use the latest and greatest shows or the most up-to-the-minute production tricks. Of course, the truth is that creators are always limited to the anime that exists, and limited by the software/equipment that they have access to at the particular point in time when they do their work. But for some people, that just doesn't matter.

Of course, since any broad-based scoring is going to involve a significant popularity factor, it's not easy to get away from. And yes, there's a legitimate argument to be made that the best videos of 2005 may be better than the best videos of 2000, if we give technical superiority the edge when artistic value is even. However, the idea of a permanent, persistent, database seems to suggest that historical continuity should also be valued, not just what is currently popular.

Would an AMV "Hall of Fame" be a step in the right direction on this? Set some eligibility criteria (like premiered 2+ years in the past), admins screen for nominees, and then users vote to elect up to a certain max every year at the same time as the VCAs, along the lines of how the American pro sports do their HOFs. (And if we wanted to be elitist like them, we could even give more weight to creator votes, like they do with sportswriters.)

Of course, this is only one proposal, and something like this may not be necessary or desirable depending on what directions the admins want to take the current system.
dwchang wrote:most of us realize most of these abuses come from people putting way too much faith into the list and letting it define themselves as editors and thus their motivation.
x10. I was originally thinking of putting some stuff in about the community-culture ramifications of competing for top spots on the list, but decided not to, in order to keep it focused on the op system and not on my personal beefs with general culture. You've said it much more cleanly and incisively than I would have.

--K

Kalium
Sir Bugsalot
Joined: Fri Oct 03, 2003 11:17 pm
Location: Plymouth, Michigan

Post by Kalium » Mon Feb 21, 2005 5:08 pm

The 'Hall of Fame' idea sounds a lot like the VCAs... Except for the admin screening, and I can already foresee a lot of political-type complaining there.

On the sockpuppet point, yeah, that's definitely true. Remember this? That was just four guys (one being derobert, another being me :twisted:) at about 2 AM one night over the summer in #AMV. Admittedly, that bug has since been fixed, but that it happened at all indicates the sockpuppeting issue.

Personally, I like the idea of a way to leave general comments on a video, without having to leave an op to do it.

Anyway, you've been reading journals again, haven't you?

Kai Stromler
Joined: Fri Jul 12, 2002 9:35 am
Location: back in the USSA

Post by Kai Stromler » Mon Feb 21, 2005 5:31 pm

Kalium wrote:The 'Hall of Fame' idea sounds a lot like the VCAs... Except for the admin screening, and I can already foresee a lot of political-type complaining there.
Halls of Fame are always political, and always create pointless drama. The aging requirement is supposed to ameliorate this (by selecting in favor of stuff that stands the test of time), but eventually, as you get more and more Hall of Famers, the standard of greatness dilutes, and you get people picking sides and campaigning for their favorite marginal entrant. I think that AMV is young enough, at this point, that we can get away with it, but that's really just punting the problems ahead 10 years and hoping whoever is running the site will be able to get a handle on the issues at stake. In most pro sports the waiting period is five years after retirement; the trick for something like this is to nail the right point between "stands the test of time" and "this is perishable art anyways" for eligibility.

Of course, these are all implementation issues and maybe shouldn't even be under discussion yet.

I seem to recall something from a couple years ago about how older movies like The Bicycle Thief and even Citizen Kane are being pushed off more of younger film critics' top-whatever lists, so perhaps the problem of lack of persistence is endemic to pop-culture criticism (which is what any kind of ranking of this kind of cultural artifact is, when you get down to it). Explanations ranged from changing tastes to population of the sample space (number of films, and thus number of great films to choose from on a limited list, scales with time) to lack of historical perspective to even the rising-tide effect, so maybe the shift in the top 10% has a nontrivial natural component.
Kalium wrote:Anyway, you've been reading journals again, haven't you?
Whose? I seldom read other people's journals and am always befuddled when mine gets hits despite not being linked on the main page.


I maybe should have posted this under the "option to disable ops" thread, but I didn't want to threadjack them with a gigantic screed. The option to disable op scoring, whether user- or creator-selectable, seems like a step in the right direction, even though it will definitely require some extra coding, maybe a lot if the admins want to make it legacy-compatible.

--K

trythil
is
Joined: Tue Jul 23, 2002 5:54 am
Status: N͋̀͒̆ͣ͋ͤ̍ͮ͌ͭ̔̊͒ͧ̿

Post by trythil » Mon Feb 21, 2005 5:59 pm

Kai Stromler wrote: Advantages of removing opinion scoring:
* near-removal of scores-based gamesmanship
* obviation of the rising-tide problem
* removal of redundancy in favor of existing superior methods
* re-imaging of the opinion as a venue for critique and comment outside mathematical systems
I'd like to add:

* reduction in complexity of site database schema
* significant reduction in number of members whining about meaningless numbers(+)

(+) The rest will whine about their low star ratings, and will still have to be periodically bitchslapped.

Corran
Joined: Mon Oct 14, 2002 7:40 pm

Post by Corran » Mon Feb 21, 2005 6:17 pm

I wonder if anyone on the top 10% list would consider committing suicide if the scores were removed. There are a few people that I've seen lately that I wouldn't put this kind of thinking beyond.

In all seriousness though I think this would be a great idea. The only problem is convincing Phade and a few others to say yes.

If anything, we could keep the top 10% list and make it another optional thing to go along with my post here:
http://www.animemusicvideos.org/phpBB/v ... hp?t=46700

godix
a disturbed member
Joined: Sat Aug 03, 2002 12:13 am

Post by godix » Mon Feb 21, 2005 6:21 pm

Your plan is missing the thing about ops I personally like the most, the comments. I'd prefer seeing the op system stay, perhaps modified, just to preserve those somewhat rare gems where the reviewer left an in-depth review. I see several easy (compared to your idea at least) solutions to modify the op system instead of ditching it.

The first, and most obvious, is just to make it so new accounts can't leave ops for a while. It's much less likely sockpuppets will be an issue if someone has to set them up six months in advance. It'll also help by giving a new member some time to experience more of the hobby, thus reducing things like 'OMG! A DBZ vid to Linkin Park! I've never heard of that combination before, I must give it all 10's!'

If it was decided to not shut off ops entirely based on account age, they could be weighted instead. Something simple like if the account is under a month old it has 1/10th the weight value as currently. Under 1 year would be 1/2. One year on up would be full value. A few specially entrusted accounts could be given a higher value with the understanding they'll use it to specifically counteract sockpuppets they see. This way in order to game the system it'd require MANY more sockpuppets than currently, would be much more obvious than currently, and would be far easier to counteract by a single op from an experienced member.
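A rough sketch of how that weighting could work, assuming "under a month" means under 30 days and leaving out the higher-weight trusted accounts. The function names and the sample opinions are hypothetical.

```python
def age_weight(account_age_days):
    """Weight tiers from the post: under a month 1/10, under a year 1/2, else full."""
    if account_age_days < 30:     # "under a month", taken here as 30 days
        return 0.1
    if account_age_days < 365:    # "under 1 year"
        return 0.5
    return 1.0


def weighted_score(opinions):
    """opinions: list of (score, account_age_days) pairs."""
    total_weight = sum(age_weight(age) for _, age in opinions)
    if total_weight == 0:
        return None  # no opinions at all
    return sum(score * age_weight(age) for score, age in opinions) / total_weight


sockpuppets = [(10, 2)] * 10   # ten two-day-old accounts giving 10s
veteran = [(4, 800)]           # one established member giving a 4
print(weighted_score(sockpuppets + veteran))
```

In this made-up case the plain average of the eleven opinions would be about 9.45, while the weighted score comes out at roughly 7, so a single op from an experienced member offsets ten fresh accounts.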

Another possibility is to do like the stars ratings, if someone has a history of leaving the same score their op gets discarded when figuring the totals.
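Something like the following, as a sketch only. How the star system actually detects same-score raters isn't described in this thread, so the thresholds `min_history` and `min_spread`, the function names, and the sample users are all invented for illustration.

```python
from statistics import pstdev


def is_flat_rater(history, min_history=5, min_spread=0.5):
    """True if a user has enough past scores and they barely vary.

    min_history and min_spread are illustrative thresholds, not site values.
    """
    return len(history) >= min_history and pstdev(history) < min_spread


def filtered_mean(opinions, histories):
    """opinions: list of (user, score); histories: user -> list of past scores."""
    kept = [score for user, score in opinions
            if not is_flat_rater(histories.get(user, []))]
    return sum(kept) / len(kept) if kept else None


histories = {
    "always10": [10, 10, 10, 10, 10, 10],
    "varied": [4, 7, 9, 6, 8],
}
opinions = [("always10", 10), ("varied", 6), ("newcomer", 8)]
print(filtered_mean(opinions, histories))
```

Here the op from "always10" is dropped and the remaining two average to 7. Users with too short a history are kept, which is a design choice that would need pairing with some account-age rule to be useful against sockpuppets.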

And one last possibility, just ignore the op system. Rely on recommendations in the forums to find videos and pretend the op system and the top 10% based on it don't exist. I personally couldn't tell you what videos are in the top 10% anymore but I have a pretty good idea what videos the forum members think are good.
Kalium wrote:On the sockpuppet point, yeah, that's definitely true. Remember this? That was just four guys (one being derobert, another being me :twisted:) at about 2 AM one night over the summer in #AMV. Admittedly, that bug has since been fixed, but that it happened at all indicates the sockpuppeting issue.
IIRC the four guys couldn't game the system enough to get a video to the #1 slot alone and it required an unusual level of intervention to do so. As time goes on and top videos get even more ops than they had (Euphoria is pushing 1000 according to some guy's sig) it becomes even harder to game. So while it does indicate the sockpuppeting issue, it also indicates a somewhat small number of sockpuppets is rather ineffective.

Kalium
Sir Bugsalot
Joined: Fri Oct 03, 2003 11:17 pm
Location: Plymouth, Michigan

Post by Kalium » Mon Feb 21, 2005 6:37 pm

Kai Stromler wrote:I think that AMV is young enough, at this point, that we can get away with it, but that's really just punting the problems ahead 10 years and hoping whoever is running the site will be able to get a handle on the issues at stake.
<warp ahead>
I remember before I could add lens flares with my mind...
</warp ahead>
Kai Stromler wrote:Whose? I seldom read other people's journals and am always befuddled when mine gets hits despite not being linked on the main page.
I meant technical journals. And your journal entries do get linked on the main page.
trythil wrote:* reduction in complexity of site database schema
BIIIIIIG plus! The last thing the Org needs at the moment is a more complicated schema.
Corran wrote:I wonder if anyone on the top 10% list would consider committing suicide if the scores were removed. There are a few people that I've seen lately that I wouldn't put this kind of thinking beyond.
Should we really worry about that, though? I'd think that it's really not our business if someone has issues like that.
godix wrote:So while it does indicate the sockpuppeting issue, it also indicates a somewhat small number of sockpuppets is rather ineffective.
Let's just say derobert is vulnerable to strange ideas late at night. That's not the only one I've caused.

Zarxrax
Joined: Sun Apr 01, 2001 6:37 pm

Post by Zarxrax » Mon Feb 21, 2005 9:36 pm

I fully agree with all the ideas stated in the parent thread. The opinion system is broken, and outdated. With the recommended videos forum, we have yet another method of finding the rare gems. It remains to be seen exactly how this will pan out in the future, as I'm almost certain it will be hit by abuse.

I also remember a while back, me and AD were talking about ideas for the "video of the week" section. I forget exactly what AD proposed, but it was an excellent idea that would put the videos that deserve some attention into the spotlight. Things like this really serve to make the top 10% useless.

Just to point out exactly how worthwhile the current opinion system is, I am going to look through every opinion that "AMV Hell" has received. Here are the results:

- Total Opinions on AMV Hell: 123
- Opinions that were helpful and provided insight into things that I had never thought about, driving me to improve myself: 3
- Opinions that had some thought put into them, but still didn't do much aside from praise or condemn the video: 26
- Worthless opinions that just had some scores and a few lines of meaningless comments: 94

If I remember the original intent of the opinion system, it was for that first category of opinions, the kind I only got 3 of. The 26 people that wrote a fair bit were well meaning, and likely intended for their comments to help me in some way, so let's say those are valid opinions as well. The remaining 94 opinions exist solely to assign numbers to the video. That is about 76% of the total opinions!
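For anyone who wants to check the shares, here is a quick calculation from the three counts listed above. The category labels are shorthand invented for this snippet.

```python
# Counts taken from the breakdown of AMV Hell's opinions above.
counts = {"helpful": 3, "thoughtful": 26, "score_only": 94}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The three counts add up to the 123 total, and the score-only group works out to a bit over three quarters of it.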

The purpose of the opinion system is not to rate videos. By removing the scoring aspect, we leave the way for more genuine comments about the videos, rather than people just wanting to rate it.
