Zarxrax wrote:The purpose of the opinion system is not to rate videos. By removing the scoring aspect, we leave the way open for more genuine comments about the videos, rather than people just wanting to rate them.

I despise the opinion system and I want it to die a bloody death. However, as long as humans live on this planet, they have the right to give as short or as long a review as they want, and there's nothing we can do to change that.
Examination of removing opinion scoring
- Castor Troy
- Ryan Molina, A.C.E
- Joined: Tue Jan 16, 2001 8:45 pm
- Status: Retired from AMVs
- Location: California
- Contact:
"You're ignoring everything, except what you want to hear.." - jbone
- downwithpants
- BIG PICTURE person
- Joined: Tue Dec 03, 2002 1:28 am
- Status: out of service
- Location: storrs, ct
Kai Stromler wrote:Opinion scoring is broken.

yes. there is no single baseline against which we can assess scores, with the exception of the reviewability score, which actually does have quasi-objective standards for each score that appear when you hover your mouse over the scores. this is probably why the review score is lower than the other scores. many people who systematically give opinions do have their own scoring strategies - comparing to a perceived average or some other baseline they have set, setting explicit standards for each score rating, etc.
furthermore, the rising tide effect not only results from the advance of technology and the general advance of the hobby/art, but also occurs simply because users may interpret the site average to be the score of an "average" video.
Kai Stromler wrote:Opinion scoring is vulnerable to sockpuppet attacks.

obviously.
Kai Stromler wrote:Opinion scoring is redundant as a selection measure for the "best" videos.

yes. but the purpose of opinion scoring is not to be a selection measure for the "best videos," even though it has always been the primary measure of "best videos." there is a difference between the purpose of the star scale ratings and opinions. star scale ratings are an index meant to let viewers quickly gauge the estimated "quality" of a video before downloading it. opinions are feedback that communicate what the opinion giver liked and disliked about the video.
as such, i don't think the opinion score should be removed, because it is important for creators to receive feedback, and having a scoring system helps the opinion giver communicate how much she liked certain aspects of the video. however, one fix that i think would reduce some of the problems kai has listed without defeating the purpose of opinions is to remove the top 10% list, or make it visible only to creators who have received opinions.
either option would:
-reduce the rising tide effect, by preventing opinion givers from comparing scores between modern and older videos and from comparing scores to a constantly rising average.
-reduce the weight of high opinion scores among creators' motivations, hopefully lessening the incentive to create sockpuppet accounts or otherwise abuse the scoring/opinion exchange system.
-eliminate the misconception that opinions are intended to put creators into the spotlight (if anything, let the star scale/vcas do this)
-force opinion givers to create their own standards against which to compare scores (instead of site average scores or the scores of ranked videos, which often have biased scores for various reasons).
differential opinionating can't be eliminated, even if opinions were mandatory and easy like the star scale, simply because people download better "quality" videos more often than they do poorer videos.
maskandlayer()|My Guide to WMM 2.x
a-m-v.org Last.fm | Animemusicvideos.org Frappr (http://www.frappr.com/animemusicvideosdotorg) | Editors and fans against the misattribution of AMVs (http://tinyurl.com/2lryta)
- Scintilla
- (for EXTREME)
- Joined: Mon Mar 31, 2003 8:47 pm
- Status: Quo
- Location: New Jersey
- Contact:
One thing I just thought of:
With the Top 10% list set up the way it is, it's very easy to point someone to, or find for oneself, a list of very good (if not always the best) videos in a specific genre, or videos that are great examples of lip synch or digital effects, etc. Just one simple link and there you go.
If we were to do away with the Top 10% list, or to hide it from non-creators as DWP suggested, viewers seeking such categorized lists of great videos would have to go to the new Recommended AMVs board and search through the posts to find the videos they were looking for.
To add to this, there's also the fact that there are plenty of site members who have never set foot inside the forums.
Also, there are some videos that are ineligible for receiving star ratings because they aren't allowed to be locally hosted, yet are still great videos (AMV Hell 2 comes to mind).
- derobert
- Phantom of the .Org
- Joined: Wed Oct 24, 2001 8:35 am
- Location: Sterling, Virginia
- Contact:
I doubt this is easily implementable, but would it help if we completely re-did the way the top 10 list was computed? For example, what if we treated each member's opinion scores in each category not as a score on some global scale, but rather as relative to their other scores. Then assume that each member's scores in each category should follow the normal distribution curve, and curve them until they do. Then, having done that for each member, compute the ranking from the curved scores.
Put this together with some sock-puppet defeating measures (e.g., new accounts count none/less) and would it help?
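derobert's curving idea can be sketched concretely. Here is a minimal Python sketch, assuming scores live in a simple member → {video: score} mapping (the data shapes and names are illustrative, not the site's actual schema). It standardizes each member's scores to z-scores, which is the simplest reading of "relative to their other scores" rather than literally reshaping them into a normal curve:

```python
from statistics import mean, stdev

def curve_member_scores(scores_by_member):
    """Convert each member's raw opinion scores to z-scores, so every
    member's ratings are relative to their own average and spread."""
    curved = {}
    for member, scores in scores_by_member.items():
        values = list(scores.values())
        mu = mean(values)
        # A member who gives the same score to everything carries no signal.
        sigma = stdev(values) if len(values) > 1 else 0.0
        curved[member] = {
            vid: (s - mu) / sigma if sigma > 0 else 0.0
            for vid, s in scores.items()
        }
    return curved

def rank_videos(scores_by_member):
    """Average the curved scores per video and rank descending."""
    curved = curve_member_scores(scores_by_member)
    totals = {}
    for member_scores in curved.values():
        for vid, z in member_scores.items():
            totals.setdefault(vid, []).append(z)
    return sorted(totals, key=lambda v: mean(totals[v]), reverse=True)

# Hypothetical data: a fan who rates everything high, and a pickier critic.
raw = {
    "fan":    {"A": 10, "B": 10, "C": 9},
    "critic": {"A": 9,  "B": 5,  "C": 4},
}
print(rank_videos(raw))  # ['A', 'B', 'C']
```

One consequence worth noting: a member who hands out straight 10's contributes no ranking signal at all under this scheme, which is much of the point.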
Key 55EA59FE; fingerprint = E501 CEE3 E030 2D48 D449 274C FB3F 88C2 55EA 59FE
A mighty order of ages is born anew. http://twitter.com/derobert
- dwchang
- Sad Boy on Site
- Joined: Mon Mar 04, 2002 12:22 am
- Location: Madison, WI
- Contact:
Scintilla wrote:With the Top 10% list set up the way it is, it's very easy to point someone to, or find for oneself, a list of very good (if not always the best) videos in a specific genre, or videos that are great examples of lip synch or digital effects, etc. Just one simple link and there you go.

If we were to do away with the Top 10% list, or to hide it from non-creators as DWP suggested, viewers seeking such categorized lists of great videos would have to go to the new Recommended AMVs board and search through the posts to find the videos they were looking for.

To add to this, there's also the fact that there are plenty of site members who have never set foot inside the forums.

Also, there are some videos that are ineligible for receiving star ratings because they aren't allowed to be locally hosted, yet are still great videos (AMV Hell 2 comes to mind).

Although I agree with the points of the original post that the system is broken, ultimately Scintilla is right in that people actually do use this list (including myself). Obviously my gripe is the inaccuracy of it due to abuse and fanboy'ism of recent shows. He's also right in that a lot of members don't even visit the forums.
So I guess what I'm saying is that the list has a good purpose, but people are abusing it (for whatever reasons) and thus making it less worthwhile and useful.
derobert wrote:I doubt this is easily implementable, but would it help if we completely re-did the way the top 10 list was computed? For example, what if we treated each member's opinion scores in each category not as a score on some global scale, but rather as relative to their other scores. Then assume that each member's scores in each category should follow the normal distribution curve, and curve them until they do. Then, having done that for each member, compute the ranking from the curved scores.

Put this together with some sock-puppet defeating measures (e.g., new accounts count none/less) and would it help?

Thing is, I was a major advocate of implementing the Bayesian system, and for a while (a few months) it looked like it was really working. I mean, it still does to some degree, since it's harder to abuse (as opposed to getting 3 ops of all 10's and being at the top of the list, and the list being in flux way too much).
However, one thing I did not realize is that, well... the reason IMDB.com's list seems to stay stable and free from abuse is because it lacks something we have. Namely, creators. People on that site legitimately vote for movies without any external motivation or sockpuppets. It's not like Steven Spielberg visits the site to see where Schindler's List is ranked, right? It's not like there's a huge legion of Schindler's List fans who will give it all 10's, right?
On this site, we do have that motivation, as well as much more rabid fans of particular shows (or rather, a smaller membership count compared to theirs and a larger % of fans per show). Thus, with the way things are... I honestly think no matter what we do, there *will* be abuse. It's just a matter of how difficult we make it while still keeping things statistically sound (which I think the Bayesian system *can* be, like it is on IMDB.com).
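For reference, the Bayesian average dwchang is crediting for IMDB's stability is usually written as a shrinkage toward the site-wide mean: a video's score stays near the prior until it has accumulated enough opinions. A sketch, with the prior weight m as a tunable assumption rather than whatever value the site actually uses:

```python
def bayesian_score(video_mean, num_opinions, site_mean, m=10):
    """Weighted rating: WR = (v/(v+m))*R + (m/(v+m))*C, where
    R = the video's raw mean score, v = number of opinions,
    C = the site-wide mean, and m = how many opinions it takes
    before the video's own mean dominates the prior."""
    v = num_opinions
    return (v / (v + m)) * video_mean + (m / (v + m)) * site_mean

# Three sockpuppet 10's barely move a video off a site mean of 7.0 ...
print(round(bayesian_score(10.0, 3, 7.0), 2))   # 7.69
# ... while thirty genuine 9's are trusted much more.
print(round(bayesian_score(9.0, 30, 7.0), 2))   # 8.5
```

This is why a video with "3 ops of all 10's" can no longer sit at the top of the list: shrinkage makes a handful of extreme scores nearly worthless.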
I am by no means advocating complacency just because it won't be perfect. I guess the best thing would be to implement a new system and see what the results are before going "public" with it. I remember Phade did that with the recent iteration of the scoring system, and the list seemed to make "more sense" then. Obviously fans and creators have "evolved", and now the system should improve as well. To be honest, I'm not sure what the best option is.
There were some interesting points, like weighting reviewers' scores (perhaps if they give all 10's all the time, they're not counted? or perhaps weighting by usefulness?) as well as the waiting period (which will lessen, but not remove, sockpuppets).
At the same time, as I said above, is it really our right to do such things? I may not like rabid fans of particular shows and how they inflate everything, but they are members just like me and deserve a "say," even if I disagree with it and quite frankly find it stupid. I mean, the list is just as much theirs as it is ours/mine, right? I may not agree with the list, but obviously hundreds of people do... even though I imagine their conception of "good" is different, since they probably only watch videos for that show. It's certainly a valid point, and to be honest, although I write it, I'm not sure where I stand on it given the abuse and inaccuracy I see.
Regardless, I think this type of discussion will certainly at least lead to a few ideas that can be implemented and we can see which works the best.
-Daniel
Newest Video: Through the Years and Far Away aka Sad Girl in Space
- theocide
- Joined: Thu Feb 13, 2003 12:50 am
derobert wrote:I doubt this is easily implementable, but would it help if we completely re-did the way the top 10 list was computed? For example, what if we treated each member's opinion scores in each category not as a score on some global scale, but rather as relative to their other scores. Then assume that each member's scores in each category should follow the normal distribution curve, and curve them until they do. Then, having done that for each member, compute the ranking from the curved scores.

Put this together with some sock-puppet defeating measures (e.g., new accounts count none/less) and would it help?

I agree. I say we need to intensify the Top 10%'s equation. It could incorporate star scale, rate of d/l, etc.
Veldrin: I agree. Man this wedding was like having sex with theocide. Fast and makes you cry.
- Phade
- Site Admin
- Joined: Fri Oct 20, 2000 10:49 pm
- Location: Little cabin in the woods...
Hey All,
There are lots of good points and good suggestions listed here in this thread. Hopefully I’ll address most of them in this post.
The purpose of the Top 10% List is to have something for newbies to immediately go to in order to find good AMVs. The procedure usually goes like this: Post: “Hey, I’m a newbie. What are some good AMVs?” Reply: “/me points you to Top 10% list...” Members who have been around for a while are likely to already have downloaded the majority of the list, and also download potential new additions before they appear on the list.
The Top Star Scale list is more of an ongoing useful list for all members. The All Time Star Scale list can be used for newbies as well as seasoned members. I know that I personally use it by looking at the weekly/all-time list and narrowing the band by choosing different min/max options (ex 40/160 is a good one to look at on the All Time list since it weeds out newbie vids with few stars as well as vids that have just been around for a long time that I’ve likely already downloaded). Sliding the min/max factors around can result in some good vids for old and new members.
But back to the problems of the actual scores...
Instead of dumping the whole system and losing lots of valuable data, perhaps additional “fixes” can be applied to the existing system. Hopefully the result will be just as pleasing as when the Bayesian average was first applied.
Possible scoring “fudge” factors:
1. Slight negative discount for videos made with only crazy-popular anime
How this could help: With ultra-popular anime, there is a much higher percentage of pure fanboy scores. Having a slight discount for these anime would counteract the fanboy factor.
Possible abuse: There is not much an individual can do to affect the popularity of an anime in AMVs.
Abuse countermeasures: Since there is very little an individual can do to affect this factor, countermeasures are not needed.

2. Members who have been here longer (older accounts) get slightly more weight than newer accounts
How this could help: Members who have been around generally have more AMV experience. These members’ opinions should help counteract newbies who think the first few AMVs they’ve ever seen are the greatest things in the AMV world. This would also combat any “sock puppet” problem.
Possible abuse: Newer members could attempt to hijack older accounts in order to give more weight to their scores. Older members could be jerks and give lower scores inappropriately.
Abuse countermeasures: Detection for password-guessing attacks. Require less-than-obvious passwords.

3. A member’s first opinions have less weight than later opinions
How this could help: As a member becomes more familiar with videos, their opinion scores are likely to be more accurate. This would also combat any “sock puppet” problem.
Possible abuse: A member could give a series of bogus opinions before giving opinions on the vids they want to pump up.
Abuse countermeasures: Add an opinion score by the video creator (see factor #4).

4. Add an admin/creator-given usefulness score for each opinion
How this could help: Video creators would be able to identify members who do a particularly good job of giving opinions. These members’ opinions would then carry a higher weight, since they appear to be more thought out than normal rather than just particularly biased.
Possible abuse: An individual member could be a dick and give low usefulness scores no matter what, or give high scores to a group of friends so that their factor is inappropriately increased.
Abuse countermeasures: Since it would take many members conspiring together for this factor to be wrongly indicated, the likelihood of an incorrect score succeeding, and then being enough to adversely affect the scoring system, is very low.

5. Add a score ranging from “I am a big fan of this anime” to “I’m not really a big fan of this anime”
How this could help: With the upcoming “anime I’ve watched” section, this score would be used as an inverse adjustment factor. If you are not a fan of an anime but score an AMV using it highly, the score gets a boost, since the creator must have done something really good for you to give it a higher score.
Possible abuse: A member could state the opposite of their “fan” preference so that the inverse factor would swing towards their preferred anime.
Abuse countermeasures: To combat abuse, the system would also be used in searches such as the Suggestion Query for returning better-fit videos to the user. The “fan” preference would be made into a global list of popular anime. If a member put the opposite of their true preference, their favorite anime would be negatively impacted in this list. The “I’m a fan” list would also be made visible as part of the member profile.

6. Create an AMV Hall Of Fame
How this could help: By creating a so-called “hall of fame”, video creators could not directly affect any particular system but instead would truly have to create a good AMV in order to be on the list. The list would be voted upon by a set of trusted/competent members. Video age and other criteria must be met before a video becomes a potential candidate. More to follow if implemented...
Possible abuse: Since voting members would be a trusted group, abuse should be at a minimum.
Abuse countermeasures: Since voting members would be a trusted group, abuse should be at a minimum.
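One way the account-age, experience, and usefulness factors (items 2-4 above) could combine is as a single multiplicative per-opinion weight, applied before scores enter the ranking average. The cutoffs, field names, and weight ranges below are illustrative assumptions, not a proposed spec:

```python
def opinion_weight(account_age_days, opinions_given, avg_usefulness):
    """Illustrative per-opinion weight combining fudge factors 2-4:
    older accounts, experienced reviewers, and reviewers whose
    opinions creators rate as useful all count more.
    avg_usefulness is assumed to be normalized into [0, 1]."""
    age_w = min(account_age_days / 365.0, 1.0)       # ramps 0 -> 1 over a year
    experience_w = min(opinions_given / 20.0, 1.0)   # first ~20 opinions count less
    usefulness_w = 0.5 + avg_usefulness / 2.0        # maps [0, 1] -> [0.5, 1.0]
    return age_w * experience_w * usefulness_w

# A day-old account's first opinion is worth a tiny fraction of a
# veteran reviewer's, which directly blunts sockpuppet floods:
print(opinion_weight(1, 1, 0.5) < 0.01 * opinion_weight(730, 50, 0.8))  # True
```

Because the factors multiply, a sockpuppet would need to fake age, volume, and creator-rated usefulness simultaneously to gain full weight, which is far more effort than registering a fresh account.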
Each of these score adjustment factors will have to be tweaked so that the final list appears to be the most correct. Certain factors can be judged by using certain benchmark videos (ex. Scott Melzer’s “This Is DBZ Life” uses only DBZ footage (ultra-popular anime) but is actually a very good video).
The only issue that has not been addressed is the “rising tide” issue. The only answer to it that I can think of right now is that the tide can only rise so high; after that, members would have to go back and lower their scores for previous videos in order to “make room” for the newer, better videos.
No matter what we do, there will always be a certain level of abuse of the system. Hopefully by implementing some or all of these adjustment factors, abuse can be mitigated and the Top 10% List can be salvaged and more genuinely reflect the status of AMVs.
Phade.
- Kai Stromler
- Joined: Fri Jul 12, 2002 9:35 am
- Location: back in the USSA
derobert wrote:I doubt this is easily implementable, but would it help if we completely re-did the way the top 10 list was computed? For example, what if we treated each member's opinion scores in each category not as a score on some global scale, but rather as relative to their other scores. Then assume that each member's scores in each category should follow the normal distribution curve, and curve them until they do. Then, having done that for each member, compute the ranking from the curved scores.

Put this together with some sock-puppet defeating measures (e.g., new accounts count none/less) and would it help?

The sockpuppet-defeating stuff is necessary on this, because otherwise it's still vulnerable to a moderately sophisticated attack (which I'll be happy to explain via PM, because this thread is about fixing/replacing the op-scoring system, not doing proof-of-concept attacks on it). But what's more concerning is that selecting relative to the member's other scores will tend to promote only one video per creator to the top list (the highest outlier). Right now, 6 of AD's 25 premiered videos are in the top 10%. Under the new system, maybe one or two, depending on the distance between mean and outlier for everybody else under consideration. Normalizing scores to the creator's own standard would recognize more people, but if it's about the videos, the top list should recognize the most worthy videos on the site regardless of who made them.
The mean-outlier distance thing also allows the sockpuppet attack, which while workable in theory is probably beyond the math skills of most people who allow their self-worth to be dictated by opinion scores.
If I've misunderstood something wrt normalization, please say so; I may just have done too much OS in my last year of grad school and so automatically, on seeing a situation expecting a normal curve, think solely about how to break it by inducing a heavy-tail.
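For what it's worth, the textbook defense against exactly the heavy-tail attack Kai describes is to standardize with the median and MAD (median absolute deviation) instead of the mean and standard deviation; the robust statistics only break down past 50% contamination. This is my suggestion, not something proposed in the thread:

```python
from statistics import median

def robust_z(values):
    """Median/MAD standardization: injected outliers shift the mean and
    inflate the standard deviation, but leave the median and MAD
    almost untouched until they make up half the sample."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 1.4826 makes MAD consistent with the std-dev of a normal distribution
    return [(v - med) / (1.4826 * mad) for v in values]

honest = [6, 7, 7, 7, 8, 8, 8, 8, 9, 9]
attacked = honest + [10, 10, 10]   # a sockpuppet flood of 10's
# The honest scores' standardized values are unchanged by the attack:
print(robust_z(honest) == robust_z(attacked)[:len(honest)])  # True
```

The same flood of 10's would raise the plain mean from 7.7 to about 8.2, so a mean/std curve is visibly moved while the robust one is not.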
--K
Shin Hatsubai is a Premiere-free studio. Insomni-Ack is habitually worthless.
CHOPWORK - abominations of maceration
skywide, armspread : forward, upward
Coelem - Tenebral Presence single now freely available
- silver_moon
- Joined: Fri Nov 23, 2001 12:46 pm
- Location: British Columbia, Canada
- Contact:
Phade wrote:Members who have been here longer (older accounts) get slightly more weight than newer accounts

I kind of like this idea. Thinking back to when I first joined, I went around giving pretty high scores to some Final Fantasy vids that weren't that great, but because I had seen very few other music videos, they seemed like the coolest videos on the planet to me. I think I give much fairer and more thoughtful opinions now, and I honestly would not mind if my older opinions did not count as much.
Plus this would make it more difficult to bring a video to the top 10% using fake accounts, since all the fake accounts would be new.