I've generated a few graphs of MEPs (multi-editor projects). These graphs link editors to the MEPs they participated in, and are laid out using a force-directed layout algorithm (specifically, the Fruchterman-Reingold algorithm).
For this first iteration, I applied the algorithm directly and made no attempt to influence the layout with AMV-specific data: for example, I did not try to pin project founders/coordinators closer to their projects, and I did not try to group similar projects (RoS series, DDR series, NES/RVG, etc.). Nevertheless, the nature of the algorithm does expose some interesting insights into the collection of MEPs on the .org, and I'll discuss a few of them. You might (and, I hope, will) see other patterns. I'll discuss future work after I've presented the current work.
I broke the graphs up into three subsets: all MEPs with more than 2 editors, all MEPs having 2 < editor count < 10, and all MEPs with >= 10 editors.
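For anyone who wants to reproduce the split, here's a minimal Ruby sketch of the partitioning. The `meps` hash (MEP title => list of editors) and the `subsets` helper are hypothetical stand-ins for however the data is actually stored. Note that the first subset is just the union of the other two.

```ruby
# Hypothetical sample data: MEP title => list of participating editors.
meps = {
  "MEP A" => %w[alice bob carol],
  "MEP B" => %w[alice dave],
  "MEP C" => (1..12).map { |i| "editor#{i}" }
}

# Partition MEPs by distinct editor count into the three (overlapping) subsets.
def subsets(meps)
  counts = meps.transform_values { |editors| editors.uniq.size }
  {
    more_than_two: counts.select { |_, c| c > 2 }.keys,
    between:       counts.select { |_, c| c > 2 && c < 10 }.keys,
    ten_or_more:   counts.select { |_, c| c >= 10 }.keys
  }
end

p subsets(meps)
```

MEP B, with only two editors, lands in none of the subsets.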
NOTE - These are big, big PNGs. I know Firefox and Safari can handle them, given sufficient memory. (In Firefox, you will want to turn off automatic image resizing; this can be done by typing about:config in the address bar and setting browser.enable_automatic_image_resizing to false.) Nevertheless, you may want to download them and view them in a program explicitly designed to handle large images.
A note on reading these graphs: Vertices and edges are coded by brightness and color. The color is assigned by an object's ID, and is meaningless beyond that, except perhaps as a guide to assist in following edges. The brightness of an edge or vertex scales linearly with the degree of the associated node:
- The brightness of a member node (and the brightness of its edges) is linear in the number of MEPs in which that member has participated.
- The brightness of a video node is linear in the number of editors participating in that project.
I tried some other scaling schemes, such as logarithmic scaling, but I liked the way linear scaling worked out, so I kept it.
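To make the linear-versus-logarithmic trade-off concrete, here's a small Ruby sketch of both mappings. The minimum-brightness `floor`, the golden-ratio trick in the hypothetical `hue_for`, and the fixed 0.8 saturation are my assumptions for illustration, not necessarily what the scripts behind these graphs do.

```ruby
# Map a node's degree to a brightness in [floor, 1.0].
# `floor` is an assumed minimum so that degree-1 nodes stay visible.
def linear_brightness(degree, max_degree, floor: 0.25)
  floor + (1.0 - floor) * degree.to_f / max_degree
end

# Logarithmic alternative: compresses the range, so high-degree nodes
# dominate less and low-degree nodes are easier to tell apart.
def log_brightness(degree, max_degree, floor: 0.25)
  floor + (1.0 - floor) * Math.log(1 + degree) / Math.log(1 + max_degree)
end

# Hypothetical: derive a stable hue in [0, 1) from an object's numeric ID.
# Stepping by the golden ratio spreads consecutive IDs around the color wheel.
def hue_for(id)
  (id * 0.618033988749895) % 1.0
end

# Graphviz accepts "H S V" color strings with each component in [0, 1].
def dot_color(id, brightness)
  format("%.3f 0.800 %.3f", hue_for(id), brightness)
end

[1, 5, 20, 40].each do |deg|
  printf("degree %2d  linear %.2f  log %.2f\n",
         deg, linear_brightness(deg, 40), log_brightness(deg, 40))
end
```

With linear scaling, the handful of very prolific editors get nearly all of the brightness range, which makes them easy to spot; log scaling spreads the low-degree nodes out more at the cost of flattening the top end.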
I am placing these graph images in the public domain. Do whatever you want with them.
Here are the graphs:
MEPs with more than two editors:
The middle section is largely unreadable, but by scrolling around you can see some neat large-scale patterns. First, you can see a bunch of small production studios, some of which connect to larger projects through one or two people (Studio Gaijin, consisting of Kazz, Spyral, and Silvercat, is like this). Second, there's a pretty obvious "mainstream" plus some much smaller, but still prolific, sub-communities. It's interesting that the "mainstream" and those smaller communities appear to be disconnected from each other.
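Whether the "mainstream" and the smaller communities really are disconnected can be checked directly instead of by eye: count the connected components of the editor-MEP graph. Here's a minimal breadth-first-search sketch in Ruby, assuming the data is available as a list of (editor, MEP) pairs; the `components` helper and the sample names are hypothetical.

```ruby
require "set"

# Find connected components of the editor-MEP bipartite graph via BFS.
# Node names are prefixed so an editor and a video with the same name
# don't get merged into one node.
def components(edges)
  adj = Hash.new { |h, k| h[k] = [] }
  edges.each do |editor, mep|
    e = "editor:#{editor}"
    m = "mep:#{mep}"
    adj[e] << m
    adj[m] << e
  end

  seen = Set.new
  comps = []
  adj.each_key do |start|
    next if seen.include?(start)
    comp = []
    queue = [start]
    seen << start
    until queue.empty?
      node = queue.shift
      comp << node
      # fetch avoids the default proc adding keys while we iterate
      adj.fetch(node, []).each do |nbr|
        next if seen.include?(nbr)
        seen << nbr
        queue << nbr
      end
    end
    comps << comp
  end
  comps
end

edges = [%w[ed1 mepA], %w[ed2 mepA], %w[ed2 mepB], %w[ed3 mepC], %w[ed4 mepC]]
puts "#{components(edges).size} components"
```

Sorting the components by size would show whether the sub-communities are truly separate islands or are hanging off the main component by a single editor.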
The nature of the layout algorithm (in a nutshell, it attempts to minimize edge lengths, make edge lengths as uniform as possible, and distribute nodes as evenly as possible) tends to place MEPs with similar editor sets next to each other; this is how you get, e.g., the DDR projects all stacked next to each other, with Kusoyaro pretty close to them all. It also tends to put editors who have done similar projects next to each other, and to put editors who join lots of MEPs (in godix's terms, "MEP whores") in the center. You can check out the center of the "mainstream" segment to confirm this.
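For the curious, here's a from-scratch Ruby sketch of the classic Fruchterman-Reingold update, to show where that behavior comes from. It is not the Graphviz code used for these images. Every pair of vertices repels with force k²/d, every edge attracts with force d²/k, and a "temperature" caps how far a vertex may move per step and cools over time. The frame size, iteration count, 0.95 cooling factor, and seed are arbitrary assumptions.

```ruby
# Minimal Fruchterman-Reingold sketch. Vertices are 0...n; edges are
# [u, v] index pairs. Returns an array of [x, y] positions.
def fruchterman_reingold(n, edges, width: 100.0, height: 100.0,
                         iterations: 200, seed: 42)
  rng = Random.new(seed)
  k = Math.sqrt(width * height / n)          # ideal edge length
  pos = Array.new(n) { [rng.rand * width, rng.rand * height] }
  temp = width / 10.0                        # max displacement per step

  iterations.times do
    disp = Array.new(n) { [0.0, 0.0] }

    # Repulsion between every pair of vertices: f_r(d) = k^2 / d
    (0...n).each do |v|
      (0...n).each do |u|
        next if u == v
        dx = pos[v][0] - pos[u][0]
        dy = pos[v][1] - pos[u][1]
        d = [Math.hypot(dx, dy), 0.01].max
        f = k * k / d
        disp[v][0] += dx / d * f
        disp[v][1] += dy / d * f
      end
    end

    # Attraction along edges: f_a(d) = d^2 / k
    edges.each do |u, v|
      dx = pos[v][0] - pos[u][0]
      dy = pos[v][1] - pos[u][1]
      d = [Math.hypot(dx, dy), 0.01].max
      f = d * d / k
      disp[v][0] -= dx / d * f
      disp[v][1] -= dy / d * f
      disp[u][0] += dx / d * f
      disp[u][1] += dy / d * f
    end

    # Move each vertex, capped by the temperature and kept inside the frame
    n.times do |v|
      dl = Math.hypot(disp[v][0], disp[v][1])
      next if dl.zero?
      step = [dl, temp].min
      pos[v][0] = (pos[v][0] + disp[v][0] / dl * step).clamp(0.0, width)
      pos[v][1] = (pos[v][1] + disp[v][1] / dl * step).clamp(0.0, height)
    end

    temp *= 0.95 # simple geometric cooling schedule
  end
  pos
end

# Hypothetical tiny example: editors ed1..ed3 linked to two MEPs.
names = %w[ed1 ed2 ed3 mepA mepB]
edges = [[0, 3], [1, 3], [1, 4], [2, 4]]
fruchterman_reingold(names.size, edges).each_with_index do |(x, y), i|
  printf("%-5s %6.1f %6.1f\n", names[i], x, y)
end
```

Two MEPs that share most of their editors are pulled toward the same set of editor nodes, which is why similar projects end up stacked together; an editor with many MEP edges is pulled from all sides and settles near the middle. The all-pairs repulsion loop is O(n²) per iteration, which is part of why graphs this size take a while to lay out.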
==
MEPs having more than 2 but fewer than 10 editors:
This one takes out some of the noise caused by the mega-MEPs and brings to light some more patterns. My favorite is the Random Destination Studios cluster in the middle-right, mostly because it's one of the few I can recognize; I'm sure there are other studio clusters present as well. It'd be interesting to see if (and how) those clusters interact.
Also, I think this graph makes a case for AtomX actually being in RDS, despite his constant claims to the contrary. :P
==
MEPs with 10 or more editors:
Another view of the big MEPs.
==
So, where to go from here? Here are a few things I want to do:
- Address problems in the data: Not all the multi-editor projects cataloged on the .org are present here, simply because the collaboration data hasn't been entered for all MEPs. I'm pretty sure that I'm missing a bunch of More Than Toast videos, for example.
- Use more domain-specific data to shape the graph: One thing I want to do is use hit (and, if possible, download) counts to influence the graph. That could, for example, provide some insight into the most popular MEPs, the editors who tend to join those MEPs, and so forth. There are other things that could be done with the data from this site, too.
- Try different layout algorithms: Force-directed layout algorithms are pretty handy for this kind of stuff, but I've got my mind open to other ideas. One thing I'd like to do is to use splines or polylines instead of straight lines for the graph edges; that'd allow me to clean up the layout significantly. Unfortunately, I've not been able to get the solvers I've tried (graphviz and aiSee) to route spline edges with any sort of reasonable efficiency. I know it's a computationally-intensive problem, but I'm hoping there's something out there that will handle it.
==
Technical details: I used a pair of Ruby scripts to generate these graphs. The first Ruby script parsed video information pages using the Hpricot HTML parser (oh what I would give for a good Web service API), and stored the data in a MySQL database via Active Record. The second Ruby script actually did the graphing, controlling Graphviz via the ruby-graphviz bindings.
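Since the DOT files came up: here's a hedged sketch of emitting DOT by plain string building instead of through ruby-graphviz (I'm deliberately not guessing at that library's API). The `to_dot` helper is hypothetical. `layout=fdp` selects Graphviz's spring-model engine, which is in the spirit of Fruchterman-Reingold; the post doesn't say which engine the original scripts used. The optional `splines` value (e.g. "polyline" or "true") is the Graphviz attribute that requests routed edges, which is the expensive part mentioned above.

```ruby
# Quote a name as a DOT string, escaping embedded double quotes.
def dot_quote(s)
  '"' + s.gsub('"') { '\\"' } + '"'
end

# Hypothetical sketch: build an undirected DOT graph of editor-MEP edges.
def to_dot(edges, splines: nil)
  lines = ["graph meps {"]
  lines << "  layout=fdp;"                      # force-directed Graphviz engine
  lines << "  splines=#{splines};" if splines   # edge routing request, if any
  edges.each do |editor, mep|
    lines << "  #{dot_quote(editor)} -- #{dot_quote(mep)};"
  end
  lines << "}"
  lines.join("\n")
end

puts to_dot([%w[ed1 mepA], %w[ed2 mepA]], splines: "polyline")
```

The output can be saved to a file and rendered with something like `fdp -Tpng meps.dot -o meps.png`.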
If anyone would like a copy of these scripts, the DOT files fed to Graphviz, and/or the dataset I used, just let me know in this thread and I'll post them. I'm not posting them now because I'm not particularly happy with the code (actually, some of it is outright embarrassing) and I'd like to clean it up before publishing it.