Org Database Overview

After more than 20 years of looking like this, the site is planning a major rebuild! We need your feedback!!
Post Reply
User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Org Database Overview

Post by Phade » Sun Jul 27, 2025 3:07 pm

Hey Everyone,

I've started fleshing out the details of the new Org database. It will be similar to the old one, mainly because our core mission hasn't changed. This diagram shows the relationships between the major sections, each of which will have a lead curator (the same person can curate more than one section, but these are the major sections).

Image

The Org News section is self-explanatory; it's the news that currently goes on the home page.

The Org Wiki will be a phase 2 addition to replace the "how-to" guides.

The site Members section is also generally self-explanatory. The trick here, though, will be integrating or replicating user information into the forum tables; members should be forum users and forum users should be members.

The Anime section is likely to be complicated. One reason is that the anime titles should be standardized and not a haphazard jumble of names. There are existing anime databases available that we can replicate, providing additional details such as the overarching franchise, seasons, episodes, characters, and thumbnails for display. The trick will be curating and updating the list regularly as new anime is released.

The Music section will be similarly complicated but also similarly standardized with external databases we can replicate.

There are big plans for the AMV Events section with a better buildout of event organizer tools and ease of management all around.

The AMV Feedback section will continue with star ratings as well as an improved review and quick comments tools.

Org Banners have been an ongoing breath of fresh air ever since they were launched and they should continue for sure!

Finally, the Donations Management will be improved for timely responses and accurate accountability.

Each of these sections will need a lead curator. Do you need to know how to program? No. Do you need to have a deep understanding of how databases work? Not really. Do you need to be a passionate nerd about a particular topic? That's the thing we need for a curator!!

If you would like to help curate one or more of these sections, please let me know! We need your help!! The programming nerds will help you out. šŸ˜€

Thanks again and have fun!!

Phade.

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Tue Jul 29, 2025 7:57 pm

If we zoom into the Members section, here is what the database will likely look like.

While the main members table may be expanded by the forum technical details, the information listed here directly relates to the Org's member information. Splitting the stats to its own table will allow calculations to take place, but not affect the main table itself. Member images will allow for historical views of what your member profile banner was, so that you can switch it up and switch it back as needed. The preferences section will allow us to add preference options without having to redraw the schema later on. Social media links, followers, and playlists round out the rest of the fun stuff.

Unlike a general username and password check, authentication will be a separate service of its own, allowing greater flexibility in how and with whom users can authenticate. The main feature will be authenticating via external sources such as Google. This will allow us to definitively verify who belongs to which YouTube channel.

If you have any questions or comments, please share now so that we can easily make adjustments now if needed; fixing things later will require much more effort.

Image

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Wed Jul 30, 2025 6:29 pm

The video information schema in the database is so large that I had to split it into two images!

The first section here is the general video information that the AMV creator will provide. The base information consists of the usual stuff like the video title, description, thumbnail image, premiere date, and so on. Since an AMV can have more than one video source (anime and/or non-anime), audio source, collaborator, or category, the listings for those are in separate tables. Now that we have extra storage space, an AMV can have multiple files uploaded for multiple purposes, such as a general distribution version, a contest version, a high-res archive version, and so on. Links to external streaming sites like YouTube are also available.

The feedback section focuses on members interacting with the video itself. The tried and true star rating will remain a staple of general feedback section as well as the more formal text-based reviews. Members will be able to add videos to playlists and choose whether the playlist is public or private. Finally, there are general video warnings that the video creator or member can give to the AMV. These warnings are for general categories like language, violence, nudity, and so on, as well as a "specific warning" for content that doesn't fit into a general category, like snakes, vomit, overt racism, sexual assault, and other specific content that viewers may find objectionable. Having members provide this feedback can help average out what the true warnings should be so that search results can be more accurate.

More schema diagrams to come!!

Image

Image

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Thu Jul 31, 2025 6:08 pm

Here is a draft of the next section: AMV Contest Events

Admittedly, this is a Phase 2 feature. But, I wanted to give a preview of things to come. This section will allow AMV contest coordinators to create the base entry for their contest (or exhibition), the events in their contest, rules going around each of them, participants, awards, and winners.

The grayed-out videos and video files tables are the same as the original Videos section of the schema. The new portion here is the submission of a particular video file to an event. Often, contests require that a video submission be "clean", meaning that there are no bumpers, watermarks, or other visually identifiable markings to reveal who the video creator is. This method will allow a "clean" copy to be available to contests, but not make it available for general download.

Image

User avatar
Kariudo
Twilight prince
Joined: Fri Jul 15, 2005 11:08 pm
Status: 1924 bots banned and counting!
Location: Los taquitos unidos
Contact:
Org Profile

Re: Org Database Overview

Post by Kariudo » Wed Feb 18, 2026 9:25 pm

Ok, jumping in on the Video Information section (and just that one for now). Pardon any formatting mistakes, it's been a while since I've been here.

Using dbdiagram.io cuz
  • it has a free tier
  • it has markdown-like text that is easy to copy and edit
  • it can export to Database-specific DDL
https://dbdiagram.io/d/Video-Informatio ... fce2fd72ff (not editable, but you can copy/paste the "DBML" in the left column)
Image

We're gonna need to hash out a naming convention for tables/columns here. I'm mixing pascalCase and snake_case right now, and I don't like that.

I'm going to push for a few key things:
  1. Not storing image data directly in the database
  2. Using UUIDs for at least some of the tables' primary keys
  3. Using ACLs (Access Control Lists) to manage CRUD permissions rather than relying on Ids in the tables themselves to determine ownership
Point 1: Image Data Storage
While image data could be stored in a blob column, it'd cause issues with bloating the database's cache and likely degrade performance (the db engine would need to read or skip all of that extra data, with no concrete size, from cache/disk for each row when running a query). Storing a file path and grabbing the file through NFS sounds like the accepted solution here.

Point 2: UUIDs
Postgres (the selected DBMS for the rewrite) has a UUID type, and it does have an indexing strategy for them (pretty much just a b-tree index, looks like). UUIDs being larger means the resulting index is larger, but that's hopefully not a big concern. A UUIDv7 or ULID may be a better choice than a UUIDv4 as they dedicate some number of bits to a time component, but that may depend on having software/language support. Could always just include a datetime instead if we need to index/search by time.

Outside of security reasons (basically impossible to guess what another UUID will be, let alone what UUID is used for the previous or next row), the main benefit to using UUIDs is that a UUID can be generated without needing to involve the database (unlike integers/auto-increments/sequences).

That makes things easier when trying to insert child/grandchild records.
Should also help keep API-side logic simpler. I've dealt with trying to create related records too much in my day job to ignore that consideration.

Think like inserting a new Video, and then the Visual and Audio sources (both just junction tables sitting between the Video and the Anime/Song tables to handle the many-many relationship). Being able to generate ids locally means the relationship can be established at the time we create the objects, rather than needing to create the video, insert it, then create the junction table rows for Anime/Music, then insert those.

Point 3: ACLs
I'd argue we should remove the member id from the AMV table, and just keep all of the creators in the VideoCreators/Editors/Collaborators (mostly) junction table. The ACL can store information on which user can edit the catalog entry, and also grant administration and community curators the ability to make edits.

Keeping editor information in a single table should also simplify queries and make them more consistent. We'd only need to filter on a single table to get the list of AMVs to display on someone's profile, and it's the same query/filters even if someone isn't the primary editor. Ditto that in general searches.

Having a "Owner" or "Primary" flag in VideoCreators could help the web app easily determine which member to display in the video card (search/main pages)

Miscellaneous
My initial draft of the DB schema for the rewrite tried to remove AMV-specific terminology. Part of the idea was that some other community, e.g. Lets Plays or perhaps a Game Jam, could use what we're building here for themselves. Become the Wordpress of multimedia creations. So I came up with "Works" (as in "a work of art") as a generic term. We don't need to keep that. My main concern is in trying to avoid doubling words in database table names like "VideoVideoSource"

Categories (drama, action, comedy, etc...), Codecs, Link Sources, and Link Statuses are pretty set in stone, so it may make more sense to have them as an Enum instead of a separate table.
About the only downside is that the DB would need to go down to add/remove any values.

The "VisualSources" table is a bit of an oddball. My idea here is that this will allow us to better handle anime "aliases" in searches and when creating a catalog entry. A single Id for a single Anime (or other source), and multiple names for that one Id (e.g. "Shingeki no Kyojin" and "Attack on Titan").
Keeping the "primary" title off of the "VisualSources" table is also done, following the same "let's keep queries and filters consistent" idea of removing the user id from the Works/Videos/AMVs table.
Image
Image

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Thu Feb 19, 2026 5:36 pm

Thank you for providing the feedback! I am always up for new ways to look at the same problem. šŸ™‚
Kariudo wrote: ↑
Wed Feb 18, 2026 9:25 pm
Point 1: Image Data Storage
While image data could be stored in a blob column, it'd cause issues with bloating the database's cache and likely degrade performance (the db engine would need to read or skip all of that extra data, with no concrete size, from cache/disk for each row when running a query). Storing a file path and grabbing the file through NFS sounds like the accepted solution here.
I wholeheartedly agree. File systems should handle files and data systems should handle data. The database should store information about the images while the actual images live outside the database on the file system. This will make everything run better and was the original intent (sorry if it didn’t come across that way).

Kariudo wrote: ↑
Wed Feb 18, 2026 9:25 pm
Point 2: UUIDs
Postgres (the selected DBMS for the rewrite) has a UUID type, and it does have an indexing strategy for them (pretty much just a b-tree index, looks like). UUIDs being larger means the resulting index is larger, but that's hopefully not a big concern. A UUIDv7 or ULID may be a better choice than a UUIDv4 as they dedicate some number of bits to a time component, but that may depend on having software/language support. Could always just include a datetime instead if we need to index/search by time.

Outside of security reasons (basically impossible to guess what another UUID will be, let alone what UUID is used for the previous or next row), the main benefit to using UUIDs is that a UUID can be generated without needing to involve the database (unlike integers/auto-increments/sequences).

That makes things easier when trying to insert child/grandchild records.
Should also help keep API-side logic simpler. I've dealt with trying to create related records too much in my day job to ignore that consideration.

Think like inserting a new Video, and then the Visual and Audio sources (both just junction tables sitting between the Video and the Anime/Song tables to handle the many-many relationship). Being able to generate ids locally means the relationship can be established at the time we create the objects, rather than needing to create the video, insert it, then create the junction table rows for Anime/Music, then insert those.
I strongly agree that removing sequential video ID integers from general public view is a good idea and should be implemented. However, I’m not sure if UUID is the right fit for this project.

As you pointed out, from an internal database perspective, it is more difficult to index, join, and search a UUID string than an integer. This makes database activities more computationally expensive, which is especially important for less powerful devices, like our current internal failover systems, phones, and tablets (future use for an Org App).

Another issue is that if we create tightly-coupled direct integrations with other systems that use integer IDs (forum, wiki, etc.), we would need to map between integers and UUIDs. This setup would take more time to create the integration and may further complicate things if we migrate systems (moving from [forum x] to [forum y] for our site usage).

If we were creating billions of IDs throughout disconnected systems and we needed to guarantee the uniqueness of a new ID without comparing the new ID to all existing IDs (see IDs used for Google Docs, Sheets, Slides, etc.), I’d say UUID would be a good way to go.

But from an external user aesthetic perspective, UUIDs are long and ugly compared to integers or alternative IDs.

Instead of a UUID, we could use a shorter alphanumeric ID for public view while keeping integers for private internal comparisons. For example, YouTube uses 11-character case-sensitive alphanumeric (with dash and underscore) IDs for their videos. This gives YouTube videos a theoretical limit of 64^11 IDs, which is about 73.8 quintillion values. I’m sure they scrub those for accidental words (would be funny if some YouTube video ID was -ButtHoe69-), so the limit is probably only 1 quintillion IDs.

I suggest that we continue to use integer ID for internal use due to computational efficiency and cross-system compatibility but never directly reveal the integer to the public, while we generate an alternate ID for public view. Perhaps we could use the same alphanumeric options but with 64^5 for a limit of one-billion 5-character video ID values. We can pre-create a million 5-character IDs, auto-scrub them for words and other undesirable character strings (FUmom, 69Bro), and pop from that sanitized stack every time we need a new video ID.

From an API coding perspective, the API would ā€œpopā€ a new video ID from the database (computationally fast) and then push that ID with the information insert exactly as you described above. The ā€œpopā€ function marks that ID as ā€œconsumedā€ in the database so that a subsequent ā€œpopā€ does not ever reuse that ID.

We could then have URLs like https://animemusicvideos.org/video/4wCh7 or shorter https://a-m-v.org/4wCh7

I think those are more visually pleasing than a displayed UUID like this:

Code: Select all

https://animemusicvideos.org/video/d5581502-befe-41ed-8b89-2aebf900a028 
Somewhat related, user IDs should be treated similarly and also remove visible sequence integers, but still keep them in the background, kinda like this:

Member Profile: https://www.youtube.com/@Ileia
Video Entry: https://youtu.be/26u32u4bh3s

YouTube has a similar randomly-generated ID system for users and channels in addition to the username text value. We could create and allow that identifier for users with illegal characters in their username.

Kariudo wrote: ↑
Wed Feb 18, 2026 9:25 pm
Point 3: ACLs
I'd argue we should remove the member id from the AMV table, and just keep all of the creators in the VideoCreators/Editors/Collaborators (mostly) junction table. The ACL can store information on which user can edit the catalog entry, and also grant administration and community curators the ability to make edits.

Keeping editor information in a single table should also simplify queries and make them more consistent. We'd only need to filter on a single table to get the list of AMVs to display on someone's profile, and it's the same query/filters even if someone isn't the primary editor. Ditto that in general searches.

Having a "Owner" or "Primary" flag in VideoCreators could help the web app easily determine which member to display in the video card (search/main pages)

While keeping video information separate from creator information improves data structure normalization, it can also increase computational costs and complicate data integrity checks.

The Org has the requirement that every video must have a primary video owner/creator. Of the current 209,512 video entries, there are only 3,863 videos with cataloged collaborators. This means that 98.2% of all AMVs are created by a single person, though some AMVs may have missing collaborator entries (I’m willing to bet that more than 95% of all AMVs have single-person creators).

Similarly for multi-editor projects, all MEP videos must have one primary coordinator with everyone else as collaborators.

From an internal database perspective, if we wanted to pull a list of video titles and creators and we have the video creator in the videos table like it is now, only one join is needed to pull information: videos -> users. Pulling collaborators from a separate table will be quicker with a small table dataset since the vast majority of AMVs don’t have collaborators.

But if we keep all user information separate from the videos table, the system will always have to do two joins to get a basic list of information instead of one: videos -> video_users.owner -> users. It also runs the risk of orphaned video records, either on insert, update, or delete. Mitigating that risk through data integrity checks outside of the videos table further complicates CRUD interactions and increases computational expense.

Enforcing the rule of ā€œall videos must have an ownerā€ is easiest to maintain if that member is directly inside the primary video information table. This will allow the database itself to enforce non-null video owners on insert and update while enforcing referential integrity with existing users. It also makes it easy to identify who can add collaborators to a MEP video, keeping the confirmation of collaboration to the person added in the collaborators table.

Kariudo wrote: ↑
Wed Feb 18, 2026 9:25 pm
Miscellaneous
My initial draft of the DB schema for the rewrite tried to remove AMV-specific terminology. Part of the idea was that some other community, e.g. Lets Plays or perhaps a Game Jam, could use what we're building here for themselves. Become the Wordpress of multimedia creations. So I came up with "Works" (as in "a work of art") as a generic term. We don't need to keep that. My main concern is in trying to avoid doubling words in database table names like "VideoVideoSource"

Categories (drama, action, comedy, etc...), Codecs, Link Sources, and Link Statuses are pretty set in stone, so it may make more sense to have them as an Enum instead of a separate table.
About the only downside is that the DB would need to go down to add/remove any values.

The "VisualSources" table is a bit of an oddball. My idea here is that this will allow us to better handle anime "aliases" in searches and when creating a catalog entry. A single Id for a single Anime (or other source), and multiple names for that one Id (e.g. "Shingeki no Kyojin" and "Attack on Titan").
Keeping the "primary" title off of the "VisualSources" table is also done, following the same "let's keep queries and filters consistent" idea of removing the user id from the Works/Videos/AMVs table.

I agree. Some of the original tables and columns were named specifically with AMVs in mind, like ā€œanimeā€ as the video source table and ā€œsongsā€ for the audio source table. However, you are correct that even with AMVs, video sources may not always be ā€œanimeā€ and audio may not always be ā€œmusicā€.

In addition, the use of external standardized databases (for example: MusicBrainz for music, AniList for anime, IMDB for non-anime, etc), while making things easier and more consistent for users, further complicates data tracking within the system. Updating the data structures to reflect this wider availability of sources, both internal and external, should be taken into consideration with all new data structures.

Even then, sometimes ā€œvideoā€ sources are used as ā€œaudioā€ sources, as it is for trailer-based AMVs. In that case, the IMDB entry for a movie would be listed as the ā€œaudioā€ source because of its trailer. But if there is a database of trailers, we might switch to that one so that we can be even more specific for the audio source ([movie X] came out with a teaser trailer on [date A], [date B], and [date C] with full trailers on [date D] and [date E]; this AMV audio source comes from [movie X] [date D] trailer).

But this brings up a broader question: Are we creating a system for AnimeMusicVideos.org to use, or are we creating a general system that AnimeMusicVideos.org happens to use?

If it is for ourselves, we can be mildly stringent in the project creation and use structures that optimize how AMVs are cataloged. But if the project is open to outside audiences using and contributing to the creation of the system, we must be even more vigilant across all aspects of the project. I’m already concerned about how we will be internally doing basic tasks like functionality testing, code review, system architecture, production rollouts, external integrations, etc. If we make the system audience scope even larger, we must spend more time considering options outside of our own AMV use cases. I’m not saying this is a bad thing to do, just that it adds complexity and time to the project development.

Phade.

Post Reply

Return to ā€œOrg Site Rebuildā€