Org Database Overview

After more than 20 years of looking like this, the site is planning a major rebuild! We need your feedback!!
Post Reply
User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Org Database Overview

Post by Phade » Sun Jul 27, 2025 3:07 pm

Hey Everyone,

I've started fleshing out the details of the new Org database. It will be similar to the old one, mainly because our core mission hasn't changed. This diagram shows the relationships between the major sections, each of which will have a lead curator (the same person can curate more than one section, but these are the major sections).

Image

The Org News section is self-explanatory; it's the news that currently goes on the home page.

The Org Wiki will be a phase 2 addition to replace the "how-to" guides.

The site Members section is also generally self-explanatory. The trick here, though, will be integrating or replicating user information into the forum tables; members should be forum users and forum users should be members.

The Anime section is likely to be complicated. One reason is that the anime titles should be standardized and not a haphazard jumble of names. There are existing anime databases available that we can replicate, providing additional details such as the overarching franchise, seasons, episodes, characters, and thumbnails for display. The trick will be curating and updating the list regularly as new anime is released.

The Music section will be similarly complicated but also similarly standardized with external databases we can replicate.

There are big plans for the AMV Events section with a better buildout of event organizer tools and ease of management all around.

The AMV Feedback section will continue with star ratings as well as an improved review and quick comments tools.

Org Banners have been an ongoing breath of fresh air ever since they were launched and they should continue for sure!

Finally, the Donations Management will be improved for timely responses and accurate accountability.

Each of these sections will need a lead curator. Do you need to know how to program? No. Do you need to have a deep understanding of how databases work? Not really. Do you need to be a passionate nerd about a particular topic? That's the thing we need for a curator!!

If you would like to help curate one or more of these sections, please let me know! We need your help!! The programming nerds will help you out. 😀

Thanks again and have fun!!

Phade.

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Tue Jul 29, 2025 7:57 pm

If we zoom into the Members section, here is what the database will likely look like.

While the main members table may be expanded by the forum technical details, the information listed here directly relates to the Org's member information. Splitting the stats to its own table will allow calculations to take place, but not affect the main table itself. Member images will allow for historical views of what your member profile banner was, so that you can switch it up and switch it back as needed. The preferences section will allow us to add preference options without having to redraw the schema later on. Social media links, followers, and playlists round out the rest of the fun stuff.

Unlike a general username and password check, authentication will be a separate service of its own, allowing greater flexibility in how and with whom users can authenticate. The main feature will be authenticating via external sources such as Google. This will allow us to definitively verify who belongs to which YouTube channel.

If you have any questions or comments, please share now so that we can easily make adjustments now if needed; fixing things later will require much more effort.

Image

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Wed Jul 30, 2025 6:29 pm

The video information schema in the database is so large that I had to split it into two images!

The first section here is the general video information that the AMV creator will provide. The base information consists of the usual stuff like the video title, description, thumbnail image, premiere date, and so on. Since an AMV can have more than one video source (anime and/or non-anime), audio source, collaborator, or category, the listings for those are in separate tables. Now that we have extra storage space, an AMV can have multiple files uploaded for multiple purposes, such as a general distribution version, a contest version, a high-res archive version, and so on. Links to external streaming sites like YouTube are also available.

The feedback section focuses on members interacting with the video itself. The tried and true star rating will remain a staple of general feedback section as well as the more formal text-based reviews. Members will be able to add videos to playlists and choose whether the playlist is public or private. Finally, there are general video warnings that the video creator or member can give to the AMV. These warnings are for general categories like language, violence, nudity, and so on, as well as a "specific warning" for content that doesn't fit into a general category, like snakes, vomit, overt racism, sexual assault, and other specific content that viewers may find objectionable. Having members provide this feedback can help average out what the true warnings should be so that search results can be more accurate.

More schema diagrams to come!!

Image

Image

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Thu Jul 31, 2025 6:08 pm

Here is a draft of the next section: AMV Contest Events

Admittedly, this is a Phase 2 feature. But, I wanted to give a preview of things to come. This section will allow AMV contest coordinators to create the base entry for their contest (or exhibition), the events in their contest, rules going around each of them, participants, awards, and winners.

The grayed-out videos and video files tables are the same as the original Videos section of the schema. The new portion here is the submission of a particular video file to an event. Often, contests require that a video submission be "clean", meaning that there are no bumpers, watermarks, or other visually identifiable markings to reveal who the video creator is. This method will allow a "clean" copy to be available to contests, but not make it available for general download.

Image

User avatar
Kariudo
Twilight prince
Joined: Fri Jul 15, 2005 11:08 pm
Status: 1924 bots banned and counting!
Location: Los taquitos unidos
Contact:
Org Profile

Re: Org Database Overview

Post by Kariudo » Wed Feb 18, 2026 9:25 pm

Ok, jumping in on the Video Information section (and just that one for now). Pardon any formatting mistakes, it's been a while since I've been here.

Using dbdiagram.io cuz
  • it has a free tier
  • it has markdown-like text that is easy to copy and edit
  • it can export to Database-specific DDL
https://dbdiagram.io/d/Video-Informatio ... fce2fd72ff (not editable, but you can copy/paste the "DBML" in the left column)
Image

We're gonna need to hash out a naming convention for tables/columns here. I'm mixing pascalCase and snake_case right now, and I don't like that.

I'm going to push for a few key things:
  1. Not storing image data directly in the database
  2. Using UUIDs for at least some of the tables' primary keys
  3. Using ACLs (Access Control Lists) to manage CRUD permissions rather than relying on Ids in the tables themselves to determine ownership
Point 1: Image Data Storage
While image data could be stored in a blob column, it'd cause issues with bloating the database's cache and likely degrade performance (the db engine would need to read or skip all of that extra data, with no concrete size, from cache/disk for each row when running a query). Storing a file path and grabbing the file through NFS sounds like the accepted solution here.

Point 2: UUIDs
Postgres (the selected DBMS for the rewrite) has a UUID type, and it does have an indexing strategy for them (pretty much just a b-tree index, looks like). UUIDs being larger means the resulting index is larger, but that's hopefully not a big concern. A UUIDv7 or ULID may be a better choice than a UUIDv4 as they dedicate some number of bits to a time component, but that may depend on having software/language support. Could always just include a datetime instead if we need to index/search by time.

Outside of security reasons (basically impossible to guess what another UUID will be, let alone what UUID is used for the previous or next row), the main benefit to using UUIDs is that a UUID can be generated without needing to involve the database (unlike integers/auto-increments/sequences).

That makes things easier when trying to insert child/grandchild records.
Should also help keep API-side logic simpler. I've dealt with trying to create related records too much in my day job to ignore that consideration.

Think like inserting a new Video, and then the Visual and Audio sources (both just junction tables sitting between the Video and the Anime/Song tables to handle the many-many relationship). Being able to generate ids locally means the relationship can be established at the time we create the objects, rather than needing to create the video, insert it, then create the junction table rows for Anime/Music, then insert those.

Point 3: ACLs
I'd argue we should remove the member id from the AMV table, and just keep all of the creators in the VideoCreators/Editors/Collaborators (mostly) junction table. The ACL can store information on which user can edit the catalog entry, and also grant administration and community curators the ability to make edits.

Keeping editor information in a single table should also simplify queries and make them more consistent. We'd only need to filter on a single table to get the list of AMVs to display on someone's profile, and it's the same query/filters even if someone isn't the primary editor. Ditto that in general searches.

Having a "Owner" or "Primary" flag in VideoCreators could help the web app easily determine which member to display in the video card (search/main pages)

Miscellaneous
My initial draft of the DB schema for the rewrite tried to remove AMV-specific terminology. Part of the idea was that some other community, e.g. Lets Plays or perhaps a Game Jam, could use what we're building here for themselves. Become the Wordpress of multimedia creations. So I came up with "Works" (as in "a work of art") as a generic term. We don't need to keep that. My main concern is in trying to avoid doubling words in database table names like "VideoVideoSource"

Categories (drama, action, comedy, etc...), Codecs, Link Sources, and Link Statuses are pretty set in stone, so it may make more sense to have them as an Enum instead of a separate table.
About the only downside is that the DB would need to go down to add/remove any values.

The "VisualSources" table is a bit of an oddball. My idea here is that this will allow us to better handle anime "aliases" in searches and when creating a catalog entry. A single Id for a single Anime (or other source), and multiple names for that one Id (e.g. "Shingeki no Kyojin" and "Attack on Titan").
Keeping the "primary" title off of the "VisualSources" table is also done, following the same "let's keep queries and filters consistent" idea of removing the user id from the Works/Videos/AMVs table.
Image
Image

User avatar
Phade
Site Admin
Joined: Fri Oct 20, 2000 10:49 pm
Location: Little cabin in the woods...
Org Profile

Re: Org Database Overview

Post by Phade » Thu Feb 19, 2026 5:36 pm

Thank you for providing the feedback! I am always up for new ways to look at the same problem. 🙂
Kariudo wrote:
Wed Feb 18, 2026 9:25 pm
Point 1: Image Data Storage
While image data could be stored in a blob column, it'd cause issues with bloating the database's cache and likely degrade performance (the db engine would need to read or skip all of that extra data, with no concrete size, from cache/disk for each row when running a query). Storing a file path and grabbing the file through NFS sounds like the accepted solution here.
I wholeheartedly agree. File systems should handle files and data systems should handle data. The database should store information about the images while the actual images live outside the database on the file system. This will make everything run better and was the original intent (sorry if it didn’t come across that way).

Kariudo wrote:
Wed Feb 18, 2026 9:25 pm
Point 2: UUIDs
Postgres (the selected DBMS for the rewrite) has a UUID type, and it does have an indexing strategy for them (pretty much just a b-tree index, looks like). UUIDs being larger means the resulting index is larger, but that's hopefully not a big concern. A UUIDv7 or ULID may be a better choice than a UUIDv4 as they dedicate some number of bits to a time component, but that may depend on having software/language support. Could always just include a datetime instead if we need to index/search by time.

Outside of security reasons (basically impossible to guess what another UUID will be, let alone what UUID is used for the previous or next row), the main benefit to using UUIDs is that a UUID can be generated without needing to involve the database (unlike integers/auto-increments/sequences).

That makes things easier when trying to insert child/grandchild records.
Should also help keep API-side logic simpler. I've dealt with trying to create related records too much in my day job to ignore that consideration.

Think like inserting a new Video, and then the Visual and Audio sources (both just junction tables sitting between the Video and the Anime/Song tables to handle the many-many relationship). Being able to generate ids locally means the relationship can be established at the time we create the objects, rather than needing to create the video, insert it, then create the junction table rows for Anime/Music, then insert those.
I strongly agree that removing sequential video ID integers from general public view is a good idea and should be implemented. However, I’m not sure if UUID is the right fit for this project.

As you pointed out, from an internal database perspective, it is more difficult to index, join, and search a UUID string than an integer. This makes database activities more computationally expensive, which is especially important for less powerful devices, like our current internal failover systems, phones, and tablets (future use for an Org App).

Another issue is that if we create tightly-coupled direct integrations with other systems that use integer IDs (forum, wiki, etc.), we would need to map between integers and UUIDs. This setup would take more time to create the integration and may further complicate things if we migrate systems (moving from [forum x] to [forum y] for our site usage).

If we were creating billions of IDs throughout disconnected systems and we needed to guarantee the uniqueness of a new ID without comparing the new ID to all existing IDs (see IDs used for Google Docs, Sheets, Slides, etc.), I’d say UUID would be a good way to go.

But from an external user aesthetic perspective, UUIDs are long and ugly compared to integers or alternative IDs.

Instead of a UUID, we could use a shorter alphanumeric ID for public view while keeping integers for private internal comparisons. For example, YouTube uses 11-character case-sensitive alphanumeric (with dash and underscore) IDs for their videos. This gives YouTube videos a theoretical limit of 64^11 IDs, which is about 73.8 quintillion values. I’m sure they scrub those for accidental words (would be funny if some YouTube video ID was -ButtHoe69-), so the limit is probably only 1 quintillion IDs.

I suggest that we continue to use integer ID for internal use due to computational efficiency and cross-system compatibility but never directly reveal the integer to the public, while we generate an alternate ID for public view. Perhaps we could use the same alphanumeric options but with 64^5 for a limit of one-billion 5-character video ID values. We can pre-create a million 5-character IDs, auto-scrub them for words and other undesirable character strings (FUmom, 69Bro), and pop from that sanitized stack every time we need a new video ID.

From an API coding perspective, the API would “pop” a new video ID from the database (computationally fast) and then push that ID with the information insert exactly as you described above. The “pop” function marks that ID as “consumed” in the database so that a subsequent “pop” does not ever reuse that ID.

We could then have URLs like https://animemusicvideos.org/video/4wCh7 or shorter https://a-m-v.org/4wCh7

I think those are more visually pleasing than a displayed UUID like this:

Code: Select all

https://animemusicvideos.org/video/d5581502-befe-41ed-8b89-2aebf900a028 
Somewhat related, user IDs should be treated similarly and also remove visible sequence integers, but still keep them in the background, kinda like this:

Member Profile: https://www.youtube.com/@Ileia
Video Entry: https://youtu.be/26u32u4bh3s

YouTube has a similar randomly-generated ID system for users and channels in addition to the username text value. We could create and allow that identifier for users with illegal characters in their username.

Kariudo wrote:
Wed Feb 18, 2026 9:25 pm
Point 3: ACLs
I'd argue we should remove the member id from the AMV table, and just keep all of the creators in the VideoCreators/Editors/Collaborators (mostly) junction table. The ACL can store information on which user can edit the catalog entry, and also grant administration and community curators the ability to make edits.

Keeping editor information in a single table should also simplify queries and make them more consistent. We'd only need to filter on a single table to get the list of AMVs to display on someone's profile, and it's the same query/filters even if someone isn't the primary editor. Ditto that in general searches.

Having a "Owner" or "Primary" flag in VideoCreators could help the web app easily determine which member to display in the video card (search/main pages)

While keeping video information separate from creator information improves data structure normalization, it can also increase computational costs and complicate data integrity checks.

The Org has the requirement that every video must have a primary video owner/creator. Of the current 209,512 video entries, there are only 3,863 videos with cataloged collaborators. This means that 98.2% of all AMVs are created by a single person, though some AMVs may have missing collaborator entries (I’m willing to bet that more than 95% of all AMVs have single-person creators).

Similarly for multi-editor projects, all MEP videos must have one primary coordinator with everyone else as collaborators.

From an internal database perspective, if we wanted to pull a list of video titles and creators and we have the video creator in the videos table like it is now, only one join is needed to pull information: videos -> users. Pulling collaborators from a separate table will be quicker with a small table dataset since the vast majority of AMVs don’t have collaborators.

But if we keep all user information separate from the videos table, the system will always have to do two joins to get a basic list of information instead of one: videos -> video_users.owner -> users. It also runs the risk of orphaned video records, either on insert, update, or delete. Mitigating that risk through data integrity checks outside of the videos table further complicates CRUD interactions and increases computational expense.

Enforcing the rule of “all videos must have an owner” is easiest to maintain if that member is directly inside the primary video information table. This will allow the database itself to enforce non-null video owners on insert and update while enforcing referential integrity with existing users. It also makes it easy to identify who can add collaborators to a MEP video, keeping the confirmation of collaboration to the person added in the collaborators table.

Kariudo wrote:
Wed Feb 18, 2026 9:25 pm
Miscellaneous
My initial draft of the DB schema for the rewrite tried to remove AMV-specific terminology. Part of the idea was that some other community, e.g. Lets Plays or perhaps a Game Jam, could use what we're building here for themselves. Become the Wordpress of multimedia creations. So I came up with "Works" (as in "a work of art") as a generic term. We don't need to keep that. My main concern is in trying to avoid doubling words in database table names like "VideoVideoSource"

Categories (drama, action, comedy, etc...), Codecs, Link Sources, and Link Statuses are pretty set in stone, so it may make more sense to have them as an Enum instead of a separate table.
About the only downside is that the DB would need to go down to add/remove any values.

The "VisualSources" table is a bit of an oddball. My idea here is that this will allow us to better handle anime "aliases" in searches and when creating a catalog entry. A single Id for a single Anime (or other source), and multiple names for that one Id (e.g. "Shingeki no Kyojin" and "Attack on Titan").
Keeping the "primary" title off of the "VisualSources" table is also done, following the same "let's keep queries and filters consistent" idea of removing the user id from the Works/Videos/AMVs table.

I agree. Some of the original tables and columns were named specifically with AMVs in mind, like “anime” as the video source table and “songs” for the audio source table. However, you are correct that even with AMVs, video sources may not always be “anime” and audio may not always be “music”.

In addition, the use of external standardized databases (for example: MusicBrainz for music, AniList for anime, IMDB for non-anime, etc), while making things easier and more consistent for users, further complicates data tracking within the system. Updating the data structures to reflect this wider availability of sources, both internal and external, should be taken into consideration with all new data structures.

Even then, sometimes “video” sources are used as “audio” sources, as it is for trailer-based AMVs. In that case, the IMDB entry for a movie would be listed as the “audio” source because of its trailer. But if there is a database of trailers, we might switch to that one so that we can be even more specific for the audio source ([movie X] came out with a teaser trailer on [date A], [date B], and [date C] with full trailers on [date D] and [date E]; this AMV audio source comes from [movie X] [date D] trailer).

But this brings up a broader question: Are we creating a system for AnimeMusicVideos.org to use, or are we creating a general system that AnimeMusicVideos.org happens to use?

If it is for ourselves, we can be mildly stringent in the project creation and use structures that optimize how AMVs are cataloged. But if the project is open to outside audiences using and contributing to the creation of the system, we must be even more vigilant across all aspects of the project. I’m already concerned about how we will be internally doing basic tasks like functionality testing, code review, system architecture, production rollouts, external integrations, etc. If we make the system audience scope even larger, we must spend more time considering options outside of our own AMV use cases. I’m not saying this is a bad thing to do, just that it adds complexity and time to the project development.

Phade.

User avatar
Kariudo
Twilight prince
Joined: Fri Jul 15, 2005 11:08 pm
Status: 1924 bots banned and counting!
Location: Los taquitos unidos
Contact:
Org Profile

Re: Org Database Overview

Post by Kariudo » Wed Feb 25, 2026 4:35 pm

Phade wrote:
Thu Feb 19, 2026 5:36 pm
I strongly agree that removing sequential video ID integers from general public view is a good idea and should be implemented. However, I’m not sure if UUID is the right fit for this project.

As you pointed out, from an internal database perspective, it is more difficult to index, join, and search a UUID string than an integer. This makes database activities more computationally expensive, which is especially important for less powerful devices, like our current internal failover systems, phones, and tablets (future use for an Org App).
That wasn't really what I was getting at. Outside of the concerns with indexing (which I'll get to next), would joins and searching really be impacted?

UUIDv7 addresses most of the issues with indexing because the first 48 bits are derived from the current time (reducing demand on the entropy source compared to UUIDv4). That gives it a greater ability to be sorted, which helps keeps records created close in time also close in the index/internal db pages.

Postgres also has a uuid data type that stores it on the back end in 16 bytes. That's double of a bigint (8 bytes), but the key point is that it's not being stored as a string.

The UUIDs could be generated in the Database itself much like an auto-number sequence (Postgres 18 added a function to generate UUIDv7, prior versions could only generate a UUIDv4), but to reiterate, I believe we should be generating these outside of the database. Specifically, in the API layer. Clients would be unaffected.

I would much rather do this

Code: Select all

for(json 10 video data with visuals){
    video = new videoWithId();
    
    for(visual) {
	videoVisualSource = new VideoVisualSource();
        videoVisualSource.videoId = video.id;
        videoVisualSourceList.add(videoVisualSource);
    }
    
    videoList.add(video);
}

databaseinsert(videoList);
databaseInsert(videoVisualSourceList);
compared to doing this

Code: Select all

for(json 10 video data with visuals){
    video = new video();
    videoList.add(video);
    
    for(visual) {
	videoVisualSource = new VideoVisualSource();
        videoVisualSourceList.add(videoVisualSource);
        tempIdToVisualSourceList.get(video.tempId).add(videoVisualSource);
    }
}

insertedVideos = databaseInsert(videoList);

for(insertedVideos) {
    tempIdToRealIdMap.put(video.tempId => video.id);
}

for(tempIdToVisualSourceList as key => value){
    realId = tempIdToRealIdMap.get(key);
    for(visualSourceList) {
        videoVisualSource.videoId = realId;
        videoVisualSourceListFlat.add(videoVisualSource);
    }
}
insertedVisualSources = databaseInsert(videoVisualSourceListFlat);
My argument here kinda hinges on creating multiple videos (or multiple of anything that has child records under it) at once. If we don't allow that (or if it just naturally doesn't happen) and only have a single video inserted at a time, then things become simpler but not quite as simple as the "we have the id before inserting into the db" case.
Another assumption here is that we shift to a "gather all video information first, then start inserting" approach rather than the "create database records and return the id between each step of video creation" that I think we're doing today.

If we ever want to "scale out" the database as part of the hybrid-cloud approach we've discussed in the past for failover, UUIDs are the way to go (as you've said) unless the cloud DB is only a read replica. UUIDs aren't the only way to scale out the DB for a writable DB instance, but it is an IETF standard. Beyond that, the name of the game here is collision resistance, which pretty much requires a good PRNG.

If UUIDv7 solves the database performance side of the equation, then the remaining concern is how performant is creating the UUIDs on the API side?
The implementation built into Kotlin (v2.3+) uses Java's SecureRandom(), which if my reading is correct uses /dev/urandom and thus shouldn't block.
The first call is slow, but then it speeds up once the seed has been set.
I believe the Spring application is long-lived (it doesn't shut down and start up on each request), so I'd be hopeful that we wouldn't run into the slow start for SecureRandom()
Phade wrote:
Thu Feb 19, 2026 5:36 pm
Another issue is that if we create tightly-coupled direct integrations with other systems that use integer IDs (forum, wiki, etc.), we would need to map between integers and UUIDs. This setup would take more time to create the integration and may further complicate things if we migrate systems (moving from [forum x] to [forum y] for our site usage).
I don't quite understand the scenario you're painting here, can you go into more detail?

Is it something like tying the forum user and main site user together so we can provide links from a user profile on the forum to their profile on the main site (and vice-versa)?

If that's the case then a mapping would need to happen regardless of the data type of the id field, because phpBB (or any other external system we plug into the site) isn't going to be the source of truth regarding users/credentials. If phpBB were our source of truth for users/credentials, then a mapping would be required if/when we add a wiki.

Keycloak is our Identity Provider, our Identity and Access Management. That's how we get social logins, passkeys, OAuth, and more. Maybe there's a way for us to include extra data in keycloak (and retrieve that when we fetch an access token) so the access token can contain the forum user id (which we don't have control over). If it's in the access token, then it'll be available to the API layer. I'll have to look into that a bit more.

If we expose an API endpoint that needs the phpBB user id, and if we can include that in the access token, then the API simply needs to use the appropriate id.
Phade wrote:
Thu Feb 19, 2026 5:36 pm
But from an external user aesthetic perspective, UUIDs are long and ugly compared to integers or alternative IDs.

Instead of a UUID, we could use a shorter alphanumeric ID for public view while keeping integers for private internal comparisons. For example, YouTube uses 11-character case-sensitive alphanumeric (with dash and underscore) IDs for their videos. This gives YouTube videos a theoretical limit of 64^11 IDs, which is about 73.8 quintillion values. I’m sure they scrub those for accidental words (would be funny if some YouTube video ID was -ButtHoe69-), so the limit is probably only 1 quintillion IDs.

I suggest that we continue to use integer ID for internal use due to computational efficiency and cross-system compatibility but never directly reveal the integer to the public, while we generate an alternate ID for public view. Perhaps we could use the same alphanumeric options but with 64^5 for a limit of one-billion 5-character video ID values. We can pre-create a million 5-character IDs, auto-scrub them for words and other undesirable character strings (FUmom, 69Bro), and pop from that sanitized stack every time we need a new video ID.

From an API coding perspective, the API would “pop” a new video ID from the database (computationally fast) and then push that ID with the information insert exactly as you described above. The “pop” function marks that ID as “consumed” in the database so that a subsequent “pop” does not ever reuse that ID.

We could then have URLs like https://animemusicvideos.org/video/4wCh7 or shorter https://a-m-v.org/4wCh7

I think those are more visually pleasing than a displayed UUID like this:

Code: Select all

https://animemusicvideos.org/video/d5581502-befe-41ed-8b89-2aebf900a028 
Somewhat related, user IDs should be treated similarly and also remove visible sequence integers, but still keep them in the background, kinda like this:

Member Profile: https://www.youtube.com/@Ileia
Video Entry: https://youtu.be/26u32u4bh3s

YouTube has a similar randomly-generated ID system for users and channels in addition to the username text value. We could create and allow that identifier for users with illegal characters in their username.
Yeah, it'd be hard to argue that uuids are aesthetic. I still think that, in general, the fewer database calls we have to make, the better.
Sticking to hex would all but eliminate bad strings (trying to filter bad strings is not trivial, so I'd rather avoid the issue).
8 characters gets us 2^32 (4 billion) combinations

The tradeoffs here are
  • Still need to index this "display id" so we can query the records efficiently. If it's not sortable, we run into the same issue as UUIDv4 (poor indexing, lots of page splits)
  • We need a very large id-space to avoid the birthday paradox. That, or pre-generating ids (which gets harder to generate as more ids get used) which involves another round trip to the DB. It doesn't necessarily waste space, just pre-allocates it.
  • Extra logic, database calls, or a database trigger to create the "display id"
  • Permissions with Spring ACLs (more on that later) gets more complicated if it's not immediately translatable to a Long. Whatever Id we use on the front-end, that's the id we're stuck with for ACLs
Looking at a few other options, it looks like people like nanoId and sqids
though the simplest approach would probably be to convert the UUID to hex and then base64URL encode it
019c96ae-2fc8-7be6-a8c2-417064182976
becomes
AZyWri_Ie-aowkFwZBgpdg
Phade wrote:
Thu Feb 19, 2026 5:36 pm
While keeping video information separate from creator information improves data structure normalization, it can also increase computational costs and complicate data integrity checks.

The Org has the requirement that every video must have a primary video owner/creator. Of the current 209,512 video entries, there are only 3,863 videos with cataloged collaborators. This means that 98.2% of all AMVs are created by a single person, though some AMVs may have missing collaborator entries (I’m willing to bet that more than 95% of all AMVs have single-person creators).

Similarly for multi-editor projects, all MEP videos must have one primary coordinator with everyone else as collaborators.

From an internal database perspective, if we wanted to pull a list of video titles and creators and we have the video creator in the videos table like it is now, only one join is needed to pull information: videos -> users. Pulling collaborators from a separate table will be quicker with a small table dataset since the vast majority of AMVs don’t have collaborators.

But if we keep all user information separate from the videos table, the system will always have to do two joins to get a basic list of information instead of one: videos -> video_users.owner -> users. It also runs the risk of orphaned video records, either on insert, update, or delete. Mitigating that risk through data integrity checks outside of the videos table further complicates CRUD interactions and increases computational expense.
Given a random video id, and only that id, we can't possibly know if it's a single editor or multi-editor video. From both the search results and an individual video page, that second join is inevitable. A basic index on video_id (in the collaborators table) should remove concerns related to performance, and having an entry in the collaborators table for each video should get about as close to an "ideal" index as can be expected in a real environment.

Database normalization isn't really what I'm chasing here. It's more about
  1. keeping business logic and queries consistent
  2. preferring joins over multiple database queries (and the additional network round-trips and subsequent processing)
My wager is that it'll be faster and impose less computational load to have the DB manage a few joins and return a single result-set
than it would be to do the videos -> users join, do a second query for collaborators, then loop over that second result-set and merge it with the first
Phade wrote:
Thu Feb 19, 2026 5:36 pm
Enforcing the rule of “all videos must have an owner” is easiest to maintain if that member is directly inside the primary video information table. This will allow the database itself to enforce non-null video owners on insert and update while enforcing referential integrity with existing users. It also makes it easy to identify who can add collaborators to a MEP video, keeping the confirmation of collaboration to the person added in the collaborators table.
Fair point. "At_Least_One-to-Many" isn't really a good fit for SQL, and introduces a chicken-and-egg problem. I think the compromise here would be
  • Keep the user id in the videos table where we can add a NOT NULL constraint
  • But still go through the collaborators table (which we need to query anyway) to get at the user data
I'm really trying to keep authorization separate from the data. Be it deciding who can edit a video, who can add collaborators, who can edit a profile, etc...
Having that extra level of indirection, so that the actual data doesn't dictate who does/does not have permission, is a significant boon to security

Spring (the framework we're using) has a component, Spring Security, that makes it dead simple to enforce authorization. As a contrived (but still pretty close to reality) example

Code: Select all

@PreAuthorize("hasPermission(video_id, edit)")
fun addCollaborators(userIds: List<Id>): Account {
    // ... is only invoked if the user calling this endpoint has the `edit` permission
    // for the video_id in question
}
Access Control Entries can also be added to allow moderators/administration to add collaborators or edit the video. If there's a reason to separate the ability to edit videos from adding collaborators, there's a way to add custom permissions. We can have up to 32 permissions total (with read/write/create/delete and administration already pre-defined)

The consistency of handling that through a single interface means there's a lot less room for mistakes.
Phade wrote:
Thu Feb 19, 2026 5:36 pm
I agree. Some of the original tables and columns were named specifically with AMVs in mind, like “anime” as the video source table and “songs” for the audio source table. However, you are correct that even with AMVs, video sources may not always be “anime” and audio may not always be “music”.

In addition, the use of external standardized databases (for example: MusicBrainz for music, AniList for anime, IMDB for non-anime, etc), while making things easier and more consistent for users, further complicates data tracking within the system. Updating the data structures to reflect this wider availability of sources, both internal and external, should be taken into consideration with all new data structures.

Even then, sometimes “video” sources are used as “audio” sources, as it is for trailer-based AMVs. In that case, the IMDB entry for a movie would be listed as the “audio” source because of its trailer. But if there is a database of trailers, we might switch to that one so that we can be even more specific for the audio source ([movie X] came out with a teaser trailer on [date A], [date B], and [date C] with full trailers on [date D] and [date E]; this AMV audio source comes from [movie X] [date D] trailer).

But this brings up a broader question: Are we creating a system for AnimeMusicVideos.org to use, or are we creating a general system that AnimeMusicVideos.org happens to use?

If it is for ourselves, we can be mildly stringent in the project creation and use structures that optimize how AMVs are cataloged. But if the project is open to outside audiences using and contributing to the creation of the system, we must be even more vigilant across all aspects of the project. I’m already concerned about how we will be internally doing basic tasks like functionality testing, code review, system architecture, production rollouts, external integrations, etc. If we make the system audience scope even larger, we must spend more time considering options outside of our own AMV use cases. I’m not saying this is a bad thing to do, just that it adds complexity and time to the project development.

Phade.
I was perhaps a bit over-broad, initially thinking that this system wouldn't just be for AMVs, or even only Music Videos, but to a broader set of Multimedia communities.

Going that broad would be difficult (how would media "sources" be handled for a Let's Play? or a community for music-backed semi-dynamic typography like the short-story memories from Lost Odyssey?)

So I think I'll take a step back. Let's focus on Music Videos, but drop the "music" and "anime" terminology.
AvatarMusicVideos, CartoonMusicVideos, vidding communities in general could use what we build, but any involvement from those other communities are years down the road (if it happens at all).

So with that in mind, and considering our discord call on the 21st, I'll propose the following general naming scheme:
  • Videos (this one is what the site currently has, no?)
  • Visuals (generically covers Anime, Manga, FMVs, Game cutscenes, Comics, etc...)
  • Audios (or is Audio the plural, and Audii the singular?)
  • Events
  • Contests
  • Playlists
  • Either Members or Users
and we've agreed to use snake_case for database table and column names
Image
Image

eat_those_lemons
Joined: Tue Feb 10, 2026 7:33 pm
Org Profile

Re: Org Database Overview

Post by eat_those_lemons » Thu Feb 26, 2026 8:50 pm

Generally I agree with what Karido said. I have specific responses but most of the responses are generally worrying about pre-optimizing. I do have a question of why there is no join table between "works" and "work_categories".

I agree we should use UUIDv7's they solve some of the database locality issues as well as allow the greatest amount of flexibility. I think that sequential id's and related solutions are attractive from a simplicity perspective but I've worked on too many projects that have changing requirements or additional features to not choose the most flexible option when its such a simple change with few trade offs.

I also agree with the "gather all video information first, then start inserting" method.

I do keep seeing database performance being talked about and I think we are falling in to the common trap of pre-optimization. Joins are basically free now with modern databases, we should stop trying to optimize joins and fall into the premature optimization trap

I also agree with using ACL's. That will make security much easier. I've dealt with to many security issues to want to complicate it. (Used to work on a system that tried to make access "easy" and avoid acl's and it was a mess)

I agree that trying to make a system flexible enough that any other community could use it was overly ambitious. I think that going that broad complicates our design as well as we don't know enough about those communities needs that we would be guessing when making things "work for everyone". So we have a high chance of complicating our job and ending up with something that doesn't really work for other communities.

I do also agree that it would be good to remove the anime specific verbiage

User avatar
Kariudo
Twilight prince
Joined: Fri Jul 15, 2005 11:08 pm
Status: 1924 bots banned and counting!
Location: Los taquitos unidos
Contact:
Org Profile

Re: Org Database Overview

Post by Kariudo » Fri Feb 27, 2026 6:14 pm

eat_those_lemons wrote:
Thu Feb 26, 2026 8:50 pm
I have specific responses but most of the responses are generally worrying about pre-optimizing
Well, yes and no. Without making too much information public, database related issues are probably the thing that's impacting the org the most right now (outside of poorly-behaved llm-scraping).
Stuff like indexing UUIDv4s is pretty well understood at this point
eat_those_lemons wrote:
Thu Feb 26, 2026 8:50 pm
I do have a question of why there is no join table between "works" and "work_categories".
Because in my initial diagram, I turned the categories into an Enum.

I think an Enum is suitable here because the individual Categories haven't really changed in over a decade. That and we decided (a few years ago) to not chase Internationalization, rather choosing to let in-browser translation do the heavy lifting. If we were going to support internationalization, then a separate table would make more sense.

The one exception that I see for us is Anime titles, which is why the video_visual_sources -> visual_sources -> visual_source_names part of the diagram (considering the naming convention we've agreed on) is a bit weird (one additional level removed from the standard "junction table" approach). That's not so much intended to provide internationalization so much as it is intended to make it easier for people to search for anime when creating a catalog entry or searching for AMVs based on an anime. I think I've said this before, but my main goal with this is to allow for people to search for anime using either the English or Romanji titles. Attack on Titan vs Shingeki no Kyojin and the like.

I think (at least for now) that Joins are better than multiple queries
and using an Enum is better than joining on a table (if the data is constant)
Image
Image

Post Reply

Return to “Org Site Rebuild”