Thank you for providing the feedback! I am always up for new ways to look at the same problem.
Kariudo wrote: āWed Feb 18, 2026 9:25 pm
Point 1: Image Data Storage
While image data could be stored in a blob column, it'd cause issues with bloating the database's cache and likely degrade performance (the db engine would need to read or skip all of that extra data, with no concrete size, from cache/disk for each row when running a query). Storing a file path and grabbing the file through NFS sounds like the accepted solution here.
I wholeheartedly agree. File systems should handle files and data systems should handle data. The database should store information about the images while the actual images live outside the database on the file system. This will make everything run better and was the original intent (sorry if it didnāt come across that way).
Kariudo wrote: āWed Feb 18, 2026 9:25 pm
Point 2: UUIDs
Postgres (the selected DBMS for the rewrite) has a UUID type, and it does have an indexing strategy for them (pretty much just a b-tree index, looks like). UUIDs being larger means the resulting index is larger, but that's hopefully not a big concern. A UUIDv7 or ULID may be a better choice than a UUIDv4 as they dedicate some number of bits to a time component, but that may depend on having software/language support. Could always just include a datetime instead if we need to index/search by time.
Outside of security reasons (basically impossible to guess what another UUID will be, let alone what UUID is used for the previous or next row), the main benefit to using UUIDs is that
a UUID can be generated without needing to involve the database (unlike integers/auto-increments/sequences).
That makes things easier when trying to insert child/grandchild records.
Should also help keep API-side logic simpler. I've dealt with trying to create related records too much in my day job to ignore that consideration.
Think like inserting a new Video, and then the Visual and Audio sources (both just junction tables sitting between the Video and the Anime/Song tables to handle the many-many relationship). Being able to generate ids locally means the relationship can be established at the time we create the objects, rather than needing to create the video, insert it, then create the junction table rows for Anime/Music, then insert those.
I strongly agree that removing sequential video ID integers from general public view is a good idea and should be implemented. However, Iām not sure if UUID is the right fit for this project.
As you pointed out, from an internal database perspective, it is more difficult to index, join, and search a UUID string than an integer. This makes database activities more computationally expensive, which is especially important for less powerful devices, like our current internal failover systems, phones, and tablets (future use for an Org App).
Another issue is that if we create tightly-coupled direct integrations with other systems that use integer IDs (forum, wiki, etc.), we would need to map between integers and UUIDs. This setup would take more time to create the integration and may further complicate things if we migrate systems (moving from [forum x] to [forum y] for our site usage).
If we were creating billions of IDs throughout disconnected systems and we needed to guarantee the uniqueness of a new ID without comparing the new ID to all existing IDs (see IDs used for Google Docs, Sheets, Slides, etc.), Iād say UUID would be a good way to go.
But from an external user aesthetic perspective, UUIDs are long and ugly compared to integers or alternative IDs.
Instead of a UUID, we could use a shorter alphanumeric ID for public view while keeping integers for private internal comparisons. For example, YouTube uses 11-character case-sensitive alphanumeric (with dash and underscore) IDs for their videos. This gives YouTube videos a theoretical limit of 64^11 IDs, which is about 73.8 quintillion values. Iām sure they scrub those for accidental words (would be funny if some YouTube video ID was -ButtHoe69-), so the limit is probably only 1 quintillion IDs.
I suggest that we continue to use integer ID for internal use due to computational efficiency and cross-system compatibility but never directly reveal the integer to the public, while we generate an alternate ID for public view. Perhaps we could use the same alphanumeric options but with 64^5 for a limit of one-billion 5-character video ID values. We can pre-create a million 5-character IDs, auto-scrub them for words and other undesirable character strings (FUmom, 69Bro), and pop from that sanitized stack every time we need a new video ID.
From an API coding perspective, the API would āpopā a new video ID from the database (computationally fast) and then push that ID with the information insert exactly as you described above. The āpopā function marks that ID as āconsumedā in the database so that a subsequent āpopā does not ever reuse that ID.
We could then have URLs like
https://animemusicvideos.org/video/4wCh7 or shorter
https://a-m-v.org/4wCh7
I think those are more visually pleasing than a displayed UUID like this:
Code: Select all
https://animemusicvideos.org/video/d5581502-befe-41ed-8b89-2aebf900a028
Somewhat related, user IDs should be treated similarly and also remove visible sequence integers, but still keep them in the background, kinda like this:
Member Profile:
https://www.youtube.com/@Ileia
Video Entry:
https://youtu.be/26u32u4bh3s
YouTube has a similar randomly-generated ID system for users and channels in addition to the username text value. We could create and allow that identifier for users with illegal characters in their username.
Kariudo wrote: āWed Feb 18, 2026 9:25 pm
Point 3: ACLs
I'd argue we should remove the member id from the AMV table, and just keep all of the creators in the VideoCreators/Editors/Collaborators (mostly) junction table. The ACL can store information on which user can edit the catalog entry, and also grant administration and community curators the ability to make edits.
Keeping editor information in a single table should also simplify queries and make them more consistent. We'd only need to filter on a single table to get the list of AMVs to display on someone's profile, and it's the same query/filters even if someone isn't the primary editor. Ditto that in general searches.
Having a "Owner" or "Primary" flag in VideoCreators could help the web app easily determine which member to display in the video card (search/main pages)
While keeping video information separate from creator information improves data structure normalization, it can also increase computational costs and complicate data integrity checks.
The Org has the requirement that every video must have a primary video owner/creator. Of the current 209,512 video entries, there are only 3,863 videos with cataloged collaborators. This means that 98.2% of all AMVs are created by a single person, though some AMVs may have missing collaborator entries (Iām willing to bet that more than 95% of all AMVs have single-person creators).
Similarly for multi-editor projects, all MEP videos must have one primary coordinator with everyone else as collaborators.
From an internal database perspective, if we wanted to pull a list of video titles and creators and we have the video creator in the videos table like it is now, only one join is needed to pull information: videos -> users. Pulling collaborators from a separate table will be quicker with a small table dataset since the vast majority of AMVs donāt have collaborators.
But if we keep all user information separate from the videos table, the system will always have to do two joins to get a basic list of information instead of one: videos -> video_users.owner -> users. It also runs the risk of orphaned video records, either on insert, update, or delete. Mitigating that risk through data integrity checks outside of the videos table further complicates CRUD interactions and increases computational expense.
Enforcing the rule of āall videos must have an ownerā is easiest to maintain if that member is directly inside the primary video information table. This will allow the database itself to enforce non-null video owners on insert and update while enforcing referential integrity with existing users. It also makes it easy to identify who can add collaborators to a MEP video, keeping the confirmation of collaboration to the person added in the collaborators table.
Kariudo wrote: āWed Feb 18, 2026 9:25 pm
Miscellaneous
My initial draft of the DB schema for the rewrite tried to remove AMV-specific terminology. Part of the idea was that some other community, e.g. Lets Plays or perhaps a Game Jam, could use what we're building here for themselves. Become the Wordpress of multimedia creations. So I came up with "Works" (as in "a work of art") as a generic term. We don't need to keep that. My main concern is in trying to avoid doubling words in database table names like "VideoVideoSource"
Categories (drama, action, comedy, etc...), Codecs, Link Sources, and Link Statuses are pretty set in stone, so it may make more sense to have them as an Enum instead of a separate table.
About the only downside is that the DB would need to go down to add/remove any values.
The "VisualSources" table is a bit of an oddball. My idea here is that this will allow us to better handle anime "aliases" in searches and when creating a catalog entry. A single Id for a single Anime (or other source), and multiple names for that one Id (e.g. "Shingeki no Kyojin" and "Attack on Titan").
Keeping the "primary" title off of the "VisualSources" table is also done, following the same "let's keep queries and filters consistent" idea of removing the user id from the Works/Videos/AMVs table.
I agree. Some of the original tables and columns were named specifically with AMVs in mind, like āanimeā as the video source table and āsongsā for the audio source table. However, you are correct that even with AMVs, video sources may not always be āanimeā and audio may not always be āmusicā.
In addition, the use of external standardized databases (for example: MusicBrainz for music, AniList for anime, IMDB for non-anime, etc), while making things easier and more consistent for users, further complicates data tracking within the system. Updating the data structures to reflect this wider availability of sources, both internal and external, should be taken into consideration with all new data structures.
Even then, sometimes āvideoā sources are used as āaudioā sources, as it is for trailer-based AMVs. In that case, the IMDB entry for a movie would be listed as the āaudioā source because of its trailer. But if there is a database of trailers, we might switch to that one so that we can be even more specific for the audio source ([movie X] came out with a teaser trailer on [date A], [date B], and [date C] with full trailers on [date D] and [date E]; this AMV audio source comes from [movie X] [date D] trailer).
But this brings up a broader question: Are we creating a system for AnimeMusicVideos.org to use, or are we creating a general system that AnimeMusicVideos.org
happens to use?
If it is for ourselves, we can be mildly stringent in the project creation and use structures that optimize how AMVs are cataloged. But if the project is open to outside audiences using and contributing to the creation of the system, we must be even more vigilant across all aspects of the project. Iām already concerned about how we will be internally doing basic tasks like functionality testing, code review, system architecture, production rollouts, external integrations, etc. If we make the system audience scope even larger, we must spend more time considering options outside of our own AMV use cases. Iām not saying this is a bad thing to do, just that it adds complexity and time to the project development.
Phade.