Phade wrote: ↑Thu Feb 19, 2026 5:36 pm
I strongly agree that removing sequential video ID integers from general public view is a good idea and should be implemented. However, I’m not sure if UUID is the right fit for this project.
As you pointed out, from an internal database perspective, it is more difficult to index, join, and search a UUID string than an integer. This makes database activities more computationally expensive, which is especially important for less powerful devices, like our current internal failover systems, phones, and tablets (future use for an Org App).
That wasn't really what I was getting at. Outside of the concerns with indexing (which I'll get to next), would joins and searching really be impacted?
UUIDv7 addresses most of the issues with indexing because the first 48 bits are derived from the current time (reducing demand on the entropy source compared to UUIDv4). That gives it a greater ability to be sorted, which helps keeps records created close in time also close in the index/internal db pages.
Postgres also has a uuid data type that stores it on the back end in 16 bytes. That's double of a bigint (8 bytes), but the key point is that it's not being stored as a string.
The UUIDs
could be generated in the Database itself much like an auto-number sequence (Postgres 18 added a function to generate UUIDv7, prior versions could only generate a UUIDv4), but to reiterate,
I believe we should be generating these outside of the database. Specifically, in the API layer. Clients would be unaffected.
I would much rather do this
Code: Select all
for(json 10 video data with visuals){
video = new videoWithId();
for(visual) {
videoVisualSource = new VideoVisualSource();
videoVisualSource.videoId = video.id;
videoVisualSourceList.add(videoVisualSource);
}
videoList.add(video);
}
databaseinsert(videoList);
databaseInsert(videoVisualSourceList);
compared to doing this
Code: Select all
for(json 10 video data with visuals){
video = new video();
videoList.add(video);
for(visual) {
videoVisualSource = new VideoVisualSource();
videoVisualSourceList.add(videoVisualSource);
tempIdToVisualSourceList.get(video.tempId).add(videoVisualSource);
}
}
insertedVideos = databaseInsert(videoList);
for(insertedVideos) {
tempIdToRealIdMap.put(video.tempId => video.id);
}
for(tempIdToVisualSourceList as key => value){
realId = tempIdToRealIdMap.get(key);
for(visualSourceList) {
videoVisualSource.videoId = realId;
videoVisualSourceListFlat.add(videoVisualSource);
}
}
insertedVisualSources = databaseInsert(videoVisualSourceListFlat);
My argument here kinda hinges on creating multiple videos (or multiple of anything that has child records under it) at once. If we don't allow that (or if it just naturally doesn't happen) and only have a single video inserted at a time, then things become simpler but not quite as simple as the "we have the id before inserting into the db" case.
Another assumption here is that we shift to a "gather all video information first, then start inserting" approach rather than the "create database records and return the id between each step of video creation" that I think we're doing today.
If we ever want to "scale out" the database as part of the hybrid-cloud approach we've discussed in the past for failover, UUIDs are the way to go (as you've said) unless the cloud DB is only a read replica. UUIDs aren't the only way to scale out the DB for a writable DB instance, but it is an IETF standard. Beyond that, the name of the game here is collision resistance, which pretty much requires a good PRNG.
If UUIDv7 solves the database performance side of the equation, then the remaining concern is how performant is creating the UUIDs on the API side?
The implementation built into Kotlin (v2.3+) uses Java's SecureRandom(), which if my reading is correct uses /dev/urandom and thus shouldn't block.
The first call is slow, but then it speeds up once the seed has been set.
I believe the Spring application is long-lived (it doesn't shut down and start up on each request), so I'd be hopeful that we wouldn't run into the slow start for SecureRandom()
Phade wrote: ↑Thu Feb 19, 2026 5:36 pm
Another issue is that if we create tightly-coupled direct integrations with other systems that use integer IDs (forum, wiki, etc.), we would need to map between integers and UUIDs. This setup would take more time to create the integration and may further complicate things if we migrate systems (moving from [forum x] to [forum y] for our site usage).
I don't quite understand the scenario you're painting here, can you go into more detail?
Is it something like tying the forum user and main site user together so we can provide links from a user profile on the forum to their profile on the main site (and vice-versa)?
If that's the case then a mapping would need to happen regardless of the data type of the id field, because phpBB (or any other external system we plug into the site) isn't going to be the source of truth regarding users/credentials. If phpBB were our source of truth for users/credentials, then a mapping would be required if/when we add a wiki.
Keycloak is our Identity Provider, our Identity and Access Management. That's how we get social logins, passkeys, OAuth, and more. Maybe there's a way for us to include extra data in keycloak (and retrieve that when we fetch an access token) so the access token can contain the forum user id (which we don't have control over). If it's in the access token, then it'll be available to the API layer. I'll have to look into that a bit more.
If we expose an API endpoint that needs the phpBB user id, and if we can include that in the access token, then the API simply needs to use the appropriate id.
Phade wrote: ↑Thu Feb 19, 2026 5:36 pm
But from an external user aesthetic perspective, UUIDs are long and ugly compared to integers or alternative IDs.
Instead of a UUID, we could use a shorter alphanumeric ID for public view while keeping integers for private internal comparisons. For example, YouTube uses 11-character case-sensitive alphanumeric (with dash and underscore) IDs for their videos. This gives YouTube videos a theoretical limit of 64^11 IDs, which is about 73.8 quintillion values. I’m sure they scrub those for accidental words (would be funny if some YouTube video ID was -ButtHoe69-), so the limit is probably only 1 quintillion IDs.
I suggest that we continue to use integer ID for internal use due to computational efficiency and cross-system compatibility but never directly reveal the integer to the public, while we generate an alternate ID for public view. Perhaps we could use the same alphanumeric options but with 64^5 for a limit of one-billion 5-character video ID values. We can pre-create a million 5-character IDs, auto-scrub them for words and other undesirable character strings (FUmom, 69Bro), and pop from that sanitized stack every time we need a new video ID.
From an API coding perspective, the API would “pop” a new video ID from the database (computationally fast) and then push that ID with the information insert exactly as you described above. The “pop” function marks that ID as “consumed” in the database so that a subsequent “pop” does not ever reuse that ID.
We could then have URLs like
https://animemusicvideos.org/video/4wCh7 or shorter
https://a-m-v.org/4wCh7
I think those are more visually pleasing than a displayed UUID like this:
Code: Select all
https://animemusicvideos.org/video/d5581502-befe-41ed-8b89-2aebf900a028
Somewhat related, user IDs should be treated similarly and also remove visible sequence integers, but still keep them in the background, kinda like this:
Member Profile:
https://www.youtube.com/@Ileia
Video Entry:
https://youtu.be/26u32u4bh3s
YouTube has a similar randomly-generated ID system for users and channels in addition to the username text value. We could create and allow that identifier for users with illegal characters in their username.
Yeah, it'd be hard to argue that uuids are aesthetic. I still think that, in general, the fewer database calls we have to make, the better.
Sticking to hex would all but eliminate bad strings (trying to filter bad strings is not trivial, so I'd rather avoid the issue).
8 characters gets us 2^32 (4 billion) combinations
The tradeoffs here are
- Still need to index this "display id" so we can query the records efficiently. If it's not sortable, we run into the same issue as UUIDv4 (poor indexing, lots of page splits)
- We need a very large id-space to avoid the birthday paradox. That, or pre-generating ids (which gets harder to generate as more ids get used) which involves another round trip to the DB. It doesn't necessarily waste space, just pre-allocates it.
- Extra logic, database calls, or a database trigger to create the "display id"
- Permissions with Spring ACLs (more on that later) gets more complicated if it's not immediately translatable to a Long. Whatever Id we use on the front-end, that's the id we're stuck with for ACLs
Looking at a few other options, it looks like people like
nanoId and
sqids
though the simplest approach would probably be to convert the UUID to hex and then base64URL encode it
019c96ae-2fc8-7be6-a8c2-417064182976
becomes
AZyWri_Ie-aowkFwZBgpdg
Phade wrote: ↑Thu Feb 19, 2026 5:36 pm
While keeping video information separate from creator information improves data structure normalization, it can also increase computational costs and complicate data integrity checks.
The Org has the requirement that every video must have a primary video owner/creator. Of the current 209,512 video entries, there are only 3,863 videos with cataloged collaborators. This means that 98.2% of all AMVs are created by a single person, though some AMVs may have missing collaborator entries (I’m willing to bet that more than 95% of all AMVs have single-person creators).
Similarly for multi-editor projects, all MEP videos must have one primary coordinator with everyone else as collaborators.
From an internal database perspective, if we wanted to pull a list of video titles and creators and we have the video creator in the videos table like it is now, only one join is needed to pull information: videos -> users. Pulling collaborators from a separate table will be quicker with a small table dataset since the vast majority of AMVs don’t have collaborators.
But if we keep all user information separate from the videos table, the system will always have to do two joins to get a basic list of information instead of one: videos -> video_users.owner -> users. It also runs the risk of orphaned video records, either on insert, update, or delete. Mitigating that risk through data integrity checks outside of the videos table further complicates CRUD interactions and increases computational expense.
Given a random video id, and
only that id, we can't possibly know if it's a single editor or multi-editor video. From both the search results and an individual video page, that second join is inevitable. A basic index on video_id (in the collaborators table) should remove concerns related to performance, and having an entry in the collaborators table for each video should get about as close to an "ideal" index as can be expected in a real environment.
Database normalization isn't really what I'm chasing here. It's more about
- keeping business logic and queries consistent
- preferring joins over multiple database queries (and the additional network round-trips and subsequent processing)
My wager is that it'll be faster and impose less computational load to have the DB manage a few joins and return a single result-set
than it would be to do the videos -> users join, do a second query for collaborators, then loop over that second result-set and merge it with the first
Phade wrote: ↑Thu Feb 19, 2026 5:36 pm
Enforcing the rule of “all videos must have an owner” is easiest to maintain if that member is directly inside the primary video information table. This will allow the database itself to enforce non-null video owners on insert and update while enforcing referential integrity with existing users. It also makes it easy to identify who can add collaborators to a MEP video, keeping the confirmation of collaboration to the person added in the collaborators table.
Fair point. "At_Least_One-to-Many" isn't really a good fit for SQL, and introduces a chicken-and-egg problem. I think the compromise here would be
- Keep the user id in the videos table where we can add a NOT NULL constraint
- But still go through the collaborators table (which we need to query anyway) to get at the user data
I'm really trying to keep authorization separate from the data. Be it deciding who can edit a video, who can add collaborators, who can edit a profile, etc...
Having that extra level of indirection, so that the actual data doesn't dictate who does/does not have permission, is a significant boon to security
Spring (the framework we're using) has a component, Spring Security, that makes it dead simple to enforce authorization. As a contrived (but still pretty close to reality) example
Code: Select all
@PreAuthorize("hasPermission(video_id, edit)")
fun addCollaborators(userIds: List<Id>): Account {
// ... is only invoked if the user calling this endpoint has the `edit` permission
// for the video_id in question
}
Access Control Entries can also be added to allow moderators/administration to add collaborators or edit the video. If there's a reason to separate the ability to edit videos from adding collaborators, there's a way to add custom permissions. We can have up to 32 permissions total (with read/write/create/delete and administration already pre-defined)
The consistency of handling that through a single interface means there's a lot less room for mistakes.
Phade wrote: ↑Thu Feb 19, 2026 5:36 pm
I agree. Some of the original tables and columns were named specifically with AMVs in mind, like “anime” as the video source table and “songs” for the audio source table. However, you are correct that even with AMVs, video sources may not always be “anime” and audio may not always be “music”.
In addition, the use of external standardized databases (for example: MusicBrainz for music, AniList for anime, IMDB for non-anime, etc), while making things easier and more consistent for users, further complicates data tracking within the system. Updating the data structures to reflect this wider availability of sources, both internal and external, should be taken into consideration with all new data structures.
Even then, sometimes “video” sources are used as “audio” sources, as it is for trailer-based AMVs. In that case, the IMDB entry for a movie would be listed as the “audio” source because of its trailer. But if there is a database of trailers, we might switch to that one so that we can be even more specific for the audio source ([movie X] came out with a teaser trailer on [date A], [date B], and [date C] with full trailers on [date D] and [date E]; this AMV audio source comes from [movie X] [date D] trailer).
But this brings up a broader question: Are we creating a system for AnimeMusicVideos.org to use, or are we creating a general system that AnimeMusicVideos.org
happens to use?
If it is for ourselves, we can be mildly stringent in the project creation and use structures that optimize how AMVs are cataloged. But if the project is open to outside audiences using and contributing to the creation of the system, we must be even more vigilant across all aspects of the project. I’m already concerned about how we will be internally doing basic tasks like functionality testing, code review, system architecture, production rollouts, external integrations, etc. If we make the system audience scope even larger, we must spend more time considering options outside of our own AMV use cases. I’m not saying this is a bad thing to do, just that it adds complexity and time to the project development.
Phade.
I was perhaps a bit over-broad, initially thinking that this system wouldn't just be for AMVs, or even only Music Videos, but to a broader set of Multimedia communities.
Going that broad would be difficult (how would media "sources" be handled for a Let's Play? or a community for music-backed semi-dynamic typography like the short-story memories from Lost Odyssey?)
So I think I'll take a step back. Let's focus on Music Videos, but drop the "music" and "anime" terminology.
AvatarMusicVideos, CartoonMusicVideos, vidding communities in general could use what we build, but any involvement from those other communities are years down the road (if it happens at all).
So with that in mind, and considering our discord call on the 21st, I'll propose the following general naming scheme:
- Videos (this one is what the site currently has, no?)
- Visuals (generically covers Anime, Manga, FMVs, Game cutscenes, Comics, etc...)
- Audios (or is Audio the plural, and Audii the singular?)
- Events
- Contests
- Playlists
- Either Members or Users
and we've agreed to use snake_case for database table and column names