Storing the order of videos that are in a playlist - mysql

I am working on a simple video-database with a playlist feature. In such a playlist, videos can be placed in a user-specified order.
So I thought I would assign a number_in_playlist to each video_id. The problem is that if, say, video 19 is later moved to a position between videos 2 and 3, the number_in_playlist of every video in between also needs to be updated.
Now that strongly reminds me of Array vs Linked List. So I thought a linked list would solve that problem, i.e. storing something like previous_video_id_in_playlist and next_video_id_in_playlist for each video record. However, in that case I am not sure how to fetch all the videos in the playlist in order.
This must be a problem that others have encountered before, so I was wondering if there is a standard recommended solution?
PS: I am using MySQL and I very much prefer short, fast queries (which I think speaks against the linked list solution?)

If you make your playlist.number_in_playlist column a double, then you can start by sequencing your videos with whole numbers. When an item in a playlist is moved to a new position, you set its number_in_playlist to the (probably fractional) number that is halfway between the values of the preceding and following videos. This lets you move videos around for a very long time before you ever have to worry about resequencing your whole playlist.
The trigger for resequencing is when the newly calculated value is equal to one of your end points (i.e. the same value as the preceding or following video), which means the double has run out of precision between them. For practical purposes this will happen very, very rarely unless your users spend more time resequencing videos than watching them.
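
The midpoint rule and the resequencing trigger described above can be sketched as follows. This is only an illustration: the function names `position_between` and `resequence` are made up for the example, and a Python float stands in for the MySQL DOUBLE column.

```python
def position_between(before, after):
    """Return a sort key halfway between two neighbours.

    Returns None when the midpoint collapses onto one of the end points,
    which is the signal that the playlist needs to be resequenced.
    """
    mid = (before + after) / 2.0
    if mid == before or mid == after:
        return None
    return mid


def resequence(positions):
    """Map each existing position to a fresh whole number, keeping the order."""
    return {old: float(i) for i, old in enumerate(sorted(positions), start=1)}


# Move a video between the videos currently at positions 2 and 3.
print(position_between(2.0, 3.0))  # 2.5
```

Only the moved row is written in the common case; the other rows are touched only when `position_between` reports that a resequence is due.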

Can you not do something like:
SELECT *
FROM videos
WHERE playlist_id = 1
ORDER BY next_video_id_in_playlist ASC

Is the list usually not too long? Is write performance not a problem? In that case I'd just use the number_in_playlist solution. On every write, basically all the numbers need to be updated.
Linked lists in a relational database smell like they will cause unforeseeable problems, such as cycles caused by bugs.
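
For the plain number_in_playlist approach, a move comes down to one range UPDATE that shifts the videos in between, plus one UPDATE for the moved video. The sketch below uses Python's built-in sqlite3 module in place of MySQL so that it runs on its own; the table name `playlist_items` and the helpers `move_video` and `playlist_order` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist_items ("
    "playlist_id INTEGER, video_id INTEGER, number_in_playlist INTEGER)"
)
conn.executemany(
    "INSERT INTO playlist_items VALUES (1, ?, ?)",
    [(video, pos) for pos, video in enumerate([10, 11, 12, 13, 14], start=1)],
)


def move_video(conn, playlist_id, video_id, new_pos):
    """Move one video to new_pos, shifting the videos in between by one."""
    with conn:  # one transaction for the whole move
        (old_pos,) = conn.execute(
            "SELECT number_in_playlist FROM playlist_items "
            "WHERE playlist_id = ? AND video_id = ?",
            (playlist_id, video_id),
        ).fetchone()
        if new_pos < old_pos:
            # Moving up: push the videos in [new_pos, old_pos) one place down.
            conn.execute(
                "UPDATE playlist_items SET number_in_playlist = number_in_playlist + 1 "
                "WHERE playlist_id = ? AND number_in_playlist >= ? AND number_in_playlist < ?",
                (playlist_id, new_pos, old_pos),
            )
        elif new_pos > old_pos:
            # Moving down: pull the videos in (old_pos, new_pos] one place up.
            conn.execute(
                "UPDATE playlist_items SET number_in_playlist = number_in_playlist - 1 "
                "WHERE playlist_id = ? AND number_in_playlist > ? AND number_in_playlist <= ?",
                (playlist_id, old_pos, new_pos),
            )
        conn.execute(
            "UPDATE playlist_items SET number_in_playlist = ? "
            "WHERE playlist_id = ? AND video_id = ?",
            (new_pos, playlist_id, video_id),
        )


def playlist_order(conn, playlist_id):
    """Return the video ids of a playlist in display order."""
    rows = conn.execute(
        "SELECT video_id FROM playlist_items WHERE playlist_id = ? "
        "ORDER BY number_in_playlist",
        (playlist_id,),
    )
    return [video_id for (video_id,) in rows]


move_video(conn, 1, 14, 2)
print(playlist_order(conn, 1))  # [10, 14, 11, 12, 13]
```

Reading the playlist stays a single short ORDER BY query, which is the main argument for this layout over a linked list.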

Related

Best solution to count occurrences of words in database

I'm going to scrape a forum's new threads page for each word appearing in the titles of the threads, to build a sort of popularity trend (like Google Trends). I've found a way to scrape it, but I don't know how I should store it in the database for optimal performance. I thought of two different ways.
Store each new word in a row, and if the word isn't new, add one to its "occurrences" field.
Store each word in its own row, no matter what.
Are there any other solutions to this problem?
If you are going through the trouble of scraping, you should be keeping multiple levels of information.
First, keep track of each forum title that you encounter, along with the date of the posting (and of your finding it) as well as other information. You can put a full text index on the forum title, which will give you nice capabilities for finding similar versions of the same word ("database" and "databases").
Second, store each word separately in a table along with the date and time of the posting (or of your finding it) and a link back to the posting table. The value of Google trends is not that it keeps a gross count of words ever. It is that you can break it down over time.
Then, do the aggregation in a query. If you have performance issues, you can partition the data by date, so most queries will only read a subset of the data. If the summaries are highly used, then you can consider summarization on a batch basis, say once per night.
Finally, how are you going to deal with different versions of the word appearing over time? With misspellings? With multiple appearances of the same word in one title?
Idea #1 is the most compact, and should generally be the fastest. Check out INSERT/ON DUPLICATE KEY, using a unique key on the word and the date.
Idea #2 becomes important if you're storing other data than just the word, like the id of the forum thread, etc.
Good luck.
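
A rough sketch of Idea #1 with a unique key on the word and the date follows. To keep it runnable on its own it uses Python's sqlite3 module, where `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24 or newer) stands in for MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`. The table `word_counts`, its columns, and the helpers are hypothetical names.

```python
import sqlite3


def create_schema(conn):
    """Create a counter table keyed on (word, date)."""
    conn.execute(
        "CREATE TABLE word_counts ("
        "word TEXT NOT NULL, seen_on TEXT NOT NULL, "
        "occurrences INTEGER NOT NULL, "
        "PRIMARY KEY (word, seen_on))"
    )


def record_title(conn, title, seen_on):
    """Add one occurrence per word in a thread title for the given date."""
    with conn:
        for word in title.lower().split():
            # Insert a new row, or bump the counter if (word, seen_on) exists.
            conn.execute(
                "INSERT INTO word_counts (word, seen_on, occurrences) "
                "VALUES (?, ?, 1) "
                "ON CONFLICT(word, seen_on) "
                "DO UPDATE SET occurrences = occurrences + 1",
                (word, seen_on),
            )


def occurrences(conn, word, seen_on):
    """Return the stored count for a word on a date (0 if never seen)."""
    row = conn.execute(
        "SELECT occurrences FROM word_counts WHERE word = ? AND seen_on = ?",
        (word, seen_on),
    ).fetchone()
    return row[0] if row else 0


conn = sqlite3.connect(":memory:")
create_schema(conn)
record_title(conn, "MySQL database tips", "2013-05-01")
record_title(conn, "Database design", "2013-05-01")
print(occurrences(conn, "database", "2013-05-01"))  # 2
```

Because the date is part of the key, the counts can still be broken down over time, which a single running total per word would not allow.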

Sunspot: How to implement a Search Result Hierarchy?

I'm currently working on implementing Solr through Sunspot in a Rails project.
Looking at the documentation, I don't see how I would implement a hierarchy of search results. By that I mean:
All users that match the query & have profile pictures should be displayed first.
All users that match the query & don't have a profile picture should be displayed underneath.
And so on...
I would appreciate any guidance or references on how to implement such a system.
If you want to display users with profile pictures first and the ones without later:
You can sort with sortMissingLast, which causes all records that have no value in the sort field to appear last.
Otherwise, give the records that have no value a default value so that they appear last when sorted.
I've heard this request many times over the years, and it doesn't work quite like people expect. The worst case behavior is pretty bad and pretty common.
You may not want to do exactly that. As soon as you include a common term, like "Jr", you will have to show thousands of results with pictures before the first profile without a picture, even if that one is the right result.
This will happen more often than you expect, because common names are, well, common, so they show up in queries a lot and match a lot of documents. This may happen for your most common queries. Oops.
Instead, boost results with a quality factor. If there are two "Joe Smith" profiles, the one with the picture is better and should be shown first. You can do this with the "boost" parameter of the edismax query parser. If a profile has a photo, use a boost of 2, otherwise a boost of 1. You may have to play with the exact values to get what you want.
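
As a rough illustration of the boost idea, the request parameters might be assembled like this. The field names `name` and `has_picture` are hypothetical, and the function-query expression `if(has_picture,2,1)` is an assumption about how the picture flag is indexed (as a boolean field); check it against your schema and the Solr function query documentation before relying on it.

```python
from urllib.parse import urlencode


def build_user_search_params(query):
    """Build edismax parameters that favour profiles with a picture."""
    return {
        "defType": "edismax",            # use the edismax query parser
        "q": query,
        "qf": "name",                    # hypothetical field to search in
        # Multiplicative boost: 2 with a picture, 1 without (assumed field).
        "boost": "if(has_picture,2,1)",
    }


params = build_user_search_params("Joe Smith")
print(urlencode(params))
```

Unlike a hard sort on the picture flag, a multiplicative boost only reorders documents with similar relevance scores, so a strong match without a picture can still outrank a weak match that has one.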

What is the best way... considering performance

I have around 500,000 users in my table, and each user is associated with some books (has_many).
I want to display all the users along with their books.
I won't be displaying all the users on the same page; they would be paginated.
What is the best way to do this, keeping performance and database hits in mind? What needs to be considered when dealing with a large number of records?
It would be unreasonable to display all the users and their books on the same page. There are, I believe, two possible approaches to solving this:
You can have an index page for the users where you list all the users. For each user you can have a "show" page where you display that user's books. This would greatly simplify the resulting database queries, as you need to load only the users for the index page and only one user's books on his/her show page. That means no complex joins and not a lot of data each time.
If you really want to show multiple users and their books on the same page, then, like someone mentioned in the comments above, you need to use pagination, say 5 users per page. In addition, you would need to use eager loading, as this could otherwise easily turn into an N + 1 problem. You can read more in "Eager loading of associations".
Going back to the first approach, you could use pagination there as well, for example when listing the users or even the books for a user.
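
What eager loading buys you can be shown outside of Rails with plain SQL: one query for the page of users, then one query for all of their books, instead of one book query per user. The sketch below uses Python's sqlite3 module; the `users` and `books` tables and the function name are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict


def load_users_with_books(conn, limit, offset=0):
    """Load one page of users plus their books using two queries in total."""
    users = conn.execute(
        "SELECT id, name FROM users ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    if not users:
        return []
    ids = [user_id for user_id, _ in users]
    placeholders = ",".join("?" * len(ids))
    books_by_user = defaultdict(list)
    # Second query: every book for the users on this page, in one round trip.
    for user_id, title in conn.execute(
        f"SELECT user_id, title FROM books WHERE user_id IN ({placeholders}) "
        "ORDER BY id",
        ids,
    ):
        books_by_user[user_id].append(title)
    return [(name, books_by_user[user_id]) for user_id, name in users]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany(
    "INSERT INTO books (user_id, title) VALUES (?, ?)",
    [(1, "Dune"), (1, "Emma"), (3, "Ulysses")],
)
print(load_users_with_books(conn, limit=2))
# [('Ann', ['Dune', 'Emma']), ('Bob', [])]
```

The number of queries stays at two no matter how many users are on the page, which is the point of avoiding the N + 1 pattern.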
Queries with large offsets are inefficient in MySQL. When evaluating a query with an offset of 100,000, MySQL has to actually find those 100,000 rows and discard them before it can find the ten rows you end up displaying.
One way around this is to give your application hints: rather than asking for page 10000, ask for the page where id > x, assuming you are sorting in primary key order.
It's also crucial that you have appropriate indexes.
There's a good article called "Efficient Pagination Using MySQL" on percona.com with a variety of approaches for paginating through large sets.
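
The "id > x" hint can be sketched like this, again with Python's sqlite3 module so the example runs by itself. The `users` table and the helper `fetch_page` are assumptions; the same `WHERE id > ? ORDER BY id LIMIT ?` shape applies in MySQL when sorting by the primary key.

```python
import sqlite3


def fetch_page(conn, after_id, page_size):
    """Return the next page of users whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(i, f"user{i}") for i in range(1, 26)],
)

last_seen_id = 0
while True:
    page = fetch_page(conn, last_seen_id, 10)
    if not page:
        break
    print([user_id for user_id, _ in page])
    # Remember the last id shown; the next page starts right after it.
    last_seen_id = page[-1][0]
```

Each page is an index range scan that starts where the previous one ended, so the cost does not grow with the page number. The trade-off is that you can only step to the next page, not jump straight to an arbitrary one.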
Few thoughts -
MySQL can handle that many records. You could read all users and their books in a single query and then display them as you wish. However, showing that many records on a single web page will hurt the response time.
Hence pagination: read a limited number of records per page. This means one SQL query per page, but each query stays cheap. And of course you can use some query caching.
A better option could be to show an alphabetical list of users, not necessarily just A-Z but also AB, AC, AD and so on. That way your visitors can jump directly to a particular list. Consider adding pagination to it if the number of users in a given list is too large.
I'm not sure how important it is for your website to show the latest updates ASAP, but you might also think about generating XML files, as many as you deem necessary (for example one per letter), and generating your web pages from those XML files. You could update the XML files once every 24 hours, so the DB load stays minimal.
And please consider building a search, because navigating through that many users could be discouraging.
Hope it helps!

MySQL Column Unification, any performance improvements?

I'm designing a MySQL table for an authentication system for a high-traffic personal website. Every time a user comment, article, etc is displayed the following fields will be needed:
login
User Display
User Bio ( A little signature )
Website Account
YouTube Account
Twitter Account
Facebook Account
Lastfm Account
So everything is in one table to prevent the need to call sub-tables. So my question is:
Would there be any improvements if I combine the Website, YouTube, Twitter, Facebook and Lastfm columns into one?
For example:
[website::something.com][youtube::youtube.com/something]
No, combining these columns would not result in any improvement. Indeed, it seems you would increase the overall length (by adding prefixes and separators), hence potentially worsening performance.
A few other tricks however, may help:
Reduce the size of the values stored in the "xxxAccount" columns by removing altogether, or replacing with shorthand codes, the most common parts of these values (the examples shown suggest some kind of URL, whose beginning will likely be repeated).
Depending on the average length of the bio and the typical text found in it, it may also be useful to find ways of shrinking its storage size, with simple replacement of common words or possibly with actual compression (ZIP and such). Doing so may mean storing the bio in a BLOB column, which may then be stored separately from the rest of the row, depending on the server implementation/configuration.
And, of course, independently from any improvements at the level of the database, the usage pattern described seems to call for caching this kind of data aggressively, to avoid the trip to SQL altogether.
Well, I don't think so. Think of it this way: you will need some way to split them, and that requires additional processing. And then why not just have one field in the whole table and put everything in that? :) Don't worry about the performance; it will be better with separate columns.

MySQL - Saving items

This is a follow up from my last question: MySQL - Best method to saving and loading items
Anyway, I've looked at some other examples and sources, and most of them use the same method of saving items. First, they delete all the rows already in the database that reference the character, then they insert new rows according to the items that the character currently has.
I just wanted to ask if this is a good way, and if it would cause a performance hit if I were to save 500 items or so per character. If you have a better solution, please tell me!
Thanks in advance, AJ Ravindiran.
It would help if you talked about your game so we could get a better idea of your data requirements.
I'd say it depends. :)
Are the slot/bank updates happening constantly as the person plays, or just when the person saves their game and leaves? Also, does the order of the slots really matter for the bank slots? Constantly deleting and inserting 500 records can certainly cause a performance hit, but there may be a better way to do it; possibly you could just update the 500 records without deleting them.
Possibly your first idea of 0=4151:54;1=995:5000;2=521:1; wasn't SO bad, if the database is only being used for storing that information and the game itself manages it once it's loaded. But if you might want to use it for other things, like "What players have item X" or "What is the total value of items in Player Y's bank", then storing it like that won't allow you to ask the database; it would have to be computed by the game.
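
To make the trade-off concrete, here is a small sketch that turns the encoded string into rows and then asks the database a question the string format cannot answer. It assumes, without confirmation from the question, that each entry means slot=item_id:amount; the table `character_items` and the function names are made up for the example, and sqlite3 stands in for MySQL.

```python
import sqlite3


def parse_items(encoded):
    """Parse 'slot=item_id:amount;' entries (assumed meaning) into tuples."""
    rows = []
    for entry in encoded.split(";"):
        if not entry:
            continue  # skip the empty piece after the trailing ';'
        slot, rest = entry.split("=")
        item_id, amount = rest.split(":")
        rows.append((int(slot), int(item_id), int(amount)))
    return rows


def characters_with_item(conn, item_id):
    """Answer 'what players have item X' directly in SQL."""
    rows = conn.execute(
        "SELECT DISTINCT character_id FROM character_items "
        "WHERE item_id = ? ORDER BY character_id",
        (item_id,),
    )
    return [character_id for (character_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE character_items ("
    "character_id INTEGER, slot INTEGER, item_id INTEGER, amount INTEGER, "
    "PRIMARY KEY (character_id, slot))"
)
saved = {1: "0=4151:54;1=995:5000;2=521:1;", 2: "0=995:10;"}
for character_id, encoded in saved.items():
    conn.executemany(
        "INSERT INTO character_items VALUES (?, ?, ?, ?)",
        [(character_id, *row) for row in parse_items(encoded)],
    )

print(parse_items("0=4151:54;1=995:5000;2=521:1;"))
# [(0, 4151, 54), (1, 995, 5000), (2, 521, 1)]
print(characters_with_item(conn, 995))  # [1, 2]
```

With one row per slot, a save can also update only the slots that changed instead of deleting and reinserting all 500.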