Implementing MySQL search results pagination

This ought to be a fairly common problem, but I could not find a solution among the existing Stack Overflow questions. I am not sure I searched for the right terms.
I run a MySQL, CGI/Perl site with maybe 1,000 hits a day. A user can search the website's database, and the WHERE clause can become quite lengthy. I display 10 items per page and give the user links to the next and previous pages. Currently, I run a new search every time the user clicks the 'prev' or 'next' page link. I use
LIMIT num-rows-to-fetch OFFSET num-rows-to-skip
along with the query statement. However, the response time for subsequent searches is far too long, and this can only get worse as I add more users. I am trying to see if I can implement this in a better way.
If you can give me some pointers, I would appreciate it very much.

If you don't mind using JavaScript, you should check out DataTables. With it you send all rows to the client, and pagination is done on the client side.
If that is not an option, you could try using mod_perl or CGI::Session to save the query result between page requests, so you will not need to query MySQL again and again.

You could try analyzing the query to find out which part causes the most trouble for the database. If there are joins, you could create indexes that include both the key fields and the filter fields used in the query. Depending on the database layout and size, though, that could create a lot of indexes, and if the query can change significantly you may still be at a loss when no index covers a particular case. In any case, I would first analyze the query to find out what makes it so slow. If there is one major cause of the bad performance, you could try throwing indexes at it.
Another way would be to cache the search results in memory and paginate from there, avoiding the database round trip altogether. Depending on your database size, you might want to limit the search results to a reasonable number.

I have never used this module, so I cannot vouch for its performance, but have a look at DBI::ResultPager.

Related

Is filtered selection faster than fetching all the rows and then filtering

So I want to create a table in the frontend where I will list every single user. The thing is that the tables are relational and I have to get data from multiple tables in order to fulfill my goal.
Now here comes my question (keep in mind I have a MySQL database) :
Which method is better in the long run:
1. Generate joined queries that fetch all the data from each table where a user has any information (this outputs ~80 columns per row, of which only 15 are needed).
2. Fetch the data that I need with multiple queries and then just "stick" the values together and output them (15 columns, all of them needed, but I have to do extra work).
I would suggest going for a third option.
Generate joined queries that fetch only the 15 columns your front end needs. That would be the most efficient way.
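As a rough illustration of that third option, here is a minimal sketch in Python. It uses sqlite3 purely as a stand-in for MySQL so that it runs anywhere, and the tables (users, profiles), their columns, and the function name list_users are all made up for the example. The point is only that the SELECT list names the columns the page needs instead of pulling every column from every joined table.

```python
import sqlite3

# sqlite3 stands in for MySQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, notes TEXT);
    CREATE TABLE profiles (user_id INTEGER, city TEXT, bio TEXT);
    INSERT INTO users VALUES (1, 'ann', 'ann@example.com', 'long text'),
                             (2, 'bob', 'bob@example.com', 'long text');
    INSERT INTO profiles VALUES (1, 'Oslo', 'long bio'), (2, 'Rome', 'long bio');
""")

def list_users(conn):
    """One joined query that names only the columns the table view needs."""
    sql = """
        SELECT u.id, u.name, p.city
        FROM users AS u
        LEFT JOIN profiles AS p ON p.user_id = u.id
        ORDER BY u.id
    """
    return conn.execute(sql).fetchall()

rows = list_users(conn)
```

The unused columns (email, notes, bio) never leave the database, and the work is still done in a single round trip.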
If you are facing challenges with joining the tables then you can share table structures with sample data and desired output here with your query. We can try to help you achieve your goal.
This is a bit long for a comment.
I don't understand your first option. Why would you be selecting columns that you don't need? If there are 15 columns that you specifically want, then select those columns and nothing else.
In general, it is faster to have the database do most of the work. It can take advantage of its optimizer to produce the best execution plan that it can.
From experience with a MySQL server on embedded hardware:
If the hardware can handle it and has enough resources, let the database server run its course, since it can use its optimizer.
But if the server hardware lags in some respects, transport all the data to the client and let JavaScript process everything that is returned.
The same goes for the bandwidth of the internet connection: if it is slow, you want to transfer fewer rows, because the user will notice the delay. Even old smartphones have plenty of CPU power and can easily handle whatever you throw at them.
Basically, there is no simple answer. You have to check the server hardware and the bandwidth typically available, and then program the solution that works best.
A simple Rule of Thumb:
Fewer round-trips to the database server is usually the faster alternative.

What is the Optimized way to Paginate Active Record Objects with Filter?

I want to display the users list with pagination in my Rails API. However, I have a constraint: before displaying the users, I want to check which users have access to the view files. Here is the code:
def verified_client
  conditions = {}
  conditions[:user_name] = fetch_verified_users_with_api_call # returns [user_1, user_2, ....]
  @users = User.where(conditions).where('access NOT LIKE ?', 'admin_%').ordered
  will_paginate(@users, params[:page])
end
Q1) Is there a way to avoid making an SQL call when users fetch subsequent pages (page 2, page 3 .. page n)?
Q2) What would happen if the verified_users list returns a million items? I suspect the SQL will fail.
I could have used LIMIT and OFFSET with the query, but then I would not know the total number of results or pages. To get those I would have to fire one more SQL call for the count and write my own logic to compute the number of pages.
Generated SQL:
select *
from users
where user_name IN (user_1, user_2 .... user_10000)
AND (access NOT LIKE 'admin_%')
That query is hard to optimize. It probably does essentially all the work for each page and there is no good way to prevent this scan. Adding these may help:
INDEX(access)
INDEX(user, access)
I have seen 70K items in an IN list, but I have not heard of 1M. What is going on? Would it be shorter to say which users are not included? Could there be another table with the user list? (Sometimes a JOIN works better than IN, especially if you have already run a Select to get the list.)
Could the admins be filtered out of the IN list before building this query? Then,
INDEX(user)
is likely to be quite beneficial.
Is there at most one row per user? If so, then pagination can be revised to be very efficient. This is done by "remembering where you left off" instead of using OFFSET. More: http://mysql.rjweb.org/doc.php/pagination
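To make "remembering where you left off" concrete, here is a small sketch in Python. It uses sqlite3 as a stand-in for MySQL, and the table layout, the id column as the paging key, and the function name next_page are assumptions for illustration only. Instead of OFFSET, each request passes the last id it saw, so the database can seek straight to that point in the primary key index.

```python
import sqlite3

# sqlite3 stands in for MySQL; the users table here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.executemany(
    "INSERT INTO users (id, user_name) VALUES (?, ?)",
    [(i, f"user_{i}") for i in range(1, 26)],
)

def next_page(conn, last_seen_id, page_size=10):
    """Fetch the page that follows last_seen_id (pass 0 for the first page)."""
    rows = conn.execute(
        "SELECT id, user_name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()
    # Remember where this page ended so the next request can continue from there.
    new_last = rows[-1][0] if rows else last_seen_id
    return rows, new_last

page1, last_id = next_page(conn, 0)
page2, last_id = next_page(conn, last_id)
page3, last_id = next_page(conn, last_id)
```

This only works cleanly when the ordering column is unique (hence the question about one row per user), and it gives "next"/"previous" navigation rather than jumping to an arbitrary page number.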
Q1) Is there a way to avoid making an SQL call when users fetch subsequent pages (page 2, page 3 .. page n)?
The whole idea of pagination is that you make the query faster by returning a small subset of the total number of records. In most cases requests for the first page will vastly outnumber requests for the other pages, so this could very well be a case of premature optimization that might do more harm than good.
If it actually is a problem, it is better addressed with SQL caching, ETags or other caching mechanisms, not by loading a bunch of pages at once.
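For reference, the plain one-page-per-request approach with an extra count query could look roughly like this. It is a sketch in Python with sqlite3 standing in for MySQL, and the items table and the fetch_page helper are invented for the example. In Rails, a pagination gem would normally issue the equivalent pair of queries for you.

```python
import math
import sqlite3

# sqlite3 stands in for MySQL; the items table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [(f"item {i}",) for i in range(1, 36)],
)

def fetch_page(conn, page, per_page=10):
    """Return the rows of one page (1-based) plus the total number of pages."""
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    total_pages = math.ceil(total / per_page)
    offset = (page - 1) * per_page
    rows = conn.execute(
        "SELECT id, title FROM items ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    ).fetchall()
    return rows, total_pages

rows_p2, pages = fetch_page(conn, 2)
rows_p4, _ = fetch_page(conn, 4)
```

Each request costs two small queries; whether that is too slow is something to measure before adding a cache.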
Q2) What would happen if the verified_users list returns a million items? I suspect the SQL will fail.
Your database or application will very likely slow to a halt and then crash when it runs out of memory. Exactly what happens depends on your architecture and how grumpy your boss is on that given day.
Q1) Is there a way to avoid making an SQL call when users fetch subsequent pages (page 2, page 3 .. page n)?
You can get the whole result set and store it in your app. As far as the database is concerned this is not slow or non-optimal. Then performance including memory is your app's problem.
Q2) What would happen if the verified_users list returns a million items? I suspect the SQL will fail.
What will happen is that all those entries will be concatenated into the SQL string. There is likely a maximum SQL string size (in MySQL the statement has to fit within max_allowed_packet), and a million entries would be too much.
A possible solution, if you have a way to identify the verified users in the database, is to join with that table.
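A sketch of the join idea, in Python with sqlite3 standing in for MySQL. The verified_users table and the helpers store_verified and verified_non_admins are hypothetical; the assumption is that the list returned by the API call can be written into a table once, so the paginated query never has to carry the names in an IN list.

```python
import sqlite3

# sqlite3 stands in for MySQL; both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT, access TEXT);
    CREATE TABLE verified_users (user_name TEXT PRIMARY KEY);
    INSERT INTO users (user_name, access) VALUES
        ('user_1', 'reader'), ('user_2', 'admin_root'),
        ('user_3', 'editor'), ('user_4', 'reader');
""")

def store_verified(conn, names):
    """Replace the contents of verified_users with the latest list of names."""
    conn.execute("DELETE FROM verified_users")
    conn.executemany(
        "INSERT INTO verified_users (user_name) VALUES (?)",
        [(n,) for n in names],
    )

def verified_non_admins(conn, limit, offset):
    """Page through verified, non-admin users via a JOIN instead of a huge IN list."""
    sql = """
        SELECT u.user_name, u.access
        FROM users AS u
        JOIN verified_users AS v ON v.user_name = u.user_name
        WHERE u.access NOT LIKE 'admin_%'
        ORDER BY u.user_name
        LIMIT ? OFFSET ?
    """
    return conn.execute(sql, (limit, offset)).fetchall()

store_verified(conn, ["user_1", "user_2", "user_3"])
result = verified_non_admins(conn, 10, 0)
```

The SQL text stays the same size no matter how many verified users there are; only the table grows.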
What is the Optimized way to Paginate Active Record Objects with Filter?
The three things which are not premature optimizations with databases are (1) use indexed queries, not table scans, (2) avoid correlated sub-queries, and (3) reduce network turns.
Make sure you have an index it can use, in particular for the order. So make sure you know what order you are asking for.
If, instead of the access field starting with a prefix, you had a dedicated field to indicate an admin user, you could create an index with that admin field first and the field you are ordering by second. This allows the database to return the records in sorted order efficiently, which is especially important when paging with OFFSET and LIMIT.
As for network turns you might want to use paging and not worry about network turns. One idea is to prefetch the next page if possible. So after it gets the results of page 1, query for page 2. Hold the page 2 results until viewed, but when viewed then get the results for page 3.
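The prefetch idea could be sketched like this in Python. The class name PrefetchingPager and the fetch_page callable are invented for the example, and the prefetch here runs synchronously for simplicity; in a real application it would more likely happen in the background after the current page has been sent.

```python
class PrefetchingPager:
    """Serve a page, then fetch the following page ahead of time and hold it."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page  # callable: page number -> list of rows
        self._held = {}                # page number -> rows fetched ahead of time

    def get(self, page):
        rows = self._held.pop(page, None)
        if rows is None:
            # Not prefetched (first request or a jump): fetch it now.
            rows = self._fetch_page(page)
        # Hold only the next page so memory use stays bounded.
        self._held = {page + 1: self._fetch_page(page + 1)}
        return rows

# Hypothetical data source that records which pages were actually queried.
calls = []

def fake_fetch(page):
    calls.append(page)
    return [f"row {page}-{i}" for i in range(3)]

pager = PrefetchingPager(fake_fetch)
first = pager.get(1)   # queries page 1, then prefetches page 2
second = pager.get(2)  # served from the held copy, then prefetches page 3
```

The total number of queries is not reduced; the gain is only that the user does not wait for the query when clicking "next".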

How does pagination of results in databases work?

This is a general question that applies to MySQL, Oracle DB or whatever else might be out there.
I know for MySQL there is LIMIT offset,size; and for Oracle there is 'ROW_NUMBER' or something like that.
But when such 'paginated' queries are called back to back, does the database engine actually do the entire 'select' all over again and then retrieve a different subset of results each time? Or does it do the overall fetching of results only once, keeps the results in memory or something, and then serves subsets of results from it for subsequent queries based on offset and size?
If it does the full fetch every time, then it seems quite inefficient.
If it does full fetch only once, it must be 'storing' the query somewhere somehow, so that the next time that query comes in, it knows that it has already fetched all the data and just needs to extract next page from it.
In that case, how will the database engine handle multiple threads? Two threads executing the same query?
I am very confused :(
I disagree with @Bill Karwin. First of all, do not assume in advance that something will be quick or slow without taking measurements, and do not complicate the code up front by downloading 12 pages at once and caching them because "it seems to me that it will be faster".
YAGNI principle - the programmer should not add functionality until deemed necessary.
Do it in the simplest way (ordinary pagination of one page), measure how it works on production, if it is slow, then try a different method, if the speed is satisfactory, leave it as it is.
From my own practice - an application that retrieves data from a table containing about 80,000 records, the main table is joined with 4-5 additional lookup tables, the whole query is paginated, about 25-30 records per page, about 2500-3000 pages in total. Database is Oracle 12c, there are indexes on a few columns, queries are generated by Hibernate.
Measurements on the production system, taken on the server side, show that the median (50th percentile) time to retrieve one page is about 300 ms. The 95th percentile is under 800 ms, meaning that 95% of requests for a single page take less than 800 ms. When we add the transfer time from the server to the user and a rendering time of about 0.5-1 seconds, the total time is less than 2 seconds. That's enough; users are happy.
And some theory: see this answer for the purpose of the pagination pattern.
Yes, the query is executed over again when you run it with a different OFFSET.
Yes, this is inefficient. Don't do that if you have a need to paginate through a large result set.
I'd suggest doing the query once, with a large LIMIT, enough for 10 or 12 pages. Then save the result in a cache. When the user wants to advance through several pages, your application can fetch the 10-12 pages you saved in the cache and display the page the user wants to see. That is usually much faster than running the SQL query for each page.
This works well if, like most users, your user reads only a few pages and then changes their query.
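A minimal sketch of that approach in Python. A plain dict stands in for an external in-memory cache, sqlite3 stands in for MySQL, and the items table, the constants, and get_page are all assumptions made for the example. One query fetches a block of 10 pages; later requests for pages in the same block are served from the cache.

```python
import sqlite3

PAGE_SIZE = 10
PAGES_PER_BLOCK = 10  # one SQL query covers this many pages

# sqlite3 stands in for MySQL; the items table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [(f"item {i}",) for i in range(1, 251)],
)

cache = {}              # stand-in for an in-memory key/value store
stats = {"queries": 0}  # counts how often the database is actually hit

def get_page(conn, search_term, page):
    """Return one page (1-based), querying only once per block of pages."""
    block = (page - 1) // PAGES_PER_BLOCK
    key = (search_term, block)
    if key not in cache:
        stats["queries"] += 1
        block_rows = PAGE_SIZE * PAGES_PER_BLOCK
        cache[key] = conn.execute(
            "SELECT id, title FROM items WHERE title LIKE ? "
            "ORDER BY id LIMIT ? OFFSET ?",
            (f"%{search_term}%", block_rows, block * block_rows),
        ).fetchall()
    start = ((page - 1) % PAGES_PER_BLOCK) * PAGE_SIZE
    return cache[key][start:start + PAGE_SIZE]

p1 = get_page(conn, "item", 1)
p2 = get_page(conn, "item", 2)
p3 = get_page(conn, "item", 3)
queries_after_three_pages = stats["queries"]
p11 = get_page(conn, "item", 11)
```

A real cache would also need an expiry time and a key that captures the whole query (filters and sort order), otherwise users may see stale or mismatched results.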
Re your comment:
By cache I mean something like Memcached or Redis. A high-speed, in-memory key/value store.
MySQL views don't store anything, they're more like a macro that runs a predefined query for you.
Oracle supports materialized views, so that might work better, but querying the view would have the overhead of interpreting an SQL query.
A simpler in-memory cache should be much faster.

possible alternative to LIKE clause in specific situation

I have a web service which returns results to a jquery auto-complete.
For the query I must use four full-blown LIKE clauses
LIKE('%SOME_TERM%')
The reason is that users need to be able to get results from substrings as well as whole words. I have also tried full-text indexes with their many options, in both natural-language and Boolean mode, but they just do not work as well, and their results leave a lot to be desired in this case.
The database is highly optimized with indexes, and even with the LIKE clauses, returning results from a query with multiple joins, where one of the tables has 200,000 rows, takes about 0.2-0.3 seconds on the first run. Once it's cached by the server, the time taken is obviously minuscule.
I was wondering if there is anything else that would be worth trying here. I had looked at some standalone search providers but I'm a little tight time-wise on this project(nearly done and ready to launch) so can't afford any large setups or large-scale refactoring time and funding wise.
It's possible that this is as good as it gets, but my attitude is that there is no harm in letting SO have its say.
I think Apache Solr is the way to go for full-text searches.
http://lucene.apache.org/solr/
But like you said, if you don't have much time left and you are sure your queries are performing at their best, I don't see much more you can do.

How do I use EXPLAIN to *predict* performance of a MySQL query?

I'm helping maintain a program that's essentially a friendly read-only front-end for a big and complicated MySQL database -- the program builds ad-hoc SELECT queries from users' input, sends the queries to the DB, gets the results, post-processes them, and displays them nicely back to the user.
I'd like to add some form of reasonable/heuristic prediction for the constructed query's expected performance -- sometimes users inadvertently make queries that are inevitably going to take a very long time (because they'll return huge result sets, or because they're "going against the grain" of the way the DB is indexed) and I'd like to be able to display to the user some "somewhat reliable" information/guess about how long the query is going to take. It doesn't have to be perfect, as long as it doesn't get so badly and frequently out of whack with reality as to cause a "cry wolf" effect where users learn to disregard it;-) Based on this info, a user might decide to go get a coffee (if the estimate is 5-10 minutes), go for lunch (if it's 30-60 minutes), kill the query and try something else instead (maybe tighter limits on the info they're requesting), etc, etc.
I'm not very familiar with MySQL's EXPLAIN statement -- I see a lot of information around on how to use it to optimize a query or a DB's schema, indexing, etc, but not much on how to use it for my more limited purpose -- simply make a prediction, taking the DB as a given (of course if the predictions are reliable enough I may eventually switch to using them also to choose between alternate forms a query could take, but, that's for the future: for now, I'd be plenty happy just to show the performance guesstimates to the users for the above-mentioned purposes).
Any pointers...?
EXPLAIN won't give you any indication of how long a query will take.
At best you could use it to guess which of two queries might be faster, but unless one of them is obviously badly written then even that is going to be very hard.
You should also be aware that if you're using sub-queries, even running EXPLAIN can be slow (almost as slow as the query itself in some cases).
As far as I'm aware, MySQL doesn't provide any way to estimate the time a query will take to run. Could you log the time each query takes to run, then build an estimate based on the history of past similar queries?
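A rough sketch of the "log and estimate from history" idea, in Python. How queries are grouped into a "shape" (for example, the set of tables and filters involved) is left entirely open here; the shape strings and the helper names record, estimate and timed are hypothetical.

```python
import statistics
import time
from collections import defaultdict

# Query "shape" -> list of observed durations in seconds.
history = defaultdict(list)

def record(shape, seconds):
    """Store one observed run time for queries of this shape."""
    history[shape].append(seconds)

def estimate(shape):
    """Median of past run times for this shape, or None if there is no history."""
    samples = history.get(shape)
    return statistics.median(samples) if samples else None

def timed(shape, run_query):
    """Run the query callable and log how long it took."""
    start = time.perf_counter()
    result = run_query()
    record(shape, time.perf_counter() - start)
    return result

# Made-up timings, only to show the calls.
record("orders-by-date-range", 2.0)
record("orders-by-date-range", 4.0)
record("orders-by-date-range", 30.0)
guess = estimate("orders-by-date-range")
answer = timed("trivial", lambda: 42)
```

The median is used so that one unusually slow run does not dominate the estimate; with no history for a shape the honest answer to the user is "unknown".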
I think if you want to have a chance of building something reasonably reliable out of this, what you should do is build a statistical model out of table sizes and broken-down EXPLAIN result components correlated with query processing times. Trying to build a query execution time predictor based on thinking about the contents of an EXPLAIN is just going to spend way too long giving embarrassingly poor results before it gets refined to vague usefulness.
MySQL EXPLAIN has a column called key. If there is something in this column, that is a very good indication: it means that the query will use an index.
Queries that use indexes are generally safe to run, since they were likely thought out by the database designer when (s)he designed the database.
However
There is another field called Extra. This field sometimes contains the text Using filesort.
This is usually bad. It means MySQL cannot read the rows in the requested order from an index and must sort them in an extra pass, which for a large result set can spill to temporary files on disk.
Conclusion
Instead of trying to predict the time a query takes, simply look at these two indicators. If a query shows Using filesort, deny the user. And depending on how strict you want to be, if the query is not using any keys, you should also deny it.
Read more about the resultset of the MySQL EXPLAIN statement
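The two checks from the conclusion could be sketched as below, in Python. The function assess_plan is hypothetical, and it assumes the EXPLAIN output has already been fetched as a list of dicts keyed by column name (table, key, Extra). The sample rows are made up for illustration and are not real EXPLAIN output.

```python
def assess_plan(explain_rows):
    """Return warnings for EXPLAIN rows that use no index or need a filesort."""
    warnings = []
    for row in explain_rows:
        table = row.get("table")
        if row.get("key") is None:
            warnings.append(f"{table}: no index used")
        if "Using filesort" in (row.get("Extra") or ""):
            warnings.append(f"{table}: Using filesort")
    return warnings

# Made-up rows shaped like EXPLAIN output.
sample_plan = [
    {"table": "orders", "key": "idx_customer", "Extra": "Using where"},
    {"table": "items", "key": None, "Extra": "Using where; Using filesort"},
]
problems = assess_plan(sample_plan)
```

Whether a warning should block the query or only be shown to the user is a policy decision; an empty list here says nothing about how long the query will actually take.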