Efficient DB2 query pagination and show total pages? - mysql

This post shows some hacks to page data from DB2:
How to query range of data in DB2 with highest performance?
However it does not provide a way to show the total number of rows (like MySQL's CALC_FOUND_ROWS).
SELECT SQL_CALC_FOUND_ROWS thread_id AS id, name, email
FROM threads WHERE email IS NOT NULL
LIMIT 20 OFFSET 200
And in MySQL I can follow that up with
SELECT FOUND_ROWS()
to get the total number of rows. The first part is fairly easy to duplicate with recent versions of DB2. I can't find any results on Google for a reasonable equivalent to the second query (I don't want temp tables, subqueries, or other absurdly inefficient solutions).

I don't think this exists in DB2.
Note that the total number of rows is a value that needs extra calculation to obtain. It isn't just lying around somewhere--it would have to be specifically built into the LIMIT logic. Which it doesn't look like they did.

Related

What is the fastest way to count the number of MySql rows left after a limited results query

If I have a mysql limited query:
SELECT * FROM my_table WHERE date > '2020-12-12' LIMIT 1,16;
Is there a faster way to check and see how many results are left after my limit?
I was trying to do a count with limit, but that wasn't working, i.e.
SELECT count(ID) AS count FROM my_table WHERE date > '2020-12-12' LIMIT 16,32;
The ultimate goal here is just to determine if there ARE any other rows to be had beyond the current result set, so if there is another faster way to do this that would be fine too.
It's best to do this by counting the rows:
SELECT count(*) AS count FROM my_table WHERE date > '2020-12-12'
That tells you how many total rows match the condition. Then you can compare that to the size of the result you got with your query using LIMIT. It's just arithmetic.
Past versions of MySQL had a function FOUND_ROWS() which would report how many rows would have matched if you didn't use LIMIT. But it turns out this had worse performance than running two queries, one to count rows and one to do your limit. So they deprecated this feature.
For details read:
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
https://dev.mysql.com/worklog/task/?id=12615
(You probably want OFFSET 0, not 1.)
It's simple to test whether there ARE more rows. Assuming you want 16 rows, use 1 more:
SELECT ... WHERE ... ORDER BY ... LIMIT 0,17
Then programmatically see whether it returned only 16 rows (no more available) or 17 (there ARE more).
Because it is piggybacking on the fetch you are already doing and not doing much extra work, it is very efficient.
The second 'page' would use LIMIT 16, 17; 3rd: LIMIT 32,17, etc. Each time, you are potentially getting and tossing an extra row.
I discuss this and other tricks where I point out the evils of OFFSET: Pagination
COUNT(x) checks x for being NOT NULL. This is [usually] unnecessary. The pattern COUNT(*) (or COUNT(1)) simply counts rows; the * or 1 has no significance.
SELECT COUNT(*) FROM t is not free. It will actually do a full index scan, which is slow for a large table. WHERE and ORDER BY are likely to add to that slowness. LIMIT is useless since the result is always 1 row. (That is, the LIMIT is applied to the result, not to the counting.)

MySQL / PHP query performance?

Im in a dilema on which one of these methods are most efficient.
Suppose you have a query joining multiple tables and querying thousand of records. Than, you gotta get the total to paginate throughout all these results.
Is it faster to?
1) Do a complete select (suppose you have to select 50's columns), count the rows and than run another query with limits? (Will the MySQL cache help this case already selecting all the columns you need on the first query used to count?)
2) First do the query using COUNT function and than do the query to select the results you need.
3) Instead of using MySQL COUNT function, do the query selecting the ID's for example and use the PHP function mysql_num_rows?
I think the number 2 is the best option, using MySQL built in COUNT function, but I know MySQL uses cache, so, selecting all the results on first query gonna be faster?
Thanks,
Have a look at Found_Rows()
A SELECT statement may include a LIMIT clause to restrict the number
of rows the server returns to the client. In some cases, it is
desirable to know how many rows the statement would have returned
without the LIMIT, but without running the statement again. To obtain
this row count, include a SQL_CALC_FOUND_ROWS option in the SELECT
statement, and then invoke FOUND_ROWS() afterward:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
The second SELECT returns a number indicating how many rows the first SELECT`
would have returned had it been written without the LIMIT clause.
My guess is number 2, but the truth is that it will depend entirely on data size, tables, indexing, MySql version etc.
The only way of finding the answer to this is to try each one and measure how long they take. But like I say, my hunch would be number 2.

Questions on how to randomly Query multiple rows from Mysql without using "ORDER BY RAND()"

I need to query the MYSQL with some condition, and get five random different rows from the result.
Say, I have a table named 'user', and a field named 'cash'. I can compose a SQL like:
SELECT * FROM user where cash < 1000 order by RAND() LIMIT 5.
The result is good, totally random, unsorted and different from each other, exact what I want.
But I got from google that the efficiency is bad when the table get large because MySQL creates a temporary table with all the result rows and assigns each one of them a random sorting index. The results are then sorted and returned.
Then I go on searching and got a solution like:
SELECT * FROM `user` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `user`)- (SELECT MIN(id) FROM `user`))+(SELECT MIN(id) FROM `user`)) AS id) AS t2 WHERE t1.id >= t2.id AND cash < 1000 ORDER BY t1.id LIMIT 5;
This method uses JOIN and MAX(id). And the efficiency is better than the first one according to my testing. However, there is a problem. Since I also needs a condition "cash<1000" and if the the RAND() is so big that no row behind it has the cash<1000, then no result will return.
Anyone has good idea of how to compose the SQL that has have the same effect as the first one but has better efficiency?
Or, shall I just do simple query in MYSQL and let PHP randomly pick 5 different rows from the query result?
Your help is appreciated.
To make first query faster, just SELECT id - that will make the temporary table rather small (it will contain only IDs and not all fields of each row) and maybe it will fit in memory (temp table with text/blob are always created on-disk for example). Then when you get a result, run another query SELECT * FROM xy WHERE id IN (a,b,c,d,...). As you mentioned this approach is not very efficient, but as a quick fix this modification will make it several times faster.
One of the best approaches seems to be getting the total number of rows, choosing random numbers and for each result run a new query SELECT * FROM xy WHERE abc LIMIT $random,1. It should be quite efficient for random 3-5, but not good if you want 100 random rows each time :)
Also consider caching your results. Often you don't need different random rows to be displayed on each page load. Generate your random rows only once per minute. If you will generate the data for example via cron, you can live also with query which takes several seconds, as users will see the old data while new data are being generated.
Here are some of my bookmarks for this problem for reference:
http://jan.kneschke.de/projects/mysql/order-by-rand/
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/

Best way to count rows from mysql database

After facing a slow loading time issue with a mysql query, I'm now looking the best way to count rows numbers. I have stupidly used mysql_num_rows() function to do this and now realized its a worst way to do this.
I was actually making a Pagination to make pages in PHP.
I have found several ways to count rows number. But I'm looking the faster way to count it.
The table type is MyISAM
So the question is now
Which is the best and faster to count -
1. `SELECT count(*) FROM 'table_name'`
2. `SELECT TABLE_ROWS
FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'database_name'
AND table_name LIKE 'table_name'`
3. `SHOW TABLE STATUS LIKE 'table_name'`
4. `SELECT FOUND_ROWS()`
If there are others better way to do this, please let me know them as well.
If possible please describe along with the answer- why it is best and faster. So I could understand and can use the method based on my requirement.
Thanks.
Quoting the MySQL Reference Manual on COUNT
COUNT(*) is optimized to return very quickly if the SELECT retrieves
from one table, no other columns are retrieved, and there is no WHERE
clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization applies only to
MyISAM tables only, because an exact row count is stored for this
storage engine and can be accessed very quickly. For transactional
storage engines such as InnoDB, storing an exact row count is more
problematic because multiple transactions may be occurring, each of
which may affect the count.
Also read this question
MySQL - Complexity of: SELECT COUNT(*) FROM MyTable;
I would start by using SELECT count(*) FROM 'table_name' because it is the most portable, easiset to understand, and because it is likely that the DBMS developers optimise common idiomatic queries of this sort.
Only if that wasn't fast enough would I benchmark the approaches you list to find if any were significantly faster.
It's slightly faster to count a constant:
select count('x') from table;
When the parser hits count(*) it has to go figure out what all the columns of the table are that are represented by the * and get ready to accept them inside the count().
Using a constant bypasses this (albeit slight) column checking overhead.
As an aside, although not faster, one cute option is:
select sum(1) from table;
I've looked around quite a bit for this recently. it seems that there are a few here that I'd never seen before.
Special needs: This database is about 6 million records and is getting crushed by multi-insert queries all the time. Getting a true count is difficult to say the least.
SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'admin_worldDomination' AND table_name LIKE 'master'
Showing rows 0 - 0 ( 1 total, Query took 0.0189 sec)
This is decent, Very fast but inaccurate. Showed results from 4 million to almost 8 million rows
SELECT count( * ) AS counter FROM `master`
No time displayed, took 8 seconds real time. Will get much worse as the table grows. This has been killing my site previous to today.
SHOW TABLE STATUS LIKE 'master'
Seems to be as fast as the first, no time displayed though. Offers lots of other table information, not much of it is worth anything though (avg record length maybe).
SELECT FOUND_ROWS() FROM 'master'
Showing rows 0 - 29 ( 4,824,232 total, Query took 0.0004 sec)
This is good, but an average. Closer spread than others (4-5 million) so I'll probably end up taking a sample from a few of these queries and averaging.
EDIT: This was really slow when doing a query in php, ended up going with the first. Query runs 30 times quickly and I take an average, under 1 second ... it' still ranges between 5.3 & 5.5 million
One idea I had, to throw this out there, is to try to find a way to estimate the row count. Since it's just to give your user an idea of the number of pages, maybe you don't need to be exact and could even say Page 1 of ~4837 or Page 1 of about 4800 or something.
I couldn't quickly find an estimate count function, but you could try getting the table size and dividing by a determined/constant avg row size. I don't know if or why getting the table size from TABLE STATUS would be faster than getting the rows from TABLE STATUS.

Which is a less expensive query count(id) or order by id

I'd like to know which of the followings would execute faster in MySQL database. The table would have 200 - 1000 entries.
SELECT id
from TABLE
order by id desc
limit 1
or
SELECT count(id)
from TABLE
The story is the Table is cached. So this query is to be executed every time before cache retrieval to determine whether the cache data is invalid by comparing the previous value.
So if there exists a even less expensive query, please kindly let me know. Thanks.
If you
start from 1
never have any gaps
use the InnoDB engine
id is not nullable
Then the 2nd could run [ever so marginally] faster due to not having to visit table data at all (count is stored in metadata).
Otherwise,
if the table has NO index on ID (causing a SCAN), the 2nd one is faster
Barring both the above
the first one is faster
And if you actually meant to ask SELECT .. LIMIT 1 vs SELECT MAX(id).. then the answer is actually that they are the same for MySQL and most sane DBMS, whether or not there is an index.
I think, the first query will run faster, as the query is limited to be executed for one row only, 200-1000 may not matter that much in this case.
As already pointed out in the comments, your table is so small it really doesn't what your solution will be. For this reason the select count(id) should be used as it expresses the intent and doesn't need any further processing.
Now select count(id) comes with an alternative select count(*). These two are not synonyms. select count(*) will count the number of rows and use a cached value if possible when select count(id) counts the number of non null values of the column id exists. If the id columns is set as not null then the cached row count may be used.
The selection between count(*) and count(id) depends once again on your intent. In the general case, count(*) describes the intent better.
The there is the possibility of count(1) which is actually a synonym of count(*) when using mysql but the interpretation may vary if end up using a different RDBMS.
The performance of each type of count also varies depending on whether you are using MyISAM or InnoDB. The row counts are cached on the former but not on the latter, if I've understood correctly.
In the end, you should rely on query plans and running tests and measuring their performance rather than these general ramblings.