Mysql upper limit for count(*) - mysql

I've got a query:
select count(*) from `table` where `something`>123
If the table has a few million records, the query runs really slowly even though there's an index on the column something. However, I'm actually only interested in the value of:
min(100000, count(*))
So is there any way to prevent MySQL from counting rows when it already found 100k? I've found something like:
select count(*) from (select 1 from `table` where `something`>123 limit 100000) as `asd`
It's much faster than count(*) if the table has a few million matching entries, but count(*) runs much faster when there are fewer than 100000 matches.
Is there any way to do it faster?
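For anyone who wants to experiment with the capped-count trick, here is a runnable sketch using SQLite through Python's sqlite3 module (the table and numbers are invented; MySQL additionally requires an alias on the derived table, as in the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (something INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250000)])

# Plain COUNT(*) has to visit every matching row.
full = conn.execute(
    "SELECT COUNT(*) FROM t WHERE something > 123"
).fetchone()[0]

# Capped count: the inner LIMIT stops the scan once 100000 matches
# are found, so the outer COUNT(*) never exceeds the cap.
capped = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 FROM t WHERE something > 123 LIMIT 100000)"
).fetchone()[0]

print(full, capped)  # 249876 100000
```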

I don't have the points to comment, so I am posting this as an answer...
Have you tried using EXPLAIN to see if your index on something is actually being used? It sounds like this query is doing a Table Scan. Ideally, you will want to see something like "Extra: Using where; Using index".
Out of curiosity, is something a nullable field?
As an aside, perhaps the query optimizer would do better with the following:
select count(*) as cnt
from table
where something > 123
having count(*) > 100000

It might help to exploit the value range: if most rows match, counting the complement can be cheaper.
select count(*) - (select count(*) from t where something <= 123) as cnt
from t
The other option would be to maintain the count with insert/delete triggers.

Related

Is there a way to do a distinct count with a condition on the grouped data in mysql faster

I have a SQL query (MySQL) that I would like to make faster. The general problem is to count distinct keys that satisfy an aggregated condition. That is, I'd like to sum the values of a column over the rows sharing a key value, and then decide from the sums whether that key should be included in the count. The only solution I have come up with is a sub-query that does the summing, with a distinct count in the outer query and the condition applied via HAVING. Like:
SELECT COUNT(DISTINCT key), sum1, sum2, categoryid
FROM
(
SELECT SUM(cnt1) AS sum1,
SUM(cnt2) AS sum2,
key,categoryid
FROM table
GROUP BY key,categoryid
) as SUBQUERY
GROUP BY categoryid
HAVING (8*sum1)/sum2 > 0;
The problem (as I see it) is that the query uses a sub-query that will produce a temp table. As the data set is large (10M rows, 500K distinct keys) it takes a lot of time. It looks like it should be possible to do better: a straight distinct count without the condition takes just a tenth of the time of this query, and summing without grouping takes only a fraction of that.
Anyone with ideas on how to improve on performance?
Thanks in advance!
Lasse
I was actually able to cut the response time myself by moving the distinct count to the inner query; I don't know why I didn't see that earlier. It obviously makes the temp table smaller. However, it is still a factor of 4-5 slower than the distinct count without a condition.
The new select looks like:
SELECT dist_cnt, sum1, sum2, categoryid
FROM
(
SELECT COUNT(DISTINCT key) AS dist_cnt,
SUM(cnt1) AS sum1,
SUM(cnt2) AS sum2,
key,categoryid
FROM table
GROUP BY key,categoryid
) as SUBQUERY
WHERE (8*sum1)/sum2 > 0
GROUP BY categoryid
Anyway, I think it should be possible to get it at least a factor of 2 faster.
Lasse
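For reference, the "aggregate per key, then count the keys that pass" pattern can be sketched end-to-end with SQLite through Python's sqlite3 (the data and column names here are invented, not Lasse's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT, categoryid INTEGER, cnt1 INTEGER, cnt2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?,?,?,?)", [
    ("a", 1, 1, 1), ("a", 1, 2, 1),   # key "a": sum1=3, sum2=2 -> passes
    ("b", 1, 0, 5),                   # key "b": sum1=0         -> fails
    ("c", 2, 3, 1),                   # key "c": sum1=3, sum2=1 -> passes
])

# The inner query aggregates per (key, category); each key appears at most
# once per category there, so the outer COUNT(*) is the distinct-key count.
result = dict(conn.execute("""
    SELECT categoryid, COUNT(*) AS dist_cnt
    FROM (SELECT k, categoryid, SUM(cnt1) AS s1, SUM(cnt2) AS s2
          FROM t GROUP BY k, categoryid)
    WHERE (8 * s1) / s2 > 0
    GROUP BY categoryid
"""))
print(result)  # {1: 1, 2: 1}
```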

select count(*) taking considerably longer than select * for same "where" clause?

I am finding a select count(*) is taking considerably longer than select * for the queries with the same where clause.
The table in question has about 2.2 million records (call it detailtable). It has a foreign key field linking to another table (maintable).
This query takes about 10-15 seconds:
select count(*) from detailtable where maintableid = 999
But this takes a second or less:
select * from detailtable where maintableid = 999
UPDATE: I was asked to specify the number of records involved: it is 150.
UPDATE 2 Here is information when the EXPLAIN keyword is used.
For the SELECT COUNT(*), The EXTRA column reports:
Using where; Using index
KEY and POSSIBLE KEYS both have the foreign key constraint as their value.
For the SELECT * query, everything is the same except EXTRA just says:
Using Where
UPDATE 3 Tried OPTIMIZE TABLE and it still does not make a difference.
For sure
select count(*)
should be faster than
select *
count(*), count(1), and count(primary key) are all the same; count(field) only matches when the field is declared NOT NULL, since COUNT skips NULLs.
Your EXPLAIN clearly states that the optimizer somehow uses the index for count(*) and not for the other query, making the foreign key index the main suspect for the delay.
Eliminate the foreign key.
Try
select count(PRIKEYFIELD) from detailtable where maintableid = 999
count(*) will read all the data from the table and then count the rows, meaning it has more work to do.
Counting the primary key field means the query can use its index, and it should run faster.
Thread Necro!
Crazy idea... In some cases, depending on the query planner and the table size, etc, etc., it is possible for using an index to actually be slower than not using one. So if you get your count without using an index, in some cases, it could actually be faster.
Try this:
SELECT count(*)
FROM detailtable
USE INDEX ()
WHERE maintableid = 999
SELECT count(*)
with that syntax alone is not a problem; you can run it against any table.
The main issue in your scenario is the proper use of indexes and of the WHERE clause in your search.
Reconfigure your indexes if you have the chance.
If the table is very big, yes, it may take time. Also check the MyISAM table-locking documentation.
As the table has 2.2 million records, counting can take time: technically, MySQL has to find the matching records and then count them, an extra operation that becomes significant with millions of rows. The only way to make it faster is to cache the result in another table and update it behind the scenes.
Or simply try:
SELECT count(1) FROM table_name WHERE _condition;
SELECT count('x') FROM table_name WHERE _condition;

Is there a way to do this query without the sub-select?

I'm not positive but I believe sub-selects are less than optimal (speedwise?).
Is there a way to remove this sub-select from my query (I think the query is self-explanatory).
select *
from tickers
where id = (select max(id) from tickers where annual_score is not null);
Would something like:
SELECT *
FROM ticker
WHERE annual_score IS NOT NULL
ORDER BY id desc
LIMIT 1
work?
That particular sub-select shouldn't be inefficient at all. It should be run once before the main query begins.
There are a certain class of subqueries that are inefficient (those that join columns between the main query and the subquery) because they end up running the subquery for every single row of the main query.
But this shouldn't be one of them, unless MySQL is severely brain-damaged, which I doubt.
However, if you remain keen to get rid of the subquery, you can order the rows by id (descending) and only fetch the first, something like:
select * from tickers
where annual_score is not null
order by id desc
limit 0, 1
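A quick check that the two forms pick the same row, sketched with SQLite through Python's sqlite3 (table contents invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickers (id INTEGER PRIMARY KEY, annual_score REAL)")
conn.executemany("INSERT INTO tickers VALUES (?, ?)",
                 [(1, 2.5), (2, None), (3, 7.0), (4, None)])

# Sub-select form: compute MAX(id) once, then fetch that row.
sub = conn.execute(
    "SELECT id FROM tickers WHERE id = "
    "(SELECT MAX(id) FROM tickers WHERE annual_score IS NOT NULL)"
).fetchone()

# ORDER BY ... LIMIT form: same row, no sub-select.
lim = conn.execute(
    "SELECT id FROM tickers WHERE annual_score IS NOT NULL "
    "ORDER BY id DESC LIMIT 1"
).fetchone()

print(sub, lim)  # (3,) (3,)
```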
Not too familiar with MySQL, but if you want to eliminate the subquery then you could try something like this:
select *
from tickers
where annual_score is not null
order by id desc
limit 1
I don't know if this is more or less performant as MySQL is not my background.

What's the fastest way to check that entry exists in database?

I'm looking for the fastest way to check that an entry exists...
All my life, I have done it with something like this:
SELECT COUNT(`id`) FROM `table_name`
Some people don't use COUNT(id), but COUNT(*). Is that faster?
What about LIMIT 1?
P.S. With id I meant primary key, of course.
Thanks in advance!
In most situations, COUNT(*) is faster than COUNT(id) in MySQL (because of how grouping queries with COUNT() are executed; this may be optimized in future releases so that both versions run the same). But if you only want to know whether at least one row exists, you can use EXISTS.
simple:
( SELECT COUNT(id) FROM table_name ) > 0
a bit faster:
( SELECT COUNT(*) FROM table_name ) > 0
much faster:
EXISTS (SELECT * FROM table_name)
If you aren't worried about accuracy, EXPLAIN SELECT count(field) FROM table is incredibly fast, since it only returns the optimizer's row estimate.
http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/
This link explains the difference between count(*) and count(field). When in doubt, count(*)
As for checking that a table is not empty...
SELECT EXISTS(SELECT 1 FROM table)
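The difference between the variants can be tried out with SQLite through Python's sqlite3 (SQLite's optimizer differs from MySQL's, but the EXISTS short-circuit is the same idea):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table_name VALUES (?)", [(1,), (2,), (3,)])

# COUNT must visit every row before the comparison with 0 ...
count_nonzero = conn.execute(
    "SELECT COUNT(*) > 0 FROM table_name"
).fetchone()[0]

# ... while EXISTS can stop at the very first row it finds.
exists = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM table_name)"
).fetchone()[0]

print(count_nonzero, exists)  # 1 1
```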

MySQL: Fastest way to count number of rows

Which way of counting the number of rows should be faster in MySQL?
This:
SELECT COUNT(*) FROM ... WHERE ...
Or, the alternative:
SELECT 1 FROM ... WHERE ...
// and then count the results with a built-in function, e.g. in PHP mysql_num_rows()
One would think that the first method should be faster, as this is clearly database territory and the database engine should be faster than anybody else when determining things like this internally.
COUNT(*) can take advantage of column indexes, so it will give the best result. MySQL with the MyISAM engine actually stores the total row count, so it doesn't scan all rows each time you count an entire table (this only applies without a WHERE clause).
Counting rows in PHP is not very smart, because you have to send the data from MySQL to PHP. Why do that when you can achieve the same on the MySQL side?
If COUNT(*) is slow, run EXPLAIN on the query and check whether indexes are really used, and where they should be added.
The following is not the fastest way, but there is a case where COUNT(*) doesn't really fit: when you start grouping results, COUNT gives you per-group counts rather than the total number of rows.
The solution is SQL_CALC_FOUND_ROWS. This is usually used when you are selecting rows but still need to know the total row count (for example, for paging).
When you select data rows, just append the SQL_CALC_FOUND_ROWS keyword after SELECT:
SELECT SQL_CALC_FOUND_ROWS [needed fields or *] FROM table LIMIT 20 OFFSET 0;
After you have selected needed rows, you can get the count with this single query:
SELECT FOUND_ROWS();
FOUND_ROWS() has to be called immediately after the data selecting query.
In conclusion, everything actually comes down to how many entries you have and what is in the WHERE clause. You should really pay attention to how indexes are being used when there are lots of rows (tens of thousands, millions, and up).
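Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() were deprecated in MySQL 8.0.17; the recommended replacement is a separate COUNT(*) query alongside the page query. That portable pattern, sketched with SQLite through Python's sqlite3 (table name invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 101)])

# Page query: 20 rows starting at offset 0.
page = conn.execute("SELECT id FROM items LIMIT 20 OFFSET 0").fetchall()

# Separate total query -- the portable replacement for
# SQL_CALC_FOUND_ROWS / FOUND_ROWS().
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

print(len(page), total)  # 20 100
```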
After speaking with my team-mates, Ricardo told us that the fastest way is:
show table status like '<TABLE NAME>' \G
But you have to remember that the result may not be exact.
You can use it from command line too:
$ mysqlshow --status <DATABASE> <TABLE NAME>
More information: http://dev.mysql.com/doc/refman/5.7/en/show-table-status.html
And you can find a complete discussion at mysqlperformanceblog
This query (which is similar to what bayuah posted) shows a nice summary of the row counts of all tables inside a database (a simplified version of the stored procedure by Ivan Cachicatari, which I highly recommend):
SELECT TABLE_NAME AS 'Table Name', TABLE_ROWS AS 'Rows' FROM information_schema.TABLES WHERE TABLES.TABLE_SCHEMA = 'YOURDBNAME' AND TABLES.TABLE_TYPE = 'BASE TABLE';
Example:
+-----------------+---------+
| Table Name      | Rows    |
+-----------------+---------+
| some_table      |   10278 |
| other_table     |     995 |
+-----------------+---------+
Great question, great answers. Here's a quick way to echo the results if anyone is reading this page and missed that part:
$counter = mysql_query("SELECT COUNT(*) AS id FROM table");
$num = mysql_fetch_array($counter);
$count = $num["id"];
echo("$count");
I've always understood that the below will give me the fastest response times.
SELECT COUNT(1) FROM ... WHERE ...
If you need to get the count of the entire result set, you can take the following approach:
SELECT SQL_CALC_FOUND_ROWS * FROM table_name LIMIT 5;
SELECT FOUND_ROWS();
This isn't normally faster than using COUNT, although one might think the opposite because the calculation is done internally and the data isn't sent back to the user, which suggests a performance improvement.
Running these two queries is useful for getting totals when paginating, but not particularly when WHERE clauses are involved.
Try this:
SELECT
table_rows "Rows Count"
FROM
information_schema.tables
WHERE
table_name="Table_Name"
AND
table_schema="Database_Name";
I did some benchmarks to compare the execution time of COUNT(*) vs COUNT(id) (id is the primary key of the table - indexed).
Number of trials:
10 * 1000 queries
Results:
COUNT(*) is about 7% faster
VIEW GRAPH: benchmarkgraph
My advice is to use: SELECT COUNT(*) FROM table
EXPLAIN SELECT id FROM .... did the trick for me; I could see the number of rows under the rows column of the result.
Perhaps you may want to consider doing SELECT MAX(id) - MIN(id) + 1. This will only work if your ids are sequential and rows are never deleted; it is, however, very fast.
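The caveat is easy to demonstrate with SQLite through Python's sqlite3: the estimate is exact only while the id sequence has no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1, 11)])

# With sequential ids 1..10, the arithmetic estimate equals the real count.
estimate = conn.execute("SELECT MAX(id) - MIN(id) + 1 FROM t").fetchone()[0]
exact = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(estimate, exact)  # 10 10

# Deleting a row leaves a gap, and the estimate goes stale.
conn.execute("DELETE FROM t WHERE id = 5")
estimate = conn.execute("SELECT MAX(id) - MIN(id) + 1 FROM t").fetchone()[0]
exact = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(estimate, exact)  # 10 9
```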
This is the query that got me the fastest results:
SELECT SQL_CALC_FOUND_ROWS 1 FROM `orders`;
SELECT FOUND_ROWS();
In my benchmark test: 0.448s
This query takes 4.835s
SELECT SQL_CALC_FOUND_ROWS * FROM `orders`;
SELECT FOUND_ROWS();
COUNT(*) takes 25.675s:
SELECT count(*) FROM `orders`;
If you don't need a super-exact count, you can set a lower transaction isolation level for the current session. Do it like this:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT count(*) FROM the_table WHERE ...;
COMMIT; /* close the transaction */
It is also good to have an index that matches the WHERE condition.
It really speeds up counting for big InnoDB tables. I checked it on a table with ~700M rows under heavy load, and it works: it reduced the query time from ~451 seconds to ~2 seconds.
I took the idea from this answer: https://stackoverflow.com/a/918092/1743367
A count(*) statement with a WHERE condition on the primary key returned the row count much faster for me, avoiding a full table scan.
SELECT COUNT(*) FROM ... WHERE <PRIMARY_KEY> IS NOT NULL;
This was much faster for me than
SELECT COUNT(*) FROM ...
I handled tables for the German government, sometimes with 60 million records, and we needed the total row count many times.
So we database programmers decided that record one of every table would always be the record in which the total record count is stored, and we updated that number on every INSERT or DELETE.
We tried all the other ways. This is by far the fastest way.
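A maintained counter like this can be sketched with triggers, using a separate one-row counter table rather than overloading record one (SQLite through Python's sqlite3; MySQL trigger syntax differs, and the table and trigger names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE big (id INTEGER PRIMARY KEY);
CREATE TABLE row_counter (n INTEGER);   -- hypothetical one-row counter table
INSERT INTO row_counter VALUES (0);
CREATE TRIGGER big_ins AFTER INSERT ON big
  BEGIN UPDATE row_counter SET n = n + 1; END;
CREATE TRIGGER big_del AFTER DELETE ON big
  BEGIN UPDATE row_counter SET n = n - 1; END;
""")

conn.executemany("INSERT INTO big VALUES (?)", [(i,) for i in range(1, 6)])
conn.execute("DELETE FROM big WHERE id = 1")

# Reading the count is now a one-row lookup -- no table scan at all.
n = conn.execute("SELECT n FROM row_counter").fetchone()[0]
print(n)  # 4
```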