Count from a table, but stop counting at a certain number - mysql

Is there a way in MySQL to COUNT(*) from a table where if the number is greater than x, it will stop counting there? Basically, I only want to know if the number of records returned from a query is more or less than a particular number. If it's more than that number, I don't really care how many rows there are, if it's less, tell me the count.
I've been able to fudge it like this:
-- let x be 100
SELECT COUNT(*) FROM (
SELECT `id` FROM `myTable`
WHERE myCriteria = 1
LIMIT 100
) AS temp
...but I was wondering if there was some handy built-in way to do this?
Thanks for the suggestions, but I should have been more clear about the reasons behind this question. It's selecting from a couple of joined tables, each with tens of millions of records. Running COUNT(*) using an indexed criteria still takes about 80 seconds, running one without an index takes about 30 minutes or so. It's more about optimising the query rather than getting the correct output.

SELECT * FROM WhateverTable WHERE WhateverCriteria
LIMIT 100, 1
LIMIT 100, 1 returns 101th record, if there is one, or no record otherwise. You might be able to use the above query as a sub-query in EXIST clauses, if that helps.

I can't think of anything. It looks to me like what you're doing exactly fulfils the purpose, and SQL certainly doesn't seem to go out of its way to make it easy for you to do this more succinctly.
Consider this: What you're trying to do doesn't make sense in a strict context of set arithmetic. Mathematically, the answer is to count everything and then take the MIN() with 100, which is what you're (pragmatically and with good reason) trying to avoid.

This works:
select count(*) from ( select * from stockinfo s limit 100 ) s
but was not any faster (that I could tell) from just:
select count(*) from stockinfo
which returned 5170965.

Related

What is the fastest way to count the number of MySql rows left after a limited results query

If I have a mysql limited query:
SELECT * FROM my_table WHERE date > '2020-12-12' LIMIT 1,16;
Is there a faster way to check and see how many results are left after my limit?
I was trying to do a count with limit, but that wasn't working, i.e.
SELECT count(ID) AS count FROM my_table WHERE date > '2020-12-12' LIMIT 16,32;
The ultimate goal here is just to determine if there ARE any other rows to be had beyond the current result set, so if there is another faster way to do this that would be fine too.
It's best to do this by counting the rows:
SELECT count(*) AS count FROM my_table WHERE date > '2020-12-12'
That tells you how many total rows match the condition. Then you can compare that to the size of the result you got with your query using LIMIT. It's just arithmetic.
Past versions of MySQL had a function FOUND_ROWS() which would report how many rows would have matched if you didn't use LIMIT. But it turns out this had worse performance than running two queries, one to count rows and one to do your limit. So they deprecated this feature.
For details read:
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
https://dev.mysql.com/worklog/task/?id=12615
(You probably want OFFSET 0, not 1.)
It's simple to test whether there ARE more rows. Assuming you want 16 rows, use 1 more:
SELECT ... WHERE ... ORDER BY ... LIMIT 0,17
Then programmatically see whether it returned only 16 rows (no more available) or 17 (there ARE more).
Because it is piggybacking on the fetch you are already doing and not doing much extra work, it is very efficient.
The second 'page' would use LIMIT 16, 17; 3rd: LIMIT 32,17, etc. Each time, you are potentially getting and tossing an extra row.
I discuss this and other tricks where I point out the evils of OFFSET: Pagination
COUNT(x) checks x for being NOT NULL. This is [usually] unnecessary. The pattern COUNT(*) (or COUNT(1)) simply counts rows; the * or 1 has no significance.
SELECT COUNT(*) FROM t is not free. It will actually do a full index scan, which is slow for a large table. WHERE and ORDER BY are likely to add to that slowness. LIMIT is useless since the result is always 1 row. (That is, the LIMIT is applied to the result, not to the counting.)

Does MySQL not use LIMIT to optimize query select functions?

I've got a complex query I have to run in an application that is giving me some performance trouble. I've simplified it here. The database is MySQL 5.6.35 on CentOS.
SELECT a.`po_num`,
Count(*) AS item_count,
Sum(b.`quantity`) AS total_quantity,
Group_concat(`web_sku` SEPARATOR ' ') AS web_skus
FROM `order` a
INNER JOIN `order_item` b
ON a.`order_id` = b.`order_key`
WHERE `store` LIKE '%foobar%'
LIMIT 200 offset 0;
The key part of this query is where I've placed "foobar" as a placeholder. If this value is something like big_store, the query takes much longer (roughly 0.4 seconds in the query provided here, much longer in the query I'm actually using) than if the value is small_store (roughly 0.1 seconds in the query provided). big_store would return significantly more results if there were not limit.
But there is a limit and that's what surprises me. Both datasets have more than the LIMIT, which is only 200. It appears to me that MySQL performing the select functions COUNT, SUM, GROUP_CONCAT for all big_store/small_store rows and then applies the LIMIT retroactively. I would imagine that it'd be best to stop when you get to 200.
Could it not do the select functions COUNT, SUM, GROUP_CONCAT actions after grabbing the 200 rows it will use, making my query much much quicker? This seems feasible to me except in cases where there's an ORDER BY on one of those rows.
Does MySQL not use LIMIT to optimize a query select functions? If not, is there a good reason for that? If so, did I make a mistake in my thinking above?
It can stop short due to the LIMIT, but that is not a reasonable query since there is no ORDER BY.
Without ORDER BY, it will pick whatever 200 rows it feels like and stop short.
With an ORDER BY, it will have to scan the entire table that contains store (please qualify columns with which table they come from!). This is because of the leading wildcard. Only then can it trim to 200 rows.
Another problem -- Without a GROUP BY, aggregates (SUM, etc) are performed across the entire table (or at least those that remain after filtering). The LIMIT does not apply until after that.
Perhaps what you are asking about is MariaDB 5.5.21's "LIMIT_ROWS_EXAMINED".
Think of it this way ... All of the components of a SELECT are done in the order specified by the syntax. Since LIMIT is last, it does not apply until after the other stuff is performed.
(There are a couple of exceptions: (1) SELECT col... must be done after FROM ..., since it would not know which table(s); (2) The optimizer readily reorders JOINed table and clauses in WHERE ... AND ....)
More details on that query.
The optimizer peeks ahead, and sees that the WHERE is filtering on order (that is where store is, yes?), so it decides to start with the table order.
It fetches all rows from order that match %foobar%.
For each such row, find the row(s) in order_item. Now it has some number of rows (possibly more than 200) with which to do the aggregates.
Perform the aggregates - COUNT, SUM, GROUP_CONCAT. (Actually this will probably be done as it gathers the rows -- another optimization.)
There is now 1 row (with an unpredictable value for a.po_num).
Skip 0 rows for the OFFSET part of the LIMIT. (OK, another out-of-order thingie.)
Deliver up to 200 rows. (There is only 1.)
Add ORDER BY (but no GROUP BY) -- big deal, sort the 1 row.
Add GROUP BY (but no ORDER BY) in, now you may have more than 200 rows coming out, and it can stop short.
Add GROUP BY and ORDER BY and they are identical, then it may have to do a sort for the grouping, but not for the ordering, and it may stop at 200.
Add GROUP BY and ORDER BY and they are not identical, then it may have to do a sort for the grouping, and will have to re-sort for the ordering, and cannot stop at 200 until after the ORDER BY. That is, virtually all the work is performed on all the data.
Oh, and all of this gets worse if you don't have the optimal index. Oh, did I fail to insist on providing SHOW CREATE TABLE?
I apologize for my tone. I have thrown quite a few tips in your direction; please learn from them.

Laravel Eloquent Query - limit on a count()

I need to know if a table has more than 50 rows fitting some criteria. So, the raw query should look like this:
SELECT COUNT(id) FROM (SELECT id FROM table WHERE {conditions} LIMIT 50)
but I'm having trouble doing this through eloquent. This is what I've tried so far....
card::where( ... )->limit(50)->count("id");
... but this doesn't work. It doesn't make the subquery for the limit, so the limit becomes useless.
Without running a subquery that is limited, the query takes up to 10 times longer..... I'm afraid it won't be as scalable as I need.
I did eventually come up with a solution to this, I just didn't post it (until now). You can just grab the ID of the 50th record to see if it exists:
$atLeastFifty = Card::offset(50)->value("id");
if(!empty($atLeastFifty)){
// there are at least 50 records
}else{
// there are less than 50 records
}
This is way faster than a count() when there are tons of records in the table.
If you just want to count number of columns, use count() without arguments:
$numberOfRows = Card::where(....)->take(50)->count();

How to return only 10 rows via SQL query

I have 1 query that returns over 180k rows. I need to make a slight change, so that it returns only about 10 less.
How do I show only the 10 rows as a result?
I've tried EXCEPT but it seems to return a lot more than just the 10.
You can use LIMIT. This will show first n rows. Example:
SELECT * FROM Orders LIMIT 10
If you are trying to make pagination add OFFSET. It will return 10 rows starting from row 20. Example:
SELECT * FROM Orders LIMIT 10 OFFSET 20
MySQL doesn't support EXCEPT (to my knowledge).
Probably the most efficient route would be to incorporate the two WHERE clauses into one. I say efficient in the sense of "Do it that way if you're going to run this query in a regular report or production application."
For example:
-- Query 1
SELECT * FROM table WHERE `someDate`>'2016-01-01'
-- Query 2
SELECT * FROM table WHERE `someDate`>'2016-01-10'
-- Becomes
SELECT * FROM table WHERE `someDate` BETWEEN '2016-01-01' AND '2016-01-10'
It's possible you're implying that the queries are quite complicated, and you're after a quick (read: not necessarily efficient) way of getting the difference for a one-off investigation.
That being the case, you could abuse UNION and a sub-query:
(Untested, treat as pseudo-SQL...)
SELECT
*
FROM (
SELECT * FROM table WHERE `someDate`>'2016-01-01'
UNION ALL
SELECT * FROM table WHERE `someDate`>'2016-01-10'
) AS sub
GROUP BY
`primaryKey`
HAVING
COUNT(1) = 1;
It's ugly though. And not efficient.
Assuming that the only difference is only that one side (I'll call it the "right hand side") is missing records that the left includes, you could LEFT JOIN the two queries (as subs) and filter to right-side-is-null. But that'd be dependent on all those caveats being true.
Temporary tables can be your friend - especially given they're so easily created (and can be indexed):
CREATE TEMPORARY TABLE tmp_xyz AS SELECT ... FROM ... WHERE ...;

How to dynamically add SELECT statements if not enough results?

I am querying a table for a set of data that may or may not have enough results for the correct operation of my page.
Typically, I would just broaden the range of my SELECT statement to insure enough results, but in this particular case, we need to start with as small of a range of results as possible before expanding (if we need to).
My goal, therefore, is to create a query that will search the db, determine if it got a sufficient amount of rows, and continue searching if it didn't. In PHP something like this would be very simple, but I can't figure out how to dynamically add select statements in a query based off of a current count of rows.
Here is a crude illustration of what I'd like to do - in a SINGLE query:
SELECT *, COUNT(`id`) FROM `blogs` WHERE `date` IS BETWEEN '2011-01-01' AND '2011-01-02' LIMIT 25
IF COUNT(`id`) < 25 {
SELECT * FROM `blogs` WHERE `date` IS BETWEEN '2011-01-02' AND '2011-01-03' LIMIT 25
}
Is this possible to do with a single query?
You have 2 possible solutions:
Compare the count on the programming language side. And if there is not enough - perform one more query. (it is not as bad as you think: query cache, memcached, proper indexes, enough memory on server, etc - there are a lot of possibilities to improve performance)
Create stored procedure