mysql> SELECT COUNT(*) vs SHOW TABLE STATUS for row count - mysql

We have a table in our database that has tens of millions of entries (10.1.21-MariaDB; InnoDB table engine; Windows OS). We are able to get the number of rows in the table instantaneously using the command SHOW TABLE STATUS LIKE 'my_table_name'. However, SELECT COUNT(*) FROM my_table_name takes a few minutes to complete.
Q) Why is SHOW TABLE STATUS LIKE 'my_table_name' so much quicker than SELECT COUNT(*) FROM my_table_name?

Because one is a query that counts all the rows and the other is a command that retrieves stats the DB engine maintains about the table. There isn't any firm guarantee that the table statistic will be up to date so the only way to get an accurate count is to count the rows, but it might be that you don't need it to be perfectly accurate all the time. You can thus choose either, depending on your desire for accuracy vs speed etc.
See the example at https://pingcap.com/docs/stable/sql-statements/sql-statement-show-table-status/
You can see the example inserts 5 rows but the table stats are out of date and the table still reports 0 rows. Running ANALYZE TABLE will (probably) take longer than counting the rows, but the stats will be up to date (for a while at least) afterwards.

A suggested approach to get a reasonably accurate count of the table size, when SELECT COUNT(*) is taking a long time to complete, could be:
ANALYZE TABLE my_table_name; SHOW TABLE STATUS LIKE 'my_table_name';
This comes in especially handy when importing a large amount of data into a table, and you want to track the progress of the import process.
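For tracking an import from a script, the same two steps could be wrapped roughly as follows. This is a minimal sketch in Python, assuming a DB-API 2.0 cursor from a MySQL/MariaDB driver that uses the `%s` parameter style; the helper name `estimated_rows` is hypothetical. It reads `TABLE_ROWS` from `information_schema.TABLES`, which carries the same estimate as the `Rows` column of SHOW TABLE STATUS.

```python
import re


def estimated_rows(cursor, schema, table):
    """Refresh table statistics, then return the engine's row estimate.

    `cursor` is assumed to be a DB-API 2.0 cursor on a MySQL/MariaDB
    connection whose driver uses the %s parameter style.
    """
    # Identifiers cannot be bound as parameters, so validate before interpolating.
    for name in (schema, table):
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unsafe identifier: {name!r}")

    cursor.execute(f"ANALYZE TABLE `{schema}`.`{table}`")
    cursor.fetchall()  # ANALYZE TABLE returns a small result set; drain it

    cursor.execute(
        "SELECT TABLE_ROWS FROM information_schema.TABLES "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s",
        (schema, table),
    )
    row = cursor.fetchone()
    if row is None or row[0] is None:
        return None  # table not found, or no estimate available
    return int(row[0])
```

Calling it every few seconds during the import gives a moving estimate; the value is still a statistic, not an exact count.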

Related

Why is SELECT COUNT(*) much slower than SELECT * even without WHERE clause in MySQL?

I have a view that has a duration time of ~0.2 seconds when I do a simple SELECT * from it, but has a duration time of ~25 seconds when I do simply SELECT COUNT(*) from it. What would cause this? It seems like if it takes 0.2 seconds to compute the output data then it could run a simple length calculation on that dataset in a trivial amount of time. MySQL 5.7. Details below.
mysql> select count(*) from Lots;
+----------+
| count(*) |
+----------+
|  4136666 |
+----------+
1 row in set (25.29 sec)
In MySQL workbench, the following query produces durations like: 0.217 sec
select * from Lots;
The fetch time is significant given the amount of data, but my understanding is the "Duration" is how long it takes to compute the output dataset of the view.
Definition of Lots view:
select
lot.*,
coalesce(overrides.streetNumber, address.streetNumber, lot.rawStreetNumber) as streetNumber,
coalesce(overrides.street, address.street, lot.rawStreet) as street,
coalesce(overrides.postalCode, address.postalCode, lot.rawPostalCode) as postalCode,
coalesce(overrides.city, address.city, lot.rawCity) as city
from LotsData lot
left join Address address on address.lotNumber = lot.lotNumber
left join Override overrides on overrides.lotId = lot.lotNumber
The data in VIEW objects isn't materialized. That is, it doesn't exist in any sort of tabular form in your database server. Rather, the server pulls it together from its tables when a query (like your COUNT query) references the VIEW. So, there's no simple metadata hanging around in the server that can satisfy your COUNT query instantaneously. The server has to pull together all your joined tables to generate a row count. It takes a while. Remember, your database server may have other clients concurrently INSERTing or DELETEing rows to one or more of the tables in your view.
It's worse than that. In the InnoDB storage engine, even COUNTing the rows of a table is slow. To achieve high concurrency InnoDB doesn't attempt to store any kind of precise row count. So the database server has to count those rows one-by-one as well. (The older MyISAM storage engine does maintain precise row count metadata for tables, but it offers less concurrency.)
Wise data programmers avoid using COUNT(*) on whole tables or views composed from them in production for those reasons.
The real question is why your SELECT * FROM view is so fast. It's unlikely that your database server can compose and deliver a 4-million-row view from its JOINs in less than a second, nor is it likely that Workbench can absorb that many rows in that time. Like @ysth said, many GUI-based SQL client programs, like Workbench and HeidiSQL, sometimes silently append something like LIMIT 1000 to interactive operations calling for the display of whole tables or views. You might look for evidence of that.

count(*) using left join taking long time to respond

I have a MySQL query for showing the count of data in the Application listing page.
query
SELECT COUNT(*) FROM cnitranr left join cnitrand on cnitrand.tx_no=cnitranr.tx_no
EXPLAIN output: (screenshot not available)
Indexes on cnitranr:
tx_no (primary), approx. 1 crore (10 million) rows [ENGINE MyISAM]
Index on cnitrand:
tx_no (secondary), approx. 2 crore (20 million) rows [ENGINE MyISAM]
Profiler output: (screenshot not available)
Can anyone suggest ways to optimize this query, or should I run a cron job to compute the count? Please help.
You would need to implement a materialized view.
Since MySQL does not support them directly, you would need to create a table like that:
CREATE TABLE totals (cnt INT)
and write a trigger on both tables that would increment and decrement cnt on INSERT, UPDATE and DELETE to each of the tables.
Note that if you have a record with many linked records in either table, the DML affecting such a record would be slow.
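A minimal sketch of the trigger-maintained counter, using Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table `items` and the trigger names are hypothetical, and the sketch only tracks the row count of a single table; keeping the count of the LEFT JOIN from the question would need triggers on both tables with more involved arithmetic. MySQL's trigger syntax also differs slightly (it requires `FOR EACH ROW`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE totals (cnt INTEGER NOT NULL);
    INSERT INTO totals (cnt) VALUES (0);

    -- Keep totals.cnt in step with every insert and delete on items.
    CREATE TRIGGER items_after_insert AFTER INSERT ON items
    BEGIN
        UPDATE totals SET cnt = cnt + 1;
    END;
    CREATE TRIGGER items_after_delete AFTER DELETE ON items
    BEGIN
        UPDATE totals SET cnt = cnt - 1;
    END;
""")

conn.executemany("INSERT INTO items (payload) VALUES (?)", [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE payload = 'b'")

# Reading the counter is a single-row lookup, regardless of table size.
cached = conn.execute("SELECT cnt FROM totals").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

The trade-off is that every write now also updates the single `totals` row, which becomes a point of contention under heavy concurrent writes.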
On large data volumes, you very rarely need exact counts, especially for pagination. As I said in a comment above, Google, Facebook etc. only show approximate numbers on paginated results.
It's very unlikely that a person would want to browse through 20M+ records on a page that can show only 100 or so.

Efficient DB2 query pagination and show total pages?

This post shows some hacks to page data from DB2:
How to query range of data in DB2 with highest performance?
However, it does not provide a way to show the total number of rows (like MySQL's SQL_CALC_FOUND_ROWS).
SELECT SQL_CALC_FOUND_ROWS thread_id AS id, name, email
FROM threads WHERE email IS NOT NULL
LIMIT 20 OFFSET 200
And in MySQL I can follow that up with
SELECT FOUND_ROWS()
to get the total number of rows. The first part is fairly easy to duplicate with recent versions of DB2. I can't find any results on Google for a reasonable equivalent to the second query (I don't want temp tables, subqueries, or other absurdly inefficient solutions).
I don't think this exists in DB2.
Note that the total number of rows is a value that needs extra calculation to obtain. It isn't just lying around somewhere--it would have to be specifically built into the LIMIT logic. Which it doesn't look like they did.
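Since the total has to be computed separately, one plain fallback is to run the page query and a matching COUNT query with the same filter. The sketch below illustrates that with Python's built-in sqlite3 module as a stand-in database; the helper `fetch_page` and the sample data are hypothetical, and DB2's paging syntax differs from the LIMIT/OFFSET form used here.

```python
import sqlite3


def fetch_page(conn, limit, offset):
    """Return (rows, total) using one page query and one count query."""
    where = "email IS NOT NULL"  # the same filter must appear in both statements
    rows = conn.execute(
        f"SELECT thread_id, name, email FROM threads WHERE {where} "
        "ORDER BY thread_id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    total = conn.execute(
        f"SELECT COUNT(*) FROM threads WHERE {where}"
    ).fetchone()[0]
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE threads (thread_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
# 50 sample rows; every fifth one has no email and is filtered out.
conn.executemany(
    "INSERT INTO threads (name, email) VALUES (?, ?)",
    [(f"user{i}", None if i % 5 == 0 else f"user{i}@example.com")
     for i in range(1, 51)],
)
rows, total = fetch_page(conn, limit=20, offset=20)
```

The count query costs a second pass over the filtered rows, which is the extra calculation the answer refers to.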

Running count and count distinct on many rows (tens of thousands)

I'm trying to run this query:
SELECT
COUNT(events.event_id) AS total_events,
COUNT(matches.fight_id) AS total_matches,
COUNT(players.fighter_id) AS total_players,
COUNT(DISTINCT events.organization) AS total_organizations,
COUNT(DISTINCT players.country) AS total_countries
FROM
events, matches, players
These are table details:
Events = 21k
Players = 90k
Matches = 155k
All of those are uniques, so the query's first 3 things will be those numbers. The other two values should be total_organizations, where the organization column is in the events (should return couple hundred), and total_countries should count distinct countries using country column in players table (also couple hundred).
All three of those ID columns are unique and indexed.
This query as it stands now takes forever. I never even have patience to see it complete. Is there a faster way of doing this?
Also, I need this to load these results on every page load, so should I just put this query in some hidden file, and set a cron job to run every midnight or something and populate a "totals" table or something so I can retrieve it from that table quickly?
Thanks!
First, remove the unnecessary join here; it's preventing most (if not all) of your indexes from being used. You want three different queries:
SELECT
COUNT(events.event_id) AS total_events,
COUNT(DISTINCT events.organization) AS total_organizations
FROM
events;
SELECT
COUNT(matches.fight_id) AS total_matches
FROM
matches;
SELECT
COUNT(players.fighter_id) AS total_players,
COUNT(DISTINCT players.country) AS total_countries
FROM
players;
This should go a long way to improving the performance of these queries.
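To see why the original query is so slow, here is a small sketch using Python's built-in sqlite3 module as a stand-in database, with made-up sample rows. The comma join builds every combination of rows, so with the sizes in the question it would have to process roughly 21,000 × 155,000 × 90,000 ≈ 2.9 × 10^14 combinations, and the plain COUNTs would return that product rather than the per-table totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (event_id   INTEGER PRIMARY KEY, organization TEXT);
    CREATE TABLE matches (fight_id   INTEGER PRIMARY KEY);
    CREATE TABLE players (fighter_id INTEGER PRIMARY KEY, country TEXT);
""")
conn.executemany("INSERT INTO events (organization) VALUES (?)",
                 [("A",), ("A",), ("B",)])
conn.executemany("INSERT INTO matches (fight_id) VALUES (?)",
                 [(1,), (2,), (3,), (4,)])
conn.executemany("INSERT INTO players (country) VALUES (?)",
                 [("US",), ("BR",), ("US",), ("JP",), ("BR",)])

# The comma join pairs every row with every row of the other tables: 3 * 4 * 5.
joined = conn.execute(
    "SELECT COUNT(events.event_id) FROM events, matches, players"
).fetchone()[0]

# Separate queries each scan one table and return the intended totals.
total_events, total_organizations = conn.execute(
    "SELECT COUNT(event_id), COUNT(DISTINCT organization) FROM events"
).fetchone()
total_matches = conn.execute("SELECT COUNT(fight_id) FROM matches").fetchone()[0]
total_players, total_countries = conn.execute(
    "SELECT COUNT(fighter_id), COUNT(DISTINCT country) FROM players"
).fetchone()
```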
Now, consider adding these indexes:
CREATE INDEX events_organization ON events (organization);
CREATE INDEX players_country ON players (country);
Compare the EXPLAIN SELECT ... results before and after adding these indexes. They might help and they might not.
Note that if you are using the InnoDB storage engine, the count cannot be read from metadata: to honor transactional isolation, InnoDB has to walk every entry of the table or of one of its indexes. An index on the counted column can let it scan the smaller index instead of the full rows, but the work still grows with the table size.
If you are using MyISAM, which does not support transactions or MVCC, isolation is a non-issue: the engine stores an exact row count, so a plain COUNT(*) with no WHERE clause returns nearly instantly. The COUNT(DISTINCT ...) values still require scanning the table or an index.
So if you are using InnoDB, then you may wind up having to use a cronjob to create a cache of this data anyway.
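A minimal sketch of such a cache, using Python's built-in sqlite3 module as a stand-in database. The `totals` table layout and the function names `refresh_totals` and `read_totals` are hypothetical; the cron job would call the refresh function, and each page load would only read the small table. `INSERT OR REPLACE` is SQLite syntax; in MySQL the closest equivalents are `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3
import time


def refresh_totals(conn):
    """Recompute the expensive counts once and store them for cheap reads."""
    total_players, total_countries = conn.execute(
        "SELECT COUNT(fighter_id), COUNT(DISTINCT country) FROM players"
    ).fetchone()
    now = int(time.time())
    with conn:  # commit both rows together
        conn.executemany(
            "INSERT OR REPLACE INTO totals (name, value, refreshed_at) "
            "VALUES (?, ?, ?)",
            [("total_players", total_players, now),
             ("total_countries", total_countries, now)],
        )


def read_totals(conn):
    """What each page load would run: a lookup on a tiny table."""
    return dict(conn.execute("SELECT name, value FROM totals"))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE players (fighter_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE totals (name TEXT PRIMARY KEY, value INTEGER, refreshed_at INTEGER);
""")
conn.executemany("INSERT INTO players (country) VALUES (?)",
                 [("US",), ("BR",), ("US",)])
refresh_totals(conn)
totals = read_totals(conn)
```

Storing `refreshed_at` lets the page show how stale the numbers are.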

Best way to count rows from mysql database

After facing a slow loading time issue with a MySQL query, I'm now looking for the best way to count rows. I stupidly used the mysql_num_rows() function to do this and have now realized it's the worst way to do it.
I was actually building pagination in PHP.
I have found several ways to count rows, but I'm looking for the fastest one.
The table type is MyISAM
So the question is now
Which is the best and fastest way to count?
1. `SELECT count(*) FROM table_name`
2. `SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'database_name' AND table_name LIKE 'table_name'`
3. `SHOW TABLE STATUS LIKE 'table_name'`
4. `SELECT FOUND_ROWS()`
If there are other, better ways to do this, please let me know them as well.
If possible, please explain why the method is the best and fastest, so that I can understand it and choose a method based on my requirements.
Thanks.
Quoting the MySQL Reference Manual on COUNT:
COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:
mysql> SELECT COUNT(*) FROM student;
This optimization applies only to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly. For transactional storage engines such as InnoDB, storing an exact row count is more problematic because multiple transactions may be occurring, each of which may affect the count.
Also read this question
MySQL - Complexity of: SELECT COUNT(*) FROM MyTable;
I would start by using SELECT count(*) FROM table_name because it is the most portable, easiest to understand, and because it is likely that the DBMS developers optimise common idiomatic queries of this sort.
Only if that wasn't fast enough would I benchmark the approaches you list to find if any were significantly faster.
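It's sometimes suggested that counting a constant is slightly faster:
select count('x') from table;
The reasoning is that the parser, on hitting count(*), would have to work out all the columns represented by the * before counting. In MySQL that isn't what happens: the * in count(*) is not expanded into a column list, and the manual states that InnoDB handles COUNT(*) and COUNT(1) in the same way, with no performance difference. So treat any gain from counting a constant as negligible at best.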
As an aside, although not faster, one cute option is:
select sum(1) from table;
I've looked around quite a bit for this recently. It seems that there are a few here that I'd never seen before.
Special needs: This database is about 6 million records and is getting crushed by multi-insert queries all the time. Getting a true count is difficult to say the least.
SELECT TABLE_ROWS FROM INFORMATION_SCHEMA.TABLES WHERE table_schema = 'admin_worldDomination' AND table_name LIKE 'master'
Showing rows 0 - 0 ( 1 total, Query took 0.0189 sec)
This is decent: very fast but inaccurate. It showed results from 4 million to almost 8 million rows.
SELECT count( * ) AS counter FROM `master`
No time displayed, took 8 seconds real time. Will get much worse as the table grows. This has been killing my site previous to today.
SHOW TABLE STATUS LIKE 'master'
Seems to be as fast as the first, no time displayed though. Offers lots of other table information, not much of it is worth anything though (avg record length maybe).
SELECT FOUND_ROWS() FROM 'master'
Showing rows 0 - 29 ( 4,824,232 total, Query took 0.0004 sec)
This is good, but an average. Closer spread than others (4-5 million) so I'll probably end up taking a sample from a few of these queries and averaging.
EDIT: This was really slow when doing a query in PHP, so I ended up going with the first. The query runs 30 times quickly and I take an average, in under 1 second... it still ranges between 5.3 and 5.5 million.
One idea I had, to throw this out there, is to try to find a way to estimate the row count. Since it's just to give your user an idea of the number of pages, maybe you don't need to be exact and could even say Page 1 of ~4837 or Page 1 of about 4800 or something.
I couldn't quickly find an estimate count function, but you could try getting the table size and dividing by a determined/constant avg row size. I don't know if or why getting the table size from TABLE STATUS would be faster than getting the rows from TABLE STATUS.
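Both ideas can be sketched in a few lines of Python. The function names are hypothetical; `data_length` and `avg_row_length` are assumed to come from the `Data_length` and `Avg_row_length` columns of SHOW TABLE STATUS. Since those figures are themselves statistics, the result may well be no more accurate than the reported row count.

```python
import math


def estimate_rows(data_length, avg_row_length):
    """Rough row count from table size in bytes and average row size."""
    if avg_row_length <= 0:
        return 0
    return data_length // avg_row_length


def approx_page_label(row_estimate, per_page, page=1, sig_digits=2):
    """Format 'Page N of about M', rounding M to a few significant digits."""
    pages = max(1, math.ceil(row_estimate / per_page))
    # Round away the low-order digits so the label does not imply false precision.
    magnitude = 10 ** max(0, len(str(pages)) - sig_digits)
    rounded = round(pages / magnitude) * magnitude
    return f"Page {page} of about {rounded}"
```

For example, `approx_page_label(483_700, 100)` gives "Page 1 of about 4800", matching the wording suggested above.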