I have a MySQL 5.0 database with a few tables containing over 50M rows. But how do I know this? By running "SELECT COUNT(1) FROM foo", of course. This query on one table containing 58.8M rows took 10 minutes to complete!
mysql> SELECT COUNT(1) FROM large_table;
+----------+
| count(1) |
+----------+
| 58778494 |
+----------+
1 row in set (10 min 23.88 sec)
mysql> EXPLAIN SELECT COUNT(1) FROM large_table;
+----+-------------+-------------------+-------+---------------+----------------------------------------+---------+------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+----------------------------------------+---------+------+-----------+-------------+
| 1 | SIMPLE | large_table | index | NULL | fk_large_table_other_table_id | 5 | NULL | 167567567 | Using index |
+----+-------------+-------------------+-------+---------------+----------------------------------------+---------+------+-----------+-------------+
1 row in set (0.00 sec)
mysql> DESC large_table;
+-------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| created_on | datetime | YES | | NULL | |
| updated_on | datetime | YES | | NULL | |
| other_table_id | int(11) | YES | MUL | NULL | |
| parent_id | bigint(20) unsigned | YES | MUL | NULL | |
| name | varchar(255) | YES | | NULL | |
| property_type | varchar(64) | YES | | NULL | |
+-------------------+---------------------+------+-----+---------+----------------+
7 rows in set (0.00 sec)
All of the tables in question are InnoDB.
Any ideas why this is so slow, and how I can speed it up?
Counting all the rows in a table is a very slow operation; you can't really speed it up, unless you are prepared to keep a count somewhere else (and of course, that can become out of sync).
People who are used to MyISAM tend to think that they get count(*) "for free", but that is not really the case. MyISAM cheats by not having MVCC: every session sees the same rows, so the engine can simply store one exact row count.
The query you're showing is doing a full scan of a secondary index (type "index", "Using index"), which is generally the fastest way of counting the rows in an InnoDB table.
It is difficult to guess from the information you've given, what your application is, but in general, it's ok for users (etc) to see close approximations of the number of rows in large tables.
If you need the result instantly and you don't care whether it's 58.8M or 51.7M, you can get the approximate number of rows by calling
show table status like 'large_table';
See the Rows column.
For more information about the result, take a look at the manual at http://dev.mysql.com/doc/refman/5.1/en/show-table-status.html
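select count(id) from large_table may run faster, but since id is a NOT NULL primary key MySQL can treat it much like count(*), so do not count on a large gain.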
I have a table with 54k rows that contains 10 GB of data.
I am running this update query on it:
UPDATE my_table SET blog_object_version='19'
It takes more than 1 hour to run.
How can I improve performance?
Additional information:
I am running on Amazon RDS, instance class db.m5.4xlarge.
This is what I see in AWS Performance Insights:
wait/io/file/innodb/innodb_data_file
I do not have any other queries running on my db:
mysql> show processlist;
+----+----------+---------------------+----------+---------+------+----------+----------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+----------+---------------------+----------+---------+------+----------+----------------------------------------------+
| 3 | rdsadmin | localhost:65182 | NULL | Sleep | 0 | | NULL |
| 4 | rdsadmin | localhost | NULL | Sleep | 1 | | NULL |
| 6 | admin | 123.45.67.890:6170 | my_table | Query | 3901 | updating | UPDATE my_table SET blog_object_version='19' |
| 12 | admin | 123.45.67.890:6360 | NULL | Sleep | 2981 | | NULL |
| 18 | admin | 123.45.67.890:7001 | NULL | Query | 0 | starting | show processlist |
+----+----------+---------------------+----------+---------+------+----------+----------------------------------------------+
and this is my table:
mysql> show create table my_table\G;
*************************** 1. row ***************************
Table: my_table
Create Table: CREATE TABLE `my_table` (
`index` int(11) NOT NULL AUTO_INCREMENT,
`id` varchar(100) DEFAULT NULL,
`user_id` varchar(50) NOT NULL,
`associate_object_id` varchar(50) NOT NULL,
`type` varchar(50) DEFAULT NULL,
`creation_date` datetime DEFAULT NULL,
`version_id` varchar(50) NOT NULL,
`blog_object` longtext,
`blog_object_version` varchar(100) DEFAULT NULL,
`last_update` datetime DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`index`),
UNIQUE KEY `id_user_id_version_id` (`id`,`user_id`,`version_id`) USING BTREE,
KEY `user_id_associate_object_id` (`user_id`,`associate_object_id`),
KEY `user_id_associate_object_id_version_id` (`user_id`,`associate_object_id`,`version_id`)
) ENGINE=InnoDB AUTO_INCREMENT=54563151 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
these are my indexes:
mysql> SHOW INDEX FROM my_table;
+----------+------------+----------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| my_table | 0 | PRIMARY | 1 | index | A | 43915 | NULL | NULL | | BTREE | | |
| my_table | 0 | id_user_id_version_id | 1 | id | A | 3659 | NULL | NULL | YES | BTREE | | |
| my_table | 0 | id_user_id_version_id | 2 | user_id | A | 8783 | NULL | NULL | | BTREE | | |
| my_table | 0 | id_user_id_version_id | 3 | version_id | A | 43915 | NULL | NULL | | BTREE | | |
| my_table | 1 | user_id_associate_object_id | 1 | user_id | A | 378 | NULL | NULL | | BTREE | | |
| my_table | 1 | user_id_associate_object_id | 2 | associate_object_id | A | 4391 | NULL | NULL | | BTREE | | |
| my_table | 1 | user_id_associate_object_id_version_id | 1 | user_id | A | 385 | NULL | NULL | | BTREE | | |
| my_table | 1 | user_id_associate_object_id_version_id | 2 | associate_object_id | A | 6273 | NULL | NULL | | BTREE | | |
| my_table | 1 | user_id_associate_object_id_version_id | 3 | version_id | A | 43915 | NULL | NULL | | BTREE | | |
+----------+------------+----------------------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Very basic issue, with a very basic solution:
INDEX(blog_object_version)
Why? When the UPDATE filters on blog_object_version, without this index it must read every one of the 54K (or 54M?) rows to check for '19'.
With that index, only the relevant rows need to be read.
Tips:
Many of the VARCHAR columns sound like they should be INT (or something smaller, like SMALLINT). (Changing the types is not likely to speed up this query.)
Toss user_id_associate_object_id; the index user_id_associate_object_id_version_id handles the same things.
Update all rows
Updating up to 1K rows is reasonable. Updating less than 20% of the table will probably use the index if it is suitable.
But... If you need to update all of 54K rows, there are a couple of issues.
It will take a long time and probably a lot of disk space because both the old and new copies are held on to until the Update is finished. (This is so that it can commit or rollback the entire Update atomically.)
Generally, it is "poor design" to ever need to update a column in all rows of an entire table. Sometimes it may be possible to put the column in another table, in a single row. Then it is a one-row query to update blog_object_version. But it means doing a JOIN when you need the value in a SELECT. (This may not be a problem.) And if you aren't changing all the rows, then it is messier.
So,... If you decide to update "a lot of" (or all of) a big table, I recommend doing it in chunks of 100-1000 rows each. More details: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
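The chunking idea above can be sketched as follows. This is only an illustration: it is written in Python against an in-memory SQLite database so that it runs anywhere, with SQLite standing in for MySQL. The helper name update_in_chunks, the chunk size and the 2,500 test rows are assumptions, not part of the original question. Each chunk covers one range of the primary key and is committed on its own, so no single transaction has to keep old and new copies of every row.

```python
import sqlite3

def update_in_chunks(conn, new_version, chunk_size=1000):
    """Set blog_object_version on every row, one primary-key range per transaction."""
    low, high = conn.execute(
        'SELECT MIN("index"), MAX("index") FROM my_table').fetchone()
    if low is None:              # empty table, nothing to do
        return 0
    chunks = 0
    start = low
    while start <= high:
        end = start + chunk_size - 1
        conn.execute(
            'UPDATE my_table SET blog_object_version = ? '
            'WHERE "index" BETWEEN ? AND ?',
            (new_version, start, end))
        conn.commit()            # short transaction: only this chunk is pending
        chunks += 1
        start = end + 1
    return chunks

# Hypothetical stand-in for my_table: only the two relevant columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE my_table ("index" INTEGER PRIMARY KEY, blog_object_version TEXT)')
conn.executemany(
    'INSERT INTO my_table ("index", blog_object_version) VALUES (?, ?)',
    [(i, "18") for i in range(1, 2501)])
conn.commit()

chunks_run = update_in_chunks(conn, "19", chunk_size=1000)
remaining = conn.execute(
    "SELECT COUNT(*) FROM my_table WHERE blog_object_version <> '19'").fetchone()[0]
```

In MySQL the column name would be quoted with backticks instead of double quotes. If the key has large gaps, as the AUTO_INCREMENT value in the question suggests, fixed ranges produce many empty chunks; finding the next boundary with ORDER BY ... LIMIT, as the linked page describes, avoids that.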
Change Buffer
Another issue (less important) is that when updating a column that belongs to a non-unique index, the index needs updating too. That requires modifications to the BTree that represents the INDEX. For non-unique indexes, this is done in the background, mostly after the query is committed.
There is no risk of reading an incorrect index before the BTree has been updated. This is because of the "change buffer", which keeps pending index updates for later persisting to disk.
With this statement:
UPDATE my_table SET blog_object_version='19'
All records need to be fetched, checked and updated, because there is no WHERE clause.
If only some records need to be updated (because the others already have blog_object_version='19'), then you might see a (small) improvement if you do:
UPDATE my_table SET blog_object_version='19' WHERE blog_object_version != '19'
With this statement only the records that need a change are updated, but all records still need to be fetched.
If only some of the records have a blog_object_version that is not equal to '19', then adding an index on this field might improve things, because then only the records with a blog_object_version unequal to '19' need to be fetched.
If all records need the update, then this will not improve anything ...
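A small sketch of what the WHERE variant does, assuming a toy four-row table (SQLite through Python, used here only as a runnable stand-in for MySQL). It also shows one thing to watch for: blog_object_version is declared DEFAULT NULL, and a plain "not equal" comparison never matches NULL, so NULL rows are skipped unless the comparison is NULL-safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (pk INTEGER PRIMARY KEY, blog_object_version TEXT)")
conn.executemany(
    "INSERT INTO my_table (blog_object_version) VALUES (?)",
    [("19",), ("19",), ("18",), (None,)])
conn.commit()

# Only rows whose value differs from '19' are rewritten. The NULL row is
# skipped, because NULL <> '19' evaluates to NULL rather than true.
changed_plain = conn.execute(
    "UPDATE my_table SET blog_object_version = '19' "
    "WHERE blog_object_version <> '19'").rowcount

# NULL-safe comparison: IS NOT in SQLite.
# In MySQL the equivalent is NOT (blog_object_version <=> '19').
changed_null_safe = conn.execute(
    "UPDATE my_table SET blog_object_version = '19' "
    "WHERE blog_object_version IS NOT '19'").rowcount
conn.commit()
```

The first statement changes one row (the '18' row) and the second picks up the remaining NULL row.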
I have a table ip_per_nat_vlan in InnoDB format. After I run TRUNCATE TABLE, the table is empty.
Then I have a PHP script which fills data into this table.
When this script has finished without errors (simple INSERT statements), the situation is as follows:
select * from ip_per_nat_vlan;
Empty set (0.00 sec)
select count(*) from ip_per_nat_vlan;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
show table status;
+----------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+------------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+------------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+
| ip_per_nat_vlan | InnoDB | 10 | Dynamic | 141291 | 100 | 14172160 | 0 | 6832128 | 25165824 | 143563 | 2017-12-24 16:26:40 | 2018-06-13 09:01:33 | NULL | utf8_unicode_ci | NULL |
MySQL says that there should be rows (Rows is 141291, Data_length is 14172160), but I don't see any. Where could the problem be? Transactions? But I don't see any running thread or any error.
Thank you. D
Structure of table is:
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| ipAddress | varchar(255) | NO | UNI | NULL | |
| nat | int(11) | NO | | NULL | |
| vlan | int(11) | NO | | NULL | |
| district | varchar(255) | YES | | NULL | |
| idOblasti | int(11) | YES | | NULL | |
| type | varchar(255) | NO | | NULL | |
| macAddress | varchar(255) | NO | | NULL | |
+------------+--------------+------+-----+---------+----------------+
There are various ways to "count" rows in a table.
The normal way. Just count them.
select count(*) as table_rows from table_name ;
Accuracy: 100% accurate count at the time the query is run.
Using the information_schema tables.
select table_rows
from information_schema.tables
where table_schema = 'database_name'
and table_name = 'table_name' ;
Accuracy: Only an approximation. If the table is the target of frequent inserts and deletes, the result can be way off the actual count. This can be improved by running ANALYZE TABLE more often.
Efficiency: Very good, it doesn't touch the table at all.
Since the COUNT(*) option is 100% accurate, your table doesn't contain any data.
Check your code and the default commit (autocommit) setting of MySQL.
It looks like you are inserting rows but not committing them. Check your index length as well.
More details here:
https://dba.stackexchange.com/questions/151769/mysql-difference-between-using-count-and-information-schema-tables-for-coun
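The "inserted but not committed" scenario suggested above can be reproduced with two connections. The sketch below uses Python with a temporary SQLite file as a stand-in for InnoDB, which is an assumption made only so that it runs without a server. The visibility rule is the same idea: another session does not see rows until the writer commits. Whether the PHP script really runs with autocommit disabled is something to check in the script; this example does not establish it.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    writer = sqlite3.connect(path)   # plays the role of the PHP script
    reader = sqlite3.connect(path)   # plays the role of the mysql client

    writer.execute(
        "CREATE TABLE ip_per_nat_vlan (id INTEGER PRIMARY KEY, ipAddress TEXT)")
    writer.commit()

    # The sqlite3 module opens a transaction implicitly before an INSERT
    # and keeps it open until commit() is called.
    writer.execute("INSERT INTO ip_per_nat_vlan (ipAddress) VALUES ('10.0.0.1')")
    seen_before_commit = reader.execute(
        "SELECT COUNT(*) FROM ip_per_nat_vlan").fetchall()[0][0]

    writer.commit()
    seen_after_commit = reader.execute(
        "SELECT COUNT(*) FROM ip_per_nat_vlan").fetchall()[0][0]

    writer.close()
    reader.close()
```

The reader counts 0 rows before the commit and 1 row after it, which matches the symptom in the question if the script never commits.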
First, I am not sure how MySQL ran this line and produced the result:
select count() from ip_per_nat_vlan
count() will return [Err] 1064.
Either count(*) or a column name has to be given inside the parentheses.
The table is an InnoDB table. Here is some information that might be helpful.
EXPLAIN SELECT COUNT(*) AS y0_ FROM db.table this_ WHERE this_.id IS NOT NULL;
+----+-------------+-------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | SIMPLE | this_ | index | PRIMARY | PRIMARY | 8 | NULL | 4711235 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+---------+--------------------------+
1 row in set (0.00 sec)
mysql> DESCRIBE db.table;
+--------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+-------+
| id | bigint(20) | NO | PRI | NULL | |
| id2 | varchar(28) | YES | | NULL | |
| photo | longblob | YES | | NULL | |
| source | varchar(10) | YES | | NULL | |
| file_name | varchar(120) | YES | | NULL | |
| file_type | char(1) | YES | | NULL | |
| created_date | datetime | YES | | NULL | |
| updated_date | datetime | YES | | NULL | |
| createdby | varchar(50) | YES | | NULL | |
| updatedby | varchar(50) | YES | | NULL | |
+--------------+--------------+------+-----+---------+-------+
10 rows in set (0.05 sec)
The explain query gives me the result right there. But the actual query has been running for quite a while. How can I fix this? What am I doing wrong?
I basically need to figure out how many photos there are in this table. The original coder had a query which checked WHERE photo IS NOT NULL (which took 3+ hours), but I changed it to check the id column, since that is the primary key. I expected a huge performance gain and an answer in under a second, but that does not seem to be the case.
What sort of optimizations on the database do I need to do? I think the query is fine but feel free to correct me if I am wrong.
Edit: mysql Ver 14.14 Distrib 5.1.52, for redhat-linux-gnu (x86_64) using readline 5.1
P.S: I renamed the tables for some crazy reason. I don't actually have the database named db and the table in question named table.
How long is 'long'? How many rows are there in this table?
A MyISAM table keeps track of how many rows it has, so a simple COUNT(*) with no WHERE clause will return almost instantly.
InnoDB, on the other hand, works differently: an InnoDB table doesn't keep track of how many rows it has, so when you COUNT(*), it literally has to go and count each row. If you have a large table, this can take a number of seconds or longer.
EDIT: Try COUNT(ID) instead of COUNT(*), where ID is an indexed column that has no NULLs in it. That may run faster.
EDIT2: If you're storing the binary data of the files in the longblob, your table will be massive, which will slow things down.
Possible solutions:
Use MyISAM instead of InnoDB.
Maintain your own count, perhaps using triggers on inserts and deletes.
Strip out the binary data into another table, or preferably regular files.
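The "maintain your own count" option can be sketched with a one-row counter that triggers keep up to date. This is a minimal illustration in Python with an in-memory SQLite database standing in for MySQL. The table names photos and row_counts and the trigger names are hypothetical. MySQL trigger syntax differs slightly (it requires FOR EACH ROW), so treat this as the shape of the idea rather than something to paste in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (id INTEGER PRIMARY KEY, photo BLOB);
CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
INSERT INTO row_counts VALUES ('photos', 0);

-- Keep the stored count in step with every insert and delete.
CREATE TRIGGER photos_ai AFTER INSERT ON photos
BEGIN
    UPDATE row_counts SET n = n + 1 WHERE table_name = 'photos';
END;

CREATE TRIGGER photos_ad AFTER DELETE ON photos
BEGIN
    UPDATE row_counts SET n = n - 1 WHERE table_name = 'photos';
END;
""")

conn.executemany("INSERT INTO photos (photo) VALUES (?)", [(b"x",)] * 5)
conn.execute("DELETE FROM photos WHERE id = 2")
conn.commit()

# Reading the counter is a single-row lookup, however large photos gets.
cached = conn.execute(
    "SELECT n FROM row_counts WHERE table_name = 'photos'").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM photos").fetchone()[0]
```

The trade-off is that every insert and delete now also updates the counter row, which can become a point of contention under heavy concurrent writes.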
I think that my question can be answered just by knowing how, for example, Stack Overflow works.
For example, this page loads in a few ms (< 300ms):
https://stackoverflow.com/questions?page=61440&sort=newest
The only query I can think of for that page is something like SELECT * FROM stuff ORDER BY date DESC LIMIT {pageNumber}*{stuffPerPage}, {stuffPerPage}
A query like that might take several seconds to run, but the Stack Overflow page loads in almost no time. It can't be a cached query, since questions are posted all the time, and rebuilding the cache every time a question is posted would be simply madness.
So, how does this work in your opinion?
(To make the question easier, let's forget about the ORDER BY.)
Example (the table is fully cached in RAM and stored on an SSD drive):
mysql> select * from thread limit 1000000, 1;
1 row in set (1.61 sec)
mysql> select * from thread limit 10000000, 1;
1 row in set (16.75 sec)
mysql> describe select * from thread limit 1000000, 1;
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| 1 | SIMPLE | thread | ALL | NULL | NULL | NULL | NULL | 64801163 | |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
mysql> select * from thread ORDER BY thread_date DESC limit 1000000, 1;
1 row in set (1 min 37.56 sec)
mysql> SHOW INDEXES FROM thread;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| thread | 0 | PRIMARY | 1 | newsgroup_id | A | 102924 | NULL | NULL | | BTREE | | |
| thread | 0 | PRIMARY | 2 | thread_id | A | 47036298 | NULL | NULL | | BTREE | | |
| thread | 0 | PRIMARY | 3 | postcount | A | 47036298 | NULL | NULL | | BTREE | | |
| thread | 0 | PRIMARY | 4 | thread_date | A | 47036298 | NULL | NULL | | BTREE | | |
| thread | 1 | date | 1 | thread_date | A | 47036298 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
Create a BTREE index on the date column and the query will run in a breeze.
CREATE INDEX date ON stuff(date) USING BTREE
UPDATE: Here is a test I just did:
CREATE TABLE test( d DATE, i INT, INDEX(d) );
Filled the table with 2,000,000 rows with distinct values of i and d.
mysql> SELECT * FROM test LIMIT 1000000, 1;
+------------+---------+
| d | i |
+------------+---------+
| 1897-07-22 | 1000000 |
+------------+---------+
1 row in set (0.66 sec)
mysql> SELECT * FROM test ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d | i |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (1.68 sec)
And here is an interesting observation:
mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 1000, 1;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | test | index | NULL | d | 4 | NULL | 1001 | |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 10000, 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | test | ALL | NULL | NULL | NULL | NULL | 2000343 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
MySQL does use the index for OFFSET 1000 but not for 10000.
Even more interesting, if I use FORCE INDEX the query takes more time:
mysql> SELECT * FROM test FORCE INDEX(d) ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d | i |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (2.21 sec)
I think StackOverflow doesn't need to reach the rows at offset 10000000. The query below should be fast enough if you have an index on date and the numbers in the LIMIT clause are real-world values, not millions :)
SELECT *
FROM stuff
ORDER BY date DESC
LIMIT {pageNumber}*{stuffPerPage}, {stuffPerPage}
UPDATE:
If records in a table are relatively rarely deleted (like in StackOverflow) then you can use the following solution:
SELECT *
FROM stuff
WHERE id between
{stuffCount}-{pageNumber}*{stuffPerPage}+1 AND
{stuffCount}-{pageNumber-1}*{stuffPerPage}
ORDER BY id DESC
Where {stuffCount} is:
SELECT MAX(id) FROM stuff
If some records have been deleted from the table, then some pages will have fewer than {stuffPerPage} records, but that should not be a problem. StackOverflow uses an inexact algorithm too. For instance, go to the first page and to the last page and you'll see that both return 30 records per page. Mathematically that only works if the total is an exact multiple of 30.
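The id-range query above can be tried out with a small sketch. It is written in Python with an in-memory SQLite database as a stand-in for MySQL. The function name fetch_page and the 100 test rows are assumptions for illustration. It applies the same formula as the answer and shows how a deleted record makes one page shorter than the others.

```python
import sqlite3

def fetch_page(conn, page_number, per_page):
    """Return the ids on one page (1-based, newest first) via an id range, not OFFSET."""
    stuff_count = conn.execute("SELECT MAX(id) FROM stuff").fetchone()[0]
    if stuff_count is None:      # empty table
        return []
    high = stuff_count - (page_number - 1) * per_page
    low = stuff_count - page_number * per_page + 1
    return [row[0] for row in conn.execute(
        "SELECT id FROM stuff WHERE id BETWEEN ? AND ? ORDER BY id DESC",
        (low, high))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO stuff (id, title) VALUES (?, ?)",
    [(i, f"post {i}") for i in range(1, 101)])
conn.execute("DELETE FROM stuff WHERE id = 95")   # a deleted record leaves a gap
conn.commit()

page_1 = fetch_page(conn, 1, 10)   # ids 100..91, minus the deleted 95
page_2 = fetch_page(conn, 2, 10)   # ids 90..81
```

Because the range is resolved through the primary key, the cost of a page does not grow with the page number, unlike LIMIT with a large offset.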
Solutions designed to work with large databases often use hacks which are usually unnoticeable to regular users.
Nowadays paging through millions of records is out of fashion, because it's impractical. Currently it's popular to use infinite scrolling (automatic, or manual with a button click). It makes more sense, and pages load faster because they don't need to be reloaded. If you think that old records can be useful for your users too, then it's a good idea to create a page with random records (with infinite scrolling too). This is just my opinion :)
I have a question about how to analyze a query to know whether its performance is good or bad.
I searched a lot and found something like the following:
SELECT count(*) FROM users; => Many experts said it's bad.
SELECT count(id) FROM users; => Many experts said it's good.
Please see the table:
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| userId | int(11) | NO | PRI | NULL | auto_increment |
| f_name | varchar(50) | YES | | NULL | |
| l_name | varchar(50) | YES | | NULL | |
| user_name | varchar(50) | NO | | NULL | |
| password | varchar(50) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
| active | char(1) | NO | | Y | |
| groupId | smallint(4) | YES | MUL | NULL | |
| created_date | datetime | YES | | NULL | |
| modified_date | datetime | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
But when I try the EXPLAIN command on them, I get these results:
EXPLAIN SELECT count(*) FROM `user`;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT count(userId) FROM user;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
So, my first question:
Can I conclude that the performance is the same?
P/S: MySQL version is 5.5.8.
No, you cannot. EXPLAIN doesn't reflect all the work done by MySQL; it only gives you the plan for how the query will be executed.
As for count(*) vs count(id) specifically: the first is never slower than the second, and in some cases it is faster.
The semantics of count(col) is the number of non-NULL values, while count(*) is the number of rows.
MySQL can probably optimize count(col) by rewriting it into count(*) when, as here, id is a PK and thus cannot be NULL (if the column can be NULL, it has to check for NULLs, which is not fast), but I still suggest that you use COUNT(*) in such cases.
Also, the internal processing depends on the storage engine used. For MyISAM, the precalculated number of rows is returned in both cases (as long as you don't use WHERE).
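The difference in semantics described above is easy to see on a toy table. This is a minimal sketch in Python with an in-memory SQLite database as a stand-in for MySQL. The four rows are made up, with groupId left NULL in two of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (userId INTEGER PRIMARY KEY, groupId INTEGER)")
conn.executemany(
    "INSERT INTO user (groupId) VALUES (?)",
    [(1,), (None,), (2,), (None,)])

# COUNT(*) counts rows; COUNT(col) counts only the non-NULL values of col.
count_star, count_pk, count_nullable = conn.execute(
    "SELECT COUNT(*), COUNT(userId), COUNT(groupId) FROM user").fetchone()
```

COUNT(*) and COUNT(userId) both give 4, because a primary key is never NULL, while COUNT(groupId) gives 2. So the two forms are only interchangeable on a NOT NULL column.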
In the example you give the performance is identical.
The execution plan shows that the optimiser is clever enough to use an index (here groupId, not the primary key) to find the total number of records, whether you use count(*) or count(userId).
There is no significant difference when it comes to counting. The reason is that most optimizers will figure out the best way to count rows by themselves.
The performance difference shows up when searching for values, depending on whether the column is indexed. If you search on a field that has no index {f_name, l_name} and on a field that has one {userId (MySQL automatically indexes primary keys), groupId (which looks like a foreign key)}, then you will see the difference in performance.