Why is MySQL SELECT COUNT(1) taking so long?

When I first started using MySQL, a select count(*) or select count(1) was almost instantaneous. But I'm now using version 5.6.25 hosted at Dreamhost, and it sometimes takes 20-30 seconds to do a select count(1). The second time it's fast (as if the index were cached), but not super fast, as it would be if the data came from just the metadata.
Does anybody understand what's going on, and why it has changed?
mysql> select count(1) from times;
+----------+
| count(1) |
+----------+
| 1511553 |
+----------+
1 row in set (22.04 sec)
mysql> select count(1) from times;
+----------+
| count(1) |
+----------+
| 1512007 |
+----------+
1 row in set (0.54 sec)
mysql> select version();
+------------+
| version() |
+------------+
| 5.6.25-log |
+------------+
1 row in set (0.00 sec)
mysql>

I guess that when you first started, you used MyISAM, and now you are using InnoDB. InnoDB just doesn't store this information. See the documentation, Limits on InnoDB Tables:
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool. To get a fast count, you have to use a counter table you create yourself and let your application update it according to the inserts and deletes it does. If an approximate row count is sufficient, SHOW TABLE STATUS can be used. See Section 9.5, “Optimizing for InnoDB Tables”.
So when your index is entirely in the buffer pool after the (slower) first query, the second query is fast again.
MyISAM doesn't need to care about problems that concurrent transactions might create, because it doesn't support transactions, and so select count(*) from t will just look up and return a stored value very fast.
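The counter-table approach from the quoted documentation can be sketched as follows. SQLite stands in for MySQL so the example runs without a server, and the counter is kept up to date by triggers instead of by application code, which is a variation on what the manual describes. The table, trigger and function names are hypothetical. One design caveat: in InnoDB every insert or delete then has to lock the same counter row, so heavily concurrent writers serialise on it.

```python
import sqlite3

# SQLite stand-in for MySQL; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE times (id INTEGER PRIMARY KEY, t TEXT);
-- One row per counted table, holding its running row count.
CREATE TABLE row_counter (table_name TEXT PRIMARY KEY, n INTEGER NOT NULL);
INSERT INTO row_counter VALUES ('times', 0);
-- Keep the counter in step with every inserted and deleted row.
CREATE TRIGGER times_ins AFTER INSERT ON times
BEGIN
    UPDATE row_counter SET n = n + 1 WHERE table_name = 'times';
END;
CREATE TRIGGER times_del AFTER DELETE ON times
BEGIN
    UPDATE row_counter SET n = n - 1 WHERE table_name = 'times';
END;
""")

conn.executemany("INSERT INTO times (t) VALUES (?)", [("x",)] * 1000)
conn.execute("DELETE FROM times WHERE id <= 10")
conn.commit()


def fast_count(conn, table):
    """Read the maintained counter instead of scanning an index."""
    return conn.execute(
        "SELECT n FROM row_counter WHERE table_name = ?", (table,)
    ).fetchone()[0]


print(fast_count(conn, "times"))                                 # 990
print(conn.execute("SELECT COUNT(*) FROM times").fetchone()[0])  # 990
```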

Related

MySQL many-to-many relation slow on big table

I have 2 tables which are connected with a relationship table.
More details about the tables:
stores (currently 140,000 rows)
  id (index)
  store_name
  city_id (index)
  ...
categories (currently 400 rows)
  id (index)
  cat_name
store_cat_relation
  store_id
  cat_id
Every store belongs in one or more categories.
In the store_cat_relation table, I have indexes on (store_id, cat_id) and (cat_id, store_id).
I need to find the total number of, say, supermarkets (cat_id = 1) in Paris (city_id = 1). I have a working query, but it takes too long when the database contains lots of stores in Paris or lots of supermarkets.
This is my query:
SELECT COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id
This query takes about 0.05 s. The database contains about 8000 supermarkets (stores with category 1) and about 8000 stores in Paris (city_id = 1). Combined, there are currently 550 supermarkets in Paris.
I want to reduce the query time to below 0.01 s because the database is only getting bigger.
The result of EXPLAIN is this:
id: 1
select_type: SIMPLE
table: store_cat_relation
type: ref
possible_keys: cat_id_store_id, store_id_cat_id
key: cat_id_store_id
key_len: 4
ref: const
rows: 8043
Extra: Using index
***************************************
id: 1
select_type: SIMPLE
table: stores
type: eq_ref
possible_keys: PRIMARY, city_id
key: PRIMARY
key_len: 4
ref: store_cat_relation.store_id
rows: 1
Extra: Using index condition; Using where
Does anyone have an idea why this query takes so long?
EDIT: I also created a SQL fiddle with 300 rows per table. With a low number of rows it's quite fast, but I need it to be fast with 100,000+ rows.
http://sqlfiddle.com/#!9/675a3/1
I have made some tests, and the best performance comes from using the query cache. You can enable it in DEMAND mode, so that you decide which queries are inserted into the cache (with SQL_CACHE). If you want to use it, you must also make the changes in /etc/my.cnf to make them persistent. After you change the tables, you can run some queries to warm up the cache again.
Here is a sample.
Table size
MariaDB [yourSchema]> select count(*) from stores;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (1 min 23.50 sec)
MariaDB [yourSchema]> select count(*) from store_cat_relation;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (2.45 sec)
MariaDB [yourSchema]>
Verify that the query cache is available
MariaDB [yourSchema]> SHOW VARIABLES LIKE 'have_query_cache';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| have_query_cache | YES |
+------------------+-------+
1 row in set (0.01 sec)
Set the cache size and switch to DEMAND mode
MariaDB [yourSchema]> SET GLOBAL query_cache_size = 1000000;
Query OK, 0 rows affected, 1 warning (0.00 sec)
MariaDB [yourSchema]> SET GLOBAL query_cache_type=DEMAND;
Query OK, 0 rows affected (0.00 sec)
Enable Profiling
MariaDB [yourSchema]> set profiling=on;
Run your query for the first time (takes 0.68 sec)
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.68 sec)
Now get it from the cache
MariaDB [yourSchema]> SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id;
+----------+
| COUNT(*) |
+----------+
| 192 |
+----------+
1 row in set (0.00 sec)
See the profile (Duration is in seconds, so each step takes only microseconds)
MariaDB [yourSchema]> show profile;
+--------------------------------+----------+
| Status | Duration |
+--------------------------------+----------+
| starting | 0.000039 |
| Waiting for query cache lock | 0.000008 |
| init | 0.000005 |
| checking query cache for query | 0.000056 |
| checking privileges on cached | 0.000026 |
| checking permissions | 0.000014 |
| checking permissions | 0.000025 |
| sending cached result to clien | 0.000027 |
| updating status | 0.000048 |
| cleaning up | 0.000025 |
+--------------------------------+----------+
10 rows in set (0.05 sec)
MariaDB [yourSchema]>
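Why the second run is nearly free, and why the gain can vanish on a busy table, can be shown with a toy model. MySQL's query cache serves a stored result only for byte-identical query text, and any change to a table invalidates every cached result that read from it; the cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0 (MariaDB still ships it). The class below is a hypothetical illustration of those two rules, not MySQL's implementation.

```python
class QueryCache:
    """Toy query-result cache (hypothetical, not MySQL's code).

    Results are keyed on the exact SQL text, and a write to a table
    drops every cached result that read from that table.
    """

    def __init__(self):
        self._results = {}   # SQL text -> cached result
        self._by_table = {}  # table name -> SQL texts whose results used it

    def get(self, sql, tables, run):
        """Return (result, from_cache); `run` executes the query on a miss."""
        if sql in self._results:
            return self._results[sql], True
        result = run()
        self._results[sql] = result
        for table in tables:
            self._by_table.setdefault(table, set()).add(sql)
        return result, False

    def invalidate(self, table):
        """Model an INSERT/UPDATE/DELETE against `table`."""
        for sql in self._by_table.pop(table, set()):
            self._results.pop(sql, None)


SQL = ("SELECT SQL_CACHE COUNT(*) FROM stores s, store_cat_relation r "
       "WHERE s.city_id = '1' AND r.cat_id = '1' AND s.id = r.store_id")
TABLES = ["stores", "store_cat_relation"]
executions = []


def run_query():
    executions.append(SQL)  # stands in for the slow 0.68 s execution
    return 192


cache = QueryCache()
print(cache.get(SQL, TABLES, run_query))  # (192, False): executed
print(cache.get(SQL, TABLES, run_query))  # (192, True): served from cache
cache.invalidate("stores")                # e.g. after an INSERT INTO stores
print(cache.get(SQL, TABLES, run_query))  # (192, False): executed again
```

If stores or store_cat_relation are written to often, most lookups end up on the slow path, so the cache mainly helps read-mostly tables.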
What you are looking at is an indexing problem:
Using its optimizer, a DBMS tries to find the optimal path to the data. Depending on the data itself and on the conditions supplied (WHERE/JOIN/GROUP BY, sometimes ORDER BY), this can lead to different access paths. The data distribution can be the key to fast queries or very slow ones.
So at this moment you have 2 tables, stores and store_cat_relation. On stores you have 2 indexes:
id (primary)
city_id
You have a WHERE on city_id and a join on id. The internal execution in the DBMS engine is then as follows:
1) Read index city_id
2) Then read table (ok, primary key index) to find id
3) Join on ID
This can be optimized a bit more with a multi-column index:
CREATE INDEX idx_nn_1 ON stores(city_id,id);
This should result in:
1) Read index idx_nn_1
2) Join using this index idx_nn_1
You do have fairly lopsided data in your current example, with city_id=1 for every row. If the real data is distributed like this, it can give you problems, since WHERE city_id=1 is then similar to saying "just select everything from table stores". The histogram information on that column can result in a different plan in those kinds of cases; however, if your data distribution is not so lopsided, it should work nicely.
On your second table store_cat_relation you might try an index like this:
CREATE INDEX idx_nn_2 ON store_cat_relation(store_id,cat_id);
Then check whether the DBMS decides that this leads to a better data access path.
With every join you see, study the join and check whether a multi-column index can reduce the number of reads.
Do not index all your columns: too many columns in an index will lead to slower inserts and updates.
Also, some scenarios might require you to create indexes with the columns in a different order, leading to many indexes on a table (one on columns (1,2,3), the next on (1,3,2), etc.). That is not a happy scenario either; there, a single-column index, or an index on fewer columns plus reading the table for columns 2 and 3, might be preferable.
Indexing requires testing your most common scenarios, which can be a lot of fun, since you will see how a slow query running for seconds can suddenly run in hundredths of a second or even faster.
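A way to try the suggested composite indexes on sample data is sketched below. It uses SQLite as a stand-in, so the plan it prints is SQLite's, not MySQL's; on the real server, run EXPLAIN on the query instead. The data is hypothetical, the query is the question's count rewritten with explicit JOIN syntax, and idx_cat_store mirrors the (cat_id, store_id) index the question already has.

```python
import sqlite3

# SQLite stand-in; schema trimmed to the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (id INTEGER PRIMARY KEY, store_name TEXT, city_id INTEGER);
CREATE TABLE store_cat_relation (store_id INTEGER, cat_id INTEGER);
-- Composite indexes as suggested in the answer.
CREATE INDEX idx_nn_1 ON stores (city_id, id);
CREATE INDEX idx_nn_2 ON store_cat_relation (store_id, cat_id);
-- The (cat_id, store_id) index the question already has.
CREATE INDEX idx_cat_store ON store_cat_relation (cat_id, store_id);
""")

# Hypothetical data: 300 stores in 3 cities, each in one of categories 1-4,
# plus category 1 for some stores that are not already in it.
stores = [(i, f"store{i}", i % 3 + 1) for i in range(1, 301)]
rels = [(i, i % 4 + 1) for i in range(1, 301)]
rels += [(i, 1) for i in range(1, 301) if i % 5 == 0 and i % 4 != 0]
conn.executemany("INSERT INTO stores VALUES (?, ?, ?)", stores)
conn.executemany("INSERT INTO store_cat_relation VALUES (?, ?)", rels)

QUERY = """
SELECT COUNT(*)
FROM stores s
JOIN store_cat_relation r ON r.store_id = s.id
WHERE s.city_id = ? AND r.cat_id = ?
"""


def count_in_city_category(conn, city_id, cat_id):
    """Number of stores in `city_id` that belong to category `cat_id`."""
    return conn.execute(QUERY, (city_id, cat_id)).fetchone()[0]


# Print the access path SQLite chose (the last column holds the description).
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1, 1)):
    print(row[-1])

print(count_in_city_category(conn, 1, 1))
```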

MySQL InnoDB "SELECT FOR UPDATE" - SKIP LOCKED equivalent

Is there any way to skip "locked rows" when we make "SELECT FOR UPDATE" in MySQL with an InnoDB table?
E.g.: terminal t1
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> select id from mytable ORDER BY id ASC limit 5 for update;
+-------+
| id |
+-------+
| 1 |
| 15 |
| 30217 |
| 30218 |
| 30643 |
+-------+
5 rows in set (0.00 sec)
mysql>
At the same time, terminal t2:
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
mysql> select id from mytable where id>30643 order by id asc limit 2 for update;
+-------+
| id |
+-------+
| 30939 |
| 31211 |
+-------+
2 rows in set (0.01 sec)
mysql> select id from mytable order by id asc limit 5 for update;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
mysql>
So if I launch a query forcing it to select other rows, it's fine.
But is there a way to skip the locked rows?
I guess this must be a recurring problem in concurrent processing, but I did not find any solution.
EDIT:
In reality, my different concurrent processes are doing something apparently really simple:
take the first rows (which don't contain a specific flag - e.g.: "WHERE myflag_inUse!=1").
Once I get the result of my "select for update", I update the flag and commit the rows.
So I just want to select the rows which are not already locked and where myflag_inUse!=1...
The following link helps me to understand why I get the timeout, but not how to avoid it:
MySQL 'select for update' behaviour
mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+-------------------------+
| Variable_name | Value |
+-------------------------+-------------------------+
| innodb_version | 5.5.46 |
| protocol_version | 10 |
| slave_type_conversions | |
| version | 5.5.46-0ubuntu0.14.04.2 |
| version_comment | (Ubuntu) |
| version_compile_machine | x86_64 |
| version_compile_os | debian-linux-gnu |
+-------------------------+-------------------------+
7 rows in set (0.00 sec)
MySQL 8.0 introduced support for both SKIP LOCKED and NOWAIT.
SKIP LOCKED is useful for implementing a job queue (a.k.a. batch queue), so that you can skip over rows that are already locked by a concurrent transaction.
NOWAIT is useful for avoiding waiting until a concurrent transaction releases the locks that we are also interested in acquiring.
Without NOWAIT, we either have to wait until the locks are released (at commit or rollback by the transaction that currently holds them) or until the lock acquisition times out. NOWAIT acts as a lock wait timeout with a value of 0.
See the MySQL documentation for more details about SKIP LOCKED and NOWAIT.
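The difference between the two modifiers can be modeled with ordinary locks. In the sketch below, the class and its methods are hypothetical, and each row lock is a `threading.Lock` taken with `acquire(blocking=False)`: SKIP LOCKED returns the first free rows and silently passes over locked ones, while NOWAIT fails at once if any requested row is held. It reuses the ids from the question's t1/t2 sessions and is not a model of InnoDB internals.

```python
import threading


class RowLocks:
    """Toy per-row lock table (hypothetical) illustrating lock-wait modes."""

    def __init__(self, ids):
        self._locks = {i: threading.Lock() for i in ids}

    def lock_skip_locked(self, limit):
        """Like ... ORDER BY id LIMIT n FOR UPDATE SKIP LOCKED."""
        taken = []
        for row_id in sorted(self._locks):
            if len(taken) == limit:
                break
            if self._locks[row_id].acquire(blocking=False):  # never waits
                taken.append(row_id)
        return taken

    def lock_nowait(self, row_ids):
        """Like ... FOR UPDATE NOWAIT: error instead of waiting."""
        taken = []
        for row_id in sorted(row_ids):
            if not self._locks[row_id].acquire(blocking=False):
                self.release(taken)  # the toy version undoes partial work
                raise RuntimeError(f"row {row_id} is locked")
            taken.append(row_id)
        return taken

    def release(self, row_ids):
        """Model COMMIT/ROLLBACK releasing the rows."""
        for row_id in row_ids:
            self._locks[row_id].release()


locks = RowLocks([1, 15, 30217, 30218, 30643, 30939, 31211])
t1 = locks.lock_skip_locked(5)  # first session
t2 = locks.lock_skip_locked(5)  # second session skips t1's rows
print(t1)  # [1, 15, 30217, 30218, 30643]
print(t2)  # [30939, 31211]
try:
    locks.lock_nowait([1, 30939])
except RuntimeError as err:
    print(err)  # row 1 is locked
```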
This appears to now exist in MySQL starting in 8.0.1:
https://mysqlserverteam.com/mysql-8-0-1-using-skip-locked-and-nowait-to-handle-hot-rows/
Starting with MySQL 8.0.1 we are introducing the SKIP LOCKED modifier which can be used to non-deterministically read rows from a table while skipping over the rows which are locked. This can be used by our booking system to skip orders which are pending.
However, 8.0.1 was a development milestone release, so at the time of writing it was not production ready.
Unfortunately, it seems that there is no way to skip the locked row in a select for update so far.
It would be great if we could use something like the Oracle 'FOR UPDATE SKIP LOCKED'.
In my case, the queries launched in parallel are exactly the same, and contain a WHERE clause and a GROUP BY over several million rows. Because the queries need between 20 and 40 seconds to run, that was (as I already knew) a big part of the problem.
The only solution I saw (temporary, and not the best) was to move some (i.e. millions of) rows that I would not use directly, in order to reduce the time the query takes.
So I will still have the same behavior but I will wait less time...
I was expecting a way to not select the locked row in the select.
I don't mark this as an answer, so if a new clause from mysql is added (or discovered), I can accept it later...
I'm sorry, but I think you are approaching the problem from the wrong angle. If your user wants to list records from a table that satisfy certain selection criteria, then your query should either return them all, or return an error message and no result set whatsoever. But the query should not return only a subset of the results, leading the user to believe that he has all the matching records.
The issue should be addressed by making sure that your application locks as few rows as possible, for as little time as possible.
Walk through the table in chunks of the PRIMARY KEY, using some suitable LIMIT so you are not looking at "too many" rows at once.
By using the PK, you are ordering things in a predictable way; this virtually eliminates deadlocks.
By using LIMIT, you will keep from hogging too much at once. The LIMIT should be embodied as a range over the PK. This makes it quite clear if two threads are about to step on each other.
More details are (indirectly) in my blog on big deletes.
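Walking the table in PK chunks can be sketched like this. SQLite stands in for MySQL and has no FOR UPDATE, so the comment marks where the locking read would go; the generator name, chunk size and sample ids are assumptions for illustration. Each chunk is processed as one PK range and committed right away, so locks are held briefly and always taken in PK order.

```python
import sqlite3


def walk_in_chunks(conn, chunk_size):
    """Yield successive lists of primary keys, walking the table in PK order.

    Each chunk covers the PK range (previous last id, this chunk's last id].
    """
    last_id = 0  # assumes positive integer primary keys
    while True:
        # In MySQL this would run in a short transaction per chunk,
        # with FOR UPDATE appended to lock just this range.
        rows = conn.execute(
            "SELECT id FROM mytable WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        ids = [row[0] for row in rows]
        yield ids
        last_id = ids[-1]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, myflag_inUse INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO mytable (id) VALUES (?)",
    [(i,) for i in (1, 15, 30217, 30218, 30643, 30939, 31211)],
)

for chunk in walk_in_chunks(conn, 3):
    # Act on the whole PK range at once, then commit to release locks quickly.
    conn.execute(
        "UPDATE mytable SET myflag_inUse = 1 WHERE id BETWEEN ? AND ?",
        (chunk[0], chunk[-1]),
    )
    conn.commit()
    print(chunk)
```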

How can I shrink the size of conn_log table

In my MySQL instance, the conn_log table contains billions of connection records. As a result, it consumes too much storage. Is there a way to cut down its size or disable connection logging?
mysql> select count(*) from conn_log;
+------------+
| count(*) |
+------------+
| 4215139229 |
+------------+
1 row in set (0.00 sec)
I figured it out myself. The reason the test.conn_log table had billions of records is that I had set the init_connect variable as follows.
insert into test.conn_log values(connection_id(),now(),#user,#cur_user,'');
So whenever a connection is established, a record is inserted into the test.conn_log table.

MySQL: if I insert multiple values into a column of a table simultaneously, is it possible that the insertion order of the values changes?

I am doing this:
insert into table_name(maxdate) values
((select max(date1) from table1)), -- goes in row1
((select max(date2) from table2)), -- goes in row2
.
.
.
((select max(date500) from table500)); -- goes in row500
Is it possible that during insertion the order changes? E.g. when I run
select maxdate from table_name limit 500;
will I get this:
date1 date2 . . date253 date191 ...date500
Short answer:
No, not possible.
If you want to double-check:
mysql> create table letest (f1 varchar(50), f2 varchar(50));
Query OK, 0 rows affected (0.00 sec)
mysql> insert into letest (f1,f2) values
( (SELECT SLEEP(5)), 'first'),
( (SELECT SLEEP(1)), 'second');
Query OK, 2 rows affected, 1 warning (6.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> select * from letest;
+------+--------+
| f1 | f2 |
+------+--------+
| 0 | first |
| 0 | second |
+------+--------+
2 rows in set (0.00 sec)
mysql>
The row with SLEEP(5) is inserted first, after 5 seconds; the row with SLEEP(1) is inserted second, after 5+1 seconds. That is why the query takes 6 seconds.
The warning that you see is:
mysql> show warnings;
+-------+------+-------------------------------------------------------+
| Level | Code | Message |
+-------+------+-------------------------------------------------------+
| Note | 1592 | Statement may not be safe to log in statement format. |
+-------+------+-------------------------------------------------------+
1 row in set (0.00 sec)
This can affect you only if you are using a master-slave (replication) setup with statement-based binary logging, because the statement is not safe to write to the binlog in statement format. For more info on this, see http://dev.mysql.com/doc/refman/5.1/en/replication-rbr-safe-unsafe.html
Later edit: please leave a comment if you find this answer not useful.
Yes, very possible.
You should consider a database table unordered, and a SELECT statement without an ORDER BY clause as well. Every DBMS can choose how to implement tables (often even depending on the storage engine) and how to return the rows. Sure, many DBMSs happen to return your data in the order you inserted it, but never rely on that.
The order of the retrieved data may depend on the execution plan, and may even differ when running the same query multiple times, especially when only retrieving part of the data (TOP/LIMIT).
If you want to impose an order, add a field which orders your data. Yes, an autoincrement primary key will be enough in many cases. If you think you'll be wanting to change the order someday, add another field.
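Adding an ordering field looks like this in a small sketch. SQLite stands in for MySQL (where the key would be declared with AUTO_INCREMENT instead of AUTOINCREMENT), and the three dates are hypothetical. The point is that ORDER BY id makes the insertion order an explicit part of the query rather than an accident of the storage engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite stand-in: in MySQL, use an AUTO_INCREMENT primary key instead.
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY AUTOINCREMENT, maxdate TEXT)"
)
# Hypothetical values, inserted with one multi-row INSERT as in the question.
conn.execute(
    "INSERT INTO table_name (maxdate) VALUES (?), (?), (?)",
    ("2016-03-01", "2015-07-19", "2016-01-05"),
)
# Without ORDER BY the row order is unspecified; with it, it is guaranteed.
rows = [r[0] for r in conn.execute(
    "SELECT maxdate FROM table_name ORDER BY id LIMIT 500"
)]
print(rows)  # ['2016-03-01', '2015-07-19', '2016-01-05']
```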

MySQL Integer vs DateTime index

Let me start by saying I have looked at many similar questions asked, but all of them relate to Timestamp and DateTime field type without indexing. At least that is my understanding.
As we all know, there are certain advantages when it comes to DateTime. Putting them aside for a minute, and assuming the table's engine is InnoDB with 10+ million records, which query would perform faster when the criteria are based on:
DateTime with index
int with index
In other words, is it better to store date and time as DateTime or as a UNIX timestamp in an int? Keep in mind there is no need to use any built-in MySQL functions.
Update
Tested with MySQL 5.1.41 (64-bit) and 10 million records; initial testing showed a significant speed difference in favour of int. Two tables were used, tbl_dt with a DateTime column and tbl_int with an int column. A few results:
SELECT SQL_NO_CACHE COUNT(*) FROM `tbl_dt`;
+----------+
| COUNT(*) |
+----------+
| 10000000 |
+----------+
1 row in set (2 min 10.27 sec)
SELECT SQL_NO_CACHE COUNT(*) FROM `tbl_int`;
+----------+
| count(*) |
+----------+
| 10000000 |
+----------+
1 row in set (25.02 sec)
SELECT SQL_NO_CACHE COUNT(*) FROM `tbl_dt` WHERE `created` BETWEEN '2009-01-30' AND '2009-12-30';
+----------+
| COUNT(*) |
+----------+
| 835663 |
+----------+
1 row in set (8.41 sec)
SELECT SQL_NO_CACHE COUNT(*) FROM `tbl_int` WHERE `created` BETWEEN 1233270000 AND 1262127600;
+----------+
| COUNT(*) |
+----------+
| 835663 |
+----------+
1 row in set (1.56 sec)
I'll post another update with both fields in one table as suggested by shantanuo.
Update #2
Final results after numerous server crashes :) The int type is significantly faster; no matter what query was run, the speed difference was more or less the same as in the results above.
A "strange" thing I observed: execution time was more or less the same when both field types were stored in the same table. It seems MySQL is smart enough to figure out when the values are the same in both the DateTime and the int column. I haven't found any documentation on the subject, so this is just an observation.
I see that in the test mentioned in the above answer, the author basically proves that when the UNIX time is calculated in advance, INT wins.
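Calculating the UNIX time in advance means converting the DATETIME bounds on the client before building the int query. The sketch below reproduces the benchmark's bounds, assuming a UTC+1 time zone: that offset is inferred from the numbers (2009-01-30 00:00 UTC would be 1233273600, exactly 3600 s more than the value used) and is not stated in the post. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the benchmark ran in UTC+1 (e.g. CET); inferred, not stated.
TZ = timezone(timedelta(hours=1))


def to_unix(date_str, tz=TZ):
    """Convert a 'YYYY-MM-DD' DATETIME bound to a UNIX timestamp in `tz`."""
    return int(datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz).timestamp())


print(to_unix("2009-01-30"))  # 1233270000
print(to_unix("2009-12-30"))  # 1262127600
```

Both values match the BETWEEN bounds of the tbl_int query, which is why the two queries return the same 835663 rows. If the zone used for the conversion differs from the one the DateTime values were written in, the int ranges shift by the offset.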
My instinct would be to say that ints are always faster. However, this seems not to be the case:
http://gpshumano.blogs.dri.pt/2009/07/06/mysql-datetime-vs-timestamp-vs-int-performance-and-benchmarking-with-myisam/
Edited to add: I realize that you're using InnoDB, rather than MyISAM, but I haven't found anything to contradict this in the InnoDB case. Also, the same author did an InnoDB test
http://gpshumano.blogs.dri.pt/2009/07/06/mysql-datetime-vs-timestamp-vs-int-performance-and-benchmarking-with-innodb/
It depends on your application. As you can see in an awesome comparison and benchmark of the DATETIME, TIMESTAMP and INT types in MySQL Server, "MySQL Date Format: What Datatype Should You Use? We Compare Datetime, Timestamp and INT", in some situations INT performs better than the others, and in some cases DATETIME performs better. It completely depends on your application.