How can I optimize/measure UPSERT performance on MySQL?

How can I measure MySQL "UPSERT" performance? More specifically, how can I get information about the implied search before the insert/update/replace?
I'm using MySQL 8 with a schema that has three fields, two of which form the primary key. The table is currently InnoDB, but that is not a hard requirement.
CREATE TABLE IF NOT EXISTS `test`.`recent`
( `uid` int NOT NULL, `gid` int NOT NULL, `last` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`uid`,`gid`),
KEY `idx_last` (`last`) USING BTREE
) ENGINE=InnoDB;
+----------+----------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------+------+-----+-------------------+-------+
| uid | int(11) | NO | PRI | NULL | |
| gid | int(11) | NO | PRI | NULL | |
| last | datetime | NO | MUL | CURRENT_TIMESTAMP | |
+----------+----------+------+-----+-------------------+-------+
I plan to insert values using
INSERT INTO test.recent (uid,gid) VALUES (1, 1)
ON DUPLICATE KEY UPDATE last=NOW();
How do I go about figuring out the performance of this query, since EXPLAIN will not show the implied search, only the insert:
mysql> explain INSERT INTO test.recent (uid,gid) VALUES (1, 1) ON DUPLICATE KEY UPDATE last=NOW();
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | INSERT | recent | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set (0.00 sec)
mysql> explain INSERT INTO test.recent (uid,gid) VALUES (1, 1);
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | INSERT | recent | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set (0.00 sec)
which is different from the explain on an actual search:
mysql> explain select last from test.recent where uid=1 and gid=1;
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | recent | NULL | const | PRIMARY | PRIMARY | 8 | const,const | 1 | 100.00 | NULL |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
One of the variables I am trying to figure out, is if performance would change at all if I use a blind update instead:
mysql> explain REPLACE INTO test.recent VALUES (1, 1, NOW());
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
| 1 | REPLACE | recent | NULL | ALL | NULL | NULL | NULL | NULL | NULL | NULL | NULL |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set (0.01 sec)
But as you can see, the information I get is just as unhelpful as what I get for an "explain insert".
Another question I would like to answer based on measurements is whether things would change for better or worse (and by how much) if I tested both upsert approaches (ON DUPLICATE KEY UPDATE vs. REPLACE) with a DATE field instead of the DATETIME, which in theory would result in fewer writes (but still the same number of implied searches). But again, EXPLAIN is no help here.

Don't trust EXPLAIN with very few rows.
Your IODKU (INSERT ... ON DUPLICATE KEY UPDATE) is optimal. Here's how it will work:
1. Like a SELECT, drill down the PRIMARY KEY BTree to find the row with (1,1), or the gap where (1,1) should be. That is about as fast a lookup as can be had.
2a. If the row exists, UPDATE it.
2b. If the row does not exist, INSERT it (and set `last` to its DEFAULT of CURRENT_TIMESTAMP).
If you want, I can go into details of the steps. But we are talking sub-millisecond for each.
If I wanted to time the query, I would use a high-resolution timer. Very likely the timings would bounce around, depending on whether there is a breeze blowing, a butterfly flapping its wings, or the phase of the moon.
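One way to observe the implied search directly, rather than timing it, is to look at the session handler counters around the statement. This is only a sketch; which counters move, and by how much, can vary by MySQL version and storage engine:

```sql
FLUSH STATUS;  -- reset this session's status counters

INSERT INTO test.recent (uid, gid) VALUES (1, 1)
ON DUPLICATE KEY UPDATE last = NOW();

-- Handler_read_key reflects the index lookup (the "implied search");
-- Handler_update vs. Handler_write shows which branch was taken.
SHOW SESSION STATUS LIKE 'Handler%';
```

The same pattern works for the REPLACE variant, so the two approaches can be compared by row operations rather than by wall-clock time.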
Caveat: If you "simplified" your query for this question, I may not be giving you correct info for the real query.
If you are doing a thousand IODKUs, then there may be optimizations that involve combining them. There are some typical optimizations that can easily give 10x speedup.
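As a sketch of one such combining optimization: IODKU accepts multiple rows per statement, which amortizes the per-statement and per-commit overhead (the values below are made up):

```sql
INSERT INTO test.recent (uid, gid) VALUES
    (1, 1),
    (1, 2),
    (2, 7)          -- one statement, one round trip, one commit
ON DUPLICATE KEY UPDATE last = NOW();
```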

Related

MySQL : Indexes for View based on an aggregated query

I have a working, nicely indexed SQL query aggregating notes (sums of ints) for all my users and other stuff. This is "query A".
I want to use these aggregated notes in other queries, say "query B".
If I create a view based on "query A", will the indexes of the underlying tables be used when needed if I join it in "query B"?
Is this true for MySQL? For other flavors of SQL?
Thanks.
In MySQL, you cannot create an index on a view. MySQL uses indexes of the underlying tables when you query data against the views that use the merge algorithm. For the views that use the temptable algorithm, indexes are not utilized when you query data against the views.
https://www.percona.com/blog/2007/08/12/mysql-view-as-performance-troublemaker/
Here's a demo table. It has a userid attribute column and a note column.
mysql> create table t (id serial primary key, userid int not null, note int, key(userid,note));
If you do an aggregation to get the sum of note per userid, it does an index-scan on (userid, note).
mysql> explain select userid, sum(note) from t group by userid;
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+-------+-------+---------------+--------+---------+------+------+-------------+
1 row in set (0.00 sec)
If we create a view for the same query, then we can see that querying the view uses the same index on the underlying table. Views in MySQL are pretty much like macros — they just query the underlying table.
mysql> create view v as select userid, sum(note) from t group by userid;
Query OK, 0 rows affected (0.03 sec)
mysql> explain select * from v;
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+-------------+
2 rows in set (0.00 sec)
So far so good.
Now let's create a table to join with the view, and join to it.
mysql> create table u (userid int primary key, name text);
Query OK, 0 rows affected (0.09 sec)
mysql> explain select * from v join u using (userid);
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | NULL |
| 1 | PRIMARY | <derived2> | ref | <auto_key0> | <auto_key0> | 4 | test.u.userid | 2 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 1 | Using index |
+----+-------------+------------+-------+---------------+-------------+---------+---------------+------+-------------+
3 rows in set (0.01 sec)
I tried to use hints like straight_join to force it to read v then join to u.
mysql> explain select * from v straight_join u on (v.userid=u.userid);
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | ALL | PRIMARY | NULL | NULL | NULL | 1 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+-------+---------------+--------+---------+------+------+----------------------------------------------------+
"Using join buffer (Block Nested Loop)" is MySQL's terminology for "no index used for the join." It's just looping over the table the hard way -- by reading batches of rows from start to finish of the table.
I tried to use force index to tell MySQL that type=ALL is to be avoided.
mysql> explain select * from v straight_join u force index(PRIMARY) on (v.userid=u.userid);
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 7 | NULL |
| 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | v.userid | 1 | NULL |
| 2 | DERIVED | t | index | userid | userid | 9 | NULL | 7 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+----------+------+-------------+
Maybe this is using an index for the join? But it's odd that table u comes before table t in the EXPLAIN. I'm frankly not sure how to interpret what it's doing, given the order of rows in this EXPLAIN report; I would expect the joined table to come after the primary table of the query.
I only put a few rows of data into each table. You might get different EXPLAIN results with a larger, representative sample of test data. I'll leave that to you to try.

How can one break up an UPDATE into subqueries?

There exists a table:
CREATE TABLE person
(
id INT(10) PRIMARY KEY AUTO_INCREMENT,
nameFirst VARCHAR(255) DEFAULT '?',
nameSecond VARCHAR(255) DEFAULT '',
fatherNameFirst VARCHAR(255) DEFAULT NULL
);
Note: There are actually other columns there, 18 in total, but they are not being used here.
The goal is to fill in the father's first name using the second name of the child. It can be predicted from the second name (patronymic) in the Russian language, but not always correctly. So I plan to do what can be done automatically, and the rest will be done by hand later.
So the UPDATE done as follows:
UPDATE person AS child
LEFT JOIN
(SELECT DISTINCT nameFirst FROM person) AS parent
ON CONCAT(parent.nameFirst,'овна')=child.nameSecond OR CONCAT(parent.nameFirst,'ович')=child.nameSecond
SET child.fatherNameFirst=parent.nameFirst;
Eventually it will need to run on a table that has >2m entries; for now I have tried with sample data of 400k rows. The problem is that after about an hour of my computer running one of its cores at 100%, the query had not yet finished.
So I was thinking I could break it up into subqueries that run one after another, each taking 5-10 minutes. That way, if I need to do something, I can terminate the currently running one and not lose a day of CPU time.
I have attempted to add WHERE child.id<1000, but either it was still way too long or it had little impact (perhaps I misunderstand how MariaDB executes this update).
In case sample data will actually help somebody understand it better:
select id, nameFirst, nameSecond from person limit 10;
+----+--------------------+----------------------------+
| id | nameFirst | nameSecond |
+----+--------------------+----------------------------+
| 1 | Туликович | |
| 2 | Август | Михайлович |
| 3 | Август | Христианович |
| 4 | Александр | Александрович |
| 5 | Александр | Христьянович |
| 6 | Альберт | Викторович |
| 7 | Альбрехт | Александрович |
| 8 | Амалия | Андреевна |
| 9 | Амалия | Ивановна |
| 10 | Ангелина | Андреевна |
+----+--------------------+----------------------------+
fatherNameFirst is empty at this time.
You could break this down alphabetically by adding WHERE nameFirst LIKE 'A%' to the update query, and then running the query multiple times, once per letter.
Given this sample data:
CREATE TABLE person
(
id INT(10) PRIMARY KEY AUTO_INCREMENT,
nameFirst VARCHAR(255) DEFAULT '?',
nameSecond VARCHAR(255) DEFAULT '',
fatherNameFirst VARCHAR(255) DEFAULT NULL
) DEFAULT CHARSET=utf8;
INSERT INTO person
(`id`, `nameFirst`, `nameSecond`)
VALUES
(1, 'Туликович', NULL),
(2, 'Август', 'Михайлович'),
(3, 'Август', 'Христианович'),
(4, 'Александр', 'Александрович'),
(5, 'Александр', 'Христьянович'),
(6, 'Альберт', 'Викторович'),
(7, 'Альбрехт', 'Александрович'),
(8, 'Амалия', 'Андреевна'),
(9, 'Амалия', 'Ивановна'),
(10, 'Ангелина', 'Андреевна')
;
with your query you get this EXPLAIN output:
+----+-------------+------------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | PRIMARY | child | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (Block Nested Loop) |
| 2 | DERIVED | person | ALL | NULL | NULL | NULL | NULL | 10 | Using temporary |
+----+-------------+------------+------+---------------+------+---------+------+------+----------------------------------------------------+
That's probably the worst you can get.
Let's see if we can rewrite this. First, there's absolutely no need for this subquery and the DISTINCT.
mysql> explain UPDATE person AS child
-> LEFT JOIN person parent
-> ON CONCAT(parent.nameFirst,'овна')=child.nameSecond OR CONCAT(parent.nameFirst,'ович')=child.nameSecond
-> SET child.fatherNameFirst=parent.nameFirst;
+----+-------------+--------+------+---------------+------+---------+------+------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------------------------------------------+
| 1 | SIMPLE | child | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
| 1 | SIMPLE | parent | ALL | NULL | NULL | NULL | NULL | 10 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+--------+------+---------------+------+---------+------+------+----------------------------------------------------+
This eliminates the Using temporary. That's good.
With an index on nameFirst, we can speed this up further.
CREATE INDEX idx_person_nameFirst ON person(nameFirst);
Then explaining again:
+----+-------------+--------+-------+---------------+----------------------+---------+------+------+-----------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+----------------------+---------+------+------+-----------------------------------------------------------------+
| 1 | SIMPLE | child | ALL | NULL | NULL | NULL | NULL | 10 | NULL |
| 1 | SIMPLE | parent | index | NULL | idx_person_nameFirst | 768 | NULL | 10 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+--------+-------+---------------+----------------------+---------+------+------+-----------------------------------------------------------------+
Not yet perfect, but it's using the index. This should speed things up a lot.
From here on, it gets hard to optimize further. You can experiment a bit by adjusting the join buffer size, but I recommend you do this in a session only.
SET SESSION join_buffer_size = <whatever value>;
Every thread connecting to your server uses its own join buffer; that's why you should only test this in a session. When you have very many connections on your server, memory consumption could otherwise get out of hand.
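As for running the work in restartable chunks, a common pattern is to walk the primary key in ranges, so each statement touches a bounded slice of the table. This is only a sketch; the batch size of 10000 is arbitrary, and the range bounds would be advanced by whatever script drives the runs. Each chunk still pays the join cost for its rows, but you can stop and resume between chunks:

```sql
UPDATE person AS child
LEFT JOIN person AS parent
    ON CONCAT(parent.nameFirst, 'овна') = child.nameSecond
    OR CONCAT(parent.nameFirst, 'ович') = child.nameSecond
SET child.fatherNameFirst = parent.nameFirst
WHERE child.id BETWEEN 1 AND 10000;   -- next run: BETWEEN 10001 AND 20000, etc.
```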

Composite index not used in mysql

According to the MySQL docs, a composite index will still be used if the leftmost fields are part of the criteria. However, this table will not join correctly on the primary key; I had to add another index on the leftmost two fields, which is then used.
One of the tables is MEMORY, and I know that by default MEMORY uses a hash index, which can't be used for grouping/ordering. However, I'm using all rows of the MEMORY table rather than its index, so I don't think that relates to the problem.
What am I missing?
mysql> show create table pr_temp;
| pr_temp | CREATE TEMPORARY TABLE `pr_temp` (
`player_id` int(10) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`) USING BTREE,
KEY `insert_date` (`insert_date`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
mysql> show create table player_game_record;
| player_tank_record | CREATE TABLE `player_game_record` (
`player_id` int(10) unsigned NOT NULL,
`game_id` smallint(5) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`,`game_id`),
KEY `insert_date` (`insert_date`),
KEY `player_date` (`player_id`,`insert_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 DATA DIRECTORY='...' INDEX DIRECTORY='...' |
mysql> explain select pgr.* from player_game_record pgr inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY,insert_date,player_date | player_date | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 21 | |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
mysql> explain select pgr.* from player_game_record pgr force index (primary) inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY | PRIMARY | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 2873031 | |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
2 rows in set (0.00 sec)
I think the primary key should work, with the two left columns (player_id, insert_date) being used. However it will use the player_date index by default, and if I force it to use the primary index it looks like it's only using one field rather than both.
Update 2: MySQL version 5.5.27-log
Update 3:
(Note: this is after removing the player_date index while trying some other tests.)
mysql> show indexes in player_game_record;
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_game_record | 0 | PRIMARY | 1 | player_id | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 2 | insert_date | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 3 | game_id | A | 576276246 | NULL | NULL | | BTREE | | |
| player_game_record | 1 | insert_date | 1 | insert_date | A | 33304 | NULL | NULL | | BTREE | | |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (1.08 sec)
mysql> select count(*) from player_game_record;
+-----------+
| count(*) |
+-----------+
| 576276246 |
+-----------+
1 row in set (0.00 sec)
I agree that your use of the MEMORY storage engine for one of the tables should not at all be an issue here, since we're talking about the other table.
I also agree that the leftmost prefix of an index can be used exactly how you are trying to use it, and I cannot think of any reason why the primary key could not be used in exactly the same way as any other index.
This has been a head-scratcher. The new index you created "should" be the same as the left side of the primary key, so why don't they behave the same way? I have two thoughts, both of which lead me to the same recommendation, even though I am not as familiar with the internals of MyISAM as I am with InnoDB. (As an aside, I'd recommend InnoDB over MyISAM.)
The index on your primary key was presumably on the table when you began inserting data, while the new index was added while most or all of the data was already there. This suggests that your new index is nice and cleanly-organized internally, while your primary key index may be highly fragmented, having been built as the data was loaded.
The row count the optimizer shows is based on index statistics, which may be inaccurate on your primary key due to the insert order.
The fragmentation theory may explain why querying with the primary key as your index is not as fast; the index statistics theory may explain why the optimizer comes up with such a different row count, and why it might have been choosing a full table scan instead of using that index (which is only a guess, since we don't have that EXPLAIN available).
The thing I would suggest based on these two thoughts is running OPTIMIZE TABLE on your table. If it took 12 hours to build that new index, then optimizing the table may very possibly take that long or longer.
Possibly helpful: http://www.dbasquare.com/2012/07/09/data-fragmentation-problem-in-mysql-myisam/
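A minimal sketch of that suggestion, plus the much cheaper statistics-only refresh that may be worth trying first:

```sql
-- Rebuilds the table and its indexes (for MyISAM this defragments the
-- data file and re-sorts the indexes). Can take a very long time on a
-- table this size.
OPTIMIZE TABLE player_game_record;

-- Much cheaper: only refreshes the key distribution statistics that
-- the optimizer uses for its row estimates.
ANALYZE TABLE player_game_record;
```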

Is there anything wrong with nested Views in MySQL

Is there anything wrong with having one view reference another view? For example, say I have a users table:
CREATE TABLE `users` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(255) NOT NULL,
`last_name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
Then for the sake of argument a view that just shows all users
CREATE VIEW all_users AS SELECT * FROM users
And then a view that just returns their first_name and last_name
CREATE VIEW full_names AS SELECT first_name, last_name FROM all_users
Are there performance issues with basing one view off another? Let's also pretend this is the simplest of examples and a real-world scenario would be much more complex, but with the same general concept of basing one view on another.
It depends on the ALGORITHM used. TEMPTABLE can be pretty expensive, while MERGE should be the same as using the table directly, so no loss there.
And for your example it's the same:
mysql> explain select * from users;
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| 1 | SIMPLE | users | system | NULL | NULL | NULL | NULL | 0 | const row not found |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
1 row in set (0.01 sec)
mysql> explain select * from all_users;
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| 1 | SIMPLE | users | system | NULL | NULL | NULL | NULL | 0 | const row not found |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
1 row in set (0.00 sec)
mysql> explain select * from full_names ;
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
| 1 | SIMPLE | users | system | NULL | NULL | NULL | NULL | 0 | const row not found |
+----+-------------+-------+--------+---------------+------+---------+------+------+---------------------+
1 row in set (0.02 sec)
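If it matters for your case, you can request the algorithm explicitly when creating the view. This is a sketch; note that MySQL falls back (with a warning) when a definition is not mergeable, e.g. when it uses GROUP BY, DISTINCT, aggregates, or UNION:

```sql
CREATE ALGORITHM = MERGE VIEW all_users AS
    SELECT * FROM users;

-- Verify what was actually stored for the view:
SHOW CREATE VIEW all_users;
```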

analyze query mysql

I have a question about how to analyze a query to know whether its performance is good or bad.
I searched a lot and found claims like the ones below:
SELECT count(*) FROM users; => Many experts said it's bad.
SELECT count(id) FROM users; => Many experts said it's good.
Please see the table:
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| userId | int(11) | NO | PRI | NULL | auto_increment |
| f_name | varchar(50) | YES | | NULL | |
| l_name | varchar(50) | YES | | NULL | |
| user_name | varchar(50) | NO | | NULL | |
| password | varchar(50) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
| active | char(1) | NO | | Y | |
| groupId | smallint(4) | YES | MUL | NULL | |
| created_date | datetime | YES | | NULL | |
| modified_date | datetime | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
But when I try using the EXPLAIN command for that, I get these results:
EXPLAIN SELECT count(*) FROM `user`;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT count(userId) FROM user;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
So, the first thing for me:
Can I conclude that the performance is the same?
P.S.: The MySQL version is 5.5.8.
No, you cannot. EXPLAIN doesn't reflect all the work done by MySQL; it just gives you a plan of how the query will be executed.
As for count(*) vs. count(id) specifically: the first is never slower than the second, and in some cases it is faster.
The semantics of count(col) is the number of non-NULL values in col, whereas count(*) is the number of rows.
MySQL can probably optimize count(col) by rewriting it into count(*) when the column is a PK and thus cannot be NULL (if it can't, it has to check for NULLs, which is not fast), but I would still suggest using COUNT(*) in such cases.
Also, the internal processing depends on the storage engine used. For MyISAM, a precalculated number of rows is returned in both cases (as long as you don't use WHERE).
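The semantic difference is easy to see on a table with a nullable column. Assuming some rows in the sample table above have email = NULL:

```sql
-- COUNT(*) counts rows; COUNT(col) counts non-NULL values of col.
SELECT COUNT(*)     AS all_rows,
       COUNT(email) AS rows_with_email
FROM user;
-- rows_with_email <= all_rows; they differ by the number of NULL emails.
```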
In the example you give the performance is identical.
The execution plan shows you that the optimizer is clever enough to use the smallest available index (here, the one on groupId) to find the total number of records when you use count(*).
There is no significant difference when it comes to counting; most optimizers will figure out the best way to count rows by themselves.
The performance difference comes in when searching for values where indexing is lacking. If you search on fields that have no index assigned (f_name, l_name) versus fields that have one (userId, which MySQL indexes automatically as the primary key, and groupId, which appears to be a foreign key), then you will see the difference in performance.