Why was the primary key not used when counting primary key values? [duplicate] - mysql

Recently, I have reviewed the basics of SQL and found a question about indexes.
My working environment is as follows:
OS: CentOS 7
MySQL: 5.7.39
database: sakila
table: customer
My question is: why does InnoDB use idx_fk_store_id instead of the primary index when I run select count(customer_id) from customer?
mysql> explain select count(customer_id) from customer;
Result:
type  | key             | Extra
index | idx_fk_store_id | Using index
The code to create this table is as follows:
CREATE TABLE `customer` (
  `customer_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `store_id` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`customer_id`),
  KEY `idx_fk_store_id` (`store_id`)
) ENGINE=InnoDB AUTO_INCREMENT=600 DEFAULT CHARSET=utf8mb4
I suspect this is caused by MySQL's optimizer, even if it's hard to understand.

The type: index in the EXPLAIN report indicates it is doing an index-scan. That is, reading every entry in the index idx_fk_store_id.
It can get the values of the primary key from a secondary index, so it can count them.
The alternative of using the primary key would be a table-scan, which reads every row of the table.
The primary key index in InnoDB is the clustered index. It stores all the columns of the table.
The secondary index stores only the values of store_id plus the primary key values of rows where a given store_id value occurs.
So it can get the same answer by doing an index-scan, while reading fewer pages than the table-scan would.
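If you want to see this yourself, you can force the optimizer's hand and compare the two plans. A minimal sketch, assuming the sakila customer table from the question (FORCE INDEX is standard MySQL syntax; the row estimates will depend on your data):
-- Default plan: the optimizer picks the smaller secondary index
EXPLAIN SELECT COUNT(customer_id) FROM customer;
-- Force the clustered (primary key) index instead, then compare the two plans
EXPLAIN SELECT COUNT(customer_id) FROM customer FORCE INDEX (PRIMARY);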

Saying COUNT(x) is a common mistake.
COUNT(*) counts the number of rows
COUNT(x) counts the number of rows where x is NOT NULL
So, if you try SELECT COUNT(*) FROM customer, the EXPLAIN will again say that it is using that index, but for a different reason than Bill gave. This time the Optimizer's logic goes this way:
find the "smallest" index. That will happen to be the same store_id index.
count the "rows" in that index. But no need to test whether customer_id IS NOT NULL.
Another note: In MySQL, the PRIMARY KEY is required to be NOT NULL, so customer_id cannot be NULL. Also you declared it to be NOT NULL. Hence COUNT(customer_id) is necessarily the same as COUNT(*).
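To make the COUNT(*) vs COUNT(x) distinction concrete, here is a tiny demo on a hypothetical table (names are illustrative, not from the question):
-- COUNT(x) skips rows where x IS NULL; COUNT(*) does not
CREATE TABLE demo (x INT NULL);
INSERT INTO demo VALUES (1), (NULL), (3);
SELECT COUNT(*), COUNT(x) FROM demo;  -- returns 3 and 2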
TMI.

Related

Optimising a query that uses index merge by intersection

I have a MySQL 8 database table accounts that has the following columns:
id (primary)
city_id (foreign key)
province_id (foreign key)
country_id (foreign key)
school_id (foreign key)
age (indexed)
EDIT: See bottom for complete table structure.
Now, imagine the following SQL query:
SELECT COUNT(`id`) AS AGGREGATE
FROM `accounts`
WHERE `city_id` = 1
  AND `country_id` = 7
  AND `age` = 3
At 1 million records, this query becomes slow (~200ms).
When running EXPLAIN, I receive the following output:
id:            1
select_type:   SIMPLE
table:         accounts
partitions:    NULL
type:          index_merge
possible_keys: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
key:           accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
key_len:       9,2,9
ref:           NULL
rows:          15542
filtered:      100.00
Extra:         Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index
Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?
EDIT: In the future, the table will include more columns, so it will be impossible to put them all in one composite index, as that would exceed the 16-column-per-index limit.
EDIT: Here's the complete table structure:
CREATE TABLE `accounts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`city_id` bigint unsigned DEFAULT NULL,
`school_id` bigint unsigned DEFAULT NULL,
`country_id` bigint unsigned DEFAULT NULL,
`province_id` bigint unsigned DEFAULT NULL,
`age` tinyint unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `accounts_city_id_foreign` (`city_id`),
KEY `accounts_school_id_foreign` (`school_id`),
KEY `accounts_country_id_foreign` (`country_id`),
KEY `accounts_province_id_foreign` (`province_id`),
KEY `accounts_age_index` (`age`),
CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age)
Indexes are there to help your querying. So as suggested by Marko and agreed by others, having an index on (city_id, country_id, age) should significantly help. Now, yes, you will add other columns to the table, but are you really going to filter on 16+ criteria at once? I doubt it. And of the queries you would be running, even with multiple composite indexes to help optimize them, how many columns might you need at any single time? 4, 5, 6? Beyond that, how granular do you plan on getting with your data: country, state/province, city, town, village, neighborhood, street, house? By the time you are that low in the data, you would be at the page-level data anyhow, wouldn't you?
So, your query of country_id = 7 already chops off a ton of rows. Then narrowing to a given city within that country? Great, now you are at a finite level.
If you are going to be doing queries against large data sets that require aggregations, and the data is rather fixed from a historical perspective, having pre-aggregated tables keyed by some common elements might help long term.
FEEDBACK
The performance hit is not necessarily in querying; it is in the inserts, updates, and deletes, since whatever changes has to update all the indexes on the table, single or composite. If you are getting more than 5 columns in an index, ask yourself: really? How granular do you need the index to be? Querying the data should be very fast with proper indexes. Updating indexes is also quick, but if you are dealing with millions of inserts in a month, quarter, or year, each user may see only a slight delay (a quarter second?), yet those delays add up. Then again, consider over what period of time those inserts/updates/deletes would be done anyhow.
You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.
You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.
You don't have to add ALL the columns to the composite index!
Index design is not magic. It follows rules. You create indexes designed to support the specific queries that you need to run. You don't add columns to an index unless they help the given query. You may have multiple composite indexes in the table, created to help different queries.
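For instance, a sketch of what "multiple composite indexes for different queries" could look like on this table (index names are illustrative, not from the original answer):
-- Serves: WHERE city_id = ? AND country_id = ? AND age = ?
CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);
-- Serves a different query shape: WHERE school_id = ? AND age = ?
CREATE INDEX idx_school_age ON accounts (school_id, age);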
You might like my presentation How to Design Indexes, Really (or the video).
Re your comment:
I won't know every possible query combination ahead of time.
Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.
In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.

is there any difference with joint primary key order?

I am curious: does the column order in a joint (composite) primary key make any difference?
For example, is there any difference between the two tables' primary keys below, or does the key order make no difference to the table?
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`user_id`,`retweet_id`)
)
vs
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`retweet_id`,`user_id`)
)
It does make a difference in the index structure.
In a composite index, the index key consists of several column values that go one after another, and the order determines which queries can be optimized using this particular index.
For example:
For the index created as
PRIMARY KEY (`user_id`,`retweet_id`)
a query like WHERE user_id = 42 can be optimized (not guaranteed, but technically possible), whereas a query like WHERE retweet_id = 4242 cannot be.
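A quick way to check this yourself, assuming the first table definition above (actual plans depend on your data and statistics):
-- Leftmost column of the PK matches: the index can be used
EXPLAIN SELECT * FROM Q3 WHERE user_id = '42';
-- No leftmost match: a full scan is likely
EXPLAIN SELECT * FROM Q3 WHERE retweet_id = '4242';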
PS: it's a good idea to always have an artificial primary key, like a sequence (or an auto-increment column in the case of MySQL), instead of using natural primary keys. This is because the primary key is a clustered key, which means it defines how rows are physically stored in pages on disk. Which means it's a good idea for a PK to be monotonically increasing (or decreasing, it doesn't matter which).
The order does affect how the index is used in queries. When you use multiple columns, each column is a sub-tree of the preceding column.
In your first case (user_id, retweet_id) - if you searched the index for user_id 1, you then have all the retweet_ids under that.
Subsequently, if you wish to search for only retweet_id = 7 (for all users), the index cannot be used, because you would first need to step through each user's items in the index.
So if you wish to query for user_id, or retweet_id individually (without the other), put that column first. If you need both you could consider adding a secondary index.
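As a sketch of that "secondary index" option, under the first key order (the index name is illustrative):
CREATE TABLE `Q3` (
  `user_id` VARCHAR(20) NOT NULL,
  `retweet_id` VARCHAR(20) NOT NULL,
  PRIMARY KEY (`user_id`, `retweet_id`),  -- serves lookups by user_id (and user_id + retweet_id)
  KEY `idx_retweet_id` (`retweet_id`)     -- serves lookups by retweet_id alone
);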
There are also limitations for range scans: within a composite index, only the last column queried can effectively use a range. You can read more about all of this here:
http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
Additionally if using InnoDB, the tables are stored in order of the PRIMARY KEY. This might matter for performance depending on how you query your data.

Why doesn't MySQL use the index for the same query if I query more columns?

I have the following table:
create table stuff (
id mediumint unsigned not null auto_increment primary key,
title varchar(150) not null,
link varchar(250) not null,
time timestamp default current_timestamp not null,
content varchar(1500)
);
If I EXPLAIN the query
select id from stuff order by id;
then it says it uses the primary key as an index for ordering the results. But with this query:
select id,title from stuff order by id;
EXPLAIN says no possible keys and it resorts to filesort.
Why is that? Isn't the data of a given row stored together in the database? If it can order the results using the index when I'm querying only the id, why does adding another column to the query make a difference? The primary key identifies the row already, so I think it should use the primary key for ordering in the second case too.
Can you explain why this is not the case?
Sure, because the filesort is more performant for this query: using the index would mean reading the full index and then iteratively fetching rows one by one from the data file, which is extremely inefficient. Instead, MySQL prefers to read the data straight from the data file and sort it.
Also, what kind of storage engine do you use? Seems like MyISAM.
For this case InnoDB would be more efficient, since it uses a clustered index on the primary key (which is monotonically increasing in your case).
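If you are staying on MyISAM, one possible workaround is a covering index, so the ordered scan never has to touch the data file at all (a sketch; the index name is illustrative):
-- (id, title) covers the query, so EXPLAIN should show "Using index"
ALTER TABLE stuff ADD INDEX idx_id_title (id, title);
EXPLAIN SELECT id, title FROM stuff ORDER BY id;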

Query optimization

SELECT nar.name, nar.reg, stat.lvl
FROM members AS nar
JOIN stats AS stat
ON stat.id = nar.id
WHERE nar.ref = 9
I have indexes on id in both tables, and an index on ref as well. But still, it checks all rows in the stats table (I used EXPLAIN to get this information), while in the members table it checks only one row, as it is supposed to. What's wrong with the stats table? Thank you very much.
CREATE TABLE `members` (
  `id` int(11) NOT NULL,
  `ref` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT
CREATE TABLE `stats` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=37 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
id | select_type | table | type   | possible_keys | key     | key_len | ref               | rows | Extra
1  | SIMPLE      | stat  | ALL    | PRIMARY       | NULL    | NULL    | NULL              | 22   |
1  | SIMPLE      | nar   | eq_ref | PRIMARY       | PRIMARY | 4       | table_nme.stat.id | 1    | Using where
Your tables are ridiculously small - just 23 rows is tiny.
MySQL chooses different query plans depending on how many rows there are in the table and based on how many it estimates will be selected (from the statistics). You should performance test your queries with realistic data; both the amount of data and the distribution of values should be as realistic as possible. Otherwise the query plan MySQL chooses in testing might not be the same as the actual query plan for your live system.
Your tables are so small that using an index could be slower than just checking the table directly. Remember that checking data that is already in memory is fast, but disk reads are slow. Accessing an index can require an extra read: first the index has to be fetched and read to find which rows to select; then, if your index isn't a covering index, the relevant rows in the table have to be fetched and read to get the values that aren't in the index. MySQL is perfectly entitled not to use an index, even if one is available, if it believes that doing so will result in a slower plan.
Put some more rows in your table (thousands) and try running EXPLAIN again. You will probably find that when you have more rows, the PRIMARY KEY index will be used for the join.
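A quick way to bulk up the table for a more realistic test (a sketch using hypothetical filler rows; the self-cross-join roughly cubes the row count):
-- Inserting NULL into an AUTO_INCREMENT column generates new ids
INSERT INTO stats (id) SELECT NULL FROM stats s1, stats s2, stats s3;
-- Refresh index statistics before re-running EXPLAIN
ANALYZE TABLE stats;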
MySQL can use only one index at a time per table, so it finds the members row using the index and then performs a sequential scan of stats for the id.
You have to create a multi columns index for the members table
CREATE INDEX idref ON members(id,ref);
If it doesn't get better, please try the reverse order as well (first: DROP INDEX idref ON members):
CREATE INDEX idref ON members(ref,id);
(I cannot try it myself now)