Which index should I choose? (MySQL)

Table:
CREATE TABLE `table1` (
`f1` int(11) NOT NULL default '0',
`f2` int(4) NOT NULL default '0',
`f3` bigint(20) NOT NULL default '0',
PRIMARY KEY (`f1`)
) TYPE=MyISAM
Query:
select `f1` from table1 where `f2`=123 order by `f3` desc;
I want to create a "covering index" for this query:
ALTER TABLE `table1` ADD INDEX (`f2`,`f3`,`f1`);
or
ALTER TABLE `table1` ADD INDEX (`f2`,`f1`,`f3`);
Which should I choose?

The first one. MySQL can use either index to obtain the result set without needing to read from the actual table. The first index is slightly more efficient because it is not necessary to perform the extra step of re-ordering the rows.

For your query, you only need an index on f2.
If you have a query with a WHERE clause like "where f1=12 and f2=15", you might want an index on f1 and f2 too. However, the primary key might give you results faster, depending on the data and the complete query.
You (might) need an index covering all 3 fields if you have queries ranging over all 3 (in the WHERE clause).
In 15 years, I have never needed to create an index just for ordering results. That operation is quite fast. What is slow is finding the rows (the WHERE clause) and matching the different sets (joins).
Now, if you are not sure, create both. Run your query and check with EXPLAIN which one MySQL uses. Then drop the other ^^.
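The create-both-and-compare approach could be sketched as follows. The index names idx_f2_f3_f1 and idx_f2_f1_f3 are made up for illustration, and FORCE INDEX is used here only to look at each plan separately. What EXPLAIN actually reports depends on the MySQL version and the data.

```sql
-- Hypothetical index names; the column orders are the two candidates from the question.
ALTER TABLE `table1` ADD INDEX `idx_f2_f3_f1` (`f2`, `f3`, `f1`);
ALTER TABLE `table1` ADD INDEX `idx_f2_f1_f3` (`f2`, `f1`, `f3`);

-- Let the optimizer choose, then inspect the `key` and `Extra` columns.
EXPLAIN SELECT `f1` FROM `table1` WHERE `f2` = 123 ORDER BY `f3` DESC;

-- Look at each candidate on its own.
-- "Using filesort" in Extra means an extra sorting step is needed.
EXPLAIN SELECT `f1` FROM `table1` FORCE INDEX (`idx_f2_f3_f1`)
WHERE `f2` = 123 ORDER BY `f3` DESC;

EXPLAIN SELECT `f1` FROM `table1` FORCE INDEX (`idx_f2_f1_f3`)
WHERE `f2` = 123 ORDER BY `f3` DESC;

-- Drop whichever index turned out not to be useful (the second one, as an example).
ALTER TABLE `table1` DROP INDEX `idx_f2_f1_f3`;
```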

Related

Why does MySQL still use an index when only the 2nd column of a multiple-column index is used?

We know MySQL uses the leftmost-prefix rule, but here I didn't use the 1st column, only the 2nd. The results of the two SELECT statements below show that MySQL sometimes uses the index and sometimes doesn't. Why? In addition, my MySQL version is 5.6.17.
1.create table:
CREATE TABLE `student` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`cid` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_cid_INX` (`name`,`cid`)
) ENGINE=InnoDB AUTO_INCREMENT=101 DEFAULT CHARSET=utf8
2.run select:
EXPLAIN SELECT * FROM student WHERE cid=1;
3. result:
Result with index
It shows that MySQL uses an index to get the data.
The following is another table.
1.create table:
CREATE TABLE `test_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`birthday` datetime DEFAULT NULL,
`address` varchar(45) DEFAULT NULL,
`phone` varchar(45) DEFAULT NULL,
`note` varchar(45) DEFAULT NULL,
`age` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `NAME` (`name`),
KEY `AGE` (`age`),
KEY `LeftMostPreFix` (`name`,`address`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
2.run select:
explain SELECT * FROM test.test_table where address = '东京'
3.result:
Result without index
In contrast, here it shows that MySQL didn't use an index to get the data.
Comparing the two results above, I am puzzled why the 1st result uses an index, which goes against the leftmost-prefix rule.
From the MySQL manual:
it is possible that key will name an index that is not present in the possible_keys value. This can happen if none of the possible_keys indexes are suitable for looking up rows, but all the columns selected by the query are columns of some other index. That is, the named index covers the selected columns, so although it is not used to determine which rows to retrieve, an index scan is more efficient than a data row scan.
So while a key is listed here, it's not actually used in the normal sense. In some situations it is still more efficient to scan that index instead of the table (as in your first example); in others it is not (as in your second).
Most of the time these things are decided by the optimizer based on several factors (usage of the table, etc.).
The best thing to remember is that here you can NOT "use the index" to look up rows, and that's why there is no index in possible_keys. You can only use the index for lookups if its first column is in the WHERE clause.
Neither index in either Case starts with what is in the WHERE, so there will be a full scan of table or of index.
Case 1: The index is "covering", so it is a tossup as to which (table scan vs index scan) is better. The Optimizer happened to pick the secondary index. EXPLAIN FORMAT=JSON SELECT ... may have enough details to explain 'why' in this case.
Case 2: Because of * (in SELECT *), the secondary index is at a disadvantage -- it is not "covering", so the processing will bounce back and forth between the index and the data. So it is clearly better to simply scan the table.
Instead of trying to understand EXPLAIN (in these cases), turn the question around: "What is the optimal index for this query against this table?" Then follow the guidelines here.
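As a rough sketch of the two cases and of what an "optimal index" could look like for these tables: the index names cid_INX and ADDRESS are assumptions for illustration, and the plan the optimizer actually picks is not guaranteed. The first statement is a variant of Case 2 that selects only columns contained in the existing index (InnoDB secondary indexes also carry the primary key columns), so an index scan becomes an option there as well.

```sql
-- Variant of Case 2: id, name and address are all present in
-- LeftMostPreFix (name, address) plus the primary key, so the index is
-- "covering" and a full index scan (type = index) becomes possible.
EXPLAIN SELECT id, name, address FROM test_table WHERE address = '东京';

-- Case 1: an index whose first column is the one in the WHERE clause
-- allows a real lookup instead of a scan.
ALTER TABLE `student` ADD INDEX `cid_INX` (`cid`);
EXPLAIN SELECT * FROM student WHERE cid = 1;

-- Case 2: the same idea for the address filter.
ALTER TABLE `test_table` ADD INDEX `ADDRESS` (`address`);
EXPLAIN SELECT * FROM test_table WHERE address = '东京';
```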

Mysql, In my matrix, an index is not used

given this table:
CREATE TABLE `matrix` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`city1_id` int(10) unsigned NOT NULL DEFAULT '0',
`city2_id` int(10) unsigned NOT NULL DEFAULT '0',
`timeinmin` mediumint(8) unsigned NOT NULL DEFAULT '0',
`distancem` mediumint(8) unsigned NOT NULL DEFAULT '0',
`OWNER` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `city12_index` (`city1_id`,`city2_id`),
UNIQUE KEY `city21_index` (`city2_id`,`city1_id`),
KEY `city1_index` (`city1_id`),
KEY `city2_index` (`city2_id`),
KEY `ownerIndex` (`OWNER`),
CONSTRAINT `PK_city_city1` FOREIGN KEY (`city1_id`) REFERENCES `city` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `PK_city_city2` FOREIGN KEY (`city2_id`) REFERENCES `city` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5118409 DEFAULT CHARSET=utf8;
There is a very large amount of data.
This SQL runs very fast:
select count(*) from city_matrix where owner=1
since there is an index on "owner".
select count(*) from city_matrix where owner=1 order by id
this also runs very fast. But this:
select count(*) from city_matrix where owner=1 order by city1_id
takes several seconds, BUT there is an index on city1_id too!
EXPLAIN shows this:
1, 'SIMPLE', 'city_matrix', '', 'ref', 'ownerIndex', 'ownerIndex', '4', 'const', 169724, 100.00, ''
This is a great question. MySQL determines the right index based on many different cases. Its main goal is to find the most suitable index that can retrieve the data fast.
select count(*) from city_matrix where owner=1 order by id
In this query, MySQL determined that where owner=1 reduced the results to a small enough number that ordering by ID was relatively easy. For example, if ID is also a key (primary/unique/index), which I suspect it is, MySQL could take advantage of ID for sorting.
In case of this:
select count(*) from city_matrix where owner=1 order by city1_id
MySQL can still filter the records by owner, but it will take time to sort all the city1_id data so that you receive a sorted result. Since it took time, running SHOW PROCESSLIST during that time could have shown you that the query was sorting data.
To help MySQL do the job faster, we can use something called a covering index. A covering index contains all the fields used in the query, so that MySQL only has to read through the index to get you the data without having to touch the underlying table. A composite index on owner and city1_id will let MySQL use one single index to filter the data, use that same index again to sort the data, and then do a count on it.
So, let's create the covering index:
create index idx_city_matrix_city1_owner on city_matrix(owner, city1_id)
As you noticed, MySQL took some time to make the index and once the index was ready, it could zip through data pretty quickly to give you counts.
EDIT: It is important to note that when you do count(*) like the statements above do, you don't need ordering. The result set is scalar - just one value. Ordering by any field does not affect your count. For example, "count all the fruits on the table" will give you the same result as "count all the fruits on the table, ordered by size".
The process for retrieval and index application is as follows:
1. The intermediate result that MySQL retrieves for the key owner is "stored" in a temporary table (either in memory or on disk, depending on the size of the result).
2. Based on histogram data on the intermediate result, an index can be applied. If the data is not unique enough, the index can be discarded as not useful (for example: there are only 5 cities in these 169k results).
Workarounds:
1. Apply the index with a hint. This is considered poor practice, since it can lead to unwanted index use, speeding up one query and slowing down the next one (yes, an index can make a query slower).
2. Create a multi-column index which contains both owner and city1_id.
One last remark:
An ORDER BY on a COUNT(*) only slows things down, since the ORDER BY does not change anything in your result.
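The remarks above could be put into SQL roughly like this. The table name city_matrix is the one used in the question's queries, the index name owner_city1 is hypothetical, and the row-returning queries are illustrations that do not appear in the question.

```sql
-- Simplest fix: COUNT(*) returns a single row, so the ORDER BY can be dropped.
SELECT COUNT(*) FROM city_matrix WHERE owner = 1;

-- Workaround 1: an index hint on a query that really needs the ordering.
-- FORCE INDEX pushes the optimizer towards city1_index; whether that is
-- faster has to be measured.
SELECT id, city1_id
FROM city_matrix FORCE INDEX (city1_index)
WHERE owner = 1
ORDER BY city1_id;

-- Workaround 2: a multi-column index containing both owner and city1_id
-- (skip this if an equivalent index already exists).
CREATE INDEX owner_city1 ON city_matrix (owner, city1_id);

-- Check the plan: ideally no "Using filesort" appears in Extra.
EXPLAIN SELECT id, city1_id FROM city_matrix WHERE owner = 1 ORDER BY city1_id;
```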

MySQL - multiple column index

I'm learning about MySQL indexes and found that an index should be applied to any column named in the WHERE clause of a SELECT query.
Then I found Multiple Column Index vs Multiple Indexes.
First question: I was wondering what a multiple-column index is. I found the code below in Joomla. Is this a multiple-column index?
CREATE TABLE `extensions` (
`extension_id` INT(11) NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) NOT NULL,
`type` VARCHAR(20) NOT NULL,
`element` VARCHAR(100) NOT NULL,
`folder` VARCHAR(100) NOT NULL,
`client_id` TINYINT(3) NOT NULL,
... ...
PRIMARY KEY (`extension_id`),
-- are the indexes below multiple-column indexes?
INDEX `element_clientid` (`element`, `client_id`),
INDEX `element_folder_clientid` (`element`, `folder`, `client_id`),
INDEX `extension` (`type`, `element`, `folder`, `client_id`)
)
Second question: am I correct in thinking that one multiple-column index is used per SELECT?
SELECT column_x FROM extensions WHERE element=y AND client_id=y; -- index: element_clientid
SELECT ex.col_a, tb.col_b
FROM extensions ex
LEFT JOIN table2 tb
ON (ex.ext_id = tb.ext_id)
WHERE ex.element=x AND ex.folder=y AND ex.client_id=z; -- index: element_folder_clientid
General rule of thumb for indexes is to slap one onto any field used in a WHERE or JOIN clause.
That being said, there are some optimizations you can do. If you KNOW that a certain combination of fields is the only one that will ever be used in WHERE on a particular table, then you can create a single multi-field key on just those fields, e.g.
INDEX (field1, field2, field5)
v.s.
INDEX (field1),
INDEX (field2),
INDEX (field5)
A multi-field index can be more efficient in many cases than having to scan multiple indexes. The downside is that the multi-field index is only usable if the fields in question, starting with the leftmost one, are actually used in a WHERE clause.
With your sample queries, since element and client_id are in all three indexes, you might be better off splitting them off into their own dedicated index. If these are changeable fields, then it's better to keep them in their own dedicated index, e.g. if you ever have to change client_id in bulk, the DB has to update 3 different indexes, vs. updating just one dedicated one.
But it all comes down to benchmarking - test your particular setup with various index setups and see which performs best. Rules of thumb are handy, but don't work 100% of the time.
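To make the multi-field behaviour concrete, here is a sketch against the extensions table from the question. The literal values are placeholders, and the comments describe what the leftmost-prefix rule permits, not what the optimizer is guaranteed to choose.

```sql
-- Can use element_clientid: both of its columns are matched by equality.
EXPLAIN SELECT name FROM extensions
WHERE element = 'com_example' AND client_id = 0;

-- Can use element_folder_clientid: all three columns are matched,
-- starting from the leftmost one.
EXPLAIN SELECT name FROM extensions
WHERE element = 'com_example' AND folder = 'system' AND client_id = 0;

-- Can use the leading (type, element) part of the `extension` index.
EXPLAIN SELECT name FROM extensions
WHERE type = 'plugin' AND element = 'com_example';

-- No index starts with folder, so none can be used to look up rows here.
EXPLAIN SELECT name FROM extensions WHERE folder = 'system';
```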

mysql where + group by very slow

A question that I should be able to answer myself, but I can't, and I also can't find any answer on Google:
I have a table that contains 5 million rows with this structure:
CREATE TABLE IF NOT EXISTS `files_history2` (
`FILES_ID` int(10) unsigned DEFAULT NULL,
`DATE_FROM` date DEFAULT NULL,
`DATE_TO` date DEFAULT NULL,
`CAMPAIGN_ID` int(10) unsigned DEFAULT NULL,
`CAMPAIGN_STATUS_ID` int(10) unsigned DEFAULT NULL,
`ON_HOLD` decimal(1,0) DEFAULT NULL,
`DIVISION_ID` int(11) DEFAULT NULL,
KEY `DATE_FROM` (`DATE_FROM`),
KEY `FILES_ID` (`FILES_ID`),
KEY `CAMPAIGN_ID` (`CAMPAIGN_ID`),
KEY `CAMP_DATE` (`CAMPAIGN_ID`,`DATE_FROM`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
When I execute
SELECT files_id, min( date_from )
FROM files_history2
WHERE campaign_id IS NOT NULL
GROUP BY files_id
the query stays in status "Sending data" for more than eight hours (then I killed the process).
Here is the EXPLAIN output:
id: 1
select_type: SIMPLE
table: files_history2
type: ALL
possible_keys: CAMPAIGN_ID,CAMP_DATE
key: NULL
key_len: NULL
ref: NULL
rows: 5073254
Extra: Using where; Using temporary; Using filesort
I assume that I created the necessary keys, but then the query shouldn't take that long, should it?
I would suggest a different index... Index on (Files_ID, Date_From, Campaign_ID)...
Since your group by is on Files_ID, you want THOSE grouped. Then the MIN( Date_From), so that is in second position... Then FINALLY the Campaign_ID to qualify for not null and here's why...
If you put all your campaign IDs first, great, get all the NULLs out of the way... Now, you have 1,000 campaigns and the Files_ID spans MANY campaigns and they also span many dates, you are going to choke.
By the index I'm projecting, by the Files_ID first, you have each "files_id" already ordered to match your group by. Then, within that, all the earliest dates are at the top of the indexed list... great, almost there, then, by campaign ID. Skip over whatever NULL may be there and you are done, on to the next Files_ID
Hope this makes sense -- unless you have TONs of entries with NULL value campaigns.
Also, by having all 3 parts of the index matching the criteria and output columns of your query, it never has to go back to the raw data file for the data, it gets it all from the index directly.
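The index described in this answer could be created like this; the name FILES_DATE_CAMP is made up. If it behaves as the answer expects, EXPLAIN would be expected to report "Using index" in Extra, though that needs to be checked against the real data.

```sql
-- Hypothetical index name; the column order follows the answer:
-- GROUP BY column first, then the MIN() column, then the filter column.
CREATE INDEX FILES_DATE_CAMP
ON files_history2 (FILES_ID, DATE_FROM, CAMPAIGN_ID);

-- Re-check the plan for the original query.
EXPLAIN
SELECT files_id, MIN(date_from)
FROM files_history2
WHERE campaign_id IS NOT NULL
GROUP BY files_id;
```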
I'd create a covering index (CAMPAIGN_ID, files_id, date_from) and check the performance. I suspect your issue is due to the grouping and date_from not being able to use the same index.
CREATE INDEX your_index_name ON files_history2 (CAMPAIGN_ID, files_id, date_from);
If this works, you could drop the single-column index CAMPAIGN_ID, as it's included in the composite index.
Well, the query is slow due to the aggregation (the MIN function) combined with the grouping.
One solution is to alter your query by moving the aggregation into a subquery in the FROM clause, which should be a lot faster than the approach you are using.
Try the following:
SELECT f.files_id
FROM files_history2 AS f
JOIN (
SELECT campaign_id, MIN(date_from) AS datefrom
FROM files_history2
GROUP BY files_id
) AS f1 ON f.campaign_id = f1.campaign_id AND f.date_from = f1.datefrom;
This should perform a lot better; if it doesn't, a temporary table would be the only option left.

Optimizing MySQL table structure. Advice needed

I have these table structures, and while they work, using EXPLAIN on certain SQL queries gives 'Using temporary; Using filesort' on one of the tables. This might hamper performance once the table is populated with thousands of data. Below are the table structures and an explanation of the system.
CREATE TABLE IF NOT EXISTS `jobapp` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`fullname` varchar(50) NOT NULL,
`icno` varchar(14) NOT NULL,
`status` tinyint(1) NOT NULL DEFAULT '1',
`timestamp` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `icno` (`icno`)
) ENGINE=MyISAM;
CREATE TABLE IF NOT EXISTS `jobapplied` (
`appid` int(11) NOT NULL,
`jid` int(11) NOT NULL,
`jobstatus` tinyint(1) NOT NULL,
`timestamp` int(10) NOT NULL,
KEY `jid` (`jid`),
KEY `appid` (`appid`)
) ENGINE=MyISAM;
The query I tried, which gives the aforementioned message:
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC
This system is for recruiting new staff. Once registration is opened, hundreds of applicants will register at once. They are allowed to select 5 different jobs. Later, at the end of the registration session, the admin will go through each job one by one. I have used a single table (jobapplied) to store 2 items (applicant id, job id) to record who applied for what. This is the table that causes the aforementioned message. I realize this table has no PRIMARY key, but I just can't figure out any other way for the admin to later look up who has applied for a specific job.
Any advice on how I can optimize the table?
Apart from the missing indexes and primary keys others have mentioned . . .
"This might hamper performance once the table is populated with thousands of data."
You seem to be assuming that the query optimizer will use the same execution plan on a table with thousands of rows as it will on a table with just a few rows. Optimizers don't work like that.
The only reliable way to tell how a particular vendor's optimizer will execute a query on a table with thousands of rows--which is still a small table, and will probably easily fit in memory--is to
1. load a scratch version of the database with thousands of rows
2. "explain" the query you're interested in
FWIW, the last test I ran like this involved close to a billion rows--about 50 million in each of about 20 tables. The execution plan for that query--which included about 20 left outer joins--was a lot different than it was for the sample data (just a few thousand rows).
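A minimal sketch of those two steps, assuming a scratch database that already contains empty copies of jobapp and jobapplied. The helper table scratch_digits and all generated values are invented purely to produce about 10,000 rows; real test data should resemble the expected production data as closely as possible.

```sql
-- Hypothetical helper table used only to multiply rows.
CREATE TABLE scratch_digits (d INT NOT NULL);
INSERT INTO scratch_digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- 10 x 10 x 10 x 10 = 10,000 made-up applicants.
INSERT INTO jobapp (fullname, icno, status, `timestamp`)
SELECT CONCAT('Applicant ', a.d + 10*b.d + 100*c.d + 1000*e.d),
       LPAD(a.d + 10*b.d + 100*c.d + 1000*e.d, 12, '0'),
       1 + (a.d % 2),
       UNIX_TIMESTAMP() - (a.d + 10*b.d + 100*c.d + 1000*e.d)
FROM scratch_digits a
CROSS JOIN scratch_digits b
CROSS JOIN scratch_digits c
CROSS JOIN scratch_digits e;

-- One made-up application per applicant, spread over job ids 80..89.
INSERT INTO jobapplied (appid, jid, jobstatus, `timestamp`)
SELECT id, 80 + (id % 10), 1 + (id % 3), `timestamp`
FROM jobapp;

-- Now "explain" the query of interest against the populated tables.
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC;
```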
You are ordering by jobapp.timestamp, but there is no index on timestamp, so the filesort (and probably the temporary table) will be necessary. Try adding an index on timestamp to jobapp, something like KEY timid (timestamp,id).
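As a sketch, the suggested index could be added like this (timid is the name proposed above). Whether the filesort actually disappears depends on the join order the optimizer chooses, so re-running EXPLAIN afterwards is the way to find out.

```sql
-- Add the suggested index on the ORDER BY column.
ALTER TABLE jobapp ADD KEY timid (`timestamp`, id);

-- Re-check the plan and compare the Extra column with the earlier output.
EXPLAIN SELECT japp.id, japp.fullname, japp.icno, japp.status, japped.jid, japped.jobstatus
FROM jobapp AS japp
INNER JOIN jobapplied AS japped ON japp.id = japped.appid
WHERE japped.jid = '85'
AND japped.jobstatus = '2'
AND japp.status = '2'
ORDER BY japp.`timestamp` DESC;
```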