Not sure what index to apply to my SQL table - mysql

I have a normalized database structure, which I will try to explain.
3 tables:
profiles
keywords
keyword_profile
Every profile on my website can have a varying number of keywords linked to it. Every keyword gets an ID number in the keywords table. Every profile gets an ID number in the profiles table. The keyword_profile table has about 600k rows, each linking a keywordID to a profileID.
I have a PRIMARY index on my ID column in the profiles table.
I have a PRIMARY index on my ID column in the keywords table.
I have a UNIQUE index on my keyword-name column in the keywords table.
I have a PRIMARY index on the keyword_profile table like this: (profile_id, keyword_id)
I have an index on the profile_ID column in the keyword_profile table.
Next: when I execute the following query (the specific keyword is named 'dienst'):
EXPLAIN SELECT profiles.hoofdrubriek, profiles.plaats, profiles.bedrijfsnaam, profiles.gemeente, profiles.bedrijfsslogan, profiles.straatnaam, profiles.huisnummer, profiles.postcode, profiles.telefoonnummer, profiles.fax,profiles.email, profiles.website, profiles.bedrijfslogo
FROM profiles
INNER JOIN profile_dienst ON profiles.ID = profile_dienst.profile_id
INNER JOIN diensten ON profile_dienst.dienst_id = diensten.ID
WHERE (
diensten.dienst = 'Aannemersdiensten'
)
ORDER BY profiles.grade DESC , profiles.bedrijfsnaam
I get the following result. It scans all 600k rows! That's not the result I was hoping for. What indexes can I apply so it won't scan the entire table?
id - select_type - table - type - key - rows - Extra
1 - SIMPLE - diensten - const - dienst - 1 - Using temporary; Using filesort
1 - SIMPLE - profile_dienst - index - PRIMARY - 662000 - Using where; Using index
1 - SIMPLE - profiles - eq_ref - PRIMARY - 1 - Using where
Thanks for the help guys!!
EDIT: Added SHOW CREATE TABLE results:
CREATE TABLE `diensten` (
`ID` mediumint(9) NOT NULL AUTO_INCREMENT,
`dienst` varchar(255) NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `dienst` (`dienst`)
) ENGINE=MyISAM AUTO_INCREMENT=1903 DEFAULT CHARSET=utf8
CREATE TABLE `profile_dienst` (
`profile_id` varchar(20) NOT NULL,
`dienst_id` varchar(20) NOT NULL,
PRIMARY KEY (`dienst_id`,`profile_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `profiles` (
`ID` varchar(255) NOT NULL DEFAULT '',
`username` varchar(255) DEFAULT NULL,
...more columns...,
`grade` int(5) NOT NULL,
PRIMARY KEY (`ID`),
KEY `IDX_TIMESTAMP` (`timestamp`),
KEY `IDX_NIEUW` (`nieuw`),
KEY `IDX_HOOFDRUBRIEK` (`hoofdrubriek`),
KEY `bedrijfsnaam` (`bedrijfsnaam`),
KEY `grade` (`grade`),
KEY `gemeente` (`gemeente`),
KEY `plaats` (`plaats`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

I think it's fine. It's scanning through all the values of profile_dienst because MySQL has to look for the diensten.ID. The good news is that it is using an index.
You can find more information about the Extra column of the MySQL EXPLAIN plan here: EXPLAIN Extra Information

You need to do your normalization better. You are joining a varchar(20) with a mediumint (profile_dienst.dienst_id = diensten.ID), and that is why a full index scan is needed. That is what type: index together with Extra: "Using index" means in the EXPLAIN output. MySQL can only use an index for the join lookup when the compared columns have compatible data types.
Here is a little demo with an inner self join that shows when MySQL can use indexes: http://sqlfiddle.com/#!2/1ef09/4. It uses the INT, SMALLINT, CHAR and VARCHAR data types. You can see that a JOIN on INT and SMALLINT can use indexes, and so can a JOIN on CHAR and VARCHAR. But when you mix INT with CHAR, MySQL can't use the indexes and a full table scan is needed (look at type: ALL).
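A minimal sketch of how the type mismatch could be removed, assuming every stored dienst_id is a plain numeric string that fits in a MEDIUMINT (that assumption is not confirmed by the question, so the first statement checks it). The shortened column list in the EXPLAIN is only for illustration.

```sql
-- Assumption: dienst_id only ever holds numeric strings.
-- This count should be 0 before the column type is changed.
SELECT COUNT(*) AS non_numeric
FROM profile_dienst
WHERE dienst_id NOT REGEXP '^[0-9]+$';

-- Match the type of diensten.ID (mediumint(9)) so the join compares like with like.
ALTER TABLE profile_dienst
  MODIFY dienst_id MEDIUMINT(9) NOT NULL;

-- Re-check the plan. The hope is that profile_dienst no longer shows
-- type "index" with ~662000 rows, since PRIMARY starts with dienst_id.
EXPLAIN SELECT profiles.bedrijfsnaam
FROM profiles
INNER JOIN profile_dienst ON profiles.ID = profile_dienst.profile_id
INNER JOIN diensten ON profile_dienst.dienst_id = diensten.ID
WHERE diensten.dienst = 'Aannemersdiensten'
ORDER BY profiles.grade DESC, profiles.bedrijfsnaam;
```

The profile_id side of the join (varchar(20) against varchar(255)) is string against string, so it is left alone here.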

Related

Indexes for a large MYSQL table

I hope you will allow me to pick your brains so I can gain some knowledge in the process.
We have 3 tables - data_product, data_issuer, data_accountbalance
CREATE TABLE `data_issuer` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`issuer_name` varchar(128) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_product` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`issuer_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_product_name_issuer_id_260fec65_uniq` (`name`,`issuer_id`),
KEY `data_product_issuer_id_d07fa696_fk_data_issuer_id` (`issuer_id`),
CONSTRAINT `data_product_issuer_id_d07fa696_fk_data_issuer_id` FOREIGN KEY
(`issuer_id`) REFERENCES `data_issuer` (`id`)
) ENGINE=InnoDB
CREATE TABLE `data_accountbalance` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` date NOT NULL,
`nominee_name` varchar(128) NOT NULL,
`beneficiary_name` varchar(128) NOT NULL,
`nominee_id` varchar(128) NOT NULL,
`account_id` varchar(16) NOT NULL,
`product_id` int(11) NOT NULL,
`register_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `data_accountbalance_date_product_id_nominee__7b8d2c6a_uniq` (`date`,`product_id`,`nominee_id`,`beneficiary_name`),
KEY `data_accountbalance_product_id_nominee_id_date_8ef8754f_idx` (`product_id`,`nominee_id`,`date`),
KEY `data_accountbalance_register_id_4e78ec16_fk_data_register_id` (`register_id`),
KEY `data_accountbalance_product_id_date_nominee_i_c3a41e39_idx` (`product_id`,`date`,`nominee_id`,`beneficiary_name`,`balance_amount`),
CONSTRAINT `data_accountbalance_product_id_acfb18f6_fk_data_product_id` FOREIGN KEY (`product_id`) REFERENCES `data_product` (`id`),
CONSTRAINT `data_accountbalance_register_id_4e78ec16_fk_data_register_id` FOREIGN KEY (`register_id`) REFERENCES `data_register` (`id`)
) ENGINE=InnoDB
When running the query below, the system takes about an hour to respond -
SELECT SQL_NO_CACHE *
from data_product
INNER JOIN `data_issuer` ON (`data_issuer`.`id` = `data_product`.`issuer_id`)
INNER JOIN `data_accountbalance` ON (`data_accountbalance`.`product_id` = `data_product`.`id`)
LIMIT 100000000;
Both data_issuer and data_product only have few 100 records in them, but the data_accountbalance is huge with about 15,384,358 records.
The explain plan produced is below -
# id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE data_product ALL PRIMARY,data_product_issuer_id_d07fa696_fk_data_issuer_id 459 100
1 SIMPLE data_issuer eq_ref PRIMARY PRIMARY 4 pnl.data_product.issuer_id 1 100
1 SIMPLE data_accountbalance ref data_accountbalance_product_id_nominee_id_date_8ef8754f_idx,data_accountbalance_product_id_date_nominee_i_c3a41e39_idx data_accountbalance_product_id_date_nominee_i_c3a41e39_idx 4 pnl.data_product.id 493 100
Can someone help tune the query so it does not take an hour to run please? Appreciate any pointers you might have for me.
If your query is literally what you are showing there, then that's the problem: it has no WHERE clause.
That query will return all 15,384,358 rows. Since the two smaller tables are typical domain tables with NOT NULL relations all the way across, it returns exactly one row for every row in data_accountbalance.
The actual time cost is probably in creating a massive temp table (though I'm not sure about that). If you really do need to download all three tables in full, you could look into tuning your MySQL temp table configuration to speed this up, or preferably read the results as MySQL gets them ready (which avoids a temp table). Alternatively, maybe the script that runs this query is trying to read the whole data set into memory, which takes a long time?
Is there a particular reason to download all the data? Usually you download only the data you mean to operate on, or you have MySQL do the grouping, summing, etc. and return just the answer you want.
How many rows did you expect the query to return? If you were expecting fewer than 15 million, then the answer is to add some kind of WHERE clause or an aggregate function. Whichever table and columns you use to reduce the result set, those columns will have to be indexed.
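As a rough sketch of the two suggestions above, here is what a filtered query and an aggregated query could look like against the tables from the question. The date literal is a made-up example value, and the selected columns are only an illustrative subset.

```sql
-- Option 1: restrict the rows with a WHERE clause.
-- '2020-01-31' is a hypothetical value; `date` is the leading column of the
-- existing unique key on data_accountbalance, so this filter can use an index.
SELECT p.name, i.issuer_name, ab.nominee_id, ab.account_id
FROM data_accountbalance AS ab
INNER JOIN data_product AS p ON p.id = ab.product_id
INNER JOIN data_issuer AS i ON i.id = p.issuer_id
WHERE ab.date = '2020-01-31';

-- Option 2: let MySQL aggregate and return a few hundred rows
-- (one per product) instead of 15 million.
SELECT p.name, i.issuer_name, COUNT(*) AS balance_rows
FROM data_accountbalance AS ab
INNER JOIN data_product AS p ON p.id = ab.product_id
INNER JOIN data_issuer AS i ON i.id = p.issuer_id
GROUP BY p.id, p.name, i.issuer_name;
```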
I hope this helps. :)

mysql select with order by using filesort no index used

Sorry for the long post, but this is really strange and I am close to giving up. Two tables:
CREATE TABLE `endu_results` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
`base_yob` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_results_206a6355` (`base_name`),
KEY `endu_results_63df4402` (`base_nr`),
KEY `base_yob` (`base_yob`)
) ENGINE=InnoDB AUTO_INCREMENT=3424028 DEFAULT CHARSET=utf8;
and 2nd:
CREATE TABLE `endu_resultinterest` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`result_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `endu_resultinterest_3b529087` (`result_id`),
CONSTRAINT `result_id_refs_id_19e24435` FOREIGN KEY (`result_id`) REFERENCES `endu_results` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=48590 DEFAULT CHARSET=utf8;
There are about 2 million records in the endu_results table and fewer than 100K in endu_resultinterest. I have a slow query:
explain select base_yob from endu_resultinterest
inner join endu_results
on (endu_results.id = endu_resultinterest.result_id)
order by endu_results.base_yob;
1 SIMPLE endu_resultinterest index endu_resultinterest_3b529087 endu_resultinterest_3b529087 4 NULL 47559 Using index; Using temporary; Using filesort
The question is: why is MySQL using the index endu_resultinterest_3b529087 when it should use base_yob, the column the sort is requested on?
To test it further I manually created 2 additional identical tables, endu_testresults and endu_testresultinterest, and filled them with some records:
CREATE TABLE `endu_testresults` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`base_yob` int(11) DEFAULT NULL,
`base_name` varchar(200) NOT NULL,
`base_nr` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `endu_testresults_a65b2616` (`base_yob`),
KEY `endu_testresults_ba0ab39c` (`base_name`),
KEY `endu_testresults_d75ba04d` (`base_nr`)
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8;
So I go again for explain:
explain select base_yob from endu_testresultinterest
inner join endu_testresults
on (endu_testresults.id = endu_testresultinterest.result_id)
order by endu_testresults.base_yob;
and surprise, surprise:
1 SIMPLE endu_testresults index PRIMARY endu_testresults_a65b2616 5 NULL 19 Using index
The index on the sort column base_yob (endu_testresults_a65b2616) is now used.
Why is the index used in one case, while in the other I get 'Using filesort; Using temporary'? Does size matter? I will try to copy records from one table to the other, but I do not understand what is going on with the indexes. MySQL is 5.6.16.
Short answer: Because it is faster.
Long answer...
Your EXPLAINs seem to be incomplete -- I would expect 2 lines in each.
The first table is 20 (70?) times as big as the second. The optimizer picked the smaller table to start with. Hence it is initially doing 1/20th the amount of work. The sort that comes later (ORDER BY ...) is much less work than if it had to do 20 times as much work to start with.
The output is only 48K rows, correct? And that is how many rows in the 2nd table, correct?
Your test tables did not have the same bigger/smaller ratio, did they? Hence the different EXPLAIN.
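One way to see the trade-off the optimizer is making is to force the other join order and compare the two plans and timings. This is only a sketch using the tables from the question; STRAIGHT_JOIN tells MySQL to read the tables in the order they are listed, and whether the forced plan is actually slower has to be measured rather than assumed.

```sql
-- Plan chosen by the optimizer (starts from the small table, sorts afterwards).
EXPLAIN SELECT r.base_yob
FROM endu_resultinterest AS ri
INNER JOIN endu_results AS r ON r.id = ri.result_id
ORDER BY r.base_yob;

-- Forced plan: read endu_results first, so the base_yob index can supply the
-- order, at the cost of visiting far more rows before the join filters them.
EXPLAIN SELECT STRAIGHT_JOIN r.base_yob
FROM endu_results AS r
INNER JOIN endu_resultinterest AS ri ON ri.result_id = r.id
ORDER BY r.base_yob;
```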

MySQL Query Optimization for large tables

I have a query that takes 50 seconds
SELECT `security_tasks`.`itemid` AS `itemid`
FROM `security_tasks`
INNER JOIN `relations` ON (`relations`.`user_id` = `security_tasks`.`user_id` AND `relations`.`relation_type_id` = `security_tasks`.`relation_type_id` AND `relations`.`relation_with` = 3001 )
Records in security_tasks = 841321 || Records in relations = 234254
CREATE TABLE `security_tasks` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`itemid` int(11) DEFAULT NULL,
`relation_type_id` int(11) DEFAULT NULL,
`Task_id` int(2) DEFAULT '0',
`job_id` int(2) DEFAULT '0',
`task_type_id` int(2) DEFAULT '0',
`name` int(2) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `itemid` (`itemid`),
KEY `relation_type_id` (`relation_type_id`),
KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1822995 DEFAULT CHARSET=utf8;
CREATE TABLE `relations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`relation_with` int(11) DEFAULT NULL,
`relation_type_id` int(11) DEFAULT NULL,
`manager_level` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`),
KEY `relation_with` (`relation_with`),
KEY `relation_type_id` (`relation_type_id`)
) ENGINE=InnoDB AUTO_INCREMENT=1082882 DEFAULT CHARSET=utf8;
What can I do to make it fast, like 1 or 2 seconds fast?
EXPLAIN :
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE relations ref user_id,relation_with,relation_type_id relation_with 5 const 169 Using where
1 SIMPLE security_tasks ref relation_type_id,user_id user_id 5 transparent.relations.user_id 569 Using where
UPDATE:
Adding composite keys reduced the time to 20 seconds:
ALTER TABLE security_tasks ADD INDEX (user_id, relation_type_id);
ALTER TABLE relations ADD INDEX (user_id, relation_type_id);
ALTER TABLE relations ADD INDEX (relation_with);
The problem occurs when the relations table has a lot of rows for the selected user (relations.relation_with = 3001).
Any ideas?
Adjust your compound index slightly: don't index just two columns, but all three.
ALTER TABLE relations ADD INDEX (user_id, relation_type_id, relation_with)
The index does not have to be limited to the joined columns. It should be based on the joined columns plus anything else that makes sense as querying criteria (within reason; it takes time to learn the finer efficiencies). In the case suggested, you join on the user and the type, but you also filter on a specific relation_with, so that column is added to the same index.
Additionally, on your security_tasks table you could add itemid to the index to make it a covering index, that is, one that covers the join conditions and the data element(s) you want to retrieve. This is a technique worth knowing, but an index should not include every column a query touches; since this is a single column, it might make sense for your scenario. So look into "covering indexes". In essence, the index qualifies the join, and because it also contains itemid, the engine does not have to go back to the raw data pages of the security_tasks table to get that one column. The value comes along for the ride with whatever qualified the join, and you are done.
ALTER TABLE security_tasks ADD INDEX (user_id, relation_type_id, itemid) ;
And for readability purposes, especially with long table names, it's good to use aliases
SELECT
st.itemid
FROM
security_tasks st
INNER JOIN relations r
ON st.user_id = r.user_id
AND st.relation_type_id = r.relation_type_id
AND r.relation_with = 3001
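To check whether the suggested indexes are picked up, the aliased query can be run through EXPLAIN once the two ALTER TABLE statements above have been applied. This is a sketch only; which index the optimizer chooses still depends on the data distribution.

```sql
-- Run after adding the (user_id, relation_type_id, relation_with) index on
-- relations and the (user_id, relation_type_id, itemid) index on security_tasks.
EXPLAIN
SELECT st.itemid
FROM security_tasks AS st
INNER JOIN relations AS r
    ON st.user_id = r.user_id
   AND st.relation_type_id = r.relation_type_id
   AND r.relation_with = 3001;
-- If the covering index is used for security_tasks, its Extra column
-- includes "Using index", meaning the table rows themselves are not read.
```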

Using temporary; using filesort.. slow query

I have a very simple query that I'm trying to optimize; it's taking 2~5 seconds to execute.
This is my CREATE TABLE
CREATE TABLE `artist` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) character set utf8 NOT NULL,
`bio` MEDIUMTEXT character set utf8 DEFAULT NULL,
`hits` INTEGER NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `album` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`artist_id` INTEGER NOT NULL,
`title` VARCHAR(100) character set utf8 NOT NULL,
`year` INTEGER,
`hits` INTEGER NOT NULL,
PRIMARY KEY (`id`),
KEY (`artist_id`)
);
CREATE TABLE `track` (
`id` INTEGER NOT NULL AUTO_INCREMENT,
`name` VARCHAR(100) character set utf8 NOT NULL,
`lyric` MEDIUMTEXT character set utf8,
`album_id` INTEGER NOT NULL,
`hits` INTEGER NOT NULL,
`date` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY (`album_id`)
);
ALTER TABLE `album` ADD FOREIGN KEY (artist_id) REFERENCES `artist` (`id`);
ALTER TABLE `track` ADD FOREIGN KEY (album_id) REFERENCES `album` (`id`);
and this is the query im running
SELECT DISTINCT artist.name, track.name
FROM track
LEFT JOIN album ON track.album_id = album.id
LEFT JOIN artist ON album.artist_id = artist.id
ORDER BY track.hits DESC
LIMIT 5
Explain selects show this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE track ALL NULL NULL NULL NULL 103796 Using temporary; Using filesort
1 SIMPLE album eq_ref PRIMARY PRIMARY 4 lyrics.track.album_id 1
1 SIMPLE artist eq_ref PRIMARY PRIMARY 4 lyrics.album.artist_id 1
I'm new to MySQL, but I guess "Using temporary; Using filesort" is bad and that's why the query is very slow. Can you guys give me a hint here? Thanks!
Update: the main problem here is that the very same song can be in the DB 5 times with different IDs, because the same song can be on different albums. If I don't use DISTINCT, this doesn't happen, but I must use it for this reason.
This answer isn't 100% an answer for the original question. The original question is what came up when searching using the messages from my problem though, so just in case it helps someone else, I'll leave the solution for a problem that is closely related.
The "using temporary; using filesort" was actually a red herring and the index that was added was never getting used. The index was not getting used because one of the joined tables had a different character encoding on it than the other.
Converting all tables in the query so that they all used the same character encoding fixed it instantly.
(In our case converting a utf8 encoded table to a latin1 encoding)
Hope it helps someone.
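For anyone checking for the same mismatch, a sketch of how the character sets could be inspected and aligned. The table names table_a and table_b are hypothetical placeholders, and the conversion target simply mirrors what worked in the case described above; converting utf8 data to latin1 can lose characters that latin1 cannot represent, so try it on a copy first.

```sql
-- List character set and collation for the string columns of the joined tables.
-- table_a / table_b are placeholder names, not from the original question.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('table_a', 'table_b')
  AND CHARACTER_SET_NAME IS NOT NULL;

-- Convert one table so both sides of the join use the same character set.
ALTER TABLE table_a CONVERT TO CHARACTER SET latin1;
```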
You can get it to use an index by adding
create index idx_tracks_on_album_id_name_hits on track(album_id, name, hits);
And since you are doing a DISTINCT across two tables, there will be no index to possibly find the unique rows so it puts it into a temp table to get rid of the duplicates.
I think if you create an index on track.hits, you might get rid of "using temporary; using filesort", the reason for which might be because MySQL cannot find an index to do the sort.
ALTER TABLE `track`
ADD KEY `idx_hits` (`hits`);
Let me know if it worked.
Why do you use DISTINCT? Why do you use LEFT JOIN (instead of JOIN)?

MySQL gurus: Why 2 queries give different 'explain' index use results?

This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ref order_line_id order_line_id 4 Lineitem.id 2 Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ALL order_line_id NULL NULL NULL 3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!
The first EXPLAIN has "Using index" at the end. This means that MySQL was able to find the rows and return the result for the query by just looking at the index, without having to fetch or analyse any row data.
In the second query you add a column (npo_id) that is not part of the order_line_id index. This means that MySQL has to look at the data of the table. I'm not sure why the optimiser chose to do a table scan, but I think it's likely that if the table is fairly small it's easier for it to just read everything than to pick out details for individual rows.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id)
This will allow order_line to find the rows using session_id and then return id, and also allow donations to join on order_line_id and then return the other two columns.
Looking at the auto_increment values, I assume that there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan, and it's good practice to put some sample data in there to test things out. For more detail have a look at this blog post I made some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/
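A quick way to test the suggestion, sketched under the assumption that the two indexes above have been added: refresh the table statistics and re-run EXPLAIN on the second query. With only a handful of rows the optimiser may still prefer a scan, so the result on a tiny table is not conclusive.

```sql
-- Refresh index statistics so the optimiser works from current cardinality.
ANALYZE TABLE order_line, donations;

-- Re-check the plan for the query that includes npo_id.
EXPLAIN
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
       `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
    ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1';
-- "Using index" in Extra for Donation would indicate that the
-- (order_line_id, npo_id, id) index is covering the query.
```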