Quite a simple question, but I'm a little lost when it comes to SQL optimization and indexes; I'm still learning.
Query
SELECT A.*, count(A.ID) as count
FROM tableB B
JOIN tableA A ON A.ID = B.ID
WHERE B.otherID=xx and B.value='test' and B.languageID=3
Table A
CREATE TABLE `tableA` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`info1` varchar(64) NOT NULL default '',
`info2` varchar(64) NOT NULL default '',
PRIMARY KEY (`ID`)
) TYPE=MyISAM
Table B
CREATE TABLE `tableB` (
`ID` int(11) NOT NULL default '0',
`otherID` int(11) NOT NULL default '0',
`value` varchar(64) NOT NULL default '',
`languageID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`,`otherID`,`languageID`)
) TYPE=MyISAM
So the query is quite simple: I'm looking for the rows in table B with a specific otherID and value, and I join on table A because I need some of the information stored there.
I guess the query itself can't be optimized, but maybe I can speed things up by creating an index, perhaps on (B.otherID, B.value)?
Thanks for your insights!
Normally the name ID is used for the PRIMARY KEY. A PRIMARY KEY is necessarily Unique. Yet you say
PRIMARY KEY (`ID`,`otherID`,`languageID`)
Is ID not unique, but this triple is? (Just checking.)
Back to your question...
WHERE B.otherID=xx and B.value='test' and B.languageID=3
Says that B needs those 3 columns in a composite index in any order. With that, the Optimizer will start with B, quickly find the row(s) needed there. Then it will move over to A, which already has an index on ID to handle ON A.ID = B.ID.
My Cookbook on creating indexes.
The normal pattern is COUNT(*). COUNT(x) has the extra burden of checking all the x values for being not NULL. (I suspect you did not need that.)
Use InnoDB, not MyISAM.
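A minimal sketch of the composite index described above. The index name idx_other_value_lang is made up for illustration, and 42 is only a placeholder for the xx in the question:

```sql
-- Composite index covering all three columns tested in the WHERE clause.
-- The index name is hypothetical; since all three tests are equalities,
-- the column order does not matter for this query.
ALTER TABLE tableB
  ADD INDEX idx_other_value_lang (otherID, value, languageID);

-- Check the plan: B should be read first through the new index,
-- then A through its PRIMARY KEY.
EXPLAIN
SELECT A.*
FROM tableB B
JOIN tableA A ON A.ID = B.ID
WHERE B.otherID = 42          -- placeholder for xx
  AND B.value = 'test'
  AND B.languageID = 3;

-- If only the number of matching rows is needed, COUNT(*) is enough.
SELECT COUNT(*)
FROM tableB B
WHERE B.otherID = 42          -- placeholder for xx
  AND B.value = 'test'
  AND B.languageID = 3;
```

If EXPLAIN still shows a full scan of tableB, it is worth checking that the compared values have the same types as the columns.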
Of these two schema designs for user matching, which would work best with big data?
Solution 1:
CREATE TABLE `user_matches` (
`user_id_1` int(11) NOT NULL,
`user_id_2` int(11) NOT NULL,
`like_user_1` tinyint(1) DEFAULT '0',
`like_user_2` tinyint(1) DEFAULT '0',
PRIMARY KEY (`user_id_1`,`user_id_2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
To select the matching among two users I should write this query:
SELECT *
FROM user_matches
WHERE (user_id_1 = 123 OR user_id_2 = 123) AND (like_user_1 = 1 AND like_user_2 = 1)
PS: Imagine that like_user_1 and like_user_2 are both indexed
Solution 2:
CREATE TABLE `user_matches` (
`user_id` int(11) NOT NULL,
`user_id_liked` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`user_id_liked`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
To select the matching among users I should write this query:
SELECT me.user_id_liked
FROM user_matches me
INNER JOIN user_matches you ON me.user_id = you.user_id_liked
AND you.user_id = me.user_id_liked
AND me.user_id = 123
I think the 2nd solution is better for both the schema and the querying, because the FROM and JOIN clauses are executed before the WHERE clause; on the other hand, the first solution doesn't need a join at all.
If you index like_user_1 and like_user_2 in solution 1, the queries should be fast.
I would test both solutions and compare execution plans.
EXPLAIN and EXPLAIN EXTENDED will be useful.
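As a rough sketch of that comparison: both CREATE TABLE statements in the question use the name user_matches, so this assumes the two designs were loaded side by side under the hypothetical names user_matches_v1 and user_matches_v2, with the same test data.

```sql
-- Hypothetical names: user_matches_v1 holds the Solution 1 design,
-- user_matches_v2 holds the Solution 2 design.

-- Plan for Solution 1 (single table, OR across the two id columns).
EXPLAIN
SELECT *
FROM user_matches_v1
WHERE (user_id_1 = 123 OR user_id_2 = 123)
  AND like_user_1 = 1
  AND like_user_2 = 1;

-- Plan for Solution 2 (self-join on the composite PRIMARY KEY).
EXPLAIN
SELECT me.user_id_liked
FROM user_matches_v2 me
INNER JOIN user_matches_v2 you
        ON me.user_id = you.user_id_liked
       AND you.user_id = me.user_id_liked
WHERE me.user_id = 123;
```

The columns to compare are mainly type, key and rows; the row estimates only mean something if the test tables hold a realistic amount of data.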
I am optimising my queries and found something I can't get my head around.
I am using the following query to select a bunch of categories, combining them with an alias from a table containing old and new aliases for categories:
SELECT `c`.`id` AS `category.id`,
(SELECT `alias`
FROM `aliases`
WHERE category_id = c.id
AND `old` = 0
AND `lang_id` = 1
ORDER BY `id` DESC
LIMIT 1) AS `category.alias`
FROM (`categories` AS c)
WHERE `c`.`status` = 1 AND `c`.`parent_id` = '11';
There are only 2 categories with a value of 11 for parent_id, so it should look up 2 categories from the alias table.
Still if I use EXPLAIN it says it has to process 48 rows. The alias table contains 1 entry per category as well (in this case, it can be more). Everything is indexed and if I understand correctly therefore it should find the correct alias immediately.
Now here's the weird thing. When I don't compare the aliases by the categories from the conditions, but manually by the category ids the query returns, it does process only 1 row, as intended with the index.
So I replace WHERE category_id = c.id by WHERE category_id IN (37, 43) and the query gets faster:
The only thing I can think of is that the subquery isn't run over the results from the query but before some filtering is done. Any kind of explanation or help is welcome!
Edit: silly me, the WHERE IN doesn't work as it doesn't make a unique selection. The question still stands though!
Create table schema
CREATE TABLE `aliases` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lang_id` int(2) unsigned NOT NULL DEFAULT '1',
`alias` varchar(255) DEFAULT NULL,
`product_id` int(10) unsigned DEFAULT NULL,
`category_id` int(10) unsigned DEFAULT NULL,
`brand_id` int(10) unsigned DEFAULT NULL,
`page_id` int(10) unsigned DEFAULT NULL,
`campaign_id` int(10) unsigned DEFAULT NULL,
`old` tinyint(1) unsigned DEFAULT '0',
PRIMARY KEY (`id`),
KEY `product_id` (`product_id`),
KEY `category_id` (`category_id`),
KEY `page_id` (`page_id`),
KEY `alias_product_id` (`product_id`,`alias`),
KEY `alias_category_id` (`category_id`,`alias`),
KEY `alias_page_id` (`page_id`,`alias`),
KEY `alias_brand_id` (`brand_id`,`alias`),
KEY `alias_product_id_old` (`alias`,`product_id`,`old`),
KEY `alias_category_id_old` (`alias`,`category_id`,`old`),
KEY `alias_brand_id_old` (`alias`,`brand_id`,`old`),
KEY `alias_page_id_old` (`alias`,`page_id`,`old`),
KEY `lang_brand_old` (`lang_id`,`brand_id`,`old`),
KEY `id_category_id_lang_id_old` (`lang_id`,`old`,`id`,`category_id`)
) ENGINE=InnoDB AUTO_INCREMENT=112392 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
SELECT ...
WHERE x=1 AND y=2
ORDER BY id DESC
LIMIT 1
will be performed in one of several ways.
Since you have not shown us the indexes you have (SHOW CREATE TABLE), I will cover some likely cases...
INDEX(x, y, id) -- This can find the last row for that condition, so it does not need to look at more than one row.
Some other index, or no index: Scan DESCending from the last id checking each row for x=1 AND y=2, stopping when (if) such a row is found.
Some other index, or no index: Scan the entire table, checking each row for x=1 AND y=2; collect them into a temp table; sort by id; deliver one row.
Some of the EXPLAIN clues:
Using where -- does not say much
Using filesort -- it did a sort, apparently for the ORDER BY. (It may have been entirely done in RAM; ignore 'file'.)
Using index condition (not "Using index") -- this indicates an internal optimization in which it can check the WHERE clause more efficiently than it used to in older versions.
Do not trust the "Rows" in EXPLAIN. Often they are reasonably correct, but sometimes they are off by orders of magnitude. Here is a better way to see "how much work" is being done in a rather fast query:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
With the CREATE TABLE, I may have suggestions on how to improve the index.
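For what it's worth, here is a sketch of how the INDEX(x, y, id) pattern above could be applied to the aliases table from the question. The index name is made up, and whether the optimizer actually picks it has to be checked on the real data:

```sql
-- Equality columns first (in any order), then id so that
-- ORDER BY id DESC LIMIT 1 can stop at the first matching entry.
-- The index name is hypothetical.
ALTER TABLE aliases
  ADD INDEX idx_cat_lang_old_id (category_id, lang_id, old, id);

-- Measure the real work done instead of trusting the "rows" estimate.
FLUSH STATUS;

SELECT `c`.`id` AS `category.id`,
       (SELECT `alias`
          FROM `aliases`
         WHERE category_id = c.id
           AND `old` = 0
           AND `lang_id` = 1
         ORDER BY `id` DESC
         LIMIT 1) AS `category.alias`
FROM `categories` AS c
WHERE `c`.`status` = 1
  AND `c`.`parent_id` = 11;

SHOW SESSION STATUS LIKE 'Handler%';
```

Small Handler_read counts, roughly in line with the number of categories returned, would suggest the subquery is no longer scanning.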
I need to fill the location field in users table with a country name from geoip table, depending on the user's IP.
Here is the tables' CREATE code.
CREATE TABLE `geoip` (
`IP_FROM` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
`IP_TO` INT(10) UNSIGNED ZEROFILL NOT NULL DEFAULT '0000000000',
`COUNTRY_NAME` VARCHAR(50) NOT NULL DEFAULT '',
PRIMARY KEY (`IP_FROM`, `IP_TO`)
)
ENGINE=InnoDB;
CREATE TABLE `users` (
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`login` VARCHAR(25) NOT NULL DEFAULT '',
`password` VARCHAR(64) NOT NULL DEFAULT '',
`ip` VARCHAR(128) NULL DEFAULT '',
`location` VARCHAR(128) NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE INDEX `login` (`login`),
INDEX `ip` (`ip`(10))
)
ENGINE=InnoDB
ROW_FORMAT=DYNAMIC;
The update query I try to run is:
UPDATE users u
SET u.location =
(SELECT COUNTRY_NAME FROM geoip WHERE INET_ATON(u.ip) BETWEEN IP_FROM AND IP_TO)
The problem is that this query refuses to use PRIMARY index on the geoip table, though it would speed things up a lot. The EXPLAIN gives me:
id  select_type         table  type   possible_keys  key      key_len  ref   rows     Extra
1   PRIMARY             u      index  NULL           PRIMARY  4        NULL  1254395
2   DEPENDENT SUBQUERY  geoip  ALL    PRIMARY        NULL     NULL     NULL  62271    Using where
I've ended up converting geoip table to the MEMORY engine for this query only, but I'd like to know what was the right way to do it.
UPDATE
The DBMS I'm using is MariaDB 10.0.17, if it could make a difference.
Did you try to force the index, like this?
UPDATE users u
SET u.location =
(SELECT COUNTRY_NAME FROM geoip FORCE INDEX (PRIMARY)
WHERE INET_ATON(u.ip) BETWEEN IP_FROM AND IP_TO)
Also, since ip can be NULL, that is probably interfering with index optimization.
The IP ranges are non-overlapping, correct? You are not getting any IPv6 addresses? (The world ran out of IPv4 a couple of years ago.)
No, the index won't be used, or at least won't perform as well as you would like. So, I have devised a scheme to solve that. However it requires reformulating the schema and building a Stored Routine. See my IP-ranges blog; It has links to code for IPv4 and for IPv6. It will usually touch only one row in the table, not have to scan half the table.
Edit
MySQL does not know that there is only one range (from/to) that should match. So, it scans far too much. The difference between the two encodings of IP (INT UNSIGNED vs VARCHAR) makes it difficult to use a JOIN (instead of a subquery). Alas a JOIN would not be any better because it does not understand that there is exactly one match. Give this a try:
UPDATE users u
SET u.location =
( SELECT COUNTRY_NAME
FROM geoip
WHERE INET_ATON(u.ip) BETWEEN IP_FROM AND IP_TO
LIMIT 1 -- added
)
If that fails to significantly improve the speed, then change from VARCHAR to INT UNSIGNED in users and try again (without INET_ATON).
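A sketch of that second step, under assumptions: ip_num is a hypothetical new column (the original ip column is left untouched), all stored addresses are IPv4, and INET_ATON() is expected to give NULL for strings it cannot parse, which should be verified on your server version.

```sql
-- ip_num is a hypothetical column holding the numeric form of users.ip.
ALTER TABLE users
  ADD COLUMN ip_num INT UNSIGNED NULL;

-- One-time conversion from the dotted string to an integer.
UPDATE users
SET ip_num = INET_ATON(ip);

-- Same lookup as above, now comparing integers directly (no INET_ATON).
UPDATE users u
SET u.location =
    ( SELECT COUNTRY_NAME
        FROM geoip
       WHERE u.ip_num BETWEEN IP_FROM AND IP_TO
       LIMIT 1
    );
```

Running EXPLAIN on the final UPDATE (or on the equivalent SELECT) before and after the change shows whether the PRIMARY index on geoip is now considered.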
Having some real issues with a few queries, this one in particular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query; it takes about 3 seconds:
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id  select_type  table  type    possible_keys    key      key_len  ref                     rows    Extra
1   SIMPLE       r      ALL     r_parent         NULL     NULL     NULL                    201133  Using temporary; Using filesort
1   SIMPLE       g      eq_ref  PRIMARY,site_id  PRIMARY  4        engine_comp.r.r_parent  1       Using where
I am just trying to grab the 15 most viewed games, then grab a single review for each game (it doesn't really matter which; I guess the highest rated, by r_score, would be ideal).
Can someone help me figure out why this is so horribly inefficient?
I don't understand the purpose of GROUP BY g_name in your query, but it makes MySQL perform an aggregation over the selected columns, which here means all columns from both tables. Please try removing it and check whether that helps.
Also, RIGHT JOIN makes the database query tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here. Please try changing the join type.
If neither of those options helps, you need to redesign your query. Since you need the 15 most viewed games for the site, the first query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such a construct will force the database to execute the first query first (sorry for the tautology) and will give you the maximum score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.
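One caveat when combining the two queries into a single statement: as far as I know, MySQL rejects LIMIT inside an IN (...) subquery, so the first query cannot simply be pasted into the IN list. A sketch of the same idea using a derived table instead; the index name is hypothetical and the extra index is only a suggestion to test:

```sql
-- Optional, hypothetical index: lets the inner query filter on site_id
-- and read the rows already ordered by g_impressions.
ALTER TABLE tgmp_games
  ADD INDEX idx_site_impressions (site_id, g_impressions);

-- Pick the 15 most viewed games first, then look up reviews only for them.
SELECT g.g_id,
       g.g_name,
       g.g_impressions,
       MAX(r.r_score) AS best_score   -- NULL when a game has no reviews
FROM ( SELECT g_id, g_name, g_impressions
         FROM tgmp_games
        WHERE site_id = 34
        ORDER BY g_impressions DESC
        LIMIT 15
     ) AS g
LEFT JOIN tgmp_reviews AS r
       ON r.r_parent = g.g_id
GROUP BY g.g_id, g.g_name, g.g_impressions
ORDER BY g.g_impressions DESC;
```

This returns the best score per game rather than a full review row; fetching the matching review text would take one more join back to tgmp_reviews.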
Your MyISAM table is small, so you can try converting it to InnoDB to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.
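Both suggestions in a short sketch, using the table names from the question. Try the conversion on a copy first if anything depends on MyISAM behaviour:

```sql
-- Convert the MyISAM table so that both tables use the same engine.
ALTER TABLE tgmp_games ENGINE=InnoDB;

-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE tgmp_games, tgmp_reviews;

-- Then re-check the plan of the slow query.
EXPLAIN
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15;
```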
This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id  select_type  table     type  possible_keys  key            key_len  ref          rows  Extra
1   SIMPLE       Lineitem  ref   session_id     session_id     97       const        1     Using where; Using index
1   SIMPLE       Donation  ref   order_line_id  order_line_id  4        Lineitem.id  2     Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id  select_type  table     type  possible_keys  key         key_len  ref    rows  Extra
1   SIMPLE       Lineitem  ref   session_id     session_id  97       const  1     Using where; Using index
1   SIMPLE       Donation  ALL   order_line_id  NULL        NULL     NULL   3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!
The first EXPLAIN has "Using index" at the end. This means that MySQL was able to find the rows and return the result for the query just by looking at the index, without having to fetch or analyse any row data.
In the second query you add a column (npo_id) that is not part of the index used for the join. This means that MySQL has to look at the table data itself. I'm not sure why the optimiser chose to do a table scan, but I think it's likely that when the table is fairly small it's cheaper to just read everything than to pick out individual rows.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id);
This will allow order_line to find the rows using session_id and then return id, and also allow donations to join on order_line_id and then return the other two columns.
Looking at the auto_increment values, I assume there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan, and it's good practice to put some sample data in there to test things out. For more detail, have a look at this blog post I wrote some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/
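A quick way to check this after loading sample data, sketched with the table names from the question:

```sql
-- Refresh the statistics once the sample data is in place.
ANALYZE TABLE donations, order_line;

-- The Cardinality column shows the estimated number of distinct
-- values per index, which is what the optimiser bases its choice on.
SHOW INDEX FROM donations;
SHOW INDEX FROM order_line;

-- Re-run the plan for the query that included npo_id.
EXPLAIN
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
       `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
       ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1';
```

With only a handful of rows the plan may still show a full scan of donations, which is not necessarily a problem at that size.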