How to optimize SELECT against a constant value? - mysql

Because I am using a data structure beyond my control, there is a table in my DB which will potentially have millions of Foreign-key => (key => value) pairs. Now, I know that one of the keys will be a certain value (in this case the key is related_content). Is it possible for MySQL to optimize the query so that it does not have to search the entire table for results?
Example table (called meta):
fk | key | value
====================================
1 | 'related_content' | '[2,3,4]'
1 | 'condiment' | 'mayo'
1 | 'condiment' | 'bananas'
29 | 'condiment' | 'ketchup'
29 | 'related_content' | '[1,7,9]'
95 | 'condiment' | 'mustard'
95 | 'related_content' | '[5,6,8]'
Example query:
SELECT `value` FROM meta WHERE fk = 29 AND `key` = 'related_content';
What I would like to do is:
ALTER TABLE `meta` ADD INDEX `meta_related` ON (`key`) WHERE `key` = 'related_content';
(Before anyone asks, the key column already has an index on it)

Add a 'composite' index:
INDEX(fk, `key`)
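A minimal sketch of that suggestion against the meta table from the question. The index name meta_fk_key is an assumption, and `key` is quoted with backticks because KEY is a reserved word in MySQL.

```sql
-- Composite index: equality on fk narrows to one foreign key,
-- equality on `key` narrows to the wanted entry within it.
ALTER TABLE meta ADD INDEX meta_fk_key (fk, `key`);

-- The query from the question, unchanged apart from the quoting.
SELECT `value` FROM meta WHERE fk = 29 AND `key` = 'related_content';

-- Check which index the optimizer actually picks.
EXPLAIN SELECT `value` FROM meta WHERE fk = 29 AND `key` = 'related_content';
```

If `key` turns out to be a long text column, the index may need a prefix length, which depends on a column definition the question does not show.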

What if you add a key_index (INT(11)) column that represents the string in the key column as an integer, e.g. related_content = 1, condiment = 2, and put an index on it? An index on an integer will be faster than one on a string, so in the end you select using two indexed integer fields. It's going to be fast.
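Roughly what that could look like, as a sketch only. The column key_index and the 1/2 mapping come from the answer above; the index name and the choice to index (fk, key_index) together rather than key_index alone are assumptions.

```sql
-- Hypothetical integer stand-in for the string key.
ALTER TABLE meta ADD COLUMN key_index INT NULL;

-- Backfill using the mapping suggested above; other keys stay NULL.
UPDATE meta
SET key_index = CASE `key`
    WHEN 'related_content' THEN 1
    WHEN 'condiment' THEN 2
END;

-- Both lookup columns are now integers in one index.
ALTER TABLE meta ADD INDEX meta_fk_key_index (fk, key_index);

SELECT `value` FROM meta WHERE fk = 29 AND key_index = 1;
```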

Related

Is there a general rule for where in the order of a primary-key index to place the partition key?

Assume that I properly query the partition key in every query. Is there any sensible reason to place the partition key anywhere but first in line?
I feel like there's something I'm not understanding about how the index works. Assume MySQL and InnoDB.
I think I get that, ordinarily, you place the most selective keys first and the less selective ones later. And the partition key would ordinarily be one of the less selective ones. But if the partition key is included in every query, what difference does it make to include the partition key first? Wouldn't this help in other ways, too? E.g., I won't have to include the partition key in every index if it's up front in the primary-key index: queries using other indexes can borrow the primary key from the primary-key index consistent with the leftmost-key constraint.
And I don't know if an index itself is ever partitioned but it seems like it could be if it's a covering index. (Am I right?) If so, the partition key would have to be first, no, for the partitions to work?
E.g.:
CREATE TABLE `fee` (
`fi` INT ,
`fo` INT ,
PRIMARY KEY ( `fi` , `fo` )
) ENGINE = INNODB
PARTITION BY RANGE ( `fi` ) (
. . .
);
Or . . .
CREATE TABLE `fee` (
`fi` INT ,
`fo` INT ,
PRIMARY KEY ( `fo` , `fi` )
) ENGINE = INNODB
PARTITION BY RANGE ( `fi` ) (
. . .
);
Which, if either, is inherently better, and why or why not?
Thank you for your time.
The selectivity of the two columns doesn't matter as much as some people think.
If you were to query the table as:
SELECT ... FROM fee WHERE fi=? AND fo=?
Then what does it matter if it searches the B-tree by fi,fo or by fo,fi? It'll find the same record in the end, and it'll take roughly the same number of steps to do that. There's a theoretical difference, but in most cases it won't make a significant difference.
What's more important is if you have queries that only search for one or the other column of the primary key.
You mentioned that all queries search on the partition column, that's fi in this example. Do you have any queries that search on fi but not fo?
SELECT ... FROM fee WHERE fi=?
If fi were the first column of the primary key, this would do partition-pruning, and also use the PRIMARY KEY index because your search term is on the first column.
mysql> explain partitions select * from fee where fi = 175;
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| 1 | SIMPLE | fee | p2 | ref | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+---------+---------+-------+------+----------+-------+
Whereas if fi were the second column of the primary key, then it could do partition-pruning, but not use the index.
mysql> explain partitions select * from fee where fi = 175;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | fee | p2 | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
Indexes are also partitioned. Think of partitioning as a series of completely separate tables, with the same columns and same indexes, just a subset of the rows. Once the query determines which partition to read, it does the query the same way it would against a non-partitioned table, choosing an index based on the query criteria. Will it use the primary key to search?
mysql> explain partitions select * from fee where fi = 175 and created_at < now();
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | fee | p2 | range | created_at | created_at | 6 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+-------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
Here we see the condition on fi resulted in partition pruning, and yet the index on created_at was preferred by the optimizer. It searches that index in the respective partition.
"you place the most selective keys first and the less selective ones later" -- No. That is an old wives tale.
Putting keys that are tested with '=' first is a simpler and more important rule.
Think of a composite InnoDB BTree index as working this way. Concatenate all the columns together, then picture the BTree as having a single string as the key.
Putting the "partition key" first in an index is the least useful place! You are already pruning on that; having it in the index is actually redundant. However, it is necessary for any Unique key (that includes the PRIMARY KEY).
Yes, you correctly observed that the PK columns are implicitly included in every secondary key, hence the partition key is included.
Note that if the partition key is not really part of a desired UNIQUE key, then the uniqueness constraint is not possible (in MySQL). However, the tacked-on PK is not part of the uniqueness constraint. Since MySQL is only willing to check uniqueness for one partition, you must include the partition key to also provide the semantics that states "Unique" across the entire table. (Yeah, it is a bit convoluted; live with it.)
In your example, if you do SELECT .. WHERE fi BETWEEN 1 and 2 AND fo=3, any index (the PK is an index) starting with fi would work harder than if fo were first in the index.
So, a Rule of Thumb is to move the partition key to the end of any index that includes it. (I have seen only one rare exception; I forget the details.)
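The rule of thumb can be tried out with a sketch like this. The table name fee_demo and the partition boundaries are made up for illustration; only the column names and the query shape come from the discussion above.

```sql
-- Partition key fi sits last in the primary key, as the rule of thumb suggests.
CREATE TABLE fee_demo (
    fi INT NOT NULL,
    fo INT NOT NULL,
    PRIMARY KEY (fo, fi)
) ENGINE=InnoDB
PARTITION BY RANGE (fi) (
    PARTITION p0 VALUES LESS THAN (100),
    PARTITION p1 VALUES LESS THAN (200),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

-- Pruning handles the range on fi; the index starts with the '=' column fo.
-- On older servers use EXPLAIN PARTITIONS to see the partitions column.
EXPLAIN SELECT * FROM fee_demo WHERE fi BETWEEN 1 AND 2 AND fo = 3;
```

Swapping the key to PRIMARY KEY (fi, fo) and rerunning the EXPLAIN is a quick way to compare the two orderings on your own data.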

Reduce number of joins in mysql

I have 12 fixed tables (group, local, element, sub_element, service, ...), each table with different numbers of rows.
The 'id_' column in each table is the primary key (int). The other columns are of datatype varchar(20). The maximum number of rows in any of these tables is 300.
Each table was created in this way:
CREATE TABLE `group`
(
id_G int NOT NULL,
name_group varchar(20) NOT NULL,
PRIMARY KEY (id_G)
);
|........GROUP......| |.......LOCAL.......| |.......SERVICE.......|
| id_G | name_group | | id_L | name_local | | id_S | name_service |
+------+------------+ +------+------------+ +------+--------------+
| 1 | group1 | | 1 | local1 | | 1 | service1 |
| 2 | group2 | | 2 | local2 | | 2 | service2 |
And I have one table that combines all these tables, depending on what the user selects.
The 'id_' values from the fixed tables selected by the user are recorded in this table.
This table was created in this way:
CREATE TABLE event
(
id_E int NOT NULL,
event_name varchar(20) NOT NULL,
id_G int NOT NULL,
id_L int NOT NULL,
...
PRIMARY KEY (id_E)
);
The table (event) looks like this:
|....................EVENT.....................|
| id_E | event_name | id_G | id_L | ... |id_S |
+------+-------------+------+------+-----+-----+
| 1 | mater1 | 1 | 1 | ... | 3 |
| 2 | master2 | 2 | 2 | ... | 6 |
This table grows every day, and it now has thousands of rows.
Column id_E is the primary key (int); event_name is varchar(20).
In addition to the id_E and event_name columns, this table has 12 other columns that come from the fixed tables.
Every time I need to retrieve information from the event table in a readable form, I need to do about 12 joins.
My query looks like this when I need to retrieve all columns from the event table:
SELECT event_name, name_group, name_local ..., name_service
FROM event
INNER JOIN `group` on event.id_G = `group`.id_G
INNER JOIN local on event.id_L = local.id_L
...
INNER JOIN service on event.id_S = service.id_S
WHERE event.id_S = 7 -- for example
This slows down my system. Is there a way to reduce the number of joins? I've heard about using natural keys, but I don't think that is a good idea in my case, considering future maintenance.
My queries take about 7 seconds and I need to reduce this time.
I changed the WHERE clause and it had no effect, so I am sure the problem is that the query has so many joins.
Could someone help? Thanks a lot.
MySQL has a great keyword, STRAIGHT_JOIN, which might be what you are looking for. First, I have to assume that each of your lookup tables (id/description) already has an index on the ID column, since that is the primary key.
Your event table is the one you are querying as the primary basis of the details and joining to the lookups per their respective IDs. As long as your WHERE clause applicable to the EVENT table is optimized, such as the ID you are looking for, it SHOULD be virtually instantaneous.
If it is not, then it might be that MySQL is trying to think for you and is taking one of the secondary lookup tables as the primary basis of the query for whatever reason, such as a much lower record count. In this case, add the keyword and try it:
SELECT STRAIGHT_JOIN ... rest of your query
This tells MySQL to do the query in the order you gave it, thus the Event table first with its WHERE clause on the ID. It should find that one thing, then grab all the corresponding lookup descriptions from the other tables.
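Spelled out against the tables from the question, that might look like the sketch below. Only three of the twelve lookup joins are shown, and the index on event.id_S is an assumption about what "optimized" means for this WHERE clause, not something the question confirms exists.

```sql
-- Hypothetical index so the filter on the event table is cheap.
ALTER TABLE event ADD INDEX event_id_S (id_S);

-- STRAIGHT_JOIN forces the join order as written: event first, then lookups.
-- `group` needs backticks because GROUP is a reserved word.
SELECT STRAIGHT_JOIN
    event.event_name,
    `group`.name_group,
    local.name_local,
    service.name_service
FROM event
INNER JOIN `group` ON event.id_G = `group`.id_G
INNER JOIN local   ON event.id_L = local.id_L
INNER JOIN service ON event.id_S = service.id_S
WHERE event.id_S = 7;
```

Running EXPLAIN on the query with and without STRAIGHT_JOIN shows whether the optimizer was in fact starting from a lookup table.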
Create indexes, specifically compound indexes. For instance, start by creating a compound index for event and group:
on the event table, create one on (event id, group id);
then, on the group table, create another one for the next relation (group id, local id);
on local, do the same with service, and so on.

update if two fields exists, insert if not (MySQL)

This isn't an (exact) duplicate of this question, so I've started a new one.
I have this table (ID is primary and auto increment)
ID | mykey | myfoo | mybar
============================
1 | 1.1 | abc | 123
2 | 1.1.1 | def | 456
3 | 1.2 | abc | 789
4 | 1.1 | ghi | 999
I would like to UPDATE row 1 with mybar = "333" only if mykey = '1.1' AND myfoo = 'abc'.
If either mykey != '1.1' OR myfoo != 'abc', I would like to INSERT a new row.
Is this possible with one statement?
A unique index in MySQL does not have to be on a single column. You can add a UNIQUE index on multiple columns simply by specifying more columns in your ALTER TABLE..ADD UNIQUE statement:
ALTER TABLE myTable ADD UNIQUE (
mykey,
myfoo
);
Now you can use a regular INSERT INTO ... ON DUPLICATE KEY UPDATE statement.
SQLFiddle DEMO (note that the multiple repeated values are not added - all others are)
Note:
If either column is NULL, the row will never be treated as a duplicate. A row with mykey being 'bar' and myfoo being NULL could be added over and over, even though the rows have the "same" values (NULL isn't really a value).
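With the unique index on (mykey, myfoo) in place, the statement from the question could be sketched as follows. The table name myTable is taken from the ALTER TABLE above; the values are the ones from the question.

```sql
-- If a row with mykey = '1.1' AND myfoo = 'abc' exists, only mybar is updated;
-- otherwise a new row is inserted and ID is auto-generated.
INSERT INTO myTable (mykey, myfoo, mybar)
VALUES ('1.1', 'abc', '333')
ON DUPLICATE KEY UPDATE mybar = VALUES(mybar);
-- Note: VALUES() in this position is deprecated in recent MySQL 8.0 releases
-- in favour of a row alias, but it is still accepted.
```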

Can you tell what is wrong with this query?

Hi all,
I am using a single database with about 7 tables. All of the tables are filled with data,
about 10k rows as of now. That will grow and may eventually reach millions, but that will take time.
My question is why my query is slow at fetching results. It takes about 10 to 12 seconds per query with no load. I am worried about what happens under load, say thousands of queries at one time.
Here is my sample query:
$result = $db->sql_query("SELECT * FROM table1,table2,table3,table4,table5 WHERE table1.url = table2.url AND table1.url = table3.url AND table1.url = table4.url AND table1.url = table5.url AND table1.url='".$uri."'")or die(mysql_error());
$row = $db->sql_fetchrow($result);
$daysA = $row['regtime'];
$days = (strtotime(date("Y-m-d")) - strtotime($row['regtime'])) / (60 * 60 * 24);
if($row > 0 && $days < 2){
$row['data'];
$row['data1'];
//remaining
}else{ /* some code */ }
I'm not sure if you have resolved the problem or not, but here's some test data that I have produced. There are a number of factors that can affect the speed of your queries, so my simple test cases may not accurately reflect your tables or data. However, they serve as a useful starting point.
First, create 5 simple tables, each with the same structure. As with your tables, I have used a UNIQUE index on the url column:
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`url` varchar(255) default NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`)
) ENGINE=InnoDB;
CREATE TABLE table2 LIKE table1;
CREATE TABLE table3 LIKE table1;
CREATE TABLE table4 LIKE table1;
CREATE TABLE table5 LIKE table1;
The following script creates a stored procedure which is used to fill each table with roughly 10,000 rows of data (9,995, since the counter runs from 5 to 9,999):
DELIMITER //
DROP PROCEDURE IF EXISTS test.autofill//
CREATE PROCEDURE test.autofill()
BEGIN
DECLARE i INT DEFAULT 5;
WHILE i < 10000 DO
INSERT INTO table1 (url) VALUES (CONCAT('wwww.stackoverflow.com/', i ));
INSERT INTO table2 (url) VALUES (CONCAT('wwww.stackoverflow.com/', 10000 - i ));
INSERT INTO table3 (url) VALUES (CONCAT('wwww.stackoverflow.com/', i + 6000 ));
INSERT INTO table4 (url) VALUES (CONCAT('wwww.stackoverflow.com/', i + 3000 ));
INSERT INTO table5 (url) VALUES (CONCAT('wwww.stackoverflow.com/', i + 2000 ));
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL test.autofill();
Each table now contains roughly 10,000 rows. Your SELECT statement can now be used to query the data:
SELECT *
FROM table1,table2,table3,table4,table5
WHERE table1.url = table2.url
AND table1.url = table3.url
AND table1.url = table4.url
AND table1.url = table5.url
AND table1.url = 'wwww.stackoverflow.com/8000';
This gives the following result almost instantly:
+------+-----------------------------+------+-----------------------------+------+-----------------------------+------+-----------------------------+------+-----------------------------+
| id | url | id | url | id | url | id | url | id | url |
+------+-----------------------------+------+-----------------------------+------+-----------------------------+------+-----------------------------+------+-----------------------------+
| 7996 | wwww.stackoverflow.com/8000 | 1996 | wwww.stackoverflow.com/8000 | 1996 | wwww.stackoverflow.com/8000 | 4996 | wwww.stackoverflow.com/8000 | 5996 | wwww.stackoverflow.com/8000 |
+------+-----------------------------+------+-----------------------------+------+-----------------------------+------+-----------------------------+------+-----------------------------+
An EXPLAIN SELECT shows why the query is very fast:
EXPLAIN SELECT *
FROM table1,table2,table3,table4,table5
WHERE table1.url = table2.url
AND table1.url = table3.url
AND table1.url = table4.url
AND table1.url = table5.url
AND table1.url = 'wwww.stackoverflow.com/8000';
+----+-------------+--------+-------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+-------+------+-------------+
| 1 | SIMPLE | table1 | const | url | url | 258 | const | 1 | Using index |
| 1 | SIMPLE | table2 | const | url | url | 258 | const | 1 | Using index |
| 1 | SIMPLE | table3 | const | url | url | 258 | const | 1 | Using index |
| 1 | SIMPLE | table4 | const | url | url | 258 | const | 1 | Using index |
| 1 | SIMPLE | table5 | const | url | url | 258 | const | 1 | Using index |
+----+-------------+--------+-------+---------------+------+---------+-------+------+-------------+
select_type is SIMPLE, which means that the SELECT uses no UNION or subqueries to slow things down.
type is const, which means that the table has at most one possible match - this is thanks to the UNIQUE index, which guarantees no two URLs will be the same (see mysql 5.0 indexes - Unique vs Non Unique for a good description of UNIQUE INDEX). A const value in the type column is about as good as you can get.
possible_keys and key use the url key. That means that the correct index is being used for each table.
ref is const, which means that MySQL is comparing a constant value (one that does not change) with the index. Again, this is very fast.
rows equals 1. MySQL only needs to look at one row from each table. Once again, this is very fast.
Extra is Using index. MySQL can answer the query from the index alone and does not have to read the actual table rows.
Provided you have an index on the url column of each table, your query should be extremely fast.
It definitely looks like an index on the url field in each table is the way to go.
It sounds likely that some of the columns in your WHERE clause are not indexed. Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows.
You might find EXPLAIN helpful in analyzing your queries.
Look up JOINs and especially look at the difference between INNER JOINS, LEFT JOINS and OUTER JOINS. Also INDEX all the fields on which you are going to do a lookup.
Probably something wrong with your indexes!
In any case, long character strings like URLs make for poorly performing primary keys. They take up a lot of room in the index, so the indexes are not as dense as they could be and fewer row pointers are loaded per I/O. Also, with URLs the chances are that 99% of your strings start with "http://www.", so the database engine has to compare past that shared 11-character prefix before it can decide that a row does not match.
One solution to this is to use a hash function like MD5, SHA1 or even CRC32 to get a raw binary value from your strings and to use this value as the primary key for your tables. CRC32 makes a nice integer-sized primary key, but it's almost certain that at some stage you will encounter two URLs that hash to the same CRC32 value, so you will need to store and compare the "url" string to be sure. The other hash functions return longer values (16 bytes and 20 bytes respectively in "raw" mode), but the chances of a collision are so small that it's not worth bothering about.
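A rough sketch of the CRC32 variant, using table1 from the test setup earlier in this thread. The column url_crc and its index name are hypothetical, and the hash goes into an indexed helper column rather than the primary key, which is a simplification of the suggestion above.

```sql
-- CRC32() returns a 32-bit unsigned value, so INT UNSIGNED is wide enough.
ALTER TABLE table1
    ADD COLUMN url_crc INT UNSIGNED NULL,
    ADD INDEX idx_url_crc (url_crc);

-- Backfill the hash for existing rows (stays NULL where url is NULL).
UPDATE table1 SET url_crc = CRC32(url);

-- Search on the compact integer first, then compare the full string
-- to rule out two different URLs sharing the same CRC32 value.
SELECT *
FROM table1
WHERE url_crc = CRC32('wwww.stackoverflow.com/8000')
  AND url = 'wwww.stackoverflow.com/8000';
```

New rows would also need url_crc filled in on insert, for example by the application or a trigger.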

MySQL not using indexes ("Using filesort") when using ORDER BY

I'm hitting some quite major performance issues due to the use of ORDER BY in my SQL code.
Everything is fine as long as I'm not using ORDER BY in the SQL. However, once I introduce ORDER BY clauses, everything slows down dramatically due to the lack of correct indexing. One would assume that fixing this would be trivial, but judging from forum discussions etc., this seems to be a rather common issue for which I've yet to see a definitive and concise answer.
Question: Given the following table ...
CREATE TABLE values_table (
id int(11) NOT NULL auto_increment,
...
value1 int(10) unsigned NOT NULL default '0',
value2 int(11) NOT NULL default '0',
PRIMARY KEY (id),
KEY value1 (value1),
KEY value2 (value2)
) ENGINE=MyISAM AUTO_INCREMENT=2364641 DEFAULT CHARSET=utf8;
... how do I create indexes that will be used when querying the table for a value1-range while sorting on the value of value2?
Currently, the fetching is OK when NOT using the ORDER BY clause.
See the following EXPLAIN QUERY output:
OK, when NOT using ORDER BY:
EXPLAIN select ... from values_table this_ where this_.value1 between 12345678 and 12349999 limit 10;
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
| 1 | SIMPLE | this_ | range | value1 | value1 | 4 | NULL | 3303 | Using where |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-------------+
However, when using ORDER BY I get "Using filesort":
EXPLAIN select ... from values_table this_ where this_.value1 between 12345678 and 12349999 order by this_.value2 asc limit 10;
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------------+
| 1 | SIMPLE | this_ | range | value1 | value1 | 4 | NULL | 3303 | Using where; Using filesort |
+----+-------------+-------+-------+---------------+----------+---------+------+------+-----------------------------+
Some additional information about the table content:
SELECT MIN(value1), MAX(value1) FROM values_table;
+---------------+---------------+
| MIN(value1) | MAX(value1) |
+---------------+---------------+
| 0 | 4294967295 |
+---------------+---------------+
...
SELECT MIN(value2), MAX(value2) FROM values_table;
+---------------+---------------+
| MIN(value2) | MAX(value2) |
+---------------+---------------+
| 1 | 953359 |
+---------------+---------------+
Please let me know if any further information is needed to answer the question.
Thanks a lot in advance!
Update #1: Adding a new composite index (ALTER TABLE values_table ADD INDEX (value1, value2);) does not solve the problem. You'll still get "Using filesort" after adding such an index.
Update #2: A constraint that I did not mention in my question is that I'd rather change the structure of the table (say adding indexes, etc.) than changing the SQL queries used. The SQL queries are auto-generated using Hibernate, so consider those more or less fixed.
You cannot use an index for the ordering in this case, as you use a RANGE filtering condition.
If you'd use something like:
SELECT *
FROM values_table this_
WHERE this_.value1 = ?
ORDER BY
value2
LIMIT 10
, then creating a composite index on (VALUE1, VALUE2) would be used both for filtering and for ordering.
But you use a ranged condition, that's why you'll need to perform ordering anyway.
Your composite index will look like this:
value1 value2
----- ------
1 10
1 20
1 30
1 40
1 50
1 60
2 10
2 20
2 30
3 10
3 20
3 30
3 40
, and if you select 1 and 2 in value1, you still don't get a whole sorted set of value2.
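The difference can be checked with EXPLAIN, as in this sketch. The index name value1_value2 is an assumption; the table and the range bounds are the ones from the question.

```sql
ALTER TABLE values_table ADD INDEX value1_value2 (value1, value2);

-- Equality on value1: within a single value1, the index entries are already
-- ordered by value2, so the index can serve both the filter and the ORDER BY.
EXPLAIN SELECT * FROM values_table
WHERE value1 = 12345678
ORDER BY value2
LIMIT 10;

-- Range on value1: value2 restarts for every value1 in the range, so the
-- rows still have to be sorted (the question's Update #1 reports filesort here).
EXPLAIN SELECT * FROM values_table
WHERE value1 BETWEEN 12345678 AND 12349999
ORDER BY value2
LIMIT 10;
```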
If your index on value2 is not very selective (i. e. there are not many DISTINCT value2 in the table), you could try:
CREATE INDEX ix_table_value2_value1 ON mytable (value2, value1);
/* Note the order, it's important */
SELECT *
FROM (
SELECT DISTINCT value2
FROM mytable
ORDER BY
value2
) q,
mytable m
WHERE m.value2 >= q.value2
AND m.value2 <= q.value2
AND m.value1 BETWEEN 13123123 AND 123123123
This is called a SKIP SCAN access method. MySQL does not support it directly, but it can be emulated like this.
The RANGE access will be used in this case, but probably you won't get any performance benefit unless DISTINCT value2 comprise less than about 1% of rows.
Note usage of:
m.value2 >= q.value2
AND m.value2 <= q.value2
instead of
m.value2 = q.value2
This makes MySQL perform RANGE checking on each loop.
It appears to me that you have two totally independent keys, one for value1 and one for value2.
So when you use the value1 key to retrieve, the records aren't necessarily returned in order of value2, so they have to be sorted. This is still better than a full table scan since you're only sorting the records that satisfy your "where value1" clause.
I think (if this is possible in MySQL), a composite key on (value1,value2) would solve this.
Try:
CREATE TABLE values_table (
id int(11) NOT NULL auto_increment,
...
value1 int(10) unsigned NOT NULL default '0',
value2 int(11) NOT NULL default '0',
PRIMARY KEY (id),
KEY value1 (value1),
KEY value1and2 (value1,value2)
) ENGINE=MyISAM AUTO_INCREMENT=2364641 DEFAULT CHARSET=utf8;
(or the equivalent ALTER TABLE), assuming that's the correct syntax in MySQL for a composite key.
In all databases I know (and I have to admit MySQL isn't one of them), that would cause the DB engine to select the value1and2 key for retrieving the rows and they would already be sorted in value2-within-value1 order, so wouldn't need a file sort.
You can still keep the value2 key if you need it.