mysql: how to optimize this WHERE-condition search?

I have been trying to debug my MySQL search speeds and they are horrible (anywhere from a couple of seconds to 2 minutes).
Here is an example of the search. The search conditions can become really complicated depending on the user requirements.
SELECT Device,Input,`W(m)`,`L(m)`,VDD,`Temp(C)`,Param,Value
FROM `TABLE_NAME`
WHERE (`Temp(C)`='110' OR `Temp(C)`='125' )
AND (Device='ngear' )
AND (Input='a' OR Input='b' OR Input='a' OR Input='b' OR Input='c' OR Input='b' )
AND (Param='speed' OR Param='leakage' )
Please note this table has no indexes and no primary key. The data isn't really relational; it is statistical simulation data that happens to be stored in MySQL. The table has about 1 million rows.
Should I start indexing every column? Any thoughts would be appreciated.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE TABLE_NAME ALL NULL NULL NULL NULL 12278640 Using where

You need a composite index, and you probably need to rethink some of your data types.
Here is some general advice:
alter table MQUAL__TSMC_20G_v0d1_Robert_02122011
add index ( `Temp(C)`, Device, Input, Param);
And your query's WHERE clause can be changed to:
where `Temp(C)` in(110, 125)
AND Device='ngear'
AND Input in('a','b','c')
AND Param in('speed', 'leakage');
Without more information about your schema it is hard to be specific, but the data type should be:
`Temp(C)` <-- int (not a string)

My suggestion is to add an index starting with the WHERE condition that narrows the data the most; as a guess, temp should be the first column in the index. You want a compound index as well. The general rule is to order the columns from the one that reduces the result set the most down to the one with the most common (least selective) values.
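Once a composite index is in place, you can verify with EXPLAIN that MySQL actually uses it. A sketch, using the table and columns from the question (the index name and column order here are one reasonable guess, following the advice above):

```sql
ALTER TABLE `TABLE_NAME`
  ADD INDEX idx_search (`Temp(C)`, Device, Input, Param);

EXPLAIN
SELECT Device, Input, `W(m)`, `L(m)`, VDD, `Temp(C)`, Param, Value
FROM `TABLE_NAME`
WHERE `Temp(C)` IN (110, 125)
  AND Device = 'ngear'
  AND Input IN ('a', 'b', 'c')
  AND Param IN ('speed', 'leakage');
```

If the index is picked up, the `type` column of the EXPLAIN output should change from `ALL` (full table scan) to `range`, and the `rows` estimate should drop well below the 12 million shown in the question.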

Related

mysql range query: how does MySQL judge when a range is too large to use an index?

What confuses me: does the size of the query range influence whether MySQL uses the index?
That is what happens. And it is actually an optimization.
When using a secondary key (such as INDEX(teacher_id)), the processing goes like this:
Reach into the index, which is a B+Tree. In such a structure, it is quite efficient to find a particular value (such as 1) and then scan forward (until 5000 or 10000).
For each entry, reach over into the data to fetch the row (SELECT *). This uses the PRIMARY KEY, a copy of which is in the secondary key. The PK and the data are clustered together; each lookup by one PK value is efficient (again, a BTree), but you need to do 5000 or 10000 of them. So the cost (time taken) adds up.
A "table scan" (ie, not using any INDEX) goes like this:
Start at the beginning of the table, walk through the B+Tree for the table (in PK order) until the end.
For each row, check the WHERE clause (a range on teacher_id).
If more than something like 20% of the table needs to be looked at, a table scan is actually faster than bouncing back and forth between the secondary index and the data.
So, "large" is somewhere around 20%. The actual value depends on table statistics, etc.
Bottom line: Just let the Optimizer do its thing; most of the time it knows best.
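If you want to see the difference yourself, you can force the secondary index and compare plans. A sketch, using the table and index names from the question:

```sql
-- Let the optimizer choose; for the wide range it may pick a full table scan:
EXPLAIN SELECT * FROM t_teacher_course_info
WHERE teacher_id > 1 AND teacher_id < 10000;

-- Force the secondary index and compare the estimated row counts:
EXPLAIN SELECT * FROM t_teacher_course_info
FORCE INDEX (idx_teacher_id_last_update_time)
WHERE teacher_id > 1 AND teacher_id < 10000;
```

If the forced-index version really were cheaper, the optimizer would normally have chosen it on its own; timing both variants is the honest way to settle it.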
In brief: I use a MySQL database. If I execute
EXPLAIN
SELECT * FROM t_teacher_course_info
WHERE teacher_id >1 and teacher_id < 5000
MySQL will use the index `idx_teacher_id_last_update_time` (`teacher_id`, `last_update_time`),
but if I change the range:
EXPLAIN
SELECT * FROM t_teacher_course_info
WHERE teacher_id >1 and teacher_id < 10000
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t_teacher_course_info ALL idx_teacher_update_time NULL NULL NULL 671082 Using where
It scans the whole table and does not use the index. Is there some MySQL configuration for this?
Maybe MySQL judges from the estimated row count whether to use the index?

Distinct (or group by) using filesort and temp table

I know there are similar questions on this but I've got a specific query / question around why this query
EXPLAIN SELECT DISTINCT RSubdomain FROM R_Subdomains WHERE EmploymentState IN (0,1) AND RPhone='7853932120'
gives me this output explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains index NULL RSubdomain 767 NULL 3278 Using where
with an index on RSubdomain,
but if I add in a composite index on EmploymentState/RPhone
I get this output from explain
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE RSubdomains range EmploymentState EmploymentState 67 NULL 2 Using where; Using temporary
If I take away the DISTINCT on RSubdomain, it drops the "Using temporary" from the EXPLAIN output. What I don't get is why, when I add the composite key (while keeping the key on RSubdomain), the DISTINCT ends up using a temp table, and which index schema is better here. I see that the number of rows scanned with the combined key is far less, but the query is of type range and it's also slower.
Q: why ... does the distinct end up using a temp table?
MySQL is doing a range scan on the index (i.e. reading index blocks) to locate the rows that satisfy the predicates (WHERE clause). Then MySQL has to look up the value of the RSubdomain column from the underlying table (it's not available in the index). To eliminate duplicates, MySQL needs to scan the values of RSubdomain that were retrieved. The "Using temporary" indicates that MySQL is materializing a resultset, which is processed in a subsequent step. (Likely, that's the set of RSubdomain values that was retrieved; given the DISTINCT, it's likely that MySQL is actually creating a temporary table with RSubdomain as a primary or unique key, and only inserting non-duplicate values.)
In the first case, it looks like the rows are being retrieved in order by RSubdomain (likely, that's the first column in the cluster key). That means MySQL needn't compare all of the RSubdomain values to each other; it only needs to check whether the last retrieved value matches the currently retrieved value to determine whether the value can be "skipped."
Q: which index schema is better here?
The optimum index for your query is likely a covering index:
... ON R_Subdomains (RPhone, EmploymentState, RSubdomain)
But with only 3278 rows, you aren't likely to see any performance difference.
FOLLOWUP
Unfortunately, MySQL does not provide the type of instrumentation provided in other RDBMS (like the Oracle event 10046 sql trace, which gives actual timings for resources and waits.)
Since MySQL is choosing to use the index when it is available, that is probably the most efficient plan. For the best efficiency, I'd perform an OPTIMIZE TABLE operation (for InnoDB tables and MyISAM tables with dynamic format, if there have been a significant number of DML changes, especially DELETEs and UPDATEs that modify the length of the row...) At the very least, it would ensure that the index statistics are up to date.
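For reference, the maintenance statements mentioned above look like this (table name from the question):

```sql
OPTIMIZE TABLE R_Subdomains;  -- rebuilds the table and its indexes, refreshing statistics
ANALYZE TABLE R_Subdomains;   -- lighter-weight alternative: only refreshes index statistics
```

ANALYZE TABLE is usually sufficient when all you need is up-to-date index statistics; OPTIMIZE TABLE is the heavier option for reclaiming space after many row-length-changing DELETEs and UPDATEs.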
You might want to compare the plan of an equivalent statement that does a GROUP BY instead of a DISTINCT, i.e.
SELECT r.RSubdomain
FROM R_Subdomains r
WHERE r.EmploymentState IN (0,1)
AND r.RPhone='7853932120'
GROUP
BY r.RSubdomain
For optimum performance, I'd go with a covering index with RPhone as the leading column; that's based on an assumption about the cardinality of the RPhone column (close to unique values), as opposed to only a few distinct values in the EmploymentState column. That covering index will give the best performance, i.e. the quickest elimination of rows that need to be examined.
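Spelled out, the covering index recommended above would be created like this (the index name is just an illustrative choice):

```sql
CREATE INDEX ix_r_subdomains_covering
  ON R_Subdomains (RPhone, EmploymentState, RSubdomain);
```

Because all three referenced columns are in the index, MySQL can satisfy the query from the index alone, which should show up as "Using index" in the EXPLAIN output.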
But again, with only a couple thousand rows, it's going to be hard to see any performance difference. If the query was examining millions of rows, that's when you'd likely see a difference, and the key to good performance will be limiting the number of rows that need to be inspected.

How can I speed up a slow SUM + ORDER BY + LIMIT query in MySQL?

I have a simple key-value table with two fields, created like so:
CREATE TABLE `mytable` (
`key` varchar(255) NOT NULL,
`value` double NOT NULL,
KEY `MYKEY` (`key`)
);
The keys are not unique. The table contains over one million records. I need a query that will sum up all the values for a given key, and return the top 10 keys. Here's my attempt:
SELECT t.key, SUM(t.value) value
FROM mytable t
GROUP BY t.key
ORDER BY value DESC
LIMIT 0, 10;
But this is very slow. The thing is, without the GROUP BY and SUM it's pretty fast, and without the ORDER BY it's very fast, but for some reason the combination of the two makes it very, very slow. Can anyone explain why this is so, and how it can be sped up?
There is no index on value. I tried creating one but it didn't help.
EXPLAIN EXTENDED produces the following in Workbench:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE t index NULL MYKEY 257 NULL 1340532 100.00 "Using temporary; Using filesort"
There are about 400K unique keys in the table.
The query takes over 3 minutes to run. I don't know how long because I stopped it after 3 minutes. However, if I remove the index on key, it runs in 30 seconds! Anyone has any idea why?
The only way to really speed this up, as far as I can see, is to create a separate table with unique keys and maintain the total values there. Then you will be able to index the totals to retrieve the top ten quickly, and the calculation will already be done. As long as the table is not updated in too many places, this shouldn't be a major problem.
The major problem with this type of query is that the GROUP BY needs the rows in one order, while the ORDER BY requires sorting the results into a different order.
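A sketch of that summary-table idea, reusing the table and column names from the question (the summary table, its name, and the maintenance statement are illustrative assumptions):

```sql
CREATE TABLE mytable_totals (
  `key`   varchar(255) NOT NULL,
  `total` double       NOT NULL DEFAULT 0,
  PRIMARY KEY (`key`),
  KEY idx_total (`total`)   -- lets the top-10 query read the index in order
);

-- Maintain the running total on every insert into mytable:
INSERT INTO mytable_totals (`key`, `total`)
VALUES ('somekey', 1.5)
ON DUPLICATE KEY UPDATE `total` = `total` + VALUES(`total`);

-- The top 10 becomes a cheap, index-backed read:
SELECT `key`, `total`
FROM mytable_totals
ORDER BY `total` DESC
LIMIT 10;
```

The trade-off is extra work on every write to mytable, in exchange for a read that no longer has to aggregate 400K groups and then sort them.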

MySQL indexes and when to group them

I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?
Below are the indexes from the WordPress posts table. See how post_name, post_parent and post_author are separate entries? And then there is type_status_date, which is a mixture of 4 fields?
http://img215.imageshack.us/img215/5976/screenshot20120426at431.png
I don't understand the logic behind this? Can anyone enlighten me?
This is going to be a bit of a long answer, but here we go. Please note I am not going to deal with the differences between storage engines here (MyISAM and InnoDB have distinct ways of implementing what I am trying to describe).
The first thing you have to understand about an index is that it is a separate data structure stored on disk. Normally this is a B-tree containing the column(s) that you have indexed, plus a pointer to the row in the table (this pointer is normally the primary key).
The only index that is stored with the data is the primary key index. Thus a primary key index IS the table.
Lets assume you have following table definition.
CREATE TABLE `Student` (
`StudentNumber` INT NOT NULL ,
`Name` VARCHAR(32) NULL ,
`Surname` VARCHAR(32) NULL ,
`StudentEmail` VARCHAR(32) NULL ,
PRIMARY KEY (`StudentNumber`) );
Since we have a primary key on StudentNumber, there will be an index containing the primary key with the other columns stored alongside it. If you looked at the data in the index you would probably see something like this:
1, John, Doe, jdoe@gmail.com
As you can see this is the table data once again showing you that the primary key index IS the table.
The StudentNumber column is indexed, which allows you to search on it effectively; the rest of the data is stored with the key. Thus if you ran the following query:
SELECT * FROM Student WHERE StudentNumber=1
MySQL would use the primary index to quickly find the row and then read the data stored with the indexed column. Since there is an index, MySQL can do an efficient binary search on the B-tree.
Also, when it comes to retrieving the data after the search, MySQL can read it straight from the index, so we are using one operation in the index to retrieve the data. Now if I ran the following query:
SELECT * FROM Student WHERE Name = 'John'
MySQL would check whether there is an index it could use to speed the query up. However, in this case there is no index on Name, so MySQL would do a sequential read of the table, one row at a time from the first row to the last.
At each row it would evaluate the row against the WHERE clause and return matching rows. So basically it reads the primary key index from top to bottom. Remember: the primary key index is the table.
If I ran the following statement:
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_name` (`Name` ASC) ;
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_surname` (`Surname` ASC) ;
MySQL would create new indexes on the Student table. This will be stored away from the table on disk and the data inside would look something like this:
Data in ix_Name
John, 1 <--PRIMARY KEY VALUE
Data in ix_Surname
Doe, 1 <--PRIMARY KEY VALUE
Notice that the data in the ix_name index is the name plus the primary key value. So if I ran the previous SELECT statement, MySQL would read the ix_name index, get the primary key values for the matching items, and then use the primary key index to get the rest of the data.
So the number of operations to get the data via the index is 2: the matching rows are found in the index, and then a lookup happens on the primary key to get the row data out.
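Those two operations, written out by hand as separate queries against the example Student table, would look like this (MySQL performs both steps internally within the one original statement):

```sql
-- Step 1: seek in ix_name; the index entry carries the primary key value
SELECT StudentNumber FROM Student WHERE Name = 'John';

-- Step 2: use that primary key to fetch the full row from the clustered index
SELECT * FROM Student WHERE StudentNumber = 1;
```

This also illustrates why a covering index avoids step 2 entirely: if every selected column lived in the index, the primary-key lookup would be unnecessary.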
You now have the following query:
SELECT * FROM Student WHERE Name='John' AND surname ='Doe'
Here MySQL can't use both indexes, as that would be a waste of operations. If MySQL had to use both indexes in this query, the following would happen (this should not happen):
1. Find the rows with the value John in ix_name
2. Read the matching primary keys to get the row data
3. Store the matching results
4. Find the rows with the value Doe in ix_surname
5. Read the matching primary keys to get the row data
6. Store the matching results
7. Take the Name results and the Surname results and merge them
8. Return the query results
This is really a waste of I/O, as MySQL would read the table twice. Basically, using one index is better than trying to use two (I will explain why in a moment). MySQL will choose one index to use in a simple query like this.
So how does MySQL decide on which index to use?
MySQL keeps statistics about indexes internally. These statistics tell MySQL, basically, how unique an index is. So, for the sake of argument, let's say the surname index (ix_surname) is more selective than the name index (ix_name); MySQL would then use the surname index.
Thus the query retrieval would go like this:
1. Use ix_surname to find the rows that match the value Doe
2. Read the matching primary keys and apply the filter for the value John to the actual column data in each row
3. Return the matched rows
As you can see, the number of operations in this search is much smaller. I have oversimplified a lot of the technical detail; indexing is an interesting thing to master, but you have to look at it from the perspective of: how do I get the data with the minimal amount of I/O?
Hope it is as clear as mud now!
MySQL cannot normally use more than one index at a time. That means, for instance, that when you have a query that filters or sorts on two fields, you should put them both into the same index.
WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.
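For reference, the composite index shown in the screenshot corresponds to this definition (assuming a standard wp_posts schema from a default WordPress install):

```sql
ALTER TABLE wp_posts
  ADD KEY type_status_date (post_type, post_status, post_date, ID);
```

The column order follows the pattern described above: the equality filters (post_type, post_status) come first, then the sort/range column (post_date), with ID appended so common listing queries can be served from the index.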

Mysql query runs very slow when using order by

The following query takes 30 seconds to finish when it has the ORDER BY. Without the ORDER BY it finishes in 0.0035 seconds. I already have an index on the field "name"; the field "id" is the primary key. I have 400,000 records in this table. Please help: what is wrong with the query when using ORDER BY?
SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
Update: (solution at the end )
Hi all, thanks for the help. Below are some updates, as you asked.
Below is the output of explain.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE users range name name 258 NULL 226009 Using where; Using filesort
Yes, there are around 20 fields in this table.
Below are the indexes that I have:
Keyname Type Cardinality Field
PRIMARY PRIMARY 418099 id
name INDEX 411049 name
Solution:
It turns out the fields with NULL values are the reason. When I make those 2 fields in the WHERE condition NOT NULL, it takes just .000x seconds. But the strange thing is, it goes back up to 29 seconds if I create an index of (status, name, id DESC) or (status, name, id).
You should definitely have a compound index: a single one containing all the fields you need, since a DBMS typically cannot use more than one index per table in a single query.
An OR clause is not really index-friendly, so if you can, I recommend making status NOT NULL. I assume NULL does not mean anything different from the number zero. This will help a lot in actually using the index.
I do not know how well name != '' is optimized. A semantically equivalent condition would be name > '' (meaning it sorts after the empty string); maybe this also saves you some CPU cycles.
Then you have to decide the order in which your columns appear in the index. A rule of thumb could be cardinality: the number of distinct values a field can have.
For example:
ALTER TABLE users ADD INDEX order1 (status, name, id DESC);
Edit
You don't need to delete indexes. MySQL will choose the best one very quickly and ignore the rest. They only cost disk space and some CPU cycles on UPDATEs. But if you do not need them in any circumstances you can remove them of course.
The long time is because the access to your table is slow. This is probably caused by dynamic-length fields such as TEXT or BLOB. If you do not ALWAYS need these, you can move them out to a twin auxiliary table, like:
users (id, name, status, group_id)
profile (user_id, birthdate, gender, motto, cv)
This way the essential system-operations can be done with a restricted information about the user, and all the other stuff which is really content associated with the user only have to be used when it is really needed.
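A concrete sketch of that split, with table and column names taken from the schema outline above (the types and lengths are illustrative assumptions):

```sql
CREATE TABLE users (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name     VARCHAR(64)  NOT NULL DEFAULT '',
  status   CHAR(1)      NOT NULL DEFAULT '0',
  group_id INT UNSIGNED NOT NULL
);

CREATE TABLE profile (
  user_id   INT UNSIGNED NOT NULL PRIMARY KEY,
  birthdate DATE         NULL,
  gender    CHAR(1)      NULL,
  motto     VARCHAR(255) NULL,
  cv        TEXT         NULL,   -- the dynamic-length data lives here, off the hot path
  CONSTRAINT fk_profile_user FOREIGN KEY (user_id) REFERENCES users (id)
);
```

Queries that only touch the fixed-width users table stay fast, and the TEXT column is read only when a join to profile is explicitly requested.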
Edit2
You can hint MySQL which index to use by specifying it (or several of them), like:
SELECT id, name FROM users USE INDEX (order1) WHERE name != '' and status = '0' ORDER BY id DESC
Without seeing an EXPLAIN it is hard to say, but most probably you also need an index on the "status" column. Slowness on a single-table query almost always comes down to the query doing a full table scan as opposed to using an index.
try doing:
explain SELECT *
FROM users
WHERE name IS NOT NULL
AND name != ''
AND ( status IS NULL OR status = '0' )
order by id desc
limit 50
and post the output. You'll probably see that it is doing a full table scan because it doesn't have an index for status. The MySQL documentation on EXPLAIN covers how to read the output, if you want more background on this kind of problem.