My query is something like this
SELECT * FROM tbl1
JOIN tbl2 ON something = something
WHERE 1 AND (tbl2.date = '$date' OR ('$date' BETWEEN tbl1.planA AND tbl1.planB ))
When I run this query, it is considerably slower than for example this query
SELECT * FROM tbl1
JOIN tbl2 ON something = something
WHERE 1 AND ('$date' BETWEEN tbl1.planA AND tbl1.planB )
or
SELECT * FROM tbl1
JOIN tbl2 ON something = something
WHERE 1 AND tbl2.date = '$date'
In localhost, the first query takes about 0.7 second, the second query about 0.012 second and the third one 0.008 second.
My question is how do you optimize this? If currently I have 1000 rows in my tables and it takes 0.7 second to display the first query, it will take 7 seconds if I have 10.000 rows right? That's a massive slow down compared to second query (0.12 second) and third (0.08).
I've tried adding indexes, but the result is no different.
Thanks
Edit : This application will only work locally, so no need to worry about the speed over the web.
Sorry, I didn't include the EXPLAIN because my real query are much more complicated (about 5 joins). But the joins (I think) don't really matter, cos I've tried omitting them and still get approximately the same result as above.
The date belongs to tbl1, planA and planB belongs to tbl2. I've tried adding indexes to tbl1.date, tbl2.planA and tbl2.planB but the result is insignificant.
By schema do you mean MyISAM or InnoDB? It's MyISAM.
Okay, I'll just post my query straight away. Hopefully it's not that confusing.
SELECT *
FROM tb_joborder jo
LEFT JOIN tb_project p ON jo.project_id = p.project_id
LEFT JOIN tb_customer c ON p.customer_id = c.customer_id
LEFT JOIN tb_dispatch d ON jo.joborder_id = d.joborder_id
LEFT JOIN tb_joborderitem ji ON jo.joborder_id = ji.joborder_id
LEFT JOIN tb_mix m ON ji.mix_id = m.mix_id
WHERE dispatch_date = '2011-01-11'
OR '2011-01-11'
BETWEEN planA
AND planB
GROUP BY jo.joborder_id
ORDER BY customer_name ASC
And the describe output
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE jo ALL NULL NULL NULL NULL 453 Using temporary; Using filesort
1 SIMPLE p eq_ref PRIMARY PRIMARY 4 db_dexada.jo.project_id 1
1 SIMPLE c eq_ref PRIMARY PRIMARY 4 db_dexada.p.customer_id 1
1 SIMPLE d ALL NULL NULL NULL NULL 2048 Using where
1 SIMPLE ji ALL NULL NULL NULL NULL 455
1 SIMPLE m eq_ref PRIMARY PRIMARY 4 db_dexada.ji.mix_id 1
You can just use UNION to merge results of 2nd and 3d queries.
More about UNION.
First thing that comes to mind is to union the two:
SELECT * FROM tbl1
JOIN tbl2 ON something = something
WHERE 1 AND ('$date' BETWEEN planA AND planB )
UNION ALL
SELECT * FROM tbl1
JOIN tbl2 ON something = something
WHERE 1 AND date = '$date'
You have provided too little to make optimizations. We don't know anything about your data structures.
Even if most slow queries are usually due to the query itself or index setup of the used tables, you can try to find out where your bottleneck is with using the MySQL Query Profiler, too. It has been implemented into MySQL since Version 5.0.37.
Before you start your query, activate the profiler with this statement:
mysql> set profiling=1;
Now execute your long query.
With
mysql> show profiles;
you can now find out what internal number (query number) your long query has.
If you now execute the following query, you'll get alot of details about what took how long:
mysql> show profile for query (insert query number here);
(example output)
+--------------------+------------+
| Status | Duration |
+--------------------+------------+
| (initialization) | 0.00005000 |
| Opening tables | 0.00006000 |
| System lock | 0.00000500 |
| Table lock | 0.00001200 |
| init | 0.00002500 |
| optimizing | 0.00001000 |
| statistics | 0.00009200 |
| preparing | 0.00003700 |
| executing | 0.00000400 |
| Sending data | 0.00066600 |
| end | 0.00000700 |
| query end | 0.00000400 |
| freeing items | 0.00001800 |
| closing tables | 0.00000400 |
| logging slow query | 0.00000500 |
+--------------------+------------+
This is a more general, administrative approach, but can help narrow down or even find out the cause for slow queries very nice.
A good tutorial on how to use the MySQL Query Profiler can be found here in the MySQL articles.
Related
MySQL is not my specific field of expertise, and since I never needed to handle big tables, my basic knowledge has been enough - until now :)
When I have a master/detail MySQL tables, I always do queries like this:
SELECT SUM(detail.qty)
FROM master, detail
WHERE master.id = detail.masterId
AND (interestingField = interestingValue)
But right now, I have too may details rows and so the query is slow. I tried to understand the "INNER" syntax, and I came up with
SELECT SUM(detail.qty)
FROM master
INNER JOIN detail ON master.id = detail.masterId
WHERE (interestingField = interestingValue)
but the timing is still the same... So:
1) Does it mean that doing INNER JOIN is THE SAME than checking the master/detail field by hand?
2) Apart from making indexes using the required fields (which I already did), is there a way to make the query faster?
Thank you and sorry if the question seems dumb :(
----- As requested, I am adding the EXPLAIN output to my query
+----+-------------+--------+--------+---------------+-------------+---------+------+---------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+----+-------------+--------+--------+---------------+-------------+---------+------+---------------------------------+-------+-------------+
| | 1 | SIMPLE | bill_t | ALL | PRIMARY,id | NULL | NULL | NULL | 4441 | Using where |
| | 1 | SIMPLE | bill_d | ref | billId,type | billId | 5 | localtha_bg3667772802.bill_t.id | 2 | Using where |
+----+-------------+--------+--------+---------------+-------------+---------+------+---------------------------------+-------+-------------+
INNER JOIN won't be any faster than WHERE since the query optimizer should produce an identical execution plan for both queries provided. However, different DB engines might produce different execution plans but typically such behaviour is observed for more complex queries (which this is not the case here). I would suggest to test both queries and choose the one which is faster.
As a general advice, I would recommend INNER JOIN which IMO is much more readable and it reduces the risk of omitting the condition in the WHERE clause that might lead to a CROSS JOIN.
for your query with master and detail performance assuming that interestingField in detail table be sure you have a proper composite index on
create index idx1 on detail (interestingField, masterId )
For the INNER JOIN and Where the implicit join based on where is an OLD way for manage join .. the recente version is based on Explictic JOIN condition eg:
SELECT SUM(detail.qty)
FROM master
INNER detail ON master.id = detail.masterId
AND interestingField = interestingValue
each the way are the same for performance
I cant find an answer to this despite looking for several days!
In MySQL I have 2 Tables
ProcessList contains foreign keys all from the process Table
ID |Operation1|Operation2|Operation3|etc....
---------------------------------------
1 | 1 | 4 | 6 | ....
---------------------------------------
2 | 2 | 4 | 5 |....
---------------------------------------
.
.
.
Process Table
ID | Name
-------------------
1 | Quote
2 | Order
3 | On-Hold
4 | Manufacturing
5 | Final Inpection
6 | Complete
Now, I am new to SQL but I understand that MYSQL doesnt have a pivot function as Ive researched, and I see some examples with UNIONs etc, but I need an SQL expression something like (pseudocode)
SELECT name FROM process
(IF process.id APPEARS in a row of the ProcessList)
WHERE processListID = 2
so I get the result
Order
Manufacturing
Final Inspection
I really need the last line of the query to be
WHERE processListID = ?
because otherwise I will have to completely rewrite my app as the SQL is stored in a String in java, and the app suplies the key index only at the end of the statement.
One option is using union to unpivot the processlist table and joining it to the process table.
select p.name
from process p
join (select id,operation1 as operation from processlist
union all
select id,operation2 from processlist
union all
select id,operation3 from processlist
--add more unions as needed based on the number of operations
) pl
on pl.operation=p.id
where pl.id = ?
If you always consider only a single line in the process list (i.e. procsessListId = x), the following query should do a pretty simple and performant job:
select p.name from process p, list l
where l.id = 2
and (p.id in (l.operation1, l.operation2, l.operation3))
I have some tables:
object
person
project
[...] (some more tables)
type
The object table has foreign keys to all other tables.
Now I do a query like:
SELECT * FROM object
LEFT JOIN person ON (object.person_id = person.id)
LEFT JOIN project ON (object.project_id = project.id)
LEFT JOIN [...] (all other joins)
LEFT JOIN type ON (object.type_id = type.id)
WHERE object.customer_id = XXX
ORDER BY object.type_id ASC
LIMIT 25
This works perfectly well and fast, even for big resultsets. For example I have 90000 objects and the query takes about 3 seconds. The result ist quite big because the tables have a lot of columns and all of them are fetched. For info: I'm using Symfony with Propel, InnoDB and the "doSelectJoinAll"-function.
But if do a query like (sort by type.name):
SELECT * FROM object
LEFT JOIN person ON (object.person_id = person.id)
LEFT JOIN project ON (object.project_id = project.id)
LEFT JOIN [...] (all other joins)
LEFT JOIN type ON (object.type_id = type.id)
WHERE object.customer_id = XXX
ORDER BY type.name ASC
LIMIT 25
The query takes about 200 seconds!
EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | object | ref | object_FI_2 | object_FI_2 | 4 | const | 164966 | Using where; Using temporary; Using filesort
1 | SIMPLE | person | eq_ref | PRIMARY | PRIMARY | 4 | db.object.person_id | 1
1 | SIMPLE | ... | eq_ref | PRIMARY | PRIMARY | 4 | db.object...._id | 1
1 | SIMPLE | type | eq_ref | PRIMARY | PRIMARY | 4 | db.object.type_id | 1
I saw in the processlist, that MySQL is creating a temporary table for such a sorting on a joined table.
Adding an index to type.name didn't improve the performance. There are only about 800 type rows.
I found out that the many joins and the big result is the problem, because if I do a query with just one join like:
SELECT * FROM object
LEFT JOIN type ON (object.type_id = type.id)
WHERE object.customer_id = XXX
ORDER BY type.name ASC
LIMIT 25
it works as fast as expected.
Is there a way to improve such sorting queries on a big resultset with many joined tables? Or is it just a bad habit to sort on a joined table column and this shouldn't be done anyway?
Thank you
LEFT gets in the way of rearranging the order of the tables. How fast is it without any LEFT? Do you get the same answer?
LEFT may be a red herring... Here's what the optimizer is likely to be doing:
Decide what order to do the tables in. Take into consideration any WHERE filtering and any LEFTs. Because of WHERE object.customer_id = XXX, object is likely to be the best table to start with.
Get the rows from object that satisfy the WHERE.
Get the columns needed from the other tables (do the JOINs).
Sort according to the ORDER BY ** see below
Deliver the first 25 rows.
** Let's dig deeper into these two:
WHERE object.customer_id = XXX ORDER BY object.id
WHERE object.customer_id = XXX ORDER BY virtually-anything-else
You have INDEX(customer_id), correct? And the table is InnoDB, correct? Well, each secondary index implicitly includes the PRIMARY KEY, as if you had said INDEX(customer_id, id). The optimal index for the first WHERE + ORDER BY is precisely that. It will locate XXX and scan 25 rows, then stop. You might say that steps 2,4,5 are blended together.
The second WHERE just gather all the stuff through step 4. This could be thousands of rows. Hence it is likely to be a lot slower.
See also article on building optimal indexes.
I have this query:
SELECT 1 AS InputIndex,
IF(TRIM(DeviceInput1Name = '', 0, IF(INSTR(DeviceInput1Name, '|') > 0, 2, 1)) AS InputType,
(SELECT Value1_1 FROM devicevalues WHERE DeviceID = devices.DeviceID ORDER BY ValueTime DESC LIMIT 1) AS InputValueLeft,
(SELECT Value1_2 FROM devicevalues WHERE DeviceID = devices.DeviceID ORDER BY ValueTime DESC LIMIT 1) AS InputValueRight
FROM devices
WHERE DeviceIMEI = 'Some_Search_Value';
This completes fairly quickly (in up to 0.01 seconds). However, running the same query with WHERE clause as such
WHERE DeviceIMEI = 'Some_Other_Search_Value';
makes it run for upwards of 14 seconds! Some search values finish very quickly, while others run way too long.
If I run EXPLAIN on either query, I get the following:
+----+--------------------+--------------+-------+---------------+------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------------+-------+---------------+------------+---------+-------+------+-------------+
| 1 | PRIMARY | devices | ref | DeviceIMEI | DeviceIMEI | 28 | const | 1 | Using where |
| 3 | DEPENDENT SUBQUERY | devicevalues | index | DeviceID,More | ValueTime | 9 | NULL | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | devicevalues | index | DeviceID,More | ValueTime | 9 | NULL | 1 | Using where |
+----+--------------------+--------------+-------+---------------+------------+---------+-------+------+-------------+
Also, here's the actual number of records, just so it's clear:
mysql> select count(*) from devicevalues inner join devices using(DeviceID) where devices.DeviceIMEI = 'Some_Search_Value';
+----------+
| count(*) |
+----------+
| 1017946 |
+----------+
1 row in set (0.17 sec)
mysql> select count(*) from devicevalues inner join devices using(DeviceID) where devices.DeviceIMEI = 'Some_Other_Search_Value';
+----------+
| count(*) |
+----------+
| 306100 |
+----------+
1 row in set (0.04 sec)
Any ideas why changing a search value in the WHERE clause would cause the query to execute so slowly, even when the number of physical records to search through is lower?
Note there is no need for you to rewrite the query, just explain why the above happens.
UPDATE: I have tried running two separate queries instead of one with dependent subqueries to get the information I need (first I select DeviceID from devices by DeviceIMEI, then select from devicevalues by DeviceID I got from the previous query) and all queries return instantly. I suppose the only solution is to run these queries in a transaction, so I'll be making a stored procedure to do this. This, however, still doesn't answer my question which puzzles me.
I dont think that 1017946 is equivalent to the number of rows returned by your very first query.Your first query returns all rows from devices with some correlated queries,your count query returns all common rows between the 2 tables.If this is so the problem might be cardinality issues namely some_other_values constitute a much larger proportion of the rows in your first query than some_value so Mysql chooses a table scan.
If I understand correctly, the query is the same, and only the searched value changes.
There are three real possibilities as I can see, the first much likelier than the others:
The fast query only appears to be fast. And that's why it is in the MySQL query cache already. Try disabling the cache, running with NO_SQL_CACHE, or run the slow query twice. If the second way round runs in 0.01s instead of 14s, you'll know this is the case.
One query has to look way more records than the other. An IMEI may have lots of rows in devicevalues, another might have next no none. Apparently you are in such a condition, and what makes this unlikely is (apart from the times involved) the fact that it is the slower IMEI which actually has less matches.
The slow query is indeed slow. This means that a particular subset of data is hard to locate or hard to retrieve. The first may be due to an overdue reindexing or to filesystem fragmentation of very large indexes. The second can also be due to fragmentation of the tablespace, or to other condition which splits up records (for example the database is partitioned). A search in a small partition is wont to be faster than a search in a large partition.
But the time differences involved aren't equal in the three cases, and a 1400x difference seems to me an unlikely consequence of (2) or (3). The first possibility seems way more appealing.
Update you seem not to be using indexes on your subqueries. Have you an index such as
CREATE INDEX dv_ndx ON devicevalues(DeviceID, ValueTime);
If you can, you can try a covering index:
CREATE INDEX dv_ndx ON devicevalues(DeviceID, ValueTime, Value1_1, Value1_2);
I was trying to optimize NOT IN clause in mysql: Some how I ended up in the following query:
SELECT #i:=(SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
SELECT * FROM word WHERE #i IS NULL OR word_id NOT IN (#i);
There is no relationship between sent_question table and word table. And also I cannot place index on correct_option_word_id.
Can somebody please explain, will this method even optimize the query or not?
UPDATE: As mentioned here that both the methods: NOT IN and LEFT JOIN/IS NULL are almost equally efficient. That's why I don't want to use LEFT JOIN/IS NULL method.
UPDATE 2:
Explain results for original query:
EXPLAIN SELECT * FROM word WHERE word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc');
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
| 1 | PRIMARY | word | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
| 2 | DEPENDENT SUBQUERY | sent_question | ref | fk_question_subscriber1 | fk_question_subscriber1 | 48 | const | 1 | Using where |
+----+--------------------+---------------+------+-------------------------+-------------------------+---------+-------+------+-------------+
You're right in that both the NOT IN and LEFT JOIN/IS NULL method are equally efficient, however, unfortunately, there is no faster option, only slower ones (NOT EXISTS).
Here's your query, simplified:
SELECT *
FROM word
WHERE
word_id NOT IN (SELECT correct_option_word_id FROM sent_question WHERE msisdn='abc')
As you know, MySQL will do the subquery first and use the returned result set for the NOT IN clause. Then, it will scan through all of the rows in word to see if word_id is in the list for each row.
Unfortunately for this case, indexes are inclusive, not exclusive. They don't help with NOT queries. A covering index on word could potentially still be used to avoid accessing the actual table, and provide some IO benefits, but it won't be used in the traditional "lookup" sense. However, since you are returning all columns on the word table, it may not be viable to have such a large index.
The most important index that will be used here is an index on sent_question.msisdn for the subquery. Ensure that you have that index defined. A multi-column "covering" index on (msisdn, correct_option_word_id) would be best.
If you share your design, we can probably offer some design solutions for optimization.
I doubt it'll work at all.
Try
SELECT *
FROM word AS w
LEFT JOIN sent_question AS sq
ON w.word_id = sq.correct_option_word_id AND sq.msisdn='abc'
WHERE sq.correct_option_word_id IS NULL
Give this simple query a try
SELECT
sent_question.*,
word.word_id AS foundWord
FROM sent_question
LEFT JOIN word
ON word.word_id = sent_question.correct_option_word_id
WHERE sent_question.msisdn='abc'
// GROUP BY sent_question.correct_option_word_id // This shouldn't be needed but included for completion
HAVING foundWord IS NULL