When I enter the query:
SELECT A.T1, B.T2 from A, B where A.T3 = B.T3 and A.T4 = B.T4
mysql hangs. However, when I get rid of one of the constraints:
SELECT A.T1, B.T2 from A, B where A.T3 = B.T3
mysql returns the result almost immediately. Why is this?
You're doing this in a terribly slow and inefficient way. It has nothing to do with the AND clause itself, but the way your tables are designed and the way you are trying to perform your query.
Try something along the lines of:
SELECT A.T1, B.T2
FROM A
JOIN B ON B.T3 = A.T3 AND A.T4 = B.T4
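As a quick sanity check that the explicit JOIN form returns the same rows as the comma form, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, made only so the example runs anywhere), and the rows in A and B are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (T1 TEXT, T3 INTEGER, T4 INTEGER);
    CREATE TABLE B (T2 TEXT, T3 INTEGER, T4 INTEGER);
    -- Made-up rows: only the first and third pairs match on both T3 and T4.
    INSERT INTO A VALUES ('a1', 1, 10), ('a2', 2, 20), ('a3', 3, 30);
    INSERT INTO B VALUES ('b1', 1, 10), ('b2', 2, 99), ('b3', 3, 30);
""")

# Original comma-style join with both conditions in WHERE.
implicit_join = """
    SELECT A.T1, B.T2
    FROM A, B
    WHERE A.T3 = B.T3 AND A.T4 = B.T4
"""

# Explicit JOIN ... ON form suggested in the answer.
explicit_join = """
    SELECT A.T1, B.T2
    FROM A
    JOIN B ON B.T3 = A.T3 AND B.T4 = A.T4
"""

implicit_rows = sorted(conn.execute(implicit_join).fetchall())
explicit_rows = sorted(conn.execute(explicit_join).fetchall())
print(implicit_rows)
print(explicit_rows)
```

Both forms describe the same inner join, so any difference in speed would have to come from the execution plan and the available indexes rather than from the syntax.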
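Also, it may help to put the table with the most columns in the FROM clause rather than in the JOIN, although for an inner join the optimizer is free to reorder the tables, so check the actual plan with EXPLAIN.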
Also, since it's such a large set of tables, you should consider indexing them.
CREATE INDEX index_name ON A (T4);
CREATE INDEX index_name2 ON B (T4);
You only need to create an index once. You do not need to do this every time you run your query. Generally, once you create the index you don't need to worry about it again.
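Since the join compares both T3 and T4, a composite index over the two columns may serve it better than an index on T4 alone. The sketch below is one way to check whether an index is actually picked up. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (an assumption), and the index names idx_a_t3_t4 and idx_b_t3_t4 are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (T1 TEXT, T3 INTEGER, T4 INTEGER);
    CREATE TABLE B (T2 TEXT, T3 INTEGER, T4 INTEGER);
    -- Hypothetical composite indexes covering both join columns.
    CREATE INDEX idx_a_t3_t4 ON A (T3, T4);
    CREATE INDEX idx_b_t3_t4 ON B (T3, T4);
""")

query = """
    SELECT A.T1, B.T2
    FROM A
    JOIN B ON B.T3 = A.T3 AND B.T4 = A.T4
"""

# The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan_steps:
    print(step)

# True if the planner chose one of the composite indexes for the join.
uses_composite_index = any(
    "idx_a_t3_t4" in step or "idx_b_t3_t4" in step for step in plan_steps
)
print(uses_composite_index)
```

In MySQL the equivalent check is to prefix the query with EXPLAIN and look at the key column of the output.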
More about indexing: http://en.wikipedia.org/wiki/Database_index
Most frequently when this happens it has nothing to do with the constraints themselves, but with the relationship of the constraints to the indexes. Take a look at the WHERE clause with respect to the defined indexes on the tables. The order of the WHERE clause can make a difference also.
If you can, try moving away from SQL-1 style joins (tables listed with commas, conditions in WHERE) to the more readable and flexible SQL-2 style (explicit JOIN ... ON). See madburn's answer.
Please assume the following query:
select dic.*, d.syllable
from dictionary dic
join details d on d.word = dic.word
As you know (and I heard), MySQL uses only one index per query. Now I want to know, in query above, which index would be better?
dictionary(word)
details(word)
In other words, when there is a join (two tables are involved), which table's index would be used? Should I create both of them (on the columns in the ON clause) and let MySQL decide which one is better?
As you know (and I heard), MySQL uses only one index per query.
In general, most databases will only use one index per table, per query. This isn't always the case, but is at least a decent rule of thumb. For your particular example, you can rely on this.
Now I want to know, in query above, which index would be better?
The query you wrote is an inner join. This means that either of the two tables could appear on the left side of the join, and the result sets would be logically equivalent. MySQL is therefore free to execute the join in whichever order it chooses. The plan that gets chosen will likely place the larger table on the left hand side, and the smaller table on the right hand side. If you knew the actual execution order of the tables, you would only need to index the right table. Given that you may not know this, both of your suggested indices are reasonable:
CREATE INDEX dict_idx ON dictionary (word);
CREATE INDEX details_idx ON details (word);
We could even try to improve on the above indices by covering the columns which appear in the select clause. For example, the index on details could be expanded to:
CREATE INDEX details_idx ON details (word, syllable);
This would let MySQL use the above index exclusively to satisfy the query plan, without requiring a seek back to the original table. You select dictionary.*, so covering this with a single index might not be possible or desirable, but at least this gets the point across.
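To see what "covering" looks like in a query plan, here is a rough sketch. It uses Python's sqlite3 module in place of MySQL (an assumption), and the extra columns meaning and notes are hypothetical, added only so the tables have something beyond the indexed columns. SQLite reports a covering index explicitly in its EXPLAIN QUERY PLAN output; MySQL shows the same idea as "Using index" in the Extra column of EXPLAIN.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'meaning' and 'notes' are hypothetical extra columns.
    CREATE TABLE dictionary (word TEXT, meaning TEXT);
    CREATE TABLE details (word TEXT, syllable TEXT, notes TEXT);
    -- Index on the join column plus the selected column.
    CREATE INDEX details_idx ON details (word, syllable);
""")

query = """
    SELECT dic.*, d.syllable
    FROM dictionary dic
    JOIN details d ON d.word = dic.word
"""

# The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan_steps:
    print(step)

# True if details is read through the index alone, without touching the table.
details_is_covered = any("COVERING INDEX details_idx" in step for step in plan_steps)
print(details_is_covered)
```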
MySQL would use the most selective index (the one giving the fewest rows). This means it depends on the data, and also optimizations like this could change between versions of the database.
I have a MySQL table with ~17M rows where I end up doing a lot of aggregation queries.
For this example let's say I have index_on_b, index_on_c, compound_index_on_a_b, compound_index_on_a_c
I run EXPLAIN on a query:
EXPLAIN SELECT SUM(revenue) FROM table WHERE a = some_value AND b = other_value
And I find that the selected index is index_on_b, but when I use a query hint
SELECT SUM(revenue) FROM table USE INDEX(compound_index_on_a_b)
The query runs much faster. Is there anything I can do in the MySQL config to make MySQL choose the compound indexes first?
There are 2 possible routes you can take:
A) When the optimizer considers two indexes to be equally good, the order in which the indexes were created can decide which one is chosen. You could drop index_on_b, recreate it, and check whether the optimizer was in that situation and simply treated the two as equivalent.
Or
B) Use optimizer_search_depth (see https://mariadb.com/blog/setting-optimizer-search-depth-mysql). By altering this parameter you determine how much effort the optimizer is allowed to spend on a query plan, and it might come up with the much better solution of using the combined index.
A possible explanation:
If a has the same value throughout the table, then INDEX(b) is actually better than INDEX(a,b). This is because the former is smaller, hence faster to work with. Note that both will return the same number of rows, even without further checking of a.
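A small sketch of that scenario, with made-up data and Python's sqlite3 module standing in for MySQL (an assumption): when a holds a single value, adding a to the filter does not remove any rows, so the extra index column buys nothing. The table name t and the row counts are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, revenue REAL)")

# Hypothetical data: a is always 1, b cycles through 0..9.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, i % 10, 1.0) for i in range(1000)],
)

# Number of distinct values per column, a rough measure of selectivity.
distinct_a, distinct_b = conn.execute(
    "SELECT COUNT(DISTINCT a), COUNT(DISTINCT b) FROM t"
).fetchone()

# Rows matched by b alone versus by a and b together.
rows_b_only = conn.execute("SELECT COUNT(*) FROM t WHERE b = 3").fetchone()[0]
rows_a_and_b = conn.execute(
    "SELECT COUNT(*) FROM t WHERE a = 1 AND b = 3"
).fetchone()[0]

print(distinct_a, distinct_b)
print(rows_b_only, rows_a_and_b)
```

Running the same two counts against the real table is a cheap way to tell whether this explanation applies to your data.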
Please provide:
SHOW CREATE TABLE
SHOW INDEXES -- to see cardinality
EXPLAIN SELECT
I have two tables:
module_339 (id,name,description,etc)
module_339_schedule(id,itemid,datestart,dateend,timestart,timeend,days,recurrent)
module_339_schedule.itemid points to module_339
The first table holds conferences.
The second one keeps the schedules of the conferences.
module_339 has 3 items
module_339_schedule has 4000+ items - almost evenly divided between the 3 conferences
I have a stored function - "getNextDate_module_339" - which will compute the "next date" for a specified conference, in order to be able to display it, and also sort by it - if the user wants to. This stored function will just take all the schedule entries of the specified conference and loop through them, comparing dates and times. So it does one simple read from module_339_schedule, then loops through the items and compares dates and times.
The problem: this query is very slow:
SELECT
distinct(module_339.id)
,min( getNextDate_module_339(module_339.id,1,false)) AS ND
FROM
module_339
LEFT JOIN module_339_schedule on module_339.id=module_339_schedule.itemid /* standard schedule adding */
WHERE 1=1 AND module_339.is_system_preview<=0
group by
module_339.id
order by
module_339.id asc
If I remove either the function call OR the LEFT JOIN, it is fast again.
What am I doing wrong here? Seems to be some kind of "collision" between the function call and the left join.
I think the group by part can be removed from this query, thus enabling you to remove the min function as well. Also, there is not much point of WHERE 1=1 AND..., so I've changed that as well.
Try this:
SELECT DISTINCT module_339.id
,getNextDate_module_339(module_339.id,1,false) AS ND
FROM module_339
LEFT JOIN module_339_schedule ON module_339.id=module_339_schedule.itemid /* standard schedule adding */
WHERE module_339.is_system_preview<=0
ORDER BY module_339.id
Note that this might not have a lot of impact on performance.
I think that the worst part performance-wise is probably the getNextDate_module_339 function.
If you can find a way to get its functionality without using a function as a subquery, your SQL statement will probably run a lot faster than it does now, with or without the left join.
If you need help doing this, please edit your question to include the function and hopefully I (or someone else) might be able to help you with that.
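One plausible reason the function and the LEFT JOIN are slow only in combination is that the function gets evaluated once per joined row instead of once per conference. The sketch below counts calls to illustrate this. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption, so MySQL's behaviour should be confirmed separately), get_next_date is a hypothetical replacement for getNextDate_module_339 that only records its calls, and the data is made up (3 conferences with 4 schedule rows each).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE module_339 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE module_339_schedule (id INTEGER PRIMARY KEY, itemid INTEGER);
    INSERT INTO module_339 (id, name) VALUES (1, 'c1'), (2, 'c2'), (3, 'c3');
""")
# Made-up schedule: 4 entries for each of the 3 conferences.
conn.executemany(
    "INSERT INTO module_339_schedule (itemid) VALUES (?)",
    [(conf_id,) for conf_id in (1, 2, 3) for _ in range(4)],
)

calls = []

def get_next_date(item_id):
    """Hypothetical stand-in for getNextDate_module_339; it only records calls."""
    calls.append(item_id)
    return "2030-01-01"

conn.create_function("get_next_date", 1, get_next_date)

# Shape of the original query: function inside MIN() over the joined rows.
conn.execute("""
    SELECT m.id, MIN(get_next_date(m.id)) AS ND
    FROM module_339 m
    LEFT JOIN module_339_schedule s ON m.id = s.itemid
    GROUP BY m.id
    ORDER BY m.id
""").fetchall()
calls_with_join = len(calls)

# Same result without the join: the function runs once per conference.
calls.clear()
conn.execute("""
    SELECT m.id, get_next_date(m.id) AS ND
    FROM module_339 m
    ORDER BY m.id
""").fetchall()
calls_without_join = len(calls)

print(calls_with_join, calls_without_join)
```

With 4000+ schedule rows and a function that itself reads the schedule table, that multiplication would account for the slowdown. If the joined columns are not needed in the result, dropping the join is probably the simplest fix.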
From the MySQL reference manual:
The best way to improve the performance of SELECT operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the WHERE clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
Although it can be tempting to create an indexes for every possible column used in a query, unnecessary indexes waste space and waste time for MySQL to determine which indexes to use. Indexes also add to the cost of inserts, updates, and deletes because each index must be updated. You must find the right balance to achieve fast queries using the optimal set of indexes.
As a first step I suggest checking that the joined columns are both indexed. Since primary keys are always indexed by default, we can assume that module_339 is already indexed on the id column, so first verify that module_339_schedule is indexed on the itemid column. You can check the indexes on that table in MySQL using:
SHOW INDEX FROM module_339_schedule;
If the table does not have an index on that column, you can add one using:
CREATE INDEX itemid_index ON module_339_schedule (itemid);
That should speed up the join component of the query.
Since your query also references module_339.is_system_preview you might also consider adding an index to that column using:
CREATE INDEX is_system_preview_index ON module_339 (is_system_preview);
You might also be able to optimize the stored function, but you haven't included it in your question.
I have the following database structure :
create table Accounting
(
Channel,
Account
)
create table ChannelMapper
(
AccountingChannel,
ShipmentsMarketPlace,
ShipmentsChannel
)
create table AccountMapper
(
AccountingAccount,
ShipmentsComponent
)
create table Shipments
(
MarketPlace,
Component,
ProductGroup,
ShipmentChannel,
Amount
)
I have the following query running on these tables and I'm trying to optimize the query to run as fast as possible :
select Accounting.Channel, Accounting.Account, Shipments.MarketPlace
from Accounting join ChannelMapper on Accounting.Channel = ChannelMapper.AccountingChannel
join AccountMapper on Accounting.Account = AccountMapper.AccountingAccount
join Shipments on
(
ChannelMapper.ShipmentsMarketPlace = Shipments.MarketPlace
and ChannelMapper.AccountingChannel = Shipments.ShipmentChannel
and AccountMapper.ShipmentsComponent = Shipments.Component
)
join (select Component, sum(amount) from Shipments group by component) as Totals
on Shipments.Component = Totals.Component
How do I make this query run as fast as possible ? Should I use indexes ? If so, which columns of which tables should I index ?
Here is a picture of my query plan :
Thanks,
Indexes are essential to any database.
Speaking in "layman" terms, indexes are... well, precisely that. You can think of an index as a second, hidden, table that stores two things: The sorted data and a pointer to its position in the table.
Some rules of thumb for creating indexes:
Create indexes on every field that is (or will be) used in joins.
Create indexes on every field that you frequently filter on in WHERE conditions.
Avoid creating indexes on everything. Create indexes on the relevant fields of every table, and use relations to retrieve the desired data.
Avoid creating indexes on double fields, unless it is absolutely necessary.
Avoid creating indexes on varchar fields, unless it is absolutely necessary.
I recommend reading this: http://dev.mysql.com/doc/refman/5.5/en/using-explain.html
Your JOINS should be the first place to look. The two most obvious candidates for indexes are AccountMapper.AccountingAccount and ChannelMapper.AccountingChannel.
You should consider indexing Shipments.MarketPlace,Shipments.ShipmentChannel and Shipments.Component as well.
However, adding indexes increases the workload in maintaining them. While they might give you a performance boost on this query, you might find that updating the tables becomes unacceptably slow. In any case, the MySQL optimiser might decide that a full scan of the table is quicker than accessing it by index.
Really the only way to do this is to set up the indexes that would appear to give you the best result and then benchmark the system to make sure you're getting the results you want here, whilst not compromising the performance elsewhere. Make good use of the EXPLAIN statement to find out what's going on, and remember that optimisations made by yourself or the optimiser on small tables may not be the same optimisations you'd need on larger ones.
The other three answers seem to have indexes covered so this is in addition to indexes. You have no where clause which means you are always selecting the whole darn database. In fact, your database design doesn't have anything useful in this regard, such as a shipping date. Think about that.
You also have this:
join (select Component, sum(amount) from Shipments group by component) as Totals
on Shipments.Component = Totals.Component
That's all well and good but you don't select anything from this subquery. Therefore why do you have it? If you did want to select something, such as the sum(amount), you will have to give that an alias to make it available in the select clause.
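A minimal sketch of the aliasing point, using Python's sqlite3 module in place of MySQL (an assumption). The alias TotalAmount is a hypothetical name, and the Shipments rows are made up. Once the sum has an alias, the outer query can select it from the derived table.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Shipments (MarketPlace TEXT, Component TEXT, Amount REAL);
    -- Made-up rows for illustration.
    INSERT INTO Shipments VALUES
        ('US', 'widget', 10.0),
        ('US', 'widget', 5.0),
        ('DE', 'gadget', 7.5);
""")

# TotalAmount is a hypothetical alias; without one the sum cannot be
# referenced by name in the outer SELECT.
query = """
    SELECT DISTINCT Shipments.Component, Totals.TotalAmount
    FROM Shipments
    JOIN (
        SELECT Component, SUM(Amount) AS TotalAmount
        FROM Shipments
        GROUP BY Component
    ) AS Totals ON Shipments.Component = Totals.Component
    ORDER BY Shipments.Component
"""
totals = conn.execute(query).fetchall()
print(totals)
```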
I have read that creating a temporary table is best if the number of parameters passed in the IN criteria is large. This is for select queries. Does this hold true for update queries as well? I have an update query which uses 3 table joins (inner joins) and passes 1000 parameters in the IN criteria, and this query runs in a loop 200 or more times. Which is the best approach to execute this query?
IN operations are usually slow. Passing 1000 parameters to any query sounds awful. If you can avoid that, do it. Now, I'd really have a go with the temp table. You can even play with the indexing of the table. I mean, instead of just putting values in it, play with the indexes that would help you optimize your searches.
On the other hand, inserting into a table with indexes is slower than inserting into one without. Test it empirically. What I do think is a must: bear in mind that when using the temp table you don't need the IN clause, because you can use the EXISTS clause, which usually results in better performance. For example:
select * from yourTable yt
where exists (
select * from yourTempTable ytt
where yt.id = ytt.id
)
I don't know your query, nor data, but that would give you an idea about how to do it. Note the inner select * is as fast as select aSingleField, as the database engine optimizes it.
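Here is a rough sketch of the temp-table approach applied to both a select and an update. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), the column val and the id list are hypothetical, and the sizes are tiny compared with the 1000 parameters in the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(i, f"row{i}") for i in range(1, 11)],
)

wanted_ids = [2, 4, 6, 42]  # hypothetical parameter list; 42 matches nothing

# Load the parameters once into a temporary table with a primary key.
conn.execute("CREATE TEMP TABLE yourTempTable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO yourTempTable VALUES (?)", [(i,) for i in wanted_ids])

# SELECT through EXISTS against the temp table.
exists_rows = conn.execute("""
    SELECT yt.id FROM yourTable yt
    WHERE EXISTS (SELECT * FROM yourTempTable ytt WHERE yt.id = ytt.id)
    ORDER BY yt.id
""").fetchall()

# The same SELECT with a literal IN list, for comparison.
placeholders = ", ".join("?" for _ in wanted_ids)
in_rows = conn.execute(
    f"SELECT id FROM yourTable WHERE id IN ({placeholders}) ORDER BY id",
    wanted_ids,
).fetchall()

# The UPDATE case from the question, driven by the same temp table.
cursor = conn.execute("""
    UPDATE yourTable SET val = 'updated'
    WHERE EXISTS (SELECT 1 FROM yourTempTable ytt WHERE ytt.id = yourTable.id)
""")
updated_count = cursor.rowcount

print(exists_rows, in_rows, updated_count)
```

Both forms select the same rows, so the choice comes down to timing them on your own data. If the query runs 200 times with the same parameters, filling the temp table once outside the loop should also save work.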
Those are all my thoughts. But remember, to be 100% sure of what is best for your problem, there is nothing like performing both tests and timing them :) Hope this helps.