I am using the following mysql select-query with several joins. I am wondering if this is how a somewhat good select-statement should look like:
SELECT *
FROM table_news AS a
INNER JOIN table_cat AS b ON a.cat_id = b.id
INNER JOIN table_countries AS c ON a.country_id = c.id
INNER JOIN table_addresses AS d ON a.id = d.news_id
WHERE a.deleted = 0
AND a.hidden = 0
AND a.cat_id = ".$search_cat."
AND a.country_id = ".$search_country."
AND a.title LIKE '%".$search_string."%'
OR a.deleted = 0
AND a.hidden = 0
AND a.cat_id = ".$search_cat."
AND a.country_id = ".$search_country."
AND a.subtitle LIKE '%".$search_string."%'"
It seems to be a lot of joins. Even though table b and table c contain only 3 or 4 fields, I wonder if the number of joins would clearly slow down the search on the starting-page?
Would it be better to put the fields from table d (street, city and so on) back into the main-table, as they should be needed most of the time this query is executed?
Thanx in advance,
Jayden
I don't think there is necessarily anything wrong with having three joins. There are a couple of things you can do to make sure the query is optimised.
Firstly, you should never do SELECT * - instead explicitly state what fields you want to return from the database.
Also, I would create indexes on all the fields you have in the where clause, and all of the fields you are joining. This can be a little bit of a trade off - for example if you are doing a lot of write operations then there is a hit because you need to write to the index everytime.
Related
I have a pretty big SQL query to get data from multiple database tables. I use the ON condition to check if the guild_ids are always the same and in some cases, he check's for an user_id too.
That is my query:
SELECT
SUM( f.guild_id = 787672220503244800 AND f.winner_id LIKE '%841827102331240468%' ) AS guild_winner,
SUM( f.winner_id LIKE '%841827102331240468%' ) AS win_sum,
m.message_count,
r.bypass_role_id,
i.real_count,
i.total_count,
i.bonus_count,
i.left_count
FROM
guild_finished_giveaways AS f
JOIN guild_message_count AS m
JOIN guild_role_settings AS r
JOIN guild_invite_count AS i ON m.guild_id = f.guild_id
AND m.user_id = 841827102331240468
AND r.guild_id = f.guild_id
AND i.guild_id = f.guild_id
AND i.user_id = m.user_id
But it runs pretty slow, with over 15s. I can't see why it needs so long.
I figured out that if I remove the "guild_invite_count" JOIN, it's pretty fast again. Do I have some simple error here that I don't see? Or what could be the issue?
Each JOIN expression needs it's own ON. Don't wait until the end for this. As it was, the server was forced to build up a cartesian product of all those tables before narrowing them down again, and I'm surprised the query ran at all (I'd expect a syntax error for missing ON clauses).
FROM guild_finished_giveaways AS f
JOIN guild_message_count AS m ON m.guild_id = f.guild_id
JOIN guild_role_settings AS r ON r.guild_id = f.guild_id
JOIN guild_invite_count AS i ON i.guild_id = f.guild_id
AND i.user_id = m.user_id
WHERE m.user_id = 841827102331240468
It's also more than a little odd to use SUM() or any other aggregate function in the same query as non-aggregated values without a GROUP BY clause.
Are you using InnoDB?
Does every table have a PRIMARY KEY?
These may help:
m: PRIMARY KEY(user_id) -- assuming that is unique in that table
f: INDEX(guild_id, winner_id)
r: INDEX(guild_id, bypass_role_id)
i: INDEX(user_id,)
It looks like some tables should not be separate -- perhaps r,i,f could be combined? (I need to see SHOW CREATE TABLE to say more.)
Do NOT have a commalist in winner_id. Instead have another table with one row per winner per game (or whatever it is a winner of). Perhaps just to columns like a Many-to-many mapping table.
Noting that the execution is likely to start with m and then go next to i let's improve on Joel's suggestion:
FROM guild_message_count AS m
JOIN guild_invite_count AS i ON i.user_id = m.user_id
JOIN guild_finished_giveaways AS f ON f.guild_id = m.guild_id
JOIN guild_role_settings AS r ON r.guild_id = m.guild_id
WHERE m.user_id = 841827102331240468
Note that 3 tables are joined on guild_id; but only 2 = are needed.
SUM without GROUP BY sums up the entire resultset (after JOINing). But you have 6 non-aggregates, so you need to GROUP BY all 6.
But that may lead to grossly inflated sums. Maybe you need to do the aggregation just over f first since that is where you are summing. Then JOIN to the rest??
I'm always be amused and confused(at same time) whenever I have been to asked prepare and run Join query on Sql Console.
And the cause of most confusion is mainly based upon the fact whether/or not the ordering of join predicate hold any importances in Join results.
Example.
SELECT "zones"."name", "ip_addresses".*
FROM "ip_addresses"
INNER JOIN "zones" ON "zones"."id" = "ip_addresses"."zone_id"
WHERE "ip_addresses"."resporg_accnt_id" = 1
AND "zones"."name" = 'us-central1'
LIMIT 1;
Given the sql query, the Join predicate look like this.
... INNER JOIN "zones" ON "zones"."id" = "ip_addresses"."zone_id" WHERE "ip_addresses"."resporg_accnt_id"
Now, would it make any difference in term of performance of Join as well as the authenticity of the obtained result. If happen to change the predicate to look like this
... INNER JOIN "zones" ON "ip_addresses"."zone_id" = "zones"."id" WHERE "ip_addresses"."resporg_accnt_id"
The predicate order won't make a performance difference in your case, a simple equality condition, but personally I like to place the columns from the table I'm JOINing to on the LHS of each ON condition
SELECT ...
FROM ip_addresses ia
JOIN zones z
ON z.id = ia.zone_id
WHERE ...
The optimiser can use any index available on these columns during the JOIN and I find it easier to visualise this way.
Any additional conditions also tend to be on columns of the table being JOINed to and I find again this reads better when this table is consistently on the LHS
Not quite the same, but I did see a case where performance was affected by the choice of column to isolate
I think the JOIN looked something like
SELECT ...
FROM table_a a
JOIN table_b b
ON a.id = b.id - 1
Changing this to
SELECT ...
FROM table_a a
JOIN table_b b
ON b.id = a.id + 1
allowed the optimiser to use an index on b.id, but presumably at the cost of an index on a.id
I suspect this kind of query might need analysing on a case by case basis
Furthermore, I would probably switch your table order round too and write your original query:
SELECT z.name,
ia.*
FROM zones z
JOIN ip_addresses ia
ON ia.zone_id = z.id
AND ia.resporg_accnt_id = 1
WHERE z.name = 'us-central1'
LIMIT 1
Conceptually, you are saying "Start with the 'us-central1' zone and fetch me all the ip_addresses associated with a resporg_accnt_id of 1"
Check the EXPLAIN plans if you want to verify that there is no difference in your case
Im writing this complex query to return a large dataset, which is about 100,000 records. The query runs fine until i add in this OR statement to the WHERE clause:
AND (responses.StrategyFk = strategies.Id Or responses.StrategyFk IS
Null)
Now i understand that by putting the or statement in there it adds a lot of overhead.
Without that statement and just:
AND responses.StrategyFk = strategies.Id
The query runs within 15 seconds, but doesn't return any records that didn't have a fk linking a strategie.
Although i would like these records as well. Is there an easier way to find both records with a simple where statement? I can't just add another AND statement for null records because that will break the previous statement. Kind of unsure of where to go from here.
Heres the lower half of my query.
FROM
responses, subtestinstances, students, schools, items,
strategies, subtests
WHERE
subtestinstances.Id = responses.SubtestInstanceFk
AND subtestinstances.StudentFk = students.Id
AND students.SchoolFk = schools.Id
AND responses.ItemFk = items.Id
AND (responses.StrategyFk = strategies.Id Or responses.StrategyFk IS Null)
AND subtests.Id = subtestinstances.SubtestFk
try:
SELECT ... FROM
responses
JOIN subtestinstances ON subtestinstances.Id = responses.SubtestInstanceFk
JOIN students ON subtestinstances.StudentFk = students.Id
JOIN schools ON students.SchoolFk = schools.Id
JOIN items ON responses.ItemFk = items.Id
JOIN subtests ON subtests.Id = subtestinstances.SubtestFk
LEFT JOIN strategies ON responses.StrategyFk = strategies.Id
That's it. No OR condition is really needed, because that's what a LEFT JOIN does in this case. Anywhere responses.StrategyFk IS NULL will result in no match to the strategies table, and it wil return a row for that.
See this link for a simple explanation of joins: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
After that, if you're still having performance issues then you can start looking at the EXPLAIN SELECT ... ; output and looking for indexes that may need to be added. Optimizing Queries With Explain -- MySQL Manual
Try using explicit JOINs:
...
FROM responses a
INNER JOIN subtestinstances b
ON b.id = a.subtestinstancefk
INNER JOIN students c
ON c.id = b.studentfk
INNER JOIN schools d
ON d.id = c.schoolfk
INNER JOIN items e
ON e.id = a.itemfk
INNER JOIN subtests f
ON f.id = b.subtestfk
LEFT JOIN strategies g
ON g.id = a.strategyfk
I've found info on how to optimize MySQL queries, but most of the tips seem to suggest avoiding things MySQL isn't built for (e.g., calculations, validation, etc.) My query on the other hand is very straight forward but joins a lot of tables together.
Is there an approach to speeding up simple queries with many INNER JOINS? How would I fix my query below?
SELECT t_one.id FROM table_one t_one
INNER JOIN entr_to_state st
INNER JOIN entr_to_country ct
INNER JOIN entr_to_domain dm
INNER JOIN entr_timing t
INNER JOIN entr_to_weather a2w
INNER JOIN entr_to_imp_num a2i
INNER JOIN entr_collection c
WHERE t_one.type='normal'
AND t_one.campaign_id = c.id
AND t_one.status='running'
AND c.status='running'
AND (c.opt_schedule = 'continuous' OR (c.opt_schedule = 'schedulebydate'
AND (c.start_date <= '2011-03-06 14:25:52' AND c.end_date >= '2011-03-06 14:25:52')))
AND t.entr_id = t_one.id AND ct.entr_id = t_one.id
AND st.entr_id = t_one.id AND a2w.entr_id = t_one.id
AND (t_one.targeted_gender = 'male' OR t_one.targeted_gender = 'both')
AND t_one.targeted_min_age <= 23.1 AND t_one.targeted_max_age > 23.1
AND (ct.abbreviation = 'US' OR ct.abbreviation = 'any')
AND (st.abbreviation = 'CO' OR st.abbreviation = 'any')
AND t.sun = 1 AND t.hour_14 = 1
AND (a2w.weather_category_id = 1 OR a2w.weather_category_id = 0)
AND t_one.targeted_min_temp <= 46
AND t_one.targeted_max_temp > 46 GROUP BY t_one.id
Index all relevant fields, of course, which I'm sure you have
Then find which joins are the most costly ones by running EXPLAIN SELECT...
Consider splitting them off into a seperate query i.e. narrow down the record(s) you're looking for, then perform the joins on those records rather than all the records
i.e.
SELECT c.*, ....
FROM (SELECT x, y, z .... ) AS c
You would need to EXPLAIN SELECT the query, check which parts of the query are not using in indices, and then attempt to index those. If possible, break the query down into smaller parts as well.
If you really cannot in any way optimize the underlying DB or your query, you could resort to a flat table that has the data you need for fast access. Then just hook up the main query to update the flat table to run as often as needed.
I know I can change the way MySQL executes a query by using the FORCE INDEX (abc) keyword. But is there a way to change the execution order?
My query looks like this:
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
I have a key for every relation/constraint that I use. If I run explain on this statement I see that mysql starts querying c first.
id select_type table type
1 SIMPLE c ref
2 SIMPLE b ref
3 SIMPLE a eq_ref
However, I know that querying in the order a -> b -> c would be faster (I have proven that)
Is there a way to tell mysql to use a specific order?
Update: That's how I know that a -> b -> c is faster.
The above query takes 1.9 seconds to complete and returns 7 rows. If I change the query to
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
HAVING c.itemid = 123456
the query completes in 0.01 seconds (Without using having I get 10.000 rows).
However that is not a elegant solution because this query is a simplified example. In the real world I have joins from c to other tables. Since HAVING is a filter that is executed on the entire result it would mean that I would pull some magnitues more records from the db than nescessary.
Edit2: Just some information:
The variable part in this query is c.itemid. Everything else are fixed values that don't change.
Indexes are setup fine and mysql chooses the right ones for me
between a and b there is a 1:n relation (index PRIMARY is used)
between b and c there is a many to many relation (index IDX_ITEMID is used)
the point is that mysql should start querying table a and work it's way down to c and not the other way round. Any change to achive that.
Solution: Not exactly what I wanted but this seems to work:
SELECT c.*
FROM table1 a
INNER JOIN table2 b ON a.id = b.table1_id
INNER JOIN table3 c ON b.itemid = c.itemid
WHERE a.itemtype = 1
AND a.busy = 1
AND b.something = 0
AND b.acolumn = 2
AND c.itemid = 123456
AND f.id IN (
SELECT DISTINCT table2.id FROM table1
INNER JOIN table2 ON table1.id = table2.table1_id
WHERE table1.itemtype = 1 AND table1.busy = 1)
Perhaps you need to use STRAIGHT_JOIN.
http://dev.mysql.com/doc/refman/5.0/en/join.html
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order.
You can use FORCE INDEX to force the execution order, and I've done that before.
If you think about it, there's usually only one order you could query tables in for any index you pick.
In this case, if you want MySQL to start querying a first, make sure the index you force on b is one that contains b.table1_id. MySQL will only be able to use that index if it's already queried a first.
You can try rewriting in two ways
bring some of the WHERE condition into JOIN
introduce subqueries even though they are not necessary
Both things might impact the planner.
First thing to check, though, would be if your stats are up to date.