Indexing on JOIN and WHERE clauses - mysql

If my query looks like:
SELECT *
FROM member
LEFT JOIN group ON (member.group_id = group.id)
WHERE group.created > #date
ORDER BY member.last_updated
If I create two indexes for:
member.group_id, member.last_updated
group.id, group.created
Is that the best way to optimize the original query? Should I add a new field member.group_created and index that like so:
member.group_created, member.last_updated

SELECT *
FROM member AS m
LEFT JOIN group AS g ON (m.group_id = g.id)
WHERE g.created > #date
ORDER BY m.last_updated
If you don't need all (*) the columns of both tables, don't ask for them; it may impact performance.
Do you really need LEFT? That is, do you want NULL for any rows missing from the 'right' table?
If the Optimizer decides to start with member, it might benefit from INDEX(last_updated). Assuming that id is the PRIMARY KEY ofgroup`, no extra index is needed there.
If it decides to start with group, then INDEX(created) may be useful. Then, m needs INDEX(group_id).
So, add the 3 indexes I suggest, if they don't already exist.
If you have more issues, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...

Dont use where clause on left join tables,instead do something like this.
SELECT *
FROM member
LEFT JOIN group ON (member.group_id = group.id and group.created > #date)
ORDER BY member.last_updated
Also add index (id,created) in group table

Related

MySQL - How do I figure out which columns to index?

How do I figure out which columns to index?
SELECT a.ORD_ID AS Manual_Added_Orders,
a.ORD_poOrdID_List AS Auto_Added_Orders,
a.ORDPOITEM_ModelNumber,
a.ORDPO_Number,
a.ORDPOITEM_ID,
(SELECT sum(ORDPOITEM_Qty) AS ORDPOITEM_Qty
FROM orderpoitems
WHERE ORDPOITEM_ModelNumber = a.ORDPOITEM_ModelNumber
AND ORDPO_Number = 123007)
AS ORDPOITEM_Qty,
a.ORDPO_TrackingNumber,
a.ORDPOITEM_Received,
a.ORDPOITEM_ReceivedQty,
a.ORDPOITEM_ReceivedBy,
b.ORDPO_ID
FROM orderpoitems a
LEFT JOIN orderpo b ON (a.ORDPO_Number = b.ORDPO_Number)
WHERE a.ORDPO_Number = 123007
GROUP BY a.ORDPOITEM_ModelNumber
ORDER BY a.ORD_poOrdID_List, a.ORD_ID
I did the explain that is how I am getting these pictures... I added a few indexes... still not looking good.
Well firstly your query could be simplified to:
SELECT a.ORD_ID AS Manual_Added_Orders,
a.ORD_poOrdID_List AS Auto_Added_Orders,
a.ORDPOITEM_ModelNumber,
a.ORDPO_Number,
a.ORDPOITEM_ID,
SUM(ORDPOITEM_Qty) AS ORDPOITEM_Qty
a.ORDPO_TrackingNumber,
a.ORDPOITEM_Received,
a.ORDPOITEM_ReceivedQty,
a.ORDPOITEM_ReceivedBy,
b.ORDPO_ID
FROM orderpoitems a
LEFT JOIN orderpo b ON (a.ORDPO_Number = b.ORDPO_Number)
WHERE a.ORDPO_Number = 123007
GROUP BY a.ORDPOITEM_ModelNumber
ORDER BY a.ORD_poOrdID_List, a.ORD_ID
Secondly I would start by creating a index on the orderpoitems.ORDPO_Number and orderpo.ORDPO_number
Bit hard to say without the table structures.
Read up on indexes and covering index
From what you have, start with what is in your where clause AND join criteria to another table. Also, include if possible and practical, those columns used in group by / order by as order by is typically a killer when finishing a query.
That said, I would have an index on your OrderPOItems table on
( ordpo_number, orderpoitem_ModelNumber, ord_poordid_list, ord_id )
This way, the FIRST element hits your WHERE clause. Next the column for your data grouping, finally, the columns for your order by. This way, the joins and qualifying components can be "covered" from the index alone without having to go to the raw data pages for the rest of the columns being returned. Hopefully a good jump start specific to your scenario.

An efficient way to write MySql query that has duplicate column names

I have a reasonably straightforward MySQL query as follows:
SELECT *
FROM post INNER JOIN comment ON post.post_id = comment.post_id
WHERE post.host = 99999
ORDER BY post.post_id
My problem is that some of my column names are common to both the post and comment table. In the result, the column name appears only once, and its values are taken from the last table in the query (ie. the comment table).
I realise that I could explicitly list each column name in the SELECT part of the query, and use aliases to differentiate between duplicate column names, but there are a lot of columns in each table and it would be messy.
Is there a more efficient way to go about it?
You should just use aliases for those columns that are alike and then * for the rest:
SELECT post.post_id, comment.post_id as comment_post_id, *
FROM ...
I don't think there is a better approach.
Good luck.
Listing column names is always a better approach as long as it is feasible, it may give some hint to the MySQL optimizer.
Additionally, I would use Using syntax when the Join is done on similar column name.
SELECT x.x1, x.x2 ... , y.x1, y.x2 ...
FROM post as x
INNER JOIN comment as y
USING post_id
WHERE ...
You can select all the column from both the table.Just give an alias to the both table name and the use alias.* after select statement.
Then it will select all the column from the table.
Example:
SELECT pst.*,cmt.* from post pst INNER JOIN comment cmt
FROM ...

Long query times for simple MySQL SELECT with JOIN

SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.active and s.public are index as well as u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.
Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.
Perhaps write it as a stored procedure or view... You could also try selecting all the IDs first then running the count on the result... if you do it all as one query it may be faster. Generally optimisation is done by using nested selects or making the server do the work so in this context that is all I can think of.
SELECT Count(*) FROM
(SELECT song.user_id FROM
(SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) as t
JOIN user AS u
ON(t.user_id = u.user_id))
Also be sure you are using the correct kind of join.

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you since a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It can help your life go easier, and anyone AFTER you to better know which columns come from what table, especially if same column name in different tables. Prevents ambiguity in the query. Your left join, I think, from the sample, may be ambigous, but confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that such posting was from, in addition to its OWN "ID" for the blabbing table's unique key indicator.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, not logical "||" or, but SQL "OR"
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TempJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TB.Team_ID IN NOT NULL
OR B.mem_id ... rest of query
What I ended up doing is;
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.

Any way to improve this slow query?

Its particular query pops up in the slow query log all the time for me. Any way to improve its efficiency?
SELECT
mov_id,
mov_title,
GROUP_CONCAT(DISTINCT genres.genre_name) as all_genres,
mov_desc,
mov_added,
mov_thumb,
mov_hits,
mov_numvotes,
mov_totalvote,
mov_imdb,
mov_release,
mov_type
FROM movies
LEFT JOIN _genres
ON movies.mov_id = _genres.gen_movieid
LEFT JOIN genres
ON _genres.gen_catid = genres.grenre_id
WHERE mov_status = 1 AND mov_incomplete = 0 AND mov_type = 1
GROUP BY mov_id
ORDER BY mov_added DESC
LIMIT 0, 20;
My main concern is in regard to the group_concat function, which outputs a comma separated list of genres associated with the particular film, which I put through a for loop and make click-able links.
Do you need the genre names? If you can do with just the genre_id, you can eliminate the second join. (You can fill in the genre name later, in the UI, using a cache).
What indexes do you have?
You probably want
create index idx_movies on movies
(mov_added, mov_type, mov_status, mov_incomplete)
and most certainly the join index
create index ind_genres_movies on _genres
(gen_mov_id, gen_cat_id)
Can you post the output of EXPLAIN? i.e. put EXPLAIN in front of the SELECT and post the results.
I've had quite a few wins with using SELECT STRAIGHT JOIN and ordering the tables according to their size.
STRAIGHT JOIN stops mysql guess which order to join tables and does it in the order specified so if you use your smallest table first you can reduce the amount of rows being joined.
I'm assuming you have indexes on mov_id, gen_movieid, gen_catid and grenre_id?
The 'using temporary, using filesort' is from the group concat distinct genres.genre_name.
Trying to get a distinct on a column without an index will cause a temp table to be used.
Try adding an index on genre_name column.