Explain SQL and Query optimization

Explain SQL and Query optimization - mysql

Explain SQL (in phpmyadmin) of a query that is taking more than 5 seconds is giving me the above. I read that we can study the Explain SQL to optimize a query. Can anyone tell if this Explain SQL telling anything as such?
Thanks guys.
Edit:
The query itself:
SELECT
a.`depart` , a.user,
m.civ, m.prenom, m.nom,
CAST( GROUP_CONCAT( DISTINCT concat( c.id, '~', c.prenom, ' ', c.nom ) ) AS char ) AS coordinateur,
z.dr
FROM `0_activite` AS a
JOIN `0_member` AS m ON a.user = m.id
LEFT JOIN `0_depart` AS d ON ( m.depart = d.depart AND d.rank = 'mod' AND d.user_sec =2 )
LEFT JOIN `0_member` AS c ON d.user_id = c.id
LEFT JOIN `zone_base` AS z ON m.depart = z.deprt_num
GROUP BY a.user
Edit 2:
Structures of the two tables a and d. Top: a and bottom: d
Edit 3:
What I want in this query?
I first want to get the value of 'depart' and 'user' (which is an id) from the table 0_activite. Next, I want to get name of the person (civ, prenom and name) from 0_member whose id I am getting from 0_activite via 'user', by matching 0_activite.user with 0_member.id. Here depart is short of department which is also an id.
So at this point, I have depart, id, civ, nom and prenom of a person from two tables, 0_activite and 0_member.
Next, I want to know which dr is related with this depart, and this I get from zone_base. The value of depart is same in both 0_activite and 0_member.
Then comes the trickier part. A person from 0_member can be associated with multiple departs and this is stored in 0_depart. Also, every user has a level, one of what is 'mod', stands for moderator. Now I want to get all the people who are moderators in the depart from where the first user is, and then get those moderaor's name from 0_member again. I also have a variable user_sec, but this is probably less important in this context, though I cannot overlook it.
This is what makes the query a tricky one. 0_member is storing id, name of users, + one depart, 0_depart is storing all departs of users, one line for each depart, and 0_activite is storing some other stuffs and I want to relate those through userid of 0_activite and the rest.
Hope I have been clear. If I am not, please let me know and I will try again to edit this post.
Many many thanks again.

Aside from the few answers provided by the others here, it might help to better understand the "what do I want" from the query. As you've accepted a rather recent answer from me in another of your questions, you have filters applied by department information.
Your query is doing a LEFT join at the Department table by rank = 'mod' and user_sec = 2. Is your overall intent to show ALL records in the 0_activite table REGARDLESS of a valid join to the 0_Depart table... and if there IS a match to the 0_Depart table, you only care about the 'mod' and 2 values?
If you only care about those people specifically associated with the 0_depart with 'mod' and 2 conditions, I would reverse the query starting with THIS table first, then join to the rest.
Having keys on tables via relationship or criteria is always a performance benefit (vs not having the indexes).
Start your query with whatever would be your smallest set FIRST, then join to other tables.
From clarification in your question... I would start with the inner-most... Who it is and what departments are they associated with... THEN get the moderators (from department where condition)... Then get actual moderator's name info... and finally out to your zone_base for the dr based on the department of the MODERATOR...
select STRAIGHT_JOIN
DeptPerMember.*
Moderator.Civ as ModCiv,
Moderator.Prenom as ModPrenom,
Moderator.Nom as ModNom,
z.dr
from
( select
m.ID,
m.Depart,
m.Civ,
m.Prenom,
m.Nom
from
0_Activite as a
join 0_member m
on a.User = m.ID
join 0_Depart as d
on m.depart = d.depart ) DeptPerMember
join 0_Depart as DeptForMod
on DeptPerMember.Depart = DeptForMod.Depart
and DeptForMod.rank = 'mod'
and DeptForMod.user_sec = 2
join 0_Member as Moderator
on DeptForMod.user_id = Moderator.ID
join zone_base z
on Moderator.depart = z.deprt_num
Notice how I tier'd the query to get each part and joined to the next and next and next. I'm building the chain based on the results of the previous with clear "alias" references for clarification of content. Now, you can get whatever respective elements from any of the levels via their distinct "alias" references...

The output from EXPLAIN is showing us that the first and third tables listed (a & d) are not having any indexes utilised by the database engine in executing this query. The key column is NULL for both - which is a shame since both are 'large' tables (OK, they're not really large, but compared to the rest of the tables they're the big 'uns).
Judging from the query, an index on user on 0_activite and an index on (depart, rank, user_sec) on 0_depart would go some way to improving performance.

you can see that columns key and key_len are null this means its not using any key in the possible_keys column. So table a and d are both scanning all rows. (check larger numbers in rows column. you want this smaller).
To deal with 0_depart:
Make sure you have a key on (d.depart, d.rank,d.user_sec) which are part of the join of 0_depart.
To deal with 0_activite:
I'm not positive but a GROUP column should be indexed too so you need a key on a.user

Related

Not getting the Join I want

We are doing some pro bone work for a good cause and I'm having a hell of a time with a query. The coding has been done by many volunteers over the years which has an inevitable outcome.
I have two tables, A and B. What I need is a sum of of score_hours on a join between the two where the data is unique for each instance of only A.
Please keep in mind that both tables are quite big (10 to 50k+ each depending on time in the month).
Table A:
id (pk, ai)
uid (int)
scores_date (timestamp (but for some reason only the actual date, not
the time))
score_hours (decimal 3,1)
Table B:
id (pk, ai)
uid (int)
shift_date (timestamp)
There are many records in table B that have the uid we are looking for on several dates (the dates are not unique). Table A has multiple records for uid but on different days. So it could have 1 uid a day, but not 2 instances of 1 uid a day.
There are obviously more selectors for both tables, but they don't match in any way between the tables (although I do need to query them with simple "AND") so this is what I have to work with. I do need to join them because of the rest of the query, but so far I'm not getting the records I need within a decent time.
My attempts were:
This almost made it. But the execution time was disgusting and failed with some simple selectors.
SELECT SUM(score_hours)
FROM A
WHERE
A.uid IN
(SELECT B.uid
FROM B
WHERE B.uid = "1")
This gives the right output but it joins one for every instance of a uid. Normally you can solve that by grouping, but the sum will still count all. So that is not an option:
SELECT SUM(score_hours)
FROM A
LEFT JOIN B ON A.uid = B.uid
WHERE A.uid = "1"
*edit: Not only do I need to JOIN on uid, but there has to be something like this in it:
DISTINCT(date(m.shift_datum)) = DATE(d.dagscores_date)
It is actually a very basic query, except for the fact that a SUM is needed on a record which is not unique in regards to the Left join and that I need to JOIN on two tables at the same time.
If you need more data please tell me so. I can provide all.

You need to remove the duplicates from the table you're joining with, otherwise the cross-product creates multiple rows that get added into the sum.
SELECT SUM(score_hours)
FROM A
JOIN (SELECT DISTINCT uid
FROM B) AS B
ON A.uid = B.uid

Long query times for simple MySQL SELECT with JOIN

SELECT COUNT(*)
FROM song AS s
JOIN user AS u
ON(u.user_id = s.user_id)
WHERE s.is_active = 1 AND s.public = 1
The s.active and s.public are index as well as u.user_id and s.user_id.
song table row count 310k
user table row count 22k
Is there a way to optimize this? We're getting 1 second query times on this.

Ensure that you have a compound "covering" index on song: (user_id, is_active, public). Here, we've named the index covering_index:
SELECT COUNT(s.user_id)
FROM song s FORCE INDEX (covering_index)
JOIN user u
ON u.user_id = s.user_id
WHERE s.is_active = 1 AND s.public = 1
Here, we're ensuring that the JOIN is done with the covering index instead of the primary key, so that the covering index can be used for the WHERE clause as well.
I also changed COUNT(*) to COUNT(s.user_id). Though MySQL should be smart enough to pick the column from the index, I explicitly named the column just in case.
Ensure that you have enough memory configured on the server so that all of your indexes can stay in memory.
If you're still having issues, please post the results of EXPLAIN.

Perhaps write it as a stored procedure or view... You could also try selecting all the IDs first then running the count on the result... if you do it all as one query it may be faster. Generally optimisation is done by using nested selects or making the server do the work so in this context that is all I can think of.
SELECT Count(*) FROM
(SELECT song.user_id FROM
(SELECT * FROM song WHERE song.is_active = 1 AND song.public = 1) as t
JOIN user AS u
ON(t.user_id = u.user_id))
Also be sure you are using the correct kind of join.

Query efficiency (multiple selects)

I have two tables - one called customer_records and another called customer_actions.
customer_records has the following schema:
CustomerID (auto increment, primary key)
CustomerName
...etc...
customer_actions has the following schema:
ActionID (auto increment, primary key)
CustomerID (relates to customer_records)
ActionType
ActionTime (UNIX time stamp that the entry was made)
Note (TEXT type)
Every time a user carries out an action on a customer record, an entry is made in customer_actions, and the user is given the opportunity to enter a note. ActionType can be one of a few values (like 'designatory update' or 'added case info' - can only be one of a list of options).
What I want to be able to do is display a list of records from customer_records where the last ActionType was a certain value.
So far, I've searched the net/SO and come up with this monster:
SELECT * FROM (
SELECT * FROM (
SELECT * FROM `customer_actions` ORDER BY `EntryID` DESC
) list1 GROUP BY `CustomerID`
) list2 WHERE `ActionType`='whatever' LIMIT 0,30
Which is great - it lists each customer ID and their last action. But the query is extremely slow on occasions (note: there are nearly 20,000 records in customer_records). Can anyone offer any tips on how I can sort this monster of a query out or adjust my table to give faster results? I'm using MySQL. Any help is really appreciated, thanks.
Edit: To be clear, I need to see a list of customers who's last action was 'whatever'.

To filter customers by their last action, you could use a correlated sub-query...
SELECT
*
FROM
customer_records
INNER JOIN
customer_actions
ON customer_actions.CustomerID = customer_records.CustomerID
AND customer_actions.ActionDate = (
SELECT
MAX(ActionDate)
FROM
customer_actions AS lookup
WHERE
CustomerID = customer_records.CustomerID
)
WHERE
customer_actions.ActionType = 'Whatever'
You may find it more efficient to avoid the correlated sub-query as follows...
SELECT
*
FROM
customer_records
INNER JOIN
(SELECT CustomerID, MAX(ActionDate) AS ActionDate FROM customer_actions GROUP BY CustomerID) AS last_action
ON customer_records.CustomerID = last_action.CustomerID
INNER JOIN
customer_actions
ON customer_actions.CustomerID = last_action.CustomerID
AND customer_actions.ActionDate = last_action.ActionDate
WHERE
customer_actions.ActionType = 'Whatever'

I'm not sure if I understand the requirements but it looks to me like a JOIN would be enough for that.
SELECT cr.CustomerID, cr.CustomerName, ...
FROM customer_records cr
INNER JOIN customer_actions ca ON ca.CustomerID = cr.CustomerID
WHERE `ActionType` = 'whatever'
ORDER BY
ca.EntryID
Note that 20.000 records should not pose a performance problem

Please note that I've adapted Lieven's answer (I made a separate post as this was too long for a comment). Any credit for the solution itself goes to him, I'm just trying to show you some key points for improving performance.
If speed is a concern then the following should give you some suggestions for improving it:
select top 100 -- Change as required
cr.CustomerID ,
cr.CustomerName,
cr.MoreDetail1,
cr.Etc
from customer_records cr
inner join customer_actions ca
on ca.CustomerID = cr.CustomerID
where ca.ActionType = 'x'
order by cr.CustomerID
A few notes:
In some cases I find left outer joins to be faster then inner joins - It would be worth measuring performance for both for this query
Avoid returning * wherever possible
You don't have to reference 'cr.x' in the initial select but it's a good habit to get into for when you start working on large queries that can have multiple joins in them (this will make a lot of sense once you start doing this
When using joins always join on a primary key

Maybe I'm missing something but what's wrong with a simple join and a where clause?
Select ActionType, ActionTime, Note
FROM Customer_Records CR
INNER JOIN customer_Actions CA
ON CR.CustomerID = CA.CustomerID
Where ActionType = 'added case info'

MySQL JOIN tables with WHERE clause

I need to gather posts from two mysql tables that have different columns and provide a WHERE clause to each set of tables. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but i'll admit, I am no mysql veteran. I'm a beginner at best... Any suggestions?

You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....

Two suggestions for you since a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It can help your life go easier, and anyone AFTER you to better know which columns come from what table, especially if same column name in different tables. Prevents ambiguity in the query. Your left join, I think, from the sample, may be ambigous, but confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry could have the "Team_ID" that such posting was from, in addition to its OWN "ID" for the blabbing table's unique key indicator.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list as expected, such as
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, not logical "||" or, but SQL "OR"
EDIT -- per your comment
This could be done in a variety of ways, such as dynamic SQL building and executing, calling multiple times, once for each ID and merging the results, or additionally, by doing a join to yet another temp table that gets cleaned out say... daily.
If you have another table such as "TeamJoins", and it has say... 3 columns: a date, a sessionid and team_id, you could daily purge anything from a day old of queries, and/or keep clearing each time a new query by the same session ID (as it appears coming from PHP). Have two indexes, one on the date (to simplify any daily purging), and second on (sessionID, team_id) for the join.
Then, loop through to do inserts into the "TempJoins" table with the simple elements identified.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TB.Team_ID IN NOT NULL
OR B.mem_id ... rest of query

What I ended up doing is;
I added an extra column to my blabbing table called team_id and set it to null as well as another field in my team_blabbing table called mem_id
Then I changed the insert script to also insert a value to the mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thought on what I did. Try not to be too harsh though:) Thanks again for all the info.

In what order are MySQL JOINs evaluated?

I have the following query:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123;
I have the following questions:
Is the USING syntax synonymous with ON syntax?
Are these joins evaluated left to right? In other words, does this query say: x = companies JOIN users; y = x JOIN jobs; z = y JOIN useraccounts;
If the answer to question 2 is yes, is it safe to assume that the companies table has companyid, userid and jobid columns?
I don't understand how the WHERE clause can be used to pick rows on the companies table when it is referring to the alias "j"
Any help would be appreciated!

USING (fieldname) is a shorthand way of saying ON table1.fieldname = table2.fieldname.
SQL doesn't define the 'order' in which JOINS are done because it is not the nature of the language. Obviously an order has to be specified in the statement, but an INNER JOIN can be considered commutative: you can list them in any order and you will get the same results.
That said, when constructing a SELECT ... JOIN, particularly one that includes LEFT JOINs, I've found it makes sense to regard the third JOIN as joining the new table to the results of the first JOIN, the fourth JOIN as joining the results of the second JOIN, and so on.
More rarely, the specified order can influence the behaviour of the query optimizer, due to the way it influences the heuristics.
No. The way the query is assembled, it requires that companies and users both have a companyid, jobs has a userid and a jobid and useraccounts has a userid. However, only one of companies or user needs a userid for the JOIN to work.
The WHERE clause is filtering the whole result -- i.e. all JOINed columns -- using a column provided by the jobs table.

I can't answer the bit about the USING syntax. That's weird. I've never seen it before, having always used an ON clause instead.
But what I can tell you is that the order of JOIN operations is determined dynamically by the query optimizer when it constructs its query plan, based on a system of optimization heuristics, some of which are:
Is the JOIN performed on a primary key field? If so, this gets high priority in the query plan.
Is the JOIN performed on a foreign key field? This also gets high priority.
Does an index exist on the joined field? If so, bump the priority.
Is a JOIN operation performed on a field in WHERE clause? Can the WHERE clause expression be evaluated by examining the index (rather than by performing a table scan)? This is a major optimization opportunity, so it gets a major priority bump.
What is the cardinality of the joined column? Columns with high cardinality give the optimizer more opportunities to discriminate against false matches (those that don't satisfy the WHERE clause or the ON clause), so high-cardinality joins are usually processed before low-cardinality joins.
How many actual rows are in the joined table? Joining against a table with only 100 values is going to create less of a data explosion than joining against a table with ten million rows.
Anyhow... the point is... there are a LOT of variables that go into the query execution plan. If you want to see how MySQL optimizes its queries, use the EXPLAIN syntax.
And here's a good article to read:
http://www.informit.com/articles/article.aspx?p=377652
ON EDIT:
To answer your 4th question: You aren't querying the "companies" table. You're querying the joined cross-product of ALL four tables in your FROM and USING clauses.
The "j.jobid" alias is just the fully-qualified name of one of the columns in that joined collection of tables.

In MySQL, it's often interesting to ask the query optimizer what it plans to do, with:
EXPLAIN SELECT [...]
See "7.2.1 Optimizing Queries with EXPLAIN"

Here is a more detailed answer on JOIN precedence. In your case, the JOINs are all commutative. Let's try one where they aren't.
Build schema:
CREATE TABLE users (
name text
);
CREATE TABLE orders (
order_id text,
user_name text
);
CREATE TABLE shipments (
order_id text,
fulfiller text
);
Add data:
INSERT INTO users VALUES ('Bob'), ('Mary');
INSERT INTO orders VALUES ('order1', 'Bob');
INSERT INTO shipments VALUES ('order1', 'Fulfilling Mary');
Run query:
SELECT *
FROM users
LEFT OUTER JOIN orders
ON orders.user_name = users.name
JOIN shipments
ON shipments.order_id = orders.order_id
Result:
Only the Bob row is returned
Analysis:
In this query the LEFT OUTER JOIN was evaluated first and the JOIN was evaluated on the composite result of the LEFT OUTER JOIN.
Second query:
SELECT *
FROM users
LEFT OUTER JOIN (
orders
JOIN shipments
ON shipments.order_id = orders.order_id)
ON orders.user_name = users.name
Result:
One row for Bob (with the fulfillment data) and one row for Mary with NULLs for fulfillment data.
Analysis:
The parenthesis changed the evaluation order.
Further MySQL documentation is at https://dev.mysql.com/doc/refman/5.5/en/nested-join-optimization.html

SEE http://dev.mysql.com/doc/refman/5.0/en/join.html
AND start reading here:
Join Processing Changes in MySQL 5.0.12
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The goal was to align the syntax and semantics of MySQL with respect to NATURAL JOIN and JOIN ... USING according to SQL:2003. However, these changes in join processing can result in different output columns for some joins. Also, some queries that appeared to work correctly in older versions must be rewritten to comply with the standard.
These changes have five main aspects:
The way that MySQL determines the result columns of NATURAL or USING join operations (and thus the result of the entire FROM clause).
Expansion of SELECT * and SELECT tbl_name.* into a list of selected columns.
Resolution of column names in NATURAL or USING joins.
Transformation of NATURAL or USING joins into JOIN ... ON.
Resolution of column names in the ON condition of a JOIN ... ON.

Im not sure about the ON vs USING part (though this website says they are the same)
As for the ordering question, its entirely implementation (and probably query) specific. MYSQL most likely picks an order when compiling the request. If you do want to enforce a particular order you would have to 'nest' your queries:
SELECT c.*
FROM companies AS c
JOIN (SELECT * FROM users AS u
JOIN (SELECT * FROM jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123)
)
as for part 4: the where clause limits what rows from the jobs table are eligible to be JOINed on. So if there are rows which would join due to the matching userids but don't have the correct jobid then they will be omitted.

1) Using is not exactly the same as on, but it is short hand where both tables have a column with the same name you are joining on... see: http://www.java2s.com/Tutorial/MySQL/0100__Table-Join/ThekeywordUSINGcanbeusedasareplacementfortheONkeywordduringthetableJoins.htm
It is more difficult to read in my opinion, so I'd go spelling out the joins.
3) It is not clear from this query, but I would guess it does not.
2) Assuming you are joining through the other tables (not all directly on companyies) the order in this query does matter... see comparisons below:
Origional:
SELECT c.*
FROM companies AS c
JOIN users AS u USING(companyid)
JOIN jobs AS j USING(userid)
JOIN useraccounts AS us USING(userid)
WHERE j.jobid = 123
What I think it is likely suggesting:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = u.userid
JOIN useraccounts AS us on us.userid = u.userid
WHERE j.jobid = 123
You could switch you lines joining jobs & usersaccounts here.
What it would look like if everything joined on company:
SELECT c.*
FROM companies AS c
JOIN users AS u on u.companyid = c.companyid
JOIN jobs AS j on j.userid = c.userid
JOIN useraccounts AS us on us.userid = c.userid
WHERE j.jobid = 123
This doesn't really make logical sense... unless each user has their own company.
4.) The magic of sql is that you can only show certain columns but all of them are their for sorting and filtering...
if you returned
SELECT c.*, j.jobid....
you could clearly see what it was filtering on, but the database server doesn't care if you output a row or not for filtering.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008