MySQL Slow Query Optimisation - mysql

I have a database of ~800k records showing ticket purchases. All tables are InnoDB. The slow query is:
SELECT e.id AS id, e.name AS name, e.url AS url, p.action AS action, gk.key AS `key`
FROM event AS e
LEFT JOIN participation AS p ON p.event=e.id
LEFT JOIN goldenkey AS gk ON gk.issuedto=p.person
WHERE p.person='139160'
OR p.person IS NULL;
This query is coming from PDO, hence the quoting of p.person. All columns used in JOINs and WHERE are indexed. p.event is foreign-key constrained to e.id, and gk.issuedto and p.person are foreign-key constrained to person.id, in a table not shown here. All of these are INTs. The table e is small, only 10 rows. Table p has ~500,000 rows and gk is empty at this time.
This query runs on a person's details page. We want to get a list of all events, then if there is a participation row their participation and if there is a golden key row then their golden key.
Slow query log gives:
Query_time: 12.391201 Lock_time: 0.000093 Rows_sent: 2 Rows_examined: 466104
EXPLAIN SELECT gives:
+----+-------------+-------+------+---------------+----------+---------+----------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+----------------+------+-------------+
| 1 | SIMPLE | e | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | SIMPLE | p | ref | event | event | 4 | msadb.e.id | 727 | Using where |
| 1 | SIMPLE | gk | ref | issuedto | issuedto | 4 | msadb.p.person | 1 | |
+----+-------------+-------+------+---------------+----------+---------+----------------+------+-------------+
This query runs at 7~12 seconds on first run for a given p.person then <0.05s in future. Dropping the OR p.person IS NULL does not improve query time. This query slowed right down when the size of p was increased from ~20k to ~500k (import of old data).
Does anyone have any suggestions on how to improve performance? Remember, the overall aim is to retrieve a list of all events, then their participation if there is a participation row and their golden key if there is a golden key row. If multiple queries would be more efficient, I can do that.

If you can do away with p.person IS NULL try the following and see if it helps:
SELECT e.id AS id, e.name AS name, e.url AS url, p.action AS action, gk.key AS `key`
FROM event AS e
LEFT JOIN participation AS p ON (p.event=e.id AND p.person='139160')
LEFT JOIN goldenkey AS gk ON gk.issuedto=p.person
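To see what moving the person filter from WHERE into the ON clause changes, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample rows are invented for illustration; only the table and column names come from the question. With the filter in WHERE, events that only other people attended drop out; with the filter in ON, every event is kept.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE participation (event INTEGER, person INTEGER, action TEXT);
    INSERT INTO event VALUES (1, 'Gala'), (2, 'Fair'), (3, 'Expo');
    INSERT INTO participation VALUES
        (1, 139160, 'bought'),  -- the person being looked up
        (2, 555, 'bought'),     -- somebody else
        (3, 555, 'bought');
""")

# Filter in WHERE: events 2 and 3 join to another person's row, which is
# neither person 139160 nor NULL, so both events are filtered out.
where_rows = conn.execute("""
    SELECT e.id, p.action
    FROM event AS e
    LEFT JOIN participation AS p ON p.event = e.id
    WHERE p.person = ? OR p.person IS NULL
    ORDER BY e.id
""", (139160,)).fetchall()

# Filter in ON: only this person's rows are joined, and events without a
# match are kept with NULL in the participation columns.
on_rows = conn.execute("""
    SELECT e.id, p.action
    FROM event AS e
    LEFT JOIN participation AS p ON p.event = e.id AND p.person = ?
    ORDER BY e.id
""", (139160,)).fetchall()

print(where_rows)  # [(1, 'bought')]
print(on_rows)     # [(1, 'bought'), (2, None), (3, None)]
```

Since the stated aim is a list of all events, the ON form looks closer to what is wanted. Whether it is also faster on the real tables is something only EXPLAIN on the MySQL server can confirm.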

For grins... Add the keyword "STRAIGHT_JOIN" to your select...
SELECT STRAIGHT_JOIN ... rest of query...

I'm not sure how many indexes you have or what your table schema looks like, but try to avoid NULL as the default value for columns; it can slow your queries down.

If you are doing a lookup for one particular person, which I'm guessing you are since you have the person id filter in there, I would try to reverse the query: first search the participation rows for that person, then UNION that with an additional query which gives you all the events.
SELECT
e.id AS id, e.name AS name, e.url AS url,
p.action AS action, gk.key AS `key`
FROM participation AS p
JOIN event AS e ON p.event=e.id
LEFT JOIN goldenkey AS gk ON gk.issuedto=p.person
WHERE p.person=139160
UNION
SELECT
e.id AS id, e.name AS name, e.url AS url,
NULL, NULL
FROM event AS e
This would obviously mean you get a duplicate event whenever the first query matches, but that's easily solved by wrapping a SELECT around the whole thing, or maybe by selecting the e.id into a variable in the first query and using that variable in the second query (not sure if this will work; I haven't tested it, but I can't see why not).

Related

Exclude based on sub-table's value

Consider the table Audit, and AuditStatus.
Where auditId in AuditStatus is a foreign key, mapping the pk of table Audit.
table Audit
id | auditName |
1 | test |
2 | fooTest |
3 | barTest |
table AuditStatus
id | auditId | status |
11 | 1 | started |
12 | 1 | completed |
13 | 2 | started |
How can I select only the entries of table Audit that do not have an AuditStatus row with status 'completed'?
The result in this case would be:
2 | fooTest |
3 | barTest |
I have updated the question and the result example to make it clearer. The relation Audit -> AuditStatus is one-to-many, and I want to exclude the Audits which have a reference to an AuditStatus with status 'completed'.
You should post your attempted query into your question, not as comment. Anyway, your query is actually correct but your condition is incorrect. Let's inspect your query:
SELECT *
FROM Audit a
WHERE NOT EXISTS (
SELECT s.auditId
FROM AuditStatus s
WHERE a.id = s.auditId AND s.status != 'completed'
);
You're supposed to find the audits whose status is not completed, which is what the subquery does, but the problem is that NOT EXISTS negates the result you get from the subquery.
This is what your subquery matches:
id | auditId | status |
11 | 1 | started |
13 | 2 | started |
When NOT EXISTS then negates the auditId values being returned, you get this result instead:
id | auditName |
3 | barTest |
Which is correct according to the condition: auditId=3 wasn't returned by the subquery. The change you need is actually very simple. Make the subquery match status = 'completed'; NOT EXISTS will then return every Audit.id that has no matching row in the correlated subquery. Therefore:
SELECT *
FROM Audit a
WHERE NOT EXISTS (
SELECT s.auditId
FROM AuditStatus s
WHERE a.id = s.auditId AND s.status = 'completed'
);
And that's it; you should be getting the result you're looking for.
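The difference between the two conditions can be checked against the sample data from the question. This is a small sketch using Python's sqlite3 module in place of MySQL (an assumption made so that the example runs anywhere); the query text is the same apart from the comparison operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Audit (id INTEGER PRIMARY KEY, auditName TEXT);
    CREATE TABLE AuditStatus (id INTEGER PRIMARY KEY, auditId INTEGER, status TEXT);
    INSERT INTO Audit VALUES (1, 'test'), (2, 'fooTest'), (3, 'barTest');
    INSERT INTO AuditStatus VALUES
        (11, 1, 'started'), (12, 1, 'completed'), (13, 2, 'started');
""")

# The operator is substituted from a fixed constant, never from user input.
QUERY = """
    SELECT a.id, a.auditName
    FROM Audit a
    WHERE NOT EXISTS (
        SELECT 1
        FROM AuditStatus s
        WHERE s.auditId = a.id AND s.status {op} 'completed'
    )
    ORDER BY a.id
"""

# "!=": keeps only audits that have no non-completed status row at all.
wrong = conn.execute(QUERY.format(op="!=")).fetchall()
# "=": keeps audits that have no completed status row.
right = conn.execute(QUERY.format(op="=")).fetchall()

print(wrong)  # [(3, 'barTest')]
print(right)  # [(2, 'fooTest'), (3, 'barTest')]
```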
Maybe use a LEFT JOIN like the one below, which joins AuditStatus on the foreign key as well as on the status constraint, and then keeps only the rows where nothing matched:
SELECT A.*
FROM Audit A
LEFT JOIN AuditStatus ATS
ON A.id = ATS.auditId AND ATS.status = 'completed'
WHERE ATS.auditId IS NULL
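A quick way to check the anti-join form is to run it against the question's sample rows. The sketch below assumes Python's sqlite3 module as a stand-in for MySQL and uses the alias ATS for AuditStatus throughout, since AS is a reserved word and cannot serve as an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Audit (id INTEGER PRIMARY KEY, auditName TEXT);
    CREATE TABLE AuditStatus (id INTEGER PRIMARY KEY, auditId INTEGER, status TEXT);
    INSERT INTO Audit VALUES (1, 'test'), (2, 'fooTest'), (3, 'barTest');
    INSERT INTO AuditStatus VALUES
        (11, 1, 'started'), (12, 1, 'completed'), (13, 2, 'started');
""")

# Join only the 'completed' status rows; audits without one get NULLs on the
# AuditStatus side, and the WHERE clause keeps exactly those audits.
anti_join = conn.execute("""
    SELECT A.id, A.auditName
    FROM Audit A
    LEFT JOIN AuditStatus ATS
           ON A.id = ATS.auditId AND ATS.status = 'completed'
    WHERE ATS.auditId IS NULL
    ORDER BY A.id
""").fetchall()

print(anti_join)  # [(2, 'fooTest'), (3, 'barTest')]
```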

Rows column in Query Plan confusing

I have a MySQL query
SELECT TE.company_id,
SUM(TE.debit- TE.credit) As summation
FROM Transactions T JOIN Transaction_E TE2
ON (T.parent_id = TE2.transaction_id)
JOIN Transaction_E TE
ON (TE.transaction_id = T.id AND TE.company_id IS NOT NULL)
JOIN Accounts A
ON (TE2.account_id=A.id AND A.deactivated_timestamp=0)
WHERE (TE.company_id IN (1,2))
AND A.user_id=2341 GROUP BY TE.company_id;
When I explain the query, the plan for it is like (in summary):
| Select type | table | type | rows |
-------------------------------------
| SIMPLE | A | ref | 2 |
| SIMPLE | TE2 | ref | 17 |
| SIMPLE | T | ref | 1 |
| SIMPLE | TE | ref | 1 |
But if I do a count(*) on the same query (instead of SUM(..) ), then it shows that there are ~40k rows for a particular company_id. What I don't understand is why the query plan shows so few rows being scanned while there is at least 40k rows being processed. What does the rows column in the query plan represent? Does it not represent the number of rows that get processed in that table? In that case it should be at most 2*17*1*1 = 34 rows?
The rows column is only the optimizer's estimate of how many rows it expects to examine in that table; for a joined table it is the estimate per combination of rows from the preceding tables, not a count of the rows actually processed.
It is a tool for judging how the optimizer 'sees' your query, so that you can help it along when performance is poor or can be improved.
The plan may also be built from an outdated snapshot of the index statistics, so the numbers should not be taken at face value, especially where cardinality is concerned.
Well, first let's get rid of the computational bug:
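SELECT TE.company_id, SUM(TE.summation) AS summation
FROM
( -- aggregate Transaction_E once per transaction and company
SELECT transaction_id, company_id,
SUM(debit - credit) AS summation
FROM Transaction_E
WHERE company_id IN (1,2)
GROUP BY transaction_id, company_id
) TE
JOIN Transactions T ON TE.transaction_id = T.id
JOIN Transaction_E TE2 ON T.parent_id = TE2.transaction_id
JOIN Accounts A ON TE2.account_id = A.id
AND A.deactivated_timestamp = 0
WHERE A.user_id = 2341
GROUP BY TE.company_id;
Your query is probably summing up the same company multiple times before doing the GROUP BY. My variant aggregates Transaction_E before the joins, so each entry is summed once at that stage; note that if one parent transaction matches several TE2 rows, the outer SUM can still be inflated by the join.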
I got rid of TE.company_id IS NOT NULL because it was redundant.
See what the EXPLAIN says about this, then let's discuss your question about EXPLAIN further.
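The inflation mentioned above is easy to reproduce on a toy data set. The sketch below is an illustration only: it uses Python's sqlite3 module instead of MySQL, and the rows are invented so that one parent transaction has two entries on the user's accounts. It compares the original join with an EXISTS variant that counts each entry once. Which total is the intended one depends on the meaning of the data, which the question does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Accounts (id INTEGER PRIMARY KEY, user_id INTEGER,
                           deactivated_timestamp INTEGER);
    CREATE TABLE Transactions (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE Transaction_E (transaction_id INTEGER, account_id INTEGER,
                                company_id INTEGER, debit INTEGER, credit INTEGER);
    INSERT INTO Accounts VALUES (1, 2341, 0), (2, 2341, 0);
    INSERT INTO Transactions VALUES (5, NULL), (10, 5);
    INSERT INTO Transaction_E VALUES
        (5, 1, NULL, 50, 0),    -- parent entry on account 1
        (5, 2, NULL, 0, 50),    -- parent entry on account 2
        (10, NULL, 1, 100, 0);  -- the single company entry worth 100
""")

# Original shape: the company entry is repeated once per matching TE2 row.
joined = conn.execute("""
    SELECT TE.company_id, SUM(TE.debit - TE.credit)
    FROM Transactions T
    JOIN Transaction_E TE2 ON T.parent_id = TE2.transaction_id
    JOIN Transaction_E TE ON TE.transaction_id = T.id AND TE.company_id IS NOT NULL
    JOIN Accounts A ON TE2.account_id = A.id AND A.deactivated_timestamp = 0
    WHERE TE.company_id IN (1, 2) AND A.user_id = 2341
    GROUP BY TE.company_id
""").fetchall()

# EXISTS only tests that a qualifying parent entry exists, so no rows multiply.
deduplicated = conn.execute("""
    SELECT TE.company_id, SUM(TE.debit - TE.credit)
    FROM Transaction_E TE
    JOIN Transactions T ON TE.transaction_id = T.id
    WHERE TE.company_id IN (1, 2)
      AND EXISTS (
          SELECT 1
          FROM Transaction_E TE2
          JOIN Accounts A ON TE2.account_id = A.id
          WHERE TE2.transaction_id = T.parent_id
            AND A.deactivated_timestamp = 0
            AND A.user_id = 2341
      )
    GROUP BY TE.company_id
""").fetchall()

print(joined)        # [(1, 200)]
print(deduplicated)  # [(1, 100)]
```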

MySQL subquery from same table

I have a database with table xxx_facileforms_forms, xxx_facileforms_records and xxx_facileforms_subrecords.
Column headers for xxx_facileforms_subrecords:
id | record | element | title | name | type | value
Filtering records with element = '101' works and the query returns the proper records, but when I add a subquery to filter on an additional element = '4871' from the same table, 0 records are returned.
SELECT
F.id AS form_id,
R.id AS record_id,
PV.value AS prim_val,
COUNT(PV.value) AS count
FROM
xxx_facileforms_forms AS F
INNER JOIN xxx_facileforms_records AS R ON F.id = R.form
INNER JOIN xxx_facileforms_subrecords AS PV ON R.id = PV.record AND PV.element = '101'
WHERE R.id IN (SELECT record FROM xxx_facileforms_records WHERE record = R.id AND element = '4871')
GROUP BY PV.value
Does this look right?
Thank You!
EDIT
Thank you for the support and ideas! Yes, I left a lot to guesswork. Sorry. Some input/output table data might help make it clearer.
_facileforms_form:
id | formname
---+---------
1 | myform
_facileforms_records:
id | form | submitted
----+------+--------------------
163 | 1 | 2014-06-12 14:18:00
164 | 1 | 2014-06-12 14:19:00
165 | 1 | 2014-06-12 14:20:00
_facileforms_subrecords:
id | record | element | title | name | type | value
-----+--------+---------+--------+-------------+--------
5821 | 163 | 101 | ticket | radio group | flight
5822 | 163 | 4871 | status | select list | canceled
5823 | 164 | 101 | ticket | radio group | flight
5824 | 165 | 101 | ticket | radio group | flight
5825 | 165 | 4871 | status | select list | canceled
Successful query result:
form_id | record_id | prim_val | count
1 | 163 | flight | 2
So I have to return the value data (and count those records) from the records where a _subrecords row with element 4871 is present (in this case 163 and 165).
And again Thank You!
No, it doesn't look quite right. There's a predicate "R.id IN (subquery)" but that subquery itself has a reference to R.id; it's a correlated subquery. Looks like something is doubled up there. (We're assuming here that id is a UNIQUE or PRIMARY key in each table.)
The subquery references an identifier element. The only other reference we see to that identifier is from the _subrecords table; we don't see any reference to that column in the _records table. If there's no element column in _records, then that's a reference to the element column in PV, and that predicate in the subquery will never be true at the same time as the PV.element='101' predicate.
Kudos for qualifying the column references with a table alias, that makes the query (and the EXPLAIN output) much easier to read; the reader doesn't need to go digging around in the table definitions to figure out which table does and doesn't contain which columns. But please take that pattern to the next step, and qualify all column references in the query, including column references in the subqueries.
Since the reference to element isn't qualified, we're left to guess whether the _records table contains a column named element.
If the goal is to return only the rows from R with element='4871', we could just do...
WHERE R.element='4871'
But, given that you've gone to the bother of using a subquery, I suspect that's not really what you want.
It's possible you're trying to return all rows from R for a _form, but only for the _form where there's at least one associated _record with element='4871'. We could get that result returned with either an IN (subquery) or an EXISTS (correlated subquery) predicate, or with a join pattern. I'd give examples of those query patterns; I could take some guesses at the specification, but I would only be guessing at what you actually want to return.
But I'm guessing that's not really what you want. I suspect that _records doesn't actually contain a column named element.
(The query is already restricting the rows returned from PV to those that have element='101'.)
This is a case where some example data and the example output would help explain the actual specification; and that would be a basis for developing the required SQL.
FOLLOWUP
I'm just guessing... maybe what you want is something pretty simple. Maybe you want to return rows that have an element value of either '101' or '4871'.
The IN comparison operator is a convenient way of expressing the OR condition, that a column be equal to a value in a list:
SELECT F.id AS form_id
, R.id AS record_id
, PV.value AS prim_val
, COUNT(PV.value) AS count
FROM xxx_facileforms_forms F
JOIN xxx_facileforms_records R
ON R.form = F.id
JOIN xxx_facileforms_subrecords PV
ON PV.record = R.id
AND PV.element IN ('101','4871')
GROUP BY PV.value
NOTE: This query (like the OP query) uses a non-standard MySQL extension to GROUP BY, which allows non-aggregate expressions (e.g. bare columns) to be returned in the SELECT list.
The values returned for the non-aggregate expressions (in this case, F.id and R.id) will be values from some row included in the "group". But because there can be multiple rows, and different values on those rows, it's not deterministic which of the values will be returned. (Other databases would reject this statement unless we wrapped those columns in an aggregate function, such as MIN() or MAX(); so does MySQL itself when the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from 5.7.5 on.)
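As a sketch of the aggregate-wrapping idea, the example below groups a cut-down, hypothetical subrecords table (only four of the original columns, three made-up rows) and picks the record id with MIN() and MAX() instead of a bare column. It runs on Python's sqlite3 module rather than MySQL, but the statement is plain standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified stand-in for xxx_facileforms_subrecords
    CREATE TABLE subrecords (id INTEGER PRIMARY KEY, record INTEGER,
                             element TEXT, value TEXT);
    INSERT INTO subrecords VALUES
        (5821, 163, '101', 'flight'),
        (5823, 164, '101', 'flight'),
        (5824, 165, '101', 'flight');
""")

# Every selected column is either in GROUP BY or wrapped in an aggregate,
# so the result no longer depends on which row the engine happens to pick.
rows = conn.execute("""
    SELECT MIN(record) AS first_record,
           MAX(record) AS last_record,
           value,
           COUNT(value) AS n
    FROM subrecords
    WHERE element = '101'
    GROUP BY value
""").fetchall()

print(rows)  # [(163, 165, 'flight', 3)]
```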
FOLLOWUP
I noticed that you added information about the question into an answer... this information would better be added to the question as an EDIT, since it's not an answer to the question. I took the liberty of copying that, and reformatting.
The example makes it much more clear what you are trying to accomplish.
I think the easiest approach to understand is an EXISTS predicate, which checks whether a row meeting some criteria "exists" or not and excludes rows where no such row exists. This uses a correlated subquery on the _subrecords table to check for the existence of a matching row:
SELECT f.id AS form_id
, r.id AS record_id
, pv.value AS prim_val
, COUNT(pv.value) AS count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r
ON r.form = f.id
JOIN xxx_facileforms_subrecords pv
ON pv.record = r.id
AND pv.element = '101'
-- only include rows where there's also a related 4871 subrecord
WHERE EXISTS ( SELECT 1
FROM xxx_facileforms_subrecords sx
WHERE sx.element = '4871'
AND sx.record = r.id
)
--
GROUP BY pv.value
(I'm thinking this is where OP was headed with the idea that a subquery was required.)
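The EXISTS pattern can be tried against the sample rows from the question's EDIT. This sketch assumes Python's sqlite3 module as a stand-in for MySQL, stores only the columns the query touches, uses the element value 4871 from the question, and wraps the ids in MIN() so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xxx_facileforms_forms (id INTEGER PRIMARY KEY, formname TEXT);
    CREATE TABLE xxx_facileforms_records (id INTEGER PRIMARY KEY, form INTEGER);
    CREATE TABLE xxx_facileforms_subrecords (id INTEGER PRIMARY KEY, record INTEGER,
                                             element TEXT, value TEXT);
    INSERT INTO xxx_facileforms_forms VALUES (1, 'myform');
    INSERT INTO xxx_facileforms_records VALUES (163, 1), (164, 1), (165, 1);
    INSERT INTO xxx_facileforms_subrecords VALUES
        (5821, 163, '101', 'flight'),
        (5822, 163, '4871', 'canceled'),
        (5823, 164, '101', 'flight'),
        (5824, 165, '101', 'flight'),
        (5825, 165, '4871', 'canceled');
""")

# Record 164 has no 4871 subrecord, so EXISTS filters it out.
result = conn.execute("""
    SELECT MIN(f.id) AS form_id,
           MIN(r.id) AS record_id,
           pv.value AS prim_val,
           COUNT(pv.value) AS n
    FROM xxx_facileforms_forms f
    JOIN xxx_facileforms_records r ON r.form = f.id
    JOIN xxx_facileforms_subrecords pv ON pv.record = r.id AND pv.element = '101'
    WHERE EXISTS (SELECT 1
                  FROM xxx_facileforms_subrecords sx
                  WHERE sx.element = '4871' AND sx.record = r.id)
    GROUP BY pv.value
""").fetchall()

print(result)  # [(1, 163, 'flight', 2)]
```

This matches the "successful query result" row given in the question.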
Given that there's a GROUP BY in the query, we could actually accomplish an equivalent result with a regular join operation to a second reference to the _subrecords table.
A join operation is often more efficient than an EXISTS predicate, although that depends on the MySQL version and the available indexes.
(Note that the existing GROUP BY clause collapses any "duplicate" rows that the JOIN might otherwise introduce. The COUNT() stays the same only as long as each record has at most one matching subrecord; with several matches per record the count is inflated.)
SELECT f.id AS form_id
, r.id AS record_id
, pv.value AS prim_val
, COUNT(pv.value) AS count
FROM xxx_facileforms_forms f
JOIN xxx_facileforms_records r
ON r.form = f.id
JOIN xxx_facileforms_subrecords pv
ON pv.record = r.id
AND pv.element = '101'
-- only include rows where there's also a related 4871 subrecord
JOIN xxx_facileforms_subrecords sx
ON sx.record = r.id
AND sx.element = '4871'
--
GROUP BY pv.value
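One caveat with the join form is worth checking: if a record can have more than one matching subrecord, the join repeats the pv row and COUNT() grows, whereas EXISTS does not. The sketch below shows this on a simplified, hypothetical single-table layout (a self-join on the record column, with an extra 4871 row added to record 163), again using Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified stand-in for xxx_facileforms_subrecords
    CREATE TABLE subrecords (id INTEGER PRIMARY KEY, record INTEGER,
                             element TEXT, value TEXT);
    INSERT INTO subrecords VALUES
        (5821, 163, '101', 'flight'),
        (5822, 163, '4871', 'canceled'),
        (5826, 163, '4871', 'canceled'),  -- extra row added for this demo
        (5823, 164, '101', 'flight'),
        (5824, 165, '101', 'flight'),
        (5825, 165, '4871', 'canceled');
""")

exists_count = conn.execute("""
    SELECT COUNT(pv.value)
    FROM subrecords pv
    WHERE pv.element = '101'
      AND EXISTS (SELECT 1 FROM subrecords sx
                  WHERE sx.record = pv.record AND sx.element = '4871')
""").fetchone()[0]

# The plain join counts record 163 twice, once per 4871 row.
join_count, distinct_count = conn.execute("""
    SELECT COUNT(pv.value), COUNT(DISTINCT pv.id)
    FROM subrecords pv
    JOIN subrecords sx ON sx.record = pv.record AND sx.element = '4871'
    WHERE pv.element = '101'
""").fetchone()

print(exists_count, join_count, distinct_count)  # 2 3 2
```

If duplicates of that kind are possible, COUNT(DISTINCT pv.id) is one way to keep the join form while counting each subrecord once.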

is there better way to do these mysql queries?

Currently I have three different tables and three different queries, which are very similar to each other and use almost the same joins. I have been trying to combine all three queries into one, without much success so far. I would be very happy if someone has a better solution or can point me in a direction. Thanks.
0.0013
SELECT `ilan_genel`.`id`, `ilan_genel`.`durum`, `ilan_genel`.`kategori`, `ilan_genel`.`tip`, `ilan_genel`.`ozellik`, `ilan_genel`.`m2`, `ilan_genel`.`fiyat`, `ilan_genel`.`baslik`, `ilan_genel`.`ilce`, `ilan_genel`.`mahalle`, `ilan_genel`.`parabirimi`, `kgsim_ilceler`.`isim` as ilce, (
SELECT ilanresimler.resimlink
FROM ilanresimler
WHERE ilanresimler.ilanid = ilan_genel.id LIMIT 1
) AS resim
FROM (`ilan_genel`)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
ORDER BY `id` desc
LIMIT 30
0.0006
SELECT `video`.`id`, `video`.`url`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`video`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `video`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
0.0005
SELECT `sanaltur`.`id`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`sanaltur`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `sanaltur`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
These are actually three very different queries. I don't think you will be able to usefully combine them. Also, they seem pretty fast to me.
However, if you want to try to optimize each individual query, you can use EXPLAIN SELECT to find out whether each query uses appropriate indexes or not.
For example:
EXPLAIN SELECT *
FROM A
WHERE foo NOT IN (1,4,5,6);
Might yield:
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 2 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
In this case, the query had no possible_keys and therefore used no (or NULL) key to do the query. It's the key column you'd be interested in.
More information here:
http://dev.mysql.com/doc/refman/5.5/en/explain.html
http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
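The before/after workflow (look at the plan, add an index, look again) can be sketched as follows. Note the assumptions: this uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, whose output format differs from MySQL's EXPLAIN, and the table A, column foo and index name idx_a_foo are hypothetical. On MySQL the same check would be reading the key column before and after creating the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, foo INTEGER)")
conn.executemany("INSERT INTO A (foo) VALUES (?)", [(i % 10,) for i in range(100)])

def plan(sql):
    """Return the human-readable plan text for a statement."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM A WHERE foo = 4"

before = plan(query)  # full table scan, no index available yet
conn.execute("CREATE INDEX idx_a_foo ON A (foo)")
after = plan(query)   # the plan now names idx_a_foo

print(before)
print(after)
```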

How to filter duplicates within row using Distinct/group by with JOINS

For simplicity, I will give a quick example of what i am trying to achieve:
Table 1 - Members
ID | Name
--------------------
1 | John
2 | Mike
3 | Sam
Table 2 - Member_Selections
ID | planID
--------------------
1 | 1
1 | 2
1 | 1
2 | 2
2 | 3
3 | 2
3 | 1
Table 3 - Selection_Details
planID | Cost
--------------------
1 | 5
2 | 10
3 | 12
When I run my query, I want to return the sum of all member selections grouped by member. The issue I face, however (see the Table 2 data), is that some members may have duplicate information in the system by mistake. While we do our best to filter this data up front, sometimes it slips through the cracks, so when I make the necessary calls to the system to pull information, I also want to filter this data.
the results SHOULD show:
Results Table
ID | Name | Total_Cost
-----------------------------
1 | John | 15
2 | Mike | 22
3 | Sam | 15
but instead John shows as $20 because he has plan ID #1 inserted twice by mistake.
My query is currently:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
Adding DISTINCT s.planID filters the results incorrectly as it will only show a single PlanID 1 sold (even though members 1 and 3 bought it).
Any help is appreciated.
EDIT
There is also another table I forgot to mention which is the agent table (the agent who sold the plans to members).
the final group by statement groups ALL items sold by the agent ID (which turns the final results into a single row).
Perhaps the simplest solution is to put a unique composite key on the member_selections table:
alter table member_selections add unique key ms_key (ID, planID);
which would prevent any record from being added when its ID/planID combination already exists elsewhere in the table. That would allow only a single (1,1) row.
comment followup:
Just saw your comment about the 'alter ignore...'. That'd work fine, but you'd still be left with the bad duplicates in the table. I'd suggest doing the unique key, then manually cleaning up the table. The query I put in the comments should find all the duplicates for you, which you can then weed out by hand. Once the table's clean, there'll be no need for the duplicate-handling version of the query.
Use UNIQUE keys to prevent accidental duplicate entries. This will eliminate the problem at the source, instead of when it starts to show symptoms. It also makes later queries easier, because you can count on having a consistent database.
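A small sketch of what the unique key buys you: once it exists, the database itself refuses the second (1,1) row. The example uses Python's sqlite3 module with a unique index in place of MySQL's ALTER TABLE ... ADD UNIQUE KEY (an assumption made so that it runs standalone); the key name ms_key and the table come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member_selections (ID INTEGER, planID INTEGER)")
# SQLite spelling of the composite unique key from the answer.
conn.execute("CREATE UNIQUE INDEX ms_key ON member_selections (ID, planID)")

conn.execute("INSERT INTO member_selections VALUES (1, 1)")
conn.execute("INSERT INTO member_selections VALUES (1, 2)")

try:
    # Same member, same plan: violates the unique key.
    conn.execute("INSERT INTO member_selections VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute(
    "SELECT ID, planID FROM member_selections ORDER BY planID"
).fetchall()

print(duplicate_rejected)  # True
print(rows)                # [(1, 1), (1, 2)]
```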
What about:
SELECT
sq.ID, sq.name, SUM(sq.premium) AS total_cost
FROM
(
SELECT
m.id, m.name, g.premium
FROM members m
INNER JOIN
(select distinct ID, PlanID from member_selections) s
USING(ID)
INNER JOIN selection_details g USING(planid)
) sq group by sq.agent
By the way, is there a reason you don't have a primary key on member_selections that will prevent these duplicates from happening in the first place?
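To check the de-duplicating derived table against the example data from the question, here is a minimal sketch. It assumes Python's sqlite3 module as a stand-in for MySQL, uses the Cost column from the example tables instead of premium, and groups by member rather than agent, since the agent table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE member_selections (ID INTEGER, planID INTEGER);
    CREATE TABLE selection_details (planID INTEGER PRIMARY KEY, Cost INTEGER);
    INSERT INTO members VALUES (1, 'John'), (2, 'Mike'), (3, 'Sam');
    INSERT INTO member_selections VALUES
        (1, 1), (1, 2), (1, 1),  -- John has plan 1 twice by mistake
        (2, 2), (2, 3),
        (3, 2), (3, 1);
    INSERT INTO selection_details VALUES (1, 5), (2, 10), (3, 12);
""")

# {selections} is filled from the two fixed strings below, not from user input.
QUERY = """
    SELECT m.ID, m.Name, SUM(g.Cost) AS total_cost
    FROM members m
    JOIN {selections} s ON s.ID = m.ID
    JOIN selection_details g ON g.planID = s.planID
    GROUP BY m.ID, m.Name
    ORDER BY m.ID
"""

naive = conn.execute(QUERY.format(selections="member_selections")).fetchall()
deduplicated = conn.execute(QUERY.format(
    selections="(SELECT DISTINCT ID, planID FROM member_selections)"
)).fetchall()

print(naive)         # [(1, 'John', 20), (2, 'Mike', 22), (3, 'Sam', 15)]
print(deduplicated)  # [(1, 'John', 15), (2, 'Mike', 22), (3, 'Sam', 15)]
```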
You can add a GROUP BY clause to the inner query that groups by the member, the plan and the cost, so that only unique rows come back. Grouping by planID rather than by cost alone matters: two different plans with the same cost must both be counted. (I also changed 'premium' to 'cost' to match your example tables, and dropped the agent part.)
SELECT
sq.ID,
sq.name,
SUM(sq.Cost) AS total_cost
FROM
(
SELECT
m.id,
m.name,
g.Cost
FROM
members m
INNER JOIN member_selections s USING(ID)
INNER JOIN selection_details g USING(planid)
GROUP BY
m.ID,
m.NAME,
s.planID,
g.Cost
) sq
group by
sq.ID,
sq.NAME