I am doing a course on Relational Databases, MySQL to be more specific. We need to create some SELECT queries for a project. The project is related to music: it has tables to represent musicians (musician), bands (band), and a musician's ability to perform a certain task, like singing or playing the guitar (act).
Table musician contains:
id
name
stagename
startyear
Table band contains:
code
name
type ("band" or "solo")
startyear
And finally, table act contains:
band (foreign key to code of "band" table)
musician (foreign key to id of "musician" table)
hability (guitarist, singer, and so on; a foreign key to another table)
earnings
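For reference, here is a rough sketch of the schema as described (the column types are my own assumptions, since the exercise only gives the column names):
CREATE TABLE musician (
  id        INT PRIMARY KEY,
  name      VARCHAR(100),
  stagename VARCHAR(100),
  startyear INT
);
CREATE TABLE band (
  code      INT PRIMARY KEY,
  name      VARCHAR(100),
  type      VARCHAR(10),    -- "band" or "solo"
  startyear INT
);
CREATE TABLE act (
  band      INT,            -- foreign key to band.code
  musician  INT,            -- foreign key to musician.id
  hability  INT,            -- foreign key to an abilities table (guitarist, singer, ...)
  earnings  DECIMAL(10,2),
  FOREIGN KEY (band) REFERENCES band(code),
  FOREIGN KEY (musician) REFERENCES musician(id)
);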
I have doubts about two exercises. The first one asks us to select the id and stagename of the musicians who participate in the most acts in bands whose type is solo.
My solution for the first one is this:
SELECT ma.id, ma.stagename
FROM musician ma, act d, band ba
WHERE ma.id = d.musician
AND ba.code = d.band
AND ba.type = "solo"
GROUP BY ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT COUNT(d2.musician) AS count
FROM act d2, band ba2
WHERE d2.band = ba2.code
AND ba2.type = "solo"
GROUP BY d2.musician
ORDER BY count DESC
LIMIT 1);
The second one is very similar to the first. We need to select, for every startyear, the id and stagename of the musician who performs the most acts, with the corresponding number of acts and the maximum and minimum of their cachet (earnings). This is my solution:
SELECT ma.startyear, ma.id, ma.stagename, COUNT(ma.id) AS NumActs, MIN(d.earnings), MAX(d.earnings)
FROM musician ma, act d, band ba
WHERE ma.id = d.musician
AND ba.code = d.band
AND ba.type = "solo"
GROUP BY ma.startyear, ma.id, ma.stagename
HAVING COUNT(ma.id) = (SELECT COUNT(d2.musician) AS count
FROM act d2, band ba2
WHERE d2.band = ba2.code
AND ba2.type = "solo"
GROUP BY d2.musician
ORDER BY count DESC
LIMIT 1);
The results with my dummy data are perfect, but my teacher told us we should avoid using the LIMIT option. However, that's the only way we know of to get the highest number, at least with what we have learned so far.
I've seen a lot of subqueries after the FROM statement used to solve this kind of problem; however, for this project we can't use subqueries inside FROM. Is this really possible without LIMIT?
Thanks in advance.
It is possible, but much worse than with a subquery in FROM or with LIMIT, so I'd never use it in real life :)
Well, long story short, you can do something like this:
SELECT
    m.id
  , m.stagename
FROM
    musician m
    INNER JOIN act a ON (
        a.musician = m.id
    )
    INNER JOIN band b ON (
        b.code = a.band
        AND b.type = 'solo'
    )
GROUP BY
    m.id
  , m.stagename
HAVING
    NOT EXISTS (
        SELECT
            *
        FROM
            act a2
            INNER JOIN band b2 ON (
                b2.code = a2.band
                AND b2.type = 'solo'
            )
        WHERE
            a2.musician != a.musician
        GROUP BY
            a2.musician
        HAVING
            COUNT(a2.musician) > COUNT(a.musician)
    )
;
I think you can understand the idea from the query itself as it's pretty straightforward. However, let me know if you need an explanation.
It is possible that your restriction was slightly different and that you were only not allowed to use a subquery in the main FROM clause.
P.S. I also use the INNER JOIN ... ON syntax, as it makes it easier to see which conditions are table join conditions and which are WHERE conditions.
P.P.S. There might be mistakes in the query, as I do not have your data structure and so cannot execute the query and check. I only checked that the idea works with my test table.
EDIT I just re-read the question; my initial reading missed that inline views are disallowed.
We can avoid the ORDER BY ... DESC LIMIT 1 construct by making the subquery into an inline view (or, a "derived table" in the MySQL parlance), and using a MAX() aggregate.
As a trivial demonstration, this query:
SELECT b.foo
FROM bar b
ORDER
BY b.foo DESC
LIMIT 1
can be emulated with this query:
SELECT MAX(c.foo) AS foo
FROM (
SELECT b.foo
FROM bar b
) c
An example re-write of the first query in the question
SELECT ma.id
, ma.stagename
FROM musician ma
JOIN act d
ON d.musician = ma.id
JOIN band ba
ON ba.code = d.band
WHERE ba.type = 'solo'
GROUP
BY ma.id
, ma.stagename
HAVING COUNT(ma.id)
= ( SELECT MAX(c.count)
FROM (
SELECT COUNT(d2.musician) AS count
FROM act d2
JOIN band ba2
ON ba2.code = d2.band
WHERE ba2.type = 'solo'
GROUP
BY d2.musician
) c
)
NOTE: this is a demonstration of a rewrite of the query in the question; it makes no guarantee that this query (or the query in the question) returns a result that satisfies any particular specification. And the specification given in the question is not at all clear.
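For completeness, the second query from the question could be given the same mechanical treatment. Again, this is purely a demonstration of replacing ORDER BY ... LIMIT 1 with a MAX() over a derived table, not a claim that it satisfies the intended specification; note that ma.startyear is used in the GROUP BY, matching the column name given in the schema:
SELECT ma.startyear
     , ma.id
     , ma.stagename
     , COUNT(ma.id) AS NumActs
     , MIN(d.earnings)
     , MAX(d.earnings)
  FROM musician ma
  JOIN act d
    ON d.musician = ma.id
  JOIN band ba
    ON ba.code = d.band
 WHERE ba.type = 'solo'
 GROUP
    BY ma.startyear
     , ma.id
     , ma.stagename
HAVING COUNT(ma.id)
     = ( SELECT MAX(c.count)
           FROM (
                  SELECT COUNT(d2.musician) AS count
                    FROM act d2
                    JOIN band ba2
                      ON ba2.code = d2.band
                   WHERE ba2.type = 'solo'
                   GROUP
                      BY d2.musician
                ) c
       )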
Related
When I am running a query on a MySQL database, it takes around 3 sec. When we execute performance testing with 50 concurrent users, the same query takes 120 sec.
The query joins multiple tables with an order by clause and a limit condition.
We are using an RDS instance (16 GB memory, 4 vCPU).
Can anyone suggest how to improve the performance in this case?
Query:
SELECT
person0_.person_id AS person_i1_131_,
person0_.uuid AS uuid2_131_,
person0_.gender AS gender3_131_,
CASE
WHEN
EXISTS( SELECT * FROM patient p WHERE p.patient_id = person0_.person_id)
THEN 1
ELSE 0
END AS formula1_,
CASE
WHEN person0_1_.patient_id IS NOT NULL THEN 1
WHEN person0_.person_id IS NOT NULL THEN 0
END AS clazz_
FROM
person person0_
LEFT OUTER JOIN
patient person0_1_ ON person0_.person_id = person0_1_.patient_id
INNER JOIN
person_attribute attributes1_ ON person0_.person_id = attributes1_.person_id
CROSS JOIN
person_attribute_type personattr2_
WHERE
attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id
AND personattr2_.name = 'PersonImageAttribute'
AND (person0_.person_id IN (SELECT
person3_.person_id
FROM
person person3_
INNER JOIN
person_attribute attributes4_ ON person3_.person_id = attributes4_.person_id
CROSS JOIN
person_attribute_type personattr5_
WHERE
attributes4_.person_attribute_type_id = personattr5_.person_attribute_type_id
AND personattr5_.name = 'LocationAttribute'
AND (attributes4_.value IN ('d31fe20e-6736-42ff-a3ed-b3e622e80842'))))
ORDER BY person0_1_.date_changed , person0_1_.patient_id
LIMIT 25
Plan
There appear to be some redundant query components, and the CROSS JOIN does not appear to be used in a proper context when you already have a relation on the specific patient and/or attribute info.
Your query getting the clazz_ value is based on patient_id being NOT NULL, but then again on person_id being not null. Under what condition would the person_id coming from the person table EVER be null? That sounds like a key ID and would NEVER be null, so why test for it? It seems like that is a duplicate field and, in essence, is just the condition of a person actually being a patient vs not.
This query SHOULD otherwise get the same results. I suggest making sure the following specific indexes are available:
table index
person ( person_id )
person_attribute ( person_id, person_attribute_type_id )
person_attribute_type ( person_attribute_type_id, name )
patient ( patient_id )
select
p1.person_id AS person_i1_131_,
p1.uuid AS uuid2_131_,
p1.gender AS gender3_131_,
CASE WHEN p2.patient_id IS NULL
then 0 else 1 end formula1_,
-- appears to be a redundant result, just trying to qualify
-- some specific column value for later calculations.
CASE WHEN p2.patient_id IS NULL
THEN 0 else 1 end clazz_
from
-- pre-get only those people based on the P4 attribute in question
-- and attribute type of location. Get small list vs everything else
( SELECT distinct
pa.person_id
FROM
person_attribute pa
JOIN person_attribute_type pat
on pa.person_attribute_type_id = pat.person_attribute_type_id
AND pat.name = 'LocationAttribute'
WHERE
pa.value = 'd31fe20e-6736-42ff-a3ed-b3e622e80842' ) PQ
join person p1
on PQ.person_id = p1.person_id
LEFT JOIN patient p2
ON p1.person_id = p2.patient_id
JOIN person_attribute pa1
ON p1.person_id = pa1.person_id
JOIN person_attribute_type pat1
on pa1.person_attribute_type_id = pat1.person_attribute_type_id
AND pat1.name = 'PersonImageAttribute'
order by
p2.date_changed,
p2.patient_id
LIMIT
25
Finally, your query orders by date_changed and patient_id, which are based on the PATIENT table data having been changed. Since that table is LEFT-JOINed, you may have a bunch of PERSON records that are not patients and thus may not get the expected records you really intend. So, just some personal review of what is presented in the question.
Speeding up the query is the best hope for handling more connections.
A simplification (but no speed difference), since TRUE=1 and FALSE=0:
CASE WHEN (boolean_expression) THEN 1 ELSE 0 END
-->
(boolean_expression)
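Applied to formula1_ in your query, that would be (a sketch, not tested against your schema):
CASE WHEN EXISTS( SELECT * FROM patient p WHERE p.patient_id = person0_.person_id) THEN 1 ELSE 0 END AS formula1_
-->
EXISTS( SELECT * FROM patient p WHERE p.patient_id = person0_.person_id) AS formula1_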
Index suggestions:
person: INDEX(patient_id, date_changed)
person_attribute: INDEX(person_attribute_type_id, person_id)
person_attribute: INDEX(person_attribute_type_id, value, person_id)
person_attribute_type: INDEX(person_attribute_type_id, name)
If value is of type TEXT, then that cannot be used in an index.
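If it is TEXT, one possible workaround (an assumption on my part, since the column types are not shown) is a prefix index, which indexes only the first N characters of the column:
person_attribute: INDEX(person_attribute_type_id, value(100), person_id)
The prefix length (100 here) is arbitrary, and a prefix index is no longer fully "covering".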
Assuming that person has PRIMARY KEY(person_id) and patient has PRIMARY KEY(patient_id), I have no extra recommendations for them.
The Entity-Attribute-Value schema pattern, which this seems to be, is hard to optimize when there are a large number of rows. Sorry.
The CROSS JOIN seems to be just an INNER JOIN, but with the condition in the WHERE instead of in ON, where it belongs.
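That is, the outer one could be written as (same logic, just moved into ON):
INNER JOIN person_attribute_type personattr2_
    ON attributes1_.person_attribute_type_id = personattr2_.person_attribute_type_id
    AND personattr2_.name = 'PersonImageAttribute'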
person0_1_.patient_id can be NULL because of the LEFT JOIN, but I don't see how person0_.person_id can be NULL. Please check your logic.
I would like to find a way to improve a query, but it seems I've done it all. Let me give you some details.
Below is my query:
SELECT
`u`.`id` AS `id`,
`p`.`lastname` AS `lastname`,
`p`.`firstname` AS `firstname`,
COALESCE(`r`.`value`, 0) AS `rvalue`,
SUM(`rat`.`category` = 'A') AS `count_a`,
SUM(`rat`.`category` = 'B') AS `count_b`,
SUM(`rat`.`category` = 'C') AS `count_c`
FROM
`user` `u`
JOIN `user_customer` `uc` ON (`u`.`id` = `uc`.`user_id`)
JOIN `profile` `p` ON (`p`.`id` = `u`.`profile_id`)
JOIN `ad` FORCE INDEX (fk_ad_customer_idx) ON (`uc`.`customer_id` = `ad`.`customer_id`)
JOIN `ac` ON (`ac`.`id` = `ad`.`ac_id`)
JOIN `a` ON (`a`.`id` = `ac`.`a_id`)
JOIN `rat` ON (`rat`.`code` = `a`.`rat_code`)
LEFT JOIN `r` ON (`r`.`id` = `u`.`r_id`)
GROUP BY `u`.`id`
;
Note: Some table and column names are deliberately hidden.
Now let me give you some volumetric data:
user => 6534 rows
user_customer => 12 923 rows
profile => 6511 rows
ad => 320 868 rows
ac => 4505 rows
a => 536 rows
rat => 6 rows
r => 3400 rows
And finally, my execution plan:
My query currently runs in around 1.3 to 1.7 seconds, which is of course slow enough to annoy the users of my application... Also, FYI, the result set is composed of 165 rows.
Is there a way I can improve this ?
Thanks.
EDIT 1 (answer to Rick James below):
What are the speed and EXPLAIN when you don't use FORCE INDEX?
Surprisingly, it gets faster when I don't use FORCE INDEX. To be honest, I don't really remember why I made that change. I probably found better performance with it during one of my various tries and never removed it since.
When I don't use FORCE INDEX, it uses another index, ad_customer_ac_id_blocked_idx (customer_id, ac_id, blocked), and times are around 1.1 sec.
I don't really get it, because fk_ad_customer_idx (customer_id) is the same thing as far as an index on customer_id is concerned.
Get rid of FORCE INDEX. Even if it helped yesterday, it may hurt tomorrow.
Some of these indexes may be beneficial. (It is hard to predict; so simply add them all.)
a: (rat_code, id)
rat: (code, category)
ac: (a_id, id)
ad: (ac_id, customer_id)
ad: (customer_id, ac_id)
uc: (customer_id, user_id)
uc: (user_id, customer_id)
u: (profile_id, r_id, id)
(This assumes that id is the PRIMARY KEY of each table. Note that none have id first.) Most of the above are "covering".
Another approach that sometimes helps: gather the SUMs before joining to any unnecessary table. But it seems that p is the only table not involved in getting from u (the target of the GROUP BY) to r and rat (used in the aggregates). It would look something like:
SELECT ..., firstname, lastname
FROM ( everything as above except for `p` ) AS most
JOIN `profile` `p` ON (`p`.`id` = most.`profile_id`)
GROUP BY most.id
This avoids hauling around firstname and lastname while doing most of the joins and the GROUP BY.
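Fleshed out, that might look something like the following (a sketch only, untested; the column list in the derived table is my assumption):
SELECT most.id
     , p.lastname
     , p.firstname
     , COALESCE(most.rvalue, 0) AS rvalue
     , SUM(most.category = 'A') AS count_a
     , SUM(most.category = 'B') AS count_b
     , SUM(most.category = 'C') AS count_c
FROM (
       -- everything as above except for `p`
       SELECT u.id, u.profile_id, r.value AS rvalue, rat.category
       FROM user u
       JOIN user_customer uc ON u.id = uc.user_id
       JOIN ad              ON uc.customer_id = ad.customer_id
       JOIN ac              ON ac.id = ad.ac_id
       JOIN a               ON a.id = ac.a_id
       JOIN rat             ON rat.code = a.rat_code
       LEFT JOIN r          ON r.id = u.r_id
     ) AS most
JOIN profile p ON p.id = most.profile_id
GROUP BY most.id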
When doing JOINs and GROUP BY, be sure to sanity check the aggregates. Your COUNTs and SUMs may be larger than they should be.
First, you don't need to backtick every table and column in your queries, nor the result columns, aliases, etc. The tick marks are needed primarily when you are in conflict with a reserved word, so the parser knows you are referring to a specific column... like having a table with a column named "JOIN" when JOIN is part of the SQL syntax; see the confusion it would cause. Dropping them helps readability too.
Next, and this is just personal preference, something that can help you and others following behind you understand the data and its relationships: I show each join indented from the table it is coming from. As you can see below, I can follow the chain of how to get from the user table (alias u) down to the rat table... you only get there by going 5 levels deep. I put the first table (the table you are coming from) on the left side of the join, and the table being joined TO on the right side.
Now that I can see the relationships, I would suggest the following. Make COVERING indexes on the tables that have the criteria, plus the id/value where appropriate. This way the query gets the data it needs from the index pages instead of having to go to the raw data. So here are suggestions for indexes.
table index
user_customer ( user_id, customer_id ) -- don't know what your fk_ad_customer_idx parts are
ad ( customer_id, ac_id )
ac ( id, a_id )
a (id, rat_code )
rat ( code, category )
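If you want those as statements, it would be something along these lines (the index names are just placeholders I made up):
ALTER TABLE user_customer ADD INDEX idx_uc_user_customer (user_id, customer_id);
ALTER TABLE ad            ADD INDEX idx_ad_customer_ac   (customer_id, ac_id);
ALTER TABLE ac            ADD INDEX idx_ac_id_a          (id, a_id);
ALTER TABLE a             ADD INDEX idx_a_id_rat         (id, rat_code);
ALTER TABLE rat           ADD INDEX idx_rat_code_cat     (code, category);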
Reformatted query for readability and seeing relationships between the tables
SELECT
u.id,
p.lastname,
p.firstname,
COALESCE(r.value, 0) AS rvalue,
SUM(rat.category = 'A') AS count_a,
SUM(rat.category = 'B') AS count_b,
SUM(rat.category = 'C') AS count_c
FROM
user u
JOIN user_customer uc
ON u.id = uc.user_id
JOIN ad FORCE INDEX (fk_ad_customer_idx)
ON uc.customer_id = ad.customer_id
JOIN ac
ON ad.ac_id = ac.id
JOIN a
ON ac.a_id = a.id
JOIN rat
ON a.rat_code = rat.code
JOIN profile p
ON u.profile_id = p.id
LEFT JOIN r
ON u.r_id = r.id
GROUP BY
u.id
There's a lot of Q&A out there for how to make MySQL show results for rows that have 0 records, but they all involve 1-2 tables/fields at most.
I'm trying to achieve the same ends, but across 3 fields, and I just can't seem to get it.
Here's what I've hacked together:
SELECT circuit.circuit_name, county.county_name, result.adr_result, count( result.adr_result ) AS num_results
FROM
(
SELECT cases.case_id, cases.county_id, cases.result_id
FROM cases
WHERE cases.status_id <> "2"
) q1
RIGHT JOIN county ON q1.county_id = county.county_id
RIGHT JOIN circuit ON county.circuit_id = circuit.circuit_id
RIGHT JOIN result ON q1.result_id = result.result_id
GROUP BY adr_result, circuit_name, county_name
ORDER BY circuit_name, county_name, adr_result
What I need to see is a list of ALL circuits in the first column, a list of ALL counties per circuit in the second column, a list of ALL possible adr_result entries for each county (they're the same for every county) in the third column, and then the respective count for the circuit/county/result combination-- even if it is 0. I've tried every combination of left, right and inner join (I know inner is definitely not the solution, but I'm frustrated) and just can't see where I'm going wrong.
Any help would be appreciated!
Here is a start. I can't follow your problem statement completely. For instance, what is the purpose of the cases table? Nonetheless, when you say "ALL" records for each of those tables, I interpret it as a Cartesian product, which is implemented through the derived table in the FROM clause (notice that result is joined there without any join condition).
SELECT everthingjoin.circuit_name
     , everthingjoin.county_name
     , everthingjoin.adr_result
     , COUNT(cases.case_id) AS num_results
FROM
    (SELECT circuit.circuit_id, circuit.circuit_name,
            county.county_id, county.county_name,
            result.result_id, result.adr_result
     FROM circuit
     JOIN county ON county.circuit_id = circuit.circuit_id
     JOIN result) AS everthingjoin
LEFT JOIN cases
    ON cases.status_id <> "2"
    AND cases.county_id = everthingjoin.county_id
    AND cases.result_id = everthingjoin.result_id
GROUP BY adr_result, circuit_name, county_name
ORDER BY circuit_name, county_name, adr_result
Try this and see if it provides some ideas:
SELECT
circuit.circuit_name
, county.county_name
, result.adr_result
, COUNT(result.result_id) AS num_results
, COUNT(DISTINCT result.adr_result) AS num_distinct_results
FROM cases
LEFT JOIN county
ON cases.county_id = county.county_id
LEFT JOIN circuit
ON county.circuit_id = circuit.circuit_id
LEFT JOIN result
ON cases.result_id = result.result_id
WHERE cases.status_id <> "2"
GROUP BY
circuit.circuit_name
, county.county_name
, result.adr_result
ORDER BY
circuit_name, county_name, adr_result
I learned the hard way that I shouldn't store serialized data in a table when I need to make it searchable.
So I made 3 tables: the base table & two 1-n relation tables.
So here is the query I get if I want to select a specific activity.
SELECT
jdc_organizations_activities.id
FROM
jdc_activity_sector ,
jdc_activity_type
INNER JOIN jdc_organizations_activities ON jdc_activity_type.activityId = jdc_organizations_activities.id
AND
jdc_activity_sector.activityId = jdc_organizations_activities.id
WHERE
jdc_activity_sector.activitySector = 5 AND
jdc_activity_type.activityType = 3
Questions:
1- What kind of indexes can I add on a 1-n relation table? I already have a unique combination of (activityId - activitySector) & (activityId - activityType).
2- Is there a better way to write the query to get better performance?
Thank you!
I would re-organise the query to avoid the cross product caused by using the comma (,) notation.
Also, you are effectively only using the sector and type tables as filters, so put the activity table first and then join on your other tables.
Some may suggest that the first join should ideally be the one most likely to restrict your results the most, leaving the minimal amount of work to do in the second join. In reality, the SQL engine can re-arrange your query when generating a plan, but it does help to think this way, to appreciate the effort the SQL engine has to go to.
Finally, there are the indexes on each table. I would actually suggest reversing the Indexes...
- ActivitySector THEN ActivityId
- ActivityType THEN ActivityId
This is specifically because the SQL engine is manipulating your query. It can take the WHERE clause and say "only include records from the Sector table where ActivitySector = 5", and similarly for the Type table. By having the Sector and Type identifiers FIRST in the index, this filtering of the tables can be done much faster, and then the joins will have much less work to do.
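Assuming the existing unique constraints can be redefined, that could look something like this (the old (activityId, ...) unique indexes could then be dropped):
ALTER TABLE jdc_activity_sector ADD UNIQUE INDEX idx_sector_activity (activitySector, activityId);
ALTER TABLE jdc_activity_type   ADD UNIQUE INDEX idx_type_activity   (activityType, activityId);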
SELECT
    activity.id
FROM
    jdc_organizations_activities AS activity
INNER JOIN
    jdc_activity_sector AS sector
        ON activity.id = sector.activityId
INNER JOIN
    jdc_activity_type AS type
        ON activity.id = type.activityId
WHERE
    sector.activitySector = 5
    AND type.activityType = 3
Or, because you don't actually use the content of the Activity table...
SELECT
    sector.activityId
FROM
    jdc_activity_sector AS sector
INNER JOIN
    jdc_activity_type AS type
        ON sector.activityId = type.activityId
WHERE
    sector.activitySector = 5
    AND type.activityType = 3
Or...
SELECT
    activity.id
FROM
    jdc_organizations_activities AS activity
WHERE
    EXISTS (SELECT * FROM jdc_activity_sector WHERE activityId = activity.id AND activitySector = 5)
    AND EXISTS (SELECT * FROM jdc_activity_type WHERE activityId = activity.id AND activityType = 3)
I would advise against mixing the old style (from table1, table2) and the new style (from table1 inner join table2 ...) in a single query. And you can alias tables using table1 as t1, shortening long table names to an easy-to-remember mnemonic:
select a.id
from jdc_organizations_activities a
join jdc_activity_sector as s
  on s.activityId = a.Id
join jdc_activity_type as t
  on t.activityId = a.Id
where s.activitySector = 5
and t.activityType = 3
Or even more readable using IN:
select a.id
from jdc_organizations_activities a
where a.id in
(
select activityId
from jdc_activity_sector
where activitySector = 5
)
and a.id in
(
select activityId
from jdc_activity_type
where activityType = 3
)
I feel there is a simple solution to this -- I've looked at other questions on Stack Overflow, but they seem to be inefficient, or perhaps I'm doing them wrong.
Here are simplified versions of tables I'm working with.
CREATE TABLE object_values (
object_value_id INT PRIMARY KEY,
object_id INT,
value FLOAT,
date_time DATETIME,
INDEX (object_id)
);
CREATE TABLE object_relations (
object_group_id INT,
object_id INT,
INDEX (object_group_id, object_id)
);
There is a many-to-one relationship -- many object values per one object (150 object values on average per object)
I want to get the last object value (determined by date_time field) for each object_id based on the object_group_id.
Current Query:
SELECT a.`object_id`, a.`value`
FROM `object_values` AS a
LEFT JOIN `object_relations` AS b
ON ( a.`object_id` = b.`object_id` )
WHERE b.`object_group_id` = 105
ORDER BY a.`date_time` DESC
This pulls back all the records -- if I do a GROUP BY a.object_id then the ORDER BY gets ignored
I have tried a series of variations (and I apologize, as I know this is a common question), but trying the other solutions hasn't quite worked so far. Any help would be greatly appreciated.
Some of the solutions came back with results about 70 seconds later; this is a large system, so it needs to be a bit faster.
SELECT a.object_id, a.`value`
FROM object_values a JOIN object_relations b
ON a.object_id = b.object_id
JOIN (
SELECT a.object_id, MAX(a.date_time) MaxTime
FROM object_relations b JOIN object_values a
ON a.object_id = b.object_id
WHERE b.`object_group_id` = 105
GROUP BY a.object_id
) g ON a.object_id = g.object_id AND a.date_time = g.MaxTime
WHERE b.`object_group_id` = 105
GROUP BY a.object_id
You can omit the final GROUP BY if there will never be duplicate date_time in a single group.
Try this version without a group by, using a sub-query to pull the max date for each object:
SELECT a.`object_id`, a.`value`
FROM `object_relations` b
JOIN `object_values` a
ON b.`object_id` = a.`object_id`
AND b.`object_group_id` = 105
WHERE a.`date_time` = (select MAX(date_time) from object_values where object_id = a.object_id)
The LEFT JOIN was unneeded since you specified the group ID in the WHERE clause, so I made the group table the main table in the join. This could be done several ways. The subquery could also be moved to the join clause if WHERE is just changed to AND.
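For example, with the condition moved into the join, the query above becomes:
SELECT a.`object_id`, a.`value`
FROM `object_relations` b
JOIN `object_values` a
    ON b.`object_id` = a.`object_id`
    AND b.`object_group_id` = 105
    AND a.`date_time` = (select MAX(date_time) from object_values where object_id = a.object_id)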