Help me figure out a MySQL query - mysql

These are tables I have:
Class
- id
- name
Order
- id
- name
- class_id (FK)
Family
- id
- order_id (FK)
- name
Genus
- id
- family_id (FK)
- name
Species
- id
- genus_id (FK)
- name
I'm trying to make a query to get a list of Class, Order, and Family names that do not have any Species under them. You can see that the tables form a hierarchy from Class all the way down to Species. Each table has a Foreign Key (FK) that relates to the table immediately above it in the hierarchy.
I've been trying to get this to work, but I am not doing so well.
Any help would be appreciated!

Meta-answer (comment on the two previous answers):
Using IN tends to degrade to something very like an OR (a disjunction) of all terms in the IN. Bad performance.
Doing a left join and looking for null is an improvement, but it's obscurantist. If we can say what we mean, let's say it in a way that's closest to how we'd say it in natural language:
select f.name
from family f left join genus g on f.id = g.family_id
WHERE NOT EXISTS (select * from species c where c.genus_id = g.id);
We want where something doesn't exist, so if we can say "where not exists" all the better. And, the select * in the subquery doesn't mean it's really bringing back a whole row, so it's not an "optimization" to replace select * with select 1, at least not on any modern RDBMS.
Further, where a family has many genera (and in biology, most families do), we're going to get one row per (family, genus) when all we care about is the family. So let's get one row per family:
select DISTINCT f.name
from family f left join genus g on f.id = g.family_id
WHERE NOT EXISTS (select * from species c where c.genus_id = g.id);
This is still not optimal. Why? Well it fulfills the OP's requirement, in that it finds "empty" genera, but it fails to find families that have no genera, "empty" families. Can we make it do that too?
select f.name
from family f
WHERE NOT EXISTS (
select * from genus g
join species c on c.genus_id = g.id
where g.family_id = f.id);
We can even get rid of the distinct, because we're not joining family to anything. And that is an optimization.
Comment from OP:
That was a very lucid explanation. However, I'm curious as to why using IN or disjunctions is bad for performance. Can you elaborate on that or point me to a resource where I can learn more about the relative performance cost of different DB operations?
Think of it this way. Say there were no IN operator in SQL. How would you fake an IN?
By a series of ORs:
where foo in (1, 2, 3)
is equivalent to
where ( foo = 1 ) or ( foo = 2 ) or (foo = 3 )
Ok, you say, but that still doesn't tell me why it's bad. It's bad because there's often no decent way to use a key or index to look this up. So what you get is either a) a table scan, where for each disjunction (or'd predicate or element of an IN list), the row gets tested, until a test is true or the list is exhausted. Or b) you get a table scan for each of these disjunctions. The second case (b) may actually be better, which is why you sometimes see a select with an OR turned into one select for each leg of the OR union'd together:
select * from table where x = 1 or x = 3 ;
select * from table where x = 1
union select * from table where x = 3 ;
Now this is not to say you can never use an OR or an IN list. And in some cases, the query optimizer is smart enough to turn an IN list into a join -- and the other answers you were given are precisely the cases where that's most likely.
But if we can explicitly turn our query into a join, well, we don't have to wonder whether the query optimizer is smart. And in general, joins are what the database is best at doing.
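To make "turn it into a join" concrete, here is a minimal sketch (with made-up table and column names, so treat it as illustration only) of rewriting a literal IN list as an explicit join against a derived table of constants:
select t.*
from some_table t                                -- hypothetical table with a column foo
join ( select 1 as foo
       union all select 2
       union all select 3 ) vals
  on vals.foo = t.foo;
Whether this actually beats the IN list depends on the optimizer and the available indexes; the point is only that the join form states the lookup explicitly.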

Well, just giving this a quick and dirty shot, I'd write something like this. I spend most of my time using Firebird, so the MySQL syntax may be a little different, but the idea should be clear:
select f.name
from family f left join genus g on f.id = g.family_id
left join species s on g.id = s.genus_id
where ( s.id is null )
if you want to enforce there being a genus then you just remove the "left" portion of the join from family to genus.
I hope I'm not misunderstanding the question and thus, leading you down the wrong path. Good luck!
edit: Actually, re-reading this, I think this will just catch families where there are no species within a genus. You could add a " and ( g.id is null )" too, I think.

Sub-select to the rescue...
select f.name from family as f, genus as g
where
f.id = g.family_id and
g.id not in (select genus_id from species);
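One caveat worth adding to this approach: if species.genus_id is nullable, a NOT IN whose subquery returns a NULL matches nothing at all. A NULL-safe sketch of the same idea:
select f.name from family as f, genus as g
where
f.id = g.family_id and
g.id not in (select genus_id from species where genus_id is not null);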

SELECT f.name
FROM family f
WHERE NOT EXISTS (
SELECT 1
FROM genus g
JOIN species s
ON g.id = s.genus_id
WHERE g.family_id = f.id
)
Note that, unlike the pure LEFT JOIN solutions, this is more efficient.
It does not select ALL rows and then filter out those with NOT NULL values; instead, it reads at most one row from genus and species per family.
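As a side note, indexes on the FK columns are what let the NOT EXISTS probe stop after reading a single row. A sketch, assuming they were not already created along with the foreign keys:
CREATE INDEX idx_genus_family_id ON genus (family_id);
CREATE INDEX idx_species_genus_id ON species (genus_id);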

Related

Get rows that include one many-to-many relation, but not another

I am having a little trouble with this query. I want to filter my Features down to all features that have applicabilities that include the name 'G6', but that also do not have a many-to-many relationship with applicabilities that have the name 'n2'. I have right now:
SELECT inner.*
FROM
(SELECT feat.*
FROM Features feat
INNER JOIN Feature_has_Applicability feat_app
ON feat_app.feature_id = feat.id
INNER JOIN Applicability app
ON feat_app.applicability_id = app.id
AND app.name like '%G6%'
WHERE feat.deleted_time = '0000-00-00 00:00:00'
GROUP BY feat.id
) AS inner
INNER JOIN Feature_has_Applicability out_feat_app
ON out_feat_app.feature_id = inner.id
INNER JOIN Applicability app
ON out_feat_app.applicability_id = app.id
AND app.name NOT LIKE '%N2%'
GROUP BY inner.id
HAVING count(*) = 1
I have a many to many from Feature to Applicability where
Feature
id INT PRIMARY
deleted_time DATETIME
Applicability
id INT Primary
name VARCHAR(45)
Feature_has_Applicability
feature_id INT
applicability_id INT
Example:
I have feature A with applicabilities named N2 and G6
I have feature B with applicability G6, N7
I have feature C with applicability N2
I want only feature B to be returned as it includes G6 but not N2.
G6 is A and N2 is B with regard to features that have a many-to-many relationship with them.
This still seems to return features that have an applicability of 'n2'. Can you see what I am doing wrong? Thank you.
Your first sub-query seems fine. Personally, I'm not sure why you still get 'n2' records based on this query alone, without seeing the database. Maybe it's because you use an upper-case 'N2' in the query? String comparisons can be case sensitive, depending on the column's collation.
That said, I suggest you use NOT EXISTS. It makes the intention of the code much clearer. Try this:
SELECT *
FROM
features feat
INNER JOIN feature_has_applicability feat_app ON feat_app.feature_id = feat.id
INNER JOIN applicability app ON app.id = feat_app.applicability_id
AND app.name LIKE '%G6%'
WHERE
feat.deleted_time = '0000-00-00 00:00:00'
AND NOT EXISTS (
SELECT
*
FROM
feature_has_applicability out_feat_app
INNER JOIN applicability out_app ON out_app.id = out_feat_app.applicability_id
AND out_app.name LIKE '%N2%'
WHERE
out_feat_app.feature_id = feat.id
)
Using NOT EXISTS helps to streamline the code. In this case the main query is much easier to understand: we want to find Feature records that have a 'G6' applicability, but keep only those that do NOT have an 'N2' applicability tied to the same Feature ID.
I'm confused by the purpose of GROUP BY and HAVING count(*) = 1. If you group by ID and expect each group to have only one record, doesn't that mean each filtered Feature record has only a single 'G6' Applicability record, and thus you don't need to worry about filtering out 'N2' records? Unless there are weird cases like having both the 'G6' and 'N2' keywords in the same Applicability record.
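If you do want to stay with GROUP BY, a common alternative is conditional aggregation; a rough sketch (not a drop-in replacement for your query, just the pattern), relying on MySQL treating boolean expressions as 0/1:
SELECT feat.id
FROM Features feat
INNER JOIN Feature_has_Applicability feat_app ON feat_app.feature_id = feat.id
INNER JOIN Applicability app ON app.id = feat_app.applicability_id
WHERE feat.deleted_time = '0000-00-00 00:00:00'
GROUP BY feat.id
HAVING SUM(app.name LIKE '%G6%') > 0   -- at least one G6 applicability
   AND SUM(app.name LIKE '%N2%') = 0;  -- and no N2 applicability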
Another pointer for your sub-query: you should never use reserved keywords as identifiers. In this case, you called your sub-query "inner", which is bad practice and may not run at all in other database engines. Perhaps you can call it "g6_feature" instead.

MySQL Most efficient way to get all rows with at least one associated row

Given a table members and a table devices, where each member can have 0-many devices, what would be the fastest way to get all members that have at least one device?
select m.*, md.* from members m
left join (
SELECT count(*) as c, memberId from member_devices d GROUP BY d.memberId
) md ON m.memberId = md.memberId
WHERE md.c > 0
This works, but it seems really slow.
select m.* from members m where
EXISTS (
SELECT 1 FROM member_devices md WHERE m.memberId = md.memberId
)
Also works, and might be a little faster (?)
Anyone out there with any experience? Thanks!
The second option, which uses EXISTS with a correlated subquery, is surely the fastest option here.
Unlike the other option, it does not require aggregation and joining. Aggregation is an expensive operation that usually does not scale well: as the number of records to process increases, performance tends to drop dramatically.
Also, you don't actually need to count how many records there are in each group. You just want to know if at least one record is available. That's exactly what EXISTS is here for.
For performance in your query, make sure that you have the following indexes (they are probably already there if you properly implemented the relationship with a foreign key):
members(memberId)
member_devices(memberId)
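A sketch of the corresponding DDL, in case the second one is missing (memberId in members is most likely the primary key already):
ALTER TABLE member_devices ADD INDEX idx_member_devices_memberId (memberId);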
"INNER JOIN" returns rows when there is a match in both tables. You can do :
SELECT m.*, md.*
FROM members m
INNER JOIN devices md ON m.memberId = md.memberId
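Worth noting: with a plain INNER JOIN, a member with several devices comes back once per device. If you want each member only once, one option is DISTINCT over the member columns; a sketch, using the member_devices table from the question's own queries:
SELECT DISTINCT m.*
FROM members m
INNER JOIN member_devices md ON m.memberId = md.memberId;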

Combining LIKE and EXISTS?

Here is the database I'm using: https://drive.google.com/file/d/1ArJekOQpal0JFIr1h3NXYcFVngnCNUxg/view?usp=sharing
Find the papers whose title contain the string 'data' and where at least one author is
from the department with deptnum 100. List the panum and title of these papers. You
must use the EXISTS operator. Ensure your query is case-insensitive.
I'm unsure how to output the total number of papers for each academic.
My attempt at this question:
SELECT panum, title
FROM department NATURAL JOIN paper
WHERE UPPER(title) LIKE ('%data%') AND EXISTS (SELECT deptnum FROM
department WHERE deptnum = 100);
This seems to come up empty. I'm not sure what I'm doing wrong; can LIKE and EXISTS be combined?
Thank you.
Don't use natural join! It is an abomination because it does not make use of explicitly declared foreign key relationships. Explicitly list your join keys, so the queries are more understandable and more maintainable.
That said, your subquery is the problem. I would expect a query more like this:
SELECT p.panum, p.title
FROM paper p
WHERE lower(p.title) LIKE '%data%' AND
EXISTS (SELECT 1
FROM authors a
WHERE a.author = p.author AND -- or whatever the column should be
a.deptnum = 100
);
Since they are requiring EXISTS, the operator needs to be applied to the author table, not the department table. The query inside EXISTS needs to be correlated with the query on papers, so there should be no JOIN at the top level:
SELECT p.PANUM, p.TITLE
FROM paper p
WHERE p.Title LIKE ('%data%') AND EXISTS (
SELECT *
FROM author a
JOIN academic ac ON ac.ACNUM=a.ACNUM
WHERE a.PANUM=p.PANUM AND ac.DEPTNUM=100
)
Note that since the author table lacks DEPTNUM, you do need a join inside the EXISTS query to bring in a row of academic for its DEPTNUM column.
The phrase UPPER(title) LIKE ('%data%') is never going to find any rows, since an uppercase version of whatever is in title will never contain the lowercase letters 'data'.
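To make the case handling explicit, here is a sketch that keeps the query above but upper-cases both sides of the comparison (LOWER on both sides works just as well):
SELECT p.PANUM, p.TITLE
FROM paper p
WHERE UPPER(p.TITLE) LIKE '%DATA%'     -- uppercase column compared against an uppercase pattern
AND EXISTS (
  SELECT *
  FROM author a
  JOIN academic ac ON ac.ACNUM = a.ACNUM
  WHERE a.PANUM = p.PANUM AND ac.DEPTNUM = 100
);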
select p.TITLE,p.PANUM from PAPER p where TITLE like '%data%'
AND EXISTS(
SELECT * FROM AUTHOR a join ACADEMIC d
on d.ACNUM=a.ACNUM where d.DEPTNUM=100 AND a.PANUM=p.PANUM)

Explain SQL and Query optimization

Explain SQL (in phpmyadmin) of a query that is taking more than 5 seconds is giving me the above. I read that we can study the Explain SQL to optimize a query. Can anyone tell if this Explain SQL telling anything as such?
Thanks guys.
Edit:
The query itself:
SELECT
a.`depart` , a.user,
m.civ, m.prenom, m.nom,
CAST( GROUP_CONCAT( DISTINCT concat( c.id, '~', c.prenom, ' ', c.nom ) ) AS char ) AS coordinateur,
z.dr
FROM `0_activite` AS a
JOIN `0_member` AS m ON a.user = m.id
LEFT JOIN `0_depart` AS d ON ( m.depart = d.depart AND d.rank = 'mod' AND d.user_sec =2 )
LEFT JOIN `0_member` AS c ON d.user_id = c.id
LEFT JOIN `zone_base` AS z ON m.depart = z.deprt_num
GROUP BY a.user
Edit 2:
Structures of the two tables a and d. Top: a and bottom: d
Edit 3:
What I want in this query?
I first want to get the value of 'depart' and 'user' (which is an id) from the table 0_activite. Next, I want to get the name of the person (civ, prenom and nom) from 0_member, whose id I am getting from 0_activite via 'user', by matching 0_activite.user with 0_member.id. Here depart is short for department, which is also an id.
So at this point, I have the depart, id, civ, nom and prenom of a person from two tables, 0_activite and 0_member.
Next, I want to know which dr is related to this depart, and this I get from zone_base. The value of depart is the same in both 0_activite and 0_member.
Then comes the trickier part. A person from 0_member can be associated with multiple departs, and this is stored in 0_depart. Also, every user has a level, one of which is 'mod', which stands for moderator. Now I want to get all the people who are moderators in the depart where the first user is, and then get those moderators' names from 0_member again. I also have a variable user_sec, but this is probably less important in this context, though I cannot overlook it.
This is what makes the query a tricky one. 0_member stores the id, name and one depart of each user, 0_depart stores all departs of users, one line for each depart, and 0_activite stores some other stuff, and I want to relate those through the user id of 0_activite and the rest.
Hope I have been clear. If I am not, please let me know and I will try again to edit this post.
Many many thanks again.
Aside from the few answers provided by the others here, it might help to better understand the "what do I want" from the query. As you've accepted a rather recent answer from me in another of your questions, you have filters applied by department information.
Your query is doing a LEFT join at the Department table by rank = 'mod' and user_sec = 2. Is your overall intent to show ALL records in the 0_activite table REGARDLESS of a valid join to the 0_Depart table... and if there IS a match to the 0_Depart table, you only care about the 'mod' and 2 values?
If you only care about those people specifically associated with the 0_depart with 'mod' and 2 conditions, I would reverse the query starting with THIS table first, then join to the rest.
Having keys on tables via relationship or criteria is always a performance benefit (vs not having the indexes).
Start your query with whatever would be your smallest set FIRST, then join to other tables.
From clarification in your question... I would start with the inner-most... Who it is and what departments are they associated with... THEN get the moderators (from department where condition)... Then get actual moderator's name info... and finally out to your zone_base for the dr based on the department of the MODERATOR...
select STRAIGHT_JOIN
DeptPerMember.*,
Moderator.Civ as ModCiv,
Moderator.Prenom as ModPrenom,
Moderator.Nom as ModNom,
z.dr
from
( select
m.ID,
m.Depart,
m.Civ,
m.Prenom,
m.Nom
from
0_Activite as a
join 0_member m
on a.User = m.ID
join 0_Depart as d
on m.depart = d.depart ) DeptPerMember
join 0_Depart as DeptForMod
on DeptPerMember.Depart = DeptForMod.Depart
and DeptForMod.rank = 'mod'
and DeptForMod.user_sec = 2
join 0_Member as Moderator
on DeptForMod.user_id = Moderator.ID
join zone_base z
on Moderator.depart = z.deprt_num
Notice how I tier'd the query to get each part and joined to the next and next and next. I'm building the chain based on the results of the previous with clear "alias" references for clarification of content. Now, you can get whatever respective elements from any of the levels via their distinct "alias" references...
The output from EXPLAIN is showing us that the first and third tables listed (a & d) are not having any indexes utilised by the database engine in executing this query. The key column is NULL for both - which is a shame since both are 'large' tables (OK, they're not really large, but compared to the rest of the tables they're the big 'uns).
Judging from the query, an index on user on 0_activite and an index on (depart, rank, user_sec) on 0_depart would go some way to improving performance.
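A sketch of the corresponding DDL, assuming indexes like these do not already exist under other names:
ALTER TABLE `0_activite` ADD INDEX idx_activite_user (`user`);
ALTER TABLE `0_depart` ADD INDEX idx_depart_mod (`depart`, `rank`, `user_sec`);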
You can see that the key and key_len columns are NULL. This means the engine is not using any key from the possible_keys column, so tables a and d are both scanning all rows (check the larger numbers in the rows column; you want these smaller).
To deal with 0_depart:
Make sure you have a key on (d.depart, d.rank, d.user_sec), which are the columns used in the join to 0_depart.
To deal with 0_activite:
I'm not positive, but a GROUP BY column should be indexed too, so you need a key on a.user.

Will a key in SQL still stay a key in a view?

Let's say I have a mysql table called FISH with fields A, B and C.
I run SELECT * FROM FISH. This gets me a view with all fields. So, if A was a key in the original table, is it also a key in the view? Meaning, if I have a table FISH2, and I ran
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
Will the relevant fields still be keys?
Now, let's take this 1 step further. If I run
SELECT * FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D, (SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E WHERE D.DUCK = E.DUCK2
If A and B were keys in the original tables, will their concatenation also be a key?
Thanks :)
If A is a key in fish, any projection on fish only will produce a result set where A is still unique.
A join between table fish and any table with 1:1 relation (such as fish_type) will produce a result set where A is unique.
A join with another table that has a 1:M or M:M relation from fish (such as fish_baits) will NOT produce a result where A is unique, unless you provide a filter predicate on the "other" side (such as bait='Dynamite').
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
...is logically equivalent to the following statement, and most databases (including MySQL) will perform the transformation:
select *
from fish
join fish2 on(fish.a = fish2.a)
Whether A is still unique in the resultset depends on the key of fish2 and their relation (see above).
Concatenation does not preserve uniqueness. Consider the following case:
concat("10", "10") => "1010"
concat("101", "0") => "1010"
Therefore, your final query...
SELECT *
FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D
,(SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E
WHERE D.DUCK = E.DUCK2
...won't (necessarily) produce the same result as
select *
from fish
join fish2 on(
fish.a = fish2.a
and fish.b = fish2.b
)
I wrote necessarily because the collisions depend on the actual values. I hunted down a bug some time ago where the root cause was exactly this. The code had worked for several years before the bug manifested itself.
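You can see the collision directly in MySQL, and the usual workaround of a separator that cannot occur in the values:
SELECT CONCAT('10', '10') = CONCAT('101', '0') AS collides;            -- 1 (true)
SELECT CONCAT('10', '|', '10') = CONCAT('101', '|', '0') AS collides;  -- 0 (false)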
If by "key" you mean "unique", yes, tuples of a cartesian product over unique values will be unique.
(One can prove it via by reductio ad absurdum.)
For step 1, think of a view as a subquery containing everything in the AS clause when CREATE VIEW was executed.
For example, if view v is created as SELECT a, b, c FROM t, then when you execute...
SELECT * FROM v WHERE a = some_value
...it's conceptually treated as...
SELECT * FROM (SELECT a, b, c FROM t) WHERE a = some_value
Any database with a decent optimizer will notice that column a is passed straight into the results and that it can take advantage of the indexing on t (if there is any) by moving it into the subquery:
SELECT * FROM (SELECT a, b, c FROM t WHERE a = some_value)
This all happens behind the scenes and is not an optimization you need to do yourself. Obviously, it can't do that for every condition in the WHERE clause, but understanding where you can is part of the art of writing a good optimizer.
For step 2, the concatenated keys will be part of intermediate results, and whether or not the database decides they need indexing is an implementation detail. Also note fche's comment about duplication.
If your database has a query plan explainer, running it and learning to interpret the results will give you a lot of insight about what makes your queries run fast and what slows them down.
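In MySQL the query plan explainer is EXPLAIN; a minimal sketch of checking whether the join in the earlier example can use the key on a:
EXPLAIN
SELECT *
FROM fish
JOIN fish2 ON fish.a = fish2.a;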