Get Rows that are include one many to many. but not another

Get Rows that are include one many to many. but not another - mysql

I am having a little trouble with this query. I want to filter my Features down for all features that have applicabilities that have include the name 'G6', but that also do not have a many to many relationship with applicabilies that have the name 'n2'. I have right now:
SELECT inner.*
FROM
(SELECT feat.*
FROM Features feat
INNER JOIN Feature _has_Applicability feat_app
ON feat_app.feature_id = feat.id
INNER JOIN Applicability app
ON feat_app.applicability_id = app.id
AND app.name like '%G6%'
WHERE feat.deleted_time = '0000-00-00 00:00:00'
GROUP BY feat.id
) AS inner
INNER JOIN Feature_has_Applicability out_feat_app
ON out_feat_app.feature_id = inner.id
INNER JOIN Applicability app
ON out_feat_app.applicability_id = app.id
AND app.name NOT LIKE '%N2%'
GROUP BY inner.id
HAVING count (*) = 1
I have a many to many from Feature to Applicability where
Feature
id INT PRIMARY
deleted_time DATETIME
Applicability
id INT Primary
name VARCHAR(45)
Feature_has_Applicability
feature_id INT
applicability_id INT
Example:
I have feature A with applicabilities named N2 and G6
I have feature B with applicability G6, N7
I have feature C with applicability N2
I want only feature B to be returned as it includes G6 but not N2.
G6 is A and N2 is B in regards to features that have a many to many relationship with them.
This still seems to return features that have an applicability to 'n2'. Can you see what I am doing wrong? Thank you.

Your first sub-query seems fine. Personally, I'm not sure why you still get 'n2' records based on this query alone without seeing the database. Maybe because you use an upper case 'N2' in the query? Operators are case sensitive.
Although I suggest you to use NOT EXISTS. It'd make the code much more understandable with its intention. Try this:
SELECT *
FROM
features feat
INNER JOIN feature_has_applicability feat_app ON feat_app.feature_id = feat.id
INNER JOIN applicability app ON app.id = feat_app.applicability_id
AND app.name LIKE '%G6%'
WHERE
feat.delete_time = '0000-00-00 00:00:00'
AND NOT EXISTS (
SELECT
*
FROM
feature_has_applicability out_feat_app
INNER JOIN applicability out_app ON out_app.id = out_feat_app.applicability_id
AND app.name LIKE '%N2%'
WHERE
out_feat_app.feature_id = feat.id
)
Using NOT EXISTS help to streamline the code. So in this case, the main query is much easier to understand, where we want to find Feature records that have a 'G2' applicability. But then we only want records that do NOT have 'N2' applicability owned by the selected Feature records' ID.
I'm confused with the purpose of GROUP BY and HAVING count(*) = 1. If you group it by its ID and expect each group to only have one record, then doesn't that mean each filtered Feature record only has a single 'G6' Applicability record and thus you don't need to worry about filtering out 'N2' records? Unless there are weird cases like having both 'G6' and 'N2' keywords in the same Applicability record.
Another pointer, for your sub query, you should never use reserved keywords as an identifier. In this case, you called your sub-query as "inner", which is a bad practice and may not run at all in other database engines. Perhaps you can call it as "g2_feature".

Related

MYSQL - SubSelect when FK does and doesnt exists

Situation Overview
The current question is a problem about selecting values from two tables table A (material) and table B (MaterialRevision). However, The PK of table A might or Might not exist in Table B. When it doesnt exists, the query described in this question wont return the values of table A, but IT SHOULD. So basically here's whats happening :
The query is only returning values when A.id exists in B.id, when In fact, I need it to return values from A when A.id ALSO dont exist in B.id.
Problem:
Suppose two tables. Table Material and Table Material Revision.
Notice that the PK idMaterial is a FK in MaterialRevision.
Current "Mock" Tables
Query Objective
Obs: remember these two tables are a simplification of the real
tables.
For each Material, print the material variables and the last(MAX) RevisionDate from MaterialRevision. In case theres no RevisionDate, print BLANK ("") for the "last revision date".
What is wrongly happening
For each Material, print the material variables and the last(MAX) RevisionDate from MaterialRevision. In case theres no Revision for the Material, doesnt print the Material (SKIP).
Current Code
SELECT
Material.idMaterial,
Material.nextRevisionDate,
Material.obsolete,
lastRevisionDate
FROM Material,
(SELECT MaterialRevision.idMaterial, max(MaterialRevision.revisionDate) as "revisionDate" from MaterialRevision
GROUP BY MaterialRevision.idMaterial
) AS Revision
WHERE (Material.idMaterial = Revision.idMaterial AND Material.obsolete = 0)
References and Links used to reach the state described in this question
Why is MAX() 100 times slower than ORDER BY ... LIMIT 1?
MySQL get last date records from multiple
MySQL - How to SELECT based on value of another SELECT
MySQL Query Select where id does not exist in the JOIN table
PS I hope this question is correctly understood since it took me a lot of time to build it. I researched a lot in stackoverflow and after
several failed attempts I had no option but to ask for help.

You should use JOIN :
SELECT m.idMaterial, m.nextRevisionDate, mr.revisionDate AS "lastRevisionDate"
FROM Material m
LEFT JOIN MaterialRevision AS mr ON mr.idMaterial = m.idMaterial AND mr.revisionDate = (
SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE m.obsolete = 0
Here is an explanation of what INNER JOIN, LEFT JOIN and RIGHT JOIN are. (You will love them if you often cross tables in your queries)
As m.obsolete will always be true, I ommited it in the SELECT clause

You should use the left outer join instead of using the cross product.
You're query should be something like this:
SELECT idMaterial, nextRevisionableDate, obsolete,
revisionDate AS lastRevisionDate
FROM Material
LEFT OUTER JOIN MaterialRevision AS mr On
Material.idMaterial = MaterialRevision.id
AND mr.revisionDate = (SELECT MAX(ch.revisionDate) from MaterialRevision ch
WHERE mr.idMaterial = ch.idMaterial)
WHERE obsolete = 0;
Here you can find some documentation about types of join.

Understanding an SQL Alias

I'm new to SQL.
I've read the book called 'Sams Teach Yourself Oracle PL/SQL in 10 minutes'.
I found it very interesting and easy to understand. There was some information on aliases but when I started doing exercises I came across an alias I don't know the purpose of.
Here is the cite http://www.sql-ex.ru/ and here is the database schema http://www.sql-ex.ru/help/select13.php#db_1, just in case. I'm working with the computer firm database i.e database number 1. The task is:
To find the makers producing PCs but not laptops.
Here is one of the solutions:
SELECT DISTINCT maker
FROM Product AS pcproduct
WHERE type = 'PC'
AND NOT EXISTS (SELECT maker
FROM Product
WHERE type = 'laptop'
AND maker = pcproduct.maker
);
The question is: Why do we need to alias product as pc_product and make the comparison 'maker = pc_product.maker' in the subquery?

Because in the inner query there are columns, which are named exactly the same as those in outer query (due to the fact that you use the same table there).
Since outer query columns are available in the inner query, there must be a distinction, which column you want, without alias, you'd write in the inner query maker = maker, which would be always true.

You are accessing the table twice, once in the main query, one in the sub query.
In the main query you say: Look at each record. Dismiss it, if the type doesn't equal 'PC'. Dismiss it, if you find a record in the table for the same maker with type 'laptop'.
In order to ask for the same maker, you must compare the maker of the main query's record with the records of the subquery. Both stem from the same table, so where product.maker = product.maker would be ambiguous. (Or rather the DBMS would assume you are talking about the subquery record, because the expression is inside the subquery. where product.maker = product.maker would hence be true, and you'd end up checking only whether there is at least one laptop in the table, regardless of the maker.)
So when dealing with the same table twice in a query, give at least on of them an alias in order to tell one record from the other.
Anyway for the given query I'd also qualify the other column in the expression for readability:
AND product.maker = pcproduct.maker
or even
FROM Product laptopproduct
WHERE type = 'laptop'
AND laptopproduct.maker = pcproduct.maker
On a sidenote: The query looks for makers that produce PCs, but no laptops. I'd prefer asking for this with aggregation:
select maker
from product
group by maker
having sum(type = 'PC') > 0
and sum(type = 'laptop') = 0;

That query can be understood as :
Gimme the maker's that have products of the 'PC' type, but where a
product of the 'laptop' type doesn't exist for that maker.
Including the tablename or alias name is sometimes needed when the same column names are used in more than 1 table.
So that the optimizer will know from which table the column is used.
It's not some smart AI that could guess that a criteria as
WHERE x = x
would actually mean
WHERE table1.x = table2.x
But more often, shorter alias names are used.
To increase readability and make the SQL more concise.
For example. The following two queries are equivalent.
Without aliases:
SELECT myawesometableone.id, mysecondevenmoreawesometable.id,
mysecondevenmoreawesometable.col1
FROM myawesometableone
JOIN mysecondevenmoreawesometable on mysecondevenmoreawesometable.one_id = myawesometableone.id
With aliases:
SELECT t1.id, t2.id, t2.col1
FROM myawesometableone AS t1
JOIN mysecondevenmoreawesometable AS t2 on t2.one_id = t1.id
Which SQL do you think looks better?
As for why that maker = pc_product.maker is used inside the EXISTS?
That's how the syntax for EXISTS works.
You establish a link between the query in the EXISTS and the outer query.
And in that case, that link is the "maker" column.

This doesn't detract from the other (correct) answers, but an easier to follow example might be:
SELECT DISTINCT pcproduct.maker
FROM Product AS pcproduct
WHERE pcproduct.type = 'PC'
AND NOT EXISTS (SELECT internalproduct.maker
FROM Product AS internalproduct
WHERE internalproduct.type = 'laptop'
AND internalproduct.maker = pcproduct.maker
);

Need to find all client id's the most efficient/fastest way using mySql

I have a bridging table that looks like this
clients_user_groups
id = int
client_id = int
group_id = int
I need to find all client_id's of of clients that belong to the same group as client_id 46l
I can achieve it doing a query as below which produces the correct results
SELECT client_id FROM clients_user_groups WHERE group_id = (SELECT group_id FROM clients_user_groups WHERE client_id = 46);
Basically what I need to find out is if there's a way achieving the same results without using 2 queries or a faster way, or is the method above the best solution

You're using a WHERE-clause subquery which, in MySQL, ends up being reevaluated for every single row in your table. Use a JOIN instead:
SELECT a.client_id
FROM clients_user_groups a
JOIN clients_user_groups b ON b.client_id = 46
AND a.group_id = b.group_id
Since you plan on facilitating clients having more than one group in the future, you might want to add DISTINCT to the SELECT so that multiple of the same client_ids aren't returned when you do switch (as a result of the client being in more than one of client_id 46's groups).
If you haven't done so already, create the following composite index on:
(client_id, group_id)
With client_id at the first position in the index since it most likely offers the best initial selectivity. Also, if you've got a substantial amount of rows in your table, ensure that the index is being utilized with EXPLAIN.

you can try with a self join also
SELECT a.client_id
FROM clients_user_groups a
LEFT JOIN clients_user_groups b on b.client_id=46
Where b.group_id=a.group_id

set #groupID = (SELECT group_id FROM clients_user_groups WHERE client_id = 46);
SELECT client_id FROM clients_user_groups WHERE group_id = #groupID;
You will have a query which gets the group ID and you store it into a variable. After this you select the client_id values where the group_id matches the value stored in your variable. You can speed up this query even more if you define an index for clients_user_groups.group_id.
Note1: I didn't test my code, hopefully there are no typos, but you've got the idea I think.
Note2: This should be done in a single request, because DB requests are very expensive if we look at the needed time.

Based on your comment that each client can only belong to one group, I would suggest a schema change to place the group_id relation into the client table as a field. Typically, one would use the sort of JOIN table you have described to express many-to-many relationships within a relational database (i.e. clients could belong to many groups and groups could have many clients).
In such a scenario, the query would be made without the need for a sub-select like this:
SELECT c.client_id
FROM clients as c
INNER JOIN clients as c2 ON c.group_id = c2.group_id
WHERE c2.client_id = ?

Explain SQL and Query optimization

Explain SQL (in phpmyadmin) of a query that is taking more than 5 seconds is giving me the above. I read that we can study the Explain SQL to optimize a query. Can anyone tell if this Explain SQL telling anything as such?
Thanks guys.
Edit:
The query itself:
SELECT
a.`depart` , a.user,
m.civ, m.prenom, m.nom,
CAST( GROUP_CONCAT( DISTINCT concat( c.id, '~', c.prenom, ' ', c.nom ) ) AS char ) AS coordinateur,
z.dr
FROM `0_activite` AS a
JOIN `0_member` AS m ON a.user = m.id
LEFT JOIN `0_depart` AS d ON ( m.depart = d.depart AND d.rank = 'mod' AND d.user_sec =2 )
LEFT JOIN `0_member` AS c ON d.user_id = c.id
LEFT JOIN `zone_base` AS z ON m.depart = z.deprt_num
GROUP BY a.user
Edit 2:
Structures of the two tables a and d. Top: a and bottom: d
Edit 3:
What I want in this query?
I first want to get the value of 'depart' and 'user' (which is an id) from the table 0_activite. Next, I want to get name of the person (civ, prenom and name) from 0_member whose id I am getting from 0_activite via 'user', by matching 0_activite.user with 0_member.id. Here depart is short of department which is also an id.
So at this point, I have depart, id, civ, nom and prenom of a person from two tables, 0_activite and 0_member.
Next, I want to know which dr is related with this depart, and this I get from zone_base. The value of depart is same in both 0_activite and 0_member.
Then comes the trickier part. A person from 0_member can be associated with multiple departs and this is stored in 0_depart. Also, every user has a level, one of what is 'mod', stands for moderator. Now I want to get all the people who are moderators in the depart from where the first user is, and then get those moderaor's name from 0_member again. I also have a variable user_sec, but this is probably less important in this context, though I cannot overlook it.
This is what makes the query a tricky one. 0_member is storing id, name of users, + one depart, 0_depart is storing all departs of users, one line for each depart, and 0_activite is storing some other stuffs and I want to relate those through userid of 0_activite and the rest.
Hope I have been clear. If I am not, please let me know and I will try again to edit this post.
Many many thanks again.

Aside from the few answers provided by the others here, it might help to better understand the "what do I want" from the query. As you've accepted a rather recent answer from me in another of your questions, you have filters applied by department information.
Your query is doing a LEFT join at the Department table by rank = 'mod' and user_sec = 2. Is your overall intent to show ALL records in the 0_activite table REGARDLESS of a valid join to the 0_Depart table... and if there IS a match to the 0_Depart table, you only care about the 'mod' and 2 values?
If you only care about those people specifically associated with the 0_depart with 'mod' and 2 conditions, I would reverse the query starting with THIS table first, then join to the rest.
Having keys on tables via relationship or criteria is always a performance benefit (vs not having the indexes).
Start your query with whatever would be your smallest set FIRST, then join to other tables.
From clarification in your question... I would start with the inner-most... Who it is and what departments are they associated with... THEN get the moderators (from department where condition)... Then get actual moderator's name info... and finally out to your zone_base for the dr based on the department of the MODERATOR...
select STRAIGHT_JOIN
DeptPerMember.*
Moderator.Civ as ModCiv,
Moderator.Prenom as ModPrenom,
Moderator.Nom as ModNom,
z.dr
from
( select
m.ID,
m.Depart,
m.Civ,
m.Prenom,
m.Nom
from
0_Activite as a
join 0_member m
on a.User = m.ID
join 0_Depart as d
on m.depart = d.depart ) DeptPerMember
join 0_Depart as DeptForMod
on DeptPerMember.Depart = DeptForMod.Depart
and DeptForMod.rank = 'mod'
and DeptForMod.user_sec = 2
join 0_Member as Moderator
on DeptForMod.user_id = Moderator.ID
join zone_base z
on Moderator.depart = z.deprt_num
Notice how I tier'd the query to get each part and joined to the next and next and next. I'm building the chain based on the results of the previous with clear "alias" references for clarification of content. Now, you can get whatever respective elements from any of the levels via their distinct "alias" references...

The output from EXPLAIN is showing us that the first and third tables listed (a & d) are not having any indexes utilised by the database engine in executing this query. The key column is NULL for both - which is a shame since both are 'large' tables (OK, they're not really large, but compared to the rest of the tables they're the big 'uns).
Judging from the query, an index on user on 0_activite and an index on (depart, rank, user_sec) on 0_depart would go some way to improving performance.

you can see that columns key and key_len are null this means its not using any key in the possible_keys column. So table a and d are both scanning all rows. (check larger numbers in rows column. you want this smaller).
To deal with 0_depart:
Make sure you have a key on (d.depart, d.rank,d.user_sec) which are part of the join of 0_depart.
To deal with 0_activite:
I'm not positive but a GROUP column should be indexed too so you need a key on a.user

Multiple data stored into a field for better management?

I have a MySQL table as follow:
id, user, culprit, reason, status, ts_register, ts_update
I was thinking of using the reason as an int field and store just the id of the reason that could be selected by the user and the reason itself could be increased by the admin.
What I meant by increased is that the admin could register new reason, for example currently we have:
Flood, Racism, Hacks, Other
But the admin could add a new reason for instance:
Refund
Now my problem is that I would like to allow my users to select multiple reasons, for example:
The report 01 have the reasons Flood and Hack.
How should I store the reason field so that I could select multiple reasons while maintaining a good table format?
Should I just go ahead and store it as a string and cast it as an INT when I am searching thru it or there are better forms to store it?
UPDATE Based on Jonathan's reply:
SELECT mt.*, group_concat(r.reason separator ', ') AS reason
FROM MainTable AS mt
JOIN MainReason AS mr ON mt.id = mr.maintable_ID
JOIN Reasons AS r ON mr.reason = r.reason_id
GROUP BY mt.id

The normalized solution is to have a second table containing one row for each reason:
CREATE TABLE MainReasons
(
MainTable_ID INTEGER NOT NULL REFERENCES MainTable(ID),
Reason INTEGER NOT NULL REFERENCES Reasons(ID),
PRIMARY KEY(MainTable_ID, Reason)
);
(Assuming your main table is called MainTable and you have a table defining valid reason codes called Reasons.)
From a comment:
[W]ould you be [so] kind [as] to show me an example of selecting something to retrieve a report's reason? I mean if I simple select it SELECT * FROM MainTABLE I would never get any reasons since MainTable doesnt know it right? Because it is only linked to the MainReasons and Reasons table so I would need to do something like SELECT * FROM MainTable LEFT JOIN MainReasons USING (MainTable_ID) or something alike but how would I go about getting all the reasons if multiples?
SELECT mt.*, r.reason
FROM MainTable AS mt
JOIN MainReason AS mr ON mt.id = mr.maintable_ID
JOIN Reasons AS r ON mr.reason = r.reason_id
This will return one row per reason - so it would return multiple rows for a single report (recorded in what I called MainTable). I omitted the reason ID number from the results - you can include it if you wish.
You can add criteria to the query, adding terms to a WHERE clause. If you want to see the reports where a specific reason is specified:
SELECT mt.*
FROM MainTable AS mt
JOIN MainReason AS mr ON mt.id = mr.maintable_ID
JOIN Reasons AS r ON mr.reason = r.reason_id
WHERE r.reason = 'Flood'
(You don't need the reason in the results - you know what it is.)
If you want to see the reports where 'Floods' and 'Hacks' were the reasons given, then you can write:
SELECT mt.*
FROM MainTable AS mt
JOIN (SELECT f.MainTable_ID
FROM (SELECT MainTable_ID
FROM MainReason AS mr1
JOIN Reasons AS r1 ON mr1.reason = r1.reason_ID
WHERE r1.reason = 'Floods'
) AS f ON f.MainTable_ID = mt.MainTable_ID
JOIN (SELECT f.MainTable_ID
FROM (SELECT MainTable_ID
FROM MainReason AS mr2
JOIN Reasons AS r2 ON mr2.reason = r2.reason_ID
WHERE r1.reason = 'Hacks'
) AS h ON h.MainTable_ID = mt.MainTable_ID

To do a one-to-many relationship, I would spin reason off into it's own table, like so:
id, parent_id, reason
parent_id would refer back into your current table's id.
You could store it as an INT, but it would be a continual pain to have to parse it every time you wanted to read the data. This way would just take one more join.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008