MySQL Joins where one value can be optional - mysql

Situation
I have a database which heavily makes use of joins due to the various situations in which each entity is used. Here is a simplified diagram:
Goal
I would like to be able to get details of all modules and the "name" fields regardless of whether the "fk_chapter_id" within user_has_module is set or not.
In the case where "user_has_module.fk_chapter_id" is null, the system can return details of the module and then null chapter.
In the case where there is a user_has_module, I would like to get the status
Issue
Whenever I perform SQL statements, I get the results only partially returned. I.E. If I have 4 module records in total, two of which where the user has an entry in "user_has_module" returns the two records in full and then 2 null records for the other modules.
Update based on feedback, almost there
Now, the only problem is I get duplicates. Using some test data
SELECT DISTINCT
chapter_id,
chapter_name,
module_id,
module_name,
(null ) AS user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
module_has_chapter as mhc ON m.module_id = mhc.fk_module_id
LEFT JOIN
chapter as c ON mhc.fk_chapter_id = c.chapter_id
group by m.module_id
UNION
SELECT DISTINCT
chapter_id,
chapter_name,
module_id,
module_name,
user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
user_has_module as uhm ON m.module_id = uhm.fk_module_id
LEFT JOIN
user as u ON uhm.fk_user_id = u.user_id
LEFT JOIN
chapter as c ON uhm.fk_latest_chapter_id = c.chapter_id
WHERE u.user_id = 2
group by m.module_id;

I got there in the end but, not particularly happy about it. This works but, it's a bloody mess...Does anyone have a better solution please?
SELECT DISTINCT
(null) AS chapter_id,
(null) AS chapter_name,
module_id,
module_name,
(null ) AS user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
user_has_module as uhm ON m.module_id = uhm.fk_module_id
WHERE
uhm.fk_user_id IS NULL
UNION ALL
SELECT DISTINCT
chapter_id,
chapter_name,
module_id,
module_name,
user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
user_has_module as uhm ON m.module_id = uhm.fk_module_id
INNER JOIN
user as u ON uhm.fk_user_id = u.user_id
INNER JOIN
chapter as c ON uhm.fk_latest_chapter_id = c.chapter_id
WHERE
u.user_id = 2;

Related

Extracting latest version of articles from the results of a previous query

I have the following query:
SELECT e_c.*, c.name, j.status, j.version, j.articleId, j.title FROM assetcategory AS c
INNER JOIN assetentries_assetcategories AS e_c
ON c.categoryId = e_c.categoryId AND c.name = 'news'
INNER JOIN assetentry AS e
ON e.entryId = e_c.entryId
INNER JOIN journalarticle AS j
ON j.resourcePrimKey = e.classPK
AND e.classNameId = (SELECT classNameId FROM classname_ WHERE value = 'com.liferay.portlet.journal.model.JournalArticle')
AND j.companyId= e.companyId
WHERE j.status = 0
which returns all the category news in the journalarticles. From the results I need to select the most recent versions for each articleId. For example suppose there is an article with 4 versions, even with different title, it is the same article because it will have the same articleId. So therefore for each unique articleId I need the latest version. How can I do that?
Add a join to a subquery which finds the most recent version for each article:
SELECT e_c.*, c.name, j1.status, j1.version, j1.articleId, j1.title
FROM assetcategory AS c
INNER JOIN assetentries_assetcategories AS e_c
ON c.categoryId = e_c.categoryId AND c.name = 'news'
INNER JOIN assetentry AS e
ON e.entryId = e_c.entryId
INNER JOIN journalarticle AS j1
ON j1.resourcePrimKey = e.classPK AND
e.classNameId = (SELECT classNameId FROM classname_
WHERE value = 'com.liferay.portlet.journal.model.JournalArticle') AND
j.companyId = e.companyId
INNER JOIN
(
SELECT articleId, MAX(version) AS max_version
FROM journalarticle
WHERE status = 0
GROUP BY articleId
) j2
ON j1.articleId = j2.articleId AND j1.version = j2.max_version;
The basic idea behind the join to the subquery aliased as j2 above is that it restricts the result set to only the most recent version of each article. We don't necessarily have to change the rest of the query.

Retrieve list of ids by comparing total -vs- specified status

I have a list of clients. Each client can have several activities (0..*). Each activity contains a status `is_completed` which is a Boolean (True/False).
I need to retrieve the list of clients that have all activities completed:
if a client has all its activities completed, I keep him.
if a client has not all its activities completes, I ignore him.
I wrote an SQL query that does the job but I am not convinced that it is optimized:
SELECT DISTINCT cc.client_id
FROM clients_clientactivity AS cc
LEFT JOIN clients_client AS c ON (c.id = cc.client_id)
WHERE c.client_type_id = 2
AND (
SELECT COUNT(cc1.id) FROM clients_clientactivity AS cc1 WHERE cc1.client_id = cc.client_id
) = (
SELECT COUNT(cc2.id) FROM clients_clientactivity AS cc2 WHERE cc2.is_completed = True AND cc2.client_id = cc.client_id
);
How can I improve it ?
Thank you for your help.
You could use a not in select for the not true
SELECT DISTINCT cc.client_id
FROM clients_clientactivity AS cc
LEFT JOIN clients_client AS c ON (c.id = cc.client_id)
WHERE c.client_type_id = 2
AND cc.client_id NOT IN (
SELECT cc2.client_id
FROM clients_clientactivity AS cc2
WHERE cc2.is_completed != True
)
I would use aggregation and having:
SELECT c.id
FROM clients_clientactivity ca JOIN
clients_client c
ON c.id = ca.client_id
WHERE c.client_type_id = 2
GROUP BY c.id
HAVING COUNT(*) = SUM(ca.iscompleted)
Your WHERE clause converts the LEFT JOIN to an INNER JOIN, so I removed the LEFT JOIN.
Let's simplify even further:
SELECT client_id
FROM clients_clientactivity
WHERE MIN(is_completed) = TRUE
GROUP BY client_id
(TRUE==1, FALSE==0)
Subqueries are often slow. NOT IN ( SELECT ... ) is really bad (unless the optimizer has magically gotten smarter).
You did not explain how client_type_id = 2, but maybe something like:
clients_client
SELECT a.client_id
FROM clients_client AS c
JOIN clients_clientactivity AS a ON (c.id = a.client_id)
WHERE MIN(a.is_completed) = TRUE
AND c.client_type_id = 2
GROUP BY a.client_id
If performance is a problem, then:
c needs INDEX(client_type_id, id)
a needs INDEX(client_id, is_completed)

SQL JOIN Query - linking four tables

I have the SQL to display ALL the activities and relative Admin permissions (if any) for that activity.
Current SQL Code:
SELECT `activities`.*, `admins`.`admin_role_id`
FROM (`activities`)
LEFT JOIN `admins` ON `admins`.`activity_id`=`activities`.`id` AND admins.member_id=27500
WHERE `activities`.`active` = 1
Returning:
id | name | description | active | admin_role_id (or null)
I then need to detect whether they are an active member within that Activity.
I have the following SQL code:
SELECT DISTINCT `products`.`activity_ID` as joinedID
FROM (`transactions_items`)
JOIN `transactions` ON `transactions`.`id` = `transactions_items`.`id`
JOIN `products` ON `products`.`id` = `transactions_items`.`product_id`
JOIN `activities` ON `activities`.`id` = `products`.`activity_ID`
WHERE `transactions`.`member_id` = 27500
AND `activities`.`active` = 1
Is there any way to merge this into one SQL query. I can't figure out how to use the correct JOIN queries, because of the complexity of the JOINs.
Help please, thanks! :)
Try like this
SELECT `activities`.*, `admins`.`admin_role_id`
FROM (`activities`)
LEFT JOIN `admins` ON `admins`.`activity_id`=`activities`.`id` AND admins.member_id=27500
JOIN (`transactions_items`
JOIN `transactions` ON `transactions`.`id` = `transactions_items`.`id`
JOIN `products` ON `products`.`id` = `transactions_items`.`product_id`)
ON `activities`.`id`=`products`.`activity_ID`
WHERE `transactions`.`member_id` = 27500
AND `activities`.`active` = 1
Seems to me that a query like this would be marginally more comprehensible and (I think) adhere more closely to the spec...
SELECT c.*
, d.admin_role_id
FROM activities c
LEFT
JOIN admins d
ON d.activity_id = c.id
AND d.member_id = 27500
LEFT
JOIN products p
ON p.activity_ID = c.id
LEFT
JOIN transactions_items ti
ON ti.product_id = p.id
LEFT
JOIN transactions t
ON t.id = ti.id
AND t.member_id = 27500
WHERE c.active = 1

MySQL / PHP - 2 different arguments for 1 table

I have the following SQL:
$queryString = "
SELECT
iR.lastModified,
d.*,
c2.title as stakeholderTitle,
u.username as authorUsername,
c.title as authorContactName,
GROUP_CONCAT(iR.stakeholderRef) AS participants
FROM
informationRelationships iR,
contacts c2
INNER JOIN
debriefs d ON
d.id = iR.linkId
LEFT JOIN
users u ON
u.id = iR.author
LEFT JOIN
contacts c ON
c.ref = u.contactId
LEFT JOIN
debriefs d2 ON
d2.stakeholder = c2.ref
WHERE
(
iR.clientRef = '$clientRef' OR
iR.contactRef = '$contactRef'
)
AND
iR.projectRef = '$projectRef' AND
iR.type = 'Debrief'
GROUP BY
iR.linkId
ORDER BY
d.dateOfEngagement
";
notice how I require 2 different bits of data for the the contacts table.
So at one point, I need to match
c.ref = u.contactId
This will return one bit of information
but I also need a completely different grouping:
d2.stakeholder = c2.ref
Problem is that the title is the column i'm interested in for both:
c2.title as stakeholderTitle,
...
c.title as authorContactName
How do I go about doing this?
My current try is returning:
Error: Unknown column 'iR.linkId' in 'on clause'
I'm not sure I really understand what is happening here:
how to join two tables on common attributes in mysql and php?
EDIT::::---ANSWERED--zerkms
$queryString = "
SELECT
iR.lastModified,
d.*,
c2.title as stakeholderTitle,
u.username as authorUsername,
c.title as authorContactName,
GROUP_CONCAT(iR.stakeholderRef) AS participants
FROM
informationRelationships iR
INNER JOIN
debriefs d ON
d.id = iR.linkId
INNER JOIN
contacts c2 ON
d.stakeholder = c2.ref
LEFT JOIN
users u ON
u.id = iR.author
LEFT JOIN
contacts c ON
c.ref = u.contactId
WHERE
(
iR.clientRef = '$clientRef' OR
iR.contactRef = '$contactRef'
)
AND
iR.projectRef = '$projectRef' AND
iR.type = 'Debrief'
GROUP BY
iR.linkId
ORDER BY
d.dateOfEngagement
";
By re-ordering my query I have managed to get both columns in... Thanks zerkms!
You cannot mix implicit joins and explicit joins in a single query in mysql.
So
FROM informationRelationships iR,
contacts c2
should be rewritten to
FROM informationRelationships iR
INNER JOIN contacts c2 ON ...
Do not use cartesian product and joins in the same query (not subquery), here, use only joins (CROSS JOIN is the same as cartesian product).

SQL - Multiple many-to-many relations filtering SELECT

These are my tables:
Cadastros (id, nome)
Convenios (id, nome)
Especialidades (id, nome)
Facilidades (id, nome)
And the join tables:
cadastros_convenios
cadastros_especialidades
cadastros_facilidades
The table I'm querying for: Cadastros
I'm using MySQL.
The system will allow the user to select multiple "Convenios", "Especialidades" and "Facilidades". Think of each of these tables as a different type of "tag". The user will be able to select multiple "tags" of each type.
What I want is to select only the results in Cadastros table that are related with ALL the "tags" from the 3 different tables provided. Please note it's not an "OR" relation. It should only return the row from Cadastros if it has a matching link table row for EVERY "tag" provided.
Here is what I have so far:
SELECT Cadastro.*, Convenio.* FROM Cadastros AS Cadastro
INNER JOIN cadastros_convenios AS CadastrosConvenio ON(Cadastro.id = CadastrosConvenio.cadastro_id)
INNER JOIN Convenios AS Convenio ON (CadastrosConvenio.convenio_id = Convenio.id AND Convenio.id IN(2,3))
INNER JOIN cadastros_especialidades AS CadastrosEspecialidade ON (Cadastro.id = CadastrosEspecialidade.cadastro_id)
INNER JOIN Especialidades AS Especialidade ON(CadastrosEspecialidade.especialidade_id = Especialidade.id AND Especialidade.id IN(1))
INNER JOIN cadastros_facilidades AS CadastrosFacilidade ON (Cadastro.id = CadastrosFacilidade.cadastro_id)
INNER JOIN Facilidades AS Facilidade ON(CadastrosFacilidade.facilidade_id = Facilidade.id AND Facilidade.id IN(1,2))
GROUP BY Cadastro.id
HAVING COUNT(*) = 5;
I'm using the HAVING clause to try to filter the results based on the number of times it shows (meaning the number of times it has been successfully "INNER JOINED"). So in every case, the count should be equal to the number of different filters I added. So if I add 3 different "tags", the count should be 3. If I add 5 different tags, the count should be 5 and so on. It works fine for a single relation (a single pair of inner joins). When I add the other 2 relations it starts to lose control.
EDIT
Here is something that I believe is working (thanks #Tomalak for pointing out the solution with sub-queries):
SELECT Cadastro.*, Convenio.*, Especialidade.*, Facilidade.* FROM Cadastros AS Cadastro
INNER JOIN cadastros_convenios AS CadastrosConvenio ON(Cadastro.id = CadastrosConvenio.cadastro_id)
INNER JOIN Convenios AS Convenio ON (CadastrosConvenio.convenio_id = Convenio.id)
INNER JOIN cadastros_especialidades AS CadastrosEspecialidade ON (Cadastro.id = CadastrosEspecialidade.cadastro_id)
INNER JOIN Especialidades AS Especialidade ON(CadastrosEspecialidade.especialidade_id = Especialidade.id)
INNER JOIN cadastros_facilidades AS CadastrosFacilidade ON (Cadastro.id = CadastrosFacilidade.cadastro_id)
INNER JOIN Facilidades AS Facilidade ON(CadastrosFacilidade.facilidade_id = Facilidade.id)
WHERE
(SELECT COUNT(*) FROM cadastros_convenios WHERE cadastro_id = Cadastro.id AND convenio_id IN(1, 2, 3)) = 3
AND
(SELECT COUNT(*) FROM cadastros_especialidades WHERE cadastro_id = Cadastro.id AND especialidade_id IN(3)) = 1
AND
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = Cadastro.id AND facilidade_id IN(2, 3)) = 2
GROUP BY Cadastro.id
But I'm concerned about performance. It looks like these 3 sub-queries in the WHERE clause are gonna be over-executed...
Another solution
It joins subsequent tables only if the previous joins were a success (if no rows match one of the joins, the next joins are gonna be joining an empty result-set) (thanks #DRapp for this one)
SELECT STRAIGHT_JOIN
Cadastro.*
FROM
( SELECT Qualify1.cadastro_id
from
( SELECT cc1.cadastro_id
FROM cadastros_convenios cc1
WHERE cc1.convenio_id IN (1, 2, 3)
GROUP by cc1.cadastro_id
having COUNT(*) = 3 ) Qualify1
JOIN
( SELECT ce1.cadastro_id
FROM cadastros_especialidades ce1
WHERE ce1.especialidade_id IN( 3 )
GROUP by ce1.cadastro_id
having COUNT(*) = 1 ) Qualify2
ON (Qualify1.cadastro_id = Qualify2.cadastro_id)
JOIN
( SELECT cf1.cadastro_id
FROM cadastros_facilidades cf1
WHERE cf1.facilidade_id IN (2, 3)
GROUP BY cf1.cadastro_id
having COUNT(*) = 2 ) Qualify3
ON (Qualify2.cadastro_id = Qualify3.cadastro_id) ) FullSet
JOIN Cadastros AS Cadastro
ON FullSet.cadastro_id = Cadastro.id
INNER JOIN cadastros_convenios AS CC
ON (Cadastro.id = CC.cadastro_id)
INNER JOIN Convenios AS Convenio
ON (CC.convenio_id = Convenio.id)
INNER JOIN cadastros_especialidades AS CE
ON (Cadastro.id = CE.cadastro_id)
INNER JOIN Especialidades AS Especialidade
ON (CE.especialidade_id = Especialidade.id)
INNER JOIN cadastros_facilidades AS CF
ON (Cadastro.id = CF.cadastro_id)
INNER JOIN Facilidades AS Facilidade
ON (CF.facilidade_id = Facilidade.id)
GROUP BY Cadastro.id
Emphasis mine
"It should only return the row from Cadastros if it has a matching row for EVERY "tag" provided."
"where there is a matching row"-problems are easily solved with EXISTS.
EDIT After some clarification, I see that using EXISTS is not enough. Comparing the actual row counts is necessary:
SELECT
*
FROM
Cadastros c
WHERE
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = c.id AND id IN (2,3)) = 2
AND
(SELECT COUNT(*) FROM cadastros_especialidades WHERE cadastro_id = c.id AND id IN (1)) = 1
AND
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = c.id AND id IN (1,2)) = 2
The indexes on the link tables should be (cadastro_id, id) for this query.
Depending on the size of the tables (records), WHERE-based subqueries, running a test on every row CAN SIGNIFICANTLY hit performance. I have restructured it which MIGHT better help, but only you would be able to confirm. The premise here is to have the first table based on getting distinct IDs that meet the criteria, join THAT set to the next qualifier criteria... joined to the FINAL set. Once that has been determined, use THAT to join to your main table and its subsequent links to get the details you are expecting. You also had an overall group by by the ID which will eliminate all other nested entries as found in the support details table.
All that said, lets take a look at this scenario. Start with the table that would be EXPECTED TO HAVE THE LOWEST RESULT SET to join to the next and next. if cadastros_convenios has IDs that match all the criteria include IDs 1-100, great, we know at MOST, we'll have 100 ids.
Now, these 100 entries are immediately JOINED to the 2nd qualifying criteria... of which, say it only matches ever other... for simplicity, we are now matched on 50 of the 100.
Finally, JOIN to the 3rd qualifier based on the 50 that qualified and you get 30 entries. So, within these 3 queries you are now filtered down to 30 entries with all the qualifying criteria handled up front. NOW, join to the Cadastros and then subsequent tables for the details based ONLY on the 30 that qualified.
Since your original query would eventually TRY EVERY "ID" for the criteria, why not pre-qualify it up front with ONE query and get just those that hit, then move on.
SELECT STRAIGHT_JOIN
Cadastro.*,
Convenio.*,
Especialidade.*,
Facilidade.*
FROM
( SELECT Qualify1.cadastro_id
from
( SELECT cc1.cadastro_id
FROM cadastros_convenios cc1
WHERE cc1.convenio_id IN (1, 2, 3)
GROUP by cc1.cadastro_id
having COUNT(*) = 3 ) Qualify1
JOIN
( SELECT ce1.cadastro_id
FROM cadastros_especialidades ce1
WHERE ce1.especialidade_id IN( 3 )
GROUP by ce1.cadastro_id
having COUNT(*) = 1 ) Qualify2
ON Qualify1.cadastro_id = Qualify2.cadastro_id
JOIN
( SELECT cf1.cadastro_id
FROM cadastros_facilidades cf1
WHERE cf1.facilidade_id IN (2, 3)
GROUP BY cf1.cadastro_id
having COUNT(*) = 2 ) Qualify3
ON Qualify2.cadastro_id = Qualify3.cadastro_id ) FullSet
JOIN Cadastros AS Cadastro
ON FullSet.Cadastro_id = Cadastro.Cadastro_id
INNER JOIN cadastros_convenios AS CC
ON Cadastro.id = CC.cadastro_id
INNER JOIN Convenios AS C
ON CC.convenio_id = C.id
INNER JOIN cadastros_especialidades AS CE
ON Cadastro.id = CE.cadastro_id
INNER JOIN Especialidades AS E
ON CE.especialidade_id = E.id
INNER JOIN cadastros_facilidades AS CF
ON Cadastro.id = CF.cadastro_id
INNER JOIN Facilidades AS F
ON CF.facilidade_id = F.id