What is the correct way to join two complex MySQL SELECT statements?

I am able to pull out two lists from my tables. One shows all the units for each student in each cohort. Another shows if all the parts of each unit have been submitted, for a particular student's work in a specific cohort. I want to join the lists, so that I can see who has submitted (or not) each part of each unit, for each student in each cohort.
cohort_units:
cohort_id unit part
235 ABC A
235 ABC B
246 DEF A
246 DEF B
246 DEF C
cohort_students:
user_id cohort_id
21 235
24 235
43 235
53 246
assignments:
user_id cohort_id unit draft1recdt
21 235 ABCA 2023-01-03
21 235 ABCB NULL
24 235 ABCA 2023-02-01
24 235 ABCB 2023-02-02
This pulls a list of units with the user id and cohort id.
SELECT cohort_students.user_id,
cohort_units.unit,
cohort_units.cohort_id
FROM cohort_units
LEFT JOIN cohort_students
ON cohort_units.cohort_id = cohort_students.cohort_id
GROUP BY cohort_units.cohort_id, cohort_units.unit, cohort_students.user_id
ORDER BY cohort_students.user_id;
result:
user_id unit cohort_id
21 ABC 235
24 ABC 235
43 ABC 235
53 DEF 246
This returns a row if a unit has more parts than have been submitted, for a given cohort id, user id and unit name (i.e. for one unit that one student in one cohort should have completed).
SELECT GROUP_CONCAT(CASE WHEN draft1recdt IS NOT NULL THEN draft1recdt END) AS drafts,
(LENGTH(GROUP_CONCAT(DISTINCT draft1recdt))-LENGTH(REPLACE(GROUP_CONCAT(DISTINCT draft1recdt), ',', '')))+1 as numDrafts,
cohort_units.unit,
GROUP_CONCAT(cohort_units.part) as parts,
(LENGTH(GROUP_CONCAT(DISTINCT cohort_units.part))-LENGTH(REPLACE(GROUP_CONCAT(DISTINCT cohort_units.part), ',', '')))+1 as numParts
FROM assignments
LEFT JOIN cohort_units
ON assignments.cohort_id = cohort_units.cohort_id
AND assignments.unit = CONCAT(cohort_units.unit,cohort_units.part)
WHERE assignments.cohort_id = 235
AND cohort_units.unit = 'ABC' AND assignments.user_id = 21
GROUP BY cohort_units.unit
HAVING numParts > numDrafts;
How do I make the second select statement part of the first, using the three columns on the first select statement as the joining information?
I want to run the second query on every result from the first query. Using the data above, I would expect to pull out user id 21, as they have only submitted one part of a two-part unit.
user_id unit cohort_id parts numParts numDrafts
21 ABC 235 A,B 2 1
Is this a JOIN? Or a SUBQUERY?

(For what it's worth, I believe cohort is an accepted term in various social science disciplines.)
Your problem would be made easier if your assignments table had a part column matching the part column in your cohort_units table. So let's start with a subquery to generate a virtual table with that column present.
SELECT assignments.user_id, assignments.cohort_id,
cohort_units.unit, cohort_units.part,
assignments.draft1recdt
FROM assignments
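JOIN cohort_units
ON assignments.cohort_id = cohort_units.cohort_id
AND assignments.unit = CONCAT(cohort_units.unit, cohort_units.part)
WHERE assignments.draft1recdt IS NOT NULL -- only count parts that were actually submitted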
We'll use this subquery in place of assignments moving forward. This is a bit of a kludge, but it will make later work cleaner.
Next we need the number of parts in each unit. That's a simple aggregate:
SELECT COUNT(*) num_parts,
cohort_id,
unit
FROM cohort_units
GROUP BY cohort_id, unit
We can organize our query using common table expressions, like so.
WITH completed AS (
SELECT assignments.user_id, assignments.cohort_id,
cohort_units.unit, cohort_units.part,
assignments.draft1recdt
FROM assignments
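JOIN cohort_units
ON assignments.cohort_id = cohort_units.cohort_id
AND assignments.unit = CONCAT(cohort_units.unit, cohort_units.part)
WHERE assignments.draft1recdt IS NOT NULL -- only count parts that were actually submitted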
),
partcount AS (
SELECT COUNT(*) num_parts,
cohort_id,
unit
FROM cohort_units
GROUP BY cohort_id, unit
)
SELECT completed.user_id, cohort_units.cohort_id, cohort_units.unit,
GROUP_CONCAT(completed.part) parts,
COUNT(*) completed_parts,
partcount.num_parts
FROM cohort_units
JOIN partcount
ON cohort_units.cohort_id = partcount.cohort_id
AND cohort_units.unit = partcount.unit
JOIN completed
ON completed.cohort_id = cohort_units.cohort_id
AND completed.unit = cohort_units.unit
AND completed.part = cohort_units.part
GROUP BY completed.user_id, cohort_units.cohort_id, cohort_units.unit, num_parts
HAVING COUNT(*) < partcount.num_parts
Here's a fiddle. https://dbfiddle.uk/FvGkiAnl
One of the tricks to this is the separate aggregate to get the part counts.
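A minimal sketch of the CTE approach, run against the question's sample data with SQLite (through Python's built-in sqlite3 module) standing in for MySQL. Assumptions: a NULL draft1recdt means the part has not been submitted, `||` replaces MySQL's CONCAT(), and the final join back to cohort_units is folded into the completed CTE, which already carries unit and part. The names SETUP, QUERY and incomplete_units are hypothetical.

```python
import sqlite3

# Hypothetical in-memory copy of the question's sample tables (SQLite, not MySQL).
SETUP = """
CREATE TABLE cohort_units (cohort_id INTEGER, unit TEXT, part TEXT);
INSERT INTO cohort_units VALUES
  (235, 'ABC', 'A'), (235, 'ABC', 'B'),
  (246, 'DEF', 'A'), (246, 'DEF', 'B'), (246, 'DEF', 'C');
CREATE TABLE assignments (user_id INTEGER, cohort_id INTEGER, unit TEXT, draft1recdt TEXT);
INSERT INTO assignments VALUES
  (21, 235, 'ABCA', '2023-01-03'), (21, 235, 'ABCB', NULL),
  (24, 235, 'ABCA', '2023-02-01'), (24, 235, 'ABCB', '2023-02-02');
"""

QUERY = """
WITH completed AS (              -- assignments with a separate part column
    SELECT a.user_id, a.cohort_id, cu.unit, cu.part
    FROM assignments a
    JOIN cohort_units cu
      ON a.cohort_id = cu.cohort_id
     AND a.unit = cu.unit || cu.part   -- SQLite spelling of CONCAT(unit, part)
    WHERE a.draft1recdt IS NOT NULL    -- assumption: NULL means "not submitted"
),
partcount AS (                   -- number of parts in each unit
    SELECT cohort_id, unit, COUNT(*) AS num_parts
    FROM cohort_units
    GROUP BY cohort_id, unit
)
SELECT c.user_id, c.cohort_id, c.unit,
       COUNT(*) AS completed_parts, p.num_parts
FROM completed c
JOIN partcount p ON p.cohort_id = c.cohort_id AND p.unit = c.unit
GROUP BY c.user_id, c.cohort_id, c.unit, p.num_parts
HAVING COUNT(*) < p.num_parts
ORDER BY c.user_id
"""


def incomplete_units():
    """Return (user_id, cohort_id, unit, completed_parts, num_parts) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(incomplete_units())  # [(21, 235, 'ABC', 1, 2)]
```

With the NULL filter in place only user 21 is reported, matching the expected output in the question; without it, user 21's NULL row for ABCB would be counted as a completed part.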

Related

Using a foreach/while/loop with sql query

I'm using the following query to create a view. It's currently only grabbing data from two different tables, subscriptions and subscriptionitems.
For each subscription, I want to grab the item data and output it in one column. At the moment the CONCAT function grabs only one row, although it outputs the data in the correct format.
The problem I have is that a subscription can have multiple items, so I need to grab each one and tie it to the correct subscription via the where statement.
How can I do that?
I've read about using UNION ALL; is that the right direction to go?
CREATE VIEW Sub_Products AS
(
SELECT
i.subscription_id as "Subscription ID",
concat('product_id:',i.product_id,'|quantity:',i.quantity,'|total:',(i.unit_price * i.quantity),'|meta:|tax:0;') as "Products"
FROM subscriptions s, subscriptionitems i, customerdata c
WHERE s.id = i.subscription_id
AND i.active = 1
);
So as an example of the output - any with the same subscription id should be combined and the products should be output in the same row.
So the subscription 217 should have in the products column "product_id:253|quantity:1|total:2.34|meta:|tax:0;product_id:252|quantity:1|total:2.43|meta:|tax:0;"
Sample data from the subscriptionitems table:
id subscription_id customer_id product_id quantity active unit_price
556 230 184 262 1 0 2.79
8100 230 184 262 1 1 2.79
555 230 184 260 1 0 2.52
This is my attempt:
CREATE VIEW Sub_Products AS
(
SELECT
i.subscription_id as "Subscription ID",
GROUP_CONCAT('product_id:',i.product_id,'|quantity:',i.quantity,'|total:',(i.unit_price * i.quantity),'|meta:|tax:0;') as "Products"
FROM subscriptions s, subscriptionitems i, customerdata c
WHERE s.id = i.subscription_id
AND i.active = 1
GROUP BY i.subscription_id
);
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
If you did so, you would probably notice that there is no JOIN condition for customerdata. In fact, that table is not used at all. And neither is subscriptions.
I would suggest
SELECT i.subscription_id,
GROUP_CONCAT('product_id:', i.product_id,
'|quantity:', i.quantity,
'|total:', (i.unit_price * i.quantity),
'|meta:|tax:0;'
SEPARATOR '' -- each item already ends in ';', so don't add the default ','
) as Products
FROM subscriptionitems i
WHERE i.active = 1
GROUP BY i.subscription_id;
Note that I also renamed the column aliases so that no quoting is needed.
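A sketch of the GROUP_CONCAT approach in SQLite (through Python's sqlite3), where group_concat(X, '') plays the role of MySQL's SEPARATOR ''. The three rows for subscription 230 are the question's sample data; the two rows for subscription 217 are hypothetical, reconstructed from the expected output described in the question. The function name sub_products is also hypothetical. SQLite does not guarantee the order of items inside group_concat, so the items for 217 may come back in either order.

```python
import sqlite3


def sub_products():
    """Return {subscription_id: products_string} for active items only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE subscriptionitems (
        id INTEGER, subscription_id INTEGER, customer_id INTEGER,
        product_id INTEGER, quantity INTEGER, active INTEGER, unit_price REAL)""")
    conn.executemany(
        "INSERT INTO subscriptionitems VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (556, 230, 184, 262, 1, 0, 2.79),    # sample rows from the question
            (8100, 230, 184, 262, 1, 1, 2.79),
            (555, 230, 184, 260, 1, 0, 2.52),
            (9001, 217, None, 253, 1, 1, 2.34),  # hypothetical rows for 217
            (9002, 217, None, 252, 1, 1, 2.43),
        ],
    )
    rows = conn.execute("""
        SELECT i.subscription_id,
               group_concat('product_id:' || i.product_id ||
                            '|quantity:' || i.quantity ||
                            '|total:' || (i.unit_price * i.quantity) ||
                            '|meta:|tax:0;',
                            '')  -- empty separator, like MySQL's SEPARATOR ''
        FROM subscriptionitems i
        WHERE i.active = 1
        GROUP BY i.subscription_id
    """).fetchall()
    conn.close()
    return dict(rows)


print(sub_products()[230])  # product_id:262|quantity:1|total:2.79|meta:|tax:0;
```

In MySQL, the GROUP_CONCAT result is also cut off at group_concat_max_len (1024 bytes by default), which matters for subscriptions with many items.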

Alternative Using Having in mysql

I am trying to get the records where the average is greater than 81. I noticed I can't use a simple WHERE AVG(score) > 80.
But using a HAVING clause is problematic as well, because it does not check whether each individual record's average is greater than 80; it checks the group average. Is there an alternative?
In general, if we want to return aggregates (SUM, AVG) and also return the detail that makes up the aggregate, we typically use two SELECT statements.
As a rudimentary example, consider a table of "test_score"
test_id student_id score
------- ---------- -----
101 6 90
101 7 71
101 8 88
222 6 93
222 7 78
222 8 81
We can calculate the average score for each test, with a SELECT ... GROUP BY query.
SELECT r.test_id AS test_id
, AVG(r.score) AS avg_score
, MAX(r.score) AS high_score
FROM test_score r
GROUP
BY r.test_id
We expect that to return a resultset like this:
test_id avg_score high_score
------- --------- ----------
101 83 90
222 84 93
We can use that query as an inline view i.e. we wrap it in parens and reference it like a table in the FROM clause of another SELECT.
As a demonstration, to return the student scores that were better than (or equal to) the average for each test:
SELECT s.test_id
, s.avg_score
, t.student_id
, t.score
FROM ( -- inline view to get average score for each test_id
SELECT r.test_id AS test_id
, AVG(r.score) AS avg_score
FROM test_score r
GROUP
BY r.test_id
) s
LEFT
JOIN test_score t
ON t.test_id = s.test_id
AND t.score >= s.avg_score
ORDER
BY t.test_id
, t.score DESC
And we'd expect that to return something like:
test_id avg_score student_id score
------- --------- ---------- -----
101 83 6 90
101 83 8 88
222 84 6 93
The first two columns, returned from the inline view, are the result of the aggregate (AVG). The last two columns are detail rows, matched to the rows from the aggregate result.
To summarize the main point here:
To return aggregates along with details, we typically need two SELECT statements:
One SELECT to get the aggregates (with a GROUP BY if the aggregates are "per" each something or other)
Another SELECT to get the details and a match to the aggregate.
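The two-SELECT pattern can be checked end to end with a small sketch. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, so AVG() comes back as a float (83.0) rather than MySQL's DECIMAL; the function name is hypothetical.

```python
import sqlite3


def scores_at_or_above_average():
    """Detail rows (test_id, avg_score, student_id, score) with score >= test average."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_score (test_id INTEGER, student_id INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO test_score VALUES (?, ?, ?)", [
        (101, 6, 90), (101, 7, 71), (101, 8, 88),
        (222, 6, 93), (222, 7, 78), (222, 8, 81),
    ])
    rows = conn.execute("""
        SELECT s.test_id, s.avg_score, t.student_id, t.score
        FROM (  -- first SELECT: one aggregate row per test
            SELECT test_id, AVG(score) AS avg_score
            FROM test_score
            GROUP BY test_id
        ) AS s
        LEFT JOIN test_score AS t  -- second SELECT: details matched to the aggregate
          ON t.test_id = s.test_id
         AND t.score >= s.avg_score
        ORDER BY t.test_id, t.score DESC
    """).fetchall()
    conn.close()
    return rows


for row in scores_at_or_above_average():
    print(row)
# (101, 83.0, 6, 90)
# (101, 83.0, 8, 88)
# (222, 84.0, 6, 93)
```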
If the average score computed in your query is already correct and you are just having trouble filtering by it, wrap the query in parentheses and select from it:
select * from (
SELECT Count(entry_id) AS Filled,
q.question AS Questions,
AVG(ag.score) AS TOTAL
FROM entry e
LEFT JOIN entry_answer ea
ON ea.entry_id= e.entry
LEFT JOIN question q
ON q.question_id = ea.question_id
LEFT JOIN question_group qg
ON ea.question_parent_id = qg.question_parent_id
LEFT JOIN answer_group ag
ON ag.question_id = qg.question_parent_id
JOIN sent_list using (sent_list_id)
WHERE
entry_group_id = 2427
AND ag.score >= 0
AND ea.rated_answer_id = ag.rated_answer_id
AND sent_id = 6156
AND e.entry_date BETWEEN '2018-01-01' AND '2019-12-31'
group by ea.question_id
) results where total >= 81
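Filtering the wrapped query's output with WHERE returns the same rows as a HAVING clause on the same aggregate: both filter the per-group averages, not the individual rows. A small sketch of that equivalence, assuming the test_score sample table from the previous answer and grouping by student instead of by test; it runs on SQLite via Python's sqlite3, and the function name is hypothetical.

```python
import sqlite3


def averages_two_ways(threshold=81):
    """Return (derived-table result, HAVING result) for per-student averages."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_score (test_id INTEGER, student_id INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO test_score VALUES (?, ?, ?)", [
        (101, 6, 90), (101, 7, 71), (101, 8, 88),
        (222, 6, 93), (222, 7, 78), (222, 8, 81),
    ])
    # Wrap the aggregate query and filter its output like an ordinary table.
    wrapped = conn.execute("""
        SELECT * FROM (
            SELECT student_id, AVG(score) AS total
            FROM test_score
            GROUP BY student_id
        ) AS results
        WHERE total >= ?
        ORDER BY student_id
    """, (threshold,)).fetchall()
    # The same filter expressed with HAVING.
    having = conn.execute("""
        SELECT student_id, AVG(score) AS total
        FROM test_score
        GROUP BY student_id
        HAVING AVG(score) >= ?
        ORDER BY student_id
    """, (threshold,)).fetchall()
    conn.close()
    return wrapped, having


print(averages_two_ways())  # ([(6, 91.5), (8, 84.5)], [(6, 91.5), (8, 84.5)])
```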

Multiple join with self join in MySQL and split rows in columns by a row value

I have three tables, "Users", "Subjects" and "Marks":
Users Table
id name
1 A
2 B
3 C
4 D
5 E
6 A
7 B
Subjects Table
id name
1 Chemistry
2 Physics
3 English
4 Maths
5 History
Marks Table
u_id is the foreign key of Users (id) and s_id is foreign key of Subjects(id)
id u_id s_id marks
1 1 1 60
2 1 2 70
3 1 3 80
4 2 2 80
5 2 3 44
6 3 1 50
7 5 4 50
8 4 5 50
9 5 4 100
10 2 5 100
and I wish for the result to be like
id Name Chemistry Physics English
1 A 60 70 80
2 B NULL 80 44
3 C 50 NULL NULL
Using JOINs, so far I have only been able to get:
name name marks
A English 80
A Physics 70
A Chemistry 60
B English 44
B Physics 80
C Chemistry 50
Using the following query
SELECT u.name, s.name , m.marks
FROM Users as u
RIGHT JOIN Marks as m ON m.u_id = u.id
LEFT JOIN Subjects as s ON m.s_id = s.id
WHERE s.name = "English"
OR s.name = "Physics"
OR s.name = "Chemistry"
ORDER BY u.name;
Well, after reading the answers, I wanted to post my own:
SELECT
u.id
, u.name
, MAX(IF(s.id = 1, COALESCE(m.marks), 0)) as 'Chem'
, MAX(IF(s.id = 2, COALESCE(m.marks), 0)) as 'Phys'
, MAX(IF(s.id = 3, COALESCE(m.marks), 0)) as 'Eng'
FROM marks m
INNER JOIN subjects s
ON s.id = m.s_id
INNER JOIN users u
ON u.id = m.u_id
GROUP BY u.id
You can check that makes all you want in SqlFiddle: http://sqlfiddle.com/#!9/f567b/1
The important part is grouping all the elements by user id, and turning the rows of one table into the columns of another. As written in @TheShalit's answer, the way to achieve that is to assign each value to its own column. The problem is that when grouping by user you get many values per column, and you have to select the important one (the one that is neither 0 nor NULL); MAX() does that. Note that IF() returns 0 for rows belonging to other subjects, so a subject with no mark shows up as 0 rather than NULL; COALESCE() with a single argument simply returns that argument, so it does not change this.
It's also important to note that you'll have to build the SQL from the subject names and ids stored in the database, because SQL can't take values from rows and use them directly as column names. That's why I wrote 'Chem', 'Phys' and 'Eng' instead of the real names. In fact, it would be easier to use the subject id as the column name instead of a name, so you can match up the columns later when you fetch the rows.
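Building the column list from the Subjects table can be sketched as follows. The helpers quote_ident, build_pivot_sql and pivot_marks are hypothetical, and the sketch runs on SQLite via Python's sqlite3: it uses CASE instead of MySQL's IF(), quotes the generated aliases with double quotes (MySQL would use backticks unless ANSI_QUOTES is enabled), and passes the subject ids as bound parameters. Missing marks come back as NULL (None) because the CASE has no ELSE branch.

```python
import sqlite3

SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subjects (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE marks (id INTEGER PRIMARY KEY, u_id INTEGER, s_id INTEGER, marks INTEGER);
INSERT INTO users VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'A'),(7,'B');
INSERT INTO subjects VALUES (1,'Chemistry'),(2,'Physics'),(3,'English'),(4,'Maths'),(5,'History');
INSERT INTO marks VALUES (1,1,1,60),(2,1,2,70),(3,1,3,80),(4,2,2,80),(5,2,3,44),
                         (6,3,1,50),(7,5,4,50),(8,4,5,50),(9,5,4,100),(10,2,5,100);
"""


def quote_ident(name):
    """Double-quote an identifier for SQLite, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(subjects):
    """subjects: list of (id, name) rows read from the Subjects table."""
    cols = ",\n       ".join(
        f"MAX(CASE WHEN m.s_id = ? THEN m.marks END) AS {quote_ident(name)}"
        for _, name in subjects
    )
    placeholders = ", ".join("?" for _ in subjects)
    sql = f"""SELECT u.id AS id, u.name AS name,
       {cols}
FROM marks m
JOIN users u ON u.id = m.u_id
WHERE m.s_id IN ({placeholders})
GROUP BY u.id, u.name
ORDER BY u.id"""
    ids = [sid for sid, _ in subjects]
    return sql, ids + ids  # ids for the CASE expressions, then for IN (...)


def pivot_marks(subject_names):
    """Return (column names, rows) of the pivoted marks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    placeholders = ", ".join("?" for _ in subject_names)
    subjects = conn.execute(
        f"SELECT id, name FROM subjects WHERE name IN ({placeholders}) ORDER BY id",
        subject_names,
    ).fetchall()
    sql, params = build_pivot_sql(subjects)
    cur = conn.execute(sql, params)
    header = [d[0] for d in cur.description]
    rows = cur.fetchall()
    conn.close()
    return header, rows


header, rows = pivot_marks(["Chemistry", "Physics", "English"])
print(header)  # ['id', 'name', 'Chemistry', 'Physics', 'English']
for row in rows:
    print(row)
# (1, 'A', 60, 70, 80)
# (2, 'B', None, 80, 44)
# (3, 'C', 50, None, None)
```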
Keep in mind that it is VERY IMPORTANT for your tables to have the right indexes. Make sure the marks table has a UNIQUE key on the user and subject columns, so that no more than one mark can be stored per user and subject.
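A minimal sketch of such a key, assuming the marks columns from the question (u_id, s_id). In MySQL it could be added with ALTER TABLE marks ADD UNIQUE (u_id, s_id), after removing existing duplicates such as rows 7 and 9 in the sample data (both user 5, subject 4). The sketch uses SQLite via Python's sqlite3, and the function name is hypothetical.

```python
import sqlite3


def duplicate_mark_rejected():
    """Return True if the UNIQUE (u_id, s_id) key rejects a second mark."""
    conn = sqlite3.connect(":memory:")
    try:
        # UNIQUE (u_id, s_id): at most one mark per user and subject.
        conn.execute("""CREATE TABLE marks (
            id INTEGER PRIMARY KEY, u_id INTEGER, s_id INTEGER, marks INTEGER,
            UNIQUE (u_id, s_id))""")
        conn.execute("INSERT INTO marks VALUES (7, 5, 4, 50)")
        # Row 9 of the sample data repeats user 5 / subject 4.
        conn.execute("INSERT INTO marks VALUES (9, 5, 4, 100)")
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()
    return False


print(duplicate_mark_rejected())  # True
```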
Use a SELECT like this (with joins and GROUP BY student):
MAX(If(subjects.name="Chemistry",marks.marks,'')) Chemistry,
MAX(If(subjects.name="Physics",marks.marks,'')) Physics,
.....
You will need to do something like:
SELECT u.NAME AS NAME,
m_e.marks AS english,
m_p.marks AS physics,
m_c.marks AS chemistry
FROM users AS u
JOIN marks AS m_e ON m_e.u_id = u.id
JOIN marks AS m_p ON m_p.u_id = u.id
JOIN marks AS m_c ON m_c.u_id = u.id
WHERE m_e.s_id = 3 AND m_c.s_id = 1 AND m_p.s_id = 2
You are getting 3 different values from different rows of a single table, so you need to join the marks table with itself to get the values from 3 different records into 1 result row.
I used the ids that you defined as primary keys for your 3 subjects in the WHERE clause, to make sure you get the correct result for each subject.
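With inner joins, the self-join query only returns students who have a mark in all three subjects (here just A). A sketch of a variant that keeps the self-join idea but moves each subject filter into a LEFT JOIN's ON clause, so missing marks come back as NULL as in the requested output. It runs on SQLite via Python's sqlite3 and compares both versions; the names SETUP, INNER, LEFT and run are hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE marks (id INTEGER PRIMARY KEY, u_id INTEGER, s_id INTEGER, marks INTEGER);
INSERT INTO users VALUES (1,'A'),(2,'B'),(3,'C'),(4,'D'),(5,'E'),(6,'A'),(7,'B');
INSERT INTO marks VALUES (1,1,1,60),(2,1,2,70),(3,1,3,80),(4,2,2,80),(5,2,3,44),
                         (6,3,1,50),(7,5,4,50),(8,4,5,50),(9,5,4,100),(10,2,5,100);
"""

# The answer's query: inner joins drop any student missing one of the subjects.
INNER = """
SELECT u.name AS name, m_e.marks AS english, m_p.marks AS physics, m_c.marks AS chemistry
FROM users AS u
JOIN marks AS m_e ON m_e.u_id = u.id
JOIN marks AS m_p ON m_p.u_id = u.id
JOIN marks AS m_c ON m_c.u_id = u.id
WHERE m_e.s_id = 3 AND m_c.s_id = 1 AND m_p.s_id = 2
"""

# LEFT JOIN variant: the subject filter lives in each ON clause.
LEFT = """
SELECT u.id, u.name,
       m_c.marks AS chemistry, m_p.marks AS physics, m_e.marks AS english
FROM users AS u
LEFT JOIN marks AS m_c ON m_c.u_id = u.id AND m_c.s_id = 1
LEFT JOIN marks AS m_p ON m_p.u_id = u.id AND m_p.s_id = 2
LEFT JOIN marks AS m_e ON m_e.u_id = u.id AND m_e.s_id = 3
WHERE m_c.id IS NOT NULL OR m_p.id IS NOT NULL OR m_e.id IS NOT NULL
ORDER BY u.id
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(run(INNER))  # [('A', 80, 70, 60)]
print(run(LEFT))   # [(1, 'A', 60, 70, 80), (2, 'B', None, 80, 44), (3, 'C', 50, None, None)]
```

Like the other answers, this assumes at most one mark per user and subject; duplicates would multiply the joined rows.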

how to avoid duplicate data in sql command

I tried to use DISTINCT to avoid duplicating the data, but it didn't work.
How can I avoid the duplicate data?
Table 1
Employee code Deduction Voucher no Dec_Amount
001 999 50
001 888 20
002 777 100
Table 2
Employee code Payslip Voucher No Pay_Amount
001 111 100
002 222 200
The output should be:
Employee code Deduction Voucher no Dec_Amount Payslip Voucher No Pay_Amount
001 999 50 111 100
001 888 20
002 777 100 222 200
But I got this table instead:
Employee code Deduction Voucher no Dec_Amount Payslip Voucher No Pay_Amount
001 999 50 111 100
001 888 20 111 100
002 777 100 222 200
You cannot get those results with just a SQL query. It seems to me you need it in this format for display in a table or Excel spreadsheet. If that is the case, you would have to handle the "hiding" of those particular entries with some other code. This is because the entries you want to hide are correctly associated with employee 001.
While I do agree this probably makes a lot more sense to do in your front-end code, it is possible to do in SQL. User variables let you emulate SQL Server's ROW_NUMBER function, and you can then join only on the first row per employee code.
See the sqlfiddle - http://sqlfiddle.com/#!2/47302/11
SELECT t1.`Employee code`,`Deduction Voucher no`,`Dec_Amount`,
COALESCE(`Payslip Voucher No`,'') as `Payslip Voucher No`,
COALESCE(CAST(`Pay_Amount` as char(10)),'') as `Pay_Amount`
FROM Table2 t2
RIGHT JOIN
(
SELECT @row_num := IF(@prev_value=`Employee code`,@row_num+1,1) AS RowNumber
,`Employee code`,`Deduction Voucher no`,`Dec_Amount`
,@prev_value := `Employee code`
FROM Table1,
(SELECT @row_num := 1) x,
(SELECT @prev_value := '') y
ORDER BY `Employee code`
) t1
ON t1.`Employee code`=t2.`Employee code` AND t1.RowNumber=1
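On MySQL 8.0 or later, the ROW_NUMBER() window function can replace the user-variable trick (setting user variables inside expressions is deprecated as of MySQL 8.0). A sketch in SQLite (3.25 or later, via Python's sqlite3), assuming the column names are rewritten with underscores instead of spaces and that the deduction with the highest voucher number should carry the payslip data, which matches the expected output in the question. The function name is hypothetical.

```python
import sqlite3


def payslips_on_first_row():
    """Attach payslip data only to the first deduction row per employee."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE table1 (employee_code TEXT, deduction_voucher_no INTEGER, dec_amount INTEGER);
    CREATE TABLE table2 (employee_code TEXT, payslip_voucher_no INTEGER, pay_amount INTEGER);
    INSERT INTO table1 VALUES ('001', 999, 50), ('001', 888, 20), ('002', 777, 100);
    INSERT INTO table2 VALUES ('001', 111, 100), ('002', 222, 200);
    """)
    rows = conn.execute("""
        SELECT t1.employee_code, t1.deduction_voucher_no, t1.dec_amount,
               t2.payslip_voucher_no, t2.pay_amount
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY employee_code
                                      ORDER BY deduction_voucher_no DESC) AS rn
            FROM table1 AS t
        ) AS t1
        LEFT JOIN table2 AS t2
          ON t2.employee_code = t1.employee_code
         AND t1.rn = 1          -- only the first row per employee gets payslip data
        ORDER BY t1.employee_code, t1.rn
    """).fetchall()
    conn.close()
    return rows


for row in payslips_on_first_row():
    print(row)
# ('001', 999, 50, 111, 100)
# ('001', 888, 20, None, None)
# ('002', 777, 100, 222, 200)
```

The explicit ORDER BY inside OVER (...) is what decides which deduction counts as "first"; without it the choice is arbitrary.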
To expand on @d.lanza38's answer, the DB has no way to tell which row in table1 should get the data from table2. Remember that rows in a table have no inherent order, so there is no built-in concept of "the first row with employee code 001".
A standard (inner) join will put the payslip data on both rows, as you have shown. That is actually correct: your table structures say that for every payslip in table2 there can be many deductions, so if you want the data from both tables, each deduction has to carry the matching payslip data.
You can't use DISTINCT to magically fix your data - you need to understand the data structures, and relate them correctly.
To get what is in your example (which may be wrong) try this SQL:
select
a.Employee_code,
Deduction_Voucher_no,
Dec_Amount,
Payslip_Voucher_No,
Pay_Amount
from
table1 as a
inner join table2 as b on a.employee_code = b.employee_code
where Deduction_Voucher_no = (
select max(Deduction_Voucher_no)
from table1 as c
where a.Employee_code = c.Employee_code)
UNION
select
a2.Employee_code,
Deduction_Voucher_no,
Dec_Amount,
null as Payslip_Voucher_No,
null as Pay_Amount
from
table1 as a2
inner join table2 as b2 on a2.employee_code = b2.employee_code
where Deduction_Voucher_no <> (
select max(Deduction_Voucher_no)
from table1 as c2
where a2.Employee_code = c2.Employee_code)
order by 1,2 desc
Note: untested, because I don't have your database, and don't even know which database engine you are using. If it complains about selecting nulls, replace with 0 or '' depending upon the data type.
UPDATE: improved the SQL and provided a fiddle: http://sqlfiddle.com/#!2/e7fc2/2

SQL - Find AT LEAST TWO DISTINCT / SEPARATE / DIFFERENT values on another Table

While taking an online database course (for beginners), I came across problems that require queries involving "AT LEAST TWO DISTINCT" values. For example,
the COMPANY database in the Elmasri book has the exercise: Find all employees who work on at least two distinct projects. The solution (which works great) is:
SELECT DISTINCT LName FROM Employee e1
JOIN Works_On AS w1 ON (e1.Ssn = w1.Essn)
JOIN Works_On AS w2 ON (e1.Ssn = w2.Essn)
WHERE w1.Pno <> w2.Pno
Similarly, for the STUDENT/COURSE database (I forgot the source): Find the Student_ID of the students who take at least two distinct courses. The solution also looks simple (though it's not tested):
SELECT e1.Student_ID FROM Enroll AS e1, Enroll AS e2
WHERE e1.Student_ID = e2.Student_ID
AND e1.Course_ID <> e2.Course_ID
In my problem, I have to find the name and customer ID of those customers who have accounts in at least two branches of distinct types (i.e., branches that do not have the same branch type),
from the following tables (MySQL):
CUSTOMER:
Cust_ID Lname
------- -----
1 Mr.A
2 Mr.B
3 Mr.C
4 Mr.D
BRANCH:
Br_ID Br_Type
----- -------
10 big
11 small
12 big
13 small
ACCOUNT:
Acc_Num Br_ID Cust_ID Balance
------- ----- ------- -------
1001 10 1 2000
1002 11 1 2500
1003 13 1 3000
1004 12 2 4000
1005 13 3 4500
1006 10 4 5000
1007 12 4 6000
Result Table should look like the following:
Lname Cust_ID
----- -------
Mr.A 1
Only Mr.A has an account in a branch whose type is 'big' as well as in a branch whose type is 'small'.
I tried the following, which didn't work:
SELECT DISTINCT c1.Lname, a1.Cust_ID FROM Customer AS c1
JOIN Account a1 ON (c1.Cust_ID=a1.Cust_ID)
JOIN Branch b1 ON (a1.Br_ID=b1.Br_ID)
JOIN Branch b2 ON (a1.Br_ID=b2.Br_ID)
WHERE b1.Br_Type<>b2.Br_Type;
What exactly am I doing wrong? Sorry for the long description, but I wanted to make sure the question is understandable. A little explanation of the <> part would be highly appreciated.
You're trying to pull 2 different Branch records off the same Account record - but that can't happen. What you want is to search on 2 different Account records with associated Branches of a different type:
SELECT DISTINCT c1.Lname, a1.Cust_ID FROM Customer AS c1
JOIN Account a1 ON (c1.Cust_ID=a1.Cust_ID)
JOIN Account a2 ON (c1.Cust_ID=a2.Cust_ID)
JOIN Branch b1 ON (a1.Br_ID=b1.Br_ID)
JOIN Branch b2 ON (a2.Br_ID=b2.Br_ID)
WHERE b1.Br_Type<>b2.Br_Type;
SQLFiddle here
A more efficient approach that gives the same result would be to use GROUP BY and HAVING COUNT(DISTINCT Br_Type) >= 2, which is what @GordonLinoff proposed.
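The difference between the original query and the fixed one can be reproduced on the question's data. A sketch using SQLite via Python's sqlite3 (SETUP, BROKEN, FIXED and run are hypothetical names): in the original query both Branch aliases are reached through the same Account row, so they are always the same branch and the <> test never succeeds.

```python
import sqlite3

SETUP = """
CREATE TABLE customer (Cust_ID INTEGER PRIMARY KEY, Lname TEXT);
CREATE TABLE branch (Br_ID INTEGER PRIMARY KEY, Br_Type TEXT);
CREATE TABLE account (Acc_Num INTEGER PRIMARY KEY, Br_ID INTEGER, Cust_ID INTEGER, Balance INTEGER);
INSERT INTO customer VALUES (1, 'Mr.A'), (2, 'Mr.B'), (3, 'Mr.C'), (4, 'Mr.D');
INSERT INTO branch VALUES (10, 'big'), (11, 'small'), (12, 'big'), (13, 'small');
INSERT INTO account VALUES (1001, 10, 1, 2000), (1002, 11, 1, 2500), (1003, 13, 1, 3000),
  (1004, 12, 2, 4000), (1005, 13, 3, 4500), (1006, 10, 4, 5000), (1007, 12, 4, 6000);
"""

# Original query: b1 and b2 both come from a1.Br_ID, so b1 and b2 are the same row.
BROKEN = """
SELECT DISTINCT c1.Lname, a1.Cust_ID FROM customer AS c1
JOIN account a1 ON (c1.Cust_ID = a1.Cust_ID)
JOIN branch b1 ON (a1.Br_ID = b1.Br_ID)
JOIN branch b2 ON (a1.Br_ID = b2.Br_ID)
WHERE b1.Br_Type <> b2.Br_Type
"""

# Fix: two Account aliases, each leading to its own Branch row.
FIXED = """
SELECT DISTINCT c1.Lname, a1.Cust_ID FROM customer AS c1
JOIN account a1 ON (c1.Cust_ID = a1.Cust_ID)
JOIN account a2 ON (c1.Cust_ID = a2.Cust_ID)
JOIN branch b1 ON (a1.Br_ID = b1.Br_ID)
JOIN branch b2 ON (a2.Br_ID = b2.Br_ID)
WHERE b1.Br_Type <> b2.Br_Type
"""


def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


print(run(BROKEN))  # []
print(run(FIXED))   # [('Mr.A', 1)]
```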
The problem with your query is the two ON conditions: they return the same row from branch, because the join conditions are identical.
In any case, I think there is a better way to think about these types of queries (what I call "set-within-sets" queries): think of them as aggregation. Aggregate at the customer level, then use the HAVING clause to filter the customers:
SELECT c.Lname, a.Cust_ID
FROM Customer AS c JOIN
Account a
ON c.Cust_ID = a.Cust_ID JOIN
Branch b
ON a.Br_ID = b.Br_ID
GROUP BY c.Lname, a.Cust_ID
HAVING count(distinct b.br_type) > 1;
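A sketch of the aggregation approach on the same data (SQLite via Python's sqlite3). Making the threshold a parameter is an illustrative generalization rather than part of the answer; it shows how the same query covers "at least N distinct branch types". The function and constant names are hypothetical.

```python
import sqlite3

SETUP = """
CREATE TABLE customer (Cust_ID INTEGER PRIMARY KEY, Lname TEXT);
CREATE TABLE branch (Br_ID INTEGER PRIMARY KEY, Br_Type TEXT);
CREATE TABLE account (Acc_Num INTEGER PRIMARY KEY, Br_ID INTEGER, Cust_ID INTEGER, Balance INTEGER);
INSERT INTO customer VALUES (1, 'Mr.A'), (2, 'Mr.B'), (3, 'Mr.C'), (4, 'Mr.D');
INSERT INTO branch VALUES (10, 'big'), (11, 'small'), (12, 'big'), (13, 'small');
INSERT INTO account VALUES (1001, 10, 1, 2000), (1002, 11, 1, 2500), (1003, 13, 1, 3000),
  (1004, 12, 2, 4000), (1005, 13, 3, 4500), (1006, 10, 4, 5000), (1007, 12, 4, 6000);
"""

QUERY = """
SELECT c.Lname, a.Cust_ID
FROM customer AS c
JOIN account AS a ON c.Cust_ID = a.Cust_ID
JOIN branch AS b ON a.Br_ID = b.Br_ID
GROUP BY c.Lname, a.Cust_ID
HAVING COUNT(DISTINCT b.Br_Type) >= ?   -- aggregate per customer, then filter
ORDER BY a.Cust_ID
"""


def customers_with_branch_types(min_types):
    """Customers whose accounts span at least min_types distinct branch types."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, (min_types,)).fetchall()
    conn.close()
    return rows


print(customers_with_branch_types(2))  # [('Mr.A', 1)]
```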