Merge rows in mysql based on condition - mysql

I am trying to merge rows based on a condition in MySQL.
I have a table as shown below:
I want to merge row 1 into row 2 (the row where the attendance count is larger)
and need to show the result as:
I tried to divide the dataset into 2 parts using the query below:
select
a.student_id,a.school_id,a.name,a.grant,a.classification,a.original_classification,a.consent_type
from (
select * from school_temp where original_classification='all' and availability='implicit')a
join(select * from school_temp where original_classification!='all' and availability!='implicit')b
on a.student_id = b.student_id and a.school_id=b.school_id and a.name=b.name
But I am unable to merge the rows and get the total attendance count.
Please help me, I am badly stuck on this.

Split this into two queries that you combine with UNION.
The first joins the implicit row with the row with the highest attendance among the explicit rows for each student. See Retrieving the last record in each group - MySQL for how that works. Add the two attendance_count values together to combine the attendances.
The second query in the UNION gets all the explicit rows that don't have the highest attendance.
WITH explicit AS (
SELECT *
FROM school_temp
WHERE original_classification!='all' and availability!='implicit'
)
SELECT a.student_id, a.school_id, a.name, a.attendance_count + b.attendance_count AS attendance_count,
b.grant, b.classification, b.original_classification, b.consent_type
FROM school_temp AS a
JOIN (
SELECT t1.*
FROM explicit AS t1
JOIN (
SELECT student_id, school_id, name, MAX(attendance_count) AS max_attendance
FROM explicit AS t2
GROUP BY student_id, school_id, name
) AS t2 ON t1.student_id = t2.student_id AND t1.school_id = t2.school_id AND t1.name = t2.name AND t1.attendance_count = t2.max_attendance
) AS b ON a.student_id = b.student_id and a.school_id=b.school_id and a.name=b.name
WHERE a.original_classification = 'all' AND a.availability = 'implicit'
UNION ALL
SELECT t1.student_id, t1.school_id, t1.name, t1.attendance_count,
t1.grant, t1.classification, t1.original_classification, t1.consent_type
FROM explicit AS t1
JOIN (
SELECT student_id, school_id, name, MAX(attendance_count) AS max_attendance
FROM explicit AS t2
GROUP BY student_id, school_id, name
) AS t2 ON t1.student_id = t2.student_id AND t1.school_id = t2.school_id AND t1.name = t2.name AND t1.attendance_count < t2.max_attendance
I've used a CTE to give a name to the subquery that gets all the explicit rows. If you're using MySQL 5.x, you'll need to replace explicit with that subquery throughout the query. Or you could define it as a view.
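As a rough sketch of the view option for MySQL 5.x: the view name explicit_rows and the column types below are assumptions made up for illustration, and only the columns that the filter and the grouping need are declared.

```sql
-- Hypothetical minimal schema; the real school_temp table has more columns.
CREATE TABLE school_temp (
  student_id INT,
  school_id INT,
  name VARCHAR(50),
  attendance_count INT,
  original_classification VARCHAR(20),
  availability VARCHAR(20)
);

-- The view plays the role of the "explicit" CTE (explicit_rows is an assumed name).
CREATE VIEW explicit_rows AS
SELECT *
FROM school_temp
WHERE original_classification != 'all' AND availability != 'implicit';

-- Use the view wherever the CTE name appeared, for example to find
-- the highest attendance per student among the explicit rows.
SELECT student_id, school_id, name, MAX(attendance_count) AS max_attendance
FROM explicit_rows
GROUP BY student_id, school_id, name;
```

A view defined with SELECT * keeps the column list it had when it was created, so it would need to be recreated if columns are later added to school_temp.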

Related

SQL trying to use a column from a joined table in a subquery

Here is what I currently have which returns 3 columns for patient_id, group_concat_1, and group_concat_2:
SELECT patient_id,
(SELECT GROUP_CONCAT(column1) FROM
table1 where patient_id = patient.id
) group_concat_1,
(SELECT GROUP_CONCAT(column1) FROM
table2 where patient_id = patient.id
) group_concat_2
FROM patient
However, I need to return a single column with group_concat_1 and group_concat_2 combined, so I tried this:
SELECT patient_id,
SELECT CONCAT(group_concat_1, group_concat_2) FROM (
(SELECT GROUP_CONCAT(column1) FROM
table1 where patient_id = patient.id
) group_concat_1,
(SELECT GROUP_CONCAT(column1) FROM
table2 where patient_id = patient.id
) group_concat_2
)
FROM patient
But this clearly doesn't work, since now it can't find patient.id in the inner subquery. Any advice? Thanks!
You can concatenate the two columns directly:
SELECT p.patient_id,
CONCAT(
(SELECT GROUP_CONCAT(column1) FROM table1 where patient_id = p.id),
(SELECT GROUP_CONCAT(column1) FROM table2 where patient_id = p.id)
)
FROM patient p
I'm pretty sure you want concat_ws() for this purpose:
SELECT patient_id,
CONCAT_WS(',',
(SELECT GROUP_CONCAT(t1.column1) FROM table1 t1 where t1.patient_id = p.id
),
(SELECT GROUP_CONCAT(t2.column1) FROM table2 t2 where t2.patient_id = p.id
)
) as combined
FROM patient p;
There are two reasons:
You can distinguish between the last element from table1 and the first from table2.
If one of the tables has no matching values, this returns the results from the other.
Also note that I added table aliases and qualified column names. This is quite important when working with queries that have multiple table references -- it helps prevent some very hard to debug errors.
I should add that your original query would run in most databases. MySQL and Oracle happen to be two that don't understand nested correlation clauses.
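The difference between the two functions can be seen without any tables. In this small sketch the string literals stand in for the two GROUP_CONCAT results, and NULL stands in for a table with no matching rows (GROUP_CONCAT returns NULL when there is nothing to concatenate).

```sql
-- CONCAT returns NULL as soon as any argument is NULL.
SELECT CONCAT('a,b', NULL) AS with_concat;            -- NULL

-- CONCAT_WS skips NULL arguments and puts the separator between the rest.
SELECT CONCAT_WS(',', 'a,b', NULL) AS one_side_null;  -- 'a,b'
SELECT CONCAT_WS(',', 'a,b', 'c,d') AS both_sides;    -- 'a,b,c,d'

-- Plain CONCAT runs the last and first elements together.
SELECT CONCAT('a,b', 'c,d') AS run_together;          -- 'a,bc,d'
```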

Why is this SQL correct, and how is it executed internally?

The following SQL comes from the MySQL documentation:
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
The documentation says it finds all rows in table t1 containing a value that occurs twice in a given column, but it does not explain the SQL.
t1 and t are the same table, so
count(*) in subquery == select count(*) from t
, isn't it?
The assumption that
count(*) in subquery == select count(*) from t
is wrong. The subquery is tied to the current row of the outer query, so it only counts the rows of t1 whose id matches that row's id, not the whole table. That is how the query finds ids that occur exactly twice.
If you want the number of occurrences of each repeated id:
SELECT id, count(*) AS all_count FROM t1 GROUP BY id HAVING all_count > 1 ORDER BY all_count DESC
You can also get the full rows, as your query does, like this:
select * from t1 where id in ( select id from t1 group by id having count(*) > 1 )
The query contains a correlated subquery in WHERE clause:
SELECT COUNT(*) FROM t1 WHERE t1.id = t.id
It is called correlated because it is related to the main query via t.id. So, this subquery counts the number of records having an id value that is equal to the current id value of the record returned by the main query.
Thus, the predicate
(SELECT COUNT(*) FROM t1 WHERE t1.id = t.id) = 2
evaluates to true for any row with an id value that occurs twice in the table.
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
This query goes through each record in t1 and then, in the subquery, looks into t1 again to see whether that record's id is found exactly 2 times. You can do the same for any other column in t1 (or any table for that matter).
If you would like to see all values that occur multiple times in the table, change WHERE 2 = to WHERE 1 <. This will also give you the values that occur 3 times, 4 times, etc.
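A small worked example may make the row-by-row evaluation easier to follow. The sample rows and the note column are made up for illustration; only the table name t1 and the id column come from the documentation query.

```sql
CREATE TABLE t1 (id INT, note VARCHAR(10));
INSERT INTO t1 VALUES
  (1, 'a'),
  (2, 'b'), (2, 'c'),
  (3, 'd'), (3, 'e'), (3, 'f');

-- For each outer row t, the subquery counts the rows of t1 with the same id:
-- id 1 -> 1, id 2 -> 2, id 3 -> 3. Only the rows with id = 2 satisfy "2 = count".
SELECT * FROM t1 AS t
WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
-- returns (2, 'b') and (2, 'c')

-- With "1 <" every id that occurs more than once qualifies (ids 2 and 3 here).
SELECT * FROM t1 AS t
WHERE 1 < (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);
-- returns the five rows with id 2 or 3
```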
SELECT id, count(*)
FROM MyTable
GROUP BY id
HAVING count(*) > 1
With this query you can see the ids that are repeated more than once, and you can adapt it to your needs.
How about using GROUP BY and HAVING:
SELECT id, count(1) as Total FROM MyTable AS t1
GROUP BY t1.id
HAVING Total = 2

How to select rows which have the biggest value of a column?

I don't know whether my title is understandable; maybe someone can help edit it.
All I want to do is, for example:
I have a table like this
Engineering appears 5 times with different article_category_abbr, and I want to select only one row with the biggest value of num.
Here, it will be Engineering-ENG-192, and Geriatrics&Gerontology will be Geriatrics&Gerontology-CLM-26
But I don't know how to do it on the whole table using mysql
Join your table to a subquery which finds the greatest num value for each sc group.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT sc, MAX(num) AS max_num
FROM yourTable
GROUP BY sc
) t2
ON t1.sc = t2.sc AND
t1.num = t2.max_num;
You can have a subquery that gets the largest value for each sc, and the resulting rows are then joined with the table itself based on two columns: sc and num.
SELECT a.*
FROM tableName a
INNER JOIN
(
SELECT sc, MAX(num) AS Num
FROM tableName
GROUP BY sc
) b ON a.sc = b.sc
AND a.num = b.num
Here's a Demo
Use the MAX function with GROUP BY like this. Here is more information.
SELECT myID, classTitle, subField, MAX(score) FROM myTable GROUP BY myID, classTitle, subField
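On MySQL 8.0 or later, a window function is another way to keep one row per sc group. The following is only a sketch: the table name yourTable and the column sc follow the answers above, the two top rows come from the question, and the other rows and the column types are invented for illustration.

```sql
CREATE TABLE yourTable (
  sc VARCHAR(40),
  article_category_abbr VARCHAR(10),
  num INT
);
INSERT INTO yourTable VALUES
  ('Engineering', 'ENG', 192),
  ('Engineering', 'CLM', 40),             -- made-up row
  ('Geriatrics&Gerontology', 'CLM', 26),
  ('Geriatrics&Gerontology', 'ENG', 3);   -- made-up row

-- Number the rows of each sc group from the largest num downwards,
-- then keep only the first row of every group.
SELECT sc, article_category_abbr, num
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (PARTITION BY sc ORDER BY num DESC) AS rn
  FROM yourTable AS t
) AS ranked
WHERE rn = 1;
-- returns ('Engineering', 'ENG', 192) and ('Geriatrics&Gerontology', 'CLM', 26)
```

Unlike the join on MAX(num), which returns every row that ties for the maximum, ROW_NUMBER keeps exactly one row per group and picks arbitrarily among ties.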

How can I compare if 2 tables have the same data?

If I have 2 tables and want to find if they have the same data, what is the most straightforward way to do it in MySQL?
I have read about doing a correlated subquery and UNION ALL, but that query is about 2 pages (!) long and I cannot really follow what it is doing. There must be an easier way.
Even if it is e.g. make MySQL copy the table data to files and do a vimdiff (I am not sure that this is even possible -is it?- just thinking out loud).
UPDATE
I am interested only in the table data and not structure. This is to clarify due to an ambiguous comment I made
If you just want to tell whether the tables are identical or not as efficiently as possible, use this query:
SELECT 1 FROM (
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
) t
GROUP BY col1, col2, col3
HAVING count(*) = 1
LIMIT 1
List all the columns in GROUP BY to compare the entire table.
If the result is an empty set, the two tables are identical.
If you want to see the differences, use this query:
SELECT MIN(tname) AS tname, col1, col2, col3 FROM (
SELECT 'table1' tname, col1, col2, col3 FROM table1
UNION ALL
SELECT 'table2' tname, col1, col2, col3 FROM table2
) t
GROUP BY col1, col2, col3
HAVING count(*) = 1
List the same columns in the inner SELECT as in the GROUP BY, plus a column to distinguish the two tables.
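Here is a small worked example of the difference query. The sample rows and column types are made up, and the sketch assumes neither table contains duplicate rows of its own, since a row that appears twice in one table and never in the other would have a count of 2 and be missed.

```sql
CREATE TABLE table1 (col1 INT, col2 VARCHAR(10), col3 INT);
CREATE TABLE table2 (col1 INT, col2 VARCHAR(10), col3 INT);
INSERT INTO table1 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 30);
INSERT INTO table2 VALUES (1, 'a', 10), (2, 'b', 20), (3, 'c', 31);

-- Rows present in both tables form groups of 2 and are filtered out.
-- MIN(tname) simply returns the single tname of each one-row group and
-- keeps the query valid when ONLY_FULL_GROUP_BY is enabled.
SELECT MIN(tname) AS tname, col1, col2, col3
FROM (
  SELECT 'table1' AS tname, col1, col2, col3 FROM table1
  UNION ALL
  SELECT 'table2' AS tname, col1, col2, col3 FROM table2
) AS t
GROUP BY col1, col2, col3
HAVING COUNT(*) = 1;
-- returns ('table1', 3, 'c', 30) and ('table2', 3, 'c', 31)
```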
Just throwing this out there, you could emulate a full outer join and then return the rows where just the right or the left side is null.
select t1.*
from table1 t1
LEFT OUTER JOIN table2 t2
ON t1.col1 = t2.col1
AND t1.col2 = t2.col2
AND ...
WHERE t2.id is null
UNION
select t2.*
from table2 t2
LEFT OUTER JOIN table1 t1
ON t2.col1 = t1.col1
AND t2.col2 = t1.col2
AND ...
WHERE t1.id is null
With the emulated FULL OUTER JOIN you can show all rows that have no matching row in the other table.
Use the following query:
SELECT c1 = cjoin AND c2 = cjoin equiv
FROM (SELECT COUNT(*) c1 FROM Table1) t1,
(SELECT COUNT(*) c2 FROM Table2) t2,
(SELECT COUNT(*) cjoin
FROM Table1 t1
JOIN Table2 t2
ON t1.col1 = t2.col1 AND t1.col2 = t2.col2 AND t1.col3 = t2.col3 ...) tjoin
Assuming the tables have a unique key, this will return equiv = 1 if the tables are equal. It doesn't show the differences, it's just a binary test.
I was reading SQL Cookbook from A.Molinaro, when I came across a solution.
It is based on a table
emp(empno,ename,job,mgr,hiredate,sal,comm,deptno)
and a view
V
which has the same columns but different rows. The columns mgr and comm might be NULL, other columns not.
The solution in the book is very long and does not show all differences, although this was the stated problem in 3.7.
I came up with my own solution, which is shorter and shows all differences (meaning all rows that have different counts in the two tables).
select * from
# those which are contained in the (distinct) union of (col1,col2,...,coln, count) of both tables:
( select empno,ename,job,mgr,hiredate,comm,deptno, count(*) cnt from emp group by empno,ename,job,mgr,hiredate,comm,deptno
union
select empno,ename,job,mgr,hiredate,comm,deptno, count(*) cnt from V group by empno,ename,job,mgr,hiredate,comm,deptno
) as unionOfBoth
where (empno,ename,job,mgr,hiredate,comm,deptno,cnt)
not in
# those which are contained in the intersection of both tables with the equal number of counts:
( select e.empno,e.ename,e.job,e.mgr,e.hiredate,e.comm,e.deptno,e.cnt
from
(select empno, ename,job,mgr,hiredate,comm,deptno, count(*) cnt from emp group by empno,ename,job,mgr,hiredate,comm,deptno) e,
(select empno, ename,job,mgr,hiredate,comm,deptno, count(*) cnt from V group by empno,ename,job,mgr,hiredate,comm,deptno) v
where
e.empno = v.empno
and e.ename = v.ename
and e.job = v.job
and ifnull(e.mgr,0) = ifnull(v.mgr,0)
and e.hiredate = v.hiredate
and e.deptno = v.deptno
and ifnull(e.comm,0) = ifnull(v.comm,0)
and e.cnt = v.cnt
);
Basically, you count the distinct rows in both tables and do a union (not union all) to get the temporary table unionOfBoth. Then you remove the rows that both tables have in common.
Here two rows r1 from table t1 and r2 from table t2 are considered the same, if
(r1,count of r1 in t1) = (r2, count of r2 in t2), which is equivalent to r1=r2 (on all columns) and (count of r1 in t1) = (count of r2 in t2).
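One caveat about comparing nullable columns through ifnull(x, 0): a NULL and a real 0 are then treated as equal. MySQL's NULL-safe equality operator <=> is a possible alternative for the mgr and comm comparisons. A quick sketch of how the operators behave:

```sql
SELECT NULL = NULL          AS plain_equal,       -- NULL: never true, so the row pair is not matched
       NULL <=> NULL        AS null_safe_equal,   -- 1: two NULLs count as equal
       0 <=> NULL           AS zero_vs_null,      -- 0: NULL and 0 stay distinct
       IFNULL(NULL, 0) = 0  AS ifnull_collision;  -- 1: NULL and 0 look the same
```

With it, a condition such as e.mgr <=> v.mgr would match two NULL managers without also matching a NULL against a manager number of 0.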
If the tables are small enough, you can export both tables as csv files and then copy one of the tables and paste them side-by-side with the other table. You can just go row by row and see if the outputs are the same that way.

SQL - joining data

Is it possible to get 1 result where I require data from 3 tables?
First table: I will need to grab all the fields (1 row found by a primary key)
Second table: I will need to grab the field 'username' (connected to first table by 'master_id')
Third table: I will need to grab the latest added row with the associated master_id key (table has 'date', 'master_id', 'previous_name').
select top 1 first.*, second.username, third.*
from first
inner join second on first.id = second.master_id
inner join third on first.id = third.master_id
order by
third.date desc
As always, there are dozens of ways to skin a cat. I'm not sure if this is as optimized as the subquery methods, but it should work.
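The thread does not say which database is in use. TOP 1 is SQL Server syntax; assuming MySQL, the same idea would be written with LIMIT 1. In the sketch below the table names, column types, the title column and the sample rows are all assumptions made up for illustration.

```sql
CREATE TABLE FirstTable  (master_id INT PRIMARY KEY, title VARCHAR(20));
CREATE TABLE SecondTable (master_id INT, username VARCHAR(20));
CREATE TABLE ThirdTable  (master_id INT, `date` DATE, previous_name VARCHAR(20));

INSERT INTO FirstTable  VALUES (1, 'first row');
INSERT INTO SecondTable VALUES (1, 'alice');
INSERT INTO ThirdTable  VALUES
  (1, '2020-01-01', 'oldest name'),
  (1, '2021-06-01', 'latest name');

-- One row from the first table (looked up by primary key), its username,
-- and only the most recent matching row from the third table.
SELECT t1.*, t2.username, t3.`date`, t3.previous_name
FROM FirstTable AS t1
JOIN SecondTable AS t2 ON t2.master_id = t1.master_id
JOIN ThirdTable  AS t3 ON t3.master_id = t1.master_id
WHERE t1.master_id = 1
ORDER BY t3.`date` DESC
LIMIT 1;
-- returns (1, 'first row', 'alice', '2021-06-01', 'latest name')
```

This form only works when a single row of the first table is requested; to get the latest third-table row for many master_id values at once, a per-group maximum is needed instead.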
You can join the three tables together. Then, you can use a "filter" join to keep only the latest Table3 row:
select *
from Table1 t1
join Table2 t2
on t2.master_id = t1.master_id
join Table3 t3
on t3.master_id = t1.master_id
join (
select master_id
, max(date) as max_date
from Table3
group by
master_id
) as filter
on t3.master_id = filter.master_id
and t3.date = filter.max_date
You'll need a correlated subquery for that third table.
SELECT t1.*, username, date, previous_name
FROM FirstTable t1
INNER JOIN SecondTable t2 ON t1.master_id=t2.master_id
INNER JOIN
(SELECT master_id, date, previous_name
FROM ThirdTable AS t3_1
WHERE date = (
SELECT MAX(date)
FROM ThirdTable AS t3_2
WHERE t3_2.master_id=t3_1.master_id)) q1 ON q1.master_id=t1.master_id;
NOTE: Untested.