SQL Group_concat not get all data - mysql

I have two tables, and the second table references the first through a relation column:
table1
id name
---------
1 alpha
2 beta
table2
id name relation
-------------------
1 2015 2
2 2016 2
3 2017 2
4 2018 2
I want to see
name data
-------------------------
beta 2015,2016,2017,2018
alpha NULL
I tried the following SQL query, but the output is not what I wanted:
SELECT
t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',')
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t2.relation = t1.id
Output:
alpha 2015,2016,2017,2018
alpha has no related rows in the other table; the values in the output belong to beta.

You need GROUP BY:
SELECT t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',')
FROM table1 t1 LEFT JOIN
table2 t2
ON t2.relation = t1.id
GROUP BY t1.name;
In most databases (and in recent versions of MySQL with default settings), your query would fail. It is an aggregation query (because of the GROUP_CONCAT()), but t1.name is neither an argument to an aggregation function nor a GROUP BY key.
Older versions of MySQL (and newer ones with ONLY_FULL_GROUP_BY disabled) do allow this type of query. It returns exactly one row, and the value of t1.name on that one row comes from an arbitrary row.
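If you want to try the GROUP BY version without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an approximation, not the MySQL behaviour itself: SQLite's group_concat(x, ',') stands in for MySQL's GROUP_CONCAT(x SEPARATOR ','), and the function name grouped_names is made up for this example.

```python
import sqlite3


def grouped_names():
    """Return {table1.name: sorted related table2 names, or None if there are none}."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER, name TEXT);
        CREATE TABLE table2 (id INTEGER, name TEXT, relation INTEGER);
        INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta');
        INSERT INTO table2 VALUES
            (1, '2015', 2), (2, '2016', 2), (3, '2017', 2), (4, '2018', 2);
    """)
    rows = con.execute("""
        SELECT t1.name, group_concat(t2.name, ',') AS data
        FROM table1 AS t1
        LEFT JOIN table2 AS t2 ON t2.relation = t1.id
        GROUP BY t1.name          -- one output row per table1 name
    """).fetchall()
    con.close()
    # The concatenation order is not guaranteed, so sort the pieces.
    return {
        name: (sorted(data.split(",")) if data is not None else None)
        for name, data in rows
    }


if __name__ == "__main__":
    print(grouped_names())
```

With the GROUP BY in place, alpha gets its own row with a NULL (None) value instead of picking up beta's years.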

No FKs for fiddle:
CREATE TABLE Table1 (`id` int, `name` varchar(5)) ;
INSERT INTO Table1
(`id`, `name`)
VALUES
(1, 'alpha'),
(2, 'beta')
;
CREATE TABLE Table2 (`id` int, `name` int, `relation` int);
INSERT INTO Table2
(`id`, `name`, `relation`)
VALUES
(1, 2015, 2),
(2, 2016, 2),
(3, 2017, 2),
(4, 2018, 2)
;
Statement:
SELECT
t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',') -- missing an AS .... => ugly name from fiddle
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t2.relation = t1.id
group by t1.name
Output:
name GROUP_CONCAT(t2.name SEPARATOR ',')
alpha (null)
beta 2017,2018,2015,2016

Related

Joining two tables with date ranges

I have two SQL tables that contain start and end dates.
Table 1: Name, AddedDate
Table 2: Name, RemovedDate
I am looking to join these two tables and dump the data into a temp table to show when a name was added to and removed from a list.
The same name may have been added and removed multiple times.
Desired Output Example
- Name, AddedDate, RemovedDate
- Jane, 2017-02-01, 2017-02-03
- Bill, 2017-01-28, (blank)
- Mike, 2017-01-15, 2017-01-19
- Jane, 2017-01-13, 2017-01-14
Can someone please help? Thanks.
Another option is an OUTER APPLY (If SQL Server)
Example
Declare @Table1 table (Name varchar(25),AddedDate date)
Insert Into @Table1 Values
('Jane', '2017-02-01'),
('Bill', '2017-01-28'),
('Mike', '2017-01-15'),
('Jane', '2017-01-13')
Declare @Table2 table (Name varchar(25),RemovedDate date)
Insert Into @Table2 Values
('Jane', '2017-02-03'),
('Mike', '2017-01-19'),
('Jane', '2017-01-14')
Select A.Name
,A.AddedDate
,B.RemovedDate
From @Table1 A
Outer Apply (
Select RemovedDate=min(RemovedDate)
From @Table2
Where Name=A.Name
and RemovedDate>=A.AddedDate
) B
Returns
Name AddedDate RemovedDate
Jane 2017-02-01 2017-02-03
Bill 2017-01-28 NULL
Mike 2017-01-15 2017-01-19
Jane 2017-01-13 2017-01-14
Using correlated subquery to consider only the last time the name was added (and potentially removed)...
select Name,
AddedDate,
(select max(RemovedDate)
from table_2
where Name=q1.Name
and RemovedDate >= q1.AddedDate) as RemovedDate
from (select Name,
max(AddedDate) as AddedDate
from table_1
group by name) as q1
order by AddedDate desc,
Name;
Same correlated subquery approach to show every time a name was added and removed...
select Name,
AddedDate,
(select min(RemovedDate)
from table_2
where Name=t1.Name
and RemovedDate >= t1.AddedDate) as RemovedDate
from table_1 t1
order by AddedDate desc,
Name;
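To check the "every time a name was added and removed" query against the sample data, here is a small sketch using Python's built-in sqlite3 module. Assumptions: dates are stored as ISO-8601 text (so string comparison orders them correctly), the table names table_1 and table_2 follow the query above, and the function name added_and_removed is made up for this example.

```python
import sqlite3


def added_and_removed():
    """Pair each AddedDate with the earliest RemovedDate on or after it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table_1 (Name TEXT, AddedDate TEXT);
        CREATE TABLE table_2 (Name TEXT, RemovedDate TEXT);
        INSERT INTO table_1 VALUES
            ('Jane', '2017-02-01'), ('Bill', '2017-01-28'),
            ('Mike', '2017-01-15'), ('Jane', '2017-01-13');
        INSERT INTO table_2 VALUES
            ('Jane', '2017-02-03'), ('Mike', '2017-01-19'),
            ('Jane', '2017-01-14');
    """)
    rows = con.execute("""
        SELECT t1.Name,
               t1.AddedDate,
               (SELECT min(t2.RemovedDate)      -- earliest removal ...
                FROM table_2 t2
                WHERE t2.Name = t1.Name         -- ... for the same name ...
                  AND t2.RemovedDate >= t1.AddedDate  -- ... not before the add
               ) AS RemovedDate
        FROM table_1 t1
        ORDER BY t1.AddedDate DESC, t1.Name
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in added_and_removed():
        print(row)
```

Note that if a name were added twice before being removed once, both additions would be paired with the same removal date.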
You could use left join on name
select t1.name, t1.AddedDate, t2.RemovedDate
from table1 t1
left join table2 t2 on t1.name = t2.name
order by name, t1.AddedDate, t2.RemovedDate
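One thing to watch with a join on name alone: a name that was added and removed more than once gets every added date paired with every removed date. A quick sketch with Python's built-in sqlite3 module and the sample data from the OUTER APPLY answer (the function name left_join_on_name is made up for this example):

```python
import sqlite3


def left_join_on_name():
    """Run the plain LEFT JOIN on name and return all result rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (Name TEXT, AddedDate TEXT);
        CREATE TABLE table2 (Name TEXT, RemovedDate TEXT);
        INSERT INTO table1 VALUES
            ('Jane', '2017-02-01'), ('Bill', '2017-01-28'),
            ('Mike', '2017-01-15'), ('Jane', '2017-01-13');
        INSERT INTO table2 VALUES
            ('Jane', '2017-02-03'), ('Mike', '2017-01-19'),
            ('Jane', '2017-01-14');
    """)
    rows = con.execute("""
        SELECT t1.Name, t1.AddedDate, t2.RemovedDate
        FROM table1 t1
        LEFT JOIN table2 t2 ON t1.Name = t2.Name
        ORDER BY t1.Name, t1.AddedDate, t2.RemovedDate
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in left_join_on_name():
        print(row)
```

Jane has two additions and two removals, so she comes back as four rows rather than two. An extra date condition or a correlated subquery is needed to get one removal per addition.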
For SQL Server, use the script below:
;WITH CTE
AS
(
SELECT
SeqNo = ROW_NUMBER() OVER(PARTITION BY T1.Name ORDER BY T1.AddedDate DESC,T2.RemovedDate DESC),
T1.Name,
T1.AddedDate,
T2.RemovedDate
FROM Table1 T1
LEFT JOIN Table2 T2
ON LTRIM(RTRIM(T1.name)) = LTRIM(RTRIM(T2.name))
)
SELECT
*
FROM CTE
WHERE SeqNo = 1
If you want all the records for a name (for example, if a name was added and removed multiple times and you want each of the dates), just execute it without the
WHERE SeqNo = 1
part

Joining table to itself and selecting values that don't match

I want to get all data in ids 1-3 that are NOT in ids > 6.
I'm using ids for simplicity, but I'm really using timestamps.
CREATE TABLE test (
id bigint(20) NOT NULL AUTO_INCREMENT,
data varchar(3) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO test (id, data) VALUES
(1, 'abc'),
(2, 'def'),
(3, 'ghi'),
(4, 'jkl'),
(5, 'mno'),
(6, 'pqr'),
(7, 'def'),
(8, 'vxw'),
(9, 'yz');
One query of the dozens that I've tried.
SELECT
t1.data t1_data,
t2.data t2_data
FROM test t1
JOIN test t2
ON t2.id BETWEEN 1 AND 3
AND t1.id > 6
AND t1.data <> t2.data
So I want to get this result:
+----------+
| data |
+----------+
| abc |
| ghi |
+----------+
SELECT t1.data AS t1_data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
AND NOT EXISTS(
SELECT *
FROM test t2
WHERE t2.data = t1.data
AND t2.id > 6
);
This is an example of a set-within-sets subquery. I like to approach these using aggregation with the having clause, because this is the most general approach. In your case:
select t1.data
from test t1
group by t1.data
having sum(id between 1 and 3) > 0 and
sum(id > 6) = 0;
The conditions in the having clause count the number of rows that meet each condition. The first says that there is at least one row (for a given data) with the id between 1 and 3. The second says there are no rows where the id is greater than 6.
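Here is a small sketch of the HAVING approach that you can run with Python's built-in sqlite3 module. It relies on comparisons evaluating to 0 or 1, which holds in SQLite as it does in MySQL; the function name data_only_in_early_ids is made up for this example.

```python
import sqlite3


def data_only_in_early_ids():
    """Return data values seen in ids 1-3 that never appear in ids > 6."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT NOT NULL);
        INSERT INTO test (id, data) VALUES
            (1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl'), (5, 'mno'),
            (6, 'pqr'), (7, 'def'), (8, 'vxw'), (9, 'yz');
    """)
    rows = con.execute("""
        SELECT t1.data
        FROM test t1
        GROUP BY t1.data
        HAVING sum(t1.id BETWEEN 1 AND 3) > 0   -- at least one row in ids 1-3
           AND sum(t1.id > 6) = 0               -- no row with id > 6
        ORDER BY t1.data
    """).fetchall()
    con.close()
    return [data for (data,) in rows]


if __name__ == "__main__":
    print(data_only_in_early_ids())
```

'def' is dropped because it appears at id 2 and again at id 7.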
You can use a NOT EXISTS clause:
SELECT DISTINCT t1.data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
AND NOT EXISTS
(
SELECT 1
FROM test t2
WHERE t2.data = t1.data
AND t2.id > 6
);
I'm using DISTINCT here because I assume it's possible for example to have a data value with id=2 and the same data value with id=3. Remove it as necessary.
There are a couple of ways to do it (performance-wise, an outer join might be best), but conceptually it is this:
SELECT t1.data
FROM test t1
WHERE t1.id < 4
AND t1.data NOT IN
(SELECT t2.data
FROM test t2
WHERE t2.id > 6)
The outer join version would look like this:
SELECT t1.data
FROM test t1 LEFT OUTER JOIN test t2
ON t1.data = t2.data AND t2.id > 6
WHERE t1.id < 4 AND t2.id IS NULL
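With the outer join form, it matters whether the t1.id filter sits in the ON clause or in the WHERE clause. The sketch below runs both placements with Python's built-in sqlite3 module; the function name anti_join_variants is made up for this example.

```python
import sqlite3


def anti_join_variants():
    """Return (rows with t1 filter in ON, rows with t1 filter in WHERE)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT NOT NULL);
        INSERT INTO test (id, data) VALUES
            (1, 'abc'), (2, 'def'), (3, 'ghi'), (4, 'jkl'), (5, 'mno'),
            (6, 'pqr'), (7, 'def'), (8, 'vxw'), (9, 'yz');
    """)
    # Filter on t1 inside ON: rows with t1.id >= 4 never match anything,
    # so the LEFT JOIN keeps them with a NULL t2 and they pass the WHERE.
    filter_in_on = con.execute("""
        SELECT t1.data
        FROM test t1
        LEFT OUTER JOIN test t2
          ON t1.data = t2.data AND t1.id < 4 AND t2.id > 6
        WHERE t2.id IS NULL
        ORDER BY t1.id
    """).fetchall()
    # Filter on t1 inside WHERE: only ids 1-3 are considered at all.
    filter_in_where = con.execute("""
        SELECT t1.data
        FROM test t1
        LEFT OUTER JOIN test t2
          ON t1.data = t2.data AND t2.id > 6
        WHERE t1.id < 4 AND t2.id IS NULL
        ORDER BY t1.id
    """).fetchall()
    con.close()
    return ([d for (d,) in filter_in_on], [d for (d,) in filter_in_where])


if __name__ == "__main__":
    in_on, in_where = anti_join_variants()
    print(in_on)
    print(in_where)
```

Only the second placement gives the two rows asked for in the question.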

How to use select into statement?

I want to insert records from Table1 and Table2 into Table3, which has two columns:
studentId
subjectId
I want to insert these two values from Table1 (which contains 1000 student IDs) and from Table2 (which contains 5 subjects). To achieve that I used the following query, but it gave me an error.
Query:
INSERT INTO StudentSubject(studentId,subjectId)
SELECT studentId FROM Table1 UNION SELECT subjectId FROM Table2
But I got this error message:
Msg 120, Level 15, State 1, Line 1
The select list for the INSERT statement contains fewer items than the insert list. The number of SELECT values must match the number of INSERT columns.
INSERT into StudentSubject(studentId,subjectId)
SELECT a.studentId,b.subjectId
FROM Table1 a CROSS JOIN Table2 b
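The CROSS JOIN pairs every student with every subject, so the SELECT produces two columns to match the two INSERT columns. A minimal sketch with Python's built-in sqlite3 module and a few rows instead of the real tables (the function name fill_student_subject and the sample IDs are made up for this example):

```python
import sqlite3


def fill_student_subject(student_ids, subject_ids):
    """Insert every (student, subject) combination and return the rows."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Table1 (studentId INTEGER);
        CREATE TABLE Table2 (subjectId INTEGER);
        CREATE TABLE StudentSubject (studentId INTEGER, subjectId INTEGER);
    """)
    con.executemany("INSERT INTO Table1 VALUES (?)", [(s,) for s in student_ids])
    con.executemany("INSERT INTO Table2 VALUES (?)", [(s,) for s in subject_ids])
    # One INSERT ... SELECT: the cross join yields len(students) * len(subjects) rows.
    con.execute("""
        INSERT INTO StudentSubject (studentId, subjectId)
        SELECT a.studentId, b.subjectId
        FROM Table1 a CROSS JOIN Table2 b
    """)
    rows = con.execute("""
        SELECT studentId, subjectId
        FROM StudentSubject
        ORDER BY studentId, subjectId
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(fill_student_subject([1, 2, 3], [10, 20]))
```

With 1000 students and 5 subjects, the same statement would insert 5000 rows.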

MySQL query: Single Table multiple comparisons

I have the following mysql table:
id | member |
1 | abc
1 | pqr
2 | xyz
3 | pqr
3 | abc
I have been trying to write a query which would return the id which has exact same members as a given id. For example, if given id is 1 then the query should return 3 because both id 1 and id 3 have exact same members viz. {abc, pqr}. Any pointers? Appreciate it.
EDIT: The table may have duplicates, e.g. id 3 may have members {abc, abc} instead of {pqr, abc}, in which case the query should not return id 3.
Here's a solution that finds matching pairs for the entire table - you can add a where clause to filter as needed. Basically it does a self-join based on equal "member" and unequal "id". It then compares the resulting count grouped by the 2 ids and compares them to the total count of those ids from the original table. If they both match, it means they have the same exact members.
select
t1.id, t2.id
from
table t1
inner join table t2
on t1.member = t2.member
and t1.id < t2.id
inner join (select id, count(1) as cnt from table group by id) c1
on t1.id = c1.id
inner join (select id, count(1) as cnt from table group by id) c2
on t2.id = c2.id
group by
t1.id, t2.id, c1.cnt, c2.cnt
having
count(1) = c1.cnt
and count(1) = c2.cnt
order by
t1.id, t2.id
This is some sample data I used which returned matches of (1,3) and (6,7)
insert into table
values
(1, 'abc'), (1, 'pqr'), (2, 'xyz'), (3, 'pqr'), (3, 'abc'), (4, 'abc'), (5, 'pqr'),
(6, 'abc'), (6, 'def'), (6, 'ghi'), (7, 'abc'), (7, 'def'), (7, 'ghi')
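The pair-finding query and the sample data can be tried with Python's built-in sqlite3 module. In this sketch the table is called members, since table is a reserved word; that name and the function name matching_pairs are assumptions made for the example.

```python
import sqlite3


def matching_pairs():
    """Return (id, id) pairs whose member lists match, using the sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE members (id INTEGER, member TEXT);
        INSERT INTO members VALUES
            (1, 'abc'), (1, 'pqr'), (2, 'xyz'), (3, 'pqr'), (3, 'abc'),
            (4, 'abc'), (5, 'pqr'),
            (6, 'abc'), (6, 'def'), (6, 'ghi'),
            (7, 'abc'), (7, 'def'), (7, 'ghi');
    """)
    rows = con.execute("""
        SELECT t1.id, t2.id
        FROM members t1
        INNER JOIN members t2
                ON t1.member = t2.member   -- shared member
               AND t1.id < t2.id           -- each pair once, never an id with itself
        INNER JOIN (SELECT id, count(1) AS cnt FROM members GROUP BY id) c1
                ON t1.id = c1.id
        INNER JOIN (SELECT id, count(1) AS cnt FROM members GROUP BY id) c2
                ON t2.id = c2.id
        GROUP BY t1.id, t2.id, c1.cnt, c2.cnt
        HAVING count(1) = c1.cnt           -- shared count equals both totals
           AND count(1) = c2.cnt
        ORDER BY t1.id, t2.id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(matching_pairs())
```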
A similar approach (to Derek Kromm's) using sub-queries:
SELECT id
FROM mc a
WHERE
id != 1 AND
member IN (
SELECT member FROM mc WHERE id=1)
GROUP BY id
HAVING
COUNT(*) IN (
SELECT COUNT(*) FROM mc WHERE id=1) AND
COUNT(*) IN (
SELECT COUNT(*) FROM mc where id=a.id);
The logic here is that we need all ids that match the following 3 conditions:
every selected member is among those that belong to id 1
the total number of selected members is the same as the number of members that belong to id 1
the total number of selected members is equal to the total number of members for the current id
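The three conditions can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the table name mc comes from the query above, the sample rows are borrowed from the other answer, the given id is passed as a parameter, and the function name same_members_as is made up for the example. It assumes there are no duplicate rows.

```python
import sqlite3


def same_members_as(target_id):
    """Return ids whose member set equals that of target_id (no duplicates assumed)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE mc (id INTEGER, member TEXT);
        INSERT INTO mc VALUES
            (1, 'abc'), (1, 'pqr'), (2, 'xyz'), (3, 'pqr'), (3, 'abc'),
            (4, 'abc'), (5, 'pqr'),
            (6, 'abc'), (6, 'def'), (6, 'ghi'),
            (7, 'abc'), (7, 'def'), (7, 'ghi');
    """)
    rows = con.execute("""
        SELECT a.id
        FROM mc a
        WHERE a.id != ?
          AND a.member IN (SELECT member FROM mc WHERE id = ?)   -- condition 1
        GROUP BY a.id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM mc WHERE id = ?)     -- condition 2
           AND COUNT(*) = (SELECT COUNT(*) FROM mc WHERE id = a.id)  -- condition 3
        ORDER BY a.id
    """, (target_id, target_id, target_id)).fetchall()
    con.close()
    return [i for (i,) in rows]


if __name__ == "__main__":
    print(same_members_as(1))
```

Condition 3 is what rejects an id such as 6 when asking about id 4: it shares 'abc' but has extra members of its own.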
try this:
declare @id int
set @id=1
select a.id from
(select id,COUNT(*) cnt from sample_table
where member in (select member from sample_table where id=@id)
and id <>@id
group by id)a
join
(select count(distinct member) cnt from sample_table where id=@id)b
on a.cnt=b.cnt

Why does SELECT results differ between mysql and sqlite?

I'm re-asking this question in a simplified and expanded manner.
Consider these sql statements:
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Using sqlite, the select statement returns:
id avg1
---------- ----------
106 4.5
107 4.0
and mysql returns:
+------+--------+
| id | avg1 |
+------+--------+
| 106 | 4.5000 |
+------+--------+
As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:
select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Why does sqlite return two records?
Quick update:
I ran the statement against the latest sqlite version (3.7.11) and still get two records.
Another update:
I sent an email to sqlite-users@sqlite.org about the issue.
Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).
To have three avg groups, I used the following statements:
create table foo (id VARCHAR(1), score INT);
insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);
PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;
select avg(score) avg1
from foo
group by id
having not exists (
select avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
We clearly see that somehow what should be r:4.5 has become i:5:
I'm now trying to see why that is.
Final edit:
So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:
http://www.sqlite.org/src/info/430bb59d79
Interestingly, to me at least, it seems that newer versions (sometime after the version I'm using) support inserting multiple records in one statement, as used in a test case added in the aforementioned commit:
CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);
I tried some variants of the query.
It seems that sqlite has a bug when a previously declared field is used in a nested HAVING expression.
In your example, avg1 inside the second HAVING is always equal to 5.0.
Look:
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);
This one returns nothing, but execution of the following query returns both records:
...
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);
I cannot find any similar bug in the sqlite tickets list.
Let's look at this two ways. I'll use Postgres 9.0 as my reference database.
(1)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for
-- any row in T1
having avg2 > avg1);
id | avg1
-----+--------------------
106 | 4.5000000000000000
(1 row)
then let's move some of the logic inside the subquery, getting rid of the 'not' :
(2)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal to the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows as we are in effect doing a
--cartesian join
-- with the 'having' only we don't display the cartesian row set
id | avg1
-----+--------------------
106 | 4.5000000000000000
107 | 4.0000000000000000
(2 rows)
So you have to ask yourself what you actually mean by this correlated subquery inside a HAVING clause. If it evaluates every row against every row from the primary query, we are making a cartesian join, and I don't think we should be pointing fingers at the SQL engine.
If you want every row that is less than the maximum average, what you should be saying is:
select T1.id, avg(T1.score) avg1
from foo T1 group by T1.id
having avg1 not in
(select max(avg1) from (select id,avg(score) avg1 from foo group by id) as t)
Have you tried this version? :
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg(T1.score));
Also this one (which should be giving same results):
select T1.*
from
( select id, avg(score) avg1
from foo
group by id
) T1
where not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg1);
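The derived-table form can be checked with Python's built-in sqlite3 module. In this sketch the outer average is a plain column of T1 by the time the NOT EXISTS runs, so the inner HAVING does not depend on an outer aggregate alias. The function name top_average_ids is made up for the example.

```python
import sqlite3


def top_average_ids():
    """Return (id, avg) rows for which no other id has a higher average score."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE foo (id INT, score INT);
        INSERT INTO foo VALUES (106, 4);
        INSERT INTO foo VALUES (107, 3);
        INSERT INTO foo VALUES (106, 5);
        INSERT INTO foo VALUES (107, 5);
    """)
    rows = con.execute("""
        SELECT T1.id, T1.avg1
        FROM (SELECT id, avg(score) AS avg1   -- averages computed once, up front
              FROM foo
              GROUP BY id) T1
        WHERE NOT EXISTS (
            SELECT T2.id
            FROM foo T2
            GROUP BY T2.id
            HAVING avg(T2.score) > T1.avg1    -- compare against a plain column
        )
        ORDER BY T1.id
    """).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(top_average_ids())
```

This returns only the row for id 106 (average 4.5), which matches the mysql result in the question.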
The query can also be handled with derived tables, instead of subquery in HAVING clause:
select ta.id, ta.avg1
from
( select id, avg(score) avg1
from foo
group by id
) ta
JOIN
( select avg(score) avg1
from foo
group by id
order by avg1 DESC
LIMIT 1
) tmp
ON tmp.avg1 = ta.avg1