Joining table to itself and selecting values that don't match - mysql

I want to get all data values from ids 1-3 that do NOT also appear in rows with id > 6.
I'm using ids for simplicity, but I'm really using timestamps.
CREATE TABLE test (
id bigint(20) NOT NULL AUTO_INCREMENT,
data varchar(3) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO test (id, data) VALUES
(1, 'abc'),
(2, 'def'),
(3, 'ghi'),
(4, 'jkl'),
(5, 'mno'),
(6, 'pqr'),
(7, 'def'),
(8, 'vxw'),
(9, 'yz');
One of the dozens of queries I've tried:
SELECT
t1.data t1_data,
t2.data t2_data
FROM test t1
JOIN test t2
ON t2.id BETWEEN 1 AND 3
AND t1.id > 6
AND t1.data <> t2.data
So I want to get this result:
+----------+
| data |
+----------+
| abc |
| ghi |
+----------+

SELECT t1.data AS t1_data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
AND NOT EXISTS(
SELECT *
FROM test t2
WHERE t2.data = t1.data
AND t2.id > 6
);

This is an example of a set-within-sets subquery. I like to approach these using aggregation with the having clause, because this is the most general approach. In your case:
select t1.data
from test t1
group by t1.data
having sum(id between 1 and 3) > 0 and
sum(id > 6) = 0;
The conditions in the having clause count the number of rows that meet each condition. The first says that there is at least one row (for a given data) with the id between 1 and 3. The second says there are no rows where the id is greater than 6.
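To see the HAVING approach run end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for MySQL is an assumption made only so the example is self-contained; both treat a comparison as 0 or 1 inside SUM(). The function name is hypothetical.

```python
import sqlite3

def data_only_in_first_range():
    # Assumption: SQLite is used as a stand-in for MySQL for this demo.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
    con.executemany(
        "INSERT INTO test (id, data) VALUES (?, ?)",
        [(1, "abc"), (2, "def"), (3, "ghi"), (4, "jkl"), (5, "mno"),
         (6, "pqr"), (7, "def"), (8, "vxw"), (9, "yz")],
    )
    rows = con.execute(
        """
        SELECT data
        FROM test
        GROUP BY data
        HAVING SUM(id BETWEEN 1 AND 3) > 0   -- at least one row with id in 1-3
           AND SUM(id > 6) = 0               -- no row with id > 6
        ORDER BY data
        """
    ).fetchall()
    con.close()
    return [r[0] for r in rows]

print(data_only_in_first_range())  # ['abc', 'ghi']
```

'def' is dropped because it appears at both id 2 and id 7, so its second sum is 1 rather than 0.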

You can use a NOT EXISTS clause:
SELECT DISTINCT t1.data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
AND NOT EXISTS
(
SELECT 1
FROM test t2
WHERE t2.data = t1.data
AND t2.id > 6
);
I'm using DISTINCT here because I assume it's possible for example to have a data value with id=2 and the same data value with id=3. Remove it as necessary.

There are a couple ways to do it (probably performance-wise an outer join might be best) but conceptually it is this:
SELECT t1.data
FROM test t1
WHERE t1.id < 4
AND t1.data NOT IN
(SELECT t2.data
FROM test t2
WHERE t2.id > 6)
The outer join version would look like this:
SELECT t1.data
FROM test t1 LEFT OUTER JOIN test t2
ON t1.data = t2.data AND t2.id > 6
WHERE t1.id < 4 AND t2.id IS NULL
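With a LEFT JOIN anti-join, it matters whether the t1.id filter sits in the ON clause or in the WHERE clause. The sketch below (Python's sqlite3 as an assumed stand-in for MySQL, hypothetical function name) runs both placements against the sample data so the difference is visible.

```python
import sqlite3

ROWS = [(1, "abc"), (2, "def"), (3, "ghi"), (4, "jkl"), (5, "mno"),
        (6, "pqr"), (7, "def"), (8, "vxw"), (9, "yz")]

def anti_join(filter_in_where):
    # Assumption: SQLite is used as a stand-in for MySQL for this demo.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
    con.executemany("INSERT INTO test (id, data) VALUES (?, ?)", ROWS)
    if filter_in_where:
        # t1 is restricted in WHERE, so only ids 1-3 can be returned.
        sql = """
            SELECT t1.data
            FROM test t1 LEFT JOIN test t2
              ON t1.data = t2.data AND t2.id > 6
            WHERE t1.id < 4 AND t2.id IS NULL
            ORDER BY t1.id
        """
    else:
        # t1.id < 4 in ON only controls matching; unmatched t1 rows
        # with id >= 4 still come back with t2.id NULL.
        sql = """
            SELECT t1.data
            FROM test t1 LEFT JOIN test t2
              ON t1.data = t2.data AND t1.id < 4 AND t2.id > 6
            WHERE t2.id IS NULL
            ORDER BY t1.id
        """
    rows = [r[0] for r in con.execute(sql)]
    con.close()
    return rows

print(anti_join(filter_in_where=True))   # ['abc', 'ghi']
print(anti_join(filter_in_where=False))  # also returns rows with id >= 4
```

A LEFT JOIN keeps every t1 row, so a condition on t1 only restricts the result when it is in WHERE.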

Related

SQL Group_concat not get all data

I have two tables, and the second table references the first through a relation column.
table1
id name
---------
1 alpha
2 beta
table2
id name relation
-------------------
1 2015 2
2 2016 2
3 2017 2
4 2018 2
I want to see
name data
-------------------------
beta 2015,2016,2017,2018
alpha NULL
I tried the following SQL query, but the output is not what I wanted:
SELECT
t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',')
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t2.relation = t1.id
Output:
alpha 2015,2016,2017,2018
alpha has no matching rows in the related table; the values in the output belong to beta.

You need GROUP BY:
SELECT t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',')
FROM table1 t1 LEFT JOIN
table2 t2
ON t2.relation = t1.id
GROUP BY t1.name;
In most databases (and recent versions of MySQL with default settings), your query would fail. It is an aggregation query (because of the GROUP_CONCAT()), but t1.name is neither an argument to an aggregation function nor a GROUP BY key.
Older versions of MySQL, and newer ones with ONLY_FULL_GROUP_BY disabled, do allow this type of query. It returns exactly one row, and the value of t1.name in that row comes from an arbitrary row.
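A minimal runnable sketch of the grouped version, using Python's sqlite3 module. Two assumptions: SQLite stands in for MySQL, and SQLite's group_concat(expr, separator) is used in place of MySQL's GROUP_CONCAT(expr SEPARATOR ','). The function name is hypothetical.

```python
import sqlite3

def names_with_years():
    # Assumption: SQLite is used as a stand-in for MySQL for this demo.
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE table1 (id INTEGER, name TEXT);
        INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta');
        CREATE TABLE table2 (id INTEGER, name INTEGER, relation INTEGER);
        INSERT INTO table2 VALUES
            (1, 2015, 2), (2, 2016, 2), (3, 2017, 2), (4, 2018, 2);
        """
    )
    rows = con.execute(
        """
        SELECT t1.name, GROUP_CONCAT(t2.name, ',') AS data
        FROM table1 t1
        LEFT JOIN table2 t2 ON t2.relation = t1.id
        GROUP BY t1.name          -- one output row per table1 name
        ORDER BY t1.name
        """
    ).fetchall()
    con.close()
    return rows

for name, data in names_with_years():
    # alpha has no related rows, so its data is None (SQL NULL);
    # the order of the years inside beta's string is not guaranteed.
    print(name, data)
```

If the order of the concatenated values matters, MySQL accepts an ORDER BY inside GROUP_CONCAT().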

No FKs for fiddle:
CREATE TABLE Table1 (`id` int, `name` varchar(5)) ;
INSERT INTO Table1
(`id`, `name`)
VALUES
(1, 'alpha'),
(2, 'beta')
;
CREATE TABLE Table2 (`id` int, `name` int, `relation` int);
INSERT INTO Table2
(`id`, `name`, `relation`)
VALUES
(1, 2015, 2),
(2, 2016, 2),
(3, 2017, 2),
(4, 2018, 2)
;
Statement:
SELECT
t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',') -- missing an AS .... => ugly name from fiddle
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t2.relation = t1.id
group by t1.name
Output:
name GROUP_CONCAT(t2.name SEPARATOR ',')
alpha (null)
beta 2017,2018,2015,2016

Fetching top data from the table for each primary key

I have a series of foreign keys, and each key corresponds to more than one row in the table. How can I fetch only the top row for each key that matches the specified condition?
I have table like this
ID NAME DATE
----------------------
1 abc 5/10/15
1 abc 6/11/15
2 pqr 7/11/15
2 pqr 8/10/15
3 xyz 9/12/15
I need the output to look like this,
where the condition is date > 5/11/15 and ID in (1,2):
ID NAME DATE
-----------------
1 abc 6/11/15
2 pqr 7/11/15

You can do what you want using row_number(). I'm not sure exactly what you want though. My best guess is getting the row with the smallest date that meets the conditions:
select t.*
from (select t.*,
row_number() over (partition by id order by date) as seqnum
from t
where date > '2015-11-05' and id in (1, 2)
) t
where seqnum = 1;

Use NOT EXISTS to return a row as long as no other qualifying row has the same name and an earlier date:
select t1.*
from tablename t1
where not exists (select * from tablename t2
where t2.name = t1.name
and t2.date < t1.date
and t2.date > '5/11/15' and t2.ID in (1,2))
and t1.date > '5/11/15' and t1.ID in (1,2)
JOIN alternative, perhaps better MySQL answer:
select t1.*
from tablename t1
join (select name, min(date) as date from tablename
where date > '5/11/15' and ID in (1,2)
group by name) as t2
on t1.name = t2.name and t1.date = t2.date
where t1.date > '5/11/15' and t1.ID in (1,2)
Core SQL-99.
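The JOIN-on-MIN(date) idea can be tried out with the sketch below, written against Python's sqlite3 module. Assumptions: SQLite stands in for MySQL, the question's day/month/year dates are stored as ISO strings so that comparisons order correctly, and one extra row (marked in the code) is added so that MIN() has more than one candidate to choose from. The function name is hypothetical.

```python
import sqlite3

def earliest_matching_rows():
    # Assumption: SQLite is used as a stand-in for MySQL for this demo.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (id INTEGER, name TEXT, date TEXT)")
    con.executemany(
        "INSERT INTO tablename VALUES (?, ?, ?)",
        [
            (1, "abc", "2015-10-05"),
            (1, "abc", "2015-11-06"),
            (2, "pqr", "2015-11-07"),
            (2, "pqr", "2015-10-08"),
            (3, "xyz", "2015-12-09"),
            (1, "abc", "2015-11-20"),  # hypothetical extra row, not in the question
        ],
    )
    rows = con.execute(
        """
        SELECT t1.id, t1.name, t1.date
        FROM tablename t1
        JOIN (SELECT name, MIN(date) AS first_date   -- earliest qualifying date
              FROM tablename
              WHERE date > '2015-11-05' AND id IN (1, 2)
              GROUP BY name) t2
          ON t1.name = t2.name AND t1.date = t2.first_date
        ORDER BY t1.id
        """
    ).fetchall()
    con.close()
    return rows

print(earliest_matching_rows())
# [(1, 'abc', '2015-11-06'), (2, 'pqr', '2015-11-07')]
```

Because the derived table already applies the date and id conditions, the outer query only needs the join to pick the matching rows.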

Query group by 2 column

I have 1 table with 4 columns
id, name, key, date
1,'A' ,'x1','2015-11-11'
2,'A' ,'x1','2015-11-11'
3,'B' ,'x2','2015-11-11'
4,'B' ,'x2','2015-11-11'
5,'A' ,'x1','2015-11-12'
6,'A' ,'x1','2015-11-12'
7,'B' ,'x2','2015-11-12'
8,'B' ,'x2','2015-11-12'
9,'D' ,'x3','2015-11-12'
I want to group by [key] and [date]. The result I want is:
2015-11-11 2
2015-11-12 1
2: date 2015-11-11 has 4 rows (1,2,3,4) but with duplicate keys, so after grouping there are only 2 rows.
1: date 2015-11-12 has 5 rows (5,6,7,8,9), but 4 of them (5,6,7,8) repeat keys already seen on 2015-11-11 and I don't want to count those, so only 1 row (9) remains.
I'm sorry for my English. I hope you can understand my question.
Please help me in any way you can. I'm using MySQL.

select t1.key, t1.date, (select count(*) from tablename t2
where t2.key = t1.key
and t2.date = t1.date
and not exists (select 1 from tablename t3
where t3.key = t2.key
and t3.date < t2.date))
from tablename t1
You can use a correlated sub-query to count that date's keys. A key is not counted if it has already been found for an older date.
Alternative solution:
select t1.key, t1.date, count(*)
from tablename t1
LEFT JOIN (select `key`, min(date) as date from tablename group by `key`) t2
ON t2.key = t1.key and t2.date = t1.date
group by t1.key, t1.date
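For the two-row result shown in the question (one count per date), one possible reading is "count each key only on the first date it appears". The sketch below follows that reading and is not taken from the answers above. It uses Python's sqlite3 module as an assumed stand-in for MySQL, and the function name is hypothetical.

```python
import sqlite3

def new_keys_per_date():
    # Assumption: SQLite is used as a stand-in for MySQL for this demo.
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE tablename (id INTEGER, name TEXT, "key" TEXT, date TEXT)'
    )
    con.executemany(
        "INSERT INTO tablename VALUES (?, ?, ?, ?)",
        [
            (1, "A", "x1", "2015-11-11"), (2, "A", "x1", "2015-11-11"),
            (3, "B", "x2", "2015-11-11"), (4, "B", "x2", "2015-11-11"),
            (5, "A", "x1", "2015-11-12"), (6, "A", "x1", "2015-11-12"),
            (7, "B", "x2", "2015-11-12"), (8, "B", "x2", "2015-11-12"),
            (9, "D", "x3", "2015-11-12"),
        ],
    )
    rows = con.execute(
        """
        SELECT first_date, COUNT(*) AS new_keys
        FROM (SELECT "key", MIN(date) AS first_date   -- first date per key
              FROM tablename
              GROUP BY "key")
        GROUP BY first_date
        ORDER BY first_date
        """
    ).fetchall()
    con.close()
    return rows

print(new_keys_per_date())  # [('2015-11-11', 2), ('2015-11-12', 1)]
```

In MySQL the derived table would need an alias and the key column would be quoted with backticks rather than double quotes.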

How can I add subtotal to table in MySQL?

Assume my table looks like the following:
id count sub_total
1 10 NULL
2 15 NULL
3 10 NULL
4 25 NULL
How can I update this table to look like the following?
id count sub_total
1 10 10
2 15 25
3 10 35
4 25 60
I can do this easy enough in the application layer. But I'd like to learn how to do it in MySQL. I've been trying lots of variations using SUM(CASE WHEN... and other groupings to no avail.

If your id field is sequential and growing then a correlated subquery is one way:
select *, (select sum(count) from t where t.id <= t1.id)
from t t1
or as a join:
select t1.id, t1.count, sum(t2.count)
from t t1
join t t2 on t2.id <= t1.id
group by t1.id, t1.count
order by t1.id
To update your table (assuming the column sub_total already exists):
update t
join (
select t1.id, sum(t2.count) st
from t t1
join t t2 on t2.id <= t1.id
group by t1.id
) t3 on t.id = t3.id
set t.sub_total = t3.st;
Sample SQL Fiddle showing the update.
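A small runnable sketch of the running subtotal, using Python's sqlite3 module as an assumed stand-in for MySQL. SQLite does not accept MySQL's UPDATE ... JOIN syntax, so the update here uses a correlated subquery instead; the arithmetic is the same sum over all rows with an id less than or equal to the current one. The function name is hypothetical.

```python
import sqlite3

def fill_sub_totals():
    # Assumption: SQLite is used as a stand-in for MySQL for this demo.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, count INTEGER, sub_total INTEGER)")
    con.executemany(
        "INSERT INTO t (id, count) VALUES (?, ?)",
        [(1, 10), (2, 15), (3, 10), (4, 25)],
    )
    # For each row, sum the counts of all rows up to and including it.
    con.execute(
        """
        UPDATE t
        SET sub_total = (SELECT SUM(t2.count)
                         FROM t AS t2
                         WHERE t2.id <= t.id)
        """
    )
    rows = con.execute("SELECT id, count, sub_total FROM t ORDER BY id").fetchall()
    con.close()
    return rows

print(fill_sub_totals())
# [(1, 10, 10), (2, 15, 25), (3, 10, 35), (4, 25, 60)]
```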

Why does SELECT results differ between mysql and sqlite?

I'm re-asking this question in a simplified and expanded manner.
Consider these sql statements:
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Using sqlite, the select statement returns:
id avg1
---------- ----------
106 4.5
107 4.0
and mysql returns:
+------+--------+
| id | avg1 |
+------+--------+
| 106 | 4.5000 |
+------+--------+
As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:
select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Why does sqlite return two records?
Quick update:
I ran the statement against the latest sqlite version (3.7.11) and still get two records.
Another update:
I sent an email to sqlite-users#sqlite.org about the issue.
Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).
To have three avg groups, I used the following statements:
create table foo (id VARCHAR(1), score INT);
insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);
PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;
select avg(score) avg1
from foo
group by id
having not exists (
select avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
We clearly see that somehow what should be r:4.5 has become i:5:
I'm now trying to see why that is.
Final edit:
So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:
http://www.sqlite.org/src/info/430bb59d79
Interestingly, to me at least, it seems that newer versions (released some time after the version I'm using) support inserting multiple records in one statement, as used in a test case added in the aforementioned commit:
CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);

I tried some variants of the query.
It seems that sqlite has a bug when a previously declared column alias is used in a nested HAVING expression.
In your example, avg1 inside the second HAVING is always equal to 5.0.
Look:
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);
This one returns nothing, but execution of the following query returns both records:
...
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);
I cannot find any similar bug in the sqlite ticket list.

Let's look at this two ways; I'll use postgres 9.0 as my reference database.
(1)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for
-- any row in T1
having avg2 > avg1);
id | avg1
-----+--------------------
106 | 4.5000000000000000
(1 row)
then let's move some of the logic inside the subquery, getting rid of the 'not' :
(2)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal than the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows as we are in effect doing a
--cartesian join
-- with the 'having' only we don't display the cartesian row set
id | avg1
-----+--------------------
106 | 4.5000000000000000
107 | 4.0000000000000000
(2 rows)
So you have got to ask yourself what you actually mean by this correlated subquery inside a HAVING clause. If it evaluates every row against every row from the primary query, we are making a cartesian join, and I don't think we should be pointing fingers at the SQL engine.
If you want every row that is less than the maximum average, what you should be saying is:
select T1.id, avg(T1.score) avg1
from foo T1 group by T1.id
having avg1 not in
(select max(avg1) from (select id,avg(score) avg1 from foo group by id) t)

Have you tried this version? :
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg(T1.score));
Also this one (which should be giving same results):
select T1.*
from
( select id, avg(score) avg1
from foo
group by id
) T1
where not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg1);
The query can also be handled with derived tables, instead of subquery in HAVING clause:
select ta.id, ta.avg1
from
( select id, avg(score) avg1
from foo
group by id
) ta
JOIN
( select avg(score) avg1
from foo
group by id
order by avg1 DESC
LIMIT 1
) tmp
ON tmp.avg1 = ta.avg1
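The derived-table idea can be checked with the sketch below, which computes the per-id averages once and keeps the ids whose average equals the maximum. It is a variation on the query above (MAX() instead of ORDER BY ... LIMIT 1), run through Python's sqlite3 module, and it avoids referencing an outer alias inside a nested HAVING. The function name is hypothetical.

```python
import sqlite3

def ids_with_top_average():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE foo (id INT, score INT)")
    con.executemany(
        "INSERT INTO foo VALUES (?, ?)",
        [(106, 4), (107, 3), (106, 5), (107, 5)],
    )
    rows = con.execute(
        """
        SELECT ta.id, ta.avg1
        FROM (SELECT id, AVG(score) AS avg1      -- average per id
              FROM foo
              GROUP BY id) ta
        WHERE ta.avg1 = (SELECT MAX(avg1)        -- highest of those averages
                         FROM (SELECT AVG(score) AS avg1
                               FROM foo
                               GROUP BY id))
        ORDER BY ta.id
        """
    ).fetchall()
    con.close()
    return rows

print(ids_with_top_average())  # [(106, 4.5)]
```

Unlike LIMIT 1, comparing against MAX() returns every id that ties for the top average.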
