Why do SELECT results differ between MySQL and SQLite?

I'm re-asking this question in a simplified and expanded manner.
Consider these sql statements:
create table foo (id INT, score INT);
insert into foo values (106, 4);
insert into foo values (107, 3);
insert into foo values (106, 5);
insert into foo values (107, 5);
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Using sqlite, the select statement returns:
id avg1
---------- ----------
106 4.5
107 4.0
and mysql returns:
+------+--------+
| id | avg1 |
+------+--------+
| 106 | 4.5000 |
+------+--------+
As far as I can tell, mysql's results are correct, and sqlite's are incorrect. I tried to cast to real with sqlite as in the following but it returns two records still:
select T1.id, cast(avg(cast(T1.score as real)) as real) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, cast(avg(cast(T2.score as real)) as real) avg2
from foo T2
group by T2.id
having avg2 > avg1);
Why does sqlite return two records?
Quick update:
I ran the statement against the latest sqlite version (3.7.11) and still get two records.
Another update:
I sent an email to sqlite-users@sqlite.org about the issue.
Myself, I've been playing with VDBE and found something interesting. I split the execution trace of each loop of not exists (one for each avg group).
To have three avg groups, I used the following statements:
create table foo (id VARCHAR(1), score INT);
insert into foo values ('c', 1.5);
insert into foo values ('b', 5.0);
insert into foo values ('a', 4.0);
insert into foo values ('a', 5.0);
PRAGMA vdbe_listing = 1;
PRAGMA vdbe_trace=ON;
select avg(score) avg1
from foo
group by id
having not exists (
select avg(T2.score) avg2
from foo T2
group by T2.id
having avg2 > avg1);
We clearly see that somehow what should be r:4.5 has become i:5:
I'm now trying to see why that is.
Final edit:
So I've been playing enough with the sqlite source code. I understand the beast much better now, although I'll let the original developer sort it out as he seems to already be doing it:
http://www.sqlite.org/src/info/430bb59d79
Interestingly, to me at least, it seems that newer versions (released some time after the one I'm using) support inserting multiple records in a single statement, as used in a test case added in the aforementioned commit:
CREATE TABLE t34(x,y);
INSERT INTO t34 VALUES(106,4), (107,3), (106,5), (107,5);

I tried a few variants of the query.
It seems that sqlite mishandles references to previously declared aliases inside a nested HAVING expression.
In your example, avg1 inside the second HAVING is always equal to 5.0.
Look:
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 = 5.0);
This one returns nothing, but execution of the following query returns both records:
...
having not exists (
SELECT 1 AS col1 GROUP BY col1 HAVING avg1 <> 5.0);
I cannot find any similar bug in the sqlite ticket list.

Let's look at this two ways. I'll use postgres 9.0 as my reference database.
(1)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we don't have any rows from T2
having not exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score for any row is greater than the average for
-- any row in T1
having avg2 > avg1);
id | avg1
-----+--------------------
106 | 4.5000000000000000
(1 row)
then let's move some of the logic inside the subquery, getting rid of the 'not' :
(2)
-- select rows from foo
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
-- where we do have rows from T2
having exists (
-- select rows from foo
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
-- where the average score is less than or equal to the average for any row in T1
having avg2 <= avg1);
-- I think this expression will be true for all rows as we are in effect doing a
--cartesian join
-- with the 'having' only we don't display the cartesian row set
id | avg1
-----+--------------------
106 | 4.5000000000000000
107 | 4.0000000000000000
(2 rows)
So you have to ask yourself what you actually mean by this correlated subquery inside a HAVING clause. If it evaluates every row against every row from the primary query, we are making a cartesian join, and I don't think we should be pointing fingers at the SQL engine.
If you want every row that is less than the maximum average, what you should be saying is:
select T1.id, avg(T1.score) avg1
from foo T1 group by T1.id
having avg1 not in
(select max(avg1) from (select id, avg(score) avg1 from foo group by id) t)

Have you tried this version? :
select T1.id, avg(T1.score) avg1
from foo T1
group by T1.id
having not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg(T1.score));
Also this one (which should be giving same results):
select T1.*
from
( select id, avg(score) avg1
from foo
group by id
) T1
where not exists (
select T2.id, avg(T2.score) avg2
from foo T2
group by T2.id
having avg(T2.score) > avg1);
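Here is a minimal sketch for checking the derived-table rewrite above. It assumes Python's built-in sqlite3 module and an in-memory database, neither of which is part of the original post. Because avg1 is an ordinary column of T1 here, rather than an alias referenced from a nested HAVING, the result should not depend on how the engine resolves aliases:

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo (id INT, score INT)")
conn.executemany(
    "insert into foo values (?, ?)",
    [(106, 4), (107, 3), (106, 5), (107, 5)],
)

# Derived-table form: avg1 is a real column of T1, so the inner HAVING
# compares against a plain value instead of an outer aggregate alias.
query = """
select T1.id, T1.avg1
from (select id, avg(score) avg1 from foo group by id) T1
where not exists (
    select T2.id
    from foo T2
    group by T2.id
    having avg(T2.score) > T1.avg1)
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the group with the highest average should survive, which matches what mysql and postgres report for the original query.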
The query can also be handled with derived tables instead of a subquery in the HAVING clause:
select ta.id, ta.avg1
from
( select id, avg(score) avg1
from foo
group by id
) ta
JOIN
( select avg(score) avg1
from foo
group by id
order by avg1 DESC
LIMIT 1
) tmp
ON tmp.avg1 = ta.avg1

MySQL: How to find sequence of values in column

I have a long list of rows with random values:
| id | value |
|----|-------|
| 1 | abcd |
| 2 | qwer |
| 3 | jklm |
| 4 | yxcv |
| 5 | tzui |
Then I have an array of a few values:
array('qwer', 'jklm');
I need to know whether this sequence of values from the array already exists in the table, in the given order. In this case it does.
I tried concatenating all values from the table and from the array and matching the two strings. That works great with a few rows, but there are actually hundreds of thousands of rows in the table. I believe there should be a better solution.
If your list is short, you could just do a self-join and spell out the conditions for each joined table reference:
select t1.id from MyTable as t1 join MyTable as t2
where t1.value='qwer' and t2.value='jklm' and t1.id=t2.id-1;
This returns an empty set if there's no such sequence. And of course it assumes that the id numbers are consecutive (they are in your example, but in general that's a risky assumption).
This doesn't work well if your list gets really long. MySQL supports at most 61 table references in a single join.
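The self-join idea can be generalized by generating one table reference per list element. The sketch below is only an illustration: it uses Python's sqlite3 module and an in-memory copy of the sample table, and find_sequence is a hypothetical helper name, not something from the original answer. Like the query above, it assumes the id values are consecutive.

```python
import sqlite3

# In-memory copy of the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO MyTable (id, value) VALUES (?, ?)",
    [(1, "abcd"), (2, "qwer"), (3, "jklm"), (4, "yxcv"), (5, "tzui")],
)


def find_sequence(conn, values):
    """Return the starting ids where `values` appear in rows with consecutive ids."""
    if not values:
        return []
    joins = []
    conditions = ["t0.value = ?"]
    for i in range(1, len(values)):
        # One extra table reference per list element, offset by its position.
        joins.append(f"JOIN MyTable AS t{i} ON t{i}.id = t0.id + {i}")
        conditions.append(f"t{i}.value = ?")
    sql = (
        "SELECT t0.id FROM MyTable AS t0 "
        + " ".join(joins)
        + " WHERE "
        + " AND ".join(conditions)
        + " ORDER BY t0.id"
    )
    # The searched values are bound as parameters, in list order.
    return [row[0] for row in conn.execute(sql, list(values))]


print(find_sequence(conn, ["qwer", "jklm"]))
```

An empty result means the sequence does not occur. The number of joins grows with the list, so the engine's limit on table references still applies.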
Here's another solution, which works for any size list, but only if your id values are known to be consecutive:
select t1.id from MyTable as t1 join MyTable as t2
on t2.id between t1.id and t1.id+1
where t1.value = 'qwer' and t2.value in ('qwer','jklm')
group by t1.id
having group_concat(t2.value order by t2.id) = 'qwer,jklm';
The t1 row is the beginning of the potential matching sequence of rows, so it must match the first value in your list.
Then join to the t2 rows, which are the complete set of potentially matching rows.
The set of t2 rows is also limited to no more than N rows, where N is the number of values you're searching for (so for a longer list the upper bound of the BETWEEN becomes t1.id+N-1). But SQL has no way of forming a group based on a number of rows; we can only limit based on some value in the row. That's why this works only if your id values can be assumed to be consecutive.
This way you can do it for the whole set:
select value1, value2
from
(
select *
from (
SELECT [IMEPAC] value1 , ROW_NUMBER() over(order by [MATBR]) rn1
FROM [PACM]
) a1 join
(
SELECT [IMEPAC] value2 , ROW_NUMBER() over(order by [MATBR]) rn2
FROM [PACM]
) a2 on a1.rn1 = a2.rn2 + 1
) a
group by value1, value2
having count(*) > 1
It is written for MS SQL, but you can easily rewrite it to fit mysql too.
I ran this against a table with more than 400,000 rows, on IMEPAC, which is not part of any index, and it ran (first and only run) in 6 seconds.
Here is the MySQL version:
select value1, value2, count(*) count
from
(
select *
from (
SELECT @row_number1:= @row_number1 + 1 AS rn1, content as value1
FROM docs,(SELECT @row_number1:=0) AS t
order by id
) a1 join
(
SELECT @row_number2:= @row_number2 + 1 AS rn2, content value2
FROM docs,(SELECT @row_number2:=0) AS t
order by id
) a2 on a1.rn1 = a2.rn2 + 1
) a
group by value1, value2
having count(*) > 1;

SQL Group_concat not get all data

I have two tables, and the second table references the first through a relation column:
table1
id name
---------
1 alpha
2 beta
table2
id name relation
-------------------
1 2015 2
2 2016 2
3 2017 2
4 2018 2
I want to see
name data
-------------------------
beta 2015,2016,2017,2018
alpha NULL
I tried the following SQL query, but the output is not what I wanted:
SELECT
t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',')
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t2.relation = t1.id
Output:
alpha 2015,2016,2017,2018
Alpha has no related rows in the other table; the values in the output belong to beta.
You need GROUP BY:
SELECT t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',')
FROM table1 t1 LEFT JOIN
table2 t2
ON t2.relation = t1.id
GROUP BY t1.name;
In most databases (and recent versions of MySQL), your query would fail. It is an aggregation query (because of the GROUP_CONCAT()). But, t1.name is not an argument to an aggregation function and it is not a GROUP BY key.
MySQL does allow this type of query. It returns exactly one row. The value of t1.name on the one row in the result set comes from an arbitrary row.
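To see the effect of the GROUP BY on the sample data, here is a small sketch using Python's sqlite3 module (my choice for a self-contained check, not part of the answer). Note that SQLite writes the separator as a second argument, group_concat(x, ','), instead of MySQL's SEPARATOR keyword, and that the order of the concatenated values is not guaranteed, so the sketch sorts them before printing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INT, name VARCHAR(5));
INSERT INTO table1 VALUES (1, 'alpha'), (2, 'beta');
CREATE TABLE table2 (id INT, name INT, relation INT);
INSERT INTO table2 VALUES (1, 2015, 2), (2, 2016, 2), (3, 2017, 2), (4, 2018, 2);
""")

# With GROUP BY, each table1 row gets its own group; unmatched rows keep NULL.
query = """
SELECT t1.name, GROUP_CONCAT(t2.name, ',') AS data
FROM table1 AS t1
LEFT JOIN table2 AS t2 ON t2.relation = t1.id
GROUP BY t1.name
"""

# Normalize the concatenated string into a sorted list for stable comparison.
result = {
    name: (sorted(data.split(",")) if data is not None else None)
    for name, data in conn.execute(query)
}
print(result)
```

alpha comes back with NULL (None in Python) and beta with all four years, which is the output the question asked for.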
No FKs for fiddle:
CREATE TABLE Table1 (`id` int, `name` varchar(5)) ;
INSERT INTO Table1
(`id`, `name`)
VALUES
(1, 'alpha'),
(2, 'beta')
;
CREATE TABLE Table2 (`id` int, `name` int, `relation` int);
INSERT INTO Table2
(`id`, `name`, `relation`)
VALUES
(1, 2015, 2),
(2, 2016, 2),
(3, 2017, 2),
(4, 2018, 2)
;
Statement:
SELECT
t1.name,
GROUP_CONCAT(t2.name SEPARATOR ',') -- missing an AS .... => ugly name from fiddle
FROM table1 AS t1
LEFT JOIN table2 AS t2
ON t2.relation = t1.id
group by t1.name
Output:
name GROUP_CONCAT(t2.name SEPARATOR ',')
alpha (null)
beta 2017,2018,2015,2016

Joining two tables with date ranges

I have two SQL tables that contain start and end dates.
Table 1: Name, AddedDate
Table 2: Name, RemovedDate
I'm looking to join these two tables and dump the data into a temp table to show when a name was added and removed from a list.
The same name may have been added and removed multiple times.
Desired Output Example
- Name, AddedDate, RemovedDate
- Jane, 2017-02-01, 2017-02-03
- Bill, 2017-01-28, (blank)
- Mike, 2017-01-15, 2017-01-19
- Jane, 2017-01-13, 2017-01-14
Can someone please help? Thanks.
Another option is an OUTER APPLY (If SQL Server)
Example
Declare @Table1 table (Name varchar(25),AddedDate date)
Insert Into @Table1 Values
('Jane', '2017-02-01'),
('Bill', '2017-01-28'),
('Mike', '2017-01-15'),
('Jane', '2017-01-13')
Declare @Table2 table (Name varchar(25),RemovedDate date)
Insert Into @Table2 Values
('Jane', '2017-02-03'),
('Mike', '2017-01-19'),
('Jane', '2017-01-14')
Select A.Name
,A.AddedDate
,B.RemovedDate
From @Table1 A
Outer Apply (
Select RemovedDate=min(RemovedDate)
From @Table2
Where Name=A.Name
and RemovedDate>=A.AddedDate
) B
Returns
Name AddedDate RemovedDate
Jane 2017-02-01 2017-02-03
Bill 2017-01-28 NULL
Mike 2017-01-15 2017-01-19
Jane 2017-01-13 2017-01-14
Using correlated subquery to consider only the last time the name was added (and potentially removed)...
select Name,
AddedDate,
(select max(RemovedDate)
from table_2
where Name=q1.Name
and RemovedDate >= q1.AddedDate) as RemovedDate
from (select Name,
max(AddedDate) as AddedDate
from table_1
group by name) as q1
order by AddedDate desc,
Name;
Same correlated subquery approach to show every time a name was added and removed...
select Name,
AddedDate,
(select min(RemovedDate)
from table_2
where Name=t1.Name
and RemovedDate >= t1.AddedDate) as RemovedDate
from table_1 t1
order by AddedDate desc,
Name;
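The "every time a name was added and removed" variant can be checked against the sample data with a short sketch. This one uses Python's sqlite3 module with dates stored as ISO-8601 text (an assumption for the demo; ISO strings compare correctly as plain text), and the table names table_1 and table_2 follow the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (Name TEXT, AddedDate TEXT);
INSERT INTO table_1 VALUES
    ('Jane', '2017-02-01'), ('Bill', '2017-01-28'),
    ('Mike', '2017-01-15'), ('Jane', '2017-01-13');
CREATE TABLE table_2 (Name TEXT, RemovedDate TEXT);
INSERT INTO table_2 VALUES
    ('Jane', '2017-02-03'), ('Mike', '2017-01-19'), ('Jane', '2017-01-14');
""")

# For each added row, pick the earliest removal on or after the added date.
query = """
SELECT t1.Name,
       t1.AddedDate,
       (SELECT min(t2.RemovedDate)
        FROM table_2 AS t2
        WHERE t2.Name = t1.Name
          AND t2.RemovedDate >= t1.AddedDate) AS RemovedDate
FROM table_1 AS t1
ORDER BY t1.AddedDate DESC, t1.Name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This pairing assumes every add is followed by at most one removal before the same name is added again. If a name could be added twice in a row without a removal in between, both added rows would pick up the same removal date.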
You could use left join on name
select t1.name, t1.AddedDate, t2.RemovedDate
from table1 t1
left join table2 t2 on t1.name = t2.name
order by name, t1.AddedDate, t2.RemovedDate
For SQL Server Use the below script
;WITH CTE
AS
(
SELECT
SeqNo = ROW_NUMBER() OVER(PARTITION BY T1.Name ORDER BY T1.AddedDate DESC,T2.RemovedDate DESC),
T1.Name,
T1.AddedDate,
T2.RemovedDate
FROM Table1 T1
LEFT JOIN Table2 T2
ON LTRIM(RTRIM(T1.name)) = LTRIM(RTRIM(T2.name))
)
SELECT
*
FROM CTE
WHERE SeqNo = 1
If you want all the records for a Name, for example when a name was added and removed multiple times and you want each of the dates, just run the query without the WHERE SeqNo = 1 part.

mysql - Referencing alias for calculation after UNION

Ok, here's the query (pseudo-query):
SELECT *, (t1.field + t2.field) as 'result', (t1.field * t2.field) as result2 from((select as t1 limit 1) UNION ALL (select as t2 limit 1))
I need both rows returned, then do the math on the two fields into the result aliases. I know it's not graceful, but I have to kludge two queries together (the first is the union, and the second is the math)
So, how do I reference and use those two inner aliases? The inner aliases aren't accessible to the outer select.
I have a suspicion there's an obvious solution here that my brain is missing.
When you union two statements together your result is a single resultset. What you'll build:
FROM
(
(SELECT f1, f2 FROM table1 LIMIT 1)
UNION
(SELECT g1, g2 FROM table2 LIMIT 1)
) derived_table_1
This will give you a single result set named derived_table_1 with two fields named f1 and f2 respectively. There will be two rows, one from your first SELECT statement and another from your second. The table aliases that you assigned inside your UNION query are no longer referenceable. They exist only within their own SELECT statements.
If you have a relationship between Table1 and Table2 then you want a JOIN here:
SELECT
t1.f1 + t2.g1 as result1,
t1.f2 + t2.g2 as result2
FROM
table1 as t1
INNER JOIN table2 as t2 ON
t1.f1 = t2.g1
If instead no relationship exists, then you are probably looking for your original, kludgy union with a SUM in the SELECT:
SELECT
sum(derived_table_1.f1) as result,
sum(derived_table_1.f2) as result2
FROM
(
(SELECT f1, f2 FROM table1 LIMIT 1)
UNION
(SELECT g1, g2 FROM table2 LIMIT 1)
) derived_table_1
Edited to add a SQLFIDDLE with the last example: http://sqlfiddle.com/#!2/c8707/10
The column names or aliases for the result of a UNION are always determined by the first query. The column names or aliases defined in the subsequent queries of the union are ignored.
Demo:
mysql> create table foo ( a int, b int, c int );
mysql> insert into foo values (1,2,3);
mysql> create table bar (x int, y int, z int);
mysql> insert into bar values (4,5,6);
mysql> select a, b, c from (select a, b, c from foo union select x, y, z from bar) as t;
+------+------+------+
| a | b | c |
+------+------+------+
| 1 | 2 | 3 |
| 4 | 5 | 6 |
+------+------+------+
mysql> select x from (select a, b, c from foo union select x, y, z from bar) as t;
ERROR 1054 (42S22): Unknown column 'x' in 'field list'
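The same behaviour can be reproduced outside MySQL. The sketch below uses Python's sqlite3 module (an assumption for the demo, with the foo and bar tables from the session above) to show that the union's column names come from the first SELECT, that the second SELECT's names are not visible, and that summing over the union gives the combined totals. It uses UNION ALL, as in the question, because a plain UNION would collapse the two rows into one if they happened to be identical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table foo (a int, b int, c int);
insert into foo values (1, 2, 3);
create table bar (x int, y int, z int);
insert into bar values (4, 5, 6);
""")

union_sql = "select a, b, c from foo union all select x, y, z from bar"

# Column names of the derived table come from the first SELECT only.
cur = conn.execute(f"select * from ({union_sql}) as t")
column_names = [d[0] for d in cur.description]
rows = sorted(cur.fetchall())

# The second SELECT's names (x, y, z) are not visible to the outer query.
try:
    conn.execute(f"select x from ({union_sql}) as t")
    x_error = None
except sqlite3.OperationalError as exc:
    x_error = str(exc)

# Math across both rows is done with aggregates over the derived table.
totals = conn.execute(
    f"select sum(a) as result, sum(b) as result2 from ({union_sql}) as t"
).fetchone()

print(column_names, rows, x_error, totals)
```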

Joining table to itself and selecting values that don't match

I want to get all data in id's 1-3 that are NOT in id's > 6
I'm using id's for simplicity, but I'm really using timestamps.
CREATE TABLE test (
id bigint(20) NOT NULL AUTO_INCREMENT,
data varchar(3) NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO test (id, data) VALUES
(1, 'abc'),
(2, 'def'),
(3, 'ghi'),
(4, 'jkl'),
(5, 'mno'),
(6, 'pqr'),
(7, 'def'),
(8, 'vxw'),
(9, 'yz');
One query of the dozens that I've tried.
SELECT
t1.data t1_data,
t2.data t2_data
FROM test t1
JOIN test t2
ON t2.id BETWEEN 1 AND 3
AND t1.id > 6
AND t1.data <> t2.data
So I want to get this result:
+----------+
| data |
+----------+
| abc |
| ghi |
+----------+
SELECT t1.data AS t1_data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
AND NOT EXISTS(
SELECT *
FROM test t2
WHERE t2.data = t1.data
AND t2.id > 6
);
This is an example of a set-within-sets subquery. I like to approach these using aggregation with the having clause, because this is the most general approach. In your case:
select t1.data
from test t1
group by t1.data
having sum(id between 1 and 3) > 0 and
sum(id > 6) = 0;
The conditions in the having clause count the number of rows that meet each condition. The first says that there is at least one row (for a given data) with the id between 1 and 3. The second says there are no rows where the id is greater than 6.
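As a quick check of the HAVING approach, here is a sketch that runs it next to the NOT EXISTS form on the sample data. It assumes Python's sqlite3 module; SQLite, like MySQL, evaluates a comparison to 0 or 1, which is what makes sum(condition) count the matching rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO test (id, data) VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (3, "ghi"), (4, "jkl"), (5, "mno"),
     (6, "pqr"), (7, "def"), (8, "vxw"), (9, "yz")],
)

# Aggregation form: one group per data value, conditions counted with sum().
having_sql = """
SELECT data
FROM test
GROUP BY data
HAVING sum(id BETWEEN 1 AND 3) > 0
   AND sum(id > 6) = 0
ORDER BY data
"""

# NOT EXISTS form, for comparison.
not_exists_sql = """
SELECT DISTINCT t1.data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
  AND NOT EXISTS (SELECT 1 FROM test t2
                  WHERE t2.data = t1.data AND t2.id > 6)
ORDER BY t1.data
"""

having_rows = [r[0] for r in conn.execute(having_sql)]
not_exists_rows = [r[0] for r in conn.execute(not_exists_sql)]
print(having_rows, not_exists_rows)
```

Both forms should return abc and ghi; def is dropped because it also appears with id 7.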
You can use a NOT EXISTS clause:
SELECT DISTINCT t1.data
FROM test t1
WHERE t1.id BETWEEN 1 AND 3
AND NOT EXISTS
(
SELECT 1
FROM test t2
WHERE t2.data = t1.data
AND t2.id > 6
);
I'm using DISTINCT here because I assume it's possible for example to have a data value with id=2 and the same data value with id=3. Remove it as necessary.
There are a couple ways to do it (probably performance-wise an outer join might be best) but conceptually it is this:
SELECT t1.data
FROM test t1
WHERE t1.id < 4
AND t1.data NOT IN
(SELECT t2.data
FROM test t2
WHERE t2.id > 6)
The outer join version would look like this:
SELECT t1.data
FROM test t1 LEFT OUTER JOIN test t2
ON t1.data = t2.data and t2.id > 6
WHERE t1.id < 4 and t2.id IS NULL
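One detail worth checking with an outer-join anti-join is where the filter on t1 sits. In the ON clause of a LEFT JOIN, a condition on t1 only decides which rows find a match; it does not remove t1 rows from the result. The sketch below (Python's sqlite3 module, same sample data, both variable names chosen for the demo) compares the two placements:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO test (id, data) VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (3, "ghi"), (4, "jkl"), (5, "mno"),
     (6, "pqr"), (7, "def"), (8, "vxw"), (9, "yz")],
)

# Filter on t1 inside ON: rows with id >= 4 never match, so they come back
# with a NULL t2 and pass the IS NULL test.
filter_in_on = """
SELECT t1.data
FROM test t1 LEFT OUTER JOIN test t2
  ON t1.data = t2.data AND t1.id < 4 AND t2.id > 6
WHERE t2.id IS NULL
ORDER BY t1.id
"""

# Filter on t1 inside WHERE: only ids 1-3 are considered at all.
filter_in_where = """
SELECT t1.data
FROM test t1 LEFT OUTER JOIN test t2
  ON t1.data = t2.data AND t2.id > 6
WHERE t1.id < 4 AND t2.id IS NULL
ORDER BY t1.id
"""

on_rows = [r[0] for r in conn.execute(filter_in_on)]
where_rows = [r[0] for r in conn.execute(filter_in_where)]
print(on_rows)
print(where_rows)
```

Only the second placement gives the abc and ghi rows the question asks for; the first also returns every row outside ids 1-3.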