mysql when I join same table twice aggregate is wrong - mysql

I basically have a table that holds counts for every date. I want to create a query that gives me the total # of counts over the entire table, as well as the total for yesterday. But when I try to join the table twice, the aggregates are off. Below is how you can replicate the results.
CREATE TABLE a (id int primary key);
CREATE TABLE b (a_id int, b_id int, date date, count int, primary key (a_id,b_id,date));
INSERT INTO a VALUES (1);
INSERT INTO b VALUES (1, 1, UTC_DATE(), 5);
INSERT INTO b VALUES (1, 2, UTC_DATE(), 10);
INSERT INTO b VALUES (1, 1, UTC_DATE()-1, 7);
INSERT INTO b VALUES (1, 2, UTC_DATE()-1, 12);
SELECT A.id,SUM(B.count) AS total_count,SUM(Y.count) AS y FROM a AS A
LEFT JOIN b AS B ON (B.a_id=A.id)
LEFT JOIN b AS Y ON (Y.a_id=A.id AND Y.date=UTC_DATE()-1)
GROUP BY A.id;
Results in:
+----+-------------+------+
| id | total_count | y |
+----+-------------+------+
| 1 | 68 | 76 |
+----+-------------+------+
The correct result should be:
+----+-------------+------+
| id | total_count | y |
+----+-------------+------+
| 1 | 34 | 19 |
+----+-------------+------+
What's going on here? Is this a bug in MySQL, or am I not understanding how the joins work?

No, it's not a bug in MySQL.
Your JOIN conditions are generating "duplicate" rows. (Remove the aggregate functions and the GROUP BY, and you'll see what's happening.)
That row from table "a" matches four rows from table "b". That's all fine and good. But when you add the second join to "b" (aliased "y"), each of the two rows returned from "y" is matched to every one of the four rows from "b", so you wind up with a total of eight rows in your result set. That's why "total_count" is doubled (2 × 34 = 68) and "y" is quadrupled (4 × 19 = 76).
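To see the fan-out without a MySQL server at hand, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. The fixed dates '2024-01-02' and '2024-01-01' are assumptions that replace UTC_DATE() and UTC_DATE()-1; everything else mirrors the question's tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INT, b_id INT, date TEXT, count INT,
                    PRIMARY KEY (a_id, b_id, date));
    INSERT INTO a VALUES (1);
    -- Assumed fixed dates: '2024-01-02' plays "today", '2024-01-01' "yesterday".
    INSERT INTO b VALUES (1, 1, '2024-01-02', 5), (1, 2, '2024-01-02', 10),
                         (1, 1, '2024-01-01', 7), (1, 2, '2024-01-01', 12);
""")

# Same double join as in the question, but without SUM() and GROUP BY,
# so every joined row is visible.
rows = conn.execute("""
    SELECT a.id, b.count, y.count
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    LEFT JOIN b AS y ON y.a_id = a.id AND y.date = '2024-01-01'
""").fetchall()

row_count = len(rows)                  # 4 rows of b x 2 rows of y
total_count = sum(r[1] for r in rows)  # every b row is counted twice
y_total = sum(r[2] for r in rows)      # every y row is counted four times
print(row_count, total_count, y_total)
```

Summing the un-aggregated rows by hand reproduces the inflated 68 and 76 from the question.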
To get the result set you specify, you don't need to join that table "b" second time. Instead, just use a conditional test to determine whether that "count" should be included in the "y" total or not.
e.g.
SELECT a.id
, SUM(b.count) AS total_count
, SUM(IF(b.date=UTC_DATE()-1 ,b.count,0)) AS y
FROM a a
LEFT
JOIN b b ON (b.a_id=a.id)
GROUP BY a.id;
Note that the MySQL IF expression can be replaced with an equivalent ANSI CASE expression for improved portability:
, SUM(CASE WHEN b.date=UTC_DATE()-1 THEN b.count ELSE 0 END) AS y
If you did want to do JOIN to that "b" table a second time, you would want the JOIN condition to be such that a row from "y" would match, at most, ONE row from "b", so as not to introduce any duplicates. So you'd basically need the join condition to include all of the columns in the primary key.
(Note that the predicates in the join condition for table "y" guarantee that each row from "y" will match no more than ONE row from "b"):
SELECT a.id
, SUM(b.count) AS total_count
, SUM(y.count) AS y
FROM a a
LEFT
JOIN b b
ON b.a_id=a.id
LEFT
JOIN b y
ON y.a_id = b.a_id
AND y.b_id = b.b_id
AND y.date = b.date
AND y.date = UTC_DATE()-1
GROUP BY a.id;
To get the first statement to return an identical result set, with a potential NULL in place of a zero, you'd need to replace the '0' constant in the IF expression with 'NULL':
, SUM(IF(b.date=UTC_DATE()-1 ,b.count,NULL)) AS y
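A quick way to check the single-join approach is the sketch below, again using Python's sqlite3 rather than MySQL. It uses the CASE form because SQLite has no two-branch IF(), and the fixed date '2024-01-01' is an assumed stand-in for UTC_DATE()-1. With the question's data, yesterday's rows sum to 7 + 12 = 19.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (a_id INT, b_id INT, date TEXT, count INT,
                    PRIMARY KEY (a_id, b_id, date));
    INSERT INTO a VALUES (1);
    -- Assumed fixed dates in place of UTC_DATE() and UTC_DATE()-1.
    INSERT INTO b VALUES (1, 1, '2024-01-02', 5), (1, 2, '2024-01-02', 10),
                         (1, 1, '2024-01-01', 7), (1, 2, '2024-01-01', 12);
""")

# One join only; the CASE decides per row whether the count belongs to "y".
result = conn.execute("""
    SELECT a.id,
           SUM(b.count) AS total_count,
           SUM(CASE WHEN b.date = '2024-01-01' THEN b.count ELSE 0 END) AS y
    FROM a
    LEFT JOIN b ON b.a_id = a.id
    GROUP BY a.id
""").fetchone()
print(result)
```

Because each row of "b" appears exactly once in the joined set, neither total is multiplied.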

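Alternatively, aggregate each total in its own derived table, so that each one returns at most one row per a_id before it is joined:
SELECT A.id, B1.b_count AS total_count, Y.y_count AS y
FROM a AS A
LEFT JOIN (SELECT a_id, SUM(count) AS b_count FROM b
GROUP BY a_id) AS B1 ON (B1.a_id=A.id)
LEFT JOIN (SELECT a_id, SUM(count) AS y_count FROM b
WHERE date=UTC_DATE()-1
GROUP BY a_id) AS Y ON (Y.a_id=A.id);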

Related

Sql Query to find duplicates in 2 columns where the values in first column are same

I have a table where the first column contains States and the second column contains Zip Codes. I want to find duplicate Zip Codes within the same State. The first column can repeat freely, but I need to find the values in the second column that occur more than once for the same value in the first column.
Table :
+---+----+------+
| Z | A | B |
+---+----+------+
| 1 | GA | 1234 |
| 2 | GA | 321 |
| 3 | GA | 234 |
| 4 | GA | 9890 |
| 5 | GA | 1234 |
+---+----+------+
The query should return the zip code that has a duplicate, i.e. 1234. I have more than 10,000 records.
Thank You.
Try using a GROUP BY query and retain zip codes appearing in duplicate.
SELECT A, B
FROM yourTable
GROUP BY A, B
HAVING COUNT(*) > 1
Note that grouping by both state and zip code means a zip code is reported only when it appears more than once within the same state.
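As a small check of the GROUP BY / HAVING approach, here is a sketch using Python's sqlite3 in place of MySQL. The table name yourTable comes from the answer; the extra ('FL', 1234) row is a made-up addition to show that the same zip code in a different state is not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (Z INT, A CHAR(2), B INT);
    INSERT INTO yourTable VALUES
        (1, 'GA', 1234), (2, 'GA', 321), (3, 'GA', 234),
        (4, 'GA', 9890), (5, 'GA', 1234),
        (6, 'FL', 1234);  -- hypothetical row: same zip, different state
""")

# Keep only the (state, zip) pairs that occur more than once.
duplicates = conn.execute("""
    SELECT A, B
    FROM yourTable
    GROUP BY A, B
    HAVING COUNT(*) > 1
""").fetchall()
print(duplicates)
```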
Please try the following...
SELECT Z AS RecordNumber,
tblTable.A AS State,
tblTable.B AS ZipCode
FROM tblTable
JOIN ( SELECT A,
B
FROM tblTable
GROUP BY A,
B
HAVING COUNT( * ) > 1
) AS duplicatesFinder ON tblTable.A = duplicatesFinder.A
AND tblTable.B = duplicatesFinder.B
ORDER BY tblTable.A,
tblTable.B,
Z;
This statement starts with a subquery that selects every unique combination of State and Zip Code that occurs more than once in the source table (which I have called tblTable in the absence of the table's name).
The results of this subquery are then joined to the source table based on shared values of State and Zip Code. This JOIN effectively eliminates all records from the source table that have a unique State / Zip Code combination from our results dataset.
The list of duplicated States / Zip Codes is then returned along with the values of Z associated with each pairing.
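The join-back step can be tried out with the sketch below, which runs an equivalent statement through Python's sqlite3 instead of MySQL. The short aliases t and d are my own shorthand for tblTable and duplicatesFinder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblTable (Z INT, A CHAR(2), B INT);
    INSERT INTO tblTable (Z, A, B) VALUES
        (1, 'GA', 1234), (2, 'GA', 321), (3, 'GA', 234),
        (4, 'GA', 9890), (5, 'GA', 1234);
""")

# The derived table d lists duplicated (A, B) pairs; joining back to the
# source table recovers every individual record, including its Z value.
duplicate_rows = conn.execute("""
    SELECT t.Z, t.A, t.B
    FROM tblTable AS t
    JOIN (SELECT A, B
          FROM tblTable
          GROUP BY A, B
          HAVING COUNT(*) > 1) AS d
      ON t.A = d.A AND t.B = d.B
    ORDER BY t.A, t.B, t.Z
""").fetchall()
print(duplicate_rows)
```

Unlike the plain GROUP BY query, this returns one row per duplicated record rather than one row per duplicated pair.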
If you have any questions or comments, then please feel free to post a Comment accordingly.
Appendix
My code was tested against a database created using the following script...
CREATE TABLE tblTable
(
Z INT,
A CHAR( 2 ),
B INT
);
INSERT INTO tblTable ( Z,
A,
B )
VALUES ( 1, 'GA', 1234 ),
( 2, 'GA', 321 ),
( 3, 'GA', 234 ),
( 4, 'GA', 9890 ),
( 5, 'GA', 1234 );
try this:
select A,B, count(CONCAT_WS('',A,B)) as cnt from
(select * from yourtable) as a group by A,B having count(CONCAT_WS('',A,B))>1
The result lists every State / Zip Code pair that occurs more than once, along with its count:
GA 1234 2
It sounds like you want both rows returned where duplicates are found. This should work:
with cte1 as (
select
A
,B
,count(1) over (partition by A, B) as counter
from table_name
)
select
A
,B
from cte1
where 1=1
and counter > 1
order by A, B
;
If you want to know how many duplicate rows there are in total, you can select the "counter" field in the final select:
with cte1 as (
select
A
,B
,count(1) over (partition by A, B) as counter
from table_name
)
select
A
,B
,counter
from cte1
where 1=1
and counter > 1
order by A, B
;
You can use the query below.
SELECT A, B, COUNT(*)
FROM TABLE_NAME
GROUP BY A, B
HAVING COUNT(*) > 1;

mysql select twice itself again in one table

How can I do the following in one query? The query has to search the same table a second time to find the values it must exclude.
This is different from other subquery questions because only one table is involved.
Table T
| num | whose |
| 1 | A
| 1 | C
| 2 | B
| 2 | C
| 3 | D
Criteria to match records (conditions):
The value in column whose is not C
The value in column num does not match a value for another record in condition 1.
I want to find the record with the value 3 in column num (which has D for column whose).
select * from T where whose <> C and ( num is not one of c's)
1 A does not qualify because C has 1
2 B does not qualify because C has 2
3 D is what I want, because it doesn't have C in column whose nor share a value in column num with a record that does have C in the column whose.
First select num of those records where whose is C. Then select those records where whose is not C and also where num is not one of the ones in subquery.
Select * from T where whose <> 'C' and num not in (Select Num from T where whose = 'C' )
Another way to achieve the same result is with a LEFT JOIN on the same table:
SELECT T.*
FROM T
LEFT JOIN T t2 on t2.num = T.num and t2.whose = 'C'
WHERE T.whose <> 'C' AND t2.whose IS NULL
Check it out on this SQL Fiddle, where the result is:
| num | whose |
| 3 | D |
Additionally, a similar way to write the query is to use the NOT EXISTS clause in the WHERE conditions, like this:
SELECT T.* from T
WHERE T.whose <> 'C' AND NOT EXISTS (SELECT 1 FROM T t2 WHERE
t2.num = T.num AND t2.whose = 'C')
Check it out in this SQL fiddle.
To read more about the comparison between EXISTS and LEFT JOIN see this article. In the summary at the end it has the following conclusions:
MySQL can optimize all three methods to do a sort of NESTED LOOPS ANTI JOIN.
...
However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.
That's why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.
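To confirm that the three formulations agree on the sample data, here is a sketch that runs them side by side with Python's sqlite3 as a stand-in for MySQL. It only compares results; it says nothing about the relative performance discussed in the quoted article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (num INT, whose TEXT);
    INSERT INTO T VALUES (1, 'A'), (1, 'C'), (2, 'B'), (2, 'C'), (3, 'D');
""")

queries = {
    "not_in": """
        SELECT * FROM T
        WHERE whose <> 'C'
          AND num NOT IN (SELECT num FROM T WHERE whose = 'C')
    """,
    "left_join": """
        SELECT T.* FROM T
        LEFT JOIN T t2 ON t2.num = T.num AND t2.whose = 'C'
        WHERE T.whose <> 'C' AND t2.whose IS NULL
    """,
    "not_exists": """
        SELECT T.* FROM T
        WHERE T.whose <> 'C'
          AND NOT EXISTS (SELECT 1 FROM T t2
                          WHERE t2.num = T.num AND t2.whose = 'C')
    """,
}

# Run each anti-join variant and collect its rows.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
print(results)
```

One caveat worth keeping in mind: if num could be NULL in the subquery's result, the NOT IN variant would return no rows at all, while the other two would be unaffected.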

setting flag value base on record exist in another table

Using MySQL.
Below is my table structure.
Table : A
aId(PK) aValue1 aValue2
----------------------------------
1 value-a1 value-b1
2 value-a2 value-b2
3 value-a3 value-b3
4 value-a4 value-b4
Table: B
bId(PK) aId(FK) bValue1 bValue2
-----------------------------------------------------
1 1 val-1 value-1
2 1 val-2 value-2
3 2 val-3 value-3
How can I achieve the result below in a single query?
I want all records from Table A, with a flag against each record showing whether a related record exists in Table B.
I tried INNER JOIN and LEFT / RIGHT JOIN, but they did not help.
RESULT
aId aValue1 aValue2 bId (flag if record exist Y else N)
-----------------------------------------------
1 value-a1 value-b1 Y
2 value-a2 value-b2 Y
3 value-a3 value-b3 N
4 value-a4 value-b4 N
My query: this returns more than 4 rows, which is wrong.
SELECT
c.* , if( d.bId is NULL,'N','Y')
from a c
LEFT JOIN b d ON c.aId = d.aId
Add DISTINCT so that each row of table A appears only once, and test d.bId for NULL to print the Y or N flag.
SELECT DISTINCT c.* , IF(d.bId IS NULL, 'N', 'Y') AS flag
FROM tableA c
LEFT JOIN tableB d ON c.aId = d.aId
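Here is a sketch of the flag query against the sample data, run through Python's sqlite3 rather than MySQL. SQLite has no two-branch IF(), so CASE is used instead. The second query, based on EXISTS, is my own alternative and not part of the answer; it never multiplies rows, so it needs no DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (aId INTEGER PRIMARY KEY, aValue1 TEXT, aValue2 TEXT);
    CREATE TABLE tableB (bId INTEGER PRIMARY KEY, aId INT,
                         bValue1 TEXT, bValue2 TEXT);
    INSERT INTO tableA VALUES (1, 'value-a1', 'value-b1'),
                              (2, 'value-a2', 'value-b2'),
                              (3, 'value-a3', 'value-b3'),
                              (4, 'value-a4', 'value-b4');
    INSERT INTO tableB VALUES (1, 1, 'val-1', 'value-1'),
                              (2, 1, 'val-2', 'value-2'),
                              (3, 2, 'val-3', 'value-3');
""")

# LEFT JOIN plus DISTINCT: aId 1 matches two rows of tableB, and DISTINCT
# collapses the two identical output rows into one.
with_distinct = conn.execute("""
    SELECT DISTINCT c.*, CASE WHEN d.bId IS NULL THEN 'N' ELSE 'Y' END AS flag
    FROM tableA c
    LEFT JOIN tableB d ON c.aId = d.aId
    ORDER BY c.aId
""").fetchall()

# Alternative (an assumption, not from the answer): test for existence only.
with_exists = conn.execute("""
    SELECT c.*, CASE WHEN EXISTS (SELECT 1 FROM tableB d WHERE d.aId = c.aId)
                     THEN 'Y' ELSE 'N' END AS flag
    FROM tableA c
    ORDER BY c.aId
""").fetchall()
print(with_distinct)
```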

Removing duplicates from result of multiple join on tables with different columns in MySQL

I am trying to make one statement to pull data from 3 related tables (as in they all share a common string index). I am having trouble preventing MySQL from returning the product of two of the tables, making the result set much larger than I want it. Each table has a different number of columns, and I would prefer to not use UNION anyway, because the data in each table is separate.
Here is an example:
Table X is the main table and has fields A B.
Table Y has fields A C D.
Table Z has fields A E F G.
-
My ideal result would have the form:
A1 B1 C1 D1 E1 F1 G1
A1 B2 C2 D2 00 00 00
A2 B3 C3 D3 E2 F2 G2
A2 B4 00 00 E3 F3 G3
etc...
-
Here is the simplest SQL I have tried that shows my problem (that is, it returns the product of Y * Z indexed by data from A):
SELECT DISTINCT *
FROM X
LEFT JOIN Y USING (A)
LEFT JOIN Z USING (A)
-
I have tried adding a group by clause to fields on Y and Z. But, if I only group by one column, it only returns the first result matched with each unique value in that column (ie: A1 C1 E1, A1 C2 E1, A1 C3 E1). And if I group by two columns it returns the product of the two tables again.
I've also tried doing multiple select statements in the query, then joining the resulting tables, but I received the product of the tables as output again.
Basically I want to merge the results of three select statements into a single result, without it giving me all combinations of the data. If I need to, I can resort to doing multiple queries. However, since they all contain a common index, I feel there should be a way to do it in one query that I am missing.
Thanks for any help.
I don't know if I understand your problem, but why are you using a LEFT JOIN? The story sounds more like an INNER JOIN. Nothing here calls for a UNION.
[Edit]
OK, I think I see what you want now. I've never tried what I am about to suggest, and what's more, some DBs don't support it (yet), but I think you want a windowing function.
WITH Y2 AS (SELECT Y.*, ROW_NUMBER() OVER (PARTITION BY A) AS YROW FROM Y),
Z2 AS (SELECT Z.*, ROW_NUMBER() OVER (PARTITION BY A) AS ZROW FROM Z)
SELECT COALESCE(Y2.A,Z2.A) AS A, Y2.C, Y2.D, Z2.E, Z2.F, Z2.G
FROM Y2 FULL OUTER JOIN Z2 ON Y2.A=Z2.A AND YROW=ZROW;
The idea is to print the list in as few rows as possible, right? So if A1 has 10 entries in Y and 7 in Z, then we get 10 rows with 3 having NULLs for the Z fields. This works in Postgres. It will not run in MySQL as written: MySQL has no FULL OUTER JOIN, and CTEs and window functions only arrived in MySQL 8.0.
Y:
a | d | c
---+---+----
1 | 1 | -1
1 | 2 | -1
2 | 0 | -1
Z:
a | f | g | e
---+---+---+---
1 | 9 | 9 | 0
2 | 1 | 1 | 0
3 | 0 | 1 | 0
Output of statement above:
a | c | d | e | f | g
---+----+---+---+---+---
1 | -1 | 1 | 0 | 9 | 9
1 | -1 | 2 | | |
2 | -1 | 0 | 0 | 1 | 1
3 | | | 0 | 0 | 1
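To make the pairing idea concrete outside of SQL, here is a plain-Python sketch of what the ROW_NUMBER() plus FULL OUTER JOIN statement does: within each value of a, the k-th Y row is placed next to the k-th Z row, and whichever side runs out is padded with None (the SQL NULL). This is an illustration of the logic only, not a replacement for the query.

```python
from collections import defaultdict
from itertools import zip_longest

y_rows = [(1, 1, -1), (1, 2, -1), (2, 0, -1)]        # columns (a, d, c)
z_rows = [(1, 9, 9, 0), (2, 1, 1, 0), (3, 0, 1, 0)]  # columns (a, f, g, e)

# Group each table's rows by the shared key a, like PARTITION BY A.
y_by_a = defaultdict(list)
z_by_a = defaultdict(list)
for a, d, c in y_rows:
    y_by_a[a].append((c, d))
for a, f, g, e in z_rows:
    z_by_a[a].append((e, f, g))

paired = []
for a in sorted(set(y_by_a) | set(z_by_a)):
    # zip_longest pairs rows by position and pads the shorter side,
    # which is the role of the FULL OUTER JOIN on the row numbers.
    for y, z in zip_longest(y_by_a[a], z_by_a[a]):
        c, d = y if y is not None else (None, None)
        e, f, g = z if z is not None else (None, None, None)
        paired.append((a, c, d, e, f, g))

for row in paired:
    print(row)
```

The rows come out in the column order a, c, d, e, f, g, matching the output table above.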
Yep, UNION is not the answer.
I'm thinking you want:
SELECT *
FROM x
JOIN y ON x.a = y.a
JOIN z ON x.a = z.a
GROUP BY x.a;
While editing this post I found another way, which can be used to merge two tables according to their unique ids.
Try this:
create table y
(
a int,
d int,
c int
)
create table z
(
a int,
f int,
g int,
e int
)
go
insert into y values(1,1,-1)
insert into y values(1,2,-1)
insert into y values(2,0,-1)
insert into z values(1,9,9,0)
insert into z values(2,1,1,0)
insert into z values(3,0,1,0)
go
select * from y
select * from z
WITH Y2 AS (SELECT Y.*, ROW_NUMBER() OVER (ORDER BY A) AS YROW FROM Y where A = 3),
Z2 AS (SELECT Z.*, ROW_NUMBER() OVER (ORDER BY A) AS ZROW FROM Z where A = 3)
SELECT COALESCE(Y2.A,Z2.A) AS A, Y2.C, Y2.D, Z2.E, Z2.F, Z2.G
FROM Y2 FULL OUTER JOIN Z2 ON Y2.A=Z2.A AND YROW=ZROW;
PostgreSQL is always the right answer to most MySQL issues, but your problem could have been solved this way :
The issue you experienced was that you had two left joins, i.e.
A left join X left join Y which inevitably gives you A x X x Y where you wanted (AxX)x(AxY)
A simple solution could be :
select x.A,x.B,x.C,x.D,y.E,y.F,y.G from (SELECT A.A,A.B,X.C,X.D FROM A LEFT JOIN X ON A.A=X.A) x INNER JOIN (SELECT A.A,Y.E,Y.F,Y.G FROM A LEFT JOIN Y ON A.A=Y.A) y ON x.A=y.A
For the test details :
CREATE TABLE A (A varchar(3),B varchar(3));
CREATE TABLE X (A varchar(3),C varchar(3), D varchar(3));
CREATE TABLE Y (A varchar(3),E varchar(3), F varchar(3), G varchar(3));
INSERT INTO A(A,B) VALUES ('A1','B1'), ('A2','B2'), ('A3','B3'), ('A4','B4');
INSERT INTO X(A,C,D) VALUES ('A1','C1','D1'), ('A3','C3','D3'), ('A4','C4','D4');
INSERT INTO Y(A,E,F,G) VALUES ('A1','E1','F1','G1'), ('A2','E2','F2','G2'), ('A4','E4','F4','G4');
select x.A,x.B,x.C,x.D,y.E,y.F,y.G from (SELECT A.A,A.B,X.C,X.D FROM A LEFT JOIN X ON A.A=X.A) x INNER JOIN (SELECT A.A,Y.E,Y.F,Y.G FROM A LEFT JOIN Y ON A.A=Y.A) y ON x.A=y.A
As a summary, yes MySQL has many many many issues, but this is not one of them - most of the issues concern more advanced stuff.
If I understand correctly, table X has a 1:n relationship with both tables Y and Z. So, the behaviour you see is expected. The result you get is a kind of Cross Product.
If X has Person data, Y has Address data for those persons and Z has Phone data for those persons, then it's natural your query to show all combinations of addresses and phones for every person. If someone has 3 addresses and 4 phones in your tables, then the query shows 12 rows in the result.
You could avoid it by either using a UNION query or issuing two queries:
SELECT X.*
, Y.*
FROM X
LEFT JOIN Y
ON Y.A = X.A
and:
SELECT X.*
, Z.*
FROM X
LEFT JOIN Z
ON Z.A = X.A
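The 3 addresses times 4 phones arithmetic can be reproduced with the sketch below, which uses Python's sqlite3 in place of MySQL. The single person row and the address and phone values are invented for illustration; only the table and key names X, Y, Z and A come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE X (A TEXT, B TEXT);   -- person
    CREATE TABLE Y (A TEXT, C TEXT);   -- addresses (hypothetical values)
    CREATE TABLE Z (A TEXT, E TEXT);   -- phones (hypothetical values)
    INSERT INTO X VALUES ('A1', 'B1');
    INSERT INTO Y VALUES ('A1', 'addr1'), ('A1', 'addr2'), ('A1', 'addr3');
    INSERT INTO Z VALUES ('A1', 'ph1'), ('A1', 'ph2'), ('A1', 'ph3'), ('A1', 'ph4');
""")

# Joining both child tables at once pairs every address with every phone.
combined = conn.execute("""
    SELECT X.A, Y.C, Z.E
    FROM X
    LEFT JOIN Y ON Y.A = X.A
    LEFT JOIN Z ON Z.A = X.A
""").fetchall()

# Two separate queries keep each 1:n relationship on its own.
addresses = conn.execute(
    "SELECT X.*, Y.* FROM X LEFT JOIN Y ON Y.A = X.A").fetchall()
phones = conn.execute(
    "SELECT X.*, Z.* FROM X LEFT JOIN Z ON Z.A = X.A").fetchall()

print(len(combined), len(addresses), len(phones))
```

The combined query returns 12 rows for one person, while the two separate queries return 3 and 4 rows respectively.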

MySQL cartesian product conditions

I need a little help with a MySQL query.
I have 3 tables
x1 x2 x3
1 1 1
2 2 2
2 2 2
and I have 2 joins
select distinct
x1.int1 as a,
x2.int1 as b,
x3.int1 as c
from
x1
JOIN
x2
JOIN
x3
but I would like to generate the cartesian product with the condition that each result
contains just the 3 numbers from x1 (1,2,2), in every possible order, and I don't know what condition to put in the query.
It's a permutation simulation of three elements (1,2,2).
result should be
1,2,2
2,1,2
2,2,1
Thanks
Is this what you want?
SELECT DISTINCT * FROM x1 A,x1 B,x1 C
There are a number of ways to get the result you are after, given that the permutations are always made up of (1,2,2).
The simplest is to create a table containing the permutations:
create table perm ( `int1` int, `int2` int, `int3` int );
insert into perm values (1,2,2), (2,1,2), (2,2,1);
Another is to take your existing joins, and restrict the answers to the set of valid permutations:
select distinct
x1.int1 as a,
x2.int1 as b,
x3.int1 as c
from x1
JOIN x2
JOIN x3
WHERE (x1.int1=1 and x2.int1=2 and x3.int1=2)
OR (x1.int1=2 and x2.int1=1 and x3.int1=2)
OR (x1.int1=2 and x2.int1=2 and x3.int1=1);
Another is to add the permutations table into the join:
select distinct
x1.int1 as a,
x2.int1 as b,
x3.int1 as c
from x1
JOIN x2
JOIN x3
JOIN perm p on p.`int1` = x1.int1 and p.`int2` = x2.int1 and p.`int3` = x3.int1
Another approach would be to join against table x1 twice more, ensuring that every row in x1 appears in each result:
select distinct
c1.int1 as a,
x2.int1 as b,
x3.int1 as c
from x1 as c1
JOIN x2
JOIN x3
JOIN x1 as c2 on c2.`int1` = b and c2.`int1` != c1.`int1` and c2.`int1` != c3.`int1`
JOIN x1 as c3 on c3.`int1` = c and c3.`int1` != c1.`int1` and c3.`int1` != c2.`int1`
... but this will not work given that value 2 appears in x1 twice. Some unique per-row value would be needed to distinguish one row containing 2 from another.
The permutation table is easiest.
Second attempt - following clarification of question.
create table p ( bit int not null, v int not null );
insert into p values (1,1), (2,2), (4,2);
select distinct p1.v, p2.v, p3.v
from p as p1 join p as p2 join p as p3
where p1.bit + p2.bit + p3.bit = 7;
Column 'v' holds the values you want to permute ie. 1,2,2.
The important part is that column 'bit' must be assigned a unique value for each row, and the set of values must be such that the sum can only be arrived at if every row appears once and only once.
The simplest set of values to satisfy this is the sequence 2^0, 2^1 .. 2^31. You are limited to 32 rows for a 32-bit int. For a table with 3 rows, the sum is 1+2+4=7.
The result is:
+---+---+---+
| v | v | v |
+---+---+---+
| 2 | 2 | 1 |
| 2 | 1 | 2 |
| 1 | 2 | 2 |
+---+---+---+
If more rows are added, more joins have to be added to the query, and the sum of the bit column recalculated.
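The bit-sum trick can be checked with the sketch below, which runs the same statement through Python's sqlite3 instead of MySQL and compares the result with itertools.permutations. The comparison step is my own addition for verification.

```python
import sqlite3
from itertools import permutations

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p (bit INT NOT NULL, v INT NOT NULL);
    -- Each row gets a distinct power of two; v holds the value to permute.
    INSERT INTO p VALUES (1, 1), (2, 2), (4, 2);
""")

# Three copies of p are cross joined; the sum 1 + 2 + 4 = 7 can only be
# reached when each row is used exactly once across the three copies.
rows = conn.execute("""
    SELECT DISTINCT p1.v, p2.v, p3.v
    FROM p AS p1 JOIN p AS p2 JOIN p AS p3
    WHERE p1.bit + p2.bit + p3.bit = 7
""").fetchall()

sql_perms = sorted(rows)
python_perms = sorted(set(permutations([1, 2, 2])))
print(sql_perms)
```

Row order is not guaranteed without an ORDER BY, which is why the sketch sorts both sides before comparing them.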