mysql - two left joins - double counting - mysql

Long time user, first time poster. I've found similar questions/answers, typically involving subqueries, but I'm not sure how to apply them to my situation.
I have 3 tables:
table1
id
table2
id | val (each id has 1 of 3 possible values)
table3
id | val (each id has 1 of 3 possible values)
EDIT: Example: (table1 = unique id of everyone who attended a theme park; table2 = which attraction each visitor visited first; table3 = which attraction each visitor visited second).
I want to write a query to look up 7 different counts:
(1) count of the unique ids in table1
(2) count of the number of ids that have each of the possible values in table2
(3) count of the number of ids that have each of the possible values in table3
My MySQL query:
SELECT
count(DISTINCT table1.id) AS x1,
SUM(IF(table2.val='1',1,0)) AS x2,
SUM(IF(table2.val='2',1,0)) AS x3,
SUM(IF(table2.val='3',1,0)) AS x4,
SUM(IF(table3.val='1',1,0)) AS x5,
SUM(IF(table3.val='2',1,0)) AS x6,
SUM(IF(table3.val='3',1,0)) AS x7
FROM
table1
LEFT JOIN
table2 ON table1.id=table2.id
LEFT JOIN
table3 ON table1.id=table3.id
Results:
x1 = correct (because of DISTINCT)
x2,x3,x4 = correct
x5,x6,x7 = TWICE the number they should be (because I'm getting cartesian product?)
Any suggestions?

You are getting a Cartesian result. Since you are not showing the "1", "2" or "3" counts per ID, just run a SELECT SUM() against each of those tables by itself. An aggregate with no GROUP BY always returns exactly ONE record, so the three one-row subqueries can be combined without any join condition and without a Cartesian result. And since your original query LEFT JOINed from table1 to the others, every ID already exists in table1, so there is no need to re-query COUNT(DISTINCT) in each sub-table.
SELECT
SumForTable1.x1,
SumForTable2.x2,
SumForTable2.x3,
SumForTable2.x4,
SumForTable3.x5,
SumForTable3.x6,
SumForTable3.x7
FROM
( select count(DISTINCT table1.id) AS x1
from table1 ) SumForTable1,
( select SUM(IF(table2.val='1', 1, 0)) AS x2,
SUM(IF(table2.val='2', 1, 0)) AS x3,
SUM(IF(table2.val='3', 1, 0)) AS x4
from table2 ) SumForTable2,
( select SUM(IF(table3.val='1', 1, 0)) AS x5,
SUM(IF(table3.val='2', 1, 0)) AS x6,
SUM(IF(table3.val='3', 1, 0)) AS x7
from table3 ) SumForTable3
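To see the fan-out concretely, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: SQLite has no IF(), so CASE WHEN is used instead). The sample rows and the function name fanout_demo are made up for illustration; id 1 deliberately appears twice in table2 so that the join multiplies the table3 rows.

```python
import sqlite3

def fanout_demo():
    """Compare one count from the two-way LEFT JOIN with a per-table aggregate."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER);
        CREATE TABLE table2 (id INTEGER, val INTEGER);
        CREATE TABLE table3 (id INTEGER, val INTEGER);
        INSERT INTO table1 VALUES (1), (2), (3);
        -- hypothetical data: id 1 has two rows in table2
        INSERT INTO table2 VALUES (1, 1), (1, 2), (2, 2), (3, 3);
        INSERT INTO table3 VALUES (1, 1), (2, 1), (3, 2);
    """)
    # Joined version: every table3 row is repeated once per matching table2 row.
    joined = con.execute("""
        SELECT COUNT(DISTINCT table1.id) AS x1,
               SUM(CASE WHEN table3.val = 1 THEN 1 ELSE 0 END) AS x5
        FROM table1
        LEFT JOIN table2 ON table1.id = table2.id
        LEFT JOIN table3 ON table1.id = table3.id
    """).fetchone()
    # Separate version: each subquery returns one row, so combining them
    # without a join condition cannot multiply anything.
    separate = con.execute("""
        SELECT s1.x1, s3.x5
        FROM (SELECT COUNT(DISTINCT id) AS x1 FROM table1) AS s1,
             (SELECT SUM(CASE WHEN val = 1 THEN 1 ELSE 0 END) AS x5
              FROM table3) AS s3
    """).fetchone()
    con.close()
    return joined, separate

joined, separate = fanout_demo()
print("joined:", joined, "separate:", separate)
```

With this data the joined query counts the table3 row for id 1 twice, while the per-table aggregate counts it once.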

My guess is that your issue is that id is not unique in table1. So even though it is unique in table2/3 (according to your description), each row in table2/3 is joined to two rows in table1 and thus counted twice. This has nothing to do with the left joins; normal inner joins would have the same issue.
If MySQL (which I don't know really well) lets you do inline views like Oracle does, then you can fix it by writing your query as:
SELECT
count(view1.id) AS x1,
SUM(IF(table2.val='1',1,0)) AS x2,
SUM(IF(table2.val='2',1,0)) AS x3,
SUM(IF(table2.val='3',1,0)) AS x4,
SUM(IF(table3.val='1',1,0)) AS x5,
SUM(IF(table3.val='2',1,0)) AS x6,
SUM(IF(table3.val='3',1,0)) AS x7
FROM
( SELECT DISTINCT table1.id
FROM table1
) view1
LEFT JOIN
table2 ON view1.id=table2.id
LEFT JOIN
table3 ON view1.id=table3.id

I'd remove duplicates on every table:
SELECT
count(t1.id) AS t1,
SUM(IF(t2.val=1,1,0)) AS t21,
SUM(IF(t2.val=2,1,0)) AS t22,
SUM(IF(t2.val=3,1,0)) AS t23,
SUM(IF(t3.val=1,1,0)) AS t31,
SUM(IF(t3.val=2,1,0)) AS t32,
SUM(IF(t3.val=3,1,0)) AS t33
FROM (SELECT DISTINCT * FROM table1) as t1
JOIN (SELECT DISTINCT * FROM table2) as t2 ON t1.id=t2.id
JOIN (SELECT DISTINCT * FROM table3) as t3 ON t1.id=t3.id;

Related

SQL insert into table from two tables

Let's say I have a table, table3, and I want to insert values from table1 and table2 into it.
table1: Columns: hID hName
table2: Columns: aID aName
table3: Columns: hID aID
I want to insert table1's hID and table2's aID into table3. How would I do that?
My current query uses an inner join to attempt to join table1 and table2, but they don't share a common column, so how do I do this?
My query:
INSERT INTO table3 (hID, aID)
SELECT hID
FROM table1 as t1
INNER JOIN table2 as t2 ON ??;
Your question is rather artificial, since relational databases rarely join tables in this manner. It is, however, possible to do so, even with uneven data sets. You can use ROW_NUMBER() to generate a mock value that you can use to join both tables.
For example:
with
l as (select hID, row_number() over() as rn from table1),
r as (select aID, row_number() over() as rn from table2)
select l.hID, r.aID from l left join r on r.rn = l.rn
union
select l.hID, r.aID from l right join r on r.rn = l.rn
If the left values are 1, 2, 3, 5, and 7, and the right values are 101, 200, and 555, the result it produces is (in no particular order):
hID aID
---- ----
1 101
2 200
3 555
5 null
7 null
See running example at DB Fiddle.
Note: The query could be simpler if MySQL implemented FULL JOIN. Unfortunately it doesn't so it needs to be simulated.
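The positional pairing can be tried out with a small sketch like the following, which uses Python's sqlite3 module instead of MySQL (an assumption, and it needs an SQLite build with window functions, 3.25 or later). The helper name pair_by_position is hypothetical. Two changes from the query above are deliberate: the window gets an explicit ORDER BY so the pairing is deterministic, and the RIGHT JOIN is written as a LEFT JOIN with the sides swapped, since older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

def pair_by_position(left_ids, right_ids):
    """Pair the n-th hID with the n-th aID; the shorter side is padded with NULL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (hID INTEGER)")
    con.execute("CREATE TABLE table2 (aID INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in left_ids])
    con.executemany("INSERT INTO table2 VALUES (?)", [(v,) for v in right_ids])
    rows = con.execute("""
        WITH
          l AS (SELECT hID, ROW_NUMBER() OVER (ORDER BY hID) AS rn FROM table1),
          r AS (SELECT aID, ROW_NUMBER() OVER (ORDER BY aID) AS rn FROM table2)
        -- rows driven by the left side
        SELECT l.rn AS rn, l.hID AS hID, r.aID AS aID
        FROM l LEFT JOIN r ON r.rn = l.rn
        UNION
        -- rows driven by the right side (stands in for the RIGHT JOIN)
        SELECT r.rn, l.hID, r.aID
        FROM r LEFT JOIN l ON l.rn = r.rn
        ORDER BY rn
    """).fetchall()
    con.close()
    # Drop the helper row number and return (hID, aID) pairs.
    return [(h, a) for _, h, a in rows]

print(pair_by_position([1, 2, 3, 5, 7], [101, 200, 555]))
```

To fill table3, the same SELECT could feed an INSERT INTO table3 (hID, aID) statement, provided table3 accepts NULLs in both columns.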

Split or identify records based off number of join/where conditions satisfied

Say I have 2 tables that are joined on 5 fields that exist in both tables:
firstname
lastname
address
city
country
What is the most efficient way of determining the records where all 5 match, any 4 match, any 3 match, any 2 match, 1 matches, or none match?
Originally, I was thinking of just putting all 5 in the join condition with AND, so I would get the all-5-match records and the 0-match records (I used a left join and would check for NULL on one of the fields in the right table). From the list of NULLs I could then check each condition using the same left join and union them all, filtering on the NULL values or the number of NULLs to determine my matches.
I am sure there is a better way in doing this. Any suggestions?
Thanks,
SELECT *, cast(count1 as int)+cast(count2 as int)+cast(count3 as int)+cast(count4 as int)+cast(count5 as int)
FROM(
SELECT t1.*, t2.*,
t1.firstname = t2.firstname as count1,
t1.lastname = t2.lastname as count2,
t1.address = t2.address as count3,
t1.city = t2.city as count4,
t1.country = t2.country as count5
FROM Table1 AS t1
CROSS JOIN Table2 AS t2) t3
Use a CROSS JOIN that creates a full cross-product between the two tables. Then add up the number of columns that match between each pair of records.
SELECT t1.*, t2.*,
(t1.firstname = t2.firstname) + (t1.lastname = t2.lastname) + (t1.address = t2.address) + (t1.city = t2.city) + (t1.country = t2.country) AS num_fields_matching
FROM Table1 AS t1
CROSS JOIN Table2 AS t2
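As a rough illustration of the match-counting idea, the sketch below runs the same expression through Python's sqlite3 module (an assumption: SQLite, like MySQL, evaluates a comparison to 0 or 1, so the five results can be added). The sample people and the function name match_counts are invented for the example.

```python
import sqlite3

def match_counts():
    """Score every Table1/Table2 pair by the number of equal fields."""
    con = sqlite3.connect(":memory:")
    cols = "firstname TEXT, lastname TEXT, address TEXT, city TEXT, country TEXT"
    con.execute(f"CREATE TABLE Table1 ({cols})")
    con.execute(f"CREATE TABLE Table2 ({cols})")
    # hypothetical sample data
    con.execute("INSERT INTO Table1 VALUES ('Ann', 'Lee', '1 Main St', 'Oslo', 'NO')")
    con.executemany("INSERT INTO Table2 VALUES (?, ?, ?, ?, ?)", [
        ("Ann", "Lee", "1 Main St", "Oslo", "NO"),   # all five fields match
        ("Ann", "Ray", "9 Elm St", "Oslo", "NO"),    # firstname, city, country
        ("Bob", "Kim", "2 Oak St", "Rome", "IT"),    # nothing matches
    ])
    rows = con.execute("""
        SELECT t2.firstname, t2.lastname,
               (t1.firstname = t2.firstname) + (t1.lastname = t2.lastname)
             + (t1.address = t2.address) + (t1.city = t2.city)
             + (t1.country = t2.country) AS num_fields_matching
        FROM Table1 AS t1
        CROSS JOIN Table2 AS t2
        ORDER BY num_fields_matching DESC
    """).fetchall()
    con.close()
    return rows

for row in match_counts():
    print(row)
```

One caveat worth keeping in mind: if any compared field is NULL, the comparison yields NULL and so does the whole sum, so nullable columns would need extra handling. The cross product also grows as rows(Table1) x rows(Table2), which may be too much for large tables.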

Mysql Counting the same value from 2 tables as 2 different counts

I'd like to count a value which exists in 2 different tables, but the two counts should be displayed separately (one column per table).
Of course I can count it separately, but I want to have it in one query in one result. Thanks for your help.
SELECT `X1`, COUNT(`X1`) AS Sales FROM `table1` GROUP BY `X1`
SELECT `X1`, COUNT(`X1`) AS Purchases FROM `table2` GROUP BY `X1`
This is a pain, because the set of x1 values in each table may not be the same.
Here is one approach using union all and group by:
select x1, sum(sales) as sales, sum(purchases) as purchases
from ((select x1, count(*) as sales, 0 as purchases
from table1
group by x1
) union all
(select x1, 0, count(*)
from table2
group by x1
)
) t12
group by x1;
Use a subquery to get the counts from one of the tables, and then join that with the other table.
SELECT t1.x1, COUNT(*) AS sales, t2.purchases
FROM table1 AS t1
JOIN (SELECT x1, COUNT(*) AS purchases
FROM table2
GROUP BY x1) AS t2
ON t1.x1 = t2.x1
GROUP BY t1.x1
It might actually be more efficient to do the grouping in two subqueries.
SELECT t1.x1, t1.sales, t2.purchases
FROM (SELECT x1, COUNT(*) AS sales
FROM table1
GROUP BY x1) AS t1
JOIN (SELECT x1, COUNT(*) AS purchases
FROM table2
GROUP BY x1) AS t2
ON t1.x1 = t2.x1
However, if there are any x1 values that aren't in both tables, both these queries will leave them out. The ideal solution is FULL OUTER JOIN, but MySQL doesn't have this operation. See Full Outer Join in MySQL for a workaround. Gordon Linoff's answer doesn't have this problem. Or if you have another table that lists all the x1 values, you could use that as the main table, and LEFT JOIN both of the above subqueries with it.
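The difference between the two approaches shows up as soon as the tables hold different x1 values. The sketch below is only an illustration using Python's sqlite3 module rather than MySQL; the data and the function name sales_and_purchases are assumptions, and the inner parentheses around each SELECT are dropped because SQLite does not accept them in a compound query.

```python
import sqlite3

def sales_and_purchases():
    """Return (union_all_rows, inner_join_rows) for the two counting approaches."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (x1 TEXT);
        CREATE TABLE table2 (x1 TEXT);
        -- hypothetical data: 'a' only has sales, 'c' only has purchases
        INSERT INTO table1 VALUES ('a'), ('a'), ('b');
        INSERT INTO table2 VALUES ('b'), ('c'), ('c'), ('c');
    """)
    # UNION ALL of per-table counts, then one more GROUP BY to merge them.
    union_rows = con.execute("""
        SELECT x1, SUM(sales) AS sales, SUM(purchases) AS purchases
        FROM (SELECT x1, COUNT(*) AS sales, 0 AS purchases
              FROM table1 GROUP BY x1
              UNION ALL
              SELECT x1, 0, COUNT(*)
              FROM table2 GROUP BY x1) AS t12
        GROUP BY x1
        ORDER BY x1
    """).fetchall()
    # Inner join of two grouped subqueries: keeps only x1 values in both tables.
    join_rows = con.execute("""
        SELECT t1.x1, t1.sales, t2.purchases
        FROM (SELECT x1, COUNT(*) AS sales FROM table1 GROUP BY x1) AS t1
        JOIN (SELECT x1, COUNT(*) AS purchases FROM table2 GROUP BY x1) AS t2
          ON t1.x1 = t2.x1
        ORDER BY t1.x1
    """).fetchall()
    con.close()
    return union_rows, join_rows

union_rows, join_rows = sales_and_purchases()
print(union_rows)
print(join_rows)
```

With this data the UNION ALL version reports all three x1 values (with 0 for the missing side), while the inner join only reports 'b'.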

MySQL select fields in one table that are not in another table

I have 2 tables in a MySQL DB:
Table 1 : id, galleryname
Table 2 : galleryid, <many other fields...>
Using PHP, I need to select all rows in Table 1 based on its ID where that id (galleryid) does not appear in Table 2.
Example:
Table 1
1, flowers
2, water
3, mountains
4, winter
Table 2
3, ...
would return these rows from Table 1
1, flowers
2, water
4, winter
I'm not exactly sure how to go about this. I am pretty good at the basics of MySQL but I suspect this is a JOIN or a UNION that is out of my league.
Any help is appreciated.
Try this:
SELECT * FROM table1
WHERE id NOT IN
(SELECT galleryid FROM table2)
or
SELECT * FROM table1
LEFT JOIN table2
ON table1.id = table2.galleryid
WHERE table2.galleryid IS NULL
The left join brings in all the t1 records; the WHERE clause then keeps only those where t2.galleryid is NULL (i.e. no matching record in t2).
SELECT id, galleryname
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2 ON t1.id = t2.galleryid
WHERE t2.galleryid IS NULL
Someone posted an answer (then deleted it) that gave me ONLY the record that was in both. All others here seemed to give errors but using that original post I made one change and it worked.
Original:
SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.galleryid
(this gave me just the one that was in both)
Adding the !:
SELECT * FROM table1 INNER JOIN table2 ON table1.id != table2.galleryid
(this gave me what I needed)
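For comparison, here is a small sketch of the three variants from this thread, run through Python's sqlite3 module as a stand-in for MySQL (an assumption). It uses the galleries from the question but puts two rows into table2 (galleryid 3 and 1), which is a made-up extension of the example; the function name missing_galleries is hypothetical too.

```python
import sqlite3

def missing_galleries():
    """Return ids found by NOT IN, by LEFT JOIN ... IS NULL, and by the != join."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id INTEGER, galleryname TEXT);
        CREATE TABLE table2 (galleryid INTEGER);
        INSERT INTO table1 VALUES
            (1, 'flowers'), (2, 'water'), (3, 'mountains'), (4, 'winter');
        -- two referenced galleries instead of the single one in the question
        INSERT INTO table2 VALUES (3), (1);
    """)
    not_in = [r[0] for r in con.execute("""
        SELECT id FROM table1
        WHERE id NOT IN (SELECT galleryid FROM table2)
        ORDER BY id
    """)]
    left_join = [r[0] for r in con.execute("""
        SELECT t1.id
        FROM table1 AS t1
        LEFT JOIN table2 AS t2 ON t1.id = t2.galleryid
        WHERE t2.galleryid IS NULL
        ORDER BY t1.id
    """)]
    # The != join pairs each gallery with every table2 row it differs from.
    not_equal = [r[0] for r in con.execute("""
        SELECT DISTINCT t1.id
        FROM table1 AS t1
        INNER JOIN table2 AS t2 ON t1.id != t2.galleryid
        ORDER BY t1.id
    """)]
    con.close()
    return not_in, left_join, not_equal

print(missing_galleries())
```

NOT IN and LEFT JOIN ... IS NULL both return galleries 2 and 4 here. The != join returns all four galleries, because every gallery differs from at least one row in table2; it only gives the wanted rows while table2 holds a single galleryid.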

Merge 3 structurally identical tables if value in date column exists in all 3

I have a mysql db/server that has 3 tables that are identical in structure:
west, midwest and east.
I would like to create a national table with the sum of the columns of those regional tables, ONLY if the datetime row matches all 3 tables. That way if one hour is missing in a particular table, I don't end up summing 2 regions and calling it national.
Here is how I am thinking to do it:
All 3 tables have a datetime column.
Merge the tables (union?) only if the datetime row exists in all 3 tables.
Aggregate (sum) the columns grouped by datetime column. I would of course be summing all columns which carry int values.
I am not sure how to run a query that would perform this task.
These tables have 11mil rows so an efficient way would be great.
I am also open to other approaches to solve this problem.
I picked Neil's answer, although it would not work if the datetime column is not unique (i.e. multiple rows in Table1 with the same datetime). With every other method the performance I got was horrific: hours of query time. So I decided to compromise and created 3 new tables:
westh, midwesth and southh.
These 3 new tables are created by aggregating the original tables by hour.
I then used Neil's second version with a twist:
INNER JOIN Table2 USING (datetime)
Since datetime is indexed in my tables, that provides superior performance, which is a firm criterion for me.
First version:
SELECT T123.dtcol, SUM(T123.intcol) AS intcolsum
FROM (
SELECT Table1.dtcol, Table1.intcol FROM Table1
UNION ALL
SELECT Table2.dtcol, Table2.intcol FROM Table2
UNION ALL
SELECT Table3.dtcol, Table3.intcol FROM Table3
) T123
GROUP BY T123.dtcol
HAVING COUNT(*) = 3
Second version:
SELECT T1.dtcol, T1.intcol + T2.intcol + T3.intcol AS intcolsum
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.dtcol = T1.dtcol
INNER JOIN Table3 T3 ON T3.dtcol = T1.dtcol
use
SELECT A.dtcol, SUM(A.intcol) intcolsum FROM
(
SELECT 'T1' T, T1.* FROM Table1 T1
UNION
SELECT 'T2' T, T2.* FROM Table2 T2
UNION
SELECT 'T3' T, T3.* FROM Table3 T3
) A
WHERE A.dtcol IN
(
SELECT T1.dtcol
FROM Table1 T1
INNER JOIN Table2 T2 ON T2.dtcol = T1.dtcol
INNER JOIN Table3 T3 ON T3.dtcol = T1.dtcol
)
GROUP BY A.dtcol
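A quick way to check that the grouped and the joined approach agree is a sketch like this one, run with Python's sqlite3 module instead of MySQL (an assumption). It uses the asker's table names west, midwest and east with the dtcol/intcol columns from the answers; the sample hours and the function name national_totals are invented. It assumes one row per hour in each table, and it uses UNION ALL because plain UNION would collapse rows from two regions that happen to carry the same value.

```python
import sqlite3

def national_totals():
    """Sum intcol per hour, but only for hours present in all three regions."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE west    (dtcol TEXT, intcol INTEGER);
        CREATE TABLE midwest (dtcol TEXT, intcol INTEGER);
        CREATE TABLE east    (dtcol TEXT, intcol INTEGER);
        -- hypothetical data: midwest is missing the 02:00 hour
        INSERT INTO west VALUES
            ('2020-01-01 00:00', 10), ('2020-01-01 01:00', 20), ('2020-01-01 02:00', 30);
        INSERT INTO midwest VALUES
            ('2020-01-01 00:00', 10), ('2020-01-01 01:00', 2);
        INSERT INTO east VALUES
            ('2020-01-01 00:00', 100), ('2020-01-01 01:00', 200), ('2020-01-01 02:00', 300);
    """)
    # Stack the tables, then keep only hours that appear three times.
    grouped = con.execute("""
        SELECT dtcol, SUM(intcol) AS intcolsum
        FROM (SELECT dtcol, intcol FROM west
              UNION ALL
              SELECT dtcol, intcol FROM midwest
              UNION ALL
              SELECT dtcol, intcol FROM east) AS t123
        GROUP BY dtcol
        HAVING COUNT(*) = 3
        ORDER BY dtcol
    """).fetchall()
    # Inner joins drop any hour that is missing from one of the tables.
    joined = con.execute("""
        SELECT w.dtcol, w.intcol + m.intcol + e.intcol AS intcolsum
        FROM west AS w
        INNER JOIN midwest AS m ON m.dtcol = w.dtcol
        INNER JOIN east AS e ON e.dtcol = w.dtcol
        ORDER BY w.dtcol
    """).fetchall()
    con.close()
    return grouped, joined

grouped, joined = national_totals()
print(grouped)
print(joined)
```

Both queries leave out the 02:00 hour, since midwest has no row for it. Note that west and midwest share the value 10 at 00:00, which is exactly the case where plain UNION would have dropped a row.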