Joining tables doubles the values - mysql

I have these tables:
CREATE TABLE table1 (
id INT NOT NULL PRIMARY KEY,
value1 INT NOT NULL,
value2 INT NOT NULL
);
CREATE TABLE table2 (
id INT NOT NULL PRIMARY KEY,
table1_id INT NOT NULL,
valuex INT NOT NULL
);
INSERT INTO table1 (id, value1, value2)
VALUES
(1, 10, 15),
(2, 5 , 3);
INSERT INTO table2 (id, table1_id, valuex)
VALUES
(1, 1, 15),
(2, 1, 25),
(3, 2, 14),
(4, 2, 10);
With this:
SELECT COUNT(`table1`.`id`) AS `orders`,
SUM(`value1`) as `sum_value1`, SUM(`value2`) as `sum_value2`,
SUM(`valuex`) as `sum_valuex`
FROM `table1`
INNER JOIN `table2`
ON `table1`.`id` = `table2`.`table1_id`
I get the output:
+----------------------------------------------+
+ orders | sum_value1 | sum_value2 |sum_valuex +
+----------------------------------------------+
+ 4 | 30 | 36 | 64 +
+----------------------------------------------+
But I have only two orders in table1. I know the duplication is caused by the join, but how can I fix it while still including sum_valuex?
My desired result would be:
+----------------------------------------------+
+ orders | sum_value1 | sum_value2 |sum_valuex +
+----------------------------------------------+
+ 2 | 15 | 18 | 64 +
+----------------------------------------------+
EDIT: I can't use select within select

This is how joins work. If you don't want the rows to multiply before the aggregation, then aggregate before doing the join.
SELECT t2.orders, t1.value1, t1.value2, t2.sum_valuex
FROM `table1` t1 INNER JOIN
(SELECT table1_id, SUM(valuex) as sum_valuex, COUNT(*) as orders
FROM table2
GROUP BY table1_id
) t2
ON t1.id = t2.table1_id
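To check the aggregate-before-join idea against the sample data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL (that substitution is an assumption; the SQL is plain enough that it should behave the same way). The outer COUNT/SUM over the per-order rows is added here only to collapse the result into the single totals row the question asks for.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INT NOT NULL PRIMARY KEY, value1 INT NOT NULL, value2 INT NOT NULL);
CREATE TABLE table2 (id INT NOT NULL PRIMARY KEY, table1_id INT NOT NULL, valuex INT NOT NULL);
INSERT INTO table1 (id, value1, value2) VALUES (1, 10, 15), (2, 5, 3);
INSERT INTO table2 (id, table1_id, valuex) VALUES (1, 1, 15), (2, 1, 25), (3, 2, 14), (4, 2, 10);
""")

# Plain join: every table1 row is repeated once per matching table2 row.
naive = conn.execute("""
    SELECT COUNT(t1.id), SUM(t1.value1), SUM(t1.value2), SUM(t2.valuex)
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.table1_id
""").fetchone()

# Aggregate table2 first, so the join yields one row per order.
pre_aggregated = conn.execute("""
    SELECT COUNT(t1.id), SUM(t1.value1), SUM(t1.value2), SUM(t2.sum_valuex)
    FROM table1 t1
    JOIN (SELECT table1_id, SUM(valuex) AS sum_valuex
          FROM table2
          GROUP BY table1_id) t2
      ON t1.id = t2.table1_id
""").fetchone()

print(naive)           # totals inflated by the join
print(pre_aggregated)  # totals after aggregating before the join
```

With the sample rows, the first query reproduces the doubled figures from the question and the second gives the desired ones.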

Which table has the orders? Right now, COUNT(table1.id) counts one row per matching record in table2, because the join repeats each table1 row once for every match. If table1 is the table with the orders in it, then you should count its distinct ids instead (note that the SUMs of value1 and value2 are still computed over the joined rows):
SELECT COUNT(distinct a.id) orders,
SUM(value1) as sum_value1, SUM(value2) as sum_value2,
SUM(valuex) as sum_valuex
FROM table1 a
JOIN table2 b
ON b.table1_id = a.id

You can get the desired result like this:
SELECT COUNT(orders) as orders, SUM(t1.value1) as value1, SUM(t1.value2) as value2, SUM(t2.sum_valuex) as sumvaluex
FROM table1 t1 INNER JOIN
(SELECT table1_id, SUM(valuex) as sum_valuex, COUNT(*) as orders
FROM table2
GROUP BY table1_id
) t2
ON t1.id = t2.table1_id

Related

Preferential Select Query

The issue that we are trying to tackle is best shown with the following illustrative example:
CREATE TABLE table_1
(
id INT UNSIGNED AUTO_INCREMENT,
colA INT,
colB VARCHAR(10),
PRIMARY KEY(id)
);
CREATE TABLE table_2
(
id INT UNSIGNED AUTO_INCREMENT,
colY INT,
colZ VARCHAR(10),
PRIMARY KEY(id)
);
INSERT INTO table_1(colA, colB) VALUES(1, 'NPD5A6V9EI'), (2, 'ISO4IK42YQ'), (4, 'J12QAN4O42'), (6,'V8YTZFHCU4');
INSERT INTO table_2(colY, colZ) VALUES(3, 'RBUNWLO753'), (4, 'X2BCEY7O8B'), (5, 'BNUS7R4225'), (6, '72NOWCTH5G');
We would like to select our result based on the value of colA in table_1, but if that does not return a result, we would like to return our result based on the value of colY in table_2. In other words, selecting from table_2 is the fallback for selecting from table_1. The query returns NULL only if neither table satisfies the condition.
A pseudo SQL query could be:
SELECT colB FROM table_1 where colA = 3 OR SELECT colZ FROM table_2 where colY = 3;
The query should return output based on the following I/O table:
I O
= =
1 NPD5A6V9EI -- From table_1
2 ISO4IK42YQ -- From table_1
3 RBUNWLO753 -- From table_2
4 J12QAN4O42 -- From table_1 (has precedence over table_2 entry)
5 BNUS7R4225 -- From table_2
6 V8YTZFHCU4 -- From table_1 (has precedence over table_2 entry)
9 NULL
Kindly suggest solutions that:
make use of the latest DB features (for posterity)
work with MySQL version 5.6.51 (for our application)
Write a subquery that generates all the I rows that you want.
Then left join this with the two tables, and use IFNULL to take the matching value from table_1 in preference to table_2.
SELECT ids.id AS I, IFNULL(t1.colB, t2.colZ) AS O
FROM (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 ... UNION ALL SELECT 9) AS ids
LEFT JOIN table_1 AS t1 ON t1.colA = ids.id
LEFT JOIN table_2 AS t2 ON t2.colY = ids.id
ORDER BY ids.id
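As a quick check of the left-join-plus-IFNULL approach, here is a minimal sketch that runs it through Python's built-in sqlite3 module (SQLite standing in for MySQL is an assumption, and the schema is simplified because SQLite has no UNSIGNED AUTO_INCREMENT). The `wanted` table is a hypothetical helper that holds the input values from the I/O table; it plays the role of the `ids` subquery.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER PRIMARY KEY, colA INT, colB VARCHAR(10));
CREATE TABLE table_2 (id INTEGER PRIMARY KEY, colY INT, colZ VARCHAR(10));
INSERT INTO table_1 (colA, colB) VALUES
    (1, 'NPD5A6V9EI'), (2, 'ISO4IK42YQ'), (4, 'J12QAN4O42'), (6, 'V8YTZFHCU4');
INSERT INTO table_2 (colY, colZ) VALUES
    (3, 'RBUNWLO753'), (4, 'X2BCEY7O8B'), (5, 'BNUS7R4225'), (6, '72NOWCTH5G');
CREATE TABLE wanted (id INT);  -- hypothetical helper: the inputs to look up
""")
conn.executemany("INSERT INTO wanted (id) VALUES (?)",
                 [(i,) for i in (1, 2, 3, 4, 5, 6, 9)])

# table_1 wins when both tables match; table_2 is only the fallback.
rows = conn.execute("""
    SELECT w.id AS I, IFNULL(t1.colB, t2.colZ) AS O
    FROM wanted w
    LEFT JOIN table_1 t1 ON t1.colA = w.id
    LEFT JOIN table_2 t2 ON t2.colY = w.id
    ORDER BY w.id
""").fetchall()

for row in rows:
    print(row)
```

The sketch relies on colA and colY being unique in the sample data; with duplicate values, the joins would return more than one row per input.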
I simply don't know where you get your last row from.
Also, with MySQL 8 you can use the window function ROW_NUMBER.
The rest is self-explanatory: the sorting comes from colA and colY, and when the numbers are the same, the second column orderby2 sorts the row from the first table first.
CREATE TABLE table_1
(
id INT UNSIGNED AUTO_INCREMENT,
colA INT,
colB VARCHAR(10),
PRIMARY KEY(id)
);
CREATE TABLE table_2
(
id INT UNSIGNED AUTO_INCREMENT,
colY INT,
colZ VARCHAR(10),
PRIMARY KEY(id)
);
INSERT INTO table_1(colA, colB) VALUES(1, 'NPD5A6V9EI'), (2, 'ISO4IK42YQ'), (4, 'J12QAN4O42'), (6,'V8YTZFHCU4');
INSERT INTO table_2(colY, colZ) VALUES(3, 'RBUNWLO753'), (4, 'X2BCEY7O8B'), (5, 'BNUS7R4225'), (6, '72NOWCTH5G');
SELECT @i := @i + 1 AS I,
colB AS O
FROM
(SELECT colA AS orderby1, colB, 1 AS orderby2 FROM table_1
UNION
SELECT colY, colZ, 2 FROM table_2) t1, (SELECT @i := 0) t2
ORDER BY orderby1, orderby2
I | O
-: | :---------
1 | NPD5A6V9EI
2 | ISO4IK42YQ
3 | RBUNWLO753
4 | J12QAN4O42
5 | X2BCEY7O8B
6 | BNUS7R4225
7 | V8YTZFHCU4
8 | 72NOWCTH5G

result of selection is array and I want to use it to "where in" another selection

I have two tables (t1 & t2):
t1 (second column is array)
name | code
ee | 123, 124, 125
ef | 121, 123
______________________
t2
code_id | code_desc
121 | xxxxx
123 | yyyyyyy
124 | xxxxxxxx
If I run this query, everything is fine:
SELECT * FROM t2 where code_id in (121,122)
But if I run this query, I get a NULL / empty result:
SELECT * FROM t2 where code_id in (SELECT code FROM t1 where name = 'ee')
How can I get all the info from both tables in one query?
Here is the code; I can't find a good online SQL tool:
CREATE TABLE t1 (name VARCHAR(200), codes VARCHAR(200));
CREATE TABLE t2 (codes_id VARCHAR(200), codes_desc VARCHAR(200));
INSERT INTO t1 (name, codes) VALUES ('ee', '123,124,125');
INSERT INTO t1 (name, codes) VALUES ('ef', '121,124');
INSERT INTO t1 (name, codes) VALUES ('eh', '123,124,125');
INSERT INTO t2 (codes_id, codes_desc) VALUES ('121', 'yyyyyyyyy');
INSERT INTO t2 (codes_id, codes_desc) VALUES ('122', 'xxxxxxxxx');
INSERT INTO t2 (codes_id, codes_desc) VALUES ('123', 'zzzzzzzzzzz');
SELECT * FROM t2 where codes_id in (121,122);
SELECT * FROM t2 where codes_id in (SELECT codes FROM t1 where name = 'ee');
You can use the FIND_IN_SET function:
select *
from t2
where exists (
select 1
from t1
where name = 'ee'
and find_in_set(t2.code_id, t1.code) > 0
)
I'd advise you to normalize your table structure, though: even though the above query works, it is non-sargable, so it cannot use an index on the code list.
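To make the normalization advice concrete, here is a minimal sketch of one possible layout, run through Python's built-in sqlite3 module (SQLite standing in for MySQL is an assumption). The junction table `t1_codes` and its columns are hypothetical names: it stores one row per (name, code) pair instead of a comma-separated string, so the lookup becomes an ordinary join.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t2 (code_id INT PRIMARY KEY, code_desc VARCHAR(200));
-- Hypothetical junction table: one row per code instead of a CSV string.
CREATE TABLE t1_codes (
    name    VARCHAR(200),
    code_id INT,
    PRIMARY KEY (name, code_id)
);
INSERT INTO t2 (code_id, code_desc) VALUES
    (121, 'yyyyyyyyy'), (122, 'xxxxxxxxx'), (123, 'zzzzzzzzzzz');
INSERT INTO t1_codes (name, code_id) VALUES
    ('ee', 123), ('ee', 124), ('ee', 125),
    ('ef', 121), ('ef', 124),
    ('eh', 123), ('eh', 124), ('eh', 125);
""")

def codes_for(name):
    """Return the t2 rows whose code belongs to the given name."""
    return conn.execute("""
        SELECT t2.code_id, t2.code_desc
        FROM t2
        JOIN t1_codes c ON c.code_id = t2.code_id
        WHERE c.name = ?
        ORDER BY t2.code_id
    """, (name,)).fetchall()

print(codes_for("ee"))
print(codes_for("ef"))
```

Because the join compares plain integer columns, an index on the code column can be used, which is the point of the sargability remark.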

UPDATE FROM SELECT with foreign key on parent with one query

How can I update second_table.first_table_id (changing it from the value found by the first select to the value found by the second) if first_table.email matches in both selects?
Is it even possible with one query?
----------------------------------------- UPDATE -----------------------------------------
EXAMPLE:
I need to update the foreign key of the second table if the email field matches in the first table. I need to compare two query results with different parent_id values (parents are in the same table as children).
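table_1
+----+-----------+----------+
| id | parent_id | email    |
+----+-----------+----------+
|  1 | NULL      | NULL     |
|  2 | NULL      | NULL     |
|  3 | 1         | joe@m.ru |
|  4 | 2         | bob@f.ly |
|  5 | 1         | bob@f.ly |
|  6 | 2         | kira@.us |
+----+-----------+----------+
table_2
+----+----------+
| id | first_id |
+----+----------+
|  1 | 3        |
|  2 | 4        |
|  3 | 5        |
|  4 | 6        |
+----+----------+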
I have two parents with ids 1 and 2 and some children (ids: 3, 4, 5, 6).
Also, keep in mind: 1 is old, 2 is new.
Task: change the foreign key in the second table if a child's email with parent_id = 1 and a child's email with parent_id = 2 match (are the same).
In our example, the row with id = 3 in the second table must have its foreign key field first_id changed from 5 to 4.
The following might get you started:
UPDATE Table_2 t2u
SET first_id = (
SELECT t2.first_id
FROM Table_2 t2
INNER JOIN Table_1 t1 ON t1.id = t2.first_id
INNER JOIN (
SELECT parent_id = MAX(parent_id), email
FROM Table_1
GROUP BY
email
) t1p ON t1p.email = t1.email
INNER JOIN Table_1 t1i ON t1i.email = t1p.email
AND t1i.parent_id = t1p.parent_id
WHERE t2u.first_id <> t1i.id)
Test script (SQL Server)
;WITH Table_1 (id, parent_id, email) AS (
SELECT 1, NULL, NULL
UNION ALL SELECT 2, NULL, NULL
UNION ALL SELECT 3, 1, 'joe@m.ru'
UNION ALL SELECT 4, 2, 'bob@f.ly'
UNION ALL SELECT 5, 1, 'bob@f.ly'
UNION ALL SELECT 6, 2, 'kira@.us'
)
, Table_2 (id, first_id) AS (
SELECT 1, 3
UNION ALL SELECT 2, 4
UNION ALL SELECT 3, 5
UNION ALL SELECT 4, 6
)
SELECT t2.*, t1i.id as [update with]
FROM Table_2 t2
INNER JOIN Table_1 t1 ON t1.id = t2.first_id
INNER JOIN (
SELECT parent_id = MAX(parent_id), email
FROM Table_1
GROUP BY
email
) t1p ON t1p.email = t1.email
INNER JOIN Table_1 t1i ON t1i.email = t1p.email
AND t1i.parent_id = t1p.parent_id
WHERE t2.first_id <> t1i.id
Output
id first_id update with
----------- ----------- -----------
3 5 4
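For a version that actually performs the update, here is a minimal sketch using a correlated subquery, run through Python's built-in sqlite3 module (SQLite standing in for the real database is an assumption). It hard-codes parent 1 as "old" and parent 2 as "new", as in the question, and assumes each old child's email matches at most one new child. The subqueries read only table_1, which should also keep the statement clear of MySQL's restriction on selecting from the table being updated. The aliases `old_child` and `new_child` are names chosen for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INT PRIMARY KEY, parent_id INT, email VARCHAR(50));
CREATE TABLE table_2 (id INT PRIMARY KEY, first_id INT);
INSERT INTO table_1 (id, parent_id, email) VALUES
    (1, NULL, NULL), (2, NULL, NULL),
    (3, 1, 'joe@m.ru'), (4, 2, 'bob@f.ly'),
    (5, 1, 'bob@f.ly'), (6, 2, 'kira@.us');
INSERT INTO table_2 (id, first_id) VALUES (1, 3), (2, 4), (3, 5), (4, 6);
""")

conn.execute("""
    UPDATE table_2
    SET first_id = (
        -- the new child (parent 2) sharing the old child's email
        SELECT new_child.id
        FROM table_1 AS old_child
        JOIN table_1 AS new_child
          ON new_child.email = old_child.email
         AND new_child.parent_id = 2
        WHERE old_child.id = table_2.first_id
          AND old_child.parent_id = 1
    )
    -- only touch rows that point at an old child with a matching new child
    WHERE first_id IN (
        SELECT old_child.id
        FROM table_1 AS old_child
        JOIN table_1 AS new_child
          ON new_child.email = old_child.email
         AND new_child.parent_id = 2
        WHERE old_child.parent_id = 1
    )
""")

result = conn.execute("SELECT id, first_id FROM table_2 ORDER BY id").fetchall()
print(result)
```

The WHERE clause matters: without it, rows with no matching new child would have first_id set to NULL.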

Update row with average from another table based on the original row

Not sure how to describe what I'm trying to get from this question, but here goes...
I have a table of customer purchases, t1, with information about each purchase: customer id, date, a boolean for whether the customer was alone, and the purchase amount. A second table, t2, holds another list of the same customer ids with a date and a boolean saying whether they were alone.
I want to update the second table with the AVERAGE of the values of the previous x purchases they made before that date with the same "alone" flag.
I set up the tables with:
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t1 (cid INT, d DATE, i INT, v FLOAT);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-01', 0, 10);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-02', 1, 20);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-03', 1, 30);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-04', 1, 40);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-05', 0, 50);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-06', 0, 60);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-07', 0, 70);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-08', 1, 80);
INSERT INTO t1 (cid, d,i,v) VALUES (1,'2001-01-09', 0, 90);
INSERT INTO t1 (cid, d,i,v) VALUES (2,'2001-01-04', 1, 35);
CREATE TABLE t2 (cid INT, d DATE, i INT, av2 FLOAT, av3 FLOAT);
INSERT INTO t2 (cid, d,i) VALUES (1,'2001-01-07', 0);
INSERT INTO t2 (cid, d,i) VALUES (1,'2001-01-08', 1);
INSERT INTO t2 (cid, d,i) VALUES (2,'2001-01-08', 0);
INSERT INTO t2 (cid, d,i) VALUES (2,'2001-01-09', 1);
av2 and av3 are the columns where I want the average of the last 2 or 3 transactions. So I need an update statement (two statements really, one for av2 and one for av3) that says: "when this customer came in on this date, alone or not, what was the average of their last x purchases?"
So the resulting data should be:
cid d i av2 av3
1 2001-01-07 0 55 40
1 2001-01-08 1 35 40
2 2001-01-08 0 null null
2 2001-01-09 1 35 35
The closest I got was this:
UPDATE t2 SET av=(
SELECT AVG(tcol)
FROM (
SELECT v AS tcol FROM t1 LIMIT 2
) AS tt);
which seems to be going in the right direction (the LIMIT 2 corresponds to the 2 or 3 from the av columns). But that just averages x purchases regardless of customer, date, or the boolean. As soon as I put in the WHERE clause to link the data, it chokes:
UPDATE t2 SET av=(
SELECT AVG(tcol)
FROM (
SELECT v AS tcol FROM t1 WHERE t1.d<t2.d and t1.i=t2.i LIMIT 2
) AS tt);
Any ideas? Is there a name for what I'm trying to do? Do I need to describe this differently? Any suggestions?
Thanks,
Philip
Try this one -
SET @r = 0;
SET @cid = NULL;
SET @i = NULL;
UPDATE t2 JOIN (
SELECT t.cid, t.d, t.i, AVG(IF(t.r < 3, t.v, NULL)) av2, AVG(IF(t.r < 4, t.v, NULL)) av3 FROM (
SELECT t.*, IF(@cid = t.cid AND @i = t.i, @r := @r + 1, @r := 1) AS r, @cid := t.cid, @i := t.i FROM (
SELECT t2.*, t1.v FROM t2
JOIN t1
ON t1.cid = t2.cid AND t1.d < t2.d AND t1.i = t2.i
ORDER BY t2.cid, t2.i, t1.d DESC
) t
) t
GROUP BY t.cid, t.i, t.d
) t
ON t2.cid = t.cid AND t2.i = t.i AND t2.d = t.d
SET t2.av2 = t.av2, t2.av3 = t.av3;
SELECT * FROM t2;
+------+------------+------+------+------+
| cid | d | i | av2 | av3 |
+------+------------+------+------+------+
| 1 | 2001-01-07 | 0 | 55 | 40 |
| 1 | 2001-01-08 | 1 | 35 | 30 |
| 2 | 2001-01-08 | 0 | NULL | NULL |
| 2 | 2001-01-09 | 1 | 35 | 35 |
+------+------------+------+------+------+
Note:
The value av3 for cid=1, d=2001-01-08, i=1 should be 30, right? ...not 40.
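The same ranking can be written without user variables by using the ROW_NUMBER window function. Here is a minimal sketch, run through Python's built-in sqlite3 module (SQLite standing in for MySQL 8 is an assumption, and it needs an SQLite build with window functions, 3.25 or newer). It only selects the averages rather than updating t2, and the names `ranked` and `rn` are chosen for this illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (cid INT, d DATE, i INT, v FLOAT);
INSERT INTO t1 (cid, d, i, v) VALUES
    (1, '2001-01-01', 0, 10), (1, '2001-01-02', 1, 20), (1, '2001-01-03', 1, 30),
    (1, '2001-01-04', 1, 40), (1, '2001-01-05', 0, 50), (1, '2001-01-06', 0, 60),
    (1, '2001-01-07', 0, 70), (1, '2001-01-08', 1, 80), (1, '2001-01-09', 0, 90),
    (2, '2001-01-04', 1, 35);
CREATE TABLE t2 (cid INT, d DATE, i INT, av2 FLOAT, av3 FLOAT);
INSERT INTO t2 (cid, d, i) VALUES
    (1, '2001-01-07', 0), (1, '2001-01-08', 1),
    (2, '2001-01-08', 0), (2, '2001-01-09', 1);
""")

rows = conn.execute("""
    WITH ranked AS (
        -- number each t2 row's earlier purchases, most recent first
        SELECT t2.cid, t2.d, t2.i, t1.v,
               ROW_NUMBER() OVER (PARTITION BY t2.cid, t2.d, t2.i
                                  ORDER BY t1.d DESC) AS rn
        FROM t2
        JOIN t1 ON t1.cid = t2.cid AND t1.i = t2.i AND t1.d < t2.d
    )
    SELECT t2.cid, t2.d, t2.i,
           AVG(CASE WHEN ranked.rn <= 2 THEN ranked.v END) AS av2,
           AVG(CASE WHEN ranked.rn <= 3 THEN ranked.v END) AS av3
    FROM t2
    LEFT JOIN ranked
      ON ranked.cid = t2.cid AND ranked.d = t2.d AND ranked.i = t2.i
    GROUP BY t2.cid, t2.d, t2.i
    ORDER BY t2.cid, t2.d
""").fetchall()

for row in rows:
    print(row)
```

Partitioning by cid, d and i together keeps the numbering separate for each t2 row, so it would still work if one customer had several t2 rows with the same flag.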

Question on multi-row insertion with subqueries

Say I have the following 2 tables,
CREATE TABLE t1(
name VARCHAR(25) NOT NULL,
time INT,
a INT
);
CREATE TABLE t2(
name VARCHAR(25) NOT NULL,
time INT,
b INT
);
and I'm looking to pull all the values (a) out of t1 with a given time and all the values with the previous time (say just time - 1 for convenience), then for each name subtract the newer one from the older one and insert those values into t2 with the same time. The slow way of doing this would involve doing something like
SELECT name, a FROM t1 WHERE time = x;
SELECT name, a FROM t1 WHERE time = x-1;
(subtract the a values for each name)
INSERT INTO t2 VALUES ....;
From my (limited) understanding of subqueries, there should hopefully be a way to do this all in 1 query. Any ideas? Thanks in advance :)
It looks like you can use the INSERT ... SELECT syntax:
INSERT INTO t2 (name, time, b)
SELECT ta.name, ta.time time, (ta.a - tb.a) b
FROM t1 ta
JOIN t1 tb ON (tb.time = ta.time - 1 AND tb.name = ta.name);
Test case:
INSERT INTO t1 VALUES ('t1', 1, 100);
INSERT INTO t1 VALUES ('t1', 2, 200);
INSERT INTO t1 VALUES ('t1', 3, 500);
INSERT INTO t1 VALUES ('t1', 4, 600);
INSERT INTO t1 VALUES ('t1', 5, 800);
INSERT INTO t1 VALUES ('t1', 6, 900);
Result:
SELECT * FROM t2;
+------+------+------+
| name | time | b |
+------+------+------+
| t1 | 2 | 100 |
| t1 | 3 | 300 |
| t1 | 4 | 100 |
| t1 | 5 | 200 |
| t1 | 6 | 100 |
+------+------+------+
5 rows in set (0.00 sec)
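The question asks about one given time x, whereas the statement above fills t2 for every time at once. Here is a minimal sketch of the single-time variant, run through Python's built-in sqlite3 module (SQLite standing in for MySQL is an assumption); the only change is a WHERE clause, with x passed as a query parameter.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (name VARCHAR(25) NOT NULL, time INT, a INT);
CREATE TABLE t2 (name VARCHAR(25) NOT NULL, time INT, b INT);
INSERT INTO t1 (name, time, a) VALUES
    ('t1', 1, 100), ('t1', 2, 200), ('t1', 3, 500),
    ('t1', 4, 600), ('t1', 5, 800), ('t1', 6, 900);
""")

x = 3  # the time of interest; chosen arbitrarily for this example

# Same self-join as above, restricted to rows at time x.
conn.execute("""
    INSERT INTO t2 (name, time, b)
    SELECT ta.name, ta.time, ta.a - tb.a
    FROM t1 ta
    JOIN t1 tb ON tb.time = ta.time - 1 AND tb.name = ta.name
    WHERE ta.time = ?
""", (x,))

inserted = conn.execute("SELECT name, time, b FROM t2").fetchall()
print(inserted)
```

Names that have no row at time x - 1 are skipped by the inner join, just as in the unrestricted statement.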
MySQL has INSERT ... SELECT:
INSERT INTO table ( fields )
SELECT fields FROM table;