Join two tables and remove duplicates - mysql

I'm trying to join two tables, where the second table has duplicates.
The tables look something like
CREATE TABLE ta
(
id int,
cno varchar(30),
d1 varchar(30),
d2 int
);
CREATE TABLE tb
(
id int,
cno varchar(30),
cn1 varchar(30),
cn2 int
);
INSERT INTO ta
(id, cno, d1, d2)
VALUES
(1, '1234','a',2),
(2, '6456','j',3),
(3, '5456','h',4),
(4, '4454','g',5);
INSERT INTO tb
(id, cno, cn1, cn2)
VALUES
(1, '1234', 'a', 21),
(1, '1234', 'a', 22),
(2, '6456', 'b', 33),
(2, '6456', 'c', 34),
(2, '6456', 'c', 35),
(3, '5456', 'c', 36),
(4, '4454', 'c', 37);
I was able to get the result http://sqlfiddle.com/#!2/b282e3/1 in MySQL. However, when I run it in PostgreSQL I get an error: http://sqlfiddle.com/#!15/b282e/4
Output should be like http://sqlfiddle.com/#!2/b282e3/1
CNO CN1 CN2 D1 D2
1234 a 21 a 2
4454 c 37 g 5
5456 c 36 h 4
6456 b 33 j 3
Are there any alternatives for this in PostgreSQL?

Use aggregate functions for columns that are not used in GROUP BY:
select t2.cno,
min(t2.cn1) as a,
min(t2.cn2) as b,
min(t1.d1) as c,
min(t1.d2) as d
from ta as t1
inner join tb as t2
on t1.cno=t2.cno
group by t2.cno
http://sqlfiddle.com/#!15/b282e/23

This query in MySQL:
select t2.cno, t2.cn1, t2.cn2, t1.d1, t1.d2
from ta t1 inner join
tb t2
on t1.cno = t2.cno
group by t2.cno;
Is not valid SQL (according to the standard or other databases). The problem is that there are columns in the select that are neither in the group by nor are they arguments to aggregation functions (and they are not "functionally dependent" either). Your use of the group by extension in MySQL is officially discouraged. You can read the documentation about it here.
Ironically, Postgres has an extension called distinct on that does something similar. The syntax is:
select distinct on (t2.cno) t2.cno, t2.cn1, t2.cn2, t1.d1, t1.d2
from ta t1 inner join
tb t2
on t1.cno = t2.cno
order by t2.cno;
distinct on takes a list of expressions in parentheses and returns one row per distinct combination of their values, keeping the first row according to the order by and ignoring the rest. The distinct on expressions need to match the leftmost expressions in the order by, otherwise Postgres raises an error.
In most other databases, you would do something similar using row_number(). And you can use that as well in Postgres.
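As a rough sketch of the row_number() approach mentioned above, the snippet below runs the idea against the question's sample data using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption for the sake of a runnable example; window functions need SQLite 3.25 or later), and ordering by cn2 inside each partition is my own choice for deciding which duplicate counts as "first".

```python
import sqlite3

# In-memory database loaded with the sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ta (id int, cno varchar(30), d1 varchar(30), d2 int);
CREATE TABLE tb (id int, cno varchar(30), cn1 varchar(30), cn2 int);
INSERT INTO ta VALUES (1,'1234','a',2),(2,'6456','j',3),(3,'5456','h',4),(4,'4454','g',5);
INSERT INTO tb VALUES (1,'1234','a',21),(1,'1234','a',22),(2,'6456','b',33),
                      (2,'6456','c',34),(2,'6456','c',35),(3,'5456','c',36),(4,'4454','c',37);
""")

# Number the joined rows within each cno and keep only the first one.
# ORDER BY t2.cn2 is an assumption: it makes "first" mean "smallest cn2".
query = """
SELECT cno, cn1, cn2, d1, d2
FROM (
    SELECT t2.cno, t2.cn1, t2.cn2, t1.d1, t1.d2,
           ROW_NUMBER() OVER (PARTITION BY t2.cno ORDER BY t2.cn2) AS rn
    FROM ta AS t1
    INNER JOIN tb AS t2 ON t1.cno = t2.cno
) AS ranked
WHERE rn = 1
ORDER BY cno
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same subquery should carry over to Postgres unchanged, since it only uses standard window-function syntax.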

select t2.cno, min (t2.cn1), min(t2.cn2), t1.d1 , t1.d2
from ta as t1
inner join tb as t2 on t1.cno=t2.cno
group by t2.cno, t1.d1 , t1.d2

WITH Queries (Common Table Expressions)
with cte as
(
select cno,cn1,cn2 from tb where cn2 in (select min(cn2) from tb group by cno)
),
cte1 as
(
select d1,d2,cno from ta where cno in (select cno from tb where cn2 in (select
min(cn2) from tb group by cno))
)
select cte.cno,cn1,cn2,d1,d2 from cte inner join cte1 on cte1.cno = cte.cno order
by cte.cno

Related

All possible combinations of data in a single table in alphabetical order

I'm trying to write a simple SQL query to show all possible combinations of data in a single table. Here's the table:
id  fruit
1   apple
2   orange
3   pear
4   plum
I've only got as far as pairing all the data using CROSS JOIN: "apple,orange", "apple,pear" etc.
SELECT t1.fruit, t2.fruit
FROM fruits t1
CROSS JOIN fruits t2
WHERE t1.fruit < t2.fruit
Instead I'm looking for all unique combinations in alphabetical order, e.g.
apple
apple,orange
apple,orange,pear
apple,orange,pear,plum
apple,pear
apple,plum
apple,orange,plum
apple,pear,plum
orange
orange,pear
orange,pear,plum
orange,plum
pear
pear,plum
plum
i.e. as long as a combination exists once, it doesn't need to appear again in a different order, e.g. with apple,orange, there is no need for orange,apple
This should work for any table size.
Result here
Note: this requires MySQL 8+.
-- TABLE
CREATE TABLE IF NOT EXISTS `fruits`
(
`id` int(6) NOT NULL,
`fruit` char(20)
);
INSERT INTO `fruits` VALUES (1, 'apple');
INSERT INTO `fruits` VALUES (2, 'orange');
INSERT INTO `fruits` VALUES (3, 'pear');
INSERT INTO `fruits` VALUES (4 ,'plum');
-- QUERY
WITH RECURSIVE cte ( combination, curr ) AS (
SELECT
CAST(t.fruit AS CHAR(80)),
t.id
FROM
fruits t
UNION ALL
SELECT
CONCAT(c.combination, ', ', CAST( t.fruit AS CHAR(100))),
t.id
FROM
fruits t
INNER JOIN
cte c
ON (c.curr < t.id)
)
SELECT combination FROM cte;
Credit:
Code adapted from this answer
EDIT: This query doesn't give all the possible combinations.
Below query should work:
WITH RECURSIVE cte AS (
SELECT A.id,
CONCAT(A.fruit,',',GROUP_CONCAT(B.fruit ORDER BY B.id)) AS combinations,
COUNT(*) AS count_of_delims
FROM fruits A
INNER JOIN fruits B
ON A.id<B.id
GROUP BY A.id,A.fruit
UNION ALL
SELECT id,
SUBSTRING_INDEX(combinations,',',count_of_delims),
count_of_delims-1
FROM cte
WHERE count_of_delims>0
)
SELECT combinations FROM cte ORDER BY id;
Here is a working example in DB Fiddle.
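For anyone who wants to experiment locally, here is a minimal sketch of the first recursive query (the one that joins on c.curr < t.id), ported to SQLite through Python's sqlite3 module. The port is an assumption on my part: CONCAT is replaced by SQLite's || operator and the CASTs are dropped. Each step extends a combination only with a fruit whose id is larger than the last one added, so on the four-row sample it yields 15 rows, one per non-empty subset. The alphabetical order inside each combination relies on the ids happening to follow the alphabetical order of the fruit names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fruits (id int NOT NULL, fruit char(20));
INSERT INTO fruits VALUES (1, 'apple'), (2, 'orange'), (3, 'pear'), (4, 'plum');
""")

# Anchor member: every single fruit. Recursive member: append any fruit
# whose id is greater than the id of the fruit added last (curr).
query = """
WITH RECURSIVE cte (combination, curr) AS (
    SELECT fruit, id FROM fruits
    UNION ALL
    SELECT c.combination || ',' || t.fruit, t.id
    FROM fruits AS t
    INNER JOIN cte AS c ON c.curr < t.id
)
SELECT combination FROM cte ORDER BY combination
"""
combinations = [row[0] for row in conn.execute(query)]
for combination in combinations:
    print(combination)
```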

mySQL query for two disjoint tables

Is there a way to combine two SELECT statements on two disjoint tables (t1, t2) into one SELECT statement? The ideal query would return results from both tables when both parts match, only the t1 results if only the t1 part matches, only the t2 results if only the t2 part matches, and nothing if neither part matches.
MySQL UNION doesn't work because the tables are disjoint. JOIN doesn't appear to work because if the query fails for one table the entire query fails.
Test case:
create table t1 (
c11 varchar(2),
c12 varchar(2),
c13 varchar(2),
c14 varchar(2),
primary key (c11));
create table t2 (
c21 varchar(2),
c22 varchar(2),
c23 varchar(2),
primary key(c21));
insert into t1 values ('a', 'b', 'c', 'd');
insert into t2 values ('x', 'y', 'z');
Example of the two distinct SELECT statements:
SELECT c11, c12, c13, c14 from t1 where c11 = 'a'
Returns a, b, c, d
SELECT c21, c22, c23 from t2 where c21 = 'x'
Returns x, y, z
Examples of what I am trying to achieve:
SELECT * (successful query of t1 and t2) where t1.c11 = 'a' and t2.c21 = 'x'
Returns a, b, c, d, x, y, z
SELECT * (successful query of only t1 and not t2) where t1.c11 = 'a' and t2.c21 = 'v'
Returns a, b, c, d
SELECT (successful query of only t2 and not t1) where t1.c11 = 'd' and t2.c21 = 'x'
Returns x, y, z
SELECT (unsuccessful query of both t1 and t2) where t1.c11 = 'd' and t2.c21 = 'v'
Empty set.
You can just join the tables on a condition which always evaluates true, e.g.
select c11,c12,c13,c14,c21,c22,c23 from t1
left join t2 on true
where c21='x' and c11='a';
Kinda hard for me to understand why you want to do this, but there you go. Note that, without the where clause, this query will give as many results as there are rows in t1 multiplied by the number of rows in t2. So if they both have 10 rows, an unconditional query would return 100 results.
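If the goal is to get a row even when only one of the two lookups matches, one possible variation (not the query above, and only a sketch) is to left join both tables onto a one-row derived table and move each filter into its ON clause. The snippet runs this against the question's test case in SQLite through Python's sqlite3 module; the derived table d, its dummy column, and the lookup helper are hypothetical names introduced for illustration. The side that does not match comes back as NULLs (None in Python) rather than being omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (c11 varchar(2), c12 varchar(2), c13 varchar(2), c14 varchar(2), PRIMARY KEY (c11));
CREATE TABLE t2 (c21 varchar(2), c22 varchar(2), c23 varchar(2), PRIMARY KEY (c21));
INSERT INTO t1 VALUES ('a', 'b', 'c', 'd');
INSERT INTO t2 VALUES ('x', 'y', 'z');
""")

# The one-row derived table d guarantees a row to hang both left joins on.
# The WHERE clause drops the row when neither table matched.
QUERY = """
SELECT c11, c12, c13, c14, c21, c22, c23
FROM (SELECT 1 AS dummy) AS d
LEFT JOIN t1 ON t1.c11 = :k1
LEFT JOIN t2 ON t2.c21 = :k2
WHERE t1.c11 IS NOT NULL OR t2.c21 IS NOT NULL
"""

def lookup(k1, k2):
    """Hypothetical helper: run the combined lookup for one pair of keys."""
    return conn.execute(QUERY, {"k1": k1, "k2": k2}).fetchall()

both = lookup("a", "x")
only_t1 = lookup("a", "v")
only_t2 = lookup("d", "x")
neither = lookup("d", "v")
print(both, only_t1, only_t2, neither)
```

Because both keys are primary keys, each side contributes at most one row, so the result never grows beyond a single row here.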

MySQL: filter child records, include all siblings

There are two MySQL tables:
tparent(id int, some data...)
tchild(id int, parent_id int, some data...)
I need to return all columns (parent plus all children) where at least one of the children matches some criteria.
My current solution:
-- prepare sample data
DROP TABLE IF EXISTS tparent;
DROP TABLE IF EXISTS tchild;
CREATE TABLE tparent (id int, c1 varchar(10), c2 date, c3 float);
CREATE TABLE tchild(id int, parent_id int, c4 float, c5 varchar(20), c6 date);
CREATE UNIQUE INDEX tparent_id_IDX USING BTREE ON tparent (id);
CREATE UNIQUE INDEX tchild_id_IDX USING BTREE ON tchild (id);
INSERT INTO tparent
VALUES
(1, 'a', '2021-01-01', 1.23)
, (2, 'b', '2021-02-01', 1.32)
, (3, 'c', '2021-01-03', 2.31);
INSERT INTO tchild
VALUES
(10, 1, 22.333, 'argh1', '2000-01-01')
, (20, 1, 33.222, 'argh2', '2000-01-02')
, (30, 1, 44.555, 'argh3', '2000-02-02')
, (40, 2, 33.222, 'argh4', '2000-03-02')
, (50, 3, 33.222, 'argh5', '2000-04-02')
, (60, 3, 33.222, 'argh6', '2000-05-02');
-- the query
WITH parent_filter AS
(
SELECT
parent_id
FROM
tchild
WHERE
c4>44
)
SELECT
p.*,
c.*
FROM
tparent p
JOIN tchild c ON p.id = c.parent_id
JOIN parent_filter pf ON p.id = pf.parent_id;
It returns 3 rows for parent id 1 and child ids 10, 20, 30, because child id 30 has a matching record. It does not return data for any other parent id.
However, I am querying tchild twice here (first in the CTE, then again in the main query). As both tables are relatively big (10s - 100s millions of rows, 2-5 child records per parent record on average), I am hitting performance / timing issues.
Is there a better way of achieving this filtering? I.e. without having to query tchild table more than once?
Did you try this version?
SELECT *
FROM tparent p
JOIN tchild c ON p.id = c.parent_id AND <criteria>
This way you limit the tchild table with the criteria before the actual join.
Perhaps you can use this instead:
select p.*, c.*
from tparent p
join tchild c
on p.id = c.parent_id
where exists (select 1 from tchild c2 where c2.parent_id = p.id and <criteria>)
This should retrieve all parent and child rows of the join for every parent that has at least one child record meeting the criteria.
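A small sketch of the EXISTS filter in correlated form, run against the question's sample data in SQLite through Python's sqlite3 module (SQLite is an assumption, chosen only to make the example runnable). The alias c2 and the concrete criterion c4 > 44 are filled in from the question for illustration. Note that this still reads tchild twice, so whether it is faster than the CTE version depends on the indexes and the optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tparent (id int, c1 varchar(10), c2 date, c3 float);
CREATE TABLE tchild (id int, parent_id int, c4 float, c5 varchar(20), c6 date);
INSERT INTO tparent VALUES
    (1, 'a', '2021-01-01', 1.23), (2, 'b', '2021-02-01', 1.32), (3, 'c', '2021-01-03', 2.31);
INSERT INTO tchild VALUES
    (10, 1, 22.333, 'argh1', '2000-01-01'), (20, 1, 33.222, 'argh2', '2000-01-02'),
    (30, 1, 44.555, 'argh3', '2000-02-02'), (40, 2, 33.222, 'argh4', '2000-03-02'),
    (50, 3, 33.222, 'argh5', '2000-04-02'), (60, 3, 33.222, 'argh6', '2000-05-02');
""")

# Keep every child of a parent as long as that parent has at least one
# child with c4 > 44. The subquery is tied to the outer parent via p.id.
query = """
SELECT p.id AS parent_id, c.id AS child_id
FROM tparent AS p
JOIN tchild AS c ON p.id = c.parent_id
WHERE EXISTS (
    SELECT 1 FROM tchild AS c2
    WHERE c2.parent_id = p.id AND c2.c4 > 44
)
ORDER BY c.id
"""
matches = conn.execute(query).fetchall()
print(matches)
```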

Delete all duplicate rows in mysql

I have MySQL data that was imported from a CSV file and contains multiple duplicate rows.
I picked out all the non-duplicates using DISTINCT.
Now I need to delete all the duplicates using an SQL command.
Note: I don't need any of the duplicates, I only need to fetch the non-duplicates.
Thanks.
For example, if the number 0123332546666 is repeated 11 times, I want to delete 12 of them.
Mysql table format
ID, PhoneNumber
Just COUNT the number of duplicates (with GROUP BY) and filter by HAVING. Then supply the query result to DELETE statement:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
http://sqlfiddle.com/#!9/a012d21/1
complete fiddle:
schema:
CREATE TABLE Table1
(`ID` int, `PhoneNumber` int)
;
INSERT INTO Table1
(`ID`, `PhoneNumber`)
VALUES
(1, 888),
(2, 888),
(3, 888),
(4, 889),
(5, 889),
(6, 111),
(7, 222),
(8, 333),
(9, 444)
;
delete query:
DELETE FROM Table1 WHERE PhoneNumber IN (SELECT a.PhoneNumber FROM (
SELECT COUNT(*) AS cnt, PhoneNumber FROM Table1 GROUP BY PhoneNumber HAVING cnt>1
) AS a);
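To check the effect of this delete without a fiddle, here is a minimal sketch that loads the same sample rows into SQLite through Python's sqlite3 module (SQLite is an assumption, used only because it runs anywhere). SQLite accepts the subquery directly; the extra derived table in the MySQL version above is there because MySQL does not let a DELETE select from its own target table in a plain subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID int, PhoneNumber int);
INSERT INTO Table1 (ID, PhoneNumber) VALUES
    (1, 888), (2, 888), (3, 888), (4, 889), (5, 889),
    (6, 111), (7, 222), (8, 333), (9, 444);
""")

# Delete every row whose PhoneNumber occurs more than once,
# including the first occurrence.
conn.execute("""
DELETE FROM Table1
WHERE PhoneNumber IN (
    SELECT PhoneNumber
    FROM Table1
    GROUP BY PhoneNumber
    HAVING COUNT(*) > 1
)
""")

remaining = conn.execute(
    "SELECT ID, PhoneNumber FROM Table1 ORDER BY ID"
).fetchall()
print(remaining)
```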
You could try a left join against a subquery that picks the minimum id for each PhoneNumber, and delete the rows that have no match:
delete m
from m_table m
left join (
select min(id) as id, PhoneNumber
from m_table
group by PhoneNumber
) t on t.id = m.id
where t.PhoneNumber is null
Otherwise, if you want to delete all the duplicates without keeping even a single row, you could use:
delete m
from m_table m
INNER join (
select PhoneNumber
from m_table
group by PhoneNumber
having count(*) > 1
) t on t.PhoneNumber= m.PhoneNumber
Instead of deleting from the table, I would suggest creating a new one:
create table table2 as
select min(id) as id, phonenumber
from table1
group by phonenumber
having count(*) = 1;
Why? Deleting rows has a lot of overhead. If you are bringing the data in from an external source, then treat the first landing table as a staging table and the second as the final table.

Building SQL Join for complement of data. T-SQL help needed

Assume I have data as,
declare @TableA table
(
TableAID int,
TableAName varchar(10)
)
declare @TableB table
(
TableBID int,
TableBName varchar(10),
TableAID int
)
insert into @TableA values
(1, 'A 1'),
(2, 'A 2'),
(3, 'A 3')
insert into @TableB values
(1, 'B 1', 1),
(2, 'B 2', 2)
I want to write a join and NOT SQL query which returns me data just as shown below,
TableAName TableBName
---------- ----------
A 3 N/A
In short get a complement of the view with Inner Joins!
This is a classic use for an OUTER JOIN, most commonly done with a LEFT OUTER JOIN (commonly abbreviated to just LEFT JOIN):
SELECT A.TableAName, B.TableBName
FROM TableA A
LEFT JOIN TableB B on A.TableAID = B.TableAID
WHERE B.TableAID IS NULL
An outer join allows unequal record numbers, here TableA has 3 but TableB has 2. When there is no matching data in TableB NULLs will exist, and hence you can filter for NULL as shown above.
Please do yourself a favour: go here for a visual representation of joins and look for Left Excluding JOIN.
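A quick way to see the left excluding join in action is the sketch below, which loads the question's sample rows into ordinary SQLite tables through Python's sqlite3 module (an assumption: the original uses T-SQL table variables, which SQLite does not have). The COALESCE that turns the NULL into 'N/A' is my addition, included only to reproduce the output format shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (TableAID int, TableAName varchar(10));
CREATE TABLE TableB (TableBID int, TableBName varchar(10), TableAID int);
INSERT INTO TableA VALUES (1, 'A 1'), (2, 'A 2'), (3, 'A 3');
INSERT INTO TableB VALUES (1, 'B 1', 1), (2, 'B 2', 2);
""")

# LEFT JOIN keeps every TableA row; rows without a partner in TableB get
# NULLs on the B side, and the WHERE clause keeps only those rows.
query = """
SELECT A.TableAName, COALESCE(B.TableBName, 'N/A') AS TableBName
FROM TableA AS A
LEFT JOIN TableB AS B ON A.TableAID = B.TableAID
WHERE B.TableAID IS NULL
"""
unmatched = conn.execute(query).fetchall()
print(unmatched)
```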