Union of three tables without any duplicates, minus one table, in MySQL

Table:2018
No Email
1 Lilly#gmail.com
2 brens#gmail.com
3 susan#gmail.com
4 resh#gmail.com
Table:2017
No Email
1 chitta#gmail.com
2 resh#gmail.com
3 brens#gmail.com
4 minu#gmail.com
Table:2016
No Email
1 brens#gmail.com
2 chitta#gmail.com
3 lisa#gmail.com
4 monay#gmail.com
5 many#gmail.com
Table:2019
No Email
1 brens#gmail.com
2 chitta#gmail.com
3 rinu#gmail.com
4 emma#gmail.com
I need the union of tables 2018, 2017 and 2016 without any duplicate emails, minus the emails in table 2019. The result should look like this:
RESULT
No Email
1 Lilly#gmail.com
2 susan#gmail.com
3 resh#gmail.com
4 minu#gmail.com
5 lisa#gmail.com
6 monay#gmail.com
7 many#gmail.com
The MINUS operation is not available in MySQL.
SELECT a.*
FROM (
    SELECT * FROM y2018
    UNION
    SELECT * FROM y2017 WHERE NOT EXISTS (SELECT * FROM y2018 WHERE y2018.email = y2017.email)
    UNION
    SELECT * FROM y2016 WHERE NOT EXISTS (SELECT * FROM y2018 WHERE y2018.email = y2016.email)
) a
LEFT OUTER JOIN y2019 b ON a.email = b.email
WHERE b.email IS NULL;
This gives the result but does not eliminate the duplicates in (2017 union 2016).
Could someone please help me?

I need to perform Union of tables 2018,2017,2016 without any duplicate
emails minus table 2019
The easiest way to emulate MINUS/EXCEPT is with NOT IN().
Query
SELECT
    (@ROW_NUMBER := @ROW_NUMBER + 1) AS 'No'
  , unique_email.Email
FROM (
    SELECT
        DISTINCT
        years_merged.Email
    FROM (
        SELECT
            Email
        FROM
            y2019
        UNION ALL
        SELECT
            Email
        FROM
            y2018
        UNION ALL
        SELECT
            Email
        FROM
            y2017
        UNION ALL
        SELECT
            Email
        FROM
            y2016
    ) AS years_merged
    WHERE
        years_merged.Email NOT IN(SELECT y2019.Email FROM y2019 )
    ORDER BY
        years_merged.Email ASC
) AS unique_email
CROSS JOIN (SELECT @ROW_NUMBER := 0) AS init
ORDER BY
    @ROW_NUMBER ASC
Result
| No | Email |
| --- | --------------- |
| 1 | Lilly#gmail.com |
| 2 | lisa#gmail.com |
| 3 | many#gmail.com |
| 4 | minu#gmail.com |
| 5 | monay#gmail.com |
| 6 | resh#gmail.com |
| 7 | susan#gmail.com |
Yes, the order is different, but this is the best you can do: the SQL standard defines tables as unordered sets of rows.
see demo
But emulating MINUS/EXCEPT with ... LEFT JOIN ... ON ... WHERE ... IS NULL might optimize better than NOT IN().
See demo
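As a rough sketch of the LEFT JOIN ... IS NULL variant, the script below runs it against the sample data from the question. It is an illustration under assumptions, not the demo linked above: SQLite (through Python's sqlite3 module) stands in for MySQL, only the Email column is created, and plain UNION is used to remove the duplicates before the anti-join.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL here.
conn = sqlite3.connect(":memory:")

# Sample data copied from the question; only the Email column is kept.
tables = {
    "y2018": ["Lilly#gmail.com", "brens#gmail.com", "susan#gmail.com", "resh#gmail.com"],
    "y2017": ["chitta#gmail.com", "resh#gmail.com", "brens#gmail.com", "minu#gmail.com"],
    "y2016": ["brens#gmail.com", "chitta#gmail.com", "lisa#gmail.com",
              "monay#gmail.com", "many#gmail.com"],
    "y2019": ["brens#gmail.com", "chitta#gmail.com", "rinu#gmail.com", "emma#gmail.com"],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} (Email TEXT)")
    conn.executemany(f"INSERT INTO {name} (Email) VALUES (?)", [(e,) for e in rows])

query = """
SELECT merged.Email
FROM (
    SELECT Email FROM y2018
    UNION                      -- plain UNION already removes duplicate emails
    SELECT Email FROM y2017
    UNION
    SELECT Email FROM y2016
) AS merged
LEFT JOIN y2019 ON y2019.Email = merged.Email
WHERE y2019.Email IS NULL      -- keep only emails with no match in y2019
ORDER BY merged.Email
"""
result_emails = [row[0] for row in conn.execute(query)]
print(result_emails)
```

The same SELECT should carry over to MySQL unchanged, since it only relies on UNION and LEFT JOIN.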

I found a better method: create a view that holds a portion of the query. It helps to reduce the processing time as well.
create view vm as select *from y2018 union select *from y2017 where not exists(select *from y2018 where y2018.email=y2017.email);
SELECT a.*FROM(SELECT * FROM vm union select *from y2016 where not exists(select *from vm where vm.email=y2016.email))a LEFT JOIN y2019 b ON a.email=b.email where b.email is null;
Here 'vm' is the view

Related

Select all records where last n characters in column are not unique

I have a slightly strange requirement in MySQL.
I need to select all records from a table where the last 6 characters are not unique.
For example, if I have this table:
I should select rows 1 and 3, since the last 6 letters of these values are not unique.
Do you have any idea how to implement this?
Thank you for help.
I use a JOIN against a subquery in which I count the occurrences of each unique combination of the last n characters (2 in my example).
SELECT t.*
FROM t
JOIN (SELECT RIGHT(value, 2) r, COUNT(RIGHT(value, 2)) rc
FROM t
GROUP BY r) c ON c.r = RIGHT(value, 2) AND c.rc > 1
Something like that should work:
SELECT `mytable`.*
FROM (SELECT RIGHT(`value`, 6) AS `ending` FROM `mytable` GROUP BY `ending` HAVING COUNT(*) > 1) `grouped`
INNER JOIN `mytable` ON `grouped`.`ending` = RIGHT(`value`, 6)
but it is not fast. This requires a full table scan. Maybe you should rethink your problem.
EDITED: I had a wrong understanding of the question previously and I don't really want to change anything from my initial answer. But if my previous answer is not acceptable in some environment and it might mislead people, I have to correct it anyhow.
SELECT GROUP_CONCAT(id),RIGHT(VALUE,6)
FROM table1
GROUP BY RIGHT(VALUE,6) HAVING COUNT(RIGHT(VALUE,6)) > 1;
Since this question already has good answers, I wrote my query in a slightly different way, and I've tested it with sql_mode=ONLY_FULL_GROUP_BY. ;)
This is what you need: a subquery to get the duplicated RIGHT(value,6) endings, and the main query to get the rows that match that condition.
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT RIGHT(`value`,6)
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1);
UPDATE
This is the solution to avoid the mysql error in the case you have sql_mode=only_full_group_by
SELECT t.* FROM t WHERE RIGHT(`value`,6) IN (
SELECT DISTINCT right_value FROM (
SELECT RIGHT(`value`,6) AS right_value,
COUNT(*) AS TOT
FROM t
GROUP BY RIGHT(`value`,6) HAVING COUNT(*) > 1) t2
)
Fiddle here
This might be fast code, as there is no counting involved.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/0
select *
from tbl outr
where not exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | --------------- |
| 2 | aaaaaaaaaaaaaa |
| 4 | aaaaaaaaaaaaaaB |
| 5 | Hello |
The logic is to test the other rows, i.e. those whose id differs from the outer row's id. If any of those rows has the same last 6 characters as the outer row, the outer row is not shown.
UPDATE
I misunderstood the OP's intent; it's the reverse. Just reverse the logic: use EXISTS instead of NOT EXISTS.
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/3
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
Output:
| id | value |
| --- | ----------- |
| 1 | abcdePuzzle |
| 3 | abcPuzzle |
UPDATE
Tested the query. The performance of my answer (correlated EXISTS approach) is not optimal. Just keeping my answer, so others will know what approach to avoid :)
GhostGambler's answer is faster than correlated EXISTS approach. For 5 million rows, his answer takes 2.762 seconds only:
explain analyze
SELECT
tbl.*
FROM
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
) grouped
JOIN tbl ON grouped.ending = RIGHT(value, 6)
My answer (correlated EXISTS) takes 4.08 seconds:
explain analyze
select *
from tbl outr
where exists
(
select 1 / 0 -- just a proof that this is not evaluated. won't cause division by zero
from tbl inr
where
inr.id <> outr.id
and right(inr.value, 6) = right(outr.value, 6)
)
The straightforward query is the fastest: no join, just a plain IN query, at 2.722 seconds. It has practically the same performance as the JOIN approach since they have the same execution plan. This is kiks73's answer. I just don't know why he made his second answer unnecessarily complicated.
So it's just a matter of taste, or of which code you find more readable: SELECT ... WHERE ... IN versus SELECT ... JOIN.
explain analyze
SELECT *
FROM tbl
where right(value, 6) in
(
SELECT
RIGHT(value, 6) AS ending
FROM
tbl
GROUP BY
ending
HAVING
COUNT(*) > 1
)
Result:
Test data used:
CREATE TABLE tbl (
id INTEGER primary key,
value VARCHAR(20)
);
INSERT INTO tbl
(id, value)
VALUES
('1', 'abcdePuzzle'),
('2', 'aaaaaaaaaaaaaa'),
('3', 'abcPuzzle'),
('4', 'aaaaaaaaaaaaaaB'),
('5', 'Hello');
insert into tbl(id, value)
select x.y, 'Puzzle'
from generate_series(6, 5000000) as x(y);
create index ix_tbl__right on tbl(right(value, 6));
Performances without the index, and with index on tbl(right(value, 6)):
JOIN approach:
Without index: 3.805 seconds
With index: 2.762 seconds
IN approach:
Without index: 3.719 seconds
With index: 2.722 seconds
Just a bit neater code (if using MySQL 8.0). Can't guarantee the performance though
Live test: https://www.db-fiddle.com/f/dBdH9tZd4W6Eac1TCRXZ8U/1
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count = 1
Output:
| id | value | unique_count |
| --- | --------------- | ------------ |
| 2 | aaaaaaaaaaaaaa | 1 |
| 4 | aaaaaaaaaaaaaaB | 1 |
| 5 | Hello | 1 |
UPDATE
I misunderstood the OP's intent; it's the reverse. Just change the count:
select x.*
from
(
select
*,
count(*) over(partition by right(value, 6)) as unique_count
from tbl
) as x
where x.unique_count > 1
Output:
| id | value | unique_count |
| --- | ----------- | ------------ |
| 1 | abcdePuzzle | 2 |
| 3 | abcPuzzle | 2 |
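For anyone who wants to try the GROUP BY ... HAVING approach without a MySQL server, here is a small sketch using the five sample rows from the test data. It rests on two assumptions: SQLite (through Python's sqlite3) stands in for MySQL, and substr(value, -6) stands in for MySQL's RIGHT(value, 6), which SQLite does not provide.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; substr(value, -6) replaces RIGHT(value, 6).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO tbl (id, value) VALUES (?, ?)",
    [
        (1, "abcdePuzzle"),
        (2, "aaaaaaaaaaaaaa"),
        (3, "abcPuzzle"),
        (4, "aaaaaaaaaaaaaaB"),
        (5, "Hello"),
    ],
)

query = """
SELECT id, value
FROM tbl
WHERE substr(value, -6) IN (
    -- endings that occur in more than one row
    SELECT substr(value, -6)
    FROM tbl
    GROUP BY substr(value, -6)
    HAVING COUNT(*) > 1
)
ORDER BY id
"""
duplicated = conn.execute(query).fetchall()
print(duplicated)
```

Only the rows ending in "Puzzle" share their last six characters, so rows 1 and 3 come back.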

SQL writing custom query

I need to write a SQL Query which generates the name of the most popular story for each user (according to total reading counts). Here is some sample data:
story_name | user | age | reading_counts
-----------|-------|-----|---------------
story 1 | user1 | 4 | 12
story 2 | user2 | 6 | 14
story 4 | user1 | 4 | 15
This is what I have so far but I don't think it's correct:
Select *
From mytable
where (story_name,reading_counts)
IN (Select id, Max(reading_counts)
FROM mytable
Group BY user
)
In a derived table, you can first determine the maximum reading_counts for every user (GROUP BY with MAX()).
Then simply join this result set to the main table on user and reading_counts to get the row that holds each user's maximum reading_counts.
Try the following query:
SELECT
t1.*
FROM mytable AS t1
JOIN
(
SELECT t2.user,
MAX(t2.reading_counts) AS max_count
FROM mytable AS t2
GROUP BY t2.user
) AS dt
ON dt.user = t1.user AND
dt.max_count = t1.reading_counts
SELECT *
FROM mytable
WHERE (user, reading_counts) IN
(SELECT user, MAX(reading_counts)
FROM mytable
GROUP BY user)
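The derived-table join can be checked against the sample rows from the question with a short script. This is only a sketch: SQLite (through Python's sqlite3) stands in for MySQL, and the column types are guesses based on the sample data.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; column types are guessed from the sample.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (story_name TEXT, user TEXT, age INTEGER, reading_counts INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("story 1", "user1", 4, 12),
        ("story 2", "user2", 6, 14),
        ("story 4", "user1", 4, 15),
    ],
)

query = """
SELECT t1.user, t1.story_name, t1.reading_counts
FROM mytable AS t1
JOIN (
    -- one row per user with that user's highest reading_counts
    SELECT user, MAX(reading_counts) AS max_count
    FROM mytable
    GROUP BY user
) AS dt
  ON dt.user = t1.user
 AND dt.max_count = t1.reading_counts
ORDER BY t1.user
"""
top_stories = conn.execute(query).fetchall()
print(top_stories)
```

Note that if a user has two stories tied for the maximum, both rows are returned.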

How to get data from a table even when count(row) is zero?

table one
id mandal_name
1 mandal1
2 mandal2
3 mandal3
table address
id mandal_name date
1 mandal1 2017-07-11 12:34:11
2 mandal1 2017-07-11 12:54:45
3 mandal1 2017-07-11 12:23:23
SELECT count(id) as yesterday_count, mandal FROM address WHERE date(date) = '2017-07-11'
Result obviously
3 , mandal1
Expecting result
3 , mandal1
0 , mandal2
0 , mandal3
...
The key is to use an OUTER JOIN - LEFT JOIN in this case.
You can either do
SELECT m.mandal_name, COUNT(a.id) AS yesterday_count
FROM table_one m LEFT JOIN address a
ON m.mandal_name = a.mandal_name
AND a.date >= '2017-07-11'
AND a.date < '2017-07-12'
GROUP BY m.mandal_name;
or
SELECT m.mandal_name, COALESCE(a.count, 0) AS yesterday_count
FROM table_one m LEFT JOIN (
SELECT mandal_name, COUNT(*) AS count
FROM address
WHERE date >= '2017-07-11'
AND date < '2017-07-12'
GROUP BY mandal_name
) a
ON m.mandal_name = a.mandal_name;
Here is a SQLFiddle demo
Output
| mandal_name | yesterday_count |
|-------------|-----------------|
| mandal1 | 3 |
| mandal2 | 0 |
| mandal3 | 0 |
Further reading - A Visual Explanation of SQL Joins
On a side note: don't use DATE(date), as it prevents the index on the date column from being used effectively and causes a full table scan.
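To see why the date condition belongs in the ON clause rather than in WHERE, the sketch below runs both variants on the sample data. SQLite (through Python's sqlite3) is used as a stand-in for MySQL, and the dates are stored as text; both are assumptions made for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; dates are stored as ISO-formatted text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_one (id INTEGER PRIMARY KEY, mandal_name TEXT)")
conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, mandal_name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO table_one VALUES (?, ?)",
    [(1, "mandal1"), (2, "mandal2"), (3, "mandal3")],
)
conn.executemany(
    "INSERT INTO address VALUES (?, ?, ?)",
    [
        (1, "mandal1", "2017-07-11 12:34:11"),
        (2, "mandal1", "2017-07-11 12:54:45"),
        (3, "mandal1", "2017-07-11 12:23:23"),
    ],
)

# Date filter inside ON: unmatched mandals survive the LEFT JOIN with a count of 0.
filter_in_on = conn.execute("""
    SELECT m.mandal_name, COUNT(a.id) AS yesterday_count
    FROM table_one m
    LEFT JOIN address a
      ON m.mandal_name = a.mandal_name
     AND a.date >= '2017-07-11'
     AND a.date <  '2017-07-12'
    GROUP BY m.mandal_name
    ORDER BY m.mandal_name
""").fetchall()

# Date filter in WHERE: rows with a NULL date are discarded, so the zeros vanish.
filter_in_where = conn.execute("""
    SELECT m.mandal_name, COUNT(a.id) AS yesterday_count
    FROM table_one m
    LEFT JOIN address a ON m.mandal_name = a.mandal_name
    WHERE a.date >= '2017-07-11'
      AND a.date <  '2017-07-12'
    GROUP BY m.mandal_name
    ORDER BY m.mandal_name
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

COUNT(a.id) is used instead of COUNT(*) because COUNT(*) would count the NULL-extended row as 1.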
You can query it like this:
SELECT A.mandal_name, COUNT(B.id) AS yesterday_count
FROM one A
LEFT JOIN address B ON A.mandal_name = B.mandal_name
AND DATE(B.date) = '2017-07-11'
GROUP BY A.mandal_name
just substitute your table name and columns to get the result

Find duplicates from same table and constraint them from another table in sql

Oh, my title is not the best one and as English is not my main language maybe someone can fix that instead of downvoting if they've understood the issue here.
Basically I have two tables: tourneyplayers and results. Tourneyplayers is a side table that gathers together tournament information from multiple tables (results, tournaments, players etc.). I want to find duplicates in the results table on the column day1_best within a single tournament, and return all the tourneyplayers that have duplicates.
Tourneyplayers contain rows:
Tourneyplayers
tp_id | resultid | tourneyid
1 | 2 | 91
2 | 21 | 91
3 | 29 | 91
4 | 1 | 91
5 | 3 | 92
Results contains rows:
Results:
r_id | day1_best
1 | 3
2 | 1
3 | 4
.. | ..
21 | 1
.. | ..
29 | 2
Now tourney with id = 91 has in total 4 results, with id's 1,2,21 and 29. I want to return values which have duplicates, so currently the result would be
Result
tp_id | resultid | day1_best
1 | 2 | 1
2 | 21 | 1
I tried writing something like this:
SELECT *
FROM tourneyplayers
WHERE resultid
IN (
SELECT r1.r_id
FROM results AS r1
INNER JOIN results AS r2 ON ( r1.day1_best = r2.day1_best )
AND (
r1.r_id <> r2.r_id
)
)
AND tourneyid =91
But in addition to values which had the same day1_best it chose two more which did not have the same. How could i improve my SQL or rewrite it?
First, JOIN both tables so you can see what the data looks like.
SELECT *
FROM tourney_players t
JOIN results r
ON t.`resultid` = r.`r_id`;
Then, using the same query, GROUP BY to see how many rows each (tourneyid, day1_best) combination has.
SELECT `tourneyid`, `day1_best`, count(*) as total
FROM tourney_players t
JOIN results r
ON t.`resultid` = r.`r_id`
GROUP BY `tourneyid`, `day1_best`;
Finally, take the base JOIN and LEFT JOIN it to the grouped result (now filtered with HAVING) to see which rows have a match, and show only those rows.
SELECT t.`tp_id`, r.`r_id`, r.`day1_best`
FROM tourney_players t
JOIN results r
ON t.`resultid` = r.`r_id`
LEFT JOIN (SELECT `tourneyid`, `day1_best`, count(*) as total
FROM tourney_players t
JOIN results r
ON t.`resultid` = r.`r_id`
GROUP BY `tourneyid`, `day1_best`
HAVING count(*) > 1) as filter
ON t.`tourneyid` = filter.`tourneyid`
AND r.`day1_best` = filter.`day1_best`
WHERE filter.`tourneyid` IS NOT NULL;
SQL DEMO
OUTPUT
Please try this :
SELECT tp.tp_id, tp.resultid, r.day1_best
FROM (SELECT * FROM tourneyplayers WHERE tourneyid = 91) AS tp
INNER JOIN (SELECT * FROM results WHERE day1_best IN
(SELECT day1_best FROM results GROUP BY day1_best HAVING COUNT(*) > 1)) AS r
ON tp.resultid = r.r_id;
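A per-tournament duplicate check can be sketched against the rows shown in the question. The script makes several assumptions: SQLite (through Python's sqlite3) stands in for MySQL, the rows elided with ".." in the question are simply left out, and the duplicate test is grouped by tournament so that equal day1_best values from different tournaments do not count.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows elided with ".." in the question are omitted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneyplayers (tp_id INTEGER, resultid INTEGER, tourneyid INTEGER)")
conn.execute("CREATE TABLE results (r_id INTEGER, day1_best INTEGER)")
conn.executemany(
    "INSERT INTO tourneyplayers VALUES (?, ?, ?)",
    [(1, 2, 91), (2, 21, 91), (3, 29, 91), (4, 1, 91), (5, 3, 92)],
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(1, 3), (2, 1), (3, 4), (21, 1), (29, 2)],
)

query = """
SELECT tp.tp_id, tp.resultid, r.day1_best
FROM tourneyplayers tp
JOIN results r ON r.r_id = tp.resultid
JOIN (
    -- (tourneyid, day1_best) pairs that occur more than once
    SELECT tp2.tourneyid, r2.day1_best
    FROM tourneyplayers tp2
    JOIN results r2 ON r2.r_id = tp2.resultid
    GROUP BY tp2.tourneyid, r2.day1_best
    HAVING COUNT(*) > 1
) dup
  ON dup.tourneyid = tp.tourneyid
 AND dup.day1_best = r.day1_best
WHERE tp.tourneyid = 91
ORDER BY tp.tp_id
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```

With this data, only the two players whose day1_best is 1 in tournament 91 are returned.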

Return NULL for missing values in an IN list

I have a table like this:
id | val
---------
1 | abc
2 | def
5 | xyz
6 | foo
8 | bar
and a query like
SELECT id, val FROM tab WHERE id IN (1,2,3,4,5)
which returns
id | val
---------
1 | abc
2 | def
5 | xyz
Is there a way to make it return NULLs on missing ids, that is
id | val
---------
1 | abc
2 | def
3 | NULL
4 | NULL
5 | xyz
I guess there should be a tricky LEFT JOIN with itself, but can't wrap my head around it.
EDIT: I see people are thinking I want to "fill the gaps" in a sequence, but actually what I want is to substitute NULL for the missing values from the IN list. For example, this
SELECT id, val FROM tab WHERE id IN (1,100,8,200)
should return
id | val
---------
1 | abc
100 | NULL
8 | bar
200 | NULL
Also, the order doesn't matter much.
EDIT2: Just adding a couple of related links:
How to select multiple rows filled with constants?
Is it possible to have a tableless select with multiple rows?
You could use this trick:
SELECT v.id, t.val
FROM
(SELECT 1 AS id
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5) v
LEFT JOIN tab t
ON v.id = t.id
Please see fiddle here.
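If the list of ids comes from application code, the derived table of constants can be generated instead of typed by hand. The sketch below is one way to do that, under assumptions: SQLite (through Python's sqlite3) stands in for MySQL, and the variable names (wanted, id_list) are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; `wanted` and `id_list` are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [(1, "abc"), (2, "def"), (5, "xyz"), (6, "foo"), (8, "bar")],
)

wanted = [1, 100, 8, 200]

# Build "SELECT ? AS id UNION ALL SELECT ? AS id ..." with one placeholder per value,
# so the values themselves are bound as parameters rather than pasted into the SQL.
id_list = " UNION ALL ".join("SELECT ? AS id" for _ in wanted)

query = f"""
SELECT v.id, t.val
FROM ({id_list}) AS v
LEFT JOIN tab t ON t.id = v.id
ORDER BY v.id
"""
rows = conn.execute(query, wanted).fetchall()
print(rows)
```

Ids with no row in tab come back with val set to None, which is how sqlite3 represents NULL.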
Yes, you can. But it is tricky, since there are no sequences in MySQL.
I assume you want an arbitrary selection of values, so:
SELECT
*
FROM
(SELECT
(two_1.id + two_2.id + two_4.id +
two_8.id + two_16.id) AS id
FROM
(SELECT 0 AS id UNION ALL SELECT 1 AS id) AS two_1
CROSS JOIN (SELECT 0 id UNION ALL SELECT 2 id) AS two_2
CROSS JOIN (SELECT 0 id UNION ALL SELECT 4 id) AS two_4
CROSS JOIN (SELECT 0 id UNION ALL SELECT 8 id) AS two_8
CROSS JOIN (SELECT 0 id UNION ALL SELECT 16 id) AS two_16
) AS sequence
LEFT JOIN
t
ON sequence.id=t.id
WHERE
sequence.id IN (1,2,3,4,5);
(check the fiddle)
It works by combining powers of 2 to generate a table of consecutive numbers. Your values are passed in the WHERE clause, so you can substitute any set of values there.
I would recommend doing this in the application, because it will be faster. Doing it in SQL may make sense if you want to use this row set somewhere else (i.e. in other queries), but if not, it's a job for your application.
If you need higher values, add more rows to the sequence generator, like in this fiddle.