I want to write a SELECT that ignores duplicates. Duplicates should be detected using only some of the columns, but I still want to select all columns of one row from each group of duplicates.
Example:
CREATE TABLE employee (
id integer,
firstname varchar(100),
lastname varchar(100),
country varchar(100),
salary integer
-- many more fields...
);
select * from employee GROUP BY firstname, lastname, country;
Of course that's invalid SQL, but it shows my intention:
If any combination of (firstname, lastname, country) forms a duplicate key, then I want to select only one of those duplicate rows, but all columns of it.
Preferably, out of the duplicates I would want to select the row with the highest value in the salary column.
I'm using MySQL 8.
You can use ROW_NUMBER() to do this. Essentially, the columns you posted for your grouping become your partition instead, and you can then choose how the rows within each group are sorted (by salary in the query below):
select *
from (select *,
row_number() over(partition by firstname, lastname, country
order by salary desc) AS rownum
from employee) AS e
where e.rownum = 1;
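To see the partition-and-rank idea run end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module instead of MySQL 8 (an assumption, chosen only so the example is self-contained), and the three sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8 here (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id integer, firstname varchar(100), lastname varchar(100),
    country varchar(100), salary integer
);
-- Hypothetical sample data: ids 1 and 2 share the same duplicate key.
INSERT INTO employee VALUES
    (1, 'Ann', 'Lee', 'US', 50000),
    (2, 'Ann', 'Lee', 'US', 70000),
    (3, 'Bob', 'Ray', 'DE', 40000);
""")

query = """
SELECT id, firstname, lastname, country, salary
FROM (SELECT *,
             ROW_NUMBER() OVER (PARTITION BY firstname, lastname, country
                                ORDER BY salary DESC) AS rownum
      FROM employee) AS e
WHERE e.rownum = 1
ORDER BY id;
"""
# One row per (firstname, lastname, country), the highest-paid one.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Of the two 'Ann Lee' rows only id 2, the one with the higher salary, survives. Listing the columns in the outer SELECT, rather than using `*`, also keeps the helper column rownum out of the result.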
You can use row_number() to select unique combinations of firstname, lastname, country:
select t.*
from (select t.*,
row_number() over (partition by firstname, lastname, country order by id) as seqnum
from t
) t
where seqnum = 1;
If you want a row only for combinations that actually have duplicates, then include a count():
select t.*
from (select t.*,
row_number() over (partition by firstname, lastname, country order by id) as seqnum,
count(*) over (partition by firstname, lastname, country) as cnt
from t
) t
where seqnum = 1 and cnt > 1;
I have a table of Friends (Ann, Bob, Carl) and a table of Fruits (Apple, Banana, Cherry, Date, Fig, Grapefruit)
I need to create an intersection table (Friends X Fruit) that associates each Friend with 3 randomly selected fruits.
For example:
Ann might be associated with Cherry, Date, Fig
Bob might be associated with Apple, Fig, Banana
Carl might be associated with Banana, Cherry, Date
I have developed a script that works well for only ONE friend (pasted below), but I am struggling to expand the script to handle all of the friends at once.
(If I remove the WHERE clause, then every friend gets assigned the same set of fruits, which doesn't meet my requirement).
Setup statements and current script pasted below.
Thank you for any guidance!
CREATE TABLE TempFriends ( FirstName VARCHAR(24) );
CREATE TABLE TempFruits ( FruitName VARCHAR(24) );
CREATE TABLE FriendsAndFruits( FirstName VARCHAR(24), FruitName VARCHAR(24) );
INSERT INTO TempFriends VALUES ('Ann'), ('Bob'), ('Carl');
INSERT INTO TempFruits VALUES ('Apple'), ('Banana'), ('Cherry'), ('Date'), ('Fig'), ('Grapefruit');
INSERT INTO FriendsAndFruits( FirstName, FruitName )
SELECT FirstName, FruitName
FROM TempFriends
INNER JOIN ( SELECT FruitName FROM TempFruits ORDER BY RAND() LIMIT 3 ) RandomFruit
WHERE FirstName = 'Bob';
INSERT INTO FriendsAndFruits
SELECT FirstName,
FruitName
FROM (
SELECT FirstName,
FruitName,
ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY RAND()) rn
FROM TempFriends
CROSS JOIN TempFruits
) subquery
WHERE rn <= 3;
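A quick way to check that the ROW_NUMBER() approach gives every friend their own three fruits is sketched below. It runs on Python's built-in sqlite3 module rather than MySQL, so RAND() becomes SQLite's RANDOM(); that substitution is an assumption of the demo and not part of the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (assumption); RANDOM() replaces RAND().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TempFriends (FirstName VARCHAR(24));
CREATE TABLE TempFruits (FruitName VARCHAR(24));
CREATE TABLE FriendsAndFruits (FirstName VARCHAR(24), FruitName VARCHAR(24));
INSERT INTO TempFriends VALUES ('Ann'), ('Bob'), ('Carl');
INSERT INTO TempFruits VALUES
    ('Apple'), ('Banana'), ('Cherry'), ('Date'), ('Fig'), ('Grapefruit');
""")

# Number each friend's fruits in random order and keep the first three.
conn.execute("""
INSERT INTO FriendsAndFruits (FirstName, FruitName)
SELECT FirstName, FruitName
FROM (
    SELECT FirstName, FruitName,
           ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY RANDOM()) AS rn
    FROM TempFriends CROSS JOIN TempFruits
) AS subquery
WHERE rn <= 3
""")

# Collect the fruits assigned to each friend.
picks = {}
for friend, fruit in conn.execute(
        "SELECT FirstName, FruitName FROM FriendsAndFruits"):
    picks.setdefault(friend, set()).add(fruit)
print(picks)
```

Because the numbering restarts for each FirstName partition, every friend gets exactly three distinct fruits, and the random order is drawn per row rather than once for the whole statement.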
INSERT INTO FriendsAndFruits( FirstName, FruitName )
SELECT FirstName, FruitName
FROM TempFriends
JOIN LATERAL (
SELECT FruitName
FROM TempFruits
ORDER BY TempFriends.FirstName, RAND() LIMIT 3
) RandomFruit;
INSERT INTO FriendsAndFruits( FirstName, FruitName )
SELECT FirstName, FruitName
FROM TempFriends
JOIN LATERAL (
SELECT TempFriends.FirstName tmp, FruitName
FROM TempFruits
ORDER BY RAND() LIMIT 3
) RandomFruit;
I have this very simple table:
CREATE TABLE MyTable
(
Id INT(6) PRIMARY KEY,
Name VARCHAR(200) /* NOT UNIQUE */
);
If I want the Name(s) that is(are) the most frequent and the corresponding count(s), I can neither do this
SELECT Name, total
FROM table2
WHERE total = (SELECT MAX(total) FROM (SELECT Name, COUNT(*) AS total
FROM MyTable GROUP BY Name) table2);
nor this
SELECT Name, total
FROM (SELECT Name, COUNT(*) AS total FROM MyTable GROUP BY Name) table1
WHERE total = (SELECT MAX(total) FROM table1);
Also, (let's say the maximum count is 4) in the second proposition, if I replace the third line by
WHERE total = 4;
it works.
Why is that so?
Thanks a lot
You can try the following:
WITH stats as
(
SELECT Name
,COUNT(id) as count_ids
FROM MyTable
GROUP BY Name
)
SELECT Name
,count_ids
FROM
(
SELECT Name
,count_ids
,RANK() OVER(ORDER BY count_ids DESC) as rank_ -- this ranks all names
FROM stats
) s
WHERE rank_ = 1 -- the most popular
This should work in TSQL.
Your queries can't be executed because "total" is not a column in your table. It's not sufficient to define it within a subquery; you also have to make sure the subquery is executed and produces the desired result before you can use it.
You should also consider using a window function, as proposed in Dimi's answer.
The advantage of such a function is that it can be much easier to read.
But you need to be careful, since such functions often differ depending on the DB type.
If you want to go your way with a sub query, you can do something like this:
SELECT name, COUNT(name) AS total FROM myTable
GROUP BY name
HAVING COUNT(name) =
(SELECT MAX(sub.total) AS highestCount FROM
(SELECT Name, COUNT(*) AS total
FROM MyTable GROUP BY Name) sub);
I created a fiddle example which shows that both queries mentioned here produce the same, correct result:
db<>fiddle
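On the "why" part of the question: an alias such as table1 names a derived table only within the FROM clause of the query block that defines it, so a separate subquery cannot select from it, whereas comparing against the literal 4 needs no second reference. A CTE name, by contrast, can be referenced more than once. Here is a minimal sketch of that variant, run through Python's built-in sqlite3 module rather than MySQL (an assumption made to keep it self-contained), with made-up sample rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8 (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MyTable (Id INT PRIMARY KEY, Name VARCHAR(200));
-- Hypothetical data: 'a' and 'b' are tied for most frequent.
INSERT INTO MyTable VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'c');
""")

query = """
WITH table1 AS (
    SELECT Name, COUNT(*) AS total FROM MyTable GROUP BY Name
)
SELECT Name, total
FROM table1
WHERE total = (SELECT MAX(total) FROM table1)  -- second reference to the CTE
ORDER BY Name;
"""
most_frequent = conn.execute(query).fetchall()
print(most_frequent)
```

This keeps the shape of the second attempt from the question and returns every name tied for the highest count.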
Using MySQL v8.0 right now.
The question is:
Write an SQL query to report the id and the salary of the second highest salary from the
Employee table. If there is no second highest salary, the query should
report null.
My dummy data is:
Create table If Not Exists Employee (id int, salary int);
insert into Employee (id, salary) values
(1, 100);
My ideal output is like this:
+------+--------+
| id | salary |
+------+--------+
| NULL | NULL |
+------+--------+
I used DENSE_RANK as a more straightforward way for me to solve this question:
WITH sub AS (SELECT id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS num
FROM Employee )
SELECT id, salary
FROM sub
WHERE num = 2
But I have a problem returning NULL when there's no second highest salary. I tried IFNULL, but it didn't work. I guess that's because the output is not actually NULL but an empty result set.
Thank you in advance.
WITH sub AS (
SELECT id,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS num
FROM Employee
)
SELECT id, salary
FROM sub
WHERE num = 2
UNION ALL
SELECT NULL, NULL
WHERE 0 = ( SELECT COUNT(*)
FROM sub
WHERE num = 2 );
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=31f5afb0e7e5dce9c2c128ccc49a6f42
Just making your query a subquery and left joining from a single-row producing subquery seems to me the simplest approach:
select id, salary
from (select null) at_least_one_row
left join (
select id, salary
from (
select id, salary, dense_rank() over (order by salary desc) as num
from Employee
) ranked_employees
where num = 2
) second_highest_salary on true
(I usually prefer a subquery to a cte that's only used once; I find that obfuscatory.)
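The left-join variant can be tried on both cases, with and without a second-highest salary, using the sketch below. It runs on Python's built-in sqlite3 module instead of MySQL 8 (an assumption for self-containment), writes the join condition as `1 = 1` instead of `true`, and the helper name `second_highest` is made up for this example.

```python
import sqlite3

QUERY = """
SELECT id, salary
FROM (SELECT NULL) AS at_least_one_row
LEFT JOIN (
    SELECT id, salary
    FROM (SELECT id, salary,
                 DENSE_RANK() OVER (ORDER BY salary DESC) AS num
          FROM Employee) AS ranked_employees
    WHERE num = 2
) AS second_highest_salary ON 1 = 1;
"""

def second_highest(employees):
    """Hypothetical helper: load (id, salary) pairs and run the query."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL (assumption)
    conn.execute("CREATE TABLE Employee (id int, salary int)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", employees)
    return conn.execute(QUERY).fetchall()

# Only one salary: the single-row side of the join is padded with NULLs.
print(second_highest([(1, 100)]))
# Two distinct salaries: the row ranked second is returned.
print(second_highest([(1, 100), (2, 200)]))
```

The one-row subquery guarantees at least one output row, so an empty right-hand side shows up as a row of NULLs instead of an empty result.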
fbn table:
create table fbn (
id int,
name varchar(40),
birthday date,
gender varchar(10));
INSERT into fbn values
(1, "James", "2000-04-22", "male"),
(2, "Julia", "2006-02-27", "female"),
(3, "Ethan", "2013-05-23", "male"),
(4, "Lion", "2014-09-11", "male"),
(5, "Ethan", "2006-01-01", "male"),
(6, "Lion", "2006-02-01", "male");
Which name occurs most often? Note that there are two names (Ethan and Lion) that occur most often. I have two solutions, shown below:
select tmp2.name
from (select tmp.name,
dense_rank() over(order by tmp.name_count desc) rank
from (select name,
count(*) name_count
from fbn
group by name) tmp
) tmp2
where tmp2.rank = 1;
and the 2nd solution:
select name from fbn
group by name
having count(name) = (select count(name) from fbn
group by name
order by count(name) desc
limit 1);
Both seem to work, but both look too complicated. Is there another solution that's more concise and easier to understand? Thanks
How about this:
SELECT counts,GROUP_CONCAT(NAME) AS names
FROM
(SELECT NAME, COUNT(*) counts FROM fbn GROUP BY NAME) A
GROUP BY counts
ORDER BY counts DESC
LIMIT 1;
Fiddle here : https://www.db-fiddle.com/f/qzmJAx5LFfzZdYyDetUfex/1
I think you can simply:
select count(id) as num_names, name from fbn group by name order by num_names desc limit 1;
For MS SQL Server:
select Top 1 count(id) as num_names, name from fbn group by name order by num_names desc
Disable ONLY_FULL_GROUP_BY if it's enabled, using:
SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));
Here the inner query returns each name with its occurrence count, and the outer query uses that derived table to get the maximum count. It returns a single row with a name and the maximum occurrence count. Note that with ONLY_FULL_GROUP_BY disabled, the name in that row is not guaranteed to be the one that has the maximum count.
SELECT name, MAX(occurrence)
FROM (SELECT name, COUNT(id) occurrence
FROM fbn
GROUP BY name) t;
having a list of people like:
name date_of_birth
john 1987-09-08
maria 1987-09-08
samuel 1987-09-09
claire 1987-09-10
jane 1987-09-10
rose 1987-09-12
...
How can I use SQL to get a result showing how many people were born up to and including each date? For the table above, the output should be:
date count
1987-09-08 2
1987-09-09 3
1987-09-10 5
1987-09-11 5
1987-09-12 6
...
Thanks!
Here is another way, in addition to Gordon's answer. It uses joins:
SELECT
t1.date_of_birth,
COUNT(*) AS count
FROM (SELECT DISTINCT date_of_birth FROM yourTable) t1
INNER JOIN yourTable t2
ON t1.date_of_birth >= t2.date_of_birth
GROUP BY
t1.date_of_birth;
Note: I left out a step. Apparently you also want to report missing dates. If so, then you may replace what I aliased as t1 with a calendar table. For the sake of demonstration, you can inline all the dates:
SELECT
t1.date_of_birth,
COUNT(*) AS count
FROM
(
SELECT '1987-09-08' AS date_of_birth UNION ALL
SELECT '1987-09-09' UNION ALL
SELECT '1987-09-10' UNION ALL
SELECT '1987-09-11' UNION ALL
SELECT '1987-09-12'
) t1
LEFT JOIN yourTable t2
ON t1.date_of_birth >= t2.date_of_birth
GROUP BY
t1.date_of_birth;
Demo
In practice, your calendar table would be a bona fide table which just contains all the dates you want to appear in your result set.
One method is a correlated subquery:
select dob.date_of_birth,
(select count(*) from t where t.date_of_birth <= dob.date_of_birth) as running_count
from (select distinct date_of_birth from t) dob;
This is not particularly efficient. If your data has any size, variables are better (or window functions if you are using MySQL 8.0):
select date_of_birth,
(@x := @x + cnt) as running_count
from (select date_of_birth, count(*) as cnt
from t
group by date_of_birth
order by date_of_birth
) dob cross join
(select @x := 0) params;
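The window-function alternative mentioned for MySQL 8.0 could look like the following sketch: the same per-date counts, accumulated with SUM() OVER (ORDER BY ...). It is run here through Python's built-in sqlite3 module rather than MySQL (an assumption made to keep it self-contained), using the sample people from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0 (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (name VARCHAR(40), date_of_birth DATE);
INSERT INTO t VALUES
    ('john', '1987-09-08'), ('maria', '1987-09-08'), ('samuel', '1987-09-09'),
    ('claire', '1987-09-10'), ('jane', '1987-09-10'), ('rose', '1987-09-12');
""")

query = """
SELECT date_of_birth,
       SUM(cnt) OVER (ORDER BY date_of_birth) AS running_count
FROM (SELECT date_of_birth, COUNT(*) AS cnt
      FROM t
      GROUP BY date_of_birth) AS dob
ORDER BY date_of_birth;
"""
# Running total of births up to and including each date present in t.
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

Like the variable-based query, this only lists dates that occur in the data, so 1987-09-11 is absent; filling such gaps would still need a calendar table.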
Use a correlated subquery:
select date_of_birth, (select count(*)
from table
where date_of_birth <= t.date_of_birth
) as count
from table t
group by date_of_birth;