I am looking for an SQL query to give me a list of duplicate entries in a table. However, there are three columns to take into account: the first is an ID, the second is a Name, and the third is a Date. Multiple Names are assigned the same ID, and each of those is recorded multiple times a day, which makes THOUSANDS of different records per day.
I already filtered it so that only results for the past 7 days show, but the number of records is still too large for me to extract. I just want to decrease the number of rows in the output in order to properly extract the results.
Sample
|--id-|--name--|-------date------|
| 1 | a |5-9-2015, 10:00am|
| 1 | a |5-8-2015, 10:02am|
| 1 | a |5-8-2015, 11:00am|
| 1 | b |5-8-2015, 10:00am|
| 1 | b |5-8-2015, 10:02am|
| 1 | c |5-8-2015, 10:00am|
| 2 | d |5-8-2015, 10:00am|
expected output
|--id-|--name--|
| 1 | a |
| 1 | b |
| 1 | c |
| 2 | d |
Including entries without any duplicates is fine. The important thing is to return only a single record per unique id-name combination for a day.
Thanks in advance for any help that you can give.
You can get the combinations as:
select distinct id, name
from sample;
If you want only the duplicates, use group by and having:
select id, name
from sample
group by id, name
having count(*) > 1;
EDIT:
If you want this by date, then add date(date) to the group by and perhaps select clauses.
To return single id-name data per day you can use this:
select id, name
from tab
group by id, name, date(date)
The DATE() function extracts the date part of a date or date/time expression.
select id,name
from sample
group by id,name,DATE(date)
having count(*)>1;
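To make the per-day grouping concrete, here is a minimal sketch that runs the queries above against the sample rows using Python's built-in sqlite3 module. It assumes the dates are stored in ISO format (the sample's "5-8-2015, 10:00am" is read as month-day-year here, which is a guess), and it uses SQLite's date() in place of MySQL's DATE().

```python
import sqlite3

# In-memory database holding the sample rows from the question.
# Assumption: dates are stored as ISO strings so date() can parse them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER, name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        (1, "a", "2015-05-09 10:00:00"),
        (1, "a", "2015-05-08 10:02:00"),
        (1, "a", "2015-05-08 11:00:00"),
        (1, "b", "2015-05-08 10:00:00"),
        (1, "b", "2015-05-08 10:02:00"),
        (1, "c", "2015-05-08 10:00:00"),
        (2, "d", "2015-05-08 10:00:00"),
    ],
)

# One row per id-name combination per day, with the number of records behind it.
per_day = conn.execute(
    "SELECT id, name, date(date) AS day, COUNT(*) AS n "
    "FROM sample "
    "GROUP BY id, name, date(date) "
    "ORDER BY id, name, day"
).fetchall()

# Only the combinations that occur more than once on the same day.
dupes = conn.execute(
    "SELECT id, name, date(date) AS day "
    "FROM sample "
    "GROUP BY id, name, date(date) "
    "HAVING COUNT(*) > 1 "
    "ORDER BY id, name, day"
).fetchall()

for row in per_day:
    print(row)
```

Selecting date(date) alongside id and name shows which day each row belongs to; drop it from the select list if only the id-name pairs are needed.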
Related
Suppose we have a table like the one below.
Id | Name | Group
-----------------
1 | John | 1
2 | Zayn | 2
3 | Four | 2
4 | Ben_ | 3
5 | Joe_ | 2
6 | Anna | 1
The query below will select all of them.
SELECT `Name` FROM `Table` WHERE 1;
How would I select only one person from each group? Who it is doesn't really matter, as long as there's only one name from group 1 and one name from group 2 etc.
The GROUP BY clause isn't fit for this (according to my error console) because I am selecting non aggregated values, which makes sense.
The DISTINCT clause isn't great here either, since I don't want to select the "Group" and definitely not group by their names.
If the resulting name is not important, you can leverage an aggregate function such as max or min (note that Group is a reserved word, so it has to be quoted):
select max(name) from your_table
group by `Group`;
Otherwise you can use a subquery:
select name from your_table
where Id in (select min(Id) from your_table group by `Group`);
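Both approaches can be tried on the table from the question. The following is a rough sketch using Python's sqlite3 module rather than MySQL, so the reserved word Group is quoted with double quotes instead of backticks; your_table is the placeholder name from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Group" is a reserved word, so it must be quoted as an identifier.
conn.execute('CREATE TABLE your_table (Id INTEGER, Name TEXT, "Group" INTEGER)')
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "John", 1),
        (2, "Zayn", 2),
        (3, "Four", 2),
        (4, "Ben_", 3),
        (5, "Joe_", 2),
        (6, "Anna", 1),
    ],
)

# Approach 1: an aggregate picks one name per group (the alphabetically last one).
by_max = [
    row[0]
    for row in conn.execute(
        'SELECT MAX(Name) FROM your_table GROUP BY "Group" ORDER BY "Group"'
    )
]

# Approach 2: a subquery picks the row with the lowest Id in each group.
by_min_id = [
    row[0]
    for row in conn.execute(
        "SELECT Name FROM your_table "
        'WHERE Id IN (SELECT MIN(Id) FROM your_table GROUP BY "Group") '
        "ORDER BY Id"
    )
]

print(by_max)
print(by_min_id)
```

The subquery form is the one to prefer if other columns of the chosen row are needed as well, since every selected value then comes from the same row.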
I have this table:
What's the correct query to get this result:
ID | Type | Total
1 | A | 300
1 | B | 100
2 | A | 30
2 | B | 40
Which means sum by type first and then group by user id?
Aggregate functions such as SUM are performed on your groups, so there is no "SUM by type first, then group by user_id"; you want to group by both user_id and type.
Like so: GROUP BY user_id, type
If you want to guarantee that ordering in the future, also add an ORDER BY user_id, type clause. Currently, GROUP BY also orders, but I believe that feature has been marked as deprecated recently.
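As an illustration of grouping by two columns, here is a small sketch with Python's sqlite3 module. The source table is not shown in the question, so the table name payments, the amount column, and the individual rows are all made up; they were chosen only so that the totals match the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the name, the amount column, and the rows are assumptions.
conn.execute("CREATE TABLE payments (user_id INTEGER, type TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        (1, "A", 100),
        (1, "A", 200),
        (1, "B", 100),
        (2, "A", 30),
        (2, "B", 15),
        (2, "B", 25),
    ],
)

# One group per (user_id, type) pair; SUM runs inside each group.
# The explicit ORDER BY guarantees the row order instead of relying on GROUP BY.
totals = conn.execute(
    "SELECT user_id, type, SUM(amount) AS total "
    "FROM payments "
    "GROUP BY user_id, type "
    "ORDER BY user_id, type"
).fetchall()

for row in totals:
    print(row)
```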
I recently got this question in an interview and failed to answer it. The question was to list the number of duplicates that appear in a column, employer, in a table like this one:
id | employer | employee
1 | isro | dude1
2 | isro | dude 2
3 | cnd | dude 3
4 | df | dsfdef
5 | dfdf | dfdfd
...
so the result should be like
isro = 2
df = 4
dfsf = 6
How do I achieve this?
I know there is count(*), which I could use in a select statement with a where clause, but how do I achieve the above result?
The HAVING clause can be used to filter on aggregated values:
SELECT employer, COUNT(*)
FROM yourTable
GROUP BY employer
HAVING COUNT(*) > 1
Assuming TableName is the name of the table you want to select from, this would be your answer:
SELECT employer, count(employer)
FROM TableName
GROUP BY employer
HAVING COUNT(*) > 1
Here is an answer to a very similar question that has some more info for you:
How to count occurrences of a column value efficiently in SQL?
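For a quick check of how HAVING filters the grouped counts, here is a minimal sketch using Python's sqlite3 module. It loads only the five rows visible in the question (the rows hidden behind "..." are not guessed), so only isro shows up as a duplicate; yourTable is the placeholder name from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (id INTEGER, employer TEXT, employee TEXT)")
# Only the rows shown in the question; the elided rows are left out.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [
        (1, "isro", "dude1"),
        (2, "isro", "dude 2"),
        (3, "cnd", "dude 3"),
        (4, "df", "dsfdef"),
        (5, "dfdf", "dfdfd"),
    ],
)

# Count of rows per employer, before any filtering.
all_counts = conn.execute(
    "SELECT employer, COUNT(*) FROM yourTable "
    "GROUP BY employer ORDER BY employer"
).fetchall()

# HAVING is applied after grouping, so it can test the aggregated count.
duplicates = conn.execute(
    "SELECT employer, COUNT(*) FROM yourTable "
    "GROUP BY employer HAVING COUNT(*) > 1 ORDER BY employer"
).fetchall()

print(all_counts)
print(duplicates)
```

A WHERE clause cannot do this job because it is evaluated per row, before the groups and their counts exist.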
I have a table that lists system licences, with multiple licences for each system (the expired ones and the existing ones). I've only posted two columns in this question as they're the only important ones.
| id | systemid |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 3 |
I need to get the rows with the id of 2, 4 and 6.
I need to collect 1 record for each systemid and it has to be the latest (youngest) record, so in this case, the record with the highest id. I've been exploring GROUP BY, ORDER BY and LIMIT but I'm not producing the result I'm after. How do you collect one record for each individual value in one column and make sure it's the record with the highest id?
I KNOW this is wrong, but it's what I'm currently staring at:
SELECT * FROM licences GROUP BY systemid ORDER BY id DESC LIMIT 1
SELECT max(id), systemid FROM licences GROUP BY systemid
Note that with a GROUP BY, all columns you select must either be in the GROUP BY clause or wrapped in an aggregate function, like max, min, sum, or avg.
This will grab the highest id per systemid.
SELECT MAX(id), systemid
FROM ...
GROUP BY systemid
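If the other columns of the newest licence are needed too, the MAX(id) query can be joined back to the table. The following is a sketch with Python's sqlite3 module; the status column and its values are hypothetical, added only to show that the whole row comes back, and the join-back itself is one common pattern rather than something the answers above spell out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id and systemid come from the question; status is a made-up extra column.
conn.execute("CREATE TABLE licences (id INTEGER, systemid INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO licences VALUES (?, ?, ?)",
    [
        (1, 1, "expired"),
        (2, 1, "active"),
        (3, 2, "expired"),
        (4, 2, "active"),
        (5, 3, "expired"),
        (6, 3, "active"),
    ],
)

# Highest id per systemid, as in the answers above.
latest_ids = conn.execute(
    "SELECT MAX(id), systemid FROM licences "
    "GROUP BY systemid ORDER BY systemid"
).fetchall()

# Join the per-system maximum back to the table to fetch the complete rows.
latest_rows = conn.execute(
    "SELECT l.id, l.systemid, l.status "
    "FROM licences AS l "
    "JOIN (SELECT MAX(id) AS max_id FROM licences GROUP BY systemid) AS m "
    "ON l.id = m.max_id "
    "ORDER BY l.id"
).fetchall()

print(latest_ids)
print(latest_rows)
```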
We have a table called entries which stores user information against a date. Users are allowed to enter the database once per day. Some example data:
+----+------------------+----------+
| id | email | date |
+----+------------------+----------+
| 1 | marty#domain.com | 04.09.13 |
| 2 | john#domain.com | 04.09.13 |
| 3 | marty#domain.com | 05.09.13 |
+----+------------------+----------+
I need to work out how many times there are X entries with the same email in the database. For example, the result should look like this for the above data, where we have 1 instance of one entry and 1 instance of 2 entries:
+---------------+-------+
| times_entered | total |
+---------------+-------+
| 1 | 1 |
| 2 | 1 |
+---------------+-------+
I've tried a few things, but the furthest I have been able to get is a count of the number of times each email address was found. It seems like all I need to do from here is collate those results and perform another COUNT on those groups to get the totals, but I'm unsure of how I can do that.
Usually it can be something like
select times_entered, count(*) as total from
( select count(*) as times_entered from entries group by email ) t  -- MySQL requires an alias for the derived table
group by times_entered
Not sure if it works for MySQL though...
The following will get the number of records per email:
SELECT COUNT(1) AS times_entered, email
FROM entries
GROUP BY email
Therefore, using this query as a derived table, we can group by the number of records to get the count (we do not need to select the email column in the subquery because we don't care about it):
SELECT times_entered, COUNT(1) AS total
FROM
(
SELECT COUNT(1) AS times_entered
FROM entries
GROUP BY email
) x
GROUP BY times_entered
It could be something like this:
SELECT times_entered, COUNT( times_entered ) AS total
FROM (
SELECT COUNT( email ) AS times_entered
FROM `entries`
WHERE 1
GROUP BY email
) AS tmp
GROUP BY times_entered;
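The two-level count can be verified against the example data. Here is a minimal sketch using Python's sqlite3 module instead of MySQL; the email addresses are copied as they appear in the question, and the derived-table alias x follows the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER, email TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (1, "marty#domain.com", "04.09.13"),
        (2, "john#domain.com", "04.09.13"),
        (3, "marty#domain.com", "05.09.13"),
    ],
)

# Inner query: how many entries each email has.
# Outer query: how many emails share each entry count.
totals = conn.execute(
    "SELECT times_entered, COUNT(*) AS total "
    "FROM ("
    "  SELECT COUNT(*) AS times_entered FROM entries GROUP BY email"
    ") AS x "
    "GROUP BY times_entered "
    "ORDER BY times_entered"
).fetchall()

for row in totals:
    print(row)
```

One email (john) entered once and one email (marty) entered twice, which matches the result table in the question.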