print all duplicate rows - mysql

Now, I have the following query, which gives me the COUNT() of rows which have the same name. However, let's say that I just wanted to have all the rows printed out.
SELECT l.id, c.first_name, c.last_name,
l.source AS 'affiliateId', COUNT(*),
c.email, ls.create_date, ls.buyer
FROM lead_status AS ls
INNER JOIN leads AS l ON l.id = ls.lead_id
INNER JOIN contacts AS c ON c.lead_id = l.id
WHERE ls.discriminator = 'AUTO_POST' AND l.affiliate_id=1003
AND ls.winner =1 AND l.test =0
AND l.create_date BETWEEN '2011-10-03' AND '2011-10-19'
GROUP BY c.first_name, c.last_name HAVING COUNT(*)>1;
So I'm trying to go from:
joe smith 3
lisa martin 2
To the following:
joe smith
joe smith
joe smith
lisa martin
lisa martin
Help!

You can join with a numbers table:
SELECT T1.col1, T1.col2
FROM
(
-- your long query goes here, with COUNT(*) aliased as cnt
) T1
JOIN numbers
ON numbers.x <= T1.cnt
A numbers table is just a table that contains numbers:
+---+
| x |
+---+
| 1 |
| 2 |
| 3 |
etc... as many numbers as you will ever need
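As a rough illustration of the numbers-table join, here is a minimal sketch that runs the SQL through Python's built-in sqlite3 module. SQLite stands in for MySQL here, and the dup_counts table is a hypothetical stand-in for the output of the long grouped query (names plus a cnt column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the grouped query's result (name + count).
    CREATE TABLE dup_counts (first_name TEXT, last_name TEXT, cnt INTEGER);
    INSERT INTO dup_counts VALUES ('joe', 'smith', 3), ('lisa', 'martin', 2);

    -- The numbers table: one row per integer, as many as will ever be needed.
    CREATE TABLE numbers (x INTEGER PRIMARY KEY);
    INSERT INTO numbers (x) VALUES (1), (2), (3), (4), (5);
""")

# Each grouped row matches as many numbers rows as its count,
# so it is repeated cnt times in the output.
expanded = conn.execute("""
    SELECT t1.first_name, t1.last_name
    FROM dup_counts AS t1
    JOIN numbers ON numbers.x <= t1.cnt
    ORDER BY t1.first_name, numbers.x
""").fetchall()

for row in expanded:
    print(*row)
```

Note that this only repeats the grouped values. Columns that differ between the duplicates (such as email) are not recovered this way.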

You can use the expression COUNT(DISTINCT first_name) and then get rid of the GROUP BY,
so the query will be:
SELECT l.id, c.first_name, c.last_name, l.source AS 'affiliateId', COUNT(DISTINCT c.first_name, c.last_name) as CountRows,
c.email, ls.create_date, ls.buyer FROM lead_status AS ls
INNER JOIN leads AS l ON l.id = ls.lead_id
INNER JOIN contacts AS c ON c.lead_id = l.id
WHERE ls.discriminator = 'AUTO_POST' AND l.affiliate_id=1003
AND ls.winner =1 AND l.test =0 AND l.create_date BETWEEN '2011-10-03' AND '2011-10-19'
HAVING CountRows>1

I don't remember if MySQL supports subqueries, but I would do something like
select
first, last
from
table where (first, last) in (select first, last from table group by first, last having count(*) > 1)
order by
first, last

Add another join on the table(s) where the duplication appears. For the join condition, have the identifying info be the same (e.g. c.first_name = c2.first_name AND c.last_name = c2.last_name, or l.id = c2.id) and whatever distinguishes the records be different (e.g. l.create_date < l2.create_date). Lastly, group by the ID of the record that contains the duplicates or select distinct rows so you don't get repeats. Without knowing the table schema or where the duplicates might occur, I can't be any more specific.

Join the result set back to the contacts table on the first name and last name fields (assuming that first name + last name identifies a person).
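A minimal sketch of the join-back idea, run through Python's sqlite3 module (SQLite standing in for MySQL). The contacts table here is a simplified, hypothetical version with made-up rows, not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical contacts table with sample rows.
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        email TEXT
    );
    INSERT INTO contacts (first_name, last_name, email) VALUES
        ('joe', 'smith', 'joe1@example.com'),
        ('joe', 'smith', 'joe2@example.com'),
        ('joe', 'smith', 'joe3@example.com'),
        ('lisa', 'martin', 'lisa1@example.com'),
        ('lisa', 'martin', 'lisa2@example.com'),
        ('ann', 'lee', 'ann@example.com');
""")

# The derived table finds the duplicated names; joining it back to
# contacts returns every individual row that carries one of those names.
duplicate_rows = conn.execute("""
    SELECT c.id, c.first_name, c.last_name, c.email
    FROM contacts AS c
    JOIN (
        SELECT first_name, last_name
        FROM contacts
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    ) AS dup
      ON dup.first_name = c.first_name
     AND dup.last_name = c.last_name
    ORDER BY c.first_name, c.last_name, c.id
""").fetchall()

for row in duplicate_rows:
    print(row)
```

Unlike the grouped query, each returned row keeps its own id and email, and the non-duplicated contact is left out.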

Related

Show all of sum and count without using group by

I want to retrieve customer names, total orders (how many times they ordered products), and the total amount they have spent over their lifetime. The task is to run a single query WITHOUT the JOIN, GROUP BY, or HAVING operators, and to show only customers who have at least one order.
Here is my database
Customer
CustomerID | CustomerName
100000 | John
200000 | Jane
300000 | Sean
SalesOrder
SalesOrderID | CustomerID | SaleTotal
1001 | 100000 | 2000
1002 | 100000 | 3000
1003 | 200000 | 5000
When I query
SELECT CustomerName,count(*) AS Total_Orders,sum(SaleTotal) AS SaleTotal
FROM Customer C,SalesOrderHeader SH WHERE C.CustomerID=SH.CustomerID;
It shows only one row.
The answer that I want is
CustomerName | Total_Orders | SaleTotal
John | 2 | 5000
Jane | 1 | 5000
I am new to MySQL.
Does anyone here know how to do this?
If you are to do this without joins and group by, then the simplest approach is to use correlated subqueries:
select *
from (
select
c.customerName,
(
select count(*)
from salesOrder so
where so.customerID = c.customerID
) totalOrders,
(
select sum(so.saleTotal)
from salesOrder so
where so.customerID = c.customerID
) saleTotal
from customer c
) t
where totalOrders > 0
Note that this query is clearly suboptimal - because it scans the salesOrder table twice, while a single scan would suffice. A better way to write this would be:
select c.customerName, count(*) totalOrders, sum(so.saleTotal) saleTotal
from customer c
inner join salesOrder so on so.customerID = c.customerID
group by c.customerID, c.customerName
There is no need for a having clause here - the inner join filters out customers that have no order already.
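To check that the two formulations agree, here is a small sketch using the sample data from the question, run through Python's sqlite3 module. SQLite stands in for MySQL, and the table and column names (customer, salesOrder, saleTotal) follow this answer's spelling, which is an assumption about the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerID INTEGER PRIMARY KEY, customerName TEXT);
    CREATE TABLE salesOrder (
        salesOrderID INTEGER PRIMARY KEY,
        customerID INTEGER,
        saleTotal INTEGER
    );
    INSERT INTO customer VALUES (100000, 'John'), (200000, 'Jane'), (300000, 'Sean');
    INSERT INTO salesOrder VALUES
        (1001, 100000, 2000), (1002, 100000, 3000), (1003, 200000, 5000);
""")

# Version 1: no JOIN, GROUP BY, or HAVING; two correlated subqueries per customer.
correlated = conn.execute("""
    SELECT *
    FROM (
        SELECT c.customerName,
               (SELECT COUNT(*) FROM salesOrder so
                 WHERE so.customerID = c.customerID) AS totalOrders,
               (SELECT SUM(so.saleTotal) FROM salesOrder so
                 WHERE so.customerID = c.customerID) AS saleTotal
        FROM customer c
    ) t
    WHERE totalOrders > 0
    ORDER BY customerName
""").fetchall()

# Version 2: a single join with aggregation.
grouped = conn.execute("""
    SELECT c.customerName, COUNT(*) AS totalOrders, SUM(so.saleTotal) AS saleTotal
    FROM customer c
    INNER JOIN salesOrder so ON so.customerID = c.customerID
    GROUP BY c.customerID, c.customerName
    ORDER BY c.customerName
""").fetchall()

print(correlated)
print(grouped)
```

Both queries leave out Sean, who has no orders: the first through the totalOrders > 0 filter, the second through the inner join.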
Use aggregation . . . and proper join syntax:
SELECT CustomerName, COUNT(*) AS Total_Orders, SUM(SaleTotal) AS SaleTotal
FROM Customer C JOIN
SalesOrderHeader SH
ON C.CustomerID = SH.CustomerID
GROUP BY CustomerName;
Your query would fail in almost any database -- including newer versions of MySQL. You have mixed aggregated columns and unaggregated columns in the SELECT. The unaggregated ones should be in a GROUP BY.
Never use commas in the FROM clause. Always use proper, explicit, standard, readable JOIN syntax.
You have to use the query below. You cannot achieve it without a join and GROUP BY:
SELECT CustomerName,count(*) AS Total_Orders,sum(SaleTotal) AS SaleTotal
FROM Customer C,SalesOrderHeader SH WHERE C.CustomerID=SH.CustomerID
GROUP BY CustomerName;

Selecting a count of rows having a max value

Working example: http://sqlfiddle.com/#!9/80995/20
I have three tables, a user table, a user_group table, and a link table.
The link table contains the dates that users were added to user groups. I need a query that returns the count of users currently in each group. The most recent date determines the group that the user is currently in.
SELECT
user_groups.name,
COUNT(l.name) AS ct,
GROUP_CONCAT(l.`name` separator ", ") AS members
FROM user_groups
LEFT JOIN
(SELECT MAX(added), group_id, name FROM link LEFT JOIN users ON users.id = link.user_id GROUP BY user_id) l
ON l.group_id = user_groups.id
GROUP BY user_groups.id
My question is if the query I have written could be optimized, or written better.
Thanks!
Ben
Your actual query is not giving you the answer you want, at least as far as I understand your question. John actually joined group 2 on 2017-01-05, yet he appears in group 1 (which he joined on 2017-01-01) in your results. Note also that Group 4 is missing.
Using standard SQL, I think the next query is what you're looking for. The comments in the query should clarify what each part is doing:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT * FROM
(-- For each user, find most recent date s/he got into a group
SELECT
user_id AS the_user_id, MAX(added) AS last_added
FROM
link
GROUP BY
the_user_id
) AS u_a
-- Join back to the link table, so that the `group_id` can be retrieved
JOIN link l2 ON l2.user_id = u_a.the_user_id AND l2.added = u_a.last_added
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
This can be written in a more compact way in MySQL (abusing the fact that, in older versions of MySQL, it doesn't follow the SQL standard for the GROUP BY restrictions).
That's what you'll get:
group_name | member_count | members
:--------- | -----------: | :-------------
Group 1 | 2 | Mikie, Dominic
Group 2 | 2 | John, Paddy
Group 3 | 0 | null
Group 4 | 1 | Nellie
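The "latest date per user, then join back" step can be tried out with the sketch below, which runs through Python's sqlite3 module (SQLite standing in for MySQL). The rows are hypothetical: only John's two dates come from the discussion above, and the rest were made up so that the output matches the result table. SQLite's group_concat takes the separator as a second argument instead of the SEPARATOR keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link (user_id INTEGER, group_id INTEGER, added TEXT);

    INSERT INTO users VALUES
        (1, 'John'), (2, 'Paddy'), (3, 'Mikie'), (4, 'Dominic'), (5, 'Nellie');
    INSERT INTO user_groups VALUES
        (1, 'Group 1'), (2, 'Group 2'), (3, 'Group 3'), (4, 'Group 4');
    -- Hypothetical membership history; John and Dominic each changed group.
    INSERT INTO link VALUES
        (1, 1, '2017-01-01'), (1, 2, '2017-01-05'),
        (2, 2, '2017-01-02'),
        (3, 1, '2017-01-03'),
        (4, 3, '2017-01-01'), (4, 1, '2017-01-04'),
        (5, 4, '2017-01-02');
""")

rows = conn.execute("""
    SELECT g.name,
           COUNT(u.name) AS member_count,
           group_concat(u.name, ', ') AS members
    FROM user_groups g
    LEFT JOIN (
        -- Latest date per user, joined back to link to recover the group_id.
        SELECT l2.user_id, l2.group_id
        FROM (
            SELECT user_id, MAX(added) AS last_added
            FROM link
            GROUP BY user_id
        ) AS u_a
        JOIN link l2
          ON l2.user_id = u_a.user_id AND l2.added = u_a.last_added
    ) AS most_recent ON most_recent.group_id = g.id
    LEFT JOIN users u ON u.id = most_recent.user_id
    GROUP BY g.id, g.name
    ORDER BY g.name
""").fetchall()

# The order inside group_concat is not guaranteed, so compare members as sets.
current = {
    name: (count, set(members.split(", ")) if members else set())
    for name, count, members in rows
}
print(current)
```

If a user had two link rows with the same latest date, the join back would return both, so the counts assume one row per user per date.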
Note that this query can be simplified if you use a database with window functions (such as MariaDB 10.2). Then, you can use:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT DISTINCT
user_id AS the_user_id,
first_value(group_id) OVER (PARTITION BY user_id ORDER BY added DESC) AS group_id
FROM
link
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;

SQL subquery to return MIN of a column and corresponding values from another column

I'm trying to query, for each student who is not currently expelled:
1) the number of courses passed,
2) the earliest course passed,
3) the time taken to pass the first course.
The tricky part here is 2). I constructed a sub-query by mapping the course table onto itself but restricting matches only to datepassed=min(datepassed). The query appears to work for a very small sample, but when I try to apply it to my full data set (which would return ~1 million records) the query takes impossibly long to execute (I left it for >2 hours and it still wouldn't complete).
Is there a more efficient way to do this? Appreciate all your help!
Query:
SELECT
S.id,
COUNT(C.course) as course_count,
C2.course as first_course,
DATEDIFF(MIN(C.datepassed),S.dateenrolled) as days_to_first
FROM student S
LEFT JOIN course C
ON C.studentid = S.id
LEFT JOIN (SELECT * FROM course GROUP BY studentid HAVING datepassed IN (MIN(datepassed))) C2
ON C2.studentid = C.studentid
WHERE YEAR(S.dateenrolled)=2013
AND S.id NOT IN (SELECT id FROM expelled)
GROUP BY S.id
ORDER BY S.id
Student table
id status dateenrolled
1 graduated 1/1/2013
3 graduated 1/1/2013
Expelled table
id dateexpelled
2 5/1/2013
Course table
studentid course datepassed
1 courseA 5/1/2014
1 courseB 1/1/2014
1 courseC 2/1/2014
1 courseD 3/1/2014
3 courseA 1/1/2014
3 courseB 2/1/2014
3 courseC 3/1/2014
3 courseD 4/1/2014
3 courseE 5/1/2014
SELECT id, course_count, days_to_first, C2.course first_course
FROM (
SELECT S.id, COUNT(C.course) course_count,
DATEDIFF(MIN(datepassed),S.dateenrolled) as days_to_first,
MIN(datepassed) min_datepassed
FROM student S
LEFT JOIN course C ON C.studentid = S.id
WHERE S.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
AND S.id NOT IN (SELECT id FROM expelled)
GROUP BY S.id
) t1 LEFT JOIN course C2
ON C2.studentid = t1.id
AND C2.datepassed = t1.min_datepassed
ORDER BY id
I would try something like:
SELECT s.id, f.course,
COALESCE( DATEDIFF( c.first_pass,s.dateenrolled), 0 ) AS days_to_pass,
COALESCE( c.num_courses, 0 ) AS courses
FROM student s
LEFT JOIN
( SELECT studentid, MIN(datepassed) AS first_pass, COUNT(*) AS num_courses
FROM course
GROUP BY studentid ) c
ON s.id = c.studentid
LEFT JOIN course f
ON c.studentid = f.studentid AND c.first_pass = f.datepassed
LEFT JOIN expelled e
ON s.id = e.id
WHERE s.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
AND e.id IS NULL
This query assumes a student can pass only one course on a given day; otherwise you can get more than one row for a student, as it's possible to have many first courses.
For performance it would help to have an index on dateenrolled in the student table and a composite index on (studentid, datepassed) in the course table.
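The sketch below tries the "aggregate once, then join back for the course name" approach on the sample data from the question, with the two suggested indexes in place. It runs through Python's sqlite3 module, so SQLite stands in for MySQL: DATEDIFF is replaced by a julianday difference, the index names are made up, and the dates are assumed to be month/day/year and are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, status TEXT, dateenrolled TEXT);
    CREATE TABLE expelled (id INTEGER PRIMARY KEY, dateexpelled TEXT);
    CREATE TABLE course (studentid INTEGER, course TEXT, datepassed TEXT);

    -- Indexes suggested above (names are hypothetical).
    CREATE INDEX idx_student_dateenrolled ON student (dateenrolled);
    CREATE INDEX idx_course_student_date ON course (studentid, datepassed);

    INSERT INTO student VALUES
        (1, 'graduated', '2013-01-01'), (3, 'graduated', '2013-01-01');
    INSERT INTO expelled VALUES (2, '2013-05-01');
    INSERT INTO course VALUES
        (1, 'courseA', '2014-05-01'), (1, 'courseB', '2014-01-01'),
        (1, 'courseC', '2014-02-01'), (1, 'courseD', '2014-03-01'),
        (3, 'courseA', '2014-01-01'), (3, 'courseB', '2014-02-01'),
        (3, 'courseC', '2014-03-01'), (3, 'courseD', '2014-04-01'),
        (3, 'courseE', '2014-05-01');
""")

results = conn.execute("""
    SELECT s.id,
           c.num_courses,
           f.course AS first_course,
           CAST(julianday(c.first_pass) - julianday(s.dateenrolled) AS INTEGER)
               AS days_to_first
    FROM student s
    JOIN (
        -- One pass over course: count and earliest pass date per student.
        SELECT studentid, MIN(datepassed) AS first_pass, COUNT(*) AS num_courses
        FROM course
        GROUP BY studentid
    ) c ON c.studentid = s.id
    -- Join back to course to pick up the name of the earliest course.
    JOIN course f
      ON f.studentid = c.studentid AND f.datepassed = c.first_pass
    LEFT JOIN expelled e ON e.id = s.id
    WHERE s.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
      AND e.id IS NULL
    ORDER BY s.id
""").fetchall()

for row in results:
    print(row)
```

This version uses inner joins, so students with no passed courses are left out. Whether they should appear is a choice the question does not settle.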

Mysql (conditional?) query from two tables

Not sure if I have phrased the title properly, but here it goes. I have these two tables:
table:staff
id Name groupId Status
1 John Smith 1 1
2 John Doe 1 1
3 Jane Smith 2 1
4 Jerry Smith 1 1
table:jobqueue
id job_id staff_id jobStatus
1 1 1 1
2 2 1 1
3 5 2 1
4 7 3 0
Now, what I need to do is find the staff member with the fewest jobs assigned, which I am able to do by querying the jobqueue table.
SELECT min(cstaff), tmp.staff_id
FROM (
SELECT t.staff_id, count(staff_id) cstaff
FROM jobqueue t
JOIN staff s ON t.staff_id=s.id
JOIN group g ON s.groupId=g.id
WHERE g.id=26
GROUP BY t.id
) tmp
This works fine, but the problem is that if a staff member is not assigned to any job at all, this query won't find them, because it only queries the jobqueue table, where that particular staff member won't have any entry. I need to modify the query to include the staff table, so that if a staff member is not assigned any job in the jobqueue, I get their details from the staff table. Basically, I need to find the staff in a group who are not assigned any job and, if all staff are assigned jobs, find the staff member with the fewest jobs assigned. I could use some help with this. Also, I am tagging this as Yii because I would like to know if this is doable with Yii active records, but I am okay with a plain SQL query that will work with Yii SQL commands.
I am not sure this is the optimal query, but it works:
select d.groupId, d.name, (select count(*) from jobqueue as e where e.staff_id=d.id) as jobassigned
from staff as d
where d.id in (
select
(
select a.id
from staff as a
left outer join
jobqueue as b
on (a.id = b.staff_id)
where a.groupId = c.groupId
group by a.id
order by count(distinct job_id) asc
limit 1
) as notassigneduserid
from (
select distinct groupId from staff
) as c)
Some comments may help:
The c query gets all distinct groupId values; if you have a separate table for groups, you can use that instead.
The notassigneduserid subquery selects, for each groupId, the user with the minimal job count.
The d query fetches the actual user names and groupId for all the users found and presents them.
Here are the results for the data from the question:
Group Staff Jobs assigned
1 Jerry Smith 0
2 Jane Smith 1
with
counts as (
select s.groupId
, s.id
, (select count(*) from jobqueue where staff_id = s.id) count
from staff s
group by s.id, s.groupId),
groups as (
select groupId, min(count) mincount
from counts
group by groupId)
select c.groupId, c.id, c.count
from counts c
join groups g on c.groupId = g.groupId
where c.count = g.mincount
This SQL will give you all the staff with the minimum number of jobs in each group. It might be that more than one staff has the same minimum number of jobs. The approach is to use common table expressions to build first a list of counts, and then to retrieve the minimum count for each group. Finally I join the counts and groups tables and retrieve the staff that have the minimum count for each group.
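I tested this on SQL Server, but the syntax should work in MySQL as well, provided the version supports common table expressions (MySQL 8.0 or later; note that groups is a reserved word there, so that name would need backticks or a different name). To your data I added: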
id Name groupId Status
5 Bubba Jones 2 1
6 Bubba Smith 1 1
and
id job_id staff_id jobStatus
5 4 5 1
Results are
group name count
1 Bubba Smith 0
1 Jerry Smith 0
2 Bubba Jones 1
2 Jane Smith 1
BTW, I would not try to do this with active record, it is far too complex.
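For anyone who wants to try the counts-then-minimum approach, here is a minimal sketch using only the data from the question (without the extra rows added above), run through Python's sqlite3 module with SQLite standing in for MySQL. The CTE and column names (job_counts, group_minimums, job_count) are hypothetical and were chosen to avoid reserved words.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, Name TEXT, groupId INTEGER, Status INTEGER);
    CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, job_id INTEGER, staff_id INTEGER, jobStatus INTEGER);

    INSERT INTO staff VALUES
        (1, 'John Smith', 1, 1), (2, 'John Doe', 1, 1),
        (3, 'Jane Smith', 2, 1), (4, 'Jerry Smith', 1, 1);
    INSERT INTO jobqueue VALUES
        (1, 1, 1, 1), (2, 2, 1, 1), (3, 5, 2, 1), (4, 7, 3, 0);
""")

least_loaded = conn.execute("""
    WITH job_counts AS (
        -- Every staff member gets a row, including those with zero jobs.
        SELECT s.groupId, s.id, s.Name,
               (SELECT COUNT(*) FROM jobqueue j WHERE j.staff_id = s.id) AS job_count
        FROM staff s
    ),
    group_minimums AS (
        SELECT groupId, MIN(job_count) AS min_count
        FROM job_counts
        GROUP BY groupId
    )
    SELECT c.groupId, c.Name, c.job_count
    FROM job_counts c
    JOIN group_minimums m ON m.groupId = c.groupId
    WHERE c.job_count = m.min_count
    ORDER BY c.groupId, c.Name
""").fetchall()

for row in least_loaded:
    print(row)
```

Because the counts start from the staff table, Jerry Smith shows up with zero jobs even though he has no jobqueue entry. Ties within a group would all be returned.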
As Ilya Bursov pointed out, this answer did not address exactly what was asked. So here is a more optimized solution:
SELECT *
FROM (
SELECT s.id as id_staff, s.Name, s.groupId, count(distinct t.id) as jobsXstaff
FROM staff s
LEFT JOIN jobqueue t ON s.id=t.staff_id
GROUP BY s.id, s.groupId
ORDER BY s.groupId, jobsXstaff
) tmp
GROUP BY groupId
Old answer below.
This works, but without the group table, which I did not create. You can simply join the group table as you did:
SELECT min(cstaff),tmp.id
FROM (
SELECT s.id, count( staff_id ) cstaff
FROM jobqueue t
RIGHT JOIN staff s ON t.staff_id = s.id
GROUP BY t.id
) tmp
As you can see, you need to get all values from the staff table (right join) and select the staff id from its own table (s.id instead of t.staff_id). You also have to select tmp.id instead of staff_id now.

Select statement that counts total number of distinct entries in one table, depending on data from another table

I have two tables: DATA and USERS
USERS
id sqft postal province city
==========================================================
1 1 Y7R BC Vancouver
2 2 Y7R BC Vancouver
3 1 L5B ON Toronto
and
DATA
id uid power
=======================
1 1 1000
2 2 1300
3 1 1500
uid in table DATA matches to id in table USERS
I want to be able to count the number of distinct uid values in DATA where the postal code is Y7R and sqft is 1
SELECT COUNT(d.id)
FROM `DATA` AS `d`
INNER JOIN `USERS` AS `u`
ON u.id=d.uid
WHERE u.postal='Y7R' AND u.sqft=1
GROUP BY u.id;
They should be distinct anyway if you have a proper schema; if so, just remove the GROUP BY clause.
SELECT COUNT(DISTINCT D.UID) FROM DATA D
LEFT JOIN USERS U ON D.UID=U.ID
WHERE U.POSTAL='Y7R' AND U.SQFT=1
In case you need distinct
You can use this solution:
SELECT COUNT(DISTINCT a.id)
FROM USERS a
JOIN DATA b ON a.id = b.uid
WHERE a.sqft = 1 AND
a.postal = 'Y7R'
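A quick check of what DISTINCT changes, using the sample rows from the question and Python's sqlite3 module (SQLite standing in for MySQL). The variable names are only illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USERS (id INTEGER PRIMARY KEY, sqft INTEGER, postal TEXT, province TEXT, city TEXT);
    CREATE TABLE DATA (id INTEGER PRIMARY KEY, uid INTEGER, power INTEGER);

    INSERT INTO USERS VALUES
        (1, 1, 'Y7R', 'BC', 'Vancouver'),
        (2, 2, 'Y7R', 'BC', 'Vancouver'),
        (3, 1, 'L5B', 'ON', 'Toronto');
    INSERT INTO DATA VALUES (1, 1, 1000), (2, 2, 1300), (3, 1, 1500);
""")

# Counts each matching user once, however many DATA rows they have.
distinct_users = conn.execute("""
    SELECT COUNT(DISTINCT a.id)
    FROM USERS a
    JOIN DATA b ON a.id = b.uid
    WHERE a.sqft = 1 AND a.postal = 'Y7R'
""").fetchone()[0]

# Without DISTINCT, every matching DATA row is counted.
matching_rows = conn.execute("""
    SELECT COUNT(*)
    FROM USERS a
    JOIN DATA b ON a.id = b.uid
    WHERE a.sqft = 1 AND a.postal = 'Y7R'
""").fetchone()[0]

print(distinct_users, matching_rows)
```

User 1 has two DATA rows, so the plain count is higher than the distinct count for this sample.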
Try this one:
SELECT COUNT(DISTINCT a.id)
FROM USERS a
INNER JOIN DATA b
ON a.id = b.uid
WHERE a.sqft = 1 AND
a.postal = 'Y7R'