Decrease find duplicate record query execution time - mysql

I have about 300,000 (3 lakh) records. I need to count duplicate records and return every duplicated row along with its count (e.g. if example@example.com appears 10 times, return all 10 rows, each with a duplicate count of 10).
I wrote the query below, but it takes 15 seconds. Any suggestions for reducing the execution time?
SELECT g.guest_name, g.email, b.totalCount AS duplicate_guest
FROM guest g
INNER JOIN (SELECT email, COUNT(Id) AS totalCount FROM guest GROUP BY email ) b ON g.email = b.email

Shrink the derived table before the join by adding a HAVING condition, as in the query below, so that only emails that occur more than once are joined back to guest.
Also make sure there is an index on the email column, since both the GROUP BY and the join use it.
SELECT g.guest_name, g.email, b.count as duplicate_guests
FROM guest g
INNER JOIN
(
SELECT email, COUNT(Id) AS count
FROM guest
GROUP BY email
HAVING count(*) > 1
) b ON g.email = b.email

Just add HAVING count(*) > 1 to the inner select
SELECT g.guest_name, g.email, b.totalCount AS duplicate_guest
FROM guest g
INNER JOIN
(
SELECT email, COUNT(Id) AS totalCount
FROM guest
GROUP BY email
HAVING count(*) > 1
) b ON g.email = b.email
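To see the effect of the HAVING filter on a small data set, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows and the index name idx_guest_email are made up for illustration; the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guest (Id INTEGER PRIMARY KEY, guest_name TEXT, email TEXT);
    -- Hypothetical index name; the same statement works in MySQL.
    CREATE INDEX idx_guest_email ON guest (email);
    INSERT INTO guest (guest_name, email) VALUES
        ('Ann',   'a@example.com'),
        ('Anna',  'a@example.com'),
        ('Bob',   'b@example.com'),
        ('Cy',    'c@example.com'),
        ('Cyd',   'c@example.com'),
        ('Cyrus', 'c@example.com');
""")

duplicates = conn.execute("""
    SELECT g.guest_name, g.email, b.totalCount AS duplicate_guest
    FROM guest g
    INNER JOIN (
        SELECT email, COUNT(*) AS totalCount
        FROM guest
        GROUP BY email
        HAVING COUNT(*) > 1      -- drop unique emails before the join
    ) b ON g.email = b.email
    ORDER BY g.email, g.Id
""").fetchall()

for row in duplicates:
    print(row)
```

Bob's row is absent from the result because his email occurs only once; every remaining row carries the number of times its email appears.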

I need to get specific ids from db if these are in current and last quarter using SQL

SELECT b.first_name, b.last_name, a.pod_name, a.category, c.user_id,
SUM(IF(QUARTER(CURDATE())-1 OR (QUARTER(CURDATE())-2) AND a.user_id, 1, 0)) AS flag FROM kudos a
INNER JOIN users b ON a.user_id = b.id INNER JOIN users_groups c ON a.user_id = c.user_id
INNER JOIN groups d ON c.group_id = d.id WHERE a.group_name = 'G2' AND d.id IN (7,8,9,11,12,13,14,15,16,17,21,22,23,24,25,26,27,28)
AND QUARTER(CURDATE())-1 = a.quarter ORDER BY a.final_score+0 DESC
I need the user_ids of the users that appear in both quarter 1 and quarter 2 in the table.
I tried the query above but did not get the expected results.
Can someone please guide me on this?
If you only need user_id, you can do this:
select user_id
from tablename
where quarter in (1,2)
group by user_id
having count(distinct quarter) = 2
Another way is to use a window function, assuming each user has at most one row per quarter:
select * from (
select * , count(*) over (partition by user_id) cn
from tablename
where quarter in (1,2)
) t where cn = 2
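The one-row-per-quarter assumption matters. The sketch below (Python's sqlite3 as a stand-in for MySQL, with a hypothetical two-column kudos table and made-up rows) compares COUNT(DISTINCT quarter) with a plain COUNT(*) when a user has repeated rows in a quarter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical version of the kudos table.
    CREATE TABLE kudos (user_id INTEGER, quarter INTEGER);
    INSERT INTO kudos VALUES
        (1, 1), (1, 2),          -- present in both quarters
        (2, 1), (2, 1),          -- two rows, but quarter 1 only
        (3, 2),                  -- quarter 2 only
        (4, 1), (4, 2), (4, 2);  -- both quarters, repeated row in quarter 2
""")

# Counts distinct quarters, so repeated rows do not matter.
both_distinct = [r[0] for r in conn.execute("""
    SELECT user_id FROM kudos
    WHERE quarter IN (1, 2)
    GROUP BY user_id
    HAVING COUNT(DISTINCT quarter) = 2
    ORDER BY user_id
""")]

# Counts rows, which is only correct with one row per user per quarter.
both_plain = [r[0] for r in conn.execute("""
    SELECT user_id FROM kudos
    WHERE quarter IN (1, 2)
    GROUP BY user_id
    HAVING COUNT(*) = 2
    ORDER BY user_id
""")]

print(both_distinct)  # users 1 and 4
print(both_plain)     # users 1 and 2: user 2 is a false match, user 4 is missed
```

If repeated rows are possible in your data, the COUNT(DISTINCT quarter) form is the safer of the two.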

How to get most recent balance from many users balances?

I have two tables, users and transactions. The transactions table references users through user_id. The tables look like this:
users
id  name  email       created
1   a     a@mail.com  12-03-01
2   b     b@mail.com  11-03-01
Transactions
id  user_id  balance
1   1        250
2   1        550
3   2        50
4   2        1000
I need the last inserted balance from the transactions table together with all of the users' information. I am new to SQL.
So I tried the code below:
select * from transactions
where id in (select max(id) from transactions group by user_id)
INNER JOIN users on transactions.user_id=users.id
It gives me a syntax error near INNER JOIN. Have I made a mistake in the inner join, or am I heading in the wrong direction?
If you only want the balance, then a correlated subquery might be faster:
select u.*,
(select t.balance
from transactions t
where t.user_id = u.id
order by t.id desc
limit 1
) as MostRecentBalance
from users u;
For maximum performance, you want an index on transactions(user_id, id desc, balance).
This is faster because it avoids aggregating over the entire transactions table, which matters even more if you are only selecting a subset of users.
EDIT:
I originally read this question as one row per user. However, if you only want one row returned -- for the last insert into transactions -- then a simpler method is:
select u.*, t.balance
from users u join
transactions t
on u.id = t.user_id
order by t.id desc
limit 1;
The JOIN belongs to the FROM clause, so it has to come before WHERE. The query should look more like this:
select *
from transactions ts
INNER JOIN users
ON (ts.user_id = users.id)
where ts.id in
(
select max(transactions.id)
from transactions
group by user_id
);
Edited to clarify which id is in use, as per Gordon's suggestion.
Two simple methods.
A subquery gets the latest transaction id; joining on it gives that transaction's details and then the user:
SELECT users.*
FROM users
INNER JOIN transactions
ON users.id = transactions.user_id
INNER JOIN
(
SELECT MAX(id) AS max_id
FROM transactions
) sub0
ON transactions.id = sub0.max_id
Or you could order by the id descending with a limit of 1:
SELECT users.*
FROM users
INNER JOIN transactions
ON users.id = transactions.user_id
ORDER BY transactions.id DESC
LIMIT 1
EDIT
To get the last transaction for every user, you could use the following:
SELECT *
FROM users
INNER JOIN transactions
ON users.id = transactions.user_id
INNER JOIN
(
SELECT user_id, MAX(id) AS max_id
FROM transactions
GROUP BY user_id
) sub0
ON transactions.id = sub0.max_id
AND transactions.user_id = sub0.user_id
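As a quick check of the per-user approach, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (a stand-in for MySQL; the created column is left out for brevity) and runs both the MAX(id) join and the correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, balance INTEGER);
    INSERT INTO users VALUES (1, 'a', 'a@mail.com'), (2, 'b', 'b@mail.com');
    INSERT INTO transactions VALUES (1, 1, 250), (2, 1, 550), (3, 2, 50), (4, 2, 1000);
""")

# Join each user to the transaction with the highest id for that user.
latest = conn.execute("""
    SELECT u.id, u.name, t.balance
    FROM users u
    INNER JOIN transactions t ON t.user_id = u.id
    INNER JOIN (
        SELECT user_id, MAX(id) AS max_id
        FROM transactions
        GROUP BY user_id
    ) sub0 ON t.id = sub0.max_id AND t.user_id = sub0.user_id
    ORDER BY u.id
""").fetchall()

# Same result via a correlated subquery, one lookup per user.
correlated = conn.execute("""
    SELECT u.id, u.name,
           (SELECT t.balance
            FROM transactions t
            WHERE t.user_id = u.id
            ORDER BY t.id DESC
            LIMIT 1) AS MostRecentBalance
    FROM users u
    ORDER BY u.id
""").fetchall()

print(latest)
print(correlated)
```

Both queries return one row per user: 550 for user a and 1000 for user b. They differ for users with no transactions: the inner join drops such users, while the correlated subquery keeps them with a NULL balance.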

Group and count rows and then use them in condition

I have a query like:
SELECT * FROM account AS a
LEFT JOIN (SELECT SUM(bill.amount) total, bill.accountId FROM bill GROUP BY bill.accountId) b ON a.id = b.accountId
WHERE a.partner_id = 1 OR a.partner_id = 2
How can I check how many groups in "bill" share the same a.partner_id?
For example: 3 groups have partner_id = 1 and 2 groups have partner_id = 2.
Later I want the left join to include only groups whose partner_id is shared by more than 2 groups.
If I understand correctly, you just want an aggregation on top of your query:
SELECT a.partner_id, count(*) as cnt, sum(total) as total
FROM account a LEFT JOIN
(SELECT SUM(b.amount) as total, b.accountId
FROM bill b
GROUP BY b.accountId
) b
ON a.id = b.accountId
GROUP BY a.partner_id;
You should be able to use the "HAVING" clause. Below is an example from the following link:
https://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
SELECT name, COUNT(name) AS c FROM orders
GROUP BY name
HAVING c = 1;
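Putting the two answers together, a sketch of the combined query might look like this. It uses Python's sqlite3 as a stand-in for MySQL, with made-up rows that mirror the example in the question (3 billed accounts for partner 1, 2 for partner 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, partner_id INTEGER);
    CREATE TABLE bill (id INTEGER PRIMARY KEY, accountId INTEGER, amount INTEGER);
    -- Hypothetical sample data.
    INSERT INTO account VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
    INSERT INTO bill (accountId, amount) VALUES
        (1, 10), (1, 5), (2, 20), (3, 30), (4, 40), (5, 50);
""")

partners = conn.execute("""
    SELECT a.partner_id,
           COUNT(b.accountId) AS cnt,   -- number of bill groups for the partner
           SUM(b.total)       AS total
    FROM account a
    LEFT JOIN (
        SELECT accountId, SUM(amount) AS total
        FROM bill
        GROUP BY accountId
    ) b ON a.id = b.accountId
    WHERE a.partner_id IN (1, 2)
    GROUP BY a.partner_id
    HAVING COUNT(b.accountId) > 2       -- keep partners with more than 2 groups
""").fetchall()

print(partners)
```

Only partner 1 survives the HAVING filter here, with 3 groups and a total of 65. COUNT(b.accountId) is used instead of COUNT(*) so that accounts without any bills are not counted as groups.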

Query on two tables with belongs_to/has_many relation

One table is Users with id and email columns.
Another table is Payments with id, created_at, user_id and foo columns.
User has many Payments.
I need a query that returns each user's email, his last payment date and this last payment's foo value. How do I do that? What I have now is:
SELECT users.email, MAX(payments.created_at), payments.foo
FROM users
JOIN payments ON payments.user_id = users.id
GROUP BY users.id
This is wrong, because foo value does not necessarily belong to user's most recent payment.
Try this:
select users.email, payments.foo, payments.created_at
from users
left join (
select a.* from payments a
inner join (
select user_id, max(created_at) as created_at
from payments
group by user_id
) b on a.user_id = b.user_id and a.created_at = b.created_at
) payments on users.id = payments.user_id
If a user has no payments yet, foo and created_at are NULL. To exclude users without payments, use INNER JOIN instead.
One approach would be to use a MySQL version of rank over partition and then select only those rows with rank = 1:
select tt.email,tt.created_at,tt.foo from (
select t.*,
case when @cur_id = t.id then @r:=@r+1 else @r:=1 end as rnk,
@cur_id := t.id
from (
SELECT users.id,users.email, payments.created_at, payments.foo
FROM users
JOIN payments ON payments.user_id = users.id
order by users.id asc,payments.created_at desc
) t
JOIN (select @cur_id:=-1,@r:=0) r
) tt
where tt.rnk = 1;
This would save hitting the payments table twice. Could be slower though. Depends on your data!
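On engines with window functions (MySQL 8.0 or later, SQLite 3.25 or later), the same rank-and-filter idea can be written with ROW_NUMBER() instead of user variables. A minimal sketch using Python's sqlite3 and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, created_at TEXT,
                           user_id INTEGER, foo TEXT);
    -- Hypothetical sample data.
    INSERT INTO users VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
    INSERT INTO payments VALUES
        (1, '2020-01-01', 1, 'old'),
        (2, '2020-03-01', 1, 'new'),
        (3, '2020-02-01', 2, 'only');
""")

last_payments = conn.execute("""
    SELECT email, created_at, foo
    FROM (
        SELECT u.email, p.created_at, p.foo,
               -- Number each user's payments, newest first.
               ROW_NUMBER() OVER (PARTITION BY u.id
                                  ORDER BY p.created_at DESC) AS rn
        FROM users u
        JOIN payments p ON p.user_id = u.id
    ) ranked
    WHERE rn = 1
    ORDER BY email
""").fetchall()

print(last_payments)
```

Each user comes back once, with foo taken from the same row as the latest created_at. If two payments of one user share the same created_at, which of them gets rn = 1 is not defined unless you add a tie-breaker such as p.id DESC to the ORDER BY.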

MySQL subquery limit 1

I have an sql query that returns a list of residential units, and a subquery that is supposed to get the last entered bill for that unit.
However, when I add LIMIT 1 to the subquery, no bill entries are returned. If I leave it out, I get duplicate unit rows, one for each bill of the unit.
select * from unit u
left join (select id as billId, unit_id, added_on, end_reading, bill_type from bills
order by id desc) b ON unit_id = u.id
where community_Id = 1
and unit_section = 7
and unit_floor in (1,2,3,4,5)
order by unit_floor, display_order asc;
Does anyone know how I can limit the subquery result to one bill per unit?
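When a join duplicates your result rows, add a GROUP BY clause; it plays the role that DISTINCT plays in a simple select. Note that with select * this only runs when ONLY_FULL_GROUP_BY is disabled, and MySQL does not guarantee which bill row is returned for each unit.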
select * from unit u
left join (select id as billId, unit_id, added_on, end_reading, bill_type from bills
order by id desc) b ON unit_id = u.id
where community_Id = 1
and unit_section = 7
and unit_floor in (1,2,3,4,5)
group by u.id
order by unit_floor, display_order asc;
You will need a subquery to get the last (highest) id for each unit_id from the bills table, since the question asks for the last entered bill. Then use that to join between the unit and bills tables, getting the other matching columns from bills for that highest id:
SELECT u.*, bills.*
FROM unit u
LEFT OUTER JOIN
(
SELECT unit_id, MAX(id) AS max_id
FROM bills
GROUP BY unit_id
) b ON b.unit_id = u.id
LEFT OUTER JOIN bills
ON b.unit_id = bills.unit_id
AND b.max_id = bills.id
WHERE u.community_Id = 1
AND u.unit_section = 7
AND u.unit_floor in (1,2,3,4,5)
ORDER BY u.unit_floor, u.display_order asc;
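Since the question asks for the last entered bill, the sketch below applies the same two-step join with MAX(id) to pick the highest bill id per unit. It uses Python's sqlite3 as a stand-in for MySQL, and the tables are cut down to a few hypothetical columns and rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unit (id INTEGER PRIMARY KEY, unit_floor INTEGER);
    CREATE TABLE bills (id INTEGER PRIMARY KEY, unit_id INTEGER, end_reading INTEGER);
    -- Hypothetical sample data: unit 3 has no bills.
    INSERT INTO unit VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO bills VALUES (1, 1, 100), (2, 1, 150), (3, 2, 80), (4, 1, 210);
""")

unit_bills = conn.execute("""
    SELECT u.id, bills.id AS billId, bills.end_reading
    FROM unit u
    LEFT OUTER JOIN (
        SELECT unit_id, MAX(id) AS max_id    -- newest bill per unit
        FROM bills
        GROUP BY unit_id
    ) b ON b.unit_id = u.id
    LEFT OUTER JOIN bills
        ON bills.unit_id = b.unit_id
       AND bills.id = b.max_id
    ORDER BY u.id
""").fetchall()

print(unit_bills)
```

Every unit appears exactly once. Unit 1 gets bill 4, unit 2 gets bill 3, and unit 3, which has no bills, is kept by the outer joins with NULL (None in Python) in the bill columns.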