Speed up MySQL join to check for duplicates

I'm using the following query to return all duplicate records with the same first and last name. The trick is that the contact_id has to be in descending order.
The query returns the contacts as expected, but it is very slow: it takes about 6-8 seconds when checking around 30,000 records.
I have contact_firstName, contact_lastName, contact_client_id, and contact_id all indexed in the database.
Any ideas what I could do to speed this up? Thanks for your help :)
SELECT z.contact_id, z.contact_firstName, z.contact_lastName, RIGHT(z.contact_lastName,1) AS nameNum
FROM (`contacts` x)
JOIN `contacts` z ON `x`.`contact_firstName` = `z`.`contact_firstName`
AND x.contact_lastName = z.contact_lastName
AND x.contact_client_id = ".$ID."
AND z.contact_client_id = ".$ID."
WHERE `x`.`contact_id` < `z`.`contact_id`
GROUP BY `z`.`contact_id`

Not making any promises, but here's an alternative to try:
SELECT c.contact_id, c.contact_firstName, c.contact_lastName, RIGHT(c.contact_lastName,1) AS nameNum
FROM (SELECT contact_firstName, contact_lastName, MIN(contact_id) AS MinID
FROM contacts
WHERE contact_client_id = ".$ID."
GROUP BY contact_firstName, contact_lastName
HAVING COUNT(*) > 1) t
INNER JOIN contacts c
ON t.contact_firstName = c.contact_firstName
AND t.contact_lastName = c.contact_lastName
AND c.contact_client_id = ".$ID."
AND t.MinID <> c.contact_id
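A minimal sketch for checking the GROUP BY / HAVING alternative on a tiny data set. It assumes an in-memory SQLite database as a stand-in for MySQL, invented sample rows, and a hypothetical helper named find_duplicates; the ORDER BY ... DESC is added only because the question asks for descending contact_id. Bound parameters replace the ".$ID." string concatenation.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        contact_client_id INTEGER,
        contact_firstName TEXT,
        contact_lastName TEXT
    )
""")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?)",
    [
        (1, 7, "Ann", "Lee"),
        (2, 7, "Ann", "Lee"),  # duplicate of contact 1
        (3, 7, "Bob", "Ray"),  # unique name, never reported
        (4, 7, "Ann", "Lee"),  # duplicate of contact 1
        (5, 8, "Ann", "Lee"),  # other client, ignored
    ],
)

# Same shape as the answer's query: find name groups with more than one row,
# then return every row of the group except the one with the lowest id.
DUPLICATES_SQL = """
    SELECT c.contact_id, c.contact_firstName, c.contact_lastName
    FROM (SELECT contact_firstName, contact_lastName, MIN(contact_id) AS MinID
          FROM contacts
          WHERE contact_client_id = ?
          GROUP BY contact_firstName, contact_lastName
          HAVING COUNT(*) > 1) t
    INNER JOIN contacts c
        ON t.contact_firstName = c.contact_firstName
       AND t.contact_lastName = c.contact_lastName
       AND c.contact_client_id = ?
       AND t.MinID <> c.contact_id
    ORDER BY c.contact_id DESC
"""


def find_duplicates(connection, client_id):
    """Hypothetical helper: run the duplicate query for one client."""
    return connection.execute(DUPLICATES_SQL, (client_id, client_id)).fetchall()


duplicates = find_duplicates(conn, 7)
print(duplicates)
```

Whether this is actually faster than the self-join on the real 30,000-row table would still need to be measured with EXPLAIN on MySQL.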

SELECT z.contact_id, z.contact_firstName, z.contact_lastName
, RIGHT(z.contact_lastName,1) AS nameNum
FROM `contacts` x
JOIN `contacts` z ON (x.contact_client_id = z.contact_client_id)
WHERE `x`.`contact_id` < `z`.`contact_id`
And x.contact_client_id = '$id'
GROUP BY `z`.`contact_id`
Make sure you have an index on:
- contact_id.
- contact_client_id

Related

Problems with query speed when using a nested query for item count

When I add the nested query for invCount, my query time goes from .03 sec to 14 sec. The query works and I get correct values, but it is very, very slow in comparison. Is that just because I have too many conditions in that query? When I take it out and still have the second nested query, the time is still .03 secs. There is clearly something about the first nested query the database doesn't like, but I am not seeing what it is. I have a foreign key set for all the inner join lines too. Any help or ideas would be appreciated.
SELECT a.*,
f.name,
f.partNumber,
f.showInAdminStore,
f.showInPublicStore,
f.productImage,
r.mastCatID,
(SELECT COUNT(b.inventoryID)
FROM storeInventory b
INNER JOIN events c ON c.eventID = b.eventID
WHERE b.pluID = a.pluID
AND b.listPrice = a.listPrice
AND b.unlimitedQty = a.unlimitedQty
AND (b.packageID = a.packageID OR (b.packageID IS NULL AND a.packageID IS NULL))
AND b.orderID IS NULL
AND c.isOpen = '1'
AND b.paymentTypeID <= '2'
AND (b.inCart < '$cartTime' OR b.inCart IS NULL) ) AS invCount,
(SELECT COUNT(x.inventoryID)
FROM storeInventory x
WHERE x.packageID = a.inventoryID) AS packageCount
FROM storeInventory a
INNER JOIN storePLUs f ON f.pluID = a.pluID
INNER JOIN storeCategories r ON r.catID = f.catID
INNER JOIN events d ON d.eventID = a.eventID
WHERE a.storeFrontID = '1'
AND a.orderID IS NULL
AND a.paymentTypeID <= '2'
AND d.isOpen = '1'
GROUP BY a.packageID, a.unlimitedQty, a.listPrice, a.pluID
[Image: table from query output]
UPDATE: 12/12/2022
I changed the line checking the packageID to "AND (b.packageID <=> a.packageID)" as suggested, and that cut my query time from 14 seconds down to 7.8 seconds. Thanks for the pointer. I will definitely use that in the future for NULL comparisons.
Using COUNT(*) took about half a second off. When I take the first nested query out, the time drops to .05 seconds even with the other nested query in there, so I feel like something is still causing issues. I tried running it without the "AND (b.inCart < '$cartTime' OR b.inCart IS NULL)" line and that took about a second off, but nowhere near what I was hoping for. Is there an operator that includes NULL in a less-than comparison? I also tried running it without the inner join in the nested query, and that didn't change much at all. Of course, removing any of that throws the values off and they become incorrect, so I can't run it that way.
Here is my current query setup that still pulls correct values.
SELECT a.*,
f.name,
f.partNumber,
f.showInAdminStore,
f.showInPublicStore,
f.productImage,
r.mastCatID,
(SELECT COUNT(*)
FROM storeInventory b
INNER JOIN events c ON c.eventID = b.eventID
WHERE b.pluID = a.pluID
AND b.listPrice = a.listPrice
AND b.unlimitedQty = a.unlimitedQty
AND (b.packageID <=> a.packageID)
AND b.orderID IS NULL
AND c.isOpen = '1'
AND b.paymentTypeID <= '2'
AND (b.inCart < '$cartTime' OR b.inCart IS NULL) ) AS invCount,
(SELECT COUNT(x.inventoryID)
FROM storeInventory x
WHERE x.packageID = a.inventoryID) AS packageCount
FROM storeInventory a
INNER JOIN storePLUs f ON f.pluID = a.pluID
INNER JOIN storeCategories r ON r.catID = f.catID
INNER JOIN events d ON d.eventID = a.eventID
WHERE a.storeFrontID = '1'
AND a.orderID IS NULL
AND a.paymentTypeID <= '2'
AND d.isOpen = '1'
GROUP BY a.packageID, a.unlimitedQty, a.listPrice, a.pluID
I am not familiar with the term "composite indexes". Is that something different from these?
[Image: screenshot of foreign keys on table a]

I think
AND (b.packageID = a.packageID
OR (b.packageID IS NULL
AND a.packageID IS NULL)
)
can be simplified to ( https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_equal-to ):
AND ( b.packageID <=> a.packageID )
Use COUNT(*) instead of COUNT(x.inventoryID) unless you check for not-NULL.
The subquery to compute packageCount seems strange; you seem to count inventories but join on packages.
The need to reach into another table to check isOpen is part of the performance problem. If eventID is not the PRIMARY KEY for events, then add INDEX(eventID, isOpen).
Some other indexes that may help:
a: INDEX(storeFrontID, orderID, paymentTypeID)
a: INDEX(packageID, unlimitedQty, listPrice, pluID)
b: INDEX(pluID, listPrice, unlimitedQty, orderID)
f: INDEX(pluID, catID)
r: INDEX(catID, mastCatID)
x: INDEX(packageID, inventoryID)
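On the "composite index" term: a composite index is a single index that covers several columns, as opposed to the one-column indexes that foreign keys typically create. The sketch below only illustrates the idea. It assumes an in-memory SQLite database as a stand-in for MySQL, a cut-down storeInventory table, and a made-up index name (idx_plu_price_qty_order). On MySQL the same index would be created with something like ALTER TABLE storeInventory ADD INDEX (pluID, listPrice, unlimitedQty, orderID).

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the table is cut down to the
# columns used by the correlated subquery's WHERE clause.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE storeInventory (
        inventoryID INTEGER PRIMARY KEY,
        pluID INTEGER,
        listPrice REAL,
        unlimitedQty INTEGER,
        orderID INTEGER
    )
""")

# One index over four columns: this is what "composite index" means.
# The index name is invented for this example.
conn.execute("""
    CREATE INDEX idx_plu_price_qty_order
    ON storeInventory (pluID, listPrice, unlimitedQty, orderID)
""")

# Ask the planner how it would run a lookup on those columns.
plan_rows = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT COUNT(*)
    FROM storeInventory b
    WHERE b.pluID = ?
      AND b.listPrice = ?
      AND b.unlimitedQty = ?
      AND b.orderID IS NULL
    """,
    (1, 9.99, 0),
).fetchall()

# The last column of each plan row is the human-readable detail text.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
```

The printed plan should mention the index name, which shows the lookup is served by the composite index instead of a full table scan. MySQL's EXPLAIN output looks different, but the `key` column plays the same role.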
After OP's Update
There is no way to optimize (x<y OR x IS NULL) except by splitting the OR into separate queries, as you would with a UNION. In your case, it is pretty easy to do the conversion. Replace
( SELECT COUNT(*) ... AND ( b.inCart < '$cartTime'
OR b.inCart IS NULL ) ) AS invCount,
with
( SELECT COUNT(*) ... AND b.inCart < '$cartTime' ) +
( SELECT COUNT(*) ... AND b.inCart IS NULL ) AS invCount,
Revised indexes:
storePLUs:
INDEX(pluID, catID)
storeCategories:
INDEX(catID, mastCatID)
events:
INDEX(isOpen, eventID)
storeInventory:
INDEX(pluID, listPrice, unlimitedQty, orderID, packageID)
INDEX(pluID, listPrice, unlimitedQty, orderID, inCart)
INDEX(packageID, inventoryID)
INDEX(storeFrontID, orderID, paymentTypeID)
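A quick sanity check that splitting the OR into two counts gives the same number. This is a sketch that assumes an in-memory SQLite database instead of MySQL, a storeInventory table reduced to three columns, and invented rows and timestamps. The two branches cannot overlap, because a NULL inCart never satisfies the less-than comparison.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows and timestamps are invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE storeInventory (
        inventoryID INTEGER PRIMARY KEY,
        pluID INTEGER,
        inCart TEXT
    )
""")
conn.executemany(
    "INSERT INTO storeInventory (pluID, inCart) VALUES (?, ?)",
    [
        (1, None),                   # never in a cart: counted
        (1, "2022-12-01 10:00:00"),  # cart hold expired: counted
        (1, "2022-12-12 10:00:00"),  # still held in a cart: not counted
        (2, None),                   # other PLU: not counted
    ],
)
cart_time = "2022-12-10 00:00:00"

# Original form: one count with an OR in the WHERE clause.
combined = conn.execute(
    """
    SELECT COUNT(*)
    FROM storeInventory b
    WHERE b.pluID = ?
      AND (b.inCart < ? OR b.inCart IS NULL)
    """,
    (1, cart_time),
).fetchone()[0]

# Rewritten form: two counts, one per branch of the OR, added together.
split = conn.execute(
    """
    SELECT (SELECT COUNT(*) FROM storeInventory b
            WHERE b.pluID = ? AND b.inCart < ?)
         + (SELECT COUNT(*) FROM storeInventory b
            WHERE b.pluID = ? AND b.inCart IS NULL)
    """,
    (1, cart_time, 1),
).fetchone()[0]

print(combined, split)
```

Each half can then use its own index, which is the point of the two storeInventory indexes ending in packageID and inCart.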

How to code better this multiple select subquery(subqueries)?

Basically, the query below now works and retrieves what I want.
But I'm pretty sure there is a more efficient way to write this query:
SELECT dg.ultimo_codigo_de_gestion_prejuridico, dg.hora_inicio_gestion,
dg.telefono, dg.fecha_gestion, cg.valor_codigo, cg.contacto
FROM detalle_gestion_con_obligacion
AS dg INNER JOIN codigo_gestion
AS cg ON dg.ultimo_codigo_de_gestion_prejuridico = cg.cod_gestion
WHERE nro_documento = 1234567
AND DATE(fecha_gestion) NOT IN (2018-10-20) AND telefono
IN ((SELECT tel_residencia FROM obligacion WHERE nro_obligacion = 1234567) ,
(SELECT tel_oficina FROM obligacion WHERE nro_obligacion = 1234567),
(SELECT celular FROM obligacion WHERE nro_obligacion = 1234567),
(SELECT tel_residencia1 FROM obligacion WHERE nro_obligacion = 1234567) )
ORDER BY fecha_gestion DESC, hora_inicio_gestion DESC limit 1;
As you can tell from the IN clause, I want to retrieve agreements (detalle_gestion_con_obligacion) whose telephone belongs to that list.
I am looking for a better solution that doesn't need four different SELECT statements.

You can join with obligacion instead of using IN.
SELECT dg.ultimo_codigo_de_gestion_prejuridico, dg.hora_inicio_gestion,
dg.telefono, dg.fecha_gestion, cg.valor_codigo, cg.contacto
FROM detalle_gestion_con_obligacion AS dg
INNER JOIN codigo_gestion AS cg ON dg.ultimo_codigo_de_gestion_prejuridico = cg.cod_gestion
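INNER JOIN obligacion AS o ON o.nro_obligacion = 1234567
AND dg.telefono IN (o.tel_residencia, o.tel_oficina, o.celular, o.tel_residencia1)
WHERE nro_documento = 1234567
-- Quote the date literal: an unquoted 2018-10-20 is evaluated as integer subtraction
AND DATE(fecha_gestion) NOT IN ('2018-10-20')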
ORDER BY fecha_gestion DESC, hora_inicio_gestion DESC
limit 1;
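A small runnable check of the idea of joining obligacion once and comparing telefono against its four phone columns. It is a sketch under assumptions: in-memory SQLite instead of MySQL, only the columns needed for the filter, invented phone numbers and dates, and the codigo_gestion join left out for brevity.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; all data below is invented.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE obligacion (
        nro_obligacion INTEGER,
        tel_residencia TEXT,
        tel_oficina TEXT,
        celular TEXT,
        tel_residencia1 TEXT
    )
""")
conn.execute("""
    CREATE TABLE detalle_gestion_con_obligacion (
        nro_documento INTEGER,
        telefono TEXT,
        fecha_gestion TEXT,
        hora_inicio_gestion TEXT
    )
""")
conn.execute(
    "INSERT INTO obligacion VALUES (?, ?, ?, ?, ?)",
    (1234567, "111", "222", "333", None),
)
conn.executemany(
    "INSERT INTO detalle_gestion_con_obligacion VALUES (?, ?, ?, ?)",
    [
        (1234567, "222", "2018-10-19 09:00:00", "09:00"),  # expected winner
        (1234567, "999", "2018-10-21 09:00:00", "09:00"),  # phone not listed
        (1234567, "333", "2018-10-20 15:00:00", "15:00"),  # excluded date
        (1234567, "111", "2018-10-18 08:00:00", "08:00"),  # older match
    ],
)

# One join against obligacion replaces the four scalar subqueries.
row = conn.execute(
    """
    SELECT dg.telefono, dg.fecha_gestion
    FROM detalle_gestion_con_obligacion AS dg
    INNER JOIN obligacion AS o
        ON o.nro_obligacion = ?
       AND dg.telefono IN (o.tel_residencia, o.tel_oficina,
                           o.celular, o.tel_residencia1)
    WHERE dg.nro_documento = ?
      AND DATE(dg.fecha_gestion) <> ?
    ORDER BY dg.fecha_gestion DESC, dg.hora_inicio_gestion DESC
    LIMIT 1
    """,
    (1234567, 1234567, "2018-10-20"),
).fetchone()

print(row)
```

Note that the join needs the nro_obligacion filter; without it, every row of obligacion whose phone columns happen to match would take part in the join.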

MySQL Multiple Join Query with Limit on One Join

I have a MYSQL query I'm working on that pulls data from multiple joins.
select students.studentID, students.firstName, students.lastName, userAccounts.userID, userstudentrelationship.userID, userstudentrelationship.studentID, userAccounts.getTexts, reports.pupID, contacts.pfirstName, contacts.plastName, reports.timestamp
from userstudentrelationship
join userAccounts on (userstudentrelationship.userID = userAccounts.userID)
join students on (userstudentrelationship.studentID = students.studentID)
join reports on (students.studentID = reports.studentID)
join contacts on (reports.pupID = contacts.pupID)
where userstudentrelationship.studentID = "10000005" AND userAccounts.getTexts = 1 ORDER BY reports.timestamp DESC LIMIT 1
I have a unique situation where I would like one of the joins (the reports join) to be limited to the latest result only for that table (order by reports.timestamp desc limit 1 is what I use), while not limiting the result quantities for the overall query.
By running the above query I get the data I would expect, but only one record when it should return several.
My question:
How can I modify this query to ensure that I receive all possible records available, while ensuring that only the latest record from the reports join is used? I expect that each record will possibly contain different data from the other joins, but all records returned by this query will share the same report record.

Provided I understand the issue; one could add a join to a set of data (aliased Z below) that has the max timestamp for each student; thereby limiting to one report record (most recent) for each student.
SELECT students.studentID
, students.firstName
, students.lastName
, userAccounts.userID
, userstudentrelationship.userID
, userstudentrelationship.studentID
, userAccounts.getTexts
, reports.pupID
, contacts.pfirstName
, contacts.plastName
, reports.timestamp
FROM userstudentrelationship
join userAccounts
on userstudentrelationship.userID = userAccounts.userID
join students
on userstudentrelationship.studentID = students.studentID
join reports
on students.studentID = reports.studentID
join contacts
on reports.pupID = contacts.pupID
join (SELECT max(timestamp) mts, studentID
FROM reports
GROUP BY studentID) Z
on reports.studentID = Z.studentID
and reports.timestamp = Z.mts
WHERE userstudentrelationship.studentID = "10000005"
AND userAccounts.getTexts = 1
ORDER BY reports.timestamp
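To see the "latest report per student" join in action, here is a trimmed sketch. Assumptions: in-memory SQLite instead of MySQL, the students and contacts tables left out, short table aliases (usr, ua, r, z) that are not in the original, and invented rows. Several users linked to the same student each get a row, and every row carries the same most recent report.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are invented and the
# students/contacts tables are omitted to keep the example short.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userstudentrelationship (userID INTEGER, studentID INTEGER);
    CREATE TABLE userAccounts (userID INTEGER, getTexts INTEGER);
    CREATE TABLE reports (studentID INTEGER, pupID INTEGER, timestamp TEXT);
""")
conn.executemany(
    "INSERT INTO userstudentrelationship VALUES (?, ?)",
    [(1, 10000005), (2, 10000005), (3, 10000005)],
)
conn.executemany(
    "INSERT INTO userAccounts VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 0)],  # user 3 has opted out of texts
)
conn.executemany(
    "INSERT INTO reports VALUES (?, ?, ?)",
    [
        (10000005, 50, "2023-01-01 08:00:00"),  # older report
        (10000005, 51, "2023-01-02 08:00:00"),  # latest for this student
        (10000006, 52, "2023-01-03 08:00:00"),  # different student
    ],
)

# The derived table z holds one row per student with the newest timestamp;
# joining reports to it keeps only that newest report.
rows = conn.execute(
    """
    SELECT ua.userID, r.pupID, r.timestamp
    FROM userstudentrelationship usr
    JOIN userAccounts ua ON usr.userID = ua.userID
    JOIN reports r ON usr.studentID = r.studentID
    JOIN (SELECT studentID, MAX(timestamp) AS mts
          FROM reports
          GROUP BY studentID) z
        ON r.studentID = z.studentID
       AND r.timestamp = z.mts
    WHERE usr.studentID = ?
      AND ua.getTexts = 1
    ORDER BY ua.userID
    """,
    (10000005,),
).fetchall()

print(rows)
```

If two reports for one student share the exact same timestamp, both would come back, so a tie-breaker column may be needed on real data.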

To get all the records, you should drop the LIMIT 1 at the end of the query.
To join only one row from the reports table, you could use a subquery such as:
select
students.studentID
, students.firstName
, students.lastName
, userAccounts.userID
, userstudentrelationship.userID
, userstudentrelationship.studentID
, userAccounts.getTexts
, t.pupID
, contacts.pfirstName
, contacts.plastName
, t.timestamp
from userstudentrelationship
join userAccounts on userstudentrelationship.userID = userAccounts.userID
join students on userstudentrelationship.studentID = students.studentID
join (
select * from reports
where reports.studentID = "10000005"
order by reports.timestamp desc limit 1
) t on students.studentID = t.studentID
join contacts on t.pupID = contacts.pupID
where userstudentrelationship.studentID = "10000005"
AND userAccounts.getTexts = 1

how to optimize mysql query with inner join

select id from customer_details where store_client_id = 2
And
id NOT IN (select customer_detail_id from orders
where store_client_id = 2 and total_spent > 100 GROUP BY customer_detail_id )
Or
id IN (select tcd.id from property_details as pd, customer_details as tcd
where pd.store_client_id = 2 and pd.customer_detail_id = tcd.customer_id and pd.property_key = 'Accepts Marketing'
and pd.property_value = 'no')
And
id IN (select customer_detail_id from orders
where store_client_id = 2 GROUP BY customer_detail_id HAVING count(customer_detail_id) > 0 )
Or
id IN (select tor.customer_detail_id from ordered_products as top, orders as tor
where tor.id = top.order_id and tor.store_client_id = 2
GROUP BY tor.customer_detail_id having sum(top.price) = 1)
I have this MySQL query; when it runs on the MySQL server it is very slow, and I can't find what the issue is.
After 4-5 minutes it returns 15,000 records. The number of records is probably not the issue.
Some tutorials suggest using INNER JOIN, LEFT JOIN, and so on,
but I don't know how to convert this query to use JOIN clauses.
Any help will be appreciated. Thanks in advance.

First of all, please read about the relational model and about optimizing SELECT statements.
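As a starting point for the JOIN conversion, here is a sketch of just the first condition (customers with no order over 100) written both as NOT IN and as a LEFT JOIN ... IS NULL anti-join. Assumptions: in-memory SQLite instead of MySQL, tables reduced to the columns involved, invented rows, and customer_detail_id never being NULL in orders (NOT IN behaves differently when the subquery returns a NULL). The other IN conditions would need the same treatment, with care for the AND/OR grouping.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_details (id INTEGER PRIMARY KEY, store_client_id INTEGER);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_detail_id INTEGER,
        store_client_id INTEGER,
        total_spent REAL
    );
""")
conn.executemany(
    "INSERT INTO customer_details VALUES (?, ?)",
    [(1, 2), (2, 2), (3, 2), (4, 3)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2, 150.0),  # customer 1 has a big order
        (2, 2, 2, 50.0),   # customer 2 only has a small order
        (3, 1, 2, 20.0),
    ],
)

# Subquery form, as in the question.
not_in_ids = [r[0] for r in conn.execute(
    """
    SELECT id
    FROM customer_details
    WHERE store_client_id = ?
      AND id NOT IN (SELECT customer_detail_id
                     FROM orders
                     WHERE store_client_id = ? AND total_spent > 100)
    ORDER BY id
    """,
    (2, 2),
)]

# Join form: keep customers for which no matching big order exists.
anti_join_ids = [r[0] for r in conn.execute(
    """
    SELECT cd.id
    FROM customer_details cd
    LEFT JOIN orders o
        ON o.customer_detail_id = cd.id
       AND o.store_client_id = cd.store_client_id
       AND o.total_spent > 100
    WHERE cd.store_client_id = ?
      AND o.id IS NULL
    ORDER BY cd.id
    """,
    (2,),
)]

print(not_in_ids, anti_join_ids)
```

An index on orders (store_client_id, customer_detail_id) is the kind of thing that usually matters for either form, though that would have to be confirmed with EXPLAIN on the real tables.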

MYSQL RIGHT JOIN

I have 3 tables; events, memberEvents, and members.
Events: eventId, eventName, eventDivision
memberEvents: memberID, eventOne, eventTwo, eventThree, eventFour, eventFive
members: memberID, memberFirstName, memberLastName
I am trying to get it to display events.eventName followed by the memberFirstName & memberLastName of the members that are doing that event.
This is the query I have been trying:
SELECT * FROM events, memberEvents, members
WHERE events.eventDivision = 'C'
RIGHT JOIN memberEvents.memberID
ON events.eventID = memberEvents.eventOne
OR events.eventID = memberEvents.eventTwo
OR events.eventID = memberEvents.eventThree
OR events.eventID = memberEvents.eventFour
OR events.eventID = memberEvents.eventFive
When I run this I get "#1066 - Not unique table/alias: 'memberEvents'"

Try:
SELECT ev.*, me.* FROM events ev
RIGHT JOIN memberEvents me
ON (ev.eventID = me.eventOne
OR ev.eventID = me.eventTwo
OR ev.eventID = me.eventThree
OR ev.eventID = me.eventFour
OR ev.eventID = me.eventFive)
WHERE ev.eventDivision = 'C'

Did you specifically want to limit the number of events per member to five? Why not just have a memberEvent table that has a primary key made up of foreign keys to member and event?
memberEvent: memberId, eventId
Then your query would be
SELECT
events.eventName,
members.memberFirstName,
members.memberLastName
FROM
events
INNER JOIN
memberEvent
ON
memberEvent.eventId = events.eventId
INNER JOIN
members
ON
memberEvent.memberId = members.memberId
WHERE
events.eventDivision = 'C';
Maybe you have a good reason for the table structure you have chosen but it is a denormalised design and if you ever need to increase the number of events per member you'll need to modify your schema and code to suit.
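For illustration, the normalised layout can be tried out like this. It is a sketch that assumes in-memory SQLite instead of MySQL, the asker's events and members tables plus the proposed memberEvent junction table, and invented sample rows.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        eventID INTEGER PRIMARY KEY,
        eventName TEXT,
        eventDivision TEXT
    );
    CREATE TABLE members (
        memberID INTEGER PRIMARY KEY,
        memberFirstName TEXT,
        memberLastName TEXT
    );
    -- Junction table: one row per (member, event) pair, so a member can
    -- take any number of events without changing the schema.
    CREATE TABLE memberEvent (
        memberID INTEGER REFERENCES members(memberID),
        eventID INTEGER REFERENCES events(eventID),
        PRIMARY KEY (memberID, eventID)
    );
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "Anatomy", "C"), (2, "Astronomy", "B")],
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")],
)
conn.executemany(
    "INSERT INTO memberEvent VALUES (?, ?)",
    [(1, 1), (2, 1), (2, 2)],
)

# Two plain inner joins replace the five-way OR on eventOne..eventFive.
rows = conn.execute(
    """
    SELECT e.eventName, m.memberFirstName, m.memberLastName
    FROM events e
    INNER JOIN memberEvent me ON me.eventID = e.eventID
    INNER JOIN members m ON me.memberID = m.memberID
    WHERE e.eventDivision = ?
    ORDER BY m.memberID
    """,
    ("C",),
).fetchall()

print(rows)
```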

I think you should have a closer look at the definition:
JOIN Syntax
It seems that you have misunderstood the JOIN syntax.
SELECT *
FROM table1 t1 right join
table2 t2 on t1.key1 = t2.key1
and t1.key2 = t2.key2
where t1.somecolumn = somevalue
