Optimizing MySql query - mysql

I would like to know if there is a way to optimize this query :
SELECT
jdc_organizations_activities.*,
jdc_organizations.orgName,
CONCAT(jos_hpj_users.firstName, ' ', jos_hpj_users.lastName) AS nameContact
FROM jdc_organizations_activities
LEFT JOIN jdc_organizations ON jdc_organizations_activities.organizationId =jdc_organizations.id
LEFT JOIN jos_hpj_users ON jdc_organizations_activities.contact = jos_hpj_users.userId
WHERE jdc_organizations_activities.status LIKE 'proposed'
ORDER BY jdc_organizations_activities.creationDate DESC LIMIT 0 , 100 ;
Now When i see the query log :
Query_time: 2
Lock_time: 0
Rows_sent: 100
Rows_examined: **1028330**
Query Profile :
2) Should i put indexes on the tables having in mind that there will be a lot of inserts and updates on those tables .
From Tizag Tutorials :
Indexes are something extra that you
can enable on your MySQL tables to
increase performance,cbut they do have
some downsides. When you create a new
index MySQL builds a separate block of
information that needs to be updated
every time there are changes made to
the table. This means that if you
are constantly updating, inserting and
removing entries in your table this
could have a negative impact on
performance.
Update after adding indexes and removing the lower() , group by and the wildcard
Time: 0.855ms

Add indexes (if you haven't) at:
Table: jdc_organizations_activities
simple index on creationDate
simple index on status
simple index on organizationId
simple index on contact
And rewrite the query by removing call to function LOWER() and using = or LIKE. It depends on the collation you have defined for this table but if it's a case insensitive one (like latin1), it will still show same results. Details can be found at MySQL docs: case-sensitivity
SELECT a.*
, o.orgName
, CONCAT(u.firstName,' ',u.lastName) AS nameContact
FROM jdc_organizations_activities AS a
LEFT JOIN jdc_organizations AS o
ON a.organizationId = o.id
LEFT JOIN jos_hpj_users AS u
ON a.contact = u.userId
WHERE a.status LIKE 'proposed' --- or (a.status = 'proposed')
ORDER BY a.creationDate DESC
LIMIT 0 , 100 ;
It would be nice if you posted the execution plan (as it is now) and after these changes.
UPDATE
A compound index on (status, creationDate) may be more appopriate (as Darhazer suggested) for this query, instead of the simple (status). But this is more guess work. Posting the plans (after running EXPLAIN query) would provide more info.
I also assumed that you already have (primary key) indexes on:
jdc_organizations.id
jos_hpj_users.userId

Post the result from EXPLAIN
Generally you need indexes on jdc_organizations_activities.organizationId, jdc_organizations_activities.contact, composite index on jdc_organizations_activities.status and jdc_organizations_activities.creationDate
Why you are using LIKE query for constant lookup (you have no wildcard symbols, or maybe you've edited the query)
The index on status can be used for LIKE 'proposed%' but can't be used for LIKE '%proposed%' - in the later case better leave only index on creationDate

What indexes do you have on these tables? Specifically, have you indexed jdc_organizations_activities.creationDate?
Also, why do you need to group by jdc_organizations_activities.id? Isn't that unique per row, or can an organization have multiple contacts?

The slowness is because mysql has to apply lower() to every row. The solution is to create a new column to store the result of lower, then put an index on that column. Let's also use a trigger to make the solution more luxurious. OK, here we go:
a) Add a new column to hold the lower version of status (make this varchar as wide as status):
ALTER TABLE jdc_organizations_activities ADD COLUMN status_lower varchar(20);
b) Populate the new column:
UPDATE jdc_organizations_activities SET status_lower = lower(status);
c) Create an index on the new column
CREATE INDEX jdc_organizations_activities_status_lower_index
ON jdc_organizations_activities(status_lower);
d) Define triggers to keep the new column value correct:
DELIMITER ~;
CREATE TRIGGER jdc_organizations_activities_status_insert_trig
BEFORE INSERT ON jdc_organizations_activities
FOR EACH ROW
BEGIN
NEW.status_lower = lower(NEW.status);
END;
CREATE TRIGGER jdc_organizations_activities_status_update_trig
BEFORE UPDATE ON jdc_organizations_activities
FOR EACH ROW
BEGIN
NEW.status_lower = lower(NEW.status);
END;~
DELIMITER ;
Your query should now fly.

Related

Mysql - inner join with or condition taking long time mysql

Need help with MySQL query.
I have indexed mandatory columns but still getting results in 160 seconds.
I know I have a problem with Contact conditions without it results are coming in 15s.
Any kind of help is appreciated.
My Query is :
SELECT `order`.invoicenumber, `order`.lastupdated_by AS processed_by, `order`.lastupdated_date AS LastUpdated_date,
`trans`.transaction_id AS trans_id,
GROUP_CONCAT(`trans`.subscription_id) AS subscription_id,
GROUP_CONCAT(`trans`.price) AS trans_price,
GROUP_CONCAT(`trans`.quantity) AS prod_quantity,
`user`.id AS id, `user`.businessname AS businessname,
`user`.given_name AS given_name, `user`.surname AS surname
FROM cdp_order_transaction_master AS `order`
INNER JOIN `cdp_order_transaction_detail` AS trans ON `order`.transaction_id=trans.transaction_id
INNER JOIN cdp_user AS user ON (`order`.user_id=user.id OR CONCAT( user.id , '_CDP' ) = `order`.lastupdated_by)
WHERE `order`.xero_invoice_status='Completed' AND `order`.order_date > '2021-01-01'
GROUP BY `order`.transaction_id
ORDER BY `order`.lastupdated_date
DESC LIMIT 100
1. Index the columns used in the join, where section so that sql does not scan the entire table and only scans the desired columns. A full scan of the table works extremely badly.
create index for cdp_order_transaction_master table :
CREATE INDEX idx_cdp_order_transaction_master_transaction_id ON cdp_order_transaction_master(transaction_id);
CREATE INDEX idx_cdp_order_transaction_master_user_id ON cdp_order_transaction_master(user_id);
CREATE INDEX idx_cdp_order_transaction_master_lastupdated_by ON cdp_order_transaction_master(lastupdated_by);
CREATE INDEX idx_cdp_order_transaction_master_xero_invoice_status ON cdp_order_transaction_master(xero_invoice_status);
CREATE INDEX idx_cdp_order_transaction_master_order_date ON cdp_order_transaction_master(order_date);
create index for cdp_order_transaction_detail table :
CREATE INDEX idx_cdp_order_transaction_detail_transaction_id ON cdp_order_transaction_detail(transaction_id);
create index for cdp_user table :
CREATE INDEX idx_cdp_user_id ON cdp_user(id);
2. Use Owner/Schema Name
If the owner name is not specified, the SQL Server engine tries to find it in all schemas to find the object.

MySQL View 20x slower than Select

I have a query that selects ~8000 rows. When I execute the query it takes 0.1 sec.
When I copy the query into a view and execute the view it takes about 2 seconds. In the first row of explain it selects ~570K rows, i dont know why.
I dont understand the first Row and why it shows up only in the view explain
1 PRIMARY ALL NULL NULL NULL NULL
This is the query (yes i know im not a mysql pro and the query is not that efficent, but it works ans 0.1 sek would be ok for me. Does anyone know why it is so slow in a view?
MariaDB 10.5.9
select
`xxxxxxx`.`auftraege`.`Zustandigkeit` AS `Zustandigkeit`,
`xxxxxxx`.`auftraege`.`cms` AS `cms`,
`xxxxxxx`.`auftraege`.`auftrag_id` AS `auftrag_id`,
`xxxxxxx`.`angebot`.`angebot_id` AS `angebot_id`,
`xxxxxxx`.`kunden`.`kunde_id` AS `kid`,
`xxxxxxx`.`angebot`.`kunde_id` AS `kunde_id`,
`xxxxxxx`.`kunden`.`firma` AS `firma`,
`xxxxxxx`.`auftraege`.`gekuendigt` AS `gekuendigt`,
`xxxxxxx`.`kunden`.`ansprechpartnerVorname` AS `ansprechpartnerVorname`,
`xxxxxxx`.`kunden`.`ansprechpartner` AS `ansprechpartner`,
`xxxxxxx`.`auftraege`.`ampstatus` AS `ampstatus`,
`xxxxxxx`.`auftraege`.`autoMahnungen` AS `autoMahnungen`,
`xxxxxxx`.`kunden`.`mail` AS `mail`,
`xxxxxxx`.`kunden`.`ansprechpartnerAnrede` AS `ansprechpartnerAnrede`,
case
`xxxxxxx`.`kunden`.`ansprechpartnerAnrede`
when
'm'
then
concat('Herr ', ifnull(`xxxxxxx`.`kunden`.`ansprechpartnerVorname`, ''), ifnull(`xxxxxxx`.`kunden`.`ansprechpartner`, ''))
else
concat('Frau ', ifnull(`xxxxxxx`.`kunden`.`ansprechpartnerVorname`, ''), ifnull(`xxxxxxx`.`kunden`.`ansprechpartner`, ''))
end
AS `ansprechpartnerfullName`, `xxxxxxx`.`kunden`.`website` AS `website`, `xxxxxxx`.`personal`.`name_betrieb` AS `name_betrieb`, `xxxxxxx`.`kunden`.`prioritaet` AS `prioritaet`, `xxxxxxx`.`auftraege`.`infoemail` AS `infoemail`, `xxxxxxx`.`auftraege`.`keywords` AS `keywords`, `xxxxxxx`.`auftraege`.`ftp_h` AS `ftp_h`, `xxxxxxx`.`auftraege`.`ftp_u` AS `ftp_u`, `xxxxxxx`.`auftraege`.`ftp_pw` AS `ftp_pw`, `xxxxxxx`.`auftraege`.`lgi_h` AS `lgi_h`, `xxxxxxx`.`auftraege`.`lgi_u` AS `lgi_u`, `xxxxxxx`.`auftraege`.`lgi_pw` AS `lgi_pw`, `xxxxxxx`.`auftraege`.`autoRemind` AS `autoRemind`, `xxxxxxx`.`kunden`.`telefon` AS `telefon`, `xxxxxxx`.`kunden`.`mobilfunk` AS `mobilfunk`, `xxxxxxx`.`auftraege`.`kommentar` AS `kommentar`, `xxxxxxx`.`auftraege`.`phase` AS `phase`, `xxxxxxx`.`auftraege`.`datum` AS `datum`, `xxxxxxx`.`angebot`.`typ` AS `typ`,
case
`xxxxxxx`.`auftraege`.`gekuendigt`
when
'1'
then
'Ja'
else
'Nein'
end
AS `Gekuendigt ? `,
(
select
count(`xxxxxxx`.`status`.`aenderung`)
from
`xxxxxxx`.`status`
where
`xxxxxxx`.`status`.`auftrag_id` = `xxxxxxx`.`auftraege`.`auftrag_id`
)
AS `aenderungen`,
`xxxxxxx`.`auftraege`.`vertragStart` AS `vertragStart`,
`xxxxxxx`.`auftraege`.`vertragEnde` AS `vertragEnde`,
case
`xxxxxxx`.`auftraege`.`zahlungsart`
when
'U'
then
'Überweisung'
when
'L'
then
'Lastschrift'
else
'Unbekannt'
end
AS `Zahlungsart`, `xxxxxxx`.`kunden`.`yyyyy_piwik` AS `yyyyy_piwik`,
(
select
max(`xxxxxxx`.`status`.`datum`) AS `mxDTst`
from
`xxxxxxx`.`status`
where
`xxxxxxx`.`status`.`auftrag_id` = `xxxxxxx`.`auftraege`.`auftrag_id`
and `xxxxxxx`.`status`.`typ` = 'SEO'
)
AS `mxDTst`,
(
select
case
`xxxxxxx`.`rechnungen`.`beglichen`
when
'YES'
then
'isOk'
else
'isAffe'
end
AS `neuUwe`
from
(
`xxxxxxx`.`zahlungsplanneu`
join
`xxxxxxx`.`rechnungen`
on(`xxxxxxx`.`zahlungsplanneu`.`rechnungsnummer` = `xxxxxxx`.`rechnungen`.`rechnungsnummer`)
)
where
`xxxxxxx`.`zahlungsplanneu`.`auftrag_id` = `xxxxxxx`.`auftraege`.`auftrag_id`
and `xxxxxxx`.`rechnungen`.`beglichen` <> 'STO' limit 1
)
AS `neuer`,
(
select
group_concat(`xxxxxxx`.`kunden_keywords`.`keyword` separator ',')
from
`xxxxxxx`.`kunden_keywords`
where
`xxxxxxx`.`kunden_keywords`.`kunde_id` = `xxxxxxx`.`kunden`.`kunde_id`
)
AS `keyword`,
(
select
case
count(0)
when
0
then
'Cool'
else
'Uncool'
end
AS `AusfallVor`
from
`xxxxxxx`.`rechnungen`
where
`xxxxxxx`.`rechnungen`.`rechnung_tag` < current_timestamp() - interval 15 day
and `xxxxxxx`.`rechnungen`.`kunde_id` = `xxxxxxx`.`kunden`.`kunde_id`
and `xxxxxxx`.`rechnungen`.`beglichen` = 'NO' limit 1
)
AS `Liquidiert`
from
(
((((`xxxxxxx`.`auftraege`
join
`xxxxxxx`.`angebot`
on(`xxxxxxx`.`auftraege`.`angebot_id` = `xxxxxxx`.`angebot`.`angebot_id`))
join
`xxxxxxx`.`kunden`
on(`xxxxxxx`.`angebot`.`kunde_id` = `xxxxxxx`.`kunden`.`kunde_id`))
left join
`xxxxxxx`.`kunden_keywords`
on(`xxxxxxx`.`angebot`.`kunde_id` = `xxxxxxx`.`kunden_keywords`.`kunde_id`))
join
`xxxxxxx`.`personal`
on(`xxxxxxx`.`kunden`.`bearbeiter` = `xxxxxxx`.`personal`.`personal_id`))
left join
`xxxxxxx`.`status`
on(`xxxxxxx`.`auftraege`.`auftrag_id` = `xxxxxxx`.`status`.`auftrag_id`)
)
group by
`xxxxxxx`.`auftraege`.`auftrag_id`
order by
NULL
UPDATE 1
1. The View Itself (Duration 1.83 sec)
1.1 Create the View: This is the View i created, it only contains the query from above.
1.2 Executing the View: It takes 1.83 sek to execute the view
1.3 Analyze the View: This is the explain of the view
2. The view with added where clause (Duration 1.86 sec)
2.1 Analyze the View with added where clause #rick wanted me to add a where clause to the view, if i understood him correctly. This is the explain of the view, where i added a where clause, takes 1.86 sec.
3. The Query, that is the source of the view (Duration: 0.1 sec)
3.1 Execute the query directly This is the query, that is the source of the view, when i execute it directly to the server. It takes ~0.1 - 0.2 seconds.
3.2 Analyze the direct queryAnd this is the explain of the pure query.
Why the view is so much slower, by only cupsuling the query inside of the view?
Update 2
These are the indexes I have set
ALTER TABLE angebot ADD INDEX angebot_idx_angebot_id (angebot_id);
ALTER TABLE auftraege ADD INDEX auftraege_idx_auftrag_id (auftrag_id);
ALTER TABLE kunden ADD INDEX kunden_idx_kunde_id (kunde_id);
ALTER TABLE kunden_keywords ADD INDEX kunden_keywords_idx_kunde_id (kunde_id);
ALTER TABLE personal ADD INDEX personal_idx_personal_id (personal_id);
ALTER TABLE rechnungen ADD INDEX rechnungen_idx_rechnungsnummer_beglichen (rechnungsnummer,beglichen);
ALTER TABLE rechnungen ADD INDEX rechnungen_idx_beglichen_kunde_id_rechnung (beglichen,kunde_id,rechnung_tag);
ALTER TABLE status ADD INDEX status_idx_auftrag_id (auftrag_id);
ALTER TABLE status ADD INDEX status_idx_typ_auftrag_id_datum (typ,auftrag_id,datum);
ALTER TABLE zahlungsplanneu ADD INDEX zahlungsplanneu_idx_auftrag_id (auftrag_id);
Be consistent between tables. kunde_id, for example, seems to be declared differently between tables. This may be preventing some obvious optimizations. (There are 6 JOINs that say func in EXPLAIN`.)
Remove the extra parentheses in JOINs. They may be preventing what the Optimizer is happy to do -- rearrange the tables in a JOIN.
Turn the query inside out. By this, I mean to do the minimum amount of work to do the main JOIN. Collect mostly id(s). Then do the dependent subqueries in an outer select. Something like:
SELECT ... ( SELECT ... ), ...
FROM ( SELECT a1.id
FROM a AS a1
JOIN b ON ..
JOIN c ON .. )
JOIN a AS a2 ON a2.id = a1.id
JOIN d ON ...
The "inside-out" kludge may eliminate the need for the GROUP BY. (Your query is too complex for me to see for sure.) If so, then I call the problem "explode-implode" -- Your query first JOINs, producing a temp table with lots of rows ("explodes"). Then it does a GROUP BY ("implodes").
More
These indexes will probably help:
status: (auftrag_id, typ, datum, aenderung)
rechnungen: (beglichen, kunde_id, rechnung_tag)
rechnungen: (rechnungsnummer, beglichen)
zahlungsplanneu: (auftrag_id, rechnungsnummer)
kunden_keywords: (kunde_id, keyword) -- (unless `kunde_id` is the PK)
(I see from all 3 EXPLAINs that you probably have sufficient indexes on kunden_keywords and status. Show me what indexes you have, so I can see if the existing indexes are as good as my suggestions.) "Using index" == "covering index".
Near the end is this LEFT JOIN, but I did not spot any use for the table; perhaps it can be removed?
left join `kunden_keywords` on(`angebot`.`kunde_id` = `kunden_keywords`.`kunde_id`))

MySQL update query where in subquery slow

I'm having issues with MySQL query running extremely slow. It takes about 2 min for each UPDATE to process.
This is the query:
UPDATE msn
SET is_disable = 1
WHERE mid IN
(
SELECT mid from link
WHERE rid = ${param.rid}
);
So my question is, I would like to know how the performance of the UPDATE statement will be affected if the result of the subquery is 0 or NULL. Because I think that maybe the process is slow because the result of the subquery is 0 or NULL.
Thanks a lot in advance.
The issue here is that the subquery following IN has to execute, whether or not it returns any records. I would probably express your update using exists logic:
UPDATE msn m
SET is_disable = 1
WHERE EXISTS (SELECT 1 FROM link l WHERE m.mid = l.mid AND l.rid = ${param.rid});
Then, add the following index to the link table:
CREATE INDEX idx ON link (mid, rid);
You could also try and compare against this version of the index:
CREATE INDEX idx ON link (rid, mid);

Update query on linked MySQL table from SQL Server

I have a MS SQL Server with a linked MySQL server. I need to partially synchronize a table between the two servers. This is done in three steps and based on a condition:
Delete all rows from the MySQL table that do not satisfy the condition
Insert all new rows in the MySQL table that satisfy the condition
Update all rows in the MySQL server that satisfy the condition and have different data between MySQL and SQL Server
Steps 1 and 2 always run without a problem. But step 3 won't run if there is anything to update. The query fails with the following exception: The rowset was using optimistic concurrency and the value of a column has been changed after the containing row was last fetched or resynchronized.].
This is the query that is executed:
update mysqlserver...subscribers
set Firstname = Voornaam,
Middlename = Tussenvoegsel,
Surname = Achternaam,
email = e-mail
from mysqlserver...subscribers as b, tblkandidaat
where (b.kandidaatid = tblkandidaat.kandidaatid) and
(tblkandidaat.kandidaatid in (
select subsc.kandidaatid
from mysqlserver...subscribers subsc inner join tblKandidaat
on (subsc.kandidaatid=tblKandidaat.kandidaatid)
where (subsc.list=1) and
((subsc.firstname COLLATE Latin1_General_CI_AI <> Voornaam
or (subsc.middlename COLLATE Latin1_General_CI_AI <> Tussenvoegsel)
or (subsc.surname COLLATE Latin1_General_CI_AI <> tblKandidaat.Achternaam)
or (subsc.email COLLATE Latin1_General_CI_AI <> tblKandidaat.e-mail))
));
Anybody has an idea about how to prevent this?
Try this query instead:
update b
set
Firstname = Voornaam,
Middlename = Tussenvoegsel,
Surname = Achternaam,
email = e-mail
from
mysqlserver...subscribers b
inner join tblkandidaat k on b.kandidaatid = k.kandidaatid
where
b.list=1
and (
b.firstname COLLATE Latin1_General_CI_AI <> k.Voornaam
or b.middlename COLLATE Latin1_General_CI_AI <> k.Tussenvoegsel
or b.surname COLLATE Latin1_General_CI_AI <> k.Achternaam
or b.email COLLATE Latin1_General_CI_AI <> k.e-mail
)
It's best practice to use ANSI joins and properly separate JOIN conditions from WHERE conditions.
It's more readable to use aliases for all your tables instead of long table names throughout the query.
It's best to use the aliases for all column references instead of leaving them blank. Not only is it a good habit and makes things clearer, it can avoid some very nasty errors in inner-vs-outer table references.
If performance is also an issue: linked server joins sometimes devolve to row-by-row processing in the DB data provider engine. I have found cases where breaking out part of a complex join across a linked server into a regular join followed by a cross apply hugely reduced the unneeded rows being fetched and greatly improved performance. (This was essentially doing a bookmark lookup, aka a nonclustered index scan followed by clustered index seek using those values). While this may not perfectly mesh with how MySql works, it's worth experimenting with. If you can do any kind of trace to see the actual queries being performed on the MySql side you might get insight as to other methods to use for increased performance.
Another performance-improving idea is to copy the remote data locally to a temp table, and add an ActionRequired column. Then update the temp table so it looks like it should, putting 'U', 'I', or 'D' in ActionRequired, then perform the merge/upsert across the linked server with a simple equijoins on the primary key, using ActionRequired. Careful attention to possible race conditions where the remote database could be updated during processing are in order.
Beware of nulls... are all those columns you're comparing non-nullable?
You might try creating a second table in mysql, doing an insert from sql-server into that empty table for all changed lines and doing Step 3 between the two mysql tables.
try to not using sub-query in your where statement. Sub-query may return more than one row, and then you got the error.
try creating a view which has source, destination and has_changed column between and has linked tables joined. then you can issue query
update vi_upd_linkedtable set destination=source where has_changed=1
This is a shot in the dark, but try adding FOR UPDATE or LOCK IN SHARE MODE to the end of your subselect query. This will tell MySQL that you are trying to select stuff for an update within your transaction and should create a row level write lock during the select rather than during the update.
From 13.2.8.3. SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE Locking Reads:
SELECT ... LOCK IN SHARE MODE sets a
shared mode lock on the rows read. A
shared mode lock enables other
sessions to read the rows but not to
modify them. The rows read are the
latest available, so if they belong to
another transaction that has not yet
committed, the read blocks until that
transaction ends.
For rows where the names are the same, the update is a no-op.
You don't save any work by trying to filter out the rows where they're the same, because the data still has to be compared across the link. So I can't see any benefit to the subquery.
Therefore the query can be simplified a lot:
update mysqlserver...subscribers
set Firstname = Voornaam,
Middlename = Tussenvoegsel,
Surname = Achternaam,
email = e-mail
from mysqlserver...subscribers as b join tblkandidaat
on b.kandidaatid = tblkandidaat.kandidaatid;
where b.list = 1;
Eliminating the subquery might make your the locking issue go away. MySQL does have some issues combining a select and an update on the same table in a given query.
Try this. I wrote several of these today.
update b
set
Firstname = Voornaam,
Middlename = Tussenvoegsel,
Surname = Achternaam,
email = e-mail
from
mysqlserver...subscribers b
inner join tblkandidaat k on b.kandidaatid = k.kandidaatid
where
b.list=1
and (
ISNULL(b.firstname,'') COLLATE Latin1_General_CI_AI <> ISNULL(k.Voornaam,'')
or ISNULL(b.middlename,'') COLLATE Latin1_General_CI_AI <> ISNULL(k.Tussenvoegsel,'')
or ISNULL(b.surname,'') COLLATE Latin1_General_CI_AI <> ISNULL(k.Achternaam,'')
or ISNULL(b.email,'') COLLATE Latin1_General_CI_AI <> ISNULL(k.e-mail,'')
)
Using the ISNULL allows you to null your columns.

How to find duplicates in 2 columns not 1

I have a MySQL database table with two columns that interest me. Individually they can each have duplicates, but they should never have a duplicate of BOTH of them having the same value.
stone_id can have duplicates as long as for each upsharge title is different, and in reverse. But say for example stone_id = 412 and upcharge_title = "sapphire" that combination should only occur once.
This is ok:
stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "ruby"
This is NOT ok:
stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "sapphire"
Is there a query that will find duplicates in both fields? And if possible is there a way to set my data-base to not allow that?
I am using MySQL version 4.1.22
You should set up a composite key between the two fields. This will require a unique stone_id and upcharge_title for each row.
As far as finding the existing duplicates try this:
select stone_id,
upcharge_title,
count(*)
from your_table
group by stone_id,
upcharge_title
having count(*) > 1
I found it helpful to add a unqiue index using an "ALTER IGNORE" which removes the duplicates and enforces unique records which sounds like you would like to do. So the syntax would be:
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX(`id`, `another_id`, `one_more_id`);
This effectively adds the unique constraint meaning you will never have duplicate records and the IGNORE deletes the existing duplicates.
You can read more about eh ALTER IGNORE here: http://mediakey.dk/~cc/mysql-remove-duplicate-entries/
Update: I was informed by #Inquisitive that this may fail in versions of MySql> 5.5 :
It fails On MySQL > 5.5 and on InnoDB table, and in Percona because of
their InnoDB fast index creation feature [http://bugs.mysql.com/bug.php?id=40344]. In this case
first run set session old_alter_table=1 and then the above command
will work fine
Update - ALTER IGNORE Removed In 5.7
From the docs
As of MySQL 5.6.17, the IGNORE clause is deprecated and its use
generates a warning. IGNORE is removed in MySQL 5.7.
One of the MySQL dev's give two alternatives:
Group by the unique fields and delete as seen above
Create a new table, add a unique index, use INSERT IGNORE, ex:
CREATE TABLE duplicate_row_table LIKE regular_row_table;
ALTER TABLE duplicate_row_table ADD UNIQUE INDEX (id, another_id);
INSERT IGNORE INTO duplicate_row_table SELECT * FROM regular_row_table;
DROP TABLE regular_row_table;
RENAME TABLE duplicate_row_table TO regular_row_table;
But depending on the size of your table, this may not be practical
You can find duplicates like this..
Select
stone_id, upcharge_title, count(*)
from
particulartable
group by
stone_id, upcharge_title
having
count(*) > 1
To find the duplicates:
select stone_id, upcharge_title from tablename group by stone_id, upcharge_title having count(*)>1
To constrain to avoid this in future, create a composite unique key on these two fields.
Incidentally, a composite unique constraint on the table would prevent this from occurring in the first place.
ALTER TABLE table
ADD UNIQUE(stone_id, charge_title)
(This is valid T-SQL. Not sure about MySQL.)
this SO post helped me, but i too wanted to know how to delete and keep one of the rows... here's a PHP solution to delete the duplicate rows and keep one (in my case there were only 2 columns and it is in a function for clearing duplicate category associations)
$dupes = $db->query('select *, count(*) as NUM_DUPES from PRODUCT_CATEGORY_PRODUCT group by fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID having count(*) > 1');
if (!is_array($dupes))
return true;
foreach ($dupes as $dupe) {
$db->query('delete from PRODUCT_CATEGORY_PRODUCT where fkPRODUCT_ID = ' . $dupe['fkPRODUCT_ID'] . ' and fkPRODUCT_CATEGORY_ID = ' . $dupe['fkPRODUCT_CATEGORY_ID'] . ' limit ' . ($dupe['NUM_DUPES'] - 1);
}
the (limit NUM_DUPES - 1) is what preserves the single row...
thanks all
This is what worked for me (ignoring null and blank). Two different email columns:
SELECT *
FROM members
WHERE email IN (SELECT soemail
FROM members
WHERE NOT Isnull(soemail)
AND soemail <> '');