How to find duplicates in 2 columns not 1

I have a MySQL database table with two columns that interest me. Individually they can each have duplicates, but the same pair of values should never appear in more than one row.
stone_id can have duplicates as long as the upcharge_title is different for each, and vice versa. But, for example, the combination stone_id = 412 and upcharge_title = "sapphire" should occur only once.
This is ok:
stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "ruby"
This is NOT ok:
stone_id = 412 upcharge_title = "sapphire"
stone_id = 412 upcharge_title = "sapphire"
Is there a query that will find duplicates across both fields? And, if possible, is there a way to set up my database to not allow that?
I am using MySQL version 4.1.22

You should set up a composite unique key on the two fields. This will require the combination of stone_id and upcharge_title to be unique for each row.
As far as finding the existing duplicates try this:
select stone_id,
upcharge_title,
count(*)
from your_table
group by stone_id,
upcharge_title
having count(*) > 1
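To see what the grouped query returns, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name stone_upcharges is made up for the illustration.

```python
import sqlite3

# SQLite stands in for MySQL; `stone_upcharges` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stone_upcharges (stone_id INTEGER, upcharge_title TEXT)")
conn.executemany(
    "INSERT INTO stone_upcharges VALUES (?, ?)",
    [(412, "sapphire"), (412, "ruby"), (412, "sapphire"), (500, "ruby")],
)

# Group on BOTH columns: only pairs that occur more than once are returned.
duplicates = conn.execute(
    """
    SELECT stone_id, upcharge_title, COUNT(*)
    FROM stone_upcharges
    GROUP BY stone_id, upcharge_title
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)  # [(412, 'sapphire', 2)]
```

Note that stone_id 412 and the title "ruby" each appear more than once on their own, yet only the repeated pair is reported.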

I found it helpful to add a unique index using "ALTER IGNORE", which removes the duplicates and enforces unique records, which sounds like what you would like to do. So the syntax would be:
ALTER IGNORE TABLE `table` ADD UNIQUE INDEX(`id`, `another_id`, `one_more_id`);
This effectively adds the unique constraint meaning you will never have duplicate records and the IGNORE deletes the existing duplicates.
You can read more about ALTER IGNORE here: http://mediakey.dk/~cc/mysql-remove-duplicate-entries/
Update: I was informed by @Inquisitive that this may fail in MySQL versions > 5.5:
It fails On MySQL > 5.5 and on InnoDB table, and in Percona because of
their InnoDB fast index creation feature [http://bugs.mysql.com/bug.php?id=40344]. In this case
first run set session old_alter_table=1 and then the above command
will work fine
Update - ALTER IGNORE Removed In 5.7
From the docs
As of MySQL 5.6.17, the IGNORE clause is deprecated and its use
generates a warning. IGNORE is removed in MySQL 5.7.
One of the MySQL devs gives two alternatives:
Group by the unique fields and delete as seen above
Create a new table, add a unique index, use INSERT IGNORE, ex:
CREATE TABLE duplicate_row_table LIKE regular_row_table;
ALTER TABLE duplicate_row_table ADD UNIQUE INDEX (id, another_id);
INSERT IGNORE INTO duplicate_row_table SELECT * FROM regular_row_table;
DROP TABLE regular_row_table;
RENAME TABLE duplicate_row_table TO regular_row_table;
But depending on the size of your table, this may not be practical.
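The copy-into-a-new-table approach can be tried end to end with a small sketch. This one uses Python's sqlite3 module rather than MySQL, so two details differ and are assumptions of the sketch: SQLite has no CREATE TABLE ... LIKE (the copy is declared by hand), and its spelling of INSERT IGNORE is INSERT OR IGNORE. The note column and the index name uniq_pair are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regular_row_table (id INTEGER, another_id INTEGER, note TEXT);
    INSERT INTO regular_row_table VALUES (1, 10, 'first'), (1, 10, 'second'), (2, 10, 'third');

    -- SQLite has no CREATE TABLE ... LIKE, so the copy is declared explicitly.
    CREATE TABLE duplicate_row_table (id INTEGER, another_id INTEGER, note TEXT);
    CREATE UNIQUE INDEX uniq_pair ON duplicate_row_table (id, another_id);

    -- Rows that would violate the unique index are skipped instead of raising an error.
    INSERT OR IGNORE INTO duplicate_row_table
    SELECT * FROM regular_row_table ORDER BY rowid;

    DROP TABLE regular_row_table;
    ALTER TABLE duplicate_row_table RENAME TO regular_row_table;
    """
)
rows = conn.execute(
    "SELECT id, another_id FROM regular_row_table ORDER BY id"
).fetchall()
print(rows)  # [(1, 10), (2, 10)]
```

The unique index travels with the renamed table, so later inserts of an existing (id, another_id) pair are rejected.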

You can find duplicates like this:
Select
stone_id, upcharge_title, count(*)
from
particulartable
group by
stone_id, upcharge_title
having
count(*) > 1

To find the duplicates:
select stone_id, upcharge_title from tablename group by stone_id, upcharge_title having count(*)>1
To prevent this in future, create a composite unique key on these two fields.

Incidentally, a composite unique constraint on the table would prevent this from occurring in the first place.
ALTER TABLE your_table
ADD UNIQUE (stone_id, upcharge_title);
(This is valid T-SQL, and MySQL accepts the same ADD UNIQUE syntax.)

This SO post helped me, but I also wanted to know how to delete the duplicates and keep one of the rows. Here's a PHP solution that deletes the duplicate rows and keeps one (in my case there were only 2 columns, and it is in a function for clearing duplicate category associations):
// Find every (category, product) pair that occurs more than once.
$dupes = $db->query('select *, count(*) as NUM_DUPES from PRODUCT_CATEGORY_PRODUCT group by fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID having count(*) > 1');
if (!is_array($dupes))
    return true;
foreach ($dupes as $dupe) {
    // Delete all but one row of each duplicated pair.
    $db->query('delete from PRODUCT_CATEGORY_PRODUCT where fkPRODUCT_ID = ' . $dupe['fkPRODUCT_ID'] . ' and fkPRODUCT_CATEGORY_ID = ' . $dupe['fkPRODUCT_CATEGORY_ID'] . ' limit ' . ($dupe['NUM_DUPES'] - 1));
}
The limit (NUM_DUPES - 1) is what preserves the single row.
thanks all
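For comparison, the same "delete the extras, keep one" result can be had in a single statement when each row has its own identifier. The sketch below is a different technique from the PHP loop above: it uses Python's sqlite3 module and SQLite's implicit rowid. In MySQL you would need your own auto-increment column, and a subquery on the table being deleted from would have to be wrapped in a derived table, so treat this as an illustration of the idea rather than drop-in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PRODUCT_CATEGORY_PRODUCT "
    "(fkPRODUCT_CATEGORY_ID INTEGER, fkPRODUCT_ID INTEGER)"
)
conn.executemany(
    "INSERT INTO PRODUCT_CATEGORY_PRODUCT VALUES (?, ?)",
    [(1, 100), (1, 100), (1, 100), (2, 100), (2, 200)],
)

# Keep the row with the smallest rowid in each (category, product) group
# and delete every other row.
conn.execute(
    """
    DELETE FROM PRODUCT_CATEGORY_PRODUCT
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM PRODUCT_CATEGORY_PRODUCT
        GROUP BY fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID
    )
    """
)
remaining = conn.execute(
    "SELECT fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID FROM PRODUCT_CATEGORY_PRODUCT "
    "ORDER BY fkPRODUCT_CATEGORY_ID, fkPRODUCT_ID"
).fetchall()
print(remaining)  # [(1, 100), (2, 100), (2, 200)]
```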

This is what worked for me (ignoring null and blank). Two different email columns:
SELECT *
FROM members
WHERE email IN (SELECT soemail
FROM members
WHERE NOT Isnull(soemail)
AND soemail <> '');

Related

convert mysql (on duplicate key ) query to oracle merge

INSERT INTO table1(id,dept_id,name,description,creation_time,modified_time)
VALUES('id','dept_id','name','description','creation_time','modified_time')
ON DUPLICATE KEY UPDATE dept_id=VALUES(dept_id),name=VALUES(name),
description=VALUES(description),creation_time=VALUES(creation_time),
modified_time=VALUES(modified_time)
I used the Oracle query below to convert the above MySQL query, but it fails. Can you please help me figure out what is wrong with the Oracle query?
Merge into table1 t1 using
(VALUES ('id','dept_id','name','description','creation_time','modified_time')) as temp
(id,dept_id,name,description,creation_time,modified_time) on t1. id = temp.id
WHEN MATCHED THEN UPDATE SET dept_id=t1.dept_id, description=t1.description, name=t1.name,
creation_time=t1.creation_time, modified_time=t1.modified_time
WHEN NOT MATCHED THEN INSERT (id,dept_id,name,description,creation_time,modified_time)
VALUES ('id','dept_id','name','description','creation_time','modified_time')

To do this, you need to use a table or subquery in the using clause (in your case, you need a subquery).
In Oracle, you can use the dual table if you need to select something without needing to select from an actual table; this is a table that contains only a single row and a single column.
Your merge statement should therefore look something like:
MERGE INTO table1 tgt
USING (SELECT 'id' id,
'dept_id' dept_id,
'name' NAME,
'description' description,
'creation_time' creation_time,
'modified_time' modified_time
FROM dual) src
ON tgt.id = src.id
WHEN MATCHED THEN
UPDATE
SET tgt.dept_id = src.dept_id,
tgt.description = src.description,
tgt.name = src.name,
tgt.creation_time = src.creation_time,
tgt.modified_time = src.modified_time
WHEN NOT MATCHED THEN
INSERT
(tgt.id,
tgt.dept_id,
tgt.name,
tgt.description,
tgt.creation_time,
tgt.modified_time)
VALUES
(src.id,
src.dept_id,
src.name,
src.description,
src.creation_time,
src.modified_time);
Note how the when not matched clause uses the columns from the source subquery, rather than using the literal values you supplied. (I assume that in your actual code, these literal values are actually variables; creation_time is a pretty odd value to store in a column labelled creation_time!).
I've also switched the aliases to make it clearer where you're merging to and from; I find this makes it easier to understand what the merge statement is doing. YMMV.

MySQL: DELETE FROM using a multi conditional "not equal to" WHERE

Hah. So. I've been playing with this particular query where I'm trying to delete a large swath of rows but I end up not doing what I'm expecting. I've run various variations of this query and I'm not having any luck.
Basically I'm trying to do this:
DELETE FROM table WHERE country <> 'MX' OR 'CA';
OR
DELETE FROM table WHERE foobar NOT IN ( 12 OR 5 );
OR
DELETE FROM table WHERE foobar NOT IN ( 'foo' ) OR ( 'bar' );
And a couple of other ideas I had that weren't working. I'm uploading a fresh dataset for the umpteenth time, so I'd appreciate some help in the right direction.

Actually, logical operators like "OR" apply to conditions, not to values.
So, you can use either
DELETE FROM table WHERE country <> 'MX' AND country <> 'CA';
or
DELETE FROM table WHERE country NOT IN ('MX', 'CA')
The second one is preferable.
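A quick way to convince yourself that the two forms delete the same rows is to run both on identical data. This is a small sketch using Python's sqlite3 module in place of MySQL; the table name places and the helper remaining_countries are made up for the example.

```python
import sqlite3

def remaining_countries(delete_sql):
    """Build a fresh table, run one DELETE, and return the countries left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (country TEXT)")
    conn.executemany(
        "INSERT INTO places VALUES (?)", [("MX",), ("CA",), ("US",), ("FR",)]
    )
    conn.execute(delete_sql)
    rows = conn.execute("SELECT country FROM places ORDER BY country")
    return [row[0] for row in rows]

with_and = remaining_countries(
    "DELETE FROM places WHERE country <> 'MX' AND country <> 'CA'"
)
with_not_in = remaining_countries(
    "DELETE FROM places WHERE country NOT IN ('MX', 'CA')"
)
print(with_and, with_not_in)  # ['CA', 'MX'] ['CA', 'MX']
```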

Try this:
DELETE FROM table WHERE country NOT IN ('MX', 'CA');

subquery in where clause of UPDATE statement

I have a database of ATM cards with the fields account_no, card_no, is_blocked, is_activated, issue_date.
The account number and card number are not unique: when an old card expires it is marked is_blocked='Y', and another record with the same card number and account number is inserted as a new row with is_blocked='N'. Now I need to update is_blocked/is_activated based on issue_date, i.e.
UPDATE card_info set is_blocked='Y' where card_no='6396163270002509'
AND opening_date=(SELECT MAX(opening_date) FROM card_info WHERE card_no='6396163270002509')
but it doesn't allow me to do so.
It throws the following error:
1093 - You can't specify target table 'card_info' for update in FROM clause

Try this instead:
UPDATE card_info ci
INNER JOIN
(
SELECT card_no, MAX(opening_date) MaxOpeningDate
FROM card_info
GROUP BY card_no
) cm ON ci.card_no = cm.card_no AND ci.opening_date = cm.MaxOpeningDate
SET ci.is_blocked='Y'
WHERE ci.card_no = '6396163270002509'

That's one of those stupid limitations of the MySQL parser. The usual way to solve this is to use a JOIN query as Mahmoud has shown.
The (at least to me) surprising part is that it really seems to be a parser problem, not a problem of the engine itself, because if you wrap the sub-select in a derived table, this does work:
UPDATE card_info
SET is_blocked='Y'
WHERE card_no = '6396163270002509'
AND opening_date = ( select max_date
from (
SELECT MAX(opening_date) AS max_date
FROM card_info
WHERE card_no='6396163270002509') t
)

sql query for deleting rows with NOT IN using 2 columns

I have a table with a composite key composed of 2 columns, say Name and ID. I have a service that gives me the keys (name, id combinations) of the rows to keep; the rest I need to delete. If the key were only 1 column, I could use
delete from table_name where name not in (list_of_valid_names)
but how do I make the query so that I can say something like
name not in (valid_names) and id not in(valid_ids)
// this won't work, since separately they don't identify a unique record, or will it?

Use MySQL's special "multiple value" IN syntax (row constructors):
delete from table_name
where (name, id) not in (select name, id from some_table where some_condition);
If your list is a literal list, you can still use this approach:
delete from table_name
where (name, id) not in (select 'john', 1 union select 'sally', 2);

Actually, no, I retract my comment about needing special juice or being stuck with AND/OR-ing all your options.
Since you have a list of the values you want to retain, dump that into a temporary table. Then delete from the base table whatever does not exist in the temporary table (left outer join). I'm weak on MySQL syntax or I'd cobble together your exact query. The pseudocode is approximate:
DELETE
B
FROM
BASE B
LEFT OUTER JOIN
RETAIN R
ON R.key1 = B.key1
AND R.key2 = B.key2
WHERE
R.key1 IS NULL

The NOT EXISTS version:
DELETE
b
FROM
BaseTable b
WHERE
NOT EXISTS
( SELECT
*
FROM
RetainTable r
WHERE
(r.key1, r.key2) = (b.key1, b.key2)
)
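Here is a small runnable sketch of the retain-table idea, using Python's sqlite3 module instead of MySQL. The sample rows are invented, and the correlated subquery compares the two key columns one by one instead of as a row value, which behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE BaseTable (key1 TEXT, key2 INTEGER);
    INSERT INTO BaseTable VALUES ('john', 1), ('john', 2), ('sally', 1), ('sally', 2);

    -- The pairs to keep, as delivered by the (hypothetical) service.
    CREATE TEMP TABLE RetainTable (key1 TEXT, key2 INTEGER);
    INSERT INTO RetainTable VALUES ('john', 1), ('sally', 2);

    -- Delete every base row whose (key1, key2) pair has no match in RetainTable.
    DELETE FROM BaseTable
    WHERE NOT EXISTS (
        SELECT 1
        FROM RetainTable r
        WHERE r.key1 = BaseTable.key1
          AND r.key2 = BaseTable.key2
    );
    """
)
kept = conn.execute(
    "SELECT key1, key2 FROM BaseTable ORDER BY key1, key2"
).fetchall()
print(kept)  # [('john', 1), ('sally', 2)]
```

With this data, filtering the columns separately (name NOT IN (...) AND id NOT IN (...)) would delete nothing, because every name and every id appears somewhere in the retain list. That is why the pair has to be matched as a unit.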

Optimizing MySql query

I would like to know if there is a way to optimize this query :
SELECT
jdc_organizations_activities.*,
jdc_organizations.orgName,
CONCAT(jos_hpj_users.firstName, ' ', jos_hpj_users.lastName) AS nameContact
FROM jdc_organizations_activities
LEFT JOIN jdc_organizations ON jdc_organizations_activities.organizationId =jdc_organizations.id
LEFT JOIN jos_hpj_users ON jdc_organizations_activities.contact = jos_hpj_users.userId
WHERE jdc_organizations_activities.status LIKE 'proposed'
ORDER BY jdc_organizations_activities.creationDate DESC LIMIT 0 , 100 ;
Now, when I look at the query log:
Query_time: 2
Lock_time: 0
Rows_sent: 100
Rows_examined: **1028330**
Query Profile :
2) Should I put indexes on the tables, bearing in mind that there will be a lot of inserts and updates on those tables?
From Tizag Tutorials :
Indexes are something extra that you
can enable on your MySQL tables to
increase performance, but they do have
some downsides. When you create a new
index MySQL builds a separate block of
information that needs to be updated
every time there are changes made to
the table. This means that if you
are constantly updating, inserting and
removing entries in your table this
could have a negative impact on
performance.
Update after adding indexes and removing the lower() , group by and the wildcard
Time: 0.855ms

Add indexes (if you haven't) at:
Table: jdc_organizations_activities
simple index on creationDate
simple index on status
simple index on organizationId
simple index on contact
And rewrite the query by removing the call to the function LOWER() and using = or LIKE. It depends on the collation you have defined for this table, but if it's a case-insensitive one (like latin1_swedish_ci, the default for latin1), it will still return the same results. Details can be found in the MySQL docs: case-sensitivity
SELECT a.*
, o.orgName
, CONCAT(u.firstName,' ',u.lastName) AS nameContact
FROM jdc_organizations_activities AS a
LEFT JOIN jdc_organizations AS o
ON a.organizationId = o.id
LEFT JOIN jos_hpj_users AS u
ON a.contact = u.userId
WHERE a.status LIKE 'proposed' -- or (a.status = 'proposed')
ORDER BY a.creationDate DESC
LIMIT 0 , 100 ;
It would be nice if you posted the execution plan (as it is now) and after these changes.
UPDATE
A compound index on (status, creationDate) may be more appropriate (as Darhazer suggested) for this query than the simple index on (status). But this is more guesswork. Posting the plans (after running EXPLAIN on the query) would provide more info.
I also assumed that you already have (primary key) indexes on:
jdc_organizations.id
jos_hpj_users.userId

Post the result from EXPLAIN
Generally you need indexes on jdc_organizations_activities.organizationId and jdc_organizations_activities.contact, plus a composite index on jdc_organizations_activities.status and jdc_organizations_activities.creationDate.
Why are you using LIKE for a constant lookup? (You have no wildcard symbols, or maybe you've edited the query.)
The index on status can be used for LIKE 'proposed%' but can't be used for LIKE '%proposed%'; in the latter case it is better to keep only the index on creationDate.
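To check whether a composite index is actually picked up for an equality filter plus ORDER BY, you can ask the database for its plan. The sketch below does this with Python's sqlite3 module and EXPLAIN QUERY PLAN. SQLite's planner is not MySQL's, so this only illustrates the workflow, and the index name idx_status_creation and the trimmed-down table definition are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jdc_organizations_activities (
        id INTEGER PRIMARY KEY,
        status TEXT,
        creationDate TEXT
    );
    -- Composite index: equality column first, sort column second.
    CREATE INDEX idx_status_creation
        ON jdc_organizations_activities (status, creationDate);
    """
)
plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT *
    FROM jdc_organizations_activities
    WHERE status = 'proposed'
    ORDER BY creationDate DESC
    LIMIT 100
    """
).fetchall()
# The last column of each plan row holds the human-readable description.
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)
```

If the index is used, its name appears in the printed plan. In MySQL the equivalent check is to run EXPLAIN on the query and look at the key column.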

What indexes do you have on these tables? Specifically, have you indexed jdc_organizations_activities.creationDate?
Also, why do you need to group by jdc_organizations_activities.id? Isn't that unique per row, or can an organization have multiple contacts?

The slowness is because mysql has to apply lower() to every row. The solution is to create a new column to store the result of lower, then put an index on that column. Let's also use a trigger to make the solution more luxurious. OK, here we go:
a) Add a new column to hold the lower version of status (make this varchar as wide as status):
ALTER TABLE jdc_organizations_activities ADD COLUMN status_lower varchar(20);
b) Populate the new column:
UPDATE jdc_organizations_activities SET status_lower = lower(status);
c) Create an index on the new column
CREATE INDEX jdc_organizations_activities_status_lower_index
ON jdc_organizations_activities(status_lower);
d) Define triggers to keep the new column value correct:
DELIMITER ~
CREATE TRIGGER jdc_organizations_activities_status_insert_trig
BEFORE INSERT ON jdc_organizations_activities
FOR EACH ROW
BEGIN
SET NEW.status_lower = lower(NEW.status);
END~
CREATE TRIGGER jdc_organizations_activities_status_update_trig
BEFORE UPDATE ON jdc_organizations_activities
FOR EACH ROW
BEGIN
SET NEW.status_lower = lower(NEW.status);
END~
DELIMITER ;
Your query should now fly.

We Keep Coding