Find duplicates per parent id

I have two DB tables which form a parent-child relationship from Planung to Aufgabe:
Planung:
CREATE TABLE `planung` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`Bezeichnung` varchar(255) DEFAULT NULL,
-- lots of other columns
PRIMARY KEY (`id`)
) ENGINE=InnoDB
Aufgabe:
CREATE TABLE `aufgabe` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`planung_id` bigint(20) DEFAULT NULL, -- foreign key to Planung.id
`Nummer` int(11) DEFAULT NULL,
-- lots of other columns
PRIMARY KEY (`id`)
) ENGINE=InnoDB
I'm looking for a query which gives me all Planung.id and Aufgabe.id values for every duplicate Nummer per Planung. In other words: for every Planung, Aufgabe.Nummer must be unique, and I want to check whether this is really the case in my DB (I know it isn't).

SELECT planung_id, GROUP_CONCAT(id) AS aufgabe_id, Nummer, COUNT(1) as num_duplicates
FROM aufgabe
GROUP BY planung_id, Nummer
HAVING COUNT(1) > 1
This query returns every planung_id that has a duplicated Nummer and displays the result like this:
planung_id  aufgabe_id  Nummer  num_duplicates
1           2,5,8       1       3
This means that for Planung 1 there are three Aufgabe rows with Nummer 1, and their ids are 2, 5 and 8.
Edit: Shamelessly stolen from @dgw's comment:
Once you have run this query and corrected all the duplicates, add a unique index on aufgabe (planung_id, Nummer) so that the database enforces this constraint:
ALTER TABLE aufgabe
ADD CONSTRAINT UNIQUE uq_planung_id_and_nummer (planung_id, Nummer)
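The workflow above (find the duplicates, clean them up, then add the unique index) can be tried end to end with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, which is an assumption made only so the example runs anywhere; the sample rows are invented, and deleting all but the lowest id is just one possible way to "correct" the duplicates.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aufgabe (id INTEGER PRIMARY KEY, planung_id INTEGER, Nummer INTEGER)"
)
conn.executemany(
    "INSERT INTO aufgabe (id, planung_id, Nummer) VALUES (?, ?, ?)",
    [(2, 1, 1), (3, 1, 2), (5, 1, 1), (8, 1, 1), (9, 2, 1)],
)

# Step 1: list every (planung_id, Nummer) pair that occurs more than once.
duplicates = conn.execute(
    """
    SELECT planung_id, GROUP_CONCAT(id) AS aufgabe_id, Nummer, COUNT(*) AS num_duplicates
    FROM aufgabe
    GROUP BY planung_id, Nummer
    HAVING COUNT(*) > 1
    """
).fetchall()

# Step 2 (hypothetical clean-up policy): keep the lowest id of each group.
conn.execute(
    """
    DELETE FROM aufgabe
    WHERE id NOT IN (SELECT MIN(id) FROM aufgabe GROUP BY planung_id, Nummer)
    """
)

# Step 3: with the duplicates gone, the unique index can be created.
conn.execute(
    "CREATE UNIQUE INDEX uq_planung_id_and_nummer ON aufgabe (planung_id, Nummer)"
)


def insert_is_rejected(planung_id, nummer):
    """Return True if the unique index blocks the insert."""
    try:
        conn.execute(
            "INSERT INTO aufgabe (planung_id, Nummer) VALUES (?, ?)",
            (planung_id, nummer),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling insert_is_rejected(1, 1) afterwards shows the index doing its job: the pair already exists, so the insert fails, while a new Nummer for the same Planung is accepted.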

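-- Grouping by a.id can never yield count(*) > 1, because a.id is unique;
-- a self-join on (planung_id, Nummer) finds the duplicated rows instead.
select distinct p.id, a.id, a.Nummer from planung p
inner join aufgabe a on a.planung_id = p.id
inner join aufgabe b on b.planung_id = a.planung_id and b.Nummer = a.Nummer and b.id <> a.id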

This will give you every combination of planung_id and Nummer that occurs more than once in aufgabe:
SELECT planung_id,nummer,COUNT(*)
FROM aufgabe
GROUP BY 1,2 HAVING COUNT(*)>1 ;

This will give you every planung_id that has at least one duplicated Nummer value:
SELECT planung_id
FROM aufgabe
GROUP BY planung_id
HAVING COUNT(Nummer) > COUNT(DISTINCT Nummer)
If you also need the aufgabe.id, you can do this:
SELECT *
FROM aufgabe
WHERE
planung_id IN (
<query above>
)
If you need to enforce that at all times, consider adding a unique key on (planung_id, Nummer), as @dgw has already suggested.

Related

How to optimize query for Max(Date) in MySQL

I have this SQL Query:
SELECT company.*, salesorder.lastOrderDate
FROM company
INNER JOIN
(
SELECT companyId, MAX(orderDate) AS lastOrderDate
FROM salesorder
GROUP BY companyId
) salesorder ON salesorder.companyId = company.companyId;
This gives me one extra column at the end of the company master table with each company's last order date.
The problem is that, when I analyze this query, it does not seem very efficient.
Is there a way to make this more efficient?
salesorder:
orderId  companyId  orderDate
1        333        2015-01-01
2        555        2016-01-01
3        333        2017-01-01
company:
companyId  name
333        Acme
555        Microsoft
Query result:
companyId  name       lastOrderDate
333        Acme       2017-01-01
555        Microsoft  2016-01-01
EXPLAIN SELECT: (screenshot not reproduced)
Table definitions:
CREATE TABLE `salesorder` (
`orderId` int(11) NOT NULL,
`companyId` int(11) DEFAULT NULL,
`orderDate` date DEFAULT NULL,
PRIMARY KEY (`orderId`),
UNIQUE KEY `orderId_UNIQUE` (`orderId`) /*!80000 INVISIBLE */,
KEY `testComposite` (`companyId`,`orderDate`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE `company` (
`companyId` int(11) NOT NULL,
`name` varchar(45) DEFAULT NULL,
PRIMARY KEY (`companyId`),
UNIQUE KEY `companyId_UNIQUE` (`companyId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

It looks like you could simplify the query like this:
SELECT c.*, MAX(o.OrderDate) As lastOrderDate
FROM company c
INNER JOIN salesorder o on o.companyId = c.companyId
GROUP BY <list all company fields here>;
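MySQL will let you get away with just c.companyId in the GROUP BY clause: since 5.7.5 it recognizes that the other company columns are functionally dependent on the primary key, so the query is accepted even with ONLY_FULL_GROUP_BY enabled. Other database systems may reject it, so listing the columns is the more portable choice.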

Add the composite index with the columns in this order:
INDEX(companyId, orderDate)
Single column indexes are not as efficient (in this query).
Since a PRIMARY KEY is a unique key, do not redundantly declare a UNIQUE key.
With only a few rows in the table, you cannot trust EXPLAIN (and Explain-like output) to say how bad the query will be. Try it with at least a few dozen rows. And provide EXPLAIN FORMAT=JSON SELECT ...
Note that it says "Using index". That says that the subquery in question can be performed entirely inside the index's BTree. This is 'good'. (I presume you did the EXPLAIN after adding my suggested index?)
Your previous image showed a lot of rows; what gives?
I'm still puzzled as to why there are 3 rows in the EXPLAIN and two table scans. Anyway, here is another formulation to try:
SELECT c.*,
( SELECT MAX(orderDate)
FROM salesorder
WHERE companyId = c.companyId
) AS lastOrderDate
FROM company AS c;
(and my INDEX is still important)
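To check that the derived-table join and the correlated-subquery formulation return the same rows, here is a minimal sketch using the sample data from the question. It runs on Python's sqlite3 module rather than MySQL (an assumption made only so the example is self-contained), so it says nothing about how MySQL's optimizer will execute either form.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); data comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (companyId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salesorder (orderId INTEGER PRIMARY KEY, companyId INTEGER, orderDate TEXT);
    CREATE INDEX testComposite ON salesorder (companyId, orderDate);
    INSERT INTO company VALUES (333, 'Acme'), (555, 'Microsoft');
    INSERT INTO salesorder VALUES
        (1, 333, '2015-01-01'), (2, 555, '2016-01-01'), (3, 333, '2017-01-01');
    """
)

# Formulation 1: join against a grouped derived table.
derived_table = conn.execute(
    """
    SELECT c.companyId, c.name, s.lastOrderDate
    FROM company c
    INNER JOIN (
        SELECT companyId, MAX(orderDate) AS lastOrderDate
        FROM salesorder
        GROUP BY companyId
    ) s ON s.companyId = c.companyId
    ORDER BY c.companyId
    """
).fetchall()

# Formulation 2: correlated scalar subquery in the select list.
correlated = conn.execute(
    """
    SELECT c.companyId, c.name,
           (SELECT MAX(orderDate) FROM salesorder WHERE companyId = c.companyId) AS lastOrderDate
    FROM company c
    ORDER BY c.companyId
    """
).fetchall()
```

One difference to keep in mind: the INNER JOIN form drops companies that have no orders, whereas the correlated form keeps them with a NULL lastOrderDate.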

How do I SELECT from a table with a JOIN with multiple matching values?

I have the following simple query that works just fine when there is one keyword to match:
SELECT gc.id, gc.name
FROM gift_card AS gc
JOIN keyword ON gc.id = keyword.gc_id
WHERE keyword = 'mini'
GROUP BY gc.id
ORDER BY id DESC
What I want to do is find the ids that match at least two of the keywords I provide. I thought just adding a simple AND would work, but I get blank results.
SELECT gc.id, gc.name
FROM gift_card AS gc
JOIN keyword ON gc.id = keyword.gc_id
WHERE keyword = 'mini'
AND keyword = '2012'
GROUP BY gc.id
ORDER BY id DESC
Obviously SQL is not my strong suit, so I am looking for some help on what I am doing wrong here.
Here are my table structures:
CREATE TABLE `gift_card` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=52 DEFAULT CHARSET=utf8;
CREATE TABLE `keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`gc_id` int(11) NOT NULL,
`keyword` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
UNIQUE KEY `dupes_UNIQUE` (`gc_id`,`keyword`)
) ENGINE=InnoDB AUTO_INCREMENT=477 DEFAULT CHARSET=utf8;

No, AND does not work. A column cannot have two different values in one row.
Instead, use OR (or IN) plus a bit more logic:
SELECT gc.id, gc.name
FROM gift_card gc JOIN
keyword k
ON gc.id = k.gc_id
WHERE k.keyword IN ('mini', '2012')
GROUP BY gc.id
HAVING COUNT(*) = 2 -- both match
ORDER BY gc.id DESC;
It is a good idea to qualify all column names in a query that has more than one table reference.
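The IN plus HAVING COUNT(*) pattern generalizes to any number of keywords. The sketch below wraps it in a hypothetical helper, cards_matching_all (not part of the original answer), and runs it on Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented. It relies on the UNIQUE (gc_id, keyword) key from the question, since otherwise a repeated keyword would inflate the count.

```python
import sqlite3


def cards_matching_all(conn, keywords):
    """Return (id, name) of gift cards tagged with every keyword in `keywords`."""
    wanted = sorted(set(keywords))  # de-duplicate so the count comparison is exact
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT gc.id, gc.name
        FROM gift_card gc
        JOIN keyword k ON gc.id = k.gc_id
        WHERE k.keyword IN ({placeholders})
        GROUP BY gc.id, gc.name
        HAVING COUNT(*) = ?
        ORDER BY gc.id DESC
    """
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()


# SQLite stands in for MySQL here (an assumption); the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gift_card (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE keyword (
        id INTEGER PRIMARY KEY,
        gc_id INTEGER NOT NULL,
        keyword TEXT NOT NULL,
        UNIQUE (gc_id, keyword)
    );
    INSERT INTO gift_card VALUES (1, 'Mini 2012'), (2, 'Mini only'), (3, 'Other 2012 Mini');
    INSERT INTO keyword (gc_id, keyword) VALUES
        (1, 'mini'), (1, '2012'), (2, 'mini'), (3, '2012'), (3, 'mini'), (3, 'car');
    """
)

both = cards_matching_all(conn, ["mini", "2012"])
mini_only = cards_matching_all(conn, ["mini"])
```

Passing the keywords as bound parameters rather than concatenating them into the SQL string also avoids quoting problems.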

Mysql. Select where field=1 or field=2 with IF

I need help with a query.
CREATE TABLE `location_areas_localized` (
`id` int(11) DEFAULT NULL,
`lang_index` varchar(5) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
KEY `id` (`id`),
KEY `lang_index` (`lang_index`),
KEY `name` (`name`),
FULLTEXT KEY `name_2` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO `location_areas_localized` (`id`, `lang_index`,`name`)
VALUES
(1,'ru','Нью Йорк'),
(1,'en','New York'),
(2,'en','Boston'),
(2,'ch','波士顿')
;
Search logic:
If a row with lang_index='ru' exists for an id in (1,2), return that row.
If no row with lang_index='ru' exists for an id, but a row with lang_index='en' does, return the 'en' row for that id instead.
So the result must contain every row found with lang_index='ru' AND id IN (1,2), plus the lang_index='en' row for each id that has no 'ru' row (a row with lang_index='en' always exists for every id).
See on sqlfiddle
I need only one result per id. I tried GROUP BY id, but it does not work correctly.
The output must be:
1,'ru','Нью Йорк'
2,'en','Boston' (because no row with lang_index='ru' exists for id 2)

SELECT
id,
coalesce(max(CASE WHEN lang_index='ru' THEN name END),
max(CASE WHEN lang_index='en' THEN name END)) as name
FROM
location_areas_localized
WHERE
id IN (1,2)
AND (lang_index='en' OR lang_index='ru')
group by
id
ORDER BY
id;
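The conditional-aggregation idea in the query above can be verified against the sample rows from the question. This is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL (an assumption); it aggregates both the 'ru' and the 'en' name explicitly so that no nonaggregated column is selected, and it adds a computed lang_index column that is not part of the original answer.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); data comes from the question.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE location_areas_localized (id INTEGER, lang_index TEXT, name TEXT);
    INSERT INTO location_areas_localized (id, lang_index, name) VALUES
        (1, 'ru', 'Нью Йорк'), (1, 'en', 'New York'),
        (2, 'en', 'Boston'),   (2, 'ch', '波士顿');
    """
)

rows = conn.execute(
    """
    SELECT id,
           -- report which language was actually used for this id
           CASE WHEN MAX(CASE WHEN lang_index = 'ru' THEN 1 ELSE 0 END) = 1
                THEN 'ru' ELSE 'en' END AS lang_index,
           -- prefer the 'ru' name, fall back to the 'en' name
           COALESCE(MAX(CASE WHEN lang_index = 'ru' THEN name END),
                    MAX(CASE WHEN lang_index = 'en' THEN name END)) AS name
    FROM location_areas_localized
    WHERE id IN (1, 2) AND lang_index IN ('ru', 'en')
    GROUP BY id
    ORDER BY id
    """
).fetchall()
```

Because every selected column is either the grouping key or an aggregate, the same statement should also be acceptable to a MySQL server running with ONLY_FULL_GROUP_BY.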

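Without aggregate functions, MySQL takes only one row per group, in practice the first matching one. The subquery's ORDER BY is meant to ensure that, for each id, the "ru" row (or the "en" row, if "ru" is not present) comes first. Note that this relies on nonstandard behavior: MySQL documents the value chosen for a nonaggregated column as nondeterministic, newer versions may ignore an ORDER BY inside a derived table, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.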
SELECT *
FROM(
SELECT *
FROM location_areas_localized
ORDER BY FIELD(lang_index,'ru','en','ch')
) as inv
WHERE id IN (1,2)
GROUP BY id
See SQLFiddle example

MySQL JOIN time reduction

This query is taking over a minute to complete:
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY count(*) DESC
LIMIT 5
Every keyword has an ID associated with it (keyword_id column). And that ID is used to look up the actual keyword from the keyword table.
movie_keyword has 2.8 million rows
keyword has 127,000
However, returning just the most used keyword_ids takes only 1 second:
SELECT keyword_id, count(*)
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
Is there a more efficient way of doing this?
Output with EXPLAIN:
1 SIMPLE keyword ALL PRIMARY NULL NULL NULL 125405 Using temporary; Using filesort
1 SIMPLE movie_keyword ref idx_keywordid idx_keywordid 4 imdb.keyword.id 28 Using index
Structure:
CREATE TABLE `movie_keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`movie_id` int(11) NOT NULL,
`keyword_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `idx_mid` (`movie_id`),
KEY `idx_keywordid` (`keyword_id`),
KEY `keyword_ix` (`keyword_id`),
CONSTRAINT `movie_keyword_keyword_id_exists` FOREIGN KEY (`keyword_id`) REFERENCES `keyword` (`id`),
CONSTRAINT `movie_keyword_movie_id_exists` FOREIGN KEY (`movie_id`) REFERENCES `title` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=4256379 DEFAULT CHARSET=latin1;
CREATE TABLE `keyword` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword` text NOT NULL,
`phonetic_code` varchar(5) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_keyword` (`keyword`(5)),
KEY `idx_pcode` (`phonetic_code`),
KEY `keyword_ix` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=127044 DEFAULT CHARSET=latin1;

Untested, but this should work and, in my opinion, be significantly faster. I'm not sure whether MySQL allows LIMIT in a subquery, but there are other ways around that.
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5
)
GROUP BY keyword
ORDER BY count(*) DESC;
This should be faster because you don't join all 2.8 million rows in movie_keyword with keyword, just the ones that actually match, which I'm guessing are significantly fewer.
EDIT: since MySQL doesn't support LIMIT inside an IN subquery, you have to run
SELECT keyword_id
FROM movie_keyword
GROUP BY keyword_id
ORDER BY count(*) DESC
LIMIT 5;
first and, after fetching the results, run the second query
SELECT keyword, count(*) as 'Number of Occurences'
FROM movie_keyword
JOIN
keyword
ON keyword.`id` = movie_keyword.`keyword_id`
WHERE movie_keyword.keyword_id IN (RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS)
GROUP BY keyword
ORDER BY count(*) DESC;
Replace RESULTS_FROM_FIRST_QUERY_SEPARATED_BY_COMMAS with the proper values programmatically, from whatever language you're using.

The query seems fine, but I think the structure is not. Try adding an index on the column
keyword.id
Try
CREATE INDEX keyword_ix ON keyword (id);
or
ALTER TABLE keyword ADD INDEX keyword_ix (id);
It would be much better if you could post the structures of your tables keyword and movie_keyword. Which of the two is the main table and which is the referencing table?
SELECT keyword, count(movie_keyword.id) as 'Number of Occurences'
FROM movie_keyword
INNER JOIN keyword
ON keyword.`id` = movie_keyword.`keyword_id`
GROUP BY keyword
ORDER BY `Number of Occurences` DESC
LIMIT 5

I know this is a pretty old question, but I think xception forgot about derived tables in MySQL, so I want to suggest another solution. It requires only one query and avoids joining the large tables in full. If someone has data this large and can test it (maybe the question's author), please share the results.
SELECT keyword.keyword, _temp.occurences
FROM (
SELECT keyword_id, COUNT( keyword_id ) AS occurences
FROM movie_keyword
GROUP BY keyword_id
ORDER BY occurences DESC
LIMIT 5
) AS _temp
JOIN keyword ON _temp.keyword_id = keyword.id
ORDER BY _temp.occurences DESC
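The derived-table approach (aggregate and LIMIT first, join to keyword afterwards) can be checked on a toy data set. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption), with invented rows and LIMIT 2 instead of 5; it demonstrates the shape of the result, not the performance gain on 2.8 million rows.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL);
    CREATE TABLE movie_keyword (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER NOT NULL,
        keyword_id INTEGER NOT NULL REFERENCES keyword (id)
    );
    CREATE INDEX idx_keywordid ON movie_keyword (keyword_id);
    INSERT INTO keyword VALUES (1, 'murder'), (2, 'love'), (3, 'robot');
    """
)
# Three uses of 'murder', one of 'love', two of 'robot'.
conn.executemany(
    "INSERT INTO movie_keyword (movie_id, keyword_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1), (10, 2), (11, 3), (12, 3)],
)

# Count and limit on the narrow table first, then look up the keyword text.
top_keywords = conn.execute(
    """
    SELECT keyword.keyword, _temp.occurences
    FROM (
        SELECT keyword_id, COUNT(keyword_id) AS occurences
        FROM movie_keyword
        GROUP BY keyword_id
        ORDER BY occurences DESC
        LIMIT 2
    ) AS _temp
    JOIN keyword ON _temp.keyword_id = keyword.id
    ORDER BY _temp.occurences DESC
    """
).fetchall()
```

Only the rows that survive the LIMIT are joined to keyword, which is why the join touches just a handful of rows regardless of the size of movie_keyword.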

Deleting Duplicates in MySQL

The table was created like this:
CREATE TABLE `query` (
`id` int(11) NOT NULL auto_increment,
`searchquery` varchar(255) NOT NULL default '',
`datetime` int(11) NOT NULL default '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM
First I want to drop the id column with:
ALTER TABLE `querynew` DROP `id`
and then delete the duplicate entries.
I tried it with:
INSERT INTO `querynew` SELECT DISTINCT * FROM `query`
but with no success. :(
And with ALTER TABLE query ADD UNIQUE ( searchquery ), is it possible to store each query only once?

I would use MySQL's multi-table delete syntax:
DELETE q2 FROM query q1 JOIN query q2 USING (searchquery, datetime)
WHERE q1.id < q2.id;

I would do this using an index with the MySQL-specific IGNORE keyword. This kills two birds with one stone: it deletes duplicate rows and adds a unique index so that you will not get any more of them. It is usually faster than the other methods as well. Note that ALTER IGNORE TABLE was deprecated in MySQL 5.6.17 and removed in MySQL 5.7.4, so this only works on older servers:
alter ignore table query add unique index(searchquery, datetime);

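You should be able to do it without first removing the column. MySQL rejects a DELETE whose subquery reads directly from the table being deleted from (error 1093), so the subquery is wrapped in a derived table; DISTINCT keeps the optimizer from merging it back:
DELETE FROM `query`
WHERE `id` IN (
SELECT `id` FROM (
SELECT DISTINCT q.`id`
FROM `query` q
WHERE EXISTS ( -- Any matching rows with a lower id?
SELECT *
FROM `query` q0
WHERE q0.`searchquery` = q.`searchquery`
AND q0.`datetime` = q.`datetime`
AND q0.`id` < q.`id`
)
) AS dup
);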
You could also go via a temporary table:
CREATE TEMPORARY TABLE `temp_query` AS
SELECT MIN(`id`) AS `id`, `searchquery`, `datetime`
FROM `query`
GROUP BY `searchquery`, `datetime`;
DELETE FROM `query`;
INSERT INTO `query` SELECT * FROM `temp_query`;
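The temporary-table route can be sketched end to end as follows. The example uses Python's sqlite3 module as a stand-in for MySQL (an assumption), so the DDL is SQLite's CREATE TEMP TABLE ... AS SELECT rather than MySQL syntax, and the sample rows are invented.

```python
import sqlite3

# SQLite stands in for MySQL here (an assumption); the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE query (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        searchquery TEXT NOT NULL DEFAULT '',
        datetime INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO query (searchquery, datetime) VALUES
        ('mysql', 100), ('mysql', 100), ('php', 200), ('mysql', 300), ('php', 200);

    -- Keep one row (the lowest id) per (searchquery, datetime) pair.
    CREATE TEMP TABLE temp_query AS
        SELECT MIN(id) AS id, searchquery, datetime
        FROM query
        GROUP BY searchquery, datetime;

    -- Empty the original table and refill it from the de-duplicated copy.
    DELETE FROM query;
    INSERT INTO query (id, searchquery, datetime)
        SELECT id, searchquery, datetime FROM temp_query;
    DROP TABLE temp_query;
    """
)

remaining = conn.execute(
    "SELECT id, searchquery, datetime FROM query ORDER BY id"
).fetchall()
```

On a live table the delete and re-insert would normally be wrapped in a transaction, or done while writes are blocked, so that rows inserted in between are not lost.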
