Replace category names with IDs while preserving the hierarchy - MySQL

I have a data table like this:
category_name | subcategory_name | other data
---------------------------------------------
fruits | apples | ...
fruits | oranges | ...
What is the best way to replace the category and subcategory names with their IDs, moving the names into another table? I want to get this result:
category table:
id | name | parent_id
------------------------
1 | fruits | 0
2 | apples | 1
3 | oranges | 1
data table:
category_id | subcategory_id | other data
---------------------------------------------
1 | 2 | ...
1 | 3 | ...
I can do it all manually using some SELECT DISTINCT and JOIN queries, but is there a better way?

There is likely an easier way to do this, but MySQL (before 8.0) lacks many of the window functions I'd like to use here, as well as common table expressions, and views can't contain user variables... So I'm left with a bit of a mess, but it seems to work.
I'm assuming your current model could have N levels.
I'm also assuming a subcategory doesn't exist under multiple categories.
What this does:
1. Generates a set containing every name: it finds the names of categories without parents and UNIONs those with the names that have parents.
2. Uses a user variable (@) to generate a new ID for each item. Two different variables are used so the numbering of one copy doesn't carry over into the other.
3. Copies that query into two separate subqueries (A and B in my example) and joins them to get the parent IDs.
-- DISTINCT returns one record per name; the outer wrapper returns just the desired columns
Select Distinct A.ID as ID, A.SubCategory_name name, B.ID as Parent_ID from (
-- this select assigns a row number to each named value
Select @rn:=@rn+1 ID, t1.*
from (
-- get just the parents that have no parent themselves
Select NULL as Category_Name, F1.category_name as SubCategory_name
FROM (Select distinct category_Name from foo) F1
LEFT JOIN foo F2
on F1.Category_name = F2.SubCategory_name
where F2.SubCategory_name is null
UNION ALL
-- get just the children of parents (DISTINCT so each pair is numbered only once)
SELECT DISTINCT category_name, subcategory_Name from foo) T1
-- used to initialise the row number
CROSS JOIN (SELECT @rn:=0) t2
-- the same ORDER BY is used in both copies (A and B) so the numbers match;
-- note that MySQL does not guarantee the evaluation order of @-variable assignments
order by Category_name, SubCategory_Name) A
LEFT JOIN (
Select @r:=@r+1 ID, t1.*
from (
Select NULL as Category_Name, F1.category_name as SubCategory_name
FROM (Select distinct category_Name from foo) F1
LEFT JOIN foo F2
on F1.Category_name = F2.SubCategory_name
where F2.SubCategory_name is null
UNION ALL
SELECT DISTINCT category_name, subcategory_Name from foo) T1
CROSS JOIN (SELECT @r:=0) t2
order by Category_name, SubCategory_Name) B
on B.SubCategory_Name = A.Category_name
From the above SELECT you can then create a table (CREATE TABLE ... AS SELECT ...) or populate an existing one (INSERT INTO ... SELECT ...).
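The query above builds the category table only. Since the question also wants the data table rewritten with IDs, here is a minimal sketch of the plain INSERT ... SELECT DISTINCT plus JOIN route the asker mentioned, run from Python against an in-memory SQLite database as a stand-in for MySQL (in MySQL the id column would be AUTO_INCREMENT). It assumes exactly two levels, as in the example; the function name and the `other` column are illustrative.

```python
import sqlite3


def build_category_tables(rows):
    """rows: (category_name, subcategory_name, other) tuples.
    Returns (category rows, data rows). Assumes exactly two levels."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE foo (category_name TEXT, subcategory_name TEXT, other TEXT);
        CREATE TABLE category (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            parent_id INTEGER NOT NULL
        );
    """)
    con.executemany("INSERT INTO foo VALUES (?, ?, ?)", rows)
    # Top-level categories first; parent_id 0 means "no parent", as in the question.
    con.execute("""
        INSERT INTO category (name, parent_id)
        SELECT DISTINCT category_name, 0 FROM foo ORDER BY 1
    """)
    # Each distinct (category, subcategory) pair becomes a child row
    # pointing at the id its parent received above.
    con.execute("""
        INSERT INTO category (name, parent_id)
        SELECT DISTINCT f.subcategory_name, p.id
        FROM foo f
        JOIN category p ON p.name = f.category_name AND p.parent_id = 0
        ORDER BY 1
    """)
    # Rebuild the data table with ids in place of names.
    con.execute("""
        CREATE TABLE data AS
        SELECT p.id AS category_id, s.id AS subcategory_id, f.other
        FROM foo f
        JOIN category p ON p.name = f.category_name AND p.parent_id = 0
        JOIN category s ON s.name = f.subcategory_name AND s.parent_id = p.id
    """)
    categories = con.execute(
        "SELECT id, name, parent_id FROM category ORDER BY id").fetchall()
    data = con.execute(
        "SELECT category_id, subcategory_id, other FROM data ORDER BY 1, 2").fetchall()
    con.close()
    return categories, data
```

Joining on both the name and parent_id keeps a subcategory name that occurs under two different categories from being mixed up.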

Related

How to get a list of IDs not present in a (My)SQL table?

I have a MySQL table, that looks like this:
+----+
| id |
+----+
| a |
| c |
| e |
+----+
I want to check which of the following IDs (a, b, c, d, e) do not exist in the table.
Meaning, I want to get a list of the ids b and d back.
Is there a single query I can construct by hand that returns the list of IDs that do not exist in my table, without creating a second, permanent table?
This is usually done with a LEFT JOIN or a NOT IN subquery, like this:
select ids.id
from (select 'a' as id union all select 'b' union all select 'c' union all
select 'd' union all select 'e'
) ids
where ids.id not in (select id from t);
If the list is already in a table, then you can use that table (or subquery) instead of a derived table.
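The answer mentions a LEFT JOIN but shows NOT IN. A minimal sketch of the LEFT JOIN (anti-join) form follows, run from Python against SQLite as a stand-in for MySQL; the table t and the ids come from the question, while the function name is illustrative. One reason to prefer it: if t.id ever contains a NULL, NOT IN returns no rows at all, whereas the LEFT JOIN ... IS NULL form still works.

```python
import sqlite3


def missing_ids(existing, wanted):
    """Return the values in `wanted` (must be non-empty) that are not in t.id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT)")
    con.executemany("INSERT INTO t VALUES (?)", [(x,) for x in existing])
    # Derived table of candidate ids built with UNION ALL, as in the answer;
    # one "?" placeholder per value keeps the query parameterised.
    derived = " UNION ALL ".join("SELECT ? AS id" for _ in wanted)
    sql = (
        f"SELECT ids.id FROM ({derived}) ids "
        "LEFT JOIN t ON t.id = ids.id "
        "WHERE t.id IS NULL "  # no matching row in t, so the id is missing
        "ORDER BY ids.id"
    )
    result = [row[0] for row in con.execute(sql, list(wanted))]
    con.close()
    return result


print(missing_ids(["a", "c", "e"], ["a", "b", "c", "d", "e"]))  # ['b', 'd']
```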

How to join two tables, with distinct columns on either side?

I have two tables I'm trying to join to produce a unique set of data for a third table, but having trouble doing this properly.
The left table has an id field, as well as a common join field (a).
The right table has the common join field (a), and another distinct field (b)
I'm trying to extract a result set of id and b in which neither id nor b is duplicated.
I have an SQL fiddle set up: http://www.sqlfiddle.com/#!9/208de/3/0
The ideal results should be:
id | b
---+---
1 | 1
2 | 2
3 | 3
Each id and b value appears only once (it's only coincidence they match here, that can't be assumed always).
Thanks
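What about CTEs along with DISTINCT? Number the distinct id values and the distinct b values within each a, then pair them by that number so neither side can repeat. This needs MySQL 8.0+ for CTEs and ROW_NUMBER(), and assumes each id (and each b) belongs to a single a:
WITH
ids AS (
SELECT DISTINCT Table1.ID, Table1.a
FROM Table1
WHERE Table1.ID IS NOT NULL
),
bs AS (
SELECT DISTINCT Table2.a, Table2.b
FROM Table2
WHERE Table2.b IS NOT NULL
),
ranked_ids AS (
SELECT ID, a, ROW_NUMBER() OVER (PARTITION BY a ORDER BY ID) AS rn FROM ids
),
ranked_bs AS (
SELECT b, a, ROW_NUMBER() OVER (PARTITION BY a ORDER BY b) AS rn FROM bs
)
SELECT ts.ID, sp.b
FROM ranked_ids AS ts
INNER JOIN ranked_bs AS sp
ON sp.a = ts.a AND sp.rn = ts.rn
ORDER BY ts.ID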

Select Matched Pairs from Two Tables

I need to select matched pairs from two tables containing similarly structured data. "Matched Pair" here means two rows that reference each other in the 'match' column.
A single-table matched pair example:
TABLE
----
id | matchid
1 | 2
2 | 1
ID 1 and 2 are a matched pair because each has a match entry for the other.
Now the real question: what is the best (fastest) way to select the matched pairs that appear in both tables:
Table ONE (id, matchid)
Table TWO (id, matchid)
Example data:
ONE
----
id | matchid
1 | 2
2 | 3
3 | 2
4 | 5
5 | 4
TWO
----
id | matchid
2 | 3
3 | 2
The desired result is a single row with IDs 2 and 3.
RESULT
----
id | id
2 | 3
This is because 2 & 3 are a matched pair in table ONE and in table TWO. 4 & 5 are a matched pair in table ONE but not in TWO, so we don't select them. 1 and 2 are not a matched pair at all, since 2 does not have a matching entry for 1.
I can get the matched pairs from one table with this:
SELECT a.id, b.id
FROM ONE a JOIN ONE b
ON a.id = b.matchid AND a.matchid = b.id
WHERE a.id < b.id
How should I build a query that selects only the matching pairs that appear in both tables?
Should I:
Select the query above for each table and WHERE EXISTS them together?
Select the query above for each table and JOIN them together?
Select the query above then JOIN table TWO twice, once for 'id' and once for 'matchid'?
Select the query above for each table and loop through to compare them back in php?
Somehow filter table TWO down so we only have to look at the IDs in matched pairs in table ONE?
Do something totally different?
(Since this is a question of efficiency, it is worth noting that the matches will be quite sparse, maybe 1/1000 or less, and each table will have 100,000+ rows.)
I think I get your point: you want the records whose pairs exist in both tables.
SELECT LEAST(a.ID, a.MatchID) ID, GREATEST(a.ID, a.MatchID) MatchID
FROM One a
INNER JOIN Two b
ON a.ID = b.ID AND
a.matchID = b.matchID
GROUP BY LEAST(a.ID, a.MatchID), GREATEST(a.ID, a.MatchID)
HAVING COUNT(*) > 1
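A quick way to check this query against the question's sample data is to run it in SQLite from Python. SQLite has no LEAST()/GREATEST(), so this sketch swaps in its two-argument scalar MIN(x, y) and MAX(x, y), which behave the same way here; the function name is illustrative.

```python
import sqlite3


def pairs_in_both(one_rows, two_rows):
    """Matched pairs present in both tables, as (smaller id, larger id)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
    con.execute("CREATE TABLE two (id INTEGER, matchid INTEGER)")
    con.executemany("INSERT INTO one VALUES (?, ?)", one_rows)
    con.executemany("INSERT INTO two VALUES (?, ?)", two_rows)
    # Two-argument MIN()/MAX() are SQLite's scalar equivalents of LEAST()/GREATEST().
    rows = con.execute("""
        SELECT MIN(a.id, a.matchid) AS id, MAX(a.id, a.matchid) AS matchid
        FROM one a
        INNER JOIN two b ON a.id = b.id AND a.matchid = b.matchid
        GROUP BY MIN(a.id, a.matchid), MAX(a.id, a.matchid)
        HAVING COUNT(*) > 1
    """).fetchall()
    con.close()
    return rows


ONE = [(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)]
TWO = [(2, 3), (3, 2)]
print(pairs_in_both(ONE, TWO))  # [(2, 3)]
```

HAVING COUNT(*) > 1 assumes each (id, matchid) row is unique in its table: if ONE contains the same row twice, a one-sided match is counted twice and reported as a pair.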
Try this query (CONCAT builds an 'id~matchid' key; in MySQL, + is numeric addition, so it cannot join strings):
select
O.id,
O.matchid
from
ONE O
where
CONCAT(O.id, '~', O.matchid)
in (select CONCAT(T.id, '~', T.matchid) from TWO T)
Edited query (also requires the reverse row to exist in both tables):
select distinct
Least(O.id,O.matchid) ID,
Greatest(O.id,O.matchid) MatchID
from
ONE O
where
CONCAT(O.id, '~', O.matchid)
in (select CONCAT(T.id, '~', T.matchid) from TWO T)
and CONCAT(O.matchid, '~', O.id)
in (select CONCAT(T.id, '~', T.matchid) from TWO T)
and CONCAT(O.matchid, '~', O.id)
in (select CONCAT(X.id, '~', X.matchid) from ONE X)
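Instead of concatenating the columns into strings, MySQL and SQLite can both compare row constructors such as (o.id, o.matchid) directly with IN (subquery), which avoids the casts and any reliance on a separator. The sketch below, run in SQLite from Python, applies that idea with three checks: the row exists in TWO, its reverse exists in TWO, and its reverse also exists in ONE, so a pair that is matched only in TWO is not reported. MIN(x, y)/MAX(x, y) stand in for LEAST/GREATEST; the function name is illustrative.

```python
import sqlite3


def matched_pairs(one_rows, two_rows):
    """Pairs matched in both ONE and TWO, compared with row constructors."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE one (id INTEGER, matchid INTEGER)")
    con.execute("CREATE TABLE two (id INTEGER, matchid INTEGER)")
    con.executemany("INSERT INTO one VALUES (?, ?)", one_rows)
    con.executemany("INSERT INTO two VALUES (?, ?)", two_rows)
    rows = con.execute("""
        SELECT DISTINCT MIN(o.id, o.matchid), MAX(o.id, o.matchid)
        FROM one o
        WHERE (o.id, o.matchid) IN (SELECT t.id, t.matchid FROM two t)  -- row in TWO
          AND (o.matchid, o.id) IN (SELECT t.id, t.matchid FROM two t)  -- reverse in TWO
          AND (o.matchid, o.id) IN (SELECT x.id, x.matchid FROM one x)  -- reverse in ONE
        ORDER BY 1, 2
    """).fetchall()
    con.close()
    return rows


print(matched_pairs([(1, 2), (2, 3), (3, 2), (4, 5), (5, 4)], [(2, 3), (3, 2)]))  # [(2, 3)]
```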
Naive version, which checks all the four rows that need to exist:
-- EXPLAIN ANALYZE
WITH both_one AS (
SELECT o.id, o.matchid
FROM one o
WHERE o.id < o.matchid
AND EXISTS ( SELECT * FROM one x WHERE x.id = o.matchid AND x.matchid = o.id)
)
, both_two AS (
SELECT t.id, t.matchid
FROM two t
WHERE t.id < t.matchid
AND EXISTS ( SELECT * FROM two x WHERE x.id = t.matchid AND x.matchid = t.id)
)
SELECT *
FROM both_one oo
WHERE EXISTS (
SELECT *
FROM both_two tt
WHERE tt.id = oo.id AND tt.matchid = oo.matchid
);
This one is simpler:
-- EXPLAIN ANALYZE
WITH pair AS (
SELECT o.id, o.matchid
FROM one o
WHERE EXISTS ( SELECT * FROM two x WHERE x.id = o.id AND x.matchid = o.matchid)
)
SELECT *
FROM pair pp
WHERE EXISTS (
SELECT *
FROM pair xx
WHERE xx.id = pp.matchid AND xx.matchid = pp.id
)
AND pp.id < pp.matchid
;

MySQL Query same column twice

I have two tables, one with ID and NAME:
table 1
ID | NAME
1 | first
2 | second
3 | third
and an XREF table with ID and PARENT ID
table2
ID | PARENT ID
1 | 0
2 | 1
3 | 2
and I would like to retrieve the NAME twice like this: NAME | PARENT NAME
If it is possible to go three levels deep while still returning the same two-column table, like this:
result table
NAME | PARENT NAME
first | NULL or empty (or this row not shown at all)
second | first
third | second
... then I'd like to figure that out as well.
select t1.Name, t12.Name from
table1 t1
inner join table2 t2 on t1.ID = t2.ID
inner join table1 t12 on t2.ParentID = t12.ID
This would only return 2 rows. If you want to have the first row (for ID=1) you just need to outer join instead.
Consider putting the parentid in the first table as a self-referential relationship rather than having a separate table for it.
Ex.:
table1
ID | PARENTID | NAME
---------------------------
1 | NULL | first
2 | 1 | second
3 | 2 | third
That way you would only need to join the table to itself rather than going through a separate table. (This does, however, assume that each row in table1 can have only a single parent, whereas your design allows one row to have multiple parents at a time.)
But for your table structure, this will work:
SELECT
a.name,
c.name AS 'PARENT NAME'
FROM
table1 a
LEFT JOIN
table2 b ON a.id = b.id
LEFT JOIN
table1 c ON b.parentid = c.id
But if you made the parentid in the same table referencing id, the SQL would be reduced to this:
SELECT
a.name,
b.name AS 'PARENT NAME'
FROM
table1 a
LEFT JOIN
table1 b ON a.parentid = b.id
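With parentid moved into table1 as suggested, a LEFT JOIN of the table to itself also returns the top-level row, with NULL as its parent name. A minimal sketch using the question's sample rows, run in SQLite from Python as a stand-in for MySQL (the function name is illustrative):

```python
import sqlite3


def names_with_parents(rows):
    """rows: (id, parentid, name) tuples for a self-referential table1."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, parentid INTEGER, name TEXT)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # LEFT JOIN keeps the top-level row; its parent name comes back as NULL (None).
    result = con.execute("""
        SELECT a.name, b.name AS parent_name
        FROM table1 a
        LEFT JOIN table1 b ON a.parentid = b.id
        ORDER BY a.id
    """).fetchall()
    con.close()
    return result


print(names_with_parents([(1, None, "first"), (2, 1, "second"), (3, 2, "third")]))
# [('first', None), ('second', 'first'), ('third', 'second')]
```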

Find duplicate records in MySQL

I want to pull out duplicate records in a MySQL Database. This can be done with:
SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1
Which results in:
100 MAIN ST 2
I would like to pull it so that it shows each row that is a duplicate. Something like:
JIM JONES 100 MAIN ST
JOHN SMITH 100 MAIN ST
Any thoughts on how this can be done? I'm trying to avoid doing the first one then looking up the duplicates with a second query in the code.
The key is to rewrite this query so that it can be used as a subquery.
SELECT firstname,
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;
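Here is the subquery-join approach above run in SQLite from Python (a stand-in for MySQL), returning every row whose address occurs more than once rather than one row per address. The function name and the third sample row are illustrative.

```python
import sqlite3


def duplicate_address_rows(rows):
    """rows: (id, firstname, lastname, address). Returns every row whose
    address occurs more than once."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT,"
                " lastname TEXT, address TEXT)")
    con.executemany("INSERT INTO list VALUES (?, ?, ?, ?)", rows)
    result = con.execute("""
        SELECT firstname, lastname, list.address
        FROM list
        INNER JOIN (SELECT address
                    FROM list
                    GROUP BY address
                    HAVING COUNT(id) > 1) dup
                ON list.address = dup.address
        ORDER BY list.id
    """).fetchall()
    con.close()
    return result


sample = [(1, "JIM", "JONES", "100 MAIN ST"),
          (2, "JOHN", "SMITH", "100 MAIN ST"),
          (3, "ANN", "LEE", "200 ELM ST")]  # made-up rows
print(duplicate_address_rows(sample))
# [('JIM', 'JONES', '100 MAIN ST'), ('JOHN', 'SMITH', '100 MAIN ST')]
```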
SELECT date FROM logs group by date having count(*) >= 2
Why not just INNER JOIN the table with itself?
SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id
A DISTINCT is needed if the address could exist more than two times.
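To see why that DISTINCT matters, here is the self-join run in SQLite from Python (a stand-in for MySQL) on made-up data where one address occurs three times: each of those rows joins to the two others, so without DISTINCT every row comes back twice. The function name and the `distinct` flag are illustrative.

```python
import sqlite3


def self_join_duplicates(rows, distinct):
    """rows: (id, firstname, lastname, address). Runs the a/b self-join,
    optionally with SELECT DISTINCT."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, firstname TEXT,"
                " lastname TEXT, address TEXT)")
    con.executemany("INSERT INTO list VALUES (?, ?, ?, ?)", rows)
    select = "SELECT DISTINCT" if distinct else "SELECT"
    result = con.execute(f"""
        {select} a.firstname, a.lastname, a.address
        FROM list a
        INNER JOIN list b ON a.address = b.address
        WHERE a.id <> b.id
        ORDER BY 1, 2
    """).fetchall()
    con.close()
    return result


sample = [(1, "AMY", "ADAMS", "100 MAIN ST"), (2, "BOB", "BROWN", "100 MAIN ST"),
          (3, "CAL", "CLARK", "100 MAIN ST"), (4, "DAN", "DAVIS", "9 ELM ST")]
print(len(self_join_duplicates(sample, distinct=False)))  # 6
print(len(self_join_duplicates(sample, distinct=True)))   # 3
```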
I tried the best answer chosen for this question, but it confused me somewhat. I actually needed that just on a single field from my table. The following example from this link worked out very well for me:
SELECT COUNT(*) c,title FROM `data` GROUP BY title HAVING c > 1;
Isn't this easier:
SELECT group_id, COUNT(*) AS cnt
FROM tc_tariff_groups
GROUP BY group_id
HAVING COUNT(group_id) >1
?
select `cityname` from `codcities` group by `cityname` having count(*)>=2
This is similar to the query you asked for, and it works and is easy too.
Enjoy!
Find duplicate users by email address with this query...
SELECT users.name, users.uid, users.mail, from_unixtime(created)
FROM users
INNER JOIN (
SELECT mail
FROM users
GROUP BY mail
HAVING count(mail) > 1
) dupes ON users.mail = dupes.mail
ORDER BY users.mail;
We can also find duplicates based on more than one field. For those cases you can use the format below:
SELECT COUNT(*), column1, column2
FROM tablename
GROUP BY column1, column2
HAVING COUNT(*)>1;
Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...
I work at SmartyStreets, where we do address validation and de-duplication and other stuff, and I've seen a lot of diverse challenges with similar problems.
There are several third-party services which will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US addresses) has certain guidelines to make these standard, but only a handful of vendors are certified to perform such operations.
So I would recommend exporting the table to a CSV file, for instance, and submitting it to a capable list processor. One such is LiveAddress, which will have it done for you automatically in a few seconds to a few minutes. It will flag duplicate rows with a new field called "Duplicate" and a value of Y in it.
Another solution would be to use table aliases, like so:
SELECT p1.id, p2.id, p1.address
FROM list AS p1, list AS p2
WHERE p1.address = p2.address
AND p1.id != p2.id
All you're really doing in this case is taking the original list table, creating two pretend tables -- p1 and p2 -- out of it, and then performing a join on the address column (line 3). The 4th line keeps a record from being matched with itself; each duplicate pair still appears twice, once in each order (use p1.id < p2.id instead to list each pair only once).
Not going to be very efficient, but it should work:
-- OUTER and INNER are reserved words in MySQL, so other aliases are used
SELECT *
FROM list AS l_outer
WHERE (SELECT COUNT(*)
FROM list AS l_inner
WHERE l_inner.address = l_outer.address) > 1;
This will select duplicates in one table pass, without correlated subqueries.
SELECT *
FROM (
SELECT ao.*, (@r := @r + 1) AS rn
FROM (
SELECT @_address := 'N'
) vars,
(
SELECT *
FROM
list a
ORDER BY
address, id
) ao
WHERE CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
AND (@_address := address ) IS NOT NULL
) aoo
WHERE rn > 1
This query actually emulates ROW_NUMBER(), which is present in Oracle and SQL Server (and in MySQL 8.0+).
See the article in my blog for details:
Analytic functions: SUM, AVG, ROW_NUMBER - emulating in MySQL.
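In MySQL 8.0+ (and SQLite 3.25+), ROW_NUMBER() is available directly, so the user-variable emulation can be replaced by a window function. A minimal sketch in SQLite from Python that returns every row after the first for each address (the same rows as the rn > 1 filter above); the function name and the two-column list table are illustrative.

```python
import sqlite3


def duplicates_after_first(rows):
    """rows: (id, address). Returns every row after the lowest id
    for each address."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, address TEXT)")
    con.executemany("INSERT INTO list VALUES (?, ?)", rows)
    result = con.execute("""
        SELECT id, address
        FROM (
            SELECT l.id, l.address,
                   ROW_NUMBER() OVER (PARTITION BY l.address ORDER BY l.id) AS rn
            FROM list l
        ) numbered
        WHERE rn > 1
        ORDER BY id
    """).fetchall()
    con.close()
    return result


print(duplicates_after_first([(1, "100 MAIN ST"), (2, "9 ELM ST"),
                              (3, "100 MAIN ST"), (4, "100 MAIN ST")]))
# [(3, '100 MAIN ST'), (4, '100 MAIN ST')]
```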
This will also show how many duplicates each value has, ordered by that count, without joins:
SELECT `Language` , MIN(id) AS id, COUNT( id ) AS how_many
FROM `languages`
GROUP BY `Language`
HAVING how_many >=2
ORDER BY how_many DESC
SELECT firstname, lastname, address FROM list
WHERE
Address in
(SELECT address FROM list
GROUP BY address
HAVING count(*) > 1)
select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name
For your table it would be something like
select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address
This query will give you all the distinct address entries in your list table... I am not sure how this will work if you have any primary key values for name, etc.
Fastest duplicate-removal procedure (each run deletes the lowest id in every duplicate group, so repeat it until the INSERT adds no rows):
/* create temp table with one primary column id */
INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY (isbn) HAVING COUNT(*)>1;
DELETE FROM list WHERE id IN (SELECT id FROM temp);
DELETE FROM temp;
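Because each run of that procedure removes only the lowest id from every duplicate group, it has to be repeated until no duplicate groups remain, which leaves the highest id for each isbn. A minimal sketch of that loop in SQLite from Python; the helper table is called dup_ids here instead of temp (TEMP is a keyword in SQLite), and the function name is illustrative.

```python
import sqlite3


def remove_duplicate_isbns(rows):
    """rows: (id, isbn). Returns (remaining ids, number of passes needed)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, isbn TEXT)")
    con.execute("CREATE TABLE dup_ids (id INTEGER PRIMARY KEY)")
    con.executemany("INSERT INTO list VALUES (?, ?)", rows)
    passes = 0
    while True:
        cur = con.execute("""
            INSERT INTO dup_ids (id)
            SELECT MIN(id) FROM list GROUP BY isbn HAVING COUNT(*) > 1
        """)
        if cur.rowcount == 0:  # no duplicate groups left
            break
        # Delete the lowest id of each duplicate group, then clear the helper table.
        con.execute("DELETE FROM list WHERE id IN (SELECT id FROM dup_ids)")
        con.execute("DELETE FROM dup_ids")
        passes += 1
    remaining = [r[0] for r in con.execute("SELECT id FROM list ORDER BY id")]
    con.close()
    return remaining, passes


print(remove_duplicate_isbns([(1, "111"), (2, "111"), (3, "111"),
                              (4, "222"), (5, "333"), (6, "333")]))
# ([3, 4, 6], 2)
```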
Personally this query has solved my problem:
SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1;
This shows every subscriber ID that appears more than once in the table for the same SRV_KW_ID, along with the number of duplicates found.
These are the table columns:
| SUB_SUBSCR_ID | int(11) | NO | PRI | NULL | auto_increment |
| MSI_ALIAS | varchar(64) | YES | UNI | NULL | |
| SUB_ID | int(11) | NO | MUL | NULL | |
| SRV_KW_ID | int(11) | NO | MUL | NULL | |
Hope it is helpful for you too!
SELECT t.*,(select count(*) from city as tt where tt.name=t.name) as count FROM `city` as t where (select count(*) from city as tt where tt.name=t.name) > 1 order by count desc
Replace city with your Table.
Replace name with your field name
SELECT id, count(*) as c
FROM `list`
GROUP BY id HAVING c > 1
This returns each id along with the number of times it is repeated; if it returns nothing, there are no repeated ids.
Change the id in the GROUP BY (e.g. to address) and it returns the number of times each address is repeated, identified by the lowest id with that address:
SELECT MIN(id) AS id, count(*) as c
FROM `list`
GROUP BY address HAVING c > 1
I hope it helps. Enjoy ;)
SELECT *
FROM (SELECT address, COUNT(id) AS cnt
FROM list
GROUP BY address
HAVING ( COUNT(id) > 1 )) AS dup
I use the following:
SELECT * FROM mytable
WHERE (column1, column2, column3) IN (
SELECT column1, column2, column3 FROM mytable
GROUP BY column1, column2, column3
HAVING count(*) > 1
)
Most of the answers here don't cope with the case when you have MORE THAN ONE duplicate result and/or when you have MORE THAN ONE column to check for duplications. When you are in such case, you can use this query to get all duplicate ids:
SELECT address, email, COUNT(*) AS QUANTITY_DUPLICATES, GROUP_CONCAT(id) AS ID_DUPLICATES
FROM list
GROUP BY address, email
HAVING COUNT(*)>1;
If you want each duplicate row listed on its own line, you need a more complex query. This is the one I found that works:
CREATE TEMPORARY TABLE IF NOT EXISTS temptable AS (
SELECT GROUP_CONCAT(id) AS ID_DUPLICATES
FROM list
GROUP BY address, email
HAVING COUNT(*)>1
);
SELECT d.*
FROM list AS d, temptable AS t
WHERE FIND_IN_SET(d.id, t.ID_DUPLICATES)
ORDER BY d.id;
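On MySQL 8.0+ (and SQLite 3.25+) a windowed COUNT(*) can list each duplicate row on its own line without a temporary table or FIND_IN_SET, and without the group_concat_max_len limit on GROUP_CONCAT output. A minimal sketch in SQLite from Python; the function name and the three-column list table are illustrative.

```python
import sqlite3


def duplicate_rows(rows):
    """rows: (id, address, email). Returns each row whose (address, email)
    combination occurs more than once, one row per line."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE list (id INTEGER PRIMARY KEY, address TEXT, email TEXT)")
    con.executemany("INSERT INTO list VALUES (?, ?, ?)", rows)
    result = con.execute("""
        SELECT id, address, email
        FROM (
            SELECT l.id, l.address, l.email,
                   COUNT(*) OVER (PARTITION BY l.address, l.email) AS n
            FROM list l
        ) counted
        WHERE n > 1
        ORDER BY id
    """).fetchall()
    con.close()
    return result
```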
Find duplicate Records:
Suppose we have a table Student:
student_id int
student_name varchar
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see duplicate records
Use this query:
select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;
+---------------------+------------+---+
| student_name | student_id | c |
+---------------------+------------+---+
| usman | 101 | 3 |
| muhammadusmanyaqoob | 103 | 2 |
+---------------------+------------+---+
To quickly see the duplicate rows, you can run a single simple query.
Here I am querying the table and listing all duplicate rows with the same user_id, market_place and sku:
select user_id, market_place, sku, count(id) as totals from sku_analytics group by user_id, market_place, sku having count(id) > 1;
To delete the duplicate rows you have to decide which row you want to delete, e.g. the one with the lower id (usually older) or one chosen by some other date information. In my case I just want to delete the lower id, since the newer id has the latest information.
First double check if the right records will be deleted. Here I am selecting the record among duplicates which will be deleted (by unique id).
select a.user_id, a.market_place,a.sku from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;
Then I run the delete query to delete the dupes:
delete a from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;
Back up, double-check, verify, verify the backup, then execute.
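MySQL rejects a DELETE whose subquery selects from the same table it is deleting from (error 1093), which is why the multi-table DELETE ... JOIN form above is used. SQLite has no multi-table DELETE but does allow a correlated EXISTS on the same table, so this sketch, run from Python, expresses the same rule: delete a row when a row with a higher id has the same user_id, market_place and sku. The function name and sample rows are illustrative.

```python
import sqlite3


def delete_older_duplicates(rows):
    """rows: (id, user_id, market_place, sku). Keeps only the highest id
    per (user_id, market_place, sku) and returns the remaining ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sku_analytics (id INTEGER PRIMARY KEY,"
                " user_id INTEGER, market_place TEXT, sku TEXT)")
    con.executemany("INSERT INTO sku_analytics VALUES (?, ?, ?, ?)", rows)
    con.execute("""
        DELETE FROM sku_analytics
        WHERE EXISTS (
            SELECT 1 FROM sku_analytics b
            WHERE b.id > sku_analytics.id          -- a newer row exists...
              AND b.user_id = sku_analytics.user_id  -- ...with the same keys
              AND b.market_place = sku_analytics.market_place
              AND b.sku = sku_analytics.sku
        )
    """)
    remaining = [r[0] for r in con.execute("SELECT id FROM sku_analytics ORDER BY id")]
    con.close()
    return remaining


print(delete_older_duplicates([(1, 10, "US", "A"), (2, 10, "US", "B"), (3, 10, "US", "A"),
                               (4, 11, "US", "A"), (5, 10, "US", "A")]))
# [2, 4, 5]
```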
SELECT * FROM bookings
WHERE DATE(created_at) = '2022-01-11'
AND code IN (
SELECT code FROM bookings
GROUP BY code
HAVING COUNT(code) > 1
) ORDER BY id DESC
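I would go with something like this:
-- lists each duplicate except the one with the highest id for its address
-- (a row can appear more than once if its address occurs three or more times)
SELECT t1.firstname, t1.lastname, t1.address FROM list t1
INNER JOIN list t2
WHERE
t1.id < t2.id AND
t1.address = t2.address;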
select address from list where address = any (select address from (select address, count(id) cnt from list group by address having cnt > 1 ) as t1) order by address
The inner sub-query returns the rows with duplicate addresses, then
the outer sub-query returns the address column for addresses with duplicates.
The outer sub-query must return only one column because it is used as the operand of the '= any' operator.
Powerlord's answer is indeed the best, and I would recommend one more change: use LIMIT to make sure the database doesn't get overloaded:
SELECT firstname, lastname, list.address FROM list
INNER JOIN (SELECT address FROM list
GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
LIMIT 10
It is a good habit to use LIMIT if there is no WHERE and when making joins. Start with small value, check how heavy the query is and then increase the limit.