Merge references to duplicate rows in MySQL

This feels simple and complex at the same time, and I can't quite work out an appropriate way to do it as a MySQL query.
I have a table of tags called categories whose cat_title field should hold only unique titles. However, I've noticed that multiple rows share the same cat_title value.
I want to delete all but the first instance of each duplicate. Simple enough. But another table, tagging, has a field called tagging_cat_id that references the identifier field cat_id in the categories table. Deleting the duplicates would break these references, leaving them pointing at nothing.
So the more complex part is finding every tagging_cat_id that references a duplicate row about to be deleted and changing it to reference the first row with that cat_title (which will soon be the only one).
I am a novice at MySQL and this is a bit out of my depth. I was almost tempted to do this by hand in a GUI. Is there a simple enough way to do this as a query that I could run occasionally (until whatever is creating the duplicates is resolved)? The MySQL distribution version is 5.7.21.
Sample Data
Categories
+--------+-----------+
| cat_id | cat_title |
+--------+-----------+
| 1 | green |
| 2 | red |
| 3 | blue |
| 4 | green |
| 5 | green |
| 6 | red |
| 7 | white |
+--------+-----------+
Tagging
+------------+-------------------+----------------+
| tagging_id | tagging_record_id | tagging_cat_id |
+------------+-------------------+----------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 7 |
| 4 | 3 | 5 |
| 5 | 4 | 6 |
| 6 | 5 | 4 |
| 7 | 5 | 3 |
| 8 | 6 | 5 |
+------------+-------------------+----------------+
I want to convert the above to the following:
Categories
+--------+-----------+
| cat_id | cat_title |
+--------+-----------+
| 1 | green |
| 2 | red |
| 3 | blue |
| 7 | white |
+--------+-----------+
Tagging
+------------+-------------------+----------------+
| tagging_id | tagging_record_id | tagging_cat_id |
+------------+-------------------+----------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 7 |
| 4 | 3 | 1 |
| 5 | 4 | 2 |
| 6 | 5 | 1 |
| 7 | 5 | 3 |
| 8 | 6 | 1 |
+------------+-------------------+----------------+

If your version of MySQL is 8.0+ you can use this query:
SELECT cat_id, MIN(cat_id) OVER (PARTITION BY cat_title) min_id
FROM categories
to identify for each cat_id the minimum cat_id with the same cat_title so you can update the table:
WITH ids AS (
SELECT cat_id, MIN(cat_id) OVER (PARTITION BY cat_title) min_id
FROM categories
)
UPDATE tagging t
INNER JOIN ids i ON i.cat_id = t.tagging_cat_id
SET t.tagging_cat_id = i.min_id
Then you can delete the duplicates:
WITH ids AS (
SELECT cat_id, MIN(cat_id) OVER (PARTITION BY cat_title) min_id
FROM categories
)
DELETE c
FROM categories c INNER JOIN ids i
ON i.cat_id = c.cat_id AND i.min_id < c.cat_id
For earlier versions of MySQL, which do not support window functions and CTEs:
UPDATE tagging t
INNER JOIN categories c ON c.cat_id = t.tagging_cat_id
INNER JOIN (
SELECT cat_title, MIN(cat_id) min_id
FROM categories
GROUP BY cat_title
) m ON m.cat_title = c.cat_title
SET t.tagging_cat_id = m.min_id
and:
DELETE c1
FROM categories c1 INNER JOIN categories c2
ON c2.cat_title = c1.cat_title
WHERE c1.cat_id > c2.cat_id
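Results:
+--------+-----------+
| cat_id | cat_title |
+--------+-----------+
| 1 | green |
| 2 | red |
| 3 | blue |
| 7 | white |
+--------+-----------+
and:
+------------+-------------------+----------------+
| tagging_id | tagging_record_id | tagging_cat_id |
+------------+-------------------+----------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 7 |
| 4 | 3 | 1 |
| 5 | 4 | 2 |
| 6 | 5 | 1 |
| 7 | 5 | 3 |
| 8 | 6 | 1 |
+------------+-------------------+----------------+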
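The two-step idea above (first repoint the references, then delete the duplicates) can be checked against the question's sample data. The following is only a sketch using Python's built-in sqlite3 module, not MySQL itself. SQLite has no UPDATE ... JOIN or multi-table DELETE, so correlated subqueries are swapped in for the joins; the MySQL statements shown earlier remain the ones to run on a real server.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_title TEXT);
CREATE TABLE tagging (
    tagging_id INTEGER PRIMARY KEY,
    tagging_record_id INTEGER,
    tagging_cat_id INTEGER
);
INSERT INTO categories VALUES
    (1,'green'),(2,'red'),(3,'blue'),(4,'green'),(5,'green'),(6,'red'),(7,'white');
INSERT INTO tagging VALUES
    (1,1,1),(2,1,2),(3,2,7),(4,3,5),(5,4,6),(6,5,4),(7,5,3),(8,6,5);
""")

with conn:  # both steps run in one transaction
    # Step 1: repoint each reference to the lowest cat_id sharing its title.
    # (SQLite dialect: a correlated subquery stands in for UPDATE ... JOIN.)
    conn.execute("""
        UPDATE tagging
        SET tagging_cat_id = (
            SELECT MIN(c2.cat_id)
            FROM categories c1
            JOIN categories c2 ON c2.cat_title = c1.cat_title
            WHERE c1.cat_id = tagging.tagging_cat_id
        )
        WHERE tagging_cat_id IN (SELECT cat_id FROM categories)
    """)
    # Step 2: delete every category that is not the lowest id for its title.
    conn.execute("""
        DELETE FROM categories
        WHERE cat_id NOT IN (
            SELECT MIN(cat_id) FROM categories GROUP BY cat_title
        )
    """)

categories = conn.execute(
    "SELECT cat_id, cat_title FROM categories ORDER BY cat_id"
).fetchall()
tagging_cat_ids = [
    row[0]
    for row in conn.execute("SELECT tagging_cat_id FROM tagging ORDER BY tagging_id")
]
print(categories)
print(tagging_cat_ids)
```

The order matters: updating tagging before deleting from categories means no reference ever points at a missing row. Running both steps inside one transaction keeps the tables consistent if the second step fails.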

Related

MySQL How to Select smth by MAX(id)....WHERE userID = some number GROUP BY smth

I have the following table in my DB:
personal_prizes
___________ ___________ _________ __________
| id | userId | specId| grp |
|___________|___________|_________|__________|
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 1 |
| 3 | 2 | 3 | 1 |
| 4 | 2 | 4 | 2 |
| 5 | 1 | 5 | 2 |
| 6 | 1 | 6 | 2 |
| 7 | 2 | 7 | 3 |
| 8 | 1 | 13 | 4 |
|___________|___________|_________|__________|
I need to select specId by max id, grouped by grp.
So I have composed the following query:
SELECT pp.specId
FROM personal_prizes pp
WHERE pp.specId IN (SELECT MAX(pp1.id)
FROM personal_prizes pp1
WHERE pp1.userId = 1
GROUP BY pp1.grp)
It works on my small table, but it performs poorly on my production database, where personal_prizes has more than 100,000 rows.
Please help me optimize it.
The query you have should work fine. Make sure though that you not only have an index on id (which I suppose is the primary key), but also one on specId.
Just as an alternative, you might try this one:
select group_concat(pp.specId order by pp1.id desc)+0 as result_specId
from personal_prizes pp1
left join personal_prizes pp on pp.specId = pp1.id
where pp1.userId = 1
group by pp1.grp
having result_specId is not null;
The idea here is that the subquery is promoted to the main query, and the specId is retrieved by an outer join. Because group_concat orders by pp1.id descending, the value of interest is listed first, and adding 0 converts the concatenated string to a number, which keeps only that leading value. The having clause excludes the cases where no matching specId was found.
Note that this will only give the same results if the specId field is guaranteed to be non-null.
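As a rough check of the advice above, here is a sketch that loads the question's sample rows into an in-memory SQLite database (via Python's sqlite3, not MySQL) and runs the original IN query next to a join against a derived table of per-group maximum ids. The index on specId follows the answer's suggestion; the composite index on (userId, grp, id) and both index names are assumptions added for illustration, and whether either index helps on a real MySQL server would need to be confirmed with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE personal_prizes (
    id INTEGER PRIMARY KEY,
    userId INTEGER,
    specId INTEGER,
    grp INTEGER
);
INSERT INTO personal_prizes VALUES
    (1,1,1,1),(2,1,2,1),(3,2,3,1),(4,2,4,2),
    (5,1,5,2),(6,1,6,2),(7,2,7,3),(8,1,13,4);
-- Index on specId, as the answer suggests.
CREATE INDEX idx_spec ON personal_prizes (specId);
-- Assumed composite index (not from the original) to support the subquery.
CREATE INDEX idx_user_grp_id ON personal_prizes (userId, grp, id);
""")

# The query from the question, unchanged apart from layout.
in_query = """
    SELECT pp.specId
    FROM personal_prizes pp
    WHERE pp.specId IN (SELECT MAX(pp1.id)
                        FROM personal_prizes pp1
                        WHERE pp1.userId = 1
                        GROUP BY pp1.grp)
"""

# Same filter expressed as a join against a derived table of max ids.
join_query = """
    SELECT pp.specId
    FROM personal_prizes pp
    JOIN (SELECT MAX(id) AS max_id
          FROM personal_prizes
          WHERE userId = 1
          GROUP BY grp) m ON pp.specId = m.max_id
"""

in_result = sorted(row[0] for row in conn.execute(in_query))
join_result = sorted(row[0] for row in conn.execute(join_query))
print(in_result, join_result)
```

On the sample data both forms return the same specId values, so the choice between them comes down to which one the optimizer handles better on the production table.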

Group by two values

I have the following query:
SELECT
items.*
FROM
`items`
INNER JOIN
`users` ON `items`.`owner` = `users`.`id`
GROUP BY
`items`.`owner`
LIMIT
10
It ensures the results are grouped by user (only one item fetched per user), but I also want to ensure that items in a given category, say "1", appear only once.
But that does not work. Well, the query succeeds, but it does not group by category: multiple items of the same category are still shown. Any ideas?
I have created a SQLFiddle here: http://sqlfiddle.com/#!2/0a4bad/1
Instead of outputting:
+----+----------+-------+
| ID | CATEGORY | OWNER |
+----+----------+-------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 4 |
| 5 | 2 | 5 |
+----+----------+-------+
It should be outputting:
+----+----------+-------+
| ID | CATEGORY | OWNER |
+----+----------+-------+
| 1 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 4 |
| 5 | 2 | 5 |
| 8 | 3 | 3 |
+----+----------+-------+
(notice category 1 is only shown ONCE).
I want to ensure that only one item per owner is shown, and then additionally ensure that specific categories (say 1 and 5) are shown only once. Categories 1 and 5 are overpopulated, and if they are not limited, they will make up 90% of the output.
You can use DISTINCT to retrieve unique data:
SELECT DISTINCT items.category FROM items
select * from items t1
where category not in (1,2)
or not exists (
select 1 from items t2
where t2.id < t1.id
and t2.category = t1.category
)
group by owner
http://sqlfiddle.com/#!2/0a4bad/27
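The filter in the query above (keep a row unless it belongs to a limited category and an earlier row of that category exists) can be tried on a small data set. This is a sketch in Python's sqlite3 with hypothetical rows, since the fiddle's data is not reproduced here. It treats only category 1 as limited, and instead of relying on a bare GROUP BY owner it picks the lowest surviving id per owner explicitly, which makes the chosen row deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, category INTEGER, owner INTEGER);
-- Hypothetical rows, not taken from the original fiddle.
INSERT INTO items VALUES
    (1,1,1),(2,1,2),(3,1,3),(4,2,2),(5,2,4),(6,3,3);
""")

query = """
    SELECT i.id, i.category, i.owner
    FROM items i
    JOIN (
        -- For each owner, the lowest id among rows that survive the filter.
        SELECT t1.owner, MIN(t1.id) AS first_id
        FROM items t1
        WHERE t1.category NOT IN (1)          -- unrestricted categories pass
           OR NOT EXISTS (                    -- limited ones pass only once
                SELECT 1 FROM items t2
                WHERE t2.id < t1.id
                  AND t2.category = t1.category)
        GROUP BY t1.owner
    ) f ON f.first_id = i.id
    ORDER BY i.id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Here category 1 appears once (item 1), and owners 2 and 3, whose category 1 items were filtered out, fall back to their next items.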

Join top 3 interest fields along with each user row

I'm trying to get the top 3 interests of each user, probably as a LEFT JOIN query.
The way the app is designed, each user has a set of interests which are no other than 'childs' (rows without parent) of the categories table.
Here are some simplified table schemas w/mock data (see SQL Fiddle demo)
-- Users table
| ID | NAME |
--------------
| 1 | John |
| 2 | Mary |
| 3 | Chris |
-- Categories table -- Interests table
| ID | NAME | PARENT | | ID | USER_ID | CATEGORY_ID |
-------------------------------------- ------------------------------
| 1 | Web Development | (null) | | 1 | 1 | 1 |
| 2 | Mobile Apps | (null) | | 2 | 1 | 1 |
| 3 | Software Development | (null) | | 3 | 1 | 1 |
| 4 | Marketing & Sales | (null) | | 4 | 2 | 1 |
| 5 | Web Apps | 1 | | 5 | 2 | 1 |
| 6 | CSS | 1 | | 6 | 3 | 1 |
| 7 | iOS | 2 | | 7 | 3 | 1 |
| 8 | Streaming Media | 3 | | 8 | 3 | 1 |
| 9 | SEO | 4 |
| 10 | SEM | 4 |
To get the top 3 interests of a given user, I've usually performed this query:
SELECT `c`.`parent` as `category_id`
FROM `interests` `i` LEFT JOIN `categories` `c` ON `c`.`id` = `i`.`category_id`
WHERE `i`.`user_id` = '2'
GROUP BY `c`.`parent`
ORDER BY count(`c`.`parent`) DESC LIMIT 3
This query returns the top 3 categories (parents) of user with id = 2
I would like to find out how I can query the users table and get their top 3 categories either in 3 different fields (preferred) or as a group_concat(..) in one field
SELECT id, name, top_categories FROM users, (...) WHERE id IN ('1', '2', '3');
Any ideas how I should go about doing this?
Thanks!
First build a grouped query that lists, on distinct rows, the top three skills for each user. Then pivot that to pull the three skills for each user out to the right. You will need to use a MAX(IFNULL(skill, '')) expression on the skills in each skill column.
Here is a very crude way to get the top 3 records for each user in MySQL:
SELECT u.id, c.name
FROM
users u,
categories c,
(SELECT i.id,
i.user_id,
i.category_id,
@running:=if(@previous=i.user_id,@running,0) + 1 as rId,
@previous:=i.user_id
FROM
(SELECT * FROM interests ORDER BY user_id) i JOIN
(SELECT @running:=0, @previous:=0 ) r) i
WHERE
u.id = i.USER_ID AND
i.CATEGORY_ID = c.id AND
i.rId <= 3
group by u.id, c.name ;
Hope it helps
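A different way to get at the same result, sketched here as an assumption rather than as either answer's method, is to let SQL count interests per (user, parent category) and take the first three per user in application code. The example uses Python's sqlite3; the categories rows come from the question, while the interests rows are hypothetical so that the counts differ between parents. Ties are broken by the lower parent id, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent INTEGER);
CREATE TABLE interests (id INTEGER PRIMARY KEY, user_id INTEGER, category_id INTEGER);
""")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [
        (1, "Web Development", None), (2, "Mobile Apps", None),
        (3, "Software Development", None), (4, "Marketing & Sales", None),
        (5, "Web Apps", 1), (6, "CSS", 1), (7, "iOS", 2),
        (8, "Streaming Media", 3), (9, "SEO", 4), (10, "SEM", 4),
    ],
)
# Hypothetical interests: user 1 likes categories 5,6,7,8,9; user 2 likes 9,10,7.
conn.executemany(
    "INSERT INTO interests VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 6), (3, 1, 7), (4, 1, 8), (5, 1, 9),
     (6, 2, 9), (7, 2, 10), (8, 2, 7)],
)

# Count interests per (user, parent), most frequent parents first per user.
counts = conn.execute("""
    SELECT i.user_id, c.parent, COUNT(*) AS cnt
    FROM interests i
    JOIN categories c ON c.id = i.category_id
    WHERE c.parent IS NOT NULL
    GROUP BY i.user_id, c.parent
    ORDER BY i.user_id, cnt DESC, c.parent
""").fetchall()

# Keep at most three parents per user, relying on the ORDER BY above.
top_categories = {}
for user_id, parent, _cnt in counts:
    picked = top_categories.setdefault(user_id, [])
    if len(picked) < 3:
        picked.append(parent)

print(top_categories)
```

Doing the "first three" step outside SQL avoids depending on the evaluation order of user variables, at the cost of transferring one row per (user, parent) pair.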

Select Distinct Set Common to Subset From Join Table

Given a join table for a many-to-many relationship between booth and user
+-----------+------------------+
| booth_id | user_id |
+-----------+------------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 5 |
| 1 | 9 |
| 2 | 1 |
| 2 | 2 |
| 2 | 5 |
| 2 | 10 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
| 3 | 4 |
| 3 | 6 |
| 3 | 11 |
+-----------+------------------+
How can I get a distinct set of booth records that are common between a subset of user ids? For example, if I am given user_id values of 1,2,3, I expect the result set to include only booth with id 3 since it is the only common booth in the join table above between all user_id's provided.
I'm hoping I'm missing a keyword in MySQL to accomplish this. The furthest I've come so far is using ... user_id = all (1,2,3), but this always returns an empty result set (I believe I understand why, though).
The SQL query for this would be:
select booth_id from table1
where user_id in (1,2,3)
group by booth_id
having count(booth_id) =
(select count(distinct user_id) from table1 where user_id in (1,2,3))
Hopefully this helps you write the MySQL query.
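The grouping approach above can be checked against the question's join table. Below is a sketch using Python's sqlite3; the table name booth_user is hypothetical. It compares COUNT(DISTINCT user_id) with the number of requested ids rather than with a second subquery, which assumes every requested id is meant to be required even if it never appears in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE booth_user (booth_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO booth_user VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 5), (1, 9),
     (2, 1), (2, 2), (2, 5), (2, 10),
     (3, 1), (3, 2), (3, 3), (3, 4), (3, 6), (3, 11)],
)

wanted = [1, 2, 3]
placeholders = ",".join("?" * len(wanted))  # "?,?,?"

# Keep only rows for the wanted users, then require that a booth has
# as many distinct matching users as there are wanted ids.
query = f"""
    SELECT booth_id
    FROM booth_user
    WHERE user_id IN ({placeholders})
    GROUP BY booth_id
    HAVING COUNT(DISTINCT user_id) = ?
    ORDER BY booth_id
"""
common_booths = [row[0] for row in conn.execute(query, wanted + [len(wanted)])]
print(common_booths)
```

Booths 1 and 2 each match only users 1 and 2, so only booth 3 satisfies the count.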

mysql getting data and looking it up in another table

I've got two tables in my database. Table 1 is a list of "timelines" and their corresponding owners and title.
Table 2 is a list of users who have access to the timelines but are followers, not owners.
I'm trying to write a query that outputs the lineID's and corresponding titles that are linked to a userID in either of the two tables.
A query for userID 1 would ideally output:
1 a
2 b
3 c
6 f
Hopefully this isn't too confusing but the purpose is to fill a dynamically generated select box with the LineID and Title for a given UserID...
Table 1 ("owners")
--------------------------
| LineID | UserID | Title |
| 1 | 1 | a |
| 2 | 1 | b |
| 3 | 1 | c |
| 4 | 2 | d |
| 5 | 2 | e |
| 6 | 1 | f |
--------------------------
Table 2 ("followers")
----------------------------
| RowID | LineID | UserID |
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 3 | 1 |
| 4 | 3 | 2 |
| 5 | 2 | 2 |
| 6 | 6 | 1 |
----------------------------
I tried using:
SELECT title
FROM `lines`
LEFT JOIN follow
ON follow.user_id = lines.user_id
WHERE follow.user_id = 1
That ended up producing duplicate rows.
The output I need would ideally be an array consisting of all the lineID's and Titles associated with that userID.
select LineID, Title
from owners
where UserID = 1
or LineID in (select LineID from followers where UserID = 1)
order by owners.LineID
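To check a user-filtered lookup against the two sample tables, here is a sketch in Python's sqlite3. The helper name lines_for_user is hypothetical. It returns every line the user either owns or follows, each line once, which is the shape the select box needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owners (LineID INTEGER PRIMARY KEY, UserID INTEGER, Title TEXT);
CREATE TABLE followers (RowID INTEGER PRIMARY KEY, LineID INTEGER, UserID INTEGER);
INSERT INTO owners VALUES
    (1,1,'a'),(2,1,'b'),(3,1,'c'),(4,2,'d'),(5,2,'e'),(6,1,'f');
INSERT INTO followers VALUES
    (1,1,1),(2,1,2),(3,3,1),(4,3,2),(5,2,2),(6,6,1);
""")


def lines_for_user(user_id):
    """Return (LineID, Title) for lines the user owns or follows."""
    return conn.execute(
        """
        SELECT LineID, Title
        FROM owners
        WHERE UserID = ?
           OR LineID IN (SELECT LineID FROM followers WHERE UserID = ?)
        ORDER BY LineID
        """,
        (user_id, user_id),
    ).fetchall()


print(lines_for_user(1))
print(lines_for_user(2))
```

Because the rows come from owners only, with followers consulted through IN, a line that is both owned and followed by the same user cannot appear twice.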