MySQL: efficiently moving data into a junction table

The problem:
I want to move the category links from the table companies_1 into the company_categories table. The company_id in the company_categories table needs to equal the id in the companies_2 table. The records in companies_1 and companies_2 are linked by the "name" column.
The current code below ran all night and still hadn't finished! I want to learn to be more efficient and speed this process up. I feel there is a lot to optimize, because there are A LOT of company records.
Another issue was that I found no way to check where my query was while it was looping, so I had no way to monitor its progress. Because it took so long, I killed the query, and I'm now looking for a better way to solve this.
The information:
There is a table with companies like:
----------------------------------------
| companies_1 |
----------------------------------------
| id | category_id | name |
----------------------------------------
| 1 | 1 | example-1 |
| 2 | 2 | example-1 |
| 3 | 1 | example-2 |
| 4 | 2 | example-2 |
| 5 | 3 | example-2 |
| 6 | 1 | example-3 |
----------------------------------------
A table with the DISTINCT company names:
-------------------------
| companies_2 |
-------------------------
| id | name |
-------------------------
| 1 | example-1 |
| 2 | example-2 |
| 3 | example-3 |
-------------------------
A categories table, like:
-------------------------
| categories |
-------------------------
| id | name |
-------------------------
And a junction table, like:
---------------------------------
| company_categories |
---------------------------------
| company_id | category_id |
---------------------------------
The current code:
This code works, but is far from efficient.
DELIMITER $$
DROP PROCEDURE IF EXISTS fill_junc_table$$
CREATE PROCEDURE fill_junc_table()
BEGIN
  DECLARE r INT;                 -- number of links inserted
  DECLARE i INT;                 -- outer loop position
  DECLARE i2 INT;                -- inner loop position
  DECLARE loop_length INT;
  DECLARE company_old_len INT;
  DECLARE _href VARCHAR(255);
  DECLARE cat_id INT;
  DECLARE comp_id INT;
  SET r = 0;
  SET i = 0;
  SET company_old_len = 0;
  SELECT COUNT(*) INTO loop_length FROM companies;
  WHILE i < loop_length DO
    -- LIMIT i,1 scans past i rows on every iteration (and without ORDER BY the row order is not guaranteed)
    SELECT href INTO _href FROM company_old LIMIT i,1;
    SELECT id INTO comp_id FROM companies WHERE site_href=_href;
    SELECT COUNT(*) INTO company_old_len FROM company_old WHERE href=_href;
    SET i2 = 0;
    WHILE i2 < company_old_len DO
      -- one SELECT plus one single-row INSERT per link
      SELECT category_id INTO cat_id FROM company_old WHERE href=_href LIMIT i2,1;
      INSERT INTO company_categories (company_id, category_id) VALUES (comp_id, cat_id);
      SET r = r + 1;
      SET i2 = i2 + 1;
    END WHILE;
    SET i = i + 1;
  END WHILE;
  SELECT r;
END$$
DELIMITER ;
CALL fill_junc_table();
Edit (new idea):
I am going to test another approach: copy the whole companies_1 table into a table with the following columns (company_id left empty during the copy):
---------------------------------------------
| company_id | category_id | name |
---------------------------------------------
Then I will loop through the companies_2 table to fill in the matching company_id via the name column.
I'd appreciate your thoughts on this. When I finish my test, I will post the result here for others.

To clarify, I don't see any PIVOT transformation in company_categories. What I see is that you want a JUNCTION TABLE, because the companies and categories tables seem to have a many-to-many relationship.
In your case, a company has multiple categories, and a category is also assigned to multiple companies.
Now, based on your requirement:
I want to move the links of the categories from the table companies_1 into the company_categories table. The company_id in the company_categories table need to be equal to the id of the companies_2 table. The records of the companies_1 and the companies_2 table are linked by the "name"-column.
I arrived at this query:
INSERT INTO company_categories (company_id, category_id)
SELECT C2.id
, C1.category_id
FROM companies_1 C1
INNER JOIN companies_2 C2 ON C2.name = C1.name;
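Let me know if this works. The nested loops you created will take a long time, because every iteration runs several separate queries plus a single-row INSERT.
As @DanielE pointed out, this query assumes that company_categories is empty. If it already contains rows, you need to skip the pairs that are already there (for example with a NOT EXISTS check, or INSERT IGNORE together with a unique key on (company_id, category_id)).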
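To check the set-based query against the sample data from the question, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption: the INSERT ... SELECT ... JOIN is plain SQL that MySQL also accepts). It also assumes the column types shown, and adds a NOT EXISTS filter so that re-running the statement does not create duplicate links.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies_1 (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT);
CREATE TABLE companies_2 (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE company_categories (
    company_id INTEGER,
    category_id INTEGER,
    PRIMARY KEY (company_id, category_id)
);
INSERT INTO companies_1 VALUES
    (1, 1, 'example-1'), (2, 2, 'example-1'),
    (3, 1, 'example-2'), (4, 2, 'example-2'), (5, 3, 'example-2'),
    (6, 1, 'example-3');
INSERT INTO companies_2 VALUES (1, 'example-1'), (2, 'example-2'), (3, 'example-3');
""")

# One set-based statement replaces both nested loops. The NOT EXISTS
# filter skips links that are already present, so a second run is harmless.
FILL_JUNCTION = """
INSERT INTO company_categories (company_id, category_id)
SELECT C2.id, C1.category_id
FROM companies_1 C1
INNER JOIN companies_2 C2 ON C2.name = C1.name
WHERE NOT EXISTS (
    SELECT 1 FROM company_categories cc
    WHERE cc.company_id = C2.id AND cc.category_id = C1.category_id
)
"""
first_run = conn.execute(FILL_JUNCTION).rowcount
second_run = conn.execute(FILL_JUNCTION).rowcount  # nothing left to insert

rows = conn.execute(
    "SELECT company_id, category_id FROM company_categories "
    "ORDER BY company_id, category_id"
).fetchall()
print(first_run, second_run, rows)
```

On the real MySQL tables, an index on the name column of both tables is likely what matters most for speed, since the join matches every row by name. It is also a single statement, so there is no loop whose progress needs tracking.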

Why not just update companies_1?
ALTER TABLE companies_1 ADD (company_id INT);
UPDATE companies_1 SET company_id = (SELECT id FROM companies_2 WHERE name=companies_1.name);
ALTER TABLE companies_1 DROP name, RENAME TO company_categories;
SELECT * FROM `company_categories`;
Output
id category_id company_id
1 1 1
2 2 1
3 1 2
4 2 2
5 3 2
6 1 3
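This answer can be tried on the question's sample data with the sketch below. Again sqlite3 stands in for MySQL (assumption). Instead of MySQL's ALTER TABLE ... DROP name, RENAME TO ..., it builds company_categories with CREATE TABLE ... AS SELECT, so it does not depend on the SQLite version supporting ALTER TABLE ... DROP COLUMN.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies_1 (id INTEGER PRIMARY KEY, category_id INTEGER, name TEXT);
CREATE TABLE companies_2 (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
INSERT INTO companies_1 VALUES
    (1, 1, 'example-1'), (2, 2, 'example-1'),
    (3, 1, 'example-2'), (4, 2, 'example-2'), (5, 3, 'example-2'),
    (6, 1, 'example-3');
INSERT INTO companies_2 VALUES (1, 'example-1'), (2, 'example-2'), (3, 'example-3');

ALTER TABLE companies_1 ADD COLUMN company_id INTEGER;

-- A single correlated UPDATE fills company_id for every row: no loop.
UPDATE companies_1
SET company_id = (SELECT c2.id FROM companies_2 c2 WHERE c2.name = companies_1.name);

-- Build the result table instead of dropping the column and renaming.
CREATE TABLE company_categories AS
SELECT id, category_id, company_id FROM companies_1;
""")

rows = conn.execute(
    "SELECT id, category_id, company_id FROM company_categories ORDER BY id"
).fetchall()
print(rows)
```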

Related

Merging two records with a common field and increment one field in MySQL & PHP

I have a table as follows. What I would like to avoid is having the same product ID twice in the table. How can I merge the rows with the same prodid using a query and add up their quantities?
cartid | prodid | quantity |
1 | 9226582 | 3 |
2 | 9226582 | 5 |
3 | 7392588 | 1 |
The desired result is that the table should be altered as follows:
cartid | prodid | quantity |
1 | 9226582 | 8 |
3 | 7392588 | 1 |
I have searched for answers but all seem too complex. Is there a way to do this in a simple way?
If you want to update the table in the database, you can do this:
create table newtable
(`cartid` int, `prodid` int unique key, `quantity` int);
insert into newtable
select * from yourtable order by cartid
on duplicate key update quantity=newtable.quantity+values(quantity);
select * from newtable;
Output:
cartid prodid quantity
1 9226582 8
3 7392588 1
If you're happy with the result, you can then run:
drop table yourtable;
alter table newtable rename to yourtable;
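Below is a sketch of the same merge run end to end, with sqlite3 standing in for MySQL (assumption; the column types are also assumed). SQLite has no ON DUPLICATE KEY UPDATE; its closest equivalent is INSERT ... ON CONFLICT (...) DO UPDATE (SQLite 3.24 or later), where excluded.quantity plays the role of MySQL's VALUES(quantity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (cartid INTEGER, prodid INTEGER, quantity INTEGER);
INSERT INTO yourtable VALUES (1, 9226582, 3), (2, 9226582, 5), (3, 7392588, 1);

CREATE TABLE newtable (cartid INTEGER, prodid INTEGER UNIQUE, quantity INTEGER);

-- Rows are inserted in cartid order, so the first row for each prodid
-- (the lowest cartid) is kept; later rows hit the UNIQUE key and only
-- add their quantity. SQLite needs a WHERE clause ("WHERE true") in an
-- INSERT ... SELECT that carries an upsert clause.
INSERT INTO newtable (cartid, prodid, quantity)
SELECT cartid, prodid, quantity FROM yourtable WHERE true ORDER BY cartid
ON CONFLICT (prodid) DO UPDATE SET quantity = quantity + excluded.quantity;
""")

rows = conn.execute(
    "SELECT cartid, prodid, quantity FROM newtable ORDER BY cartid"
).fetchall()
print(rows)
```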
Use GROUP BY with MIN() and SUM().
Check this: http://sqlfiddle.com/#!9/6c4332/4
select min(cartid) cartid ,prodid,sum(quantity) quantity
from
yourtable
group by prodid
order by cartid;
Create another table with the same schema,
then
insert into newtable
select min(cartid) cartid ,prodid,sum(quantity) quantity
from
yourtable
group by prodid
order by cartid;
Finally, drop the original table and rename newtable to the original name.
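A minimal end-to-end sketch of the GROUP BY approach, including the table swap, again with sqlite3 as a stand-in for MySQL (assumption; column types assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE yourtable (cartid INTEGER, prodid INTEGER, quantity INTEGER);
INSERT INTO yourtable VALUES (1, 9226582, 3), (2, 9226582, 5), (3, 7392588, 1);

-- "another table with the same schema"
CREATE TABLE newtable (cartid INTEGER, prodid INTEGER, quantity INTEGER);

-- one row per prodid: the lowest cartid and the summed quantity
INSERT INTO newtable
SELECT MIN(cartid), prodid, SUM(quantity) FROM yourtable GROUP BY prodid;

-- swap the tables
DROP TABLE yourtable;
ALTER TABLE newtable RENAME TO yourtable;
""")

rows = conn.execute(
    "SELECT cartid, prodid, quantity FROM yourtable ORDER BY cartid"
).fetchall()
print(rows)
```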

Sort MySQL Query results by amount of recursion in foreign keys

I have the following table:
+----+--------+
| id | parent |
+----+--------+
| 1 | 4 |
| 2 | 1 |
| 3 | NULL |
| 4 | NULL |
| 5 | 2 |
| 6 | 3 |
+----+--------+
I want this table to be ordered like this:
+----+--------+------------------------------------------------------------+
| id | parent | Why it has to be ordered like this |
+----+--------+------------------------------------------------------------+
| 5 | 2 | 5 has parent 2 has parent 1 has parent 4. So 3 rows above. |
| 2 | 1 | 2 has parent 1 has parent 4. So 2 rows above. |
| 1 | 4 | 1 has parent 4. So 1 row above. |
| 6 | 3 | 6 has parent 3. So 1 row above. |
| 4 | NULL | No parent. So 0 rows above. |
| 3 | NULL | No parent. So 0 rows above. |
+----+--------+------------------------------------------------------------+
So I want to recursively count the ancestors of a row and sort on that. How can I do that?
Edit: I'm on MySQL version 5.7.21.
You could do this with a recursive CTE, but you didn't list your MySQL version, and not all versions can do that (recursive CTEs need MySQL 8.0 or later), so here is something that should work even on older versions. It does the recursion itself with a helper table and a WHILE loop. The helper table gets one record for each record in the main table, holding its parent count. First we add all records with no parent; then each pass of the WHILE loop adds the records of the next generation. Note that the syntax may be a little bit off; I haven't done MySQL for some time.
DELIMITER $$
DROP PROCEDURE IF EXISTS sort_by_depth $$
CREATE PROCEDURE sort_by_depth()
BEGIN
  -- Create a helper table to hold the parent count data. It is a regular table,
  -- because MySQL cannot refer to a TEMPORARY table twice in the same query.
  DROP TABLE IF EXISTS ParentCount;
  CREATE TABLE ParentCount (id INT PRIMARY KEY, pcount INT);
  -- First create a pcount record with count zero for all records with no parent
  INSERT INTO ParentCount (id, pcount) SELECT id, 0 FROM TestData WHERE parent IS NULL;
  -- If we don't have a parent count for every record, keep going.
  -- This runs once for every level of depth
  -- (it never ends if a parent id is missing or the data contains a cycle).
  WHILE (SELECT COUNT(id) FROM TestData) <> (SELECT COUNT(id) FROM ParentCount) DO
    -- add a pcount record for all rows that don't have one yet, but whose
    -- parents do have one (i.e. the next generation)
    INSERT INTO ParentCount (id, pcount)
    SELECT T.id, P.pcount + 1 AS newpcount
    FROM TestData T
    INNER JOIN ParentCount P ON P.id = T.parent
    LEFT OUTER JOIN ParentCount P2 ON P2.id = T.id
    WHERE P2.id IS NULL;
  END WHILE;
  -- final query
  SELECT T.id, T.parent
  FROM TestData T
  INNER JOIN ParentCount P ON T.id = P.id
  ORDER BY P.pcount DESC, T.id ASC;
  DROP TABLE ParentCount;
END $$
DELIMITER ;
CALL sort_by_depth();
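For MySQL 8.0 or later (so not the asker's 5.7.21), the recursive CTE mentioned at the start of this answer could look like the sketch below. The WITH RECURSIVE syntax used here is accepted by both MySQL 8.0 and SQLite, so the sketch runs it with sqlite3 as a stand-in (assumption), on the sample data in a table named TestData as in the answer. Rows whose chain never reaches a root (a missing parent id or a cycle) are simply left out of the result instead of looping forever.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TestData (id INTEGER PRIMARY KEY, parent INTEGER);
INSERT INTO TestData VALUES (1, 4), (2, 1), (3, NULL), (4, NULL), (5, 2), (6, 3);
""")

QUERY = """
WITH RECURSIVE depth (id, pcount) AS (
    -- roots have no ancestors
    SELECT id, 0 FROM TestData WHERE parent IS NULL
    UNION ALL
    -- each child sits one level below its parent
    SELECT T.id, d.pcount + 1
    FROM TestData T
    JOIN depth d ON T.parent = d.id
)
SELECT T.id, T.parent
FROM TestData T
JOIN depth d ON d.id = T.id
ORDER BY d.pcount DESC, T.id ASC
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```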

Avoid duplicate id in the response

I am trying to accomplish the following sql statement but I am getting one duplicate id in my response.
SELECT ci.customer_id,
ci.first_name,
ci.user_gender,
ci.customer_status,
fr.relation
FROM customerinfo ci
INNER JOIN familyrelation fr
ON ( fr.personid_two = ci.customer_id )
WHERE ci.customer_id IN (SELECT personid_two
FROM familyrelation
WHERE personid_one = 17)
AND ci.csp_user_id = 5;
When I run this query, I get the right result, but one customer_id is repeated. Any help/advice greatly appreciated.
If your data looks like this:
drop table if exists ci,fr;
create table ci(customer_id int, name varchar(3),csp_user_id int);
create table fr(personid_one int,personid_two int,relation varchar(10));
insert into ci values (1,'aaa',5),(2,'bbb',5);
insert into fr values (17,1,'mother'),(17,1,'father'),(17,2,'niece');
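Then your query returns the rows I would expect. Customer 1 appears twice because it has two matching rows in fr (mother and father), and an inner join returns one row per match: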
SELECT ci.customer_id,
ci.name,
fr.relation
FROM ci
INNER JOIN fr
ON ( fr.personid_two = ci.customer_id )
WHERE ci.customer_id IN (SELECT personid_two
FROM fr
WHERE personid_one = 17)
AND ci.csp_user_id = 5;
+-------------+------+----------+
| customer_id | name | relation |
+-------------+------+----------+
| 1 | aaa | mother |
| 1 | aaa | father |
| 2 | bbb | niece |
+-------------+------+----------+
3 rows in set (0.00 sec)
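If the goal is one row per customer, one option is to collapse the relations with GROUP_CONCAT, sketched below on the same sample data. sqlite3 stands in for MySQL (assumption). GROUP_CONCAT with its default comma separator exists in both, but SQLite does not guarantee the order of values inside each group, while MySQL lets you write GROUP_CONCAT(fr.relation ORDER BY fr.relation).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ci (customer_id INTEGER, name VARCHAR(3), csp_user_id INTEGER);
CREATE TABLE fr (personid_one INTEGER, personid_two INTEGER, relation VARCHAR(10));
INSERT INTO ci VALUES (1, 'aaa', 5), (2, 'bbb', 5);
INSERT INTO fr VALUES (17, 1, 'mother'), (17, 1, 'father'), (17, 2, 'niece');
""")

# Same join and filters as the original query, but grouped so that each
# customer appears once and its relations end up in one column.
QUERY = """
SELECT ci.customer_id, ci.name, GROUP_CONCAT(fr.relation) AS relations
FROM ci
INNER JOIN fr ON fr.personid_two = ci.customer_id
WHERE ci.customer_id IN (SELECT personid_two FROM fr WHERE personid_one = 17)
  AND ci.csp_user_id = 5
GROUP BY ci.customer_id, ci.name
ORDER BY ci.customer_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```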

SQL query with EXISTS not working as I thought

Hi, these two SQL queries return the same result:
SELECT DISTINCT ItemID
FROM Sale INNER JOIN Department
ON Department.DepartmentID = Sale.DepartmentID
WHERE DepartmentFloor = 2
ORDER BY ItemID
SELECT DISTINCT ItemID
FROM Sale
WHERE EXISTS
(SELECT *
FROM Department
WHERE Sale.DepartmentID = Department.DepartmentID
AND DepartmentFloor = 2)
ORDER BY ItemID;
The subquery inside the EXISTS returns true, so why doesn't the second query return the equivalent of
SELECT DISTINCT ItemID
FROM Sale
which gives a different result from the two above?
You are getting confused by EXISTS(). It is evaluated row by row, based on the correlation between the tables, not as a single true/false. This line of your subquery is your correlation clause:
Sale.DepartmentID = Department.DepartmentID
It is saying "Only show the Sale.ItemIDs where that ItemID's Sale.DepartmentID is in Department."
It achieves the same function as a join predicate, like in your first query:
FROM Sale S
JOIN Department D on S.DepartmentID = D.DepartmentID --here
Conversely, this query:
SELECT DISTINCT ItemID
FROM Sale
Has no limiting factor.
As an aside, you also further limit the results of each query with:
WHERE DepartmentFloor = 2
But I don't think that is the part that is throwing you off; I think it is the concept that a correlated subquery is evaluated for each record. If you removed the correlating clause, the subquery would return true for every row (as long as at least one department is on floor 2), and you would get all results back.
The subquery isn't always returning true. It will evaluate for each row, joining on DepartmentID where the DepartmentFloor is 2.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE Sale ( ItemID int, DepartmentID int ) ;
INSERT INTO Sale ( ItemID, DepartmentID )
VALUES (1,1), (2,2), (3,3), (4,1), (5,4), (6,2), (7,3), (8,4) ;
CREATE TABLE Department ( DepartmentID int, DepartmentFloor int ) ;
INSERT INTO Department ( DepartmentID, DepartmentFloor )
VALUES (1,1), (2,1), (3,2), (4,2) ;
Query 1:
SELECT *
FROM Department
WHERE DepartmentFloor = 2
Results: This lists only the Departments on DepartmentFloor 2.
| DepartmentID | DepartmentFloor |
|--------------|-----------------|
| 3 | 2 |
| 4 | 2 |
Query 2:
SELECT *
FROM Sale
Results: This lists ALL of your Sales.
| ItemID | DepartmentID |
|--------|--------------|
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1 |
| 5 | 4 |
| 6 | 2 |
| 7 | 3 |
| 8 | 4 |
Query 3:
SELECT *
FROM Sale
WHERE DepartmentID IN (3,4)
Results: This one shows the equivalent of your EXISTS statement. Only 4 rows match in my data, so you'd only get back ItemIDs 3, 5, 7 and 8.
| ItemID | DepartmentID |
|--------|--------------|
| 3 | 3 |
| 5 | 4 |
| 7 | 3 |
| 8 | 4 |
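The row-by-row behaviour can be checked with the sample data from this answer. The sketch below (sqlite3 as a stand-in for MySQL, an assumption) runs the correlated EXISTS, the same EXISTS with the correlation removed, and the JOIN from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sale (ItemID INTEGER, DepartmentID INTEGER);
INSERT INTO Sale VALUES (1,1), (2,2), (3,3), (4,1), (5,4), (6,2), (7,3), (8,4);
CREATE TABLE Department (DepartmentID INTEGER, DepartmentFloor INTEGER);
INSERT INTO Department VALUES (1,1), (2,1), (3,2), (4,2);
""")

def item_ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Correlated: evaluated for each Sale row, using that row's DepartmentID.
correlated = item_ids("""
    SELECT DISTINCT ItemID FROM Sale
    WHERE EXISTS (SELECT * FROM Department
                  WHERE Sale.DepartmentID = Department.DepartmentID
                    AND DepartmentFloor = 2)
    ORDER BY ItemID""")

# Uncorrelated: the subquery ignores the Sale row, so it is true for every
# row (some department is on floor 2) and nothing is filtered out.
uncorrelated = item_ids("""
    SELECT DISTINCT ItemID FROM Sale
    WHERE EXISTS (SELECT * FROM Department WHERE DepartmentFloor = 2)
    ORDER BY ItemID""")

# The JOIN form from the question, for comparison.
joined = item_ids("""
    SELECT DISTINCT ItemID
    FROM Sale INNER JOIN Department
      ON Department.DepartmentID = Sale.DepartmentID
    WHERE DepartmentFloor = 2
    ORDER BY ItemID""")

print(correlated, uncorrelated, joined)
```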
Because the upper part of the query is equivalent to
SELECT DISTINCT ItemID FROM Sale where EXISTS (true)
the upper one is the only query that really checks the condition.

MySQL update join query to solve duplicate Values

I have a Categories table that contains some duplicate categories, as shown below:
`Categories`
+========+============+============+
| cat_id | cat_name | item_count |
+========+============+============+
| 1 | Category 1 | 2 |
| 2 | Category 1 | 1 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 1 |
| 5 | Category 3 | 1 |
+--------+------------+------------+
Here is a junction table that relates it to an Items table. The item_count in the first table is the total number of items per cat_id.
`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
| 1 | 100 |
| 1 | 101 |
| 2 | 102 |
| 3 | 103 |
| 3 | 104 |
| 4 | 105 |
| 5 | 106 |
+--------+---------+
How do I combine the items of duplicate categories into the one with the highest item_count among its duplicates (e.g. Category 1)?
Also, if duplicates have the same item_count, the category with the highest cat_id is chosen and the item counts are combined into that record (e.g. Category 3).
Note: instead of removing the duplicate records, their item_count will be set to 0.
Below is the expected result.
+========+============+============+
| cat_id | cat_name | item_count |
+========+============+============+
| 1 | Category 1 | 3 |
| 2 | Category 1 | 0 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 0 |
| 5 | Category 3 | 2 |
+--------+------------+------------+
+========+=========+
| cat_id | item_id |
+========+=========+
| 1 | 100 |
| 1 | 101 |
| 1 | 102 |
| 3 | 103 |
| 3 | 104 |
| 5 | 105 |
| 5 | 106 |
+--------+---------+
In the result, there are two duplicated names, Category 1 and Category 3, covering two scenarios:
cat_id=2 is eliminated because its item_count=1 is less than that of cat_id=1, which has item_count=2.
cat_id=4 is eliminated even though its item_count equals that of cat_id=5, since 5 is the highest cat_id among the duplicates of Category 3.
Is there a query that can join and update both tables to resolve the duplicates?
Here's a SELECT. You can figure out how to adapt it to an UPDATE ;-)
I've ignored the junction table for simplicity.
SELECT z.cat_id
, z.cat_name
, (z.cat_id = x.cat_id) * new_count item_count
FROM categories x
LEFT
JOIN categories y
ON y.cat_name = x.cat_name
AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
LEFT
JOIN
( SELECT a.cat_id, b.*
FROM categories a
JOIN
( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
ON b.cat_name = a.cat_name
) z
ON z.cat_name = x.cat_name
WHERE y.cat_id IS NULL;
+--------+------------+------------+
| cat_id | cat_name | item_count |
+--------+------------+------------+
| 1 | Category 1 | 3 |
| 2 | Category 1 | 0 |
| 3 | Category 2 | 2 |
| 4 | Category 3 | 0 |
| 5 | Category 3 | 2 |
+--------+------------+------------+
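One way to adapt the SELECT into updates of both tables is sketched below, with sqlite3 standing in for MySQL (assumption). It first stores an old-to-new cat_id mapping in a temporary table, using the same anti-join to pick the winner, because the updates change item_count and would otherwise change which row wins. It then moves the junction rows and recounts item_count from the junction table, relying on the question's statement that item_count is the number of items per cat_id. The table name remap is hypothetical; in MySQL the same steps could use CREATE TEMPORARY TABLE and UPDATE ... JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT, item_count INTEGER);
INSERT INTO categories VALUES
    (1, 'Category 1', 2), (2, 'Category 1', 1), (3, 'Category 2', 2),
    (4, 'Category 3', 1), (5, 'Category 3', 1);
CREATE TABLE junction (cat_id INTEGER, item_id INTEGER);
INSERT INTO junction VALUES
    (1, 100), (1, 101), (2, 102), (3, 103), (3, 104), (4, 105), (5, 106);

-- Winner x for each row a: no row y with the same name has a higher
-- item_count, or the same item_count and a higher cat_id.
-- Computed before any update, since the updates change item_count.
CREATE TEMP TABLE remap AS
SELECT a.cat_id AS old_id, x.cat_id AS new_id
FROM categories a
JOIN categories x ON x.cat_name = a.cat_name
LEFT JOIN categories y
  ON y.cat_name = x.cat_name
 AND (y.item_count > x.item_count
      OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
WHERE y.cat_id IS NULL;

-- Move every item to the winning category of its name.
UPDATE junction
SET cat_id = (SELECT new_id FROM remap WHERE old_id = junction.cat_id);

-- Recount from the junction table; the losing duplicates end up with 0.
UPDATE categories
SET item_count = (SELECT COUNT(*) FROM junction j WHERE j.cat_id = categories.cat_id);
""")

cats = conn.execute("SELECT * FROM categories ORDER BY cat_id").fetchall()
jun = conn.execute("SELECT * FROM junction ORDER BY item_id").fetchall()
print(cats)
print(jun)
```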
DELIMITER $$
DROP PROCEDURE IF EXISTS cursor_proc $$
CREATE PROCEDURE cursor_proc()
BEGIN
  -- local variables take no '@' prefix; the v_ names avoid clashing with column names
  DECLARE v_cat_id INT;
  DECLARE v_cat_name VARCHAR(255);
  DECLARE v_item_count INT;
  DECLARE prev_cat_name VARCHAR(255) DEFAULT NULL;
  DECLARE max_item_per_category INT DEFAULT 0;
  DECLARE max_item_id INT DEFAULT 0;
  DECLARE total_items_count INT DEFAULT 0;
  -- this flag will be set to TRUE when the cursor reaches the end of the table
  DECLARE exit_loop BOOLEAN DEFAULT FALSE;
  -- Declare the cursor: duplicates of one name come out together, ordered by cat_id
  DECLARE categories_cursor CURSOR FOR
    SELECT cat_id, cat_name, item_count FROM Categories ORDER BY cat_name, cat_id;
  -- set exit_loop flag to TRUE if there are no more rows
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET exit_loop = TRUE;
  -- open the cursor
  OPEN categories_cursor;
  -- start looping
  categories_loop: LOOP
    -- read the next row into the variables
    FETCH categories_cursor INTO v_cat_id, v_cat_name, v_item_count;
    IF exit_loop OR prev_cat_name IS NULL OR prev_cat_name <> v_cat_name THEN
      -- Category has changed (or there are no more rows): give the 'best' row
      -- of the previous category the total items count
      IF max_item_id > 0 THEN
        UPDATE Categories
        SET Categories.item_count = total_items_count
        WHERE Categories.cat_id = max_item_id;
      END IF;
      -- close the cursor and exit the loop if there are no more rows
      IF exit_loop THEN
        CLOSE categories_cursor;
        LEAVE categories_loop;
      END IF;
      -- Reset values with the current row's values
      SET max_item_per_category = v_item_count;
      SET prev_cat_name = v_cat_name;
      SET max_item_id = v_cat_id;
      SET total_items_count = v_item_count;
    ELSE
      -- increment the total items count
      SET total_items_count = total_items_count + v_item_count;
      -- '>=' because rows arrive in cat_id order, so on a tie the higher cat_id wins
      IF v_item_count >= max_item_per_category THEN
        -- the current row is the new 'best'; the previous best gets 0
        UPDATE Categories
        SET Categories.item_count = 0
        WHERE Categories.cat_id = max_item_id;
        SET max_item_per_category = v_item_count;
        SET max_item_id = v_cat_id;
      ELSE
        -- else, this row is not the best of its Category
        UPDATE Categories
        SET Categories.item_count = 0
        WHERE Categories.cat_id = v_cat_id;
      END IF;
    END IF;
  END LOOP categories_loop;
END $$
DELIMITER ;
It's not pretty, and it's partly copied from Strawberry's SELECT:
UPDATE categories cat,
junction jun,
(select
(z.cat_id = x.cat_id) * new_count c,
x.cat_id newcatid,
z.cat_id oldcatid
from categories x
LEFT
JOIN categories y
ON y.cat_name = x.cat_name
AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
LEFT
JOIN
( SELECT a.cat_id, b.*
FROM categories a
JOIN
( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
ON b.cat_name = a.cat_name
) z
ON z.cat_name = x.cat_name
WHERE
y.cat_id IS NULL) sourceX
SET cat.item_count = sourceX.c, jun.cat_id = sourceX.newcatid
WHERE cat.cat_id = jun.cat_id and cat.cat_id = sourceX.oldcatid
I think it's better to do what you want one step at a time:
First, get data you need:
SELECT Max(`cat_id`), sum(`item_count`) FROM `Categories` GROUP BY `cat_name`
With these data you'll be able to check if update was correctly done.
Then, looping over the acquired data, update:
-- @cat_name: a user variable holding the cat_name of the current loop row
update Categories set item_count =
(
Select Tot FROM (
Select sum(`item_count`) as Tot
FROM `Categories`
WHERE `cat_name` = @cat_name) as tmp1
)
WHERE cat_id = (
Select MaxId
FROM (
select max(cat_id) as MaxId
FROM Categories
WHERE `cat_name` = @cat_name) as tmp2);
Be careful: if you run this code twice, the result will be wrong.
Finally, set the item_count of the other IDs to 0:
UPDATE Categories set item_count = 0
WHERE `cat_name` = @cat_name
AND cat_id <> (
Select MaxId
FROM (
select max(cat_id) as MaxId
FROM Categories
WHERE `cat_name` = @cat_name) as tmp2);
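The two steps of this answer can be driven from a client-side loop, sketched below in Python with sqlite3 standing in for MySQL (assumption). Passing the name, highest cat_id and total from step 1 as bound parameters replaces the per-row variable used in the SQL above, and because the totals are not recomputed inside the loop, re-running the loop does not change the result. Note that it follows this answer's rule (the highest cat_id wins), so for Category 1 it gives the total to cat_id 2, not to cat_id 1 as in the question's expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT, item_count INTEGER);
INSERT INTO Categories VALUES
    (1, 'Category 1', 2), (2, 'Category 1', 1), (3, 'Category 2', 2),
    (4, 'Category 3', 1), (5, 'Category 3', 1);
""")

# Step 1: acquire the data (highest cat_id and total item_count per name).
groups = conn.execute(
    "SELECT cat_name, MAX(cat_id), SUM(item_count) FROM Categories GROUP BY cat_name"
).fetchall()

# Step 2: loop over the acquired rows; "with conn" commits at the end.
with conn:
    for cat_name, max_id, total in groups:
        conn.execute(
            "UPDATE Categories SET item_count = ? WHERE cat_id = ?", (total, max_id)
        )
        conn.execute(
            "UPDATE Categories SET item_count = 0 WHERE cat_name = ? AND cat_id <> ?",
            (cat_name, max_id),
        )

result = conn.execute(
    "SELECT cat_id, item_count FROM Categories ORDER BY cat_id"
).fetchall()
print(result)
```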