The Problem
I would like to sanitize a messy database by replacing references to duplicate entries. In this contrived example (my real schema is far more complex) I have two tables:
Octopuses
Colors
We know that:
An octopus has a color.
The colors table contains duplicates.
Some octopuses may have the same color as other octopuses, but a different color_id.
The way I solved this problem involves TEMPORARY tables. To avoid the error:
Can't Reopen Table 'duplicates'
I simply duplicate my TEMPORARY table many times:
CREATE TEMPORARY TABLE duplicates1 SELECT * FROM duplicates;
CREATE TEMPORARY TABLE duplicates2 SELECT * FROM duplicates;
The Question
I would like to avoid cloning TEMPORARY tables.
The Data
CREATE TABLE `test`.`octopuses` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
`color_id` INT NOT NULL,
PRIMARY KEY (`id`));
CREATE TABLE `test`.`colors` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NOT NULL,
PRIMARY KEY (`id`));
With some colors, including duplicates:
INSERT INTO colors (name) VALUES
('cream'), ('sepia'), ('daffodil'), ('lipstick'),
('lipstick'), ('garnet'), ('flamingo'), ('navy'),
('chartreuse'), ('garnet'), ('flamingo'), ('juniper'),
('flint'), ('flint'), ('charcoal'), ('garnet');
And some octopuses:
INSERT INTO octopuses (name, color_id) VALUES
('Bubbles', 1), ('Inky', 8), ('Octavius', 1),
('Sir Inks-A-Lot', 7), ('Octavia', 16), ('Kraken', 6),
('Oncho', 15), ('Big Floppy Sea Spider', 14), ('Calamari', 2),
('Scuba Doo', 13), ('Squidward Tentacles', 5), ('Wiggleton', 9),
('Cthulhu', 2), ('Octopussy', 3), ('Triton', 10),
('Doctor Octopus', 11), ('Billy The Squid', 4), ('Stretch', 12);
The Example
To solve the problem I first create the list of duplicates:
CREATE TEMPORARY TABLE duplicates SELECT
*, COUNT(*) AS count
FROM
colors
GROUP BY name
HAVING count > 1;
Here it is:
mysql> select * FROM duplicates;
+----+----------+-------+
| id | name | count |
+----+----------+-------+
| 4 | lipstick | 2 |
| 6 | garnet | 3 |
| 7 | flamingo | 2 |
| 13 | flint | 2 |
+----+----------+-------+
Then I would like to create a corresponding table where I have the id of a duplicate and the id to be replaced with:
CREATE TEMPORARY TABLE duplicates1 SELECT * FROM duplicates;
CREATE TEMPORARY TABLE duplicates2 SELECT * FROM duplicates;
CREATE TEMPORARY TABLE corresponding SELECT
id, name,
(SELECT
id
FROM
duplicates2
WHERE
duplicates2.name = colors.name) AS first_id
FROM
colors
WHERE
name IN (SELECT
name
FROM
duplicates)
AND id NOT IN (SELECT
id
FROM
duplicates1)
ORDER BY name ASC;
Here the content:
mysql> SELECT * FROM corresponding;
+----+----------+----------+
| id | name | first_id |
+----+----------+----------+
| 11 | flamingo | 7 |
| 14 | flint | 13 |
| 10 | garnet | 6 |
| 16 | garnet | 6 |
| 5 | lipstick | 4 |
+----+----------+----------+
Then I simply update the octopuses table:
CREATE TEMPORARY TABLE corresponding1 SELECT * FROM corresponding;
UPDATE octopuses
SET
color_id = (SELECT
first_id
FROM
corresponding1
WHERE
corresponding1.id = color_id)
WHERE
color_id IN (SELECT
id
FROM
corresponding)
Eventually I remove the duplicates:
DELETE FROM colors WHERE id IN (SELECT id FROM corresponding);
The Summary
This example is perhaps not the best illustration of my issue, but what I would like is to avoid cloning temporary tables and to find a way to select with multiple IN conditions on TEMPORARY tables.
Try to think the other way around.
You could do:
UPDATE octopuses
INNER JOIN
(SELECT
*,
(SELECT
id
FROM
colors
WHERE
colors.name = (SELECT
name
FROM
colors
WHERE
color_id = colors.id)
ORDER BY colors.id
LIMIT 1) AS first_color_id
FROM
octopuses
HAVING color_id <> first_color_id) AS dup ON dup.color_id = octopuses.color_id
SET
octopuses.color_id = first_color_id
WHERE
octopuses.color_id <> first_color_id;
CREATE TEMPORARY TABLE to_delete SELECT id FROM colors WHERE NOT EXISTS (
SELECT id FROM octopuses WHERE color_id = colors.id
);
DELETE FROM colors WHERE id IN (SELECT id FROM to_delete);
So the answer to your question is:
Whenever you need to clone a temporary table, think twice and you will find another way that does not involve opening the same temporary table twice!
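The same idea can be sketched without any temporary table at all, by joining against a derived table that maps each color name to the id you want to keep. This is only a sketch under assumptions: it assumes the lowest id per name is the one to keep, and the aliases o, c, k and the column keep_id are names I made up for illustration.

```sql
-- Repoint every octopus to the lowest color id that shares its color name.
UPDATE octopuses AS o
JOIN colors AS c ON c.id = o.color_id
JOIN (SELECT name, MIN(id) AS keep_id   -- keep_id: hypothetical name
      FROM colors
      GROUP BY name) AS k ON k.name = c.name
SET o.color_id = k.keep_id
WHERE o.color_id <> k.keep_id;

-- Remove every color row that is not the kept id for its name.
-- The GROUP BY forces the derived table to be materialized, which is
-- what lets colors appear both as the delete target and in the join.
DELETE c
FROM colors AS c
JOIN (SELECT name, MIN(id) AS keep_id
      FROM colors
      GROUP BY name) AS k ON k.name = c.name
WHERE c.id <> k.keep_id;
```

With the sample data this should delete ids 5, 10, 11, 14 and 16, the same rows as the corresponding table in the question. Unlike the to_delete approach above, it only removes duplicates and leaves unused but unique colors alone.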
I have the following tables (minified for the sake of simplicity):
CREATE TABLE IF NOT EXISTS `product_bundles` (
bundle_id int AUTO_INCREMENT PRIMARY KEY,
-- More columns here for bundle attributes
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `product_bundle_parts` (
`part_id` int AUTO_INCREMENT PRIMARY KEY,
`bundle_id` int NOT NULL,
`sku` varchar(255) NOT NULL,
-- More columns here for product attributes
KEY `bundle_id` (`bundle_id`),
KEY `sku` (`sku`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `products` (
`product_id` mediumint(8) AUTO_INCREMENT PRIMARY KEY,
`sku` varchar(64) NOT NULL DEFAULT '',
`status` char(1) NOT NULL default 'A',
-- More columns here for product attributes
KEY (`sku`),
) ENGINE=InnoDB;
And I want to show only the 'product bundles' that are currently completely in stock and defined in the database (since these get retrieved from a third party vendor, there is no guarantee the SKU is defined). So I figured I'd need an anti-join to retrieve it accordingly:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
WHERE parts.bundle_id = bundles.bundle_id
AND products.status = 'A'
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Now, I sincerely thought this would filter out the products by status, however, that seems not to be the case. I then changed one thing up a bit, and the query never finished (although I believe it to be correct):
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
AND products.status = 'A'
WHERE parts.bundle_id = bundles.bundle_id
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Example data:
product_bundles
bundle_id | etc.
1 |
2 |
3 |
product_bundle_parts
part_id | bundle_id | sku
1 | 1 | 'sku11'
2 | 1 | 'sku22'
3 | 1 | 'sku33'
4 | 1 | 'sku44'
5 | 2 | 'sku55'
6 | 2 | 'sku66'
7 | 3 | 'sku77'
8 | 3 | 'sku88'
products
product_id | sku | status
101 | 'sku11' | 'A'
102 | 'sku22' | 'A'
103 | 'sku33' | 'A'
104 | 'sku44' | 'A'
105 | 'sku55' | 'D'
106 | 'sku66' | 'A'
107 | 'sku77' | 'A'
108 | 'sku99' | 'A'
Example result: Since the product status of product #105 is 'D' and 'sku88' from part #8 was not found:
bundle_id | etc.
1 |
I am running Server version: 10.3.25-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
So there are a few questions I have.
Why does the first query not filter out products that do not have the status A?
Why does the second query not finish?
Are there alternative ways of achieving the same thing in a more efficient manner? This looks rather cumbersome.
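First of all, I've read that SQL_CALC_FOUND_ROWS is much slower than running two separate queries (a COUNT(*) and then the SELECT *; or, if you run the query from another programming language such as PHP, executing the SELECT * and then counting the rows of the result set).
Second: in your first query the subquery can never return a row. products.product_id IS NULL is only true for parts that have no matching product, and for those rows products.status is NULL as well, so products.status = 'A' fails. NOT EXISTS is therefore true for every bundle and nothing gets filtered, while you need only the bundles whose products are ALL active.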
I'd change it in the following:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE NOT EXISTS (
SELECT 'x'
FROM product_bundle_parts AS parts
LEFT JOIN products ON (parts.sku = products.sku)
WHERE parts.bundle_id = bundles.bundle_id
AND COALESCE(products.status, 'X') != 'A'
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
I changed products.status = 'A' to COALESCE(products.status, 'X') != 'A': this way the query returns all the bundles that DON'T have inactive products. The COALESCE maps a missing product (NULL status) to 'X', so it counts as inactive too; that is why I removed the condition AND products.product_id IS NULL (it would otherwise have to be combined with OR, at a cost in performance).
You can see my solution in SQLFiddle.
Finally, to find out why your second query doesn't finish, you should check the structure of your tables and how they are indexed. Running EXPLAIN on the query could help you find potential issues with the structure. Just put the keyword EXPLAIN before the SELECT and you'll have your "report" (EXPLAIN SELECT * ....).
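As an alternative to NOT EXISTS, the "every part is active" rule can also be sketched with a join plus a HAVING clause. I have not run this against your MariaDB 10.3 data, so treat it as a sketch; the aliases b, p, pr and in_stock are placeholders of mine.

```sql
-- A bundle qualifies when every one of its parts matched an active product:
-- COUNT(*) counts all parts, COUNT(pr.product_id) counts only matched ones.
SELECT b.*
FROM product_bundles AS b
JOIN product_bundle_parts AS p ON p.bundle_id = b.bundle_id
LEFT JOIN products AS pr ON pr.sku = p.sku AND pr.status = 'A'
GROUP BY b.bundle_id
HAVING COUNT(*) = COUNT(pr.product_id)
LIMIT 0, 24;

-- Separate total for pagination, instead of SQL_CALC_FOUND_ROWS.
SELECT COUNT(*) AS total
FROM (SELECT p.bundle_id
      FROM product_bundle_parts AS p
      LEFT JOIN products AS pr ON pr.sku = p.sku AND pr.status = 'A'
      GROUP BY p.bundle_id
      HAVING COUNT(*) = COUNT(pr.product_id)) AS in_stock;
```

One difference to be aware of: because of the inner join, a bundle with no parts at all is left out here, whereas the NOT EXISTS version would return it.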
I have the tables users and orders. After every UPDATE of a row in orders, I want to update DATA in the users table, namely to concat(OLD.DATA, the ID of the row that was updated).
Table 'users'.
ID NAME DATA
1 John 1|2
2 Michael 3|4
3 Someone 5
Table 'orders'.
ID USER CONTENT
1 1 ---
2 1 ---
3 2 ---
4 2 ---
5 3 ---
For example:
SELECT `data` from `users` where `id` = 2; // Result: 3|4
UPDATE `orders` SET '...' WHERE `id` > 0;
**NEXT LOOP**
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 1;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 1;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 2;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 2;
UPDATE `users` SET `data` = concat(OLD.data, ID.rowUpdated) WHERE `user` = 3;
Result:
SELECT data from users where id = 1; // Result: 1|2|1|2
SELECT data from users where id = 2; // Result: 3|4|3|4
SELECT data from users where id = 3; // Result: 5|5
How can I do it?
I think you are making the same mistake I made not too long ago, ie storing an array/object in a column.
I would recommend using the following tables in your scenario:
users
+-----------+-----------+
| id | user_name |
+-----------+-----------+
| 1 | John |
+-----------+-----------+
| 2 | Michael |
+-----------+-----------+
orders
+-----------+-----------+------------+
| id | user_id |date_ordered|
+-----------+-----------+------------+
| 1 | 1 | 2019-03-05 |
+-----------+-----------+------------+
| 2 | 2 | 2019-03-05 |
+-----------+-----------+------------+
Where user_id is the foreign key to users
sales
+-----------+-----------+------------+------------+------------+
| id | order_id | item_sku | qty | price |
+-----------+-----------+------------+------------+------------+
| 1 | 1 | 1001 | 1 | 2.50 |
+-----------+-----------+------------+------------+------------+
| 2 | 1 | 1002 | 2 | 3.00 |
+-----------+-----------+------------+------------+------------+
| 3 | 2 | 1001 | 2 | 2.00 |
+-----------+-----------+------------+------------+------------+
where order_id is the foreign key to orders
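In case it helps, here is roughly how those three tables could be declared. The column types and lengths are my guesses based on the sample rows above, not something taken from your schema.

```sql
CREATE TABLE users (
  id        INT AUTO_INCREMENT PRIMARY KEY,
  user_name VARCHAR(45) NOT NULL          -- length is an assumption
) ENGINE=InnoDB;

CREATE TABLE orders (
  id           INT AUTO_INCREMENT PRIMARY KEY,
  user_id      INT NOT NULL,
  date_ordered DATE NOT NULL,
  FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;

CREATE TABLE sales (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  order_id INT NOT NULL,
  item_sku INT NOT NULL,                  -- type is an assumption
  qty      INT NOT NULL,
  price    DECIMAL(10,2) NOT NULL,        -- precision is an assumption
  FOREIGN KEY (order_id) REFERENCES orders (id)
) ENGINE=InnoDB;
```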
Now for the confusing part. You will need to use a series of JOINs to access the relevant data for each user.
SELECT
t3.id AS user_id,
t3.user_name,
t1.id AS order_id,
t1.date_ordered,
SUM((t2.price * t2.qty)) AS order_total
FROM orders t1
JOIN sales t2 ON (t2.order_id = t1.id)
LEFT JOIN users t3 ON (t1.user_id = t3.id)
WHERE user_id=1
GROUP BY order_id;
This will return:
+-----------+--------------+------------+------------+--------------+
| user_id | user_name | order_id |date_ordered| order_total |
+-----------+--------------+------------+------------+--------------+
| 1 | John | 1 | 2019-03-05 | 8.50 |
+-----------+--------------+------------+------------+--------------+
These types of JOIN statements should come up in basically any project using a relational database (that is, if you are designing your DB correctly). Typically I create a view for each of these complicated queries, which can then be accessed with a simple SELECT * FROM orders_view.
For example:
CREATE
ALGORITHM = UNDEFINED
DEFINER = `root`@`localhost`
SQL SECURITY DEFINER
VIEW orders_view AS (
SELECT
t3.id AS user_id,
t3.user_name,
t1.id AS order_id,
t1.date_ordered,
SUM((t2.price * t2.qty)) AS order_total
FROM orders t1
JOIN sales t2 ON (t2.order_id = t1.id)
LEFT JOIN users t3 ON (t1.user_id = t3.id)
GROUP BY order_id
)
This can then be accessed by:
SELECT * FROM orders_view WHERE user_id=1;
Which would return the same results as the query above.
Depending on your needs, you will probably need to add a few more tables (addresses, products etc.) and several more rows to each of these tables. Very often you will find that you need to JOIN 5+ tables into a view, and sometimes you might need to JOIN the same table twice.
I hope this helps despite it not exactly answering your question!
It is probably a bad idea to update the USERS table after inserting into (or updating) the ORDERS table. Avoid storing data twice. In your case: you can always get all "order ids" for a user by querying the ORDERS table. Thus, you don't need to store them in the USERS table (again). Example (tested with MySQL 8.0, see dbfiddle):
Tables and data
create table users( id integer primary key, name varchar(30) ) ;
insert into users( id, name ) values
(1, 'John'),(2, 'Michael'),(3, 'Someone') ;
create table orders(
id integer primary key
, userid integer references users (id)
, content varchar(3)
);
insert into orders ( id, userid, content ) values
(101, 1, '---'),(102, 1, '---')
,(103, 2, '---'),(104, 2, '---'),(105, 3, '---') ;
Maybe a VIEW - similar to the one below - will do the trick. (Advantage: you don't need additional columns or tables.)
-- View
-- Inner SELECT: group order ids per user (table ORDERS).
-- Outer SELECT: fetch the user name (table USERS)
create or replace view userorders (
userid, username, userdata
)
as
select
U.id, U.name, O.orders_
from (
select
userid
, group_concat( id order by id separator '|' ) as orders_
from orders
group by userid
) O join users U on O.userid = U.id ;
Once the view is in place, you can just SELECT from it, and you will always get the current "userdata" eg
select * from userorders ;
-- result
userid username userdata
1 John 101|102
2 Michael 103|104
3 Someone 105
-- add some more orders
insert into orders ( id, userid, content ) values
(1000, 1, '***'),(4000, 1, '***'),(7000, 1, '***')
,(2000, 2, ':::'),(5000, 2, ':::'),(8000, 2, ':::')
,(3000, 3, '###'),(6000, 3, '###'),(9000, 3, '###') ;
select * from userorders ;
-- result
userid username userdata
1 John 101|102|1000|4000|7000
2 Michael 103|104|2000|5000|8000
3 Someone 105|3000|6000|9000
I have this cat_id - post_id relation table.
+----+--------+---------+
| id | cat_id | post_id |
| | | |
| 1 | 11 | 32 |
| 2 | ... | ... |
+----+--------+---------+
I use SELECT WHERE cat_id = 11 AND post_id = 32 and then if no result found, I do INSERT.
Can I rewrite these two queries as one?
You can do something like this:
insert into cats_rel(cat_id, post_id)
select 11, 32
where not exists (select 1 from cats_rel where cat_id = 11 and post_id = 32);
EDIT:
Oops. That above doesn't work in MySQL because it is missing a from clause (works in many other databases, though). In any case, I usually write this putting the values in a subquery, so they only appear in the query once:
insert into cats_rel(cat_id, post_id)
select toinsert.cat_id, toinsert.post_id
from (select 11 as cat_id, 32 as post_id) toinsert
where not exists (select 1
from cats_rel cr
where cr.cat_id = toinsert.cat_id and cr.post_id = toinsert.post_id
);
You can use Replace
REPLACE INTO `yourtable`
SET `cat_id` = 11, `post_id` = 32;
If the record exists it will be overwritten; otherwise it will be created.
Update :
For this to work you should add a unique key on the pair of columns, not on just one of them:
ALTER TABLE yourtable ADD UNIQUE INDEX cat_post_unique (cat_id, post_id);
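Keep in mind that REPLACE deletes the existing row and inserts a new one, so the auto-increment id of that row changes. If you would rather leave an existing row untouched, something along these lines might fit better. It is a sketch that assumes the unique index on (cat_id, post_id) is already in place.

```sql
-- Option 1: skip the insert silently when the pair already exists.
-- Note that IGNORE also downgrades other errors to warnings.
INSERT IGNORE INTO yourtable (cat_id, post_id)
VALUES (11, 32);

-- Option 2: same effect via a no-op update on a duplicate key.
INSERT INTO yourtable (cat_id, post_id)
VALUES (11, 32)
ON DUPLICATE KEY UPDATE cat_id = cat_id;
```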
We can use "from dual" clause for MySQL:
insert into cats_rel(cat_id, post_id)
select 11, 32 from dual
where not exists (select 1 from cats_rel where cat_id = 11 and post_id = 32);
I've the following three tables:
Table A:
id VARCHAR(32) | value VARCHAR(32) | groupId INT
abcdef | myValue1 | 1
ghijkl | myValue2 | 2
mnopqr | myValue3 | 1
Table B:
id VARCHAR(32) | value VARCHAR(32) | userId INT
abcdef | myValue4 | 1
uvwxyz | anotherValue | 1
Table C:
id VARCHAR(32) | someOtherColumns...
abcdef
ghijkl
mnopqr
...
uvwxyz
Table A and B are used for a m:n-association, thus the "id"-column in both tables references the same field ("id"-column in table c).
What I want to do is (for instance)... select all entries in table A where groupId = 1
SELECT * FROM TableA WHERE groupId = 1
and also select all entries in table B where userId = 1
SELECT * FROM TableB WHERE userId = 1
That's all no problem... but the following makes the select-statement(s) difficult: How can I merge both select-results and replace the value of the first result? For example:
selecting all entries in Table A where groupId = 1 I'll get abcdef and also mnopqr.
when I select all entries in Table B where userId = 1 I'll also get abcdef (and additionally uvwxyz).
Now, the value of abcdef in Table B should replace the value in the selection result of table A. And the uvwxyz-entry should be added to the result.
Finally I'm looking for a query which produces the following table:
id VARCHAR(32) | value VARCHAR(32)
abcdef | myValue4 -- myValue1 from the select-statement in tableA should be overwritten
mnopqr | myValue3 -- from table A
uvwxyz | anotherValue -- from table B
I hope someone knows how to do this... thanks in advance for any suggestion! By the way... it would be great if there is any chance to realize this using one single (long) select statement.
Try this:
SELECT * FROM TableB WHERE userId = 1
UNION
SELECT * FROM TableA WHERE groupId = 1
and id not in (select id from TableB where userid = 1)
@rs points out to use the UNION, which is required since MySQL doesn't have FULL joins.
Favoring the data from table B is a job for CASE:
select id, case when max(value_b) is not null then max(value_b) else max(value_a) end as final_value
from (
select id, value as 'value_a', null as 'value_b' from tableA
union
select id, null, value from tableB
) ugh
group by 1;
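The query above reads both tables in full. A variant that applies the groupId and userId filters from the question could look like the sketch below; COALESCE does the same job as the CASE, and the alias merged is just a name I picked. It assumes each id appears at most once per filtered table.

```sql
SELECT id,
       COALESCE(MAX(value_b), MAX(value_a)) AS final_value  -- prefer table B
FROM (
    SELECT id, value AS value_a, NULL AS value_b
    FROM TableA
    WHERE groupId = 1
    UNION ALL
    SELECT id, NULL, value
    FROM TableB
    WHERE userId = 1
) AS merged
GROUP BY id;
```

With the sample rows this should give myValue4 for abcdef, myValue3 for mnopqr and anotherValue for uvwxyz.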
I have a table with name-value pairs and additional attribute. The same name can have more than one value. If that happens I want to return the row which has a higher attribute value.
Table:
ID | name | value | attribute
1 | set1 | 1 | 0
2 | set2 | 2 | 0
3 | set3 | 3 | 0
4 | set1 | 4 | 1
Desired results of query:
name | value
set2 | 2
set3 | 3
set1 | 4
What is the best performing sql query to get the desired results?
the best performing query would be as follows:
select
s.set_id,
s.name as set_name,
a.attrib_id,
a.name as attrib_name,
sav.value
from
sets s
inner join set_attribute_values sav on
sav.set_id = s.set_id and sav.attrib_id = s.max_attrib_id
inner join attributes a on sav.attrib_id = a.attrib_id
order by
s.set_id;
+--------+----------+-----------+-------------+-------+
| set_id | set_name | attrib_id | attrib_name | value |
+--------+----------+-----------+-------------+-------+
| 1 | set1 | 3 | attrib3 | 20 |
| 2 | set2 | 0 | attrib0 | 10 |
| 3 | set3 | 0 | attrib0 | 10 |
| 4 | set4 | 4 | attrib4 | 10 |
| 5 | set5 | 2 | attrib2 | 10 |
+--------+----------+-----------+-------------+-------+
obviously for this to work you're gonna also have to normalise your design and implement a simple trigger:
drop table if exists attributes;
create table attributes
(
attrib_id smallint unsigned not null primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists sets;
create table sets
(
set_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null,
max_attrib_id smallint unsigned not null default 0,
key (max_attrib_id)
)
engine=innodb;
drop table if exists set_attribute_values;
create table set_attribute_values
(
set_id smallint unsigned not null,
attrib_id smallint unsigned not null,
value int unsigned not null default 0,
primary key (set_id, attrib_id)
)
engine=innodb;
delimiter #
create trigger set_attribute_values_before_ins_trig
before insert on set_attribute_values
for each row
begin
update sets set max_attrib_id = new.attrib_id
where set_id = new.set_id and max_attrib_id < new.attrib_id;
end#
delimiter ;
insert into attributes values (0,'attrib0'),(1,'attrib1'),(2,'attrib2'),(3,'attrib3'),(4,'attrib4');
insert into sets (name) values ('set1'),('set2'),('set3'),('set4'),('set5');
insert into set_attribute_values values
(1,0,10),(1,3,20),(1,1,30),
(2,0,10),
(3,0,10),
(4,4,10),(4,2,20),
(5,2,10);
This solution will probably perform the best:
Select ...
From Table As T
Left Join Table As T2
On T2.name = T.name
And T2.attribute > T.attribute
Where T2.ID Is Null
Another solution which may not perform as well (you would need to evaluate against your data):
Select ...
From Table As T
Where Not Exists (
Select 1
From Table As T2
Where T2.name = T.name
And T2.attribute > T.attribute
)
select name,max(value)
from table
group by name
SELECT name, value
FROM (SELECT name, value, attribute
FROM table_name
ORDER BY attribute DESC) AS t
GROUP BY name;
There is no easy way to do this.
A similar question was asked here.
Edit: Here's a suggestion:
SELECT `name`,`value` FROM `mytable` ORDER BY `name`,`attribute` DESC
This isn't quite what you asked for, but it'll at least give you the higher attribute values first, and you can ignore the rest.
Edit again: Another suggestion:
If you know that value is a positive integer, you can do this. It's yucky, but it'll work.
SELECT `name`,CAST(GROUP_CONCAT(`value` ORDER BY `attribute` DESC) AS UNSIGNED) FROM `mytable` GROUP BY `name`
To include negative integers you could change UNSIGNED to SIGNED.
Might want to benchmark all these options, here's another one.
SELECT t1.name, t1.value
FROM temp t1
WHERE t1.attribute IN (
SELECT MAX(t2.attribute)
FROM temp t2
WHERE t2.name = t1.name);
How about:
SELECT ID, name, value, attribute
FROM table A
WHERE A.attribute = (SELECT MAX(B.attribute) FROM table B WHERE B.NAME = A.NAME);
Edit: Seems like someone said the same already.
Did not benchmark them, but here is how it is doable:
TableName = temm
1) Row with maximum value of attribute :
select t.name, t.value
from (
select name, max(attribute) as maxattr
from temm group by name
) as x inner join temm as t on t.name = x.name and t.attribute = x.maxattr;
2) Top N rows with maximum attribute value :
select name, value
from temm
where (
select count(*) from temm as n
where n.name = temm.name and n.attribute > temm.attribute
) < 1 ; /* 1 can be changed to 2,3,4 ..N to get N rows */
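If your server supports window functions (MySQL 8.0+ or MariaDB 10.2+), the same "row with the highest attribute per name" result can be sketched with ROW_NUMBER(). The aliases ranked and rn are my own; the table name temm is the one used above.

```sql
SELECT name, value
FROM (
    SELECT name,
           value,
           -- number the rows of each name, highest attribute first
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY attribute DESC) AS rn
    FROM temm
) AS ranked
WHERE rn = 1;   -- change to rn <= N for the top N rows per name
```

When two rows of the same name tie on attribute, this picks one of them arbitrarily; add a second ORDER BY column (for example ID) if you need a deterministic choice.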