I have the following tables (minified for the sake of simplicity):
CREATE TABLE IF NOT EXISTS `product_bundles` (
`bundle_id` int AUTO_INCREMENT PRIMARY KEY
-- More columns here for bundle attributes
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `product_bundle_parts` (
`part_id` int AUTO_INCREMENT PRIMARY KEY,
`bundle_id` int NOT NULL,
`sku` varchar(255) NOT NULL,
-- More columns here for product attributes
KEY `bundle_id` (`bundle_id`),
KEY `sku` (`sku`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `products` (
`product_id` mediumint(8) AUTO_INCREMENT PRIMARY KEY,
`sku` varchar(64) NOT NULL DEFAULT '',
`status` char(1) NOT NULL default 'A',
-- More columns here for product attributes
KEY (`sku`)
) ENGINE=InnoDB;
And I want to show only the 'product bundles' that are currently completely in stock and defined in the database (since these get retrieved from a third party vendor, there is no guarantee the SKU is defined). So I figured I'd need an anti-join to retrieve it accordingly:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
WHERE parts.bundle_id = bundles.bundle_id
AND products.status = 'A'
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Now, I sincerely thought this would filter out the products by status, however, that seems not to be the case. I then changed one thing up a bit, and the query never finished (although I believe it to be correct):
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
AND products.status = 'A'
WHERE parts.bundle_id = bundles.bundle_id
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Example data:
product_bundles
bundle_id | etc.
1 |
2 |
3 |
product_bundle_parts
part_id | bundle_id | sku
1 | 1 | 'sku11'
2 | 1 | 'sku22'
3 | 1 | 'sku33'
4 | 1 | 'sku44'
5 | 2 | 'sku55'
6 | 2 | 'sku66'
7 | 3 | 'sku77'
8 | 3 | 'sku88'
products
product_id | sku | status
101 | 'sku11' | 'A'
102 | 'sku22' | 'A'
103 | 'sku33' | 'A'
104 | 'sku44' | 'A'
105 | 'sku55' | 'D'
106 | 'sku66' | 'A'
107 | 'sku77' | 'A'
108 | 'sku99' | 'A'
Example result: Since the product status of product #105 is 'D' and 'sku88' from part #8 was not found:
bundle_id | etc.
1 |
I am running Server version: 10.3.25-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
So there are a few questions I have.
Why does the first query not filter out products that do not have the status A?
Why does the second query not finish?
Are there alternative ways of achieving the same thing in a more efficient manner? This looks rather cumbersome.
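First of all, I've read that SQL_CALC_FOUND_ROWS is much slower than running two separate queries (a COUNT(*) and then the SELECT *, or, if you run your query from another programming language such as PHP, executing the SELECT * and then counting the rows of the result set).
Second: in your first query the subquery can never return a row. products.status = 'A' only holds for a part that matched a product, while products.product_id IS NULL only holds for a part that matched none, so the two conditions exclude each other. NOT EXISTS is therefore always true and every bundle is returned, while you need only the bundles where ALL products are active.
I'd change it to the following: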
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE NOT EXISTS (
SELECT 'x'
FROM product_bundle_parts AS parts
LEFT JOIN products ON (parts.sku = products.sku)
WHERE parts.bundle_id = bundles.bundle_id
AND COALESCE(products.status, 'X') != 'A'
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
I changed products.status = 'A' to COALESCE(products.status, 'X') != 'A': this way the query returns all the bundles that DON'T have inactive or missing products. COALESCE turns the NULL status of a part without a matching product into 'X', so I also removed the condition AND products.product_id IS NULL (it would have had to be combined with OR, at a cost in performance).
You can see my solution in SQLFiddle.
Finally, to find out why your second query doesn't finish, you should check the structure of your tables and how they are indexed. Running EXPLAIN on the query could help you find possible issues in the structure. Just put the keyword EXPLAIN before the SELECT and you'll have your "report" (EXPLAIN SELECT * ....).
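To check the reasoning above against the example data from the question, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. That is an assumption made for convenience: the LEFT JOIN and NOT EXISTS semantics are standard SQL, but timings and query plans will differ. Table and column names come from the question; TEMPLATE and bundle_ids are made-up helper names.

```python
import sqlite3

# SQLite stands in for MariaDB here (an assumption for a quick local check).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_bundles (bundle_id INTEGER PRIMARY KEY);
    CREATE TABLE product_bundle_parts (
        part_id INTEGER PRIMARY KEY, bundle_id INT NOT NULL, sku TEXT NOT NULL);
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY, sku TEXT NOT NULL, status TEXT NOT NULL);
    INSERT INTO product_bundles VALUES (1), (2), (3);
    INSERT INTO product_bundle_parts VALUES
        (1, 1, 'sku11'), (2, 1, 'sku22'), (3, 1, 'sku33'), (4, 1, 'sku44'),
        (5, 2, 'sku55'), (6, 2, 'sku66'), (7, 3, 'sku77'), (8, 3, 'sku88');
    INSERT INTO products VALUES
        (101, 'sku11', 'A'), (102, 'sku22', 'A'), (103, 'sku33', 'A'),
        (104, 'sku44', 'A'), (105, 'sku55', 'D'), (106, 'sku66', 'A'),
        (107, 'sku77', 'A'), (108, 'sku99', 'A');
""")

# Hypothetical helper: the same anti-join with pluggable ON / WHERE conditions.
TEMPLATE = """
    SELECT bundle_id
    FROM product_bundles AS bundles
    WHERE NOT EXISTS (
        SELECT 1
        FROM product_bundle_parts AS parts
        LEFT JOIN products ON parts.sku = products.sku {on_extra}
        WHERE parts.bundle_id = bundles.bundle_id
          AND {where_extra}
    )
    ORDER BY bundle_id
"""

def bundle_ids(on_extra, where_extra):
    sql = TEMPLATE.format(on_extra=on_extra, where_extra=where_extra)
    return [row[0] for row in conn.execute(sql)]

# First query: both conditions in WHERE contradict each other.
first_ids = bundle_ids("", "products.status = 'A' AND products.product_id IS NULL")
# Second query: the status test moved into the ON clause.
second_ids = bundle_ids("AND products.status = 'A'", "products.product_id IS NULL")
# Rewritten query: COALESCE covers both inactive and missing products.
fixed_ids = bundle_ids("", "COALESCE(products.status, 'X') != 'A'")

print(first_ids, second_ids, fixed_ids)
```

On this data the first query keeps every bundle, while the second query and the COALESCE version both keep only bundle 1, so the second query from the question is logically sound and its problem is one of execution time.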
Related
I have a database with two tables: one table (shops) has an admin user column, and the other (users) holds users with fewer privileges. I plan to LEFT JOIN the users table. When I retrieve the data, the record for the admin user must be on a separate row with NULL values for the left-joined table, followed by the records of the users with fewer privileges (the records of the left-joined table), if any. I am using MySQL.
I have looked into the UNION command, but I don't think it can help. Please see the results below for what I need.
Thank you.
SELECT *
FROM shops LEFT JOIN users USING(shop_id)
WHERE shop_id = 1 AND (admin_id = 1 OR user_id = 1);
+---------+----------+---------+
| shop_id | admin_id | user_id |
+---------+----------+---------+
| 1 | 1 | NULL | <-- Need this one extra record
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 3 |
+---------+----------+---------+
Here is an example structure of the databases and some sample data:
CREATE SCHEMA test DEFAULT CHARACTER SET utf8 ;
USE test;
CREATE TABLE admin(
admin_id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(admin_id)
);
CREATE TABLE shops(
shop_id INT NOT NULL AUTO_INCREMENT,
admin_id INT NOT NULL,
PRIMARY KEY(shop_id),
CONSTRAINT fk_shop_admin FOREIGN KEY(admin_id) REFERENCES admin (admin_id)
);
CREATE TABLE users(
user_id INT NOT NULL AUTO_INCREMENT,
shop_id INT NOT NULL,
PRIMARY KEY(user_id),
CONSTRAINT fk_user_shop FOREIGN KEY(shop_id) REFERENCES shops (shop_id)
);
-- Sample data
INSERT INTO admin() VALUES ();
INSERT INTO shops(admin_id) VALUES (1);
INSERT INTO users(shop_id) VALUES (1),(1),(1);
I think you need union all:
select s.shop_id, s.admin_id, null as user_id
from shops s
where s.shop_id = 1
union all
select s.shop_id, s.admin_id, u.user_id
from shops s join
users u
on s.shop_id = u.shop_id
where s.shop_id = 1;
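As a quick check of the UNION ALL idea, here is a small sketch using Python's built-in sqlite3 module in place of MySQL (an assumption for convenience; the admin table and the foreign keys from the question are left out because they do not affect the result). The ORDER BY is added only to make the row order deterministic.

```python
import sqlite3

# SQLite stands in for MySQL; only the two tables the query touches are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shops (shop_id INTEGER PRIMARY KEY, admin_id INT NOT NULL);
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, shop_id INT NOT NULL);
    INSERT INTO shops (admin_id) VALUES (1);
    INSERT INTO users (shop_id) VALUES (1), (1), (1);
""")

rows = conn.execute("""
    SELECT s.shop_id, s.admin_id, NULL AS user_id
    FROM shops s
    WHERE s.shop_id = 1
    UNION ALL
    SELECT s.shop_id, s.admin_id, u.user_id
    FROM shops s
    JOIN users u ON s.shop_id = u.shop_id
    WHERE s.shop_id = 1
    ORDER BY 3  -- NULL sorts first, so the admin-only row leads
""").fetchall()

for row in rows:
    print(row)
```

The first branch contributes the admin-only row with a NULL user_id, and the second branch contributes one row per user of the shop.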
Put your WHERE condition in the ON clause:
SELECT *
FROM shops LEFT JOIN users on shops.shop_id=users.shop_id and (admin_id = 1 OR user_id = 1)
WHERE shops.shop_id = 1
Question about writing an efficient SQL query with multiple subqueries:
I have 3 tables. I want to get details from table 1, based on filtering done from table 2 and table 3. Currently I am using IN clause on table 2 and table 3 but it takes around 6 seconds for 2M users. I tried join also but it was slower than subquery.
Table1:
mysql> describe users;
| Field     | Type         | Null | Key | Default           |
| uuid      | varchar(36)  | NO   | PRI | NULL              |
| firstname | varchar(512) | YES  |     | NULL              |
| status    | varchar(512) | YES  |     | NULL              |
| createdAt | timestamp    | YES  |     | CURRENT_TIMESTAMP |
Table 2:
mysql> describe homes;
| Field                    | Type         | Null | Key | Default | Extra |
| uuid                     | varchar(50)  | NO   | PRI | NULL    |       |
| phoneNumberHash          | varchar(512) | YES  | MUL | NULL    |       |
| secondaryPhoneNumberHash | varchar(512) | YES  | MUL | NULL    |       |
Table 3:
mysql> describe utility_tags;
| Field     | Type        | Null | Key | Default |
| tag_name  | varchar(50) | NO   | MUL | NULL    |
| tag_value | varchar(50) | NO   | MUL | NULL    |
| user_id   | varchar(50) | NO   | MUL | NULL    |
I have indexes on all the required fields, i.e.:
User Table : Index on uuid
Home Table : Separate Index on phoneNumberHash and secondaryPhoneNumberHash
Utility_Tags: Separate Index on tag_name and tag_value
Query I am running:
SELECT uuid, firstname
FROM users
WHERE ( uuid in (
SELECT `uuid`
FROM `homes`
WHERE ( ( `phoneNumberHash` = '02c' OR `secondaryPhoneNumberHash` = '02c' ))
)
OR uuid in (
SELECT `user_id`
FROM `utility_tags`
WHERE ( `tag_name` = 'ACCOUNT_NUMBER' AND `tag_value`= '13' )
))
AND `status` != 'DELETED'
ORDER BY `createdAt` DESC LIMIT 10 OFFSET 0;
The query is slow and takes around 6 sec when there are 2M rows in user and homes table.
I tried join query:
SELECT users.uuid, firstname
FROM users inner join homes on homes.uuid=users.uuid
inner join utility_tags on utility_tags.user_id=users.uuid
WHERE ( phoneNumberHash = '02c' OR secondaryPhoneNumberHash = '02cd0' )
OR ( tag_name = 'ACCOUNT_NUMBER' AND tag_value= '1311851988' )
AND `status` != 'DELETED'
ORDER BY `createdAt` DESC
LIMIT 10 OFFSET 0;
This takes around 30 seconds.
Any help is highly appreciated.
You are selecting certain rows from your users table based on matches in your other tables. You're using a complex IN( ... ) clause for that.
Let's look at the contents of that clause for optimization possibilities. Here's one way you generate a set of uuid values.
SELECT uuid
FROM homes
WHERE phoneNumberHash = '02c'
OR secondaryPhoneNumberHash = '02c'
Here's the other.
SELECT user_id
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value= '13'
Let's recast all this as a UNION of several sets of uuid values, like this.
SELECT uuid FROM homes WHERE phoneNumberHash = '02c'
UNION
SELECT uuid FROM homes WHERE secondaryPhoneNumberHash = '02c'
UNION
SELECT user_id AS uuid
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value= '13'
That union of three queries does the same thing as all your OR clauses. The first two of those queries should (if you're using InnoDB) be optimized by the indexes on phoneNumberHash and secondaryPhoneNumberHash respectively. The third query in that union needs a compound index on (tag_name, tag_value, user_id) to perform efficiently.
The cool thing about UNION is it does the same sort of set creation as OR, but lets you write queries within the UNION that are more likely to use indexes. I suggest you experiment with this UNION query and appropriate indexes until you're happy with its performance. Then you can use it in your outer query.
(It's possible that the query planner has become smart enough to handle phoneNumberHash = '02c' OR secondaryPhoneNumberHash = '02c' as a UNION all by itself, exploiting your two indexes one after the other. Recent MySQL versions have made great progress in query planning.)
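To illustrate that the UNION form selects the same users as the original OR / IN form, here is a small sketch using Python's built-in sqlite3 module instead of MySQL (an assumption for convenience, so it says nothing about MySQL's query plans or timings). The sample rows and the index name idx_tags_name_value_user are made up for illustration; the table and column names follow the question.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are invented test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (uuid TEXT PRIMARY KEY, firstname TEXT);
    CREATE TABLE homes (uuid TEXT PRIMARY KEY,
                        phoneNumberHash TEXT, secondaryPhoneNumberHash TEXT);
    CREATE TABLE utility_tags (tag_name TEXT, tag_value TEXT, user_id TEXT);
    -- Compound index suggested above (the index name is hypothetical).
    CREATE INDEX idx_tags_name_value_user
        ON utility_tags (tag_name, tag_value, user_id);
    INSERT INTO users VALUES ('u1','a'),('u2','b'),('u3','c'),
                             ('u4','d'),('u5','e'),('u6','f');
    INSERT INTO homes VALUES ('u1','02c','zzz'), ('u2','aaa','02c'),
                             ('u3','bbb','ccc'), ('u4','02c','02c');
    INSERT INTO utility_tags VALUES
        ('ACCOUNT_NUMBER','13','u3'), ('ACCOUNT_NUMBER','99','u5'),
        ('OTHER','13','u6'), ('ACCOUNT_NUMBER','13','u1');
""")

# Original shape: two IN subqueries combined with OR.
or_ids = [r[0] for r in conn.execute("""
    SELECT uuid FROM users
    WHERE uuid IN (SELECT uuid FROM homes
                   WHERE phoneNumberHash = '02c'
                      OR secondaryPhoneNumberHash = '02c')
       OR uuid IN (SELECT user_id FROM utility_tags
                   WHERE tag_name = 'ACCOUNT_NUMBER' AND tag_value = '13')
    ORDER BY uuid
""")]

# Recast shape: one UNION of three single-condition queries, joined to users.
union_ids = [r[0] for r in conn.execute("""
    SELECT u.uuid
    FROM users u
    JOIN (
        SELECT uuid FROM homes WHERE phoneNumberHash = '02c'
        UNION
        SELECT uuid FROM homes WHERE secondaryPhoneNumberHash = '02c'
        UNION
        SELECT user_id AS uuid FROM utility_tags
        WHERE tag_name = 'ACCOUNT_NUMBER' AND tag_value = '13'
    ) s ON s.uuid = u.uuid
    ORDER BY u.uuid
""")]

print(or_ids, union_ids)
```

Because UNION removes duplicates, a user matched by more than one branch (u1 and u4 here) still appears only once after the join.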
So that leaves us with the outer query:
SELECT uuid, firstname
FROM users
WHERE uuid IN ( /* the UNION query above */ )
AND status != 'DELETED'
ORDER BY createdAt DESC
LIMIT 10 OFFSET 0
This is hard to make sargable. The query planner doesn't like != operators. It likes = best because index equality scans are cheap. It likes <, <=, >=, and > OK because range scans are almost as cheap. But you're stuck with !=.
Also, the query planner hates ORDER BY ... LIMIT because it has to sort a whole mess of rows just to discard all except a tiny number.
The following compound covering index MAY optimize this query: (createdAt, status, uuid, firstname). The query planner may be able to dodge the separate ORDER BY if it has an index that provides both the match criteria and the needed results. It's also possible that this index will be better: (createdAt, uuid, status, firstname). You'll need to try them both. Don't keep them both, only the one that helps best.
Putting it all together:
SELECT u.uuid, u.firstname
FROM users u
JOIN (
SELECT uuid FROM homes WHERE phoneNumberHash = '02c'
UNION
SELECT uuid FROM homes WHERE secondaryPhoneNumberHash = '02c'
UNION
SELECT user_id AS uuid
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value= '13'
) s ON s.uuid = u.uuid
WHERE status != 'DELETED'
ORDER BY createdAt DESC
LIMIT 10 OFFSET 0
Things get interesting on megarow tables when you want subsecond query response. http://use-the-index-luke.com/ is a fine reference for this stuff.
Your main problem is you're selecting from users first - move it to last so its index can be used (subqueries can't be indexed).
Also, SQL OR is notorious, mainly because (almost always) at most 1 index can be used.
Select from the subquery first, so the index into users can be used
Ensure there are indexes on all looked-up columns, ie (uuid), (phoneNumberHash), (secondaryPhoneNumberHash) and (tag_name, tag_value)
Break up your query to eradicate OR
Try this:
SELECT users.uuid, firstname
FROM (
SELECT uuid
FROM homes
WHERE phoneNumberHash = '02c'
UNION
SELECT uuid
FROM homes
WHERE secondaryPhoneNumberHash = '02c'
UNION
SELECT user_id
FROM utility_tags
WHERE tag_name = 'ACCOUNT_NUMBER'
AND tag_value = '13'
) x
JOIN users ON users.uuid = x.uuid
AND status != 'DELETED'
ORDER BY createdAt DESC
LIMIT 10 OFFSET 0
Notice also that the test for status != 'DELETED' is in the join condition rather than the WHERE clause. For an inner join the result is the same either way, and the optimizer generally treats both placements alike, so check with EXPLAIN before counting on a performance gain from this.
I have the following table:
id | parent_id | searchable | value
--------------------------------------------
1 | 0 | 0 | a
2 | 1 | 0 | b
3 | 2 | 1 | c
4 | 0 | 0 | d
5 | 4 | 1 | e
6 | 0 | 0 | f
7 | 6 | 0 | g
8 | 6 | 0 | h
9 | 0 | 1 | i
I need to extract all the top level records (so the ones where the parent_id = 0).
But only the records where the parent OR one of its children is searchable (searchable = 1).
So in this case, the output should be:
id | parent_id | searchable | value
--------------------------------------------
1 | 0 | 0 | a
4 | 0 | 0 | d
9 | 0 | 1 | i
These are all top-level records where the record itself or one of its children (it doesn't matter how 'deep' the searchable child is) is searchable.
I am working with MySQL. I am not really sure if it is possible to write this with just one query, but I assume it should be done with a piece of recursive code or a function.
** Note: it is unknown how 'deep' the tree goes.
You will have to use a stored procedure to do it (on MySQL versions before 8.0, which lack recursive queries).
Find all rows with searchable = 1, store their ids and parent_ids in a temp table.
Then do self-joins to add parents to this temp table.
Repeat until no more rows can be added (obviously better make sure tree is not cyclic).
At the end you have a table only with rows that have a searchable descendant somewhere down the tree, so just show only rows with no parent (at the top).
Assuming your table is called 'my_table' this one should work:
DELIMITER //
DROP PROCEDURE IF EXISTS top_level_parents//
CREATE PROCEDURE top_level_parents()
BEGIN
DECLARE found INT(11) DEFAULT 1;
DROP TABLE IF EXISTS parent_tree;
CREATE TABLE parent_tree (id int(11) PRIMARY KEY, p_id int(11)) ENGINE=HEAP;
INSERT INTO parent_tree
SELECT id, parent_id FROM my_table
WHERE searchable = 1;
SET found = ROW_COUNT();
WHILE found > 0 DO
INSERT IGNORE INTO parent_tree
SELECT p.id, p.parent_id FROM parent_tree c JOIN my_table p
ON p.id = c.p_id;
SET found = ROW_COUNT();
END WHILE;
SELECT id FROM parent_tree WHERE p_id = 0;
DROP TABLE parent_tree;
END;//
DELIMITER ;
Then just calling it:
CALL top_level_parents();
will be equal to
SELECT id FROM my_table WHERE id_is_top_level_and_has_searchable_descendant
Recursive queries can be done in newer MySQL versions; they were possibly not around back when this was asked.
WITH RECURSIVE is available as of MySQL 8.0:
https://dev.mysql.com/doc/refman/8.0/en/with.html
As a generic example, the following gets parent and child rows where the top-level parent has a name of "A", "B" or "C".
The first part gets the top-level parents and filters them; the second part gets the children by joining them to their parents.
WITH RECURSIVE tree AS (
SELECT id,
name,
parent_id,
1 as level
FROM category
WHERE parent_id = 0 AND (name = 'A' or name = 'B' or name = 'C')
UNION ALL
SELECT c.id,
c.name,
c.parent_id,
t.level + 1
FROM category c
JOIN tree t ON c.parent_id = t.id
)
SELECT *
FROM tree;
To find out whether the parent or one of its children is searchable, you can carry that value down the tree with COALESCE(NULLIF(t.searchable,0), NULLIF(c.searchable,0)), carry the top-level parent id along as well, and then join back against it.
So to initialize your example data:
CREATE TABLE `category` (
`id` int(11) NOT NULL,
`parent_id` int(11) NULL DEFAULT NULL,
`searchable` int(11) NULL DEFAULT NULL,
`value` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
INSERT INTO category (id, parent_id, searchable, value) VALUES
(1,0,0,'a'),
(2,1,0,'b'),
(3,2,1,'c'),
(4,0,0,'d'),
(5,4,1,'e'),
(6,0,0,'f'),
(7,6,0,'g'),
(8,6,0,'h'),
(9,0,1,'i');
And to answer the question.
WITH RECURSIVE tree AS (
SELECT id,
value,
parent_id,
1 as level,
searchable,
id AS top_level_id
FROM category
WHERE parent_id = 0
UNION ALL
SELECT c.id,
c.value,
c.parent_id,
t.level + 1,
COALESCE(NULLIF(t.searchable,0), NULLIF(c.searchable,0)),
t.top_level_id AS top_level_id
FROM category c
JOIN tree t ON c.parent_id = t.id
)
SELECT DISTINCT category.*
FROM category
JOIN tree ON tree.top_level_id = category.id
WHERE tree.searchable = 1;
Note: this does not handle cyclic linkages.
If you have those, you need to remove them, add a constraint so they cannot occur, or possibly carry a "visited" column through the recursion in much the same way as the top-level id.
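For a quick way to try the recursive approach on the example data, here is a sketch using Python's built-in sqlite3 module, which also supports WITH RECURSIVE (using SQLite instead of MySQL 8.0 is an assumption for convenience). It is a slightly simplified variation of the query above: each row keeps its own searchable flag plus the id of its top-level ancestor, and the outer query picks the top-level ids that have at least one searchable row.

```python
import sqlite3

# SQLite stands in for MySQL 8.0; the data is the example from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id INTEGER PRIMARY KEY, parent_id INT, searchable INT, value TEXT);
    INSERT INTO category (id, parent_id, searchable, value) VALUES
        (1,0,0,'a'), (2,1,0,'b'), (3,2,1,'c'),
        (4,0,0,'d'), (5,4,1,'e'),
        (6,0,0,'f'), (7,6,0,'g'), (8,6,0,'h'),
        (9,0,1,'i');
""")

top_level_ids = [r[0] for r in conn.execute("""
    WITH RECURSIVE tree (id, searchable, top_level_id) AS (
        -- Anchor: every top-level row is its own top-level ancestor.
        SELECT id, searchable, id
        FROM category
        WHERE parent_id = 0
        UNION ALL
        -- Recursive step: children inherit the ancestor id of their parent.
        SELECT c.id, c.searchable, t.top_level_id
        FROM category c
        JOIN tree t ON c.parent_id = t.id
    )
    SELECT DISTINCT top_level_id
    FROM tree
    WHERE searchable = 1
    ORDER BY top_level_id
""")]

print(top_level_ids)
```

Like the query above, this assumes the tree contains no cycles; with a cycle the recursion would not terminate.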
I have an SQL table that contains the names of people and respective country codes.
----------------
name | code
----------------
saket | IN
rohan | US
samules | AR
Geeth | CH
Vikash | IN
Rahul | IN
Ganesh | US
Zorro | US
What I want is one row per country code: preferably a name starting with sa; if there is none, a name starting with Vi; and if there is none of those either, the last row of the group.
I tried this:
SELECT * FROM MyTable GROUP BY code HAVING name like 'sa%' or name like 'vi%';
But it only gives me the rows that match the condition in the HAVING clause.
If the condition fails, I want the last row of that group instead. Is that possible?
If so, how?
Maybe not very efficient, but try this (note that FIRST() and LAST() are MS Access aggregate functions; MySQL does not provide them):
SELECT FIRST(`name`) AS `name`, `code` FROM (
SELECT `name`, `code` FROM `MyTable`
WHERE `name` LIKE 'sa%'
UNION ALL
SELECT `name`, `code` FROM `MyTable`
WHERE `name` LIKE 'vi%'
UNION ALL
SELECT LAST(`name`) AS `name`, `code` FROM `MyTable` GROUP BY `code`
HAVING `name` NOT LIKE 'sa%' AND `name` NOT LIKE 'vi%'
) AS `a` GROUP BY `code`
You can try this query. It returns what you need, but be aware that it has two pitfalls:
The subquery is a pain on 10^6 rows.
The name field in the outer query is nonaggregated. The MySQL documentation says that the server is free to choose any value from each group for a nonaggregated column, so the result is not guaranteed:
http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
select name, country
from
(
select *, if(name like 'sa%', 0, if(name like 'vi%', 2, 3) ) as name_order
from tmp_names
order by country, name_order, name desc
) as tmp_names
group by country
order by name;
It returns
+---------+---------+
| name | country |
+---------+---------+
| Geeth | CH |
| saket | IN |
| samules | AR |
| Zorro | US |
+---------+---------+
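One way to avoid relying on nonaggregated columns is to rank the rows of each group explicitly and keep the first one. The sketch below uses Python's built-in sqlite3 module and the ROW_NUMBER() window function, which needs SQLite 3.25 or later (MySQL 8.0 and MariaDB 10.2 or later also support window functions). It rests on two assumptions that are not in the question: an id column that records insertion order, so that "last row of the group" is well defined, and case-insensitive LIKE matching, which is SQLite's default for ASCII.

```python
import sqlite3

# SQLite stands in for MySQL; the id column is an assumption added so that
# "the last row of the group" has a defined meaning.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT, code TEXT);
    INSERT INTO MyTable (name, code) VALUES
        ('saket','IN'), ('rohan','US'), ('samules','AR'), ('Geeth','CH'),
        ('Vikash','IN'), ('Rahul','IN'), ('Ganesh','US'), ('Zorro','US');
""")

rows = conn.execute("""
    SELECT name, code
    FROM (
        SELECT name, code,
               ROW_NUMBER() OVER (
                   PARTITION BY code
                   ORDER BY CASE WHEN name LIKE 'sa%' THEN 0   -- best match
                                 WHEN name LIKE 'vi%' THEN 1   -- second choice
                                 ELSE 2 END,                   -- fallback
                            id DESC                            -- latest row wins ties
               ) AS rn
        FROM MyTable
    ) AS ranked
    WHERE rn = 1
    ORDER BY code
""").fetchall()

for row in rows:
    print(row)
```

Each group is sorted by match priority first and by id descending second, so a group without any sa or vi name falls back to its most recently inserted row.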
I have a table with name-value pairs and additional attribute. The same name can have more than one value. If that happens I want to return the row which has a higher attribute value.
Table:
ID | name | value | attribute
1 | set1 | 1 | 0
2 | set2 | 2 | 0
3 | set3 | 3 | 0
4 | set1 | 4 | 1
Desired results of query:
name | value
set2 | 2
set3 | 3
set1 | 4
What is the best performing sql query to get the desired results?
the best performing query would be as follows:
select
s.set_id,
s.name as set_name,
a.attrib_id,
a.name as attrib_name,
sav.value
from
sets s
inner join set_attribute_values sav on
sav.set_id = s.set_id and sav.attrib_id = s.max_attrib_id
inner join attributes a on sav.attrib_id = a.attrib_id
order by
s.set_id;
+--------+----------+-----------+-------------+-------+
| set_id | set_name | attrib_id | attrib_name | value |
+--------+----------+-----------+-------------+-------+
| 1 | set1 | 3 | attrib3 | 20 |
| 2 | set2 | 0 | attrib0 | 10 |
| 3 | set3 | 0 | attrib0 | 10 |
| 4 | set4 | 4 | attrib4 | 10 |
| 5 | set5 | 2 | attrib2 | 10 |
+--------+----------+-----------+-------------+-------+
obviously for this to work you're gonna also have to normalise your design and implement a simple trigger:
drop table if exists attributes;
create table attributes
(
attrib_id smallint unsigned not null primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists sets;
create table sets
(
set_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null,
max_attrib_id smallint unsigned not null default 0,
key (max_attrib_id)
)
engine=innodb;
drop table if exists set_attribute_values;
create table set_attribute_values
(
set_id smallint unsigned not null,
attrib_id smallint unsigned not null,
value int unsigned not null default 0,
primary key (set_id, attrib_id)
)
engine=innodb;
delimiter #
create trigger set_attribute_values_before_ins_trig
before insert on set_attribute_values
for each row
begin
update sets set max_attrib_id = new.attrib_id
where set_id = new.set_id and max_attrib_id < new.attrib_id;
end#
delimiter ;
insert into attributes values (0,'attrib0'),(1,'attrib1'),(2,'attrib2'),(3,'attrib3'),(4,'attrib4');
insert into sets (name) values ('set1'),('set2'),('set3'),('set4'),('set5');
insert into set_attribute_values values
(1,0,10),(1,3,20),(1,1,30),
(2,0,10),
(3,0,10),
(4,4,10),(4,2,20),
(5,2,10);
This solution will probably perform the best:
Select ...
From Table As T
Left Join Table As T2
On T2.name = T.name
And T2.attribute > T.attribute
Where T2.ID Is Null
Another solution which may not perform as well (you would need to evaluate against your data):
Select ...
From Table As T
Where Not Exists (
Select 1
From Table As T2
Where T2.name = T.name
And T2.attribute > T.attribute
)
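Both forms above can be tried on the sample data from the question with the following sketch. It uses Python's built-in sqlite3 module instead of MySQL (an assumption for convenience, so it shows that the results agree but says nothing about which form is faster), and the table name settings is made up because the question does not name the table.

```python
import sqlite3

# SQLite stands in for MySQL; "settings" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE settings (
        id INTEGER PRIMARY KEY, name TEXT, value INT, attribute INT);
    INSERT INTO settings VALUES
        (1, 'set1', 1, 0), (2, 'set2', 2, 0),
        (3, 'set3', 3, 0), (4, 'set1', 4, 1);
""")

# Anti-join: keep rows for which no row with the same name
# and a higher attribute exists.
left_join_rows = conn.execute("""
    SELECT T.name, T.value
    FROM settings AS T
    LEFT JOIN settings AS T2
           ON T2.name = T.name
          AND T2.attribute > T.attribute
    WHERE T2.id IS NULL
    ORDER BY T.name
""").fetchall()

# The same condition expressed with NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT T.name, T.value
    FROM settings AS T
    WHERE NOT EXISTS (
        SELECT 1
        FROM settings AS T2
        WHERE T2.name = T.name
          AND T2.attribute > T.attribute
    )
    ORDER BY T.name
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
```

If two rows share both the name and the highest attribute, both forms return both rows, so a tie-breaker is needed when exactly one row per name is required.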
select name,max(value)
from table
group by name
SELECT name, value
FROM (SELECT name, value, attribute
FROM table_name
ORDER BY attribute DESC) AS t
GROUP BY name;
There is no easy way to do this.
A similar question was asked here.
Edit: Here's a suggestion:
SELECT `name`,`value` FROM `mytable` ORDER BY `name`,`attribute` DESC
This isn't quite what you asked for, but it'll at least give you the higher attribute values first, and you can ignore the rest.
Edit again: Another suggestion:
If you know that value is a positive integer, you can do this. It's yucky, but it'll work.
SELECT `name`,CAST(GROUP_CONCAT(`value` ORDER BY `attribute` DESC) AS UNSIGNED) FROM `mytable` GROUP BY `name`
To include negative integers you could change UNSIGNED to SIGNED.
Might want to benchmark all these options, here's another one.
SELECT t1.name, t1.value
FROM temp t1
WHERE t1.attribute IN (
SELECT MAX(t2.attribute)
FROM temp t2
WHERE t2.name = t1.name);
How about:
SELECT ID, name, value, attribute
FROM table A
WHERE A.attribute = (SELECT MAX(B.attribute) FROM table B WHERE B.NAME = A.NAME);
Edit: Seems like someone has said the same already.
Did not benchmark them, but here is how it is doable:
TableName = temm
1) Row with maximum value of attribute :
select t.name, t.value
from (
select name, max(attribute) as maxattr
from temm group by name
) as x inner join temm as t on t.name = x.name and t.attribute = x.maxattr;
2) Top N rows with maximum attribute value :
select name, value
from temm
where (
select count(*) from temm as n
where n.name = temm.name and n.attribute > temm.attribute
) < 1 ; /* 1 can be changed to 2,3,4 ..N to get N rows */