MySQL hierarchical grouping sort - mysql

I have a schema that essentially looks like this:
CREATE TABLE `data` (
`id` int(10) unsigned NOT NULL,
`title` text,
`type` tinyint(4),
`parent` int(10)
)
The type field is just an enum where 1 is a parent type, and 2 is a child type (in actuality there are many types, where some should behave like parents and some like children). The parent field indicates that a record is the child of another record.
I know this is probably not ideal for the query I want to build, but this is what I have to work with.
I would like to sort and group the data so that the parent records are sorted by title, and grouped under each parent are its child records, also sorted by title. Like so:
ID | title |type |parent
--------------------------------
4 | ParentA | 1 |
2 | ChildA | 2 | 4
5 | ChildB | 2 | 4
7 | ParentB | 1 |
9 | ChildC | 2 | 7
1 | ChildD | 2 | 7
** Edit **
We should be able to take the type field out of the picture entirely. If parent is not null, then the record should be grouped underneath its parent.

SELECT * FROM `data` ORDER BY COALESCE(`parent`, `id`), `parent`, `id`

Here's a solution tested to work on SQL Server. Should be essentially the same on MySQL
select Id, Title, [Type], Id as OrderId from Hier h1 where [Type] = 1
union
select Id, Title, [Type], Parent as OrderId from Hier h2 where [Type] = 2
order by OrderId, [Type]

You said you wanted it to sort on the titles, correct?
SELECT id, title, parent
FROM
( SELECT id, title, parent,
CASE WHEN parent is null THEN title ELSE CONCAT((SELECT title FROM `data` d2 WHERE d2.id = d.parent), '.', d.title) END AS sortkey
FROM `data` d
) subtable
ORDER BY sortkey
edit: Edited to remove type from the query.
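A minimal sketch of the title-based ordering, run through Python's sqlite3 module so it can be checked without a MySQL server. SQLite is only a stand-in here (an assumption, not what the question uses), and the self-join with a four-part ORDER BY is one possible way to get the requested order, not the method of any answer above. The ORDER BY uses only COALESCE and IS NOT NULL, which MySQL also supports.

```python
import sqlite3

# SQLite stands in for MySQL; the table mirrors the schema from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, title TEXT, type INTEGER, parent INTEGER)"
)
conn.executemany(
    "INSERT INTO data (id, title, type, parent) VALUES (?, ?, ?, ?)",
    [
        (1, "ChildD", 2, 7),
        (2, "ChildA", 2, 4),
        (4, "ParentA", 1, None),
        (5, "ChildB", 2, 4),
        (7, "ParentB", 1, None),
        (9, "ChildC", 2, 7),
    ],
)

HIERARCHY_SORT = """
SELECT d.id, d.title, d.parent
FROM data AS d
LEFT JOIN data AS p ON p.id = d.parent   -- p is all NULL for parent rows
ORDER BY COALESCE(p.title, d.title),     -- group rows by the parent's title
         COALESCE(p.id, d.id),           -- keep two parents with equal titles apart
         d.parent IS NOT NULL,           -- the parent row (0) before its children (1)
         d.title                         -- children by their own title
"""

sorted_ids = [row[0] for row in conn.execute(HIERARCHY_SORT)]
print(sorted_ids)  # [4, 2, 5, 7, 9, 1]
```

Unlike the CONCAT-based sort key, this keeps the parent title and the child title as separate sort columns, so a title that happens to contain the separator character cannot disturb the order.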

Related

SQL where not exists with multiple rows and status

I have the following tables (minified for the sake of simplicity):
CREATE TABLE IF NOT EXISTS `product_bundles` (
bundle_id int AUTO_INCREMENT PRIMARY KEY,
-- More columns here for bundle attributes
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `product_bundle_parts` (
`part_id` int AUTO_INCREMENT PRIMARY KEY,
`bundle_id` int NOT NULL,
`sku` varchar(255) NOT NULL,
-- More columns here for product attributes
KEY `bundle_id` (`bundle_id`),
KEY `sku` (`sku`)
) ENGINE=InnoDB;
CREATE TABLE IF NOT EXISTS `products` (
`product_id` mediumint(8) AUTO_INCREMENT PRIMARY KEY,
`sku` varchar(64) NOT NULL DEFAULT '',
`status` char(1) NOT NULL default 'A',
-- More columns here for product attributes
KEY (`sku`)
) ENGINE=InnoDB;
And I want to show only the 'product bundles' that are currently completely in stock and defined in the database (since these get retrieved from a third party vendor, there is no guarantee the SKU is defined). So I figured I'd need an anti-join to retrieve it accordingly:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
WHERE parts.bundle_id = bundles.bundle_id
AND products.status = 'A'
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Now, I sincerely thought this would filter out the products by status, however, that seems not to be the case. I then changed one thing up a bit, and the query never finished (although I believe it to be correct):
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE 1
AND NOT EXISTS (
SELECT *
FROM product_bundle_parts AS parts
LEFT JOIN products AS products ON parts.sku = products.sku
AND products.status = 'A'
WHERE parts.bundle_id = bundles.bundle_id
AND products.product_id IS NULL
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
Example data:
product_bundles
bundle_id | etc.
1 |
2 |
3 |
product_bundle_parts
part_id | bundle_id | sku
1 | 1 | 'sku11'
2 | 1 | 'sku22'
3 | 1 | 'sku33'
4 | 1 | 'sku44'
5 | 2 | 'sku55'
6 | 2 | 'sku66'
7 | 3 | 'sku77'
8 | 3 | 'sku88'
products
product_id | sku | status
101 | 'sku11' | 'A'
102 | 'sku22' | 'A'
103 | 'sku33' | 'A'
104 | 'sku44' | 'A'
105 | 'sku55' | 'D'
106 | 'sku66' | 'A'
107 | 'sku77' | 'A'
108 | 'sku99' | 'A'
Example result: Since the product status of product #105 is 'D' and 'sku88' from part #8 was not found:
bundle_id | etc.
1 |
I am running Server version: 10.3.25-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04
So there are a few questions I have.
Why does the first query not filter out products that do not have the status A?
Why does the second query not finish?
Are there alternative ways of achieving the same thing in a more efficient manner, as this looks rather cumbersome?
First of all, I've read that SQL_CALC_FOUND_ROWS * is much slower than running two separate queries (COUNT(*) and then SELECT *, or, if you run your query from another programming language such as PHP, executing the SELECT * and then counting the rows of the result set).
Second: in your first query the subquery can never return a row, because products.status = 'A' requires a matching product while products.product_id IS NULL requires that there is none. NOT EXISTS is therefore true for every bundle, whereas you need only the bundles whose parts ALL have an active product.
I'd change it in the following:
SELECT SQL_CALC_FOUND_ROWS *
FROM product_bundles AS bundles
WHERE NOT EXISTS (
SELECT 'x'
FROM product_bundle_parts AS parts
LEFT JOIN products ON (parts.sku = products.sku)
WHERE parts.bundle_id = bundles.bundle_id
AND COALESCE(products.status, 'X') != 'A'
)
-- placeholder for other dynamic conditions for e.g. sorting
LIMIT 0, 24
I changed products.status = 'A' to COALESCE(products.status, 'X') != 'A': this way the query returns all the bundles that DON'T have an inactive or missing product. The COALESCE maps a part without a matching product to 'X', so it also replaces the condition AND products.product_id IS NULL, which would otherwise have had to be combined with OR, at a cost in performance.
You can see my solution in SQLFiddle.
Finally, to find out why your second query doesn't finish, you should check the structure of your tables and how they are indexed. Running EXPLAIN on the query can help you find possible issues with the structure. Just put the keyword EXPLAIN before the SELECT and you'll have your "report" (EXPLAIN SELECT * ....).
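To make the difference between the two filters concrete, here is a small sketch that loads the example data from the question and runs both NOT EXISTS variants. It uses Python's sqlite3 module as a stand-in for MariaDB (an assumption made only so the example runs anywhere); the helper name bundle_ids is hypothetical, and the simplified table definitions keep only the columns the queries touch.

```python
import sqlite3

# SQLite stands in for MariaDB; only the columns used by the queries are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_bundles (bundle_id INTEGER PRIMARY KEY);
CREATE TABLE product_bundle_parts (
    part_id INTEGER PRIMARY KEY, bundle_id INTEGER NOT NULL, sku TEXT NOT NULL);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY, sku TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'A');
INSERT INTO product_bundles VALUES (1), (2), (3);
INSERT INTO product_bundle_parts VALUES
    (1, 1, 'sku11'), (2, 1, 'sku22'), (3, 1, 'sku33'), (4, 1, 'sku44'),
    (5, 2, 'sku55'), (6, 2, 'sku66'), (7, 3, 'sku77'), (8, 3, 'sku88');
INSERT INTO products VALUES
    (101, 'sku11', 'A'), (102, 'sku22', 'A'), (103, 'sku33', 'A'), (104, 'sku44', 'A'),
    (105, 'sku55', 'D'), (106, 'sku66', 'A'), (107, 'sku77', 'A'), (108, 'sku99', 'A');
""")


def bundle_ids(part_filter):
    """Return bundles for which no part satisfies part_filter (hypothetical helper).

    part_filter is a fixed SQL fragment written in this file, never user input.
    """
    sql = f"""
    SELECT bundles.bundle_id
    FROM product_bundles AS bundles
    WHERE NOT EXISTS (
        SELECT 1
        FROM product_bundle_parts AS parts
        LEFT JOIN products ON parts.sku = products.sku
        WHERE parts.bundle_id = bundles.bundle_id
          AND {part_filter}
    )
    ORDER BY bundles.bundle_id
    """
    return [row[0] for row in conn.execute(sql)]


# Filter from the first query: contradictory, so the subquery is always empty.
original = bundle_ids("products.status = 'A' AND products.product_id IS NULL")
# Filter from the answer: a part "fails" if its product is missing or not active.
fixed = bundle_ids("COALESCE(products.status, 'X') != 'A'")

print(original)  # [1, 2, 3]
print(fixed)     # [1]
```

Bundle 2 drops out because of the 'D' status on sku55, and bundle 3 because sku88 has no row in products, which matches the expected result in the question.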

Retrieve top-level parent MySQL

I have the following table:
id | parent_id | searchable | value
--------------------------------------------
1 | 0 | 0 | a
2 | 1 | 0 | b
3 | 2 | 1 | c
4 | 0 | 0 | d
5 | 4 | 1 | e
6 | 0 | 0 | f
7 | 6 | 0 | g
8 | 6 | 0 | h
9 | 0 | 1 | i
I need to extract all the top-level records (so the ones where parent_id = 0),
but only those where the record itself OR one of its descendants is searchable (searchable = 1).
So in this case, the output should be:
id | parent_id | searchable | value
--------------------------------------------
1 | 0 | 0 | a
4 | 0 | 0 | d
9 | 0 | 1 | i
Because these are all top-level records, and each of them is either searchable itself or has a searchable descendant (it doesn't matter how 'deep' the searchable child is).
I am working with MySQL. I am not really sure if it is possible to write this with just one query, but I assume it should be done with a piece of recursive code or a function.
** Note: it is unknown how 'deep' the tree goes.
You will have to use a stored procedure to do it.
Find all rows with searchable = 1, store their ids and parent_ids in a temp table.
Then do self-joins to add parents to this temp table.
Repeat until no more rows can be added (obviously better make sure tree is not cyclic).
At the end you have a table only with rows that have a searchable descendant somewhere down the tree, so just show only rows with no parent (at the top).
Assuming your table is called 'my_table' this one should work:
DELIMITER //
DROP PROCEDURE IF EXISTS top_level_parents//
CREATE PROCEDURE top_level_parents()
BEGIN
DECLARE found INT(11) DEFAULT 1;
DROP TABLE IF EXISTS parent_tree;
CREATE TABLE parent_tree (id int(11) PRIMARY KEY, p_id int(11)) ENGINE=HEAP;
INSERT INTO parent_tree
SELECT id, parent_id FROM my_table
WHERE searchable = 1;
SET found = ROW_COUNT();
WHILE found > 0 DO
INSERT IGNORE INTO parent_tree
SELECT p.id, p.parent_id FROM parent_tree c JOIN my_table p
WHERE p.id = c.p_id;
SET found = ROW_COUNT();
END WHILE;
SELECT id FROM parent_tree WHERE p_id = 0;
DROP TABLE parent_tree;
END;//
DELIMITER ;
Then just calling it:
CALL top_level_parents();
will be equal to
SELECT id FROM my_table WHERE id_is_top_level_and_has_searchable_descendant
Recursive queries are supported in newer versions of MySQL, which may not have been available when this was asked.
Get parents and children data where the top-level parent has a name of "A" or "B" or "C".
WITH RECURSIVE requires MySQL 8.0 or later:
https://dev.mysql.com/doc/refman/8.0/en/with.html
The first part selects the top-level parents and filters them; the second part gets the children by joining them to their parents.
WITH RECURSIVE tree AS (
SELECT id,
name,
parent_id,
1 as level
FROM category
WHERE parent_id = 0 AND (name = 'A' or name = 'B' or name = 'C')
UNION ALL
SELECT c.id,
c.name,
c.parent_id,
t.level + 1
FROM category c
JOIN tree t ON c.parent_id = t.id
)
SELECT *
FROM tree;
To find out whether the parent or one of its children is searchable, you can carry that value down the tree with COALESCE(NULLIF(t.searchable,0), NULLIF(c.searchable,0)), carry the top-level parent id along as well, and then join back against it.
So to initialize your example data:
CREATE TABLE `category` (
`id` int(11) NOT NULL,
`parent_id` int(11) NULL DEFAULT NULL,
`searchable` int(11) NULL DEFAULT NULL,
`value` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;
INSERT INTO category (id, parent_id, searchable, value) VALUES
(1,0,0,'a'),
(2,1,0,'b'),
(3,2,1,'c'),
(4,0,0,'d'),
(5,4,1,'e'),
(6,0,0,'f'),
(7,6,0,'g'),
(8,6,0,'h'),
(9,0,1,'i');
And to answer the question.
WITH RECURSIVE tree AS (
SELECT id,
value,
parent_id,
1 as level,
searchable,
id AS top_level_id
FROM category
WHERE parent_id = 0
UNION ALL
SELECT c.id,
c.value,
c.parent_id,
t.level + 1,
COALESCE(NULLIF(t.searchable,0), NULLIF(c.searchable,0)),
t.top_level_id AS top_level_id
FROM category c
JOIN tree t ON c.parent_id = t.id
)
SELECT DISTINCT category.*
FROM category
LEFT JOIN tree ON tree.top_level_id = category.id
WHERE tree.searchable = 1;
Note: this does not handle cyclic linkages.
If you have those, you need to remove them, add a constraint so they cannot happen, or possibly add a visited column, carried along in much the same way as the top-level id.
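The bottom-up idea from the stored-procedure answer (start at the searchable rows and keep adding parents) can also be written as a recursive CTE. The sketch below is an illustration under assumptions, not either answerer's code: it runs on Python's sqlite3 module as a stand-in for MySQL 8.0, and the CTE name up is made up for this example.

```python
import sqlite3

# SQLite stands in for MySQL 8.0; both support WITH RECURSIVE.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "searchable INTEGER, value TEXT)"
)
conn.executemany(
    "INSERT INTO category (id, parent_id, searchable, value) VALUES (?, ?, ?, ?)",
    [
        (1, 0, 0, "a"), (2, 1, 0, "b"), (3, 2, 1, "c"),
        (4, 0, 0, "d"), (5, 4, 1, "e"),
        (6, 0, 0, "f"), (7, 6, 0, "g"), (8, 6, 0, "h"),
        (9, 0, 1, "i"),
    ],
)

SEARCHABLE_ROOTS = """
WITH RECURSIVE up(id, parent_id) AS (
    -- start from every searchable row ...
    SELECT id, parent_id FROM category WHERE searchable = 1
    UNION
    -- ... and repeatedly add the parent of each row collected so far
    SELECT p.id, p.parent_id
    FROM category AS p
    JOIN up ON p.id = up.parent_id
)
SELECT c.id, c.parent_id, c.searchable, c.value
FROM category AS c
WHERE c.parent_id = 0                 -- top-level rows only
  AND c.id IN (SELECT id FROM up)     -- that are searchable or have a searchable descendant
ORDER BY c.id
"""

roots = conn.execute(SEARCHABLE_ROOTS).fetchall()
print(roots)  # [(1, 0, 0, 'a'), (4, 0, 0, 'd'), (9, 0, 1, 'i')]
```

Because the CTE uses UNION rather than UNION ALL, rows that were already collected are not added again, so the walk should stop even if the parent links contain a cycle. Filtering with IN on the collected ids also means each top-level row appears at most once.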

How do I get a left join with a group by clause to return all the rows?

I am trying to write a query to determine how much of my inventory is committed at a given time, i.e. current, next month, etc.
A simplified example:
I have an inventory table of items. I have an offer table that specifies the customer, when the offer starts, and when the offer expires. I have a third table that associates the two.
create table inventory
(id int not null auto_increment , name varchar(32) not null, primary key(id));
create table offer
(id int not null auto_increment , customer_name varchar(32) not null, starts_at datetime not null, expires_at datetime, primary key (id));
create table items
(id int not null auto_increment, inventory_id int not null, offer_id int not null, primary key (id),
CONSTRAINT fk_item__offer FOREIGN KEY (offer_id) REFERENCES offer(id),
CONSTRAINT fk_item__inventory FOREIGN KEY (inventory_id) REFERENCES inventory(id));
create some inventory
insert into inventory(name)
values ('item 1'), ('item 2'),('item 3');
create two offers for this month
insert into offer(customer_name, starts_at)
values ('customer 1', DATE_FORMAT(NOW(), '%Y-%m-01')), ('customer 2', DATE_FORMAT(NOW(), '%Y-%m-01'));
and one for next month
insert into offer(customer_name, starts_at)
values ('customer 3', DATE_FORMAT(DATE_ADD(CURDATE(), INTERVAL 1 MONTH), '%Y-%m-01'));
Now add some items to each offer
insert into items(inventory_id, offer_id)
values (1,1), (2,1), (2,2), (3,3);
What I want is a query that will show me all the inventory and the count of the committed inventory for this month. Inventory would be considered committed if the starts_at is less than or equal to now, and the offer has not expired (expires_at is null or expires_at is in the future)
The results I would expect would look like this:
+----+--------+---------------------+
| id | name | committed_inventory |
+----+--------+---------------------+
| 1 | item 1 | 1 |
| 2 | item 2 | 2 |
| 3 | item 3 | 0 |
+----+--------+---------------------+
3 rows in set (0.00 sec)
The query that I felt should work is:
SELECT inventory.id
, inventory.name
, count(items.id) as committed_inventory
FROM inventory
LEFT JOIN items
ON items.inventory_id = inventory.id
LEFT JOIN offer
ON offer.id = items.offer_id
WHERE (offer.starts_at IS NULL OR offer.starts_at <= NOW())
AND (offer.expires_at IS NULL OR offer.expires_at > NOW())
GROUP BY inventory.id, inventory.name;
However, the results from this query do not include the third item. What I get is this:
+----+--------+---------------------+
| id | name | committed_inventory |
+----+--------+---------------------+
| 1 | item 1 | 1 |
| 2 | item 2 | 2 |
+----+--------+---------------------+
2 rows in set (0.00 sec)
I cannot figure out how to get the third inventory item to show. Since inventory is the driving table in the outer joins, I thought that it should always show.
The problem is the where clause. Try this:
SELECT inventory.id
, inventory.name
, count(offer.id) as committed_inventory
FROM inventory
LEFT JOIN items
ON items.inventory_id = inventory.id
LEFT JOIN offer
ON offer.id = items.offer_id and
offer.starts_at <= NOW() and
(offer.expires_at IS NULL or
offer.expires_at > NOW()
)
GROUP BY inventory.id, inventory.name;
The problem is that you get a matching offer, but it isn't currently valid. So the LEFT JOIN does produce a row with non-NULL offer dates (there is a match), and the WHERE clause then rejects that row because the date comparison fails: the offer is not current.
For item 3, the starts_at from the offer table is set to March 1, 2014, which is greater than NOW(), so the condition (offer.starts_at IS NULL OR offer.starts_at <= NOW()) filters out the item 3 record.
See fiddle demo
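The effect of moving the date conditions from WHERE into the ON clause can be reproduced with a short script. This is a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the dates are fixed 2014 values instead of NOW() so that the result is repeatable, and the :now parameter is introduced only for this illustration.

```python
import sqlite3

# SQLite stands in for MySQL; dates are ISO strings, which compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE offer (
    id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL,
    starts_at TEXT NOT NULL, expires_at TEXT);
CREATE TABLE items (
    id INTEGER PRIMARY KEY, inventory_id INTEGER NOT NULL, offer_id INTEGER NOT NULL);
INSERT INTO inventory (name) VALUES ('item 1'), ('item 2'), ('item 3');
INSERT INTO offer (customer_name, starts_at) VALUES
    ('customer 1', '2014-02-01 00:00:00'),
    ('customer 2', '2014-02-01 00:00:00'),
    ('customer 3', '2014-03-01 00:00:00');   -- starts next month
INSERT INTO items (inventory_id, offer_id) VALUES (1, 1), (2, 1), (2, 2), (3, 3);
""")

NOW = {"now": "2014-02-15 00:00:00"}  # fixed stand-in for NOW()

FILTER_IN_WHERE = """
SELECT inventory.id, inventory.name, COUNT(items.id) AS committed_inventory
FROM inventory
LEFT JOIN items ON items.inventory_id = inventory.id
LEFT JOIN offer ON offer.id = items.offer_id
WHERE (offer.starts_at IS NULL OR offer.starts_at <= :now)
  AND (offer.expires_at IS NULL OR offer.expires_at > :now)
GROUP BY inventory.id, inventory.name
ORDER BY inventory.id
"""

FILTER_IN_ON = """
SELECT inventory.id, inventory.name, COUNT(offer.id) AS committed_inventory
FROM inventory
LEFT JOIN items ON items.inventory_id = inventory.id
LEFT JOIN offer ON offer.id = items.offer_id
               AND offer.starts_at <= :now
               AND (offer.expires_at IS NULL OR offer.expires_at > :now)
GROUP BY inventory.id, inventory.name
ORDER BY inventory.id
"""

where_rows = conn.execute(FILTER_IN_WHERE, NOW).fetchall()
on_rows = conn.execute(FILTER_IN_ON, NOW).fetchall()

print(where_rows)  # [(1, 'item 1', 1), (2, 'item 2', 2)]
print(on_rows)     # [(1, 'item 1', 1), (2, 'item 2', 2), (3, 'item 3', 0)]
```

Counting offer.id rather than items.id matters in the second query: item 3 still joins to a row in items, but its offer is not current, so offer.id is NULL and COUNT skips it.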

Select based on some default value for group by having

I have an SQL table that contains the names of people and respective country codes.
----------------
name | code
----------------
saket | IN
rohan | US
samules | AR
Geeth | CH
Vikash | IN
Rahul | IN
Ganesh | US
Zorro | US
What I want is one row per country code: the row whose name starts with sa if there is one; if not, the row whose name starts with Vi; and if there is neither, the last row of the group.
I tried this:
SELECT * FROM MyTable GROUP BY code HAVING name like 'sa%' or name like 'vi%';
But it gives me only the rows that match the condition in the HAVING clause.
I want the last row of the group when the condition fails. Is that possible?
If so, how?
Maybe not very efficient, but try:
SELECT FIRST(`name`) AS `name`, `code` FROM (
SELECT `name`, `code` FROM `MyTable`
WHERE `name` LIKE 'sa%'
UNION ALL
SELECT `name`, `code` FROM `MyTable`
WHERE `name` LIKE 'vi%'
UNION ALL
SELECT LAST(`name`) AS `name`, `code` FROM `MyTable` GROUP BY `code`
HAVING `name` NOT LIKE 'sa%' AND `name` NOT LIKE 'vi%'
) AS `a` GROUP BY `code`
You can try this query. It returns what you need, but be aware that it has two pitfalls:
The subquery is a pain on 10^6 rows.
The field name in the outer query is nonaggregated. The MySQL documentation says that it is impossible to say which value will be selected for a nonaggregated column.
http://dev.mysql.com/doc/refman/5.7/en/group-by-extensions.html
select name, country
from
(
select *, if(name like 'sa%', 0, if(name like 'vi%', 2, 3) ) as name_order
from tmp_names
order by country, name_order, name desc
) as tmp_names
group by country
order by name;
It returns
+---------+---------+
| name | country |
+---------+---------+
| Geeth | CH |
| saket | IN |
| samules | AR |
| Zorro | US |
+---------+---------+
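One way to avoid depending on which row GROUP BY happens to pick is to rank the rows of each group explicitly. The sketch below is an illustration under assumptions: it runs on Python's sqlite3 module as a stand-in for MySQL, and it adds an auto-increment id column (not in the question) so that "last row of the group" has a defined meaning, namely the highest id.

```python
import sqlite3

# SQLite stands in for MySQL. The id column is an assumption added so that
# "last row" is well defined; a table has no inherent row order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, name TEXT, code TEXT)")
conn.executemany(
    "INSERT INTO MyTable (name, code) VALUES (?, ?)",
    [
        ("saket", "IN"), ("rohan", "US"), ("samules", "AR"), ("Geeth", "CH"),
        ("Vikash", "IN"), ("Rahul", "IN"), ("Ganesh", "US"), ("Zorro", "US"),
    ],
)

PICK_ONE_PER_CODE = """
SELECT t.name, t.code
FROM MyTable AS t
WHERE t.id = (
    -- best row of the same country: 'sa%' first, then 'vi%', then anything,
    -- with the highest id (the last inserted row) breaking ties
    SELECT u.id
    FROM MyTable AS u
    WHERE u.code = t.code
    ORDER BY CASE WHEN u.name LIKE 'sa%' THEN 0
                  WHEN u.name LIKE 'vi%' THEN 1
                  ELSE 2 END,
             u.id DESC
    LIMIT 1
)
ORDER BY t.code
"""

picked = conn.execute(PICK_ONE_PER_CODE).fetchall()
print(picked)  # [('samules', 'AR'), ('Geeth', 'CH'), ('saket', 'IN'), ('Zorro', 'US')]
```

The LIKE comparisons rely on case-insensitive matching, which is the default for ASCII in SQLite and for the usual case-insensitive collations in MySQL; with a case-sensitive collation the pattern 'vi%' would not match Vikash.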

Mysql unique values query

I have a table with name-value pairs and additional attribute. The same name can have more than one value. If that happens I want to return the row which has a higher attribute value.
Table:
ID | name | value | attribute
1 | set1 | 1 | 0
2 | set2 | 2 | 0
3 | set3 | 3 | 0
4 | set1 | 4 | 1
Desired results of query:
name | value
set2 | 2
set3 | 3
set1 | 4
What is the best performing sql query to get the desired results?
The best performing query would be as follows:
select
s.set_id,
s.name as set_name,
a.attrib_id,
a.name as attrib_name,
sav.value
from
sets s
inner join set_attribute_values sav on
sav.set_id = s.set_id and sav.attrib_id = s.max_attrib_id
inner join attributes a on sav.attrib_id = a.attrib_id
order by
s.set_id;
+--------+----------+-----------+-------------+-------+
| set_id | set_name | attrib_id | attrib_name | value |
+--------+----------+-----------+-------------+-------+
| 1 | set1 | 3 | attrib3 | 20 |
| 2 | set2 | 0 | attrib0 | 10 |
| 3 | set3 | 0 | attrib0 | 10 |
| 4 | set4 | 4 | attrib4 | 10 |
| 5 | set5 | 2 | attrib2 | 10 |
+--------+----------+-----------+-------------+-------+
obviously for this to work you're gonna also have to normalise your design and implement a simple trigger:
drop table if exists attributes;
create table attributes
(
attrib_id smallint unsigned not null primary key,
name varchar(255) unique not null
)
engine=innodb;
drop table if exists sets;
create table sets
(
set_id smallint unsigned not null auto_increment primary key,
name varchar(255) unique not null,
max_attrib_id smallint unsigned not null default 0,
key (max_attrib_id)
)
engine=innodb;
drop table if exists set_attribute_values;
create table set_attribute_values
(
set_id smallint unsigned not null,
attrib_id smallint unsigned not null,
value int unsigned not null default 0,
primary key (set_id, attrib_id)
)
engine=innodb;
delimiter #
create trigger set_attribute_values_before_ins_trig
before insert on set_attribute_values
for each row
begin
update sets set max_attrib_id = new.attrib_id
where set_id = new.set_id and max_attrib_id < new.attrib_id;
end#
delimiter ;
insert into attributes values (0,'attrib0'),(1,'attrib1'),(2,'attrib2'),(3,'attrib3'),(4,'attrib4');
insert into sets (name) values ('set1'),('set2'),('set3'),('set4'),('set5');
insert into set_attribute_values values
(1,0,10),(1,3,20),(1,1,30),
(2,0,10),
(3,0,10),
(4,4,10),(4,2,20),
(5,2,10);
This solution will probably perform the best:
Select ...
From Table As T
Left Join Table As T2
On T2.name = T.name
And T2.attribute > T.attribute
Where T2.ID Is Null
Another solution which may not perform as well (you would need to evaluate against your data):
Select ...
From Table As T
Where Not Exists (
Select 1
From Table As T2
Where T2.name = T.name
And T2.attribute > T.attribute
)
select name,max(value)
from table
group by name
SELECT name, value
FROM (SELECT name, value, attribute
FROM table_name
ORDER BY attribute DESC) AS t
GROUP BY name;
There is no easy way to do this.
A similar question was asked here.
Edit: Here's a suggestion:
SELECT `name`,`value` FROM `mytable` ORDER BY `name`,`attribute` DESC
This isn't quite what you asked for, but it'll at least give you the higher attribute values first, and you can ignore the rest.
Edit again: Another suggestion:
If you know that value is a positive integer, you can do this. It's yucky, but it'll work.
SELECT `name`, CAST(GROUP_CONCAT(`value` ORDER BY `attribute` DESC) AS UNSIGNED) FROM `mytable` GROUP BY `name`
To include negative integers you could change UNSIGNED to SIGNED.
Might want to benchmark all these options, here's another one.
SELECT t1.name, t1.value
FROM temp t1
WHERE t1.attribute IN (
SELECT MAX(t2.attribute)
FROM temp t2
WHERE t2.name = t1.name);
How about:
SELECT ID, name, value, attribute
FROM table A
WHERE A.attribute = (SELECT MAX(B.attribute) FROM table B WHERE B.NAME = A.NAME);
Edit: Seems like someone has said the same already.
Did not benchmark them, but here is how it is doable:
TableName = temm
1) Row with maximum value of attribute :
select t.name, t.value
from (
select name, max(attribute) as maxattr
from temm group by name
) as x inner join temm as t on t.name = x.name and t.attribute = x.maxattr;
2) Top N rows with maximum attribute value :
select name, value
from temm
where (
select count(*) from temm as n
where n.name = temm.name and n.attribute > temm.attribute
) < 1 ; /* 1 can be changed to 2,3,4 ..N to get N rows */
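Both queries above can be checked against the sample table from the question. This is a sketch under assumptions: Python's sqlite3 module stands in for MySQL, the outer table gets the alias t for readability, and the hard-coded 1 becomes a bound parameter so the same statement serves for any N.

```python
import sqlite3

# SQLite stands in for MySQL; the rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temm (ID INTEGER PRIMARY KEY, name TEXT, value INTEGER, attribute INTEGER)"
)
conn.executemany(
    "INSERT INTO temm (ID, name, value, attribute) VALUES (?, ?, ?, ?)",
    [(1, "set1", 1, 0), (2, "set2", 2, 0), (3, "set3", 3, 0), (4, "set1", 4, 1)],
)

# 1) Join each name's maximum attribute back to the table.
MAX_JOIN = """
SELECT t.name, t.value
FROM (SELECT name, MAX(attribute) AS maxattr FROM temm GROUP BY name) AS x
JOIN temm AS t ON t.name = x.name AND t.attribute = x.maxattr
ORDER BY t.name
"""

# 2) Keep a row if fewer than N rows of the same name have a higher attribute.
TOP_N = """
SELECT t.name, t.value
FROM temm AS t
WHERE (
    SELECT COUNT(*) FROM temm AS n
    WHERE n.name = t.name AND n.attribute > t.attribute
) < ?
ORDER BY t.name, t.attribute DESC
"""

by_join = conn.execute(MAX_JOIN).fetchall()
top_1 = conn.execute(TOP_N, (1,)).fetchall()
top_2 = conn.execute(TOP_N, (2,)).fetchall()

print(by_join)  # [('set1', 4), ('set2', 2), ('set3', 3)]
print(top_1)    # [('set1', 4), ('set2', 2), ('set3', 3)]
print(top_2)    # [('set1', 4), ('set1', 1), ('set2', 2), ('set3', 3)]
```

If two rows of the same name tie on the highest attribute, both approaches return both rows, so a tie-breaker (for example the ID) would be needed to guarantee exactly one row per name.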