Hierarchical SQL Query - mysql

I'm working on a simple CMS system for which I have a database with the following tables:
Items
Contents
Langs
The Items table has the following structure:
itemId
name (for semantic reasons)
type
parent (foreign key to itemId)
An item can be either a document or a section type. A section is a piece of content on a document which is linked to it via the parent column. A document can also have a parent, which makes it a subpage.
Now I am stuck writing a query that fetches all the items from the database hierarchically, something like this:
documentId => name
              metaDescription => language => meta
              sections => sectionId => language => title
                                                   content
                                                   uri
              subPages => documentId => name
                                        metaDescription
                                        sections => etc...
Just to clarify: a website can have multiple languages, which are stored in the Langs table. Every language is linked to a piece of content in the Contents table, which in turn is linked to an item in the Items table. The metaDescription is the content column linked to an item of type document.
Is there a way to do this with one query? This was my first attempt, but it doesn't work for subPages:
SELECT
documents.itemId AS id,
documents.name AS documentName,
documents.lastModified AS lastModified,
meta.content AS metaDescription,
meta.uri AS documentUri,
sections.itemId AS sectionId,
sections.name AS sectionName,
sections.lastModified AS sectionLastModified,
contents.name AS sectionTitle,
contents.content AS sectionContent,
contents.uri AS contentUri,
contents.lastModified AS contentLastModified,
langs.name AS contentLang
FROM
SITENAME_kw_items AS documents
INNER JOIN
SITENAME_kw_contents AS meta
ON documents.itemId = meta.itemId
INNER JOIN
SITENAME_kw_items AS sections
ON sections.parent = documents.itemId
INNER JOIN
SITENAME_kw_contents AS contents
ON sections.itemId = contents.itemId
INNER JOIN
SITENAME_kw_langs AS langs
ON langs.langId = contents.langId
Sorry for the long question. Hope you guys can help!

Below is how I do it in "our" DMS (recursive CTE), which is Adam Gent's suggestion expanded.
Note: I just noticed that COALESCE could be used instead of nesting ISNULL.
You would ORDER BY the breadcrumb path (here Bez_Path or UID_Path).
A far better way would be to use a closure-table architecture.
See here:
http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
and here:
http://www.mysqlperformanceblog.com/2011/02/14/moving-subtrees-in-closure-table/
The closure table also has the advantage that it works on MySQL versions without CTE and recursion support (recursive CTEs only arrived in MySQL 8.0).
Also note that closure tables are much better (and simpler and faster to query) than recursion.
Also think about symlinks in such a structure.
The something_UID, something_parent_UID pattern (as shown below) is almost always an antipattern.
CREATE VIEW [dbo].[V_DMS_Navigation_Structure]
AS
SELECT
NAV_UID
,NAV_Typ
,NAV_Parent_UID
,NAV_Stufe
,NAV_ApertureKey
,NAV_Nr
--,NAV_Bemerkung
,NAV_Status
,NAV_Referenz
,ISNULL(PJ_Bezeichnung, ISNULL(FO_Bezeichnung, DOC_Bezeichnung + '.' + DOC_Dateiendung) ) AS NAV_Bezeichnung
,NAV_PJ_UID
,NAV_FO_UID
,NAV_DOC_UID
,ISNULL(NAV_PJ_UID, ISNULL(NAV_FO_UID,NAV_DOC_UID)) AS NAV_OBJ_UID
FROM T_DMS_Navigation
LEFT JOIN T_DMS_Projekt
ON T_DMS_Projekt.PJ_UID = T_DMS_Navigation.NAV_PJ_UID
LEFT JOIN T_DMS_Folder
ON T_DMS_Folder.FO_UID = T_DMS_Navigation.NAV_FO_UID
LEFT JOIN T_DMS_Dokument
ON T_DMS_Dokument.DOC_UID = T_DMS_Navigation.NAV_DOC_UID
GO
CREATE VIEW [dbo].[V_DMS_Navigation_Structure_Path]
AS
WITH Tree
(
NAV_UID
,NAV_Bezeichnung
,NAV_Parent_UID
,Depth
,Sort
,Bez_Path
,UID_Path
,PJ_UID
,FO_UID
,DOC_UID
,OBJ_UID
)
AS
(
SELECT
NAV_UID
,NAV_Bezeichnung
,NAV_Parent_UID
,0 AS Depth
,CAST('0' AS varchar(10)) AS Sort
,CAST(NAV_Bezeichnung AS varchar(4000)) AS Bez_Path
,CAST(NAV_OBJ_UID AS varchar(4000)) AS UID_Path
,NAV_PJ_UID AS PJ_UID
,NAV_FO_UID AS FO_UID
,NAV_DOC_UID AS DOC_UID
,NAV_OBJ_UID AS OBJ_UID
FROM V_DMS_Navigation_Structure
WHERE NAV_Parent_UID IS NULL
UNION ALL
SELECT
CT.NAV_UID
,CT.NAV_Bezeichnung
,CT.NAV_Parent_UID
,Parent.Depth + 1 AS Depth
,CONVERT(varchar(10), Parent.Sort + '.' + CAST(Parent.Depth + 1 AS varchar(10))) AS Sort
,CONVERT(varchar(4000), Parent.Bez_Path + '\' + CAST(CT.NAV_Bezeichnung AS varchar(1000))) AS Bez_Path
,CONVERT(varchar(4000), Parent.UID_Path + '\' + CAST(CT.NAV_OBJ_UID AS varchar(1000))) AS UID_Path
,NAV_PJ_UID AS PJ_UID
,NAV_FO_UID AS FO_UID
,NAV_DOC_UID AS DOC_UID
,NAV_OBJ_UID AS OBJ_UID
FROM V_DMS_Navigation_Structure CT
INNER JOIN Tree AS Parent
ON Parent.NAV_UID = CT.NAV_Parent_UID
)
SELECT TOP 999999999999999 * FROM Tree
ORDER BY Depth
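The closure-table idea recommended above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the schema from the linked articles: the table names `items` and `items_closure`, the helpers `add_item` and `descendants`, and the sample rows are all made up here, and SQLite (via Python's `sqlite3` module) stands in for MySQL so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        itemId INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    -- One row per (ancestor, descendant) pair, including each node paired with itself.
    CREATE TABLE items_closure (
        ancestor   INTEGER NOT NULL REFERENCES items(itemId),
        descendant INTEGER NOT NULL REFERENCES items(itemId),
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")


def add_item(item_id, name, parent_id=None):
    """Insert an item and link it to every ancestor of its parent."""
    conn.execute("INSERT INTO items (itemId, name) VALUES (?, ?)", (item_id, name))
    # Self-reference at depth 0.
    conn.execute(
        "INSERT INTO items_closure (ancestor, descendant, depth) VALUES (?, ?, 0)",
        (item_id, item_id),
    )
    if parent_id is not None:
        # Copy all ancestor links of the parent, one level deeper.
        conn.execute(
            """
            INSERT INTO items_closure (ancestor, descendant, depth)
            SELECT ancestor, ?, depth + 1
            FROM items_closure
            WHERE descendant = ?
            """,
            (item_id, parent_id),
        )


def descendants(item_id):
    """Return (name, depth) for every item below item_id, without recursion."""
    rows = conn.execute(
        """
        SELECT i.name, c.depth
        FROM items_closure AS c
        JOIN items AS i ON i.itemId = c.descendant
        WHERE c.ancestor = ? AND c.depth > 0
        ORDER BY c.depth, i.name
        """,
        (item_id,),
    )
    return rows.fetchall()


# Hypothetical sample tree: home -> (about -> team), contact
add_item(1, "home")
add_item(2, "about", parent_id=1)
add_item(3, "team", parent_id=2)
add_item(4, "contact", parent_id=1)
```

The trade-off is that inserts (and especially subtree moves) have to maintain the extra table, while reading a whole subtree becomes a single non-recursive join.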

The short answer is that you can't really do this with a plain RDBMS query. The long answer is that you can sort of do it, either programmatically (N+1 selects) or with common table expressions (CTEs).
The other option is to cheat and use a depth column as a hint for an order by.
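A recursive CTE over the `parent` column could look roughly like the sketch below. It assumes a database that supports `WITH RECURSIVE` (SQLite is used here through Python's `sqlite3` so the example is self-contained; MySQL only supports it from 8.0 on). The `items` table follows the question's column names, but the sample rows, the `path` column and the `fetch_tree` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        itemId INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        type   TEXT NOT NULL,                    -- 'document' or 'section'
        parent INTEGER REFERENCES items(itemId)  -- NULL for top-level documents
    );
    -- Hypothetical sample data.
    INSERT INTO items (itemId, name, type, parent) VALUES
        (1, 'home',    'document', NULL),
        (2, 'intro',   'section',  1),
        (3, 'about',   'document', 1),
        (4, 'history', 'section',  3);
""")

# Note: '||' is SQLite's string concatenation; MySQL would use CONCAT().
TREE_QUERY = """
    WITH RECURSIVE tree (itemId, name, type, depth, path) AS (
        -- Anchor member: top-level documents have no parent.
        SELECT itemId, name, type, 0, name
        FROM items
        WHERE parent IS NULL
        UNION ALL
        -- Recursive member: attach each child to its already-visited parent.
        SELECT child.itemId, child.name, child.type,
               tree.depth + 1,
               tree.path || '/' || child.name
        FROM items AS child
        JOIN tree ON child.parent = tree.itemId
    )
    SELECT path, type, depth FROM tree ORDER BY path
"""


def fetch_tree():
    """Return every item as (path, type, depth), ordered by breadcrumb path."""
    return conn.execute(TREE_QUERY).fetchall()
```

Ordering by the accumulated path keeps each item directly after its parent, which is the same breadcrumb trick as ordering by a depth hint, only derived by the query itself.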

Related

How to retrieve, values, names and group names in one SQL query

I used to have an EAV schema with 4 tables in MySQL 5.7:
articles
article attributes
attribute names
attribute group names
After running into huge complexity, I learned from another question that this is not a good schema. So I got rid of table 2, where all the attributes were stored, and saved them, either as values or as value IDs, directly in table 1, as the STI model suggests.
Now I ended up with 3 tables:
articles
attribute names
attribute group names
At first it looked like this made my life easier, but while trying to replace a simple query that fetched all attribute group names and attribute names of a specific article, I realized that this is not ideal either.
My previous query looked like this:
SELECT
cag.name_de,
cag.attr_group_id,
attr.attr_de,
attr.attr_id
FROM
articles_attr aa,
cat_attr attr,
cat_attr_groups cag
WHERE
aa.article_id = '181206'
AND aa.attr_id = attr.attr_id
AND cag.attr_group_id = attr.attr_group_id
Now with the new schema, I would need at least this:
Get all group names like e.g. "color"
SELECT
name_de,
attr_group_id
FROM
cat_attr_groups
Get all indirect values which have an ID like e.g. "green"
SELECT
attr.attr_group_id,
attr.attr_de
FROM
articles a,
cat_attr attr
WHERE
a.article_id = '181206'
AND (
(a.dial_c_id = attr.attr_id)
OR (a.dial_n_id = attr.attr_id)
OR (a.bracelet_color_id = attr.attr_id)
)
// pseudo code
$attr[$row->attr_group_id] = $row->attr_de;
Get all direct values:
SELECT
jewels,
vibrations
FROM
articles a
WHERE
a.article_id = '181206'
// pseudo code
$attr[4] = $row->jewels;
Map group names with group ids
foreach($attr AS $key => $value){
// somehow
}
This does not seem very elegant. How could I design my schema better, or how could those queries be rewritten to retrieve the values in an acceptable query time?

Sql Result IN a Query

Please don't blame me for the database design. I am not its database architect; I am the one who has to use it in its current state.
I hope this will be understandable.
I have 3 tables containing the following data, with no foreign key relationships between them:
groups
groupId  groupName
1        Admin
2        Editor
3        Subscriber
preveleges
groupId  roles
1        1,2
2        2,3
3        1
roles
roleId   roleTitle
1        add
2        edit
Query:
SELECT roles
from groups
LEFT JOIN preveleges ON (groups.groupId=preveleges.groupId)
returns a specific result, i.e. the roles column.
Problem: I want to show roleTitle instead of roles in the above query.
I am not sure how to relate the roles table to this query so that it returns the required result.
I know it is feasible in application code, but I want to do it in SQL. Any suggestion will be appreciated.
SELECT g.groupName,
GROUP_CONCAT(r.roleTitle
ORDER BY FIND_IN_SET(r.roleId, p.roles))
AS RoleTitles
FROM groups AS g
LEFT JOIN preveleges AS p
ON g.groupId = p.groupId
LEFT JOIN roles AS r
ON FIND_IN_SET(r.roleId, p.roles)
GROUP BY g.groupName ;
Tested at: SQL-FIDDLE
I would change the data structure itself, since it's not normalised: there are multiple elements in a single column.
But it is possible with SQL, if for some (valid) reason you can't change the DB.
A simple "static" solution:
SELECT REPLACE(REPLACE(roles, '1', 'add'), '2', 'edit') from groups
LEFT JOIN preveleges ON(groups.groupId=preveleges.groupId)
A more complex but still ugly solution:
CREATE FUNCTION ReplaceRoleIDWithName (@StringIds VARCHAR(50))
RETURNS VARCHAR(50)
AS
BEGIN
    DECLARE @RoleNames VARCHAR(50)
    SET @RoleNames = @StringIds
    SELECT @RoleNames = REPLACE(@RoleNames, CAST(RoleId AS VARCHAR(50)), roleTitle)
    FROM roles
    RETURN @RoleNames
END
And then use the function in the query
SELECT dbo.ReplaceRoleIDWithName(roles) from groups
LEFT JOIN preveleges ON(groups.groupId=preveleges.groupId)
It is possible without a function, but this is more readable. Written without an editor, so it's not tested in any way.
You also tagged the question with PostgreSQL and it's actually quite easy with Postgres to work around this broken design:
SELECT grp.groupname, r.roletitle
FROM groups grp
join (
select groupid, cast(regexp_split_to_table(roles, ',') as integer) as role_id
from privileges
) as privs on privs.groupid = grp.groupid
join roles r on r.roleid = privs.role_id;
SQLFiddle: http://sqlfiddle.com/#!12/5e87b/1
(Note that I changed the incorrectly spelled name preveleges to the correct spelling privileges)
But you should really, really re-design your data model!
Fixing your design also enables you to define foreign key constraints and validate the input. In your current model, the application would probably break (just as my query would) if someone inserted the value 'one,two,three' into the roles column.
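As a rough illustration of that redesign, the comma-separated column can be replaced by a junction table with one row per (group, role) pair. The sketch below is only one possible layout: the table names `user_groups` (renamed here because `groups` is a keyword in some SQL dialects) and `group_roles`, and both helper functions, are assumptions, and SQLite via Python's `sqlite3` is used so that it runs standalone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE user_groups (groupId INTEGER PRIMARY KEY, groupName TEXT NOT NULL);
    CREATE TABLE roles       (roleId  INTEGER PRIMARY KEY, roleTitle TEXT NOT NULL);
    -- One row per (group, role) pair replaces the comma-separated 'roles' column.
    CREATE TABLE group_roles (
        groupId INTEGER NOT NULL REFERENCES user_groups(groupId),
        roleId  INTEGER NOT NULL REFERENCES roles(roleId),
        PRIMARY KEY (groupId, roleId)
    );
    INSERT INTO user_groups VALUES (1, 'Admin'), (2, 'Editor'), (3, 'Subscriber');
    INSERT INTO roles       VALUES (1, 'add'), (2, 'edit');
    INSERT INTO group_roles VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")


def role_titles():
    """Return (groupName, comma-separated role titles) for every group."""
    return conn.execute("""
        SELECT g.groupName, GROUP_CONCAT(r.roleTitle, ',')
        FROM user_groups AS g
        LEFT JOIN group_roles AS gr ON gr.groupId = g.groupId
        LEFT JOIN roles AS r ON r.roleId = gr.roleId
        GROUP BY g.groupId, g.groupName
        ORDER BY g.groupId
    """).fetchall()


def can_insert_unknown_role():
    """Try to grant a role id that does not exist; the foreign key should reject it."""
    try:
        conn.execute("INSERT INTO group_roles (groupId, roleId) VALUES (2, 3)")
    except sqlite3.IntegrityError:
        return False
    return True
```

With this layout the join is an ordinary equality join, and a dangling reference such as role 3 (which appears in the question's data but not in its roles table) is rejected at insert time instead of silently producing no title.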
Edit
To complete the picture: using Postgres's array handling, the above can be slightly simplified with an approach similar to MySQL's find_in_set():
select grp.groupname, r.roletitle
from groups grp
join privileges privs on grp.groupid = privs.groupid
join roles r on r.roleid::text = any (string_to_array(privs.roles, ','))
In both cases, if all role titles should be shown as a comma-separated list, the string_agg() function can be used (it is the equivalent of MySQL's group_concat()):
select grp.groupname, string_agg(r.roletitle, ',')
from groups grp
join privileges privs on grp.groupid = privs.groupid
join roles r on r.roleid::text = any (string_to_array(privs.roles, ','))
group by grp.groupname

NHibernate INNER JOIN on a SubQuery

I would like to do a subquery and then inner join the result of that to produce a query. I want to do this because I have tested an inner join query and it seems to be far more performant on MySQL than a straight IN subquery.
Below is a very basic example of the type of sql I am trying to reproduce.
Tables
ITEM
ItemId
Name
ITEMRELATIONS
ItemId
RelationId
Example SQL I would like to create
Give me the COUNT of RELATIONs for ITEMs having a name of 'bob':
select ir.itemId, count(ir.relationId)
from ItemRelations ir
inner join (select itemId from Items where name = 'bob') sq
on ir.itemId = sq.itemId
group by ir.itemId
The base NHibernate QueryOver
var bobItems = QueryOver.Of<Item>(() => itemAlias)
.Where(() => itemAlias.Name == "bob")
.Select(Projections.Id());
var bobRelationCount = session.QueryOver<ItemRelation>(() => itemRelationAlias)
.Inner.Join(/* Somehow join the detached criteria here on the itemId */)
.SelectList(
list =>
list.SelectGroup(() => itemRelationAlias.ItemId)
.WithAlias(() => itemRelationCountAlias.ItemId)
.SelectCount(() => itemRelationAlias.ItemRelationId)
.WithAlias(() => itemRelationCountAlias.Count))
.TransformUsing(Transformers.AliasToBean<ItemRelationCount>())
.List<ItemRelationCount>();
I know it may be possible to refactor this into a single query; however, the above is merely a simple example. I cannot change the detached QueryOver, as it is handed to my bit of code and is used in other parts of the system.
Does anyone know if it is possible to do an inner join on a detached criteria?
MySQL 5.6.5 has addressed the performance issue related to the query structure.
See here: http://bugs.mysql.com/bug.php?id=42259
No need for me to change the output format of my NHibernate queries anymore. :)

SQL issue: one to many relationship and EAV model

Good evening guys,
I'm a newbie to web programming and I need your help to solve a problem with an SQL query.
The database engine I'm using is MySQL and I access it via PHP. Here I'll explain a simplified version of my database, just to fix ideas.
Suppose we work with a database containing three tables: teams, teams_information, attributes. More precisely:
1) teams is a table containing some basic information about Italian football teams (soccer, not American football :D). It has three fields: 'id' (int, primary key), 'name' (varchar, team name), 'nickname' (varchar, team nickname);
2) attributes is a table containing a list of possible pieces of information about a football team, such as city (the city where the team plays its home matches), captain (the team captain's full name), f_number (number of fans) and so on. This table has three fields: id (int, primary key), attribute_name (varchar, an identifier for the attribute), attribute_desc (text, an explanation of the meaning of the attribute). Each record of this table represents a single possible attribute of a football team;
3) teams_information is a table that holds the information available about the teams listed in the teams table. This table contains four fields: id (int, primary key), team_id (int, a foreign key which identifies a team), attribute_id (int, a foreign key which identifies one of the attributes listed in the attributes table), attribute_value (varchar, the value of the attribute). Each record represents a single attribute of a single team. In general, different teams will have different amounts of information, so for some teams a large number of attributes will be available, while for other teams only a few will be.
Note that the relation between teams and teams_information is one to many, and the same relation exists between attributes and teams_information.
Given this model, my goal is to build a grid (maybe with ExtJS 4.1) that shows the user the list of Italian football teams. Each record of the grid will represent a single football team and will contain all possible attributes: some fields may be empty (because the corresponding attribute is unknown for that team), while the others will contain the values stored in the teams_information table for that team.
Accordingly, the grid's fields are: id, team_name, and one field for each of the attributes listed in the 'attributes' table.
My question is: can I build such a grid using a SINGLE SQL query (maybe a suitable SELECT query that fetches all the data I need from the database tables)?
Can anyone suggest how to write such a query (if it exists)?
Thanks in advance for helping me.
Regards.
Enrico.
The short answer to your question is no, there is no simple construct in MySQL to achieve the result set you are looking for.
But it is possible to carefully (painstakingly) craft such a query. Here is an example, I trust you will be able to decipher it. Basically, I'm using correlated subqueries in the select list, for each attribute I want returned.
SELECT t.id
     , t.name
     , t.nickname
     , ( SELECT v1.attribute_value
           FROM teams_information v1
           JOIN attributes a1
             ON a1.id = v1.attribute_id AND a1.attribute_name = 'city'
          WHERE v1.team_id = t.id ORDER BY 1 LIMIT 1
       ) AS city
     , ( SELECT v2.attribute_value
           FROM teams_information v2 JOIN attributes a2
             ON a2.id = v2.attribute_id AND a2.attribute_name = 'captain'
          WHERE v2.team_id = t.id ORDER BY 1 LIMIT 1
       ) AS captain
     , ( SELECT v3.attribute_value
           FROM teams_information v3 JOIN attributes a3
             ON a3.id = v3.attribute_id AND a3.attribute_name = 'f_number'
          WHERE v3.team_id = t.id ORDER BY 1 LIMIT 1
       ) AS f_number
  FROM teams t
 ORDER BY t.id
For 'multi-valued' attributes, you'd have to pull each instance of the attribute separately. (Use the LIMIT to specify whether you are retrieving the first one, the second one, etc.)
     , ( SELECT v4.attribute_value
           FROM teams_information v4 JOIN attributes a4
             ON a4.id = v4.attribute_id AND a4.attribute_name = 'nickname'
          WHERE v4.team_id = t.id ORDER BY 1 LIMIT 0,1
       ) AS nickname_1st
     , ( SELECT v5.attribute_value
           FROM teams_information v5 JOIN attributes a5
             ON a5.id = v5.attribute_id AND a5.attribute_name = 'nickname'
          WHERE v5.team_id = t.id ORDER BY 1 LIMIT 1,1
       ) AS nickname_2nd
     , ( SELECT v6.attribute_value
           FROM teams_information v6 JOIN attributes a6
             ON a6.id = v6.attribute_id AND a6.attribute_name = 'nickname'
          WHERE v6.team_id = t.id ORDER BY 1 LIMIT 2,1
       ) AS nickname_3rd
I use nickname as an example here because American soccer clubs frequently have more than one nickname; e.g. Chicago Fire Soccer Club has the nicknames 'The Fire', 'La Máquina Roja', 'Men in Red', 'CF97', et al.
NOT AN ANSWER TO YOUR QUESTION, BUT ...
Have I mentioned, numerous times before, how much I dislike working with EAV database implementations? What should IMO be a very simple query turns into an overly complicated beast of a potentially light-dimming query (LDQ).
Wouldn't it be much simpler to create a table where each "attribute" is a separate column? Then queries to return reasonable result sets would look more reasonable...
SELECT id, name, nickname, city, captain, f_number, ... FROM team
But what really makes me shudder is the prospect that some developer is going to decide that the LDQ should be "hidden" in the database as a view, to enable the "simpler" query.
If you go this route, PLEASE PLEASE PLEASE resist any urge you may have to store this query in the database as a view.
I'm going to take a slightly different route. Spencer's answer is fantastic, and it addresses the issue quite well, but there's still a large underlying problem.
The data that you are trying to display on the site is over-normalized in the database. I won't elaborate, since, again, Spencer's answer highlights the issue pretty well.
Rather, I'd like to recommend a solution that denormalizes the data a bit.
Convert all of your Team data into a single table with many columns. (If there is Player data that isn't covered in the question, that would be a second table, but I'll gloss over that for now.)
Sure, you'll have a whole bunch of columns, and a lot of the columns might be NULL for a lot of the rows. It's not normalized, and it's not pretty, but here's the huge advantage that you gain.
Your query becomes:
SELECT * FROM Teams
That's it. That gets displayed right to the website and you are done. You might have to go out of your way to realize this schema, but it would be totally worth the time investment.
I think what you're saying is that you want the rows in the attributes table to appear as columns in the result recordset. If this is correct, then in SQL you would use PIVOT.
A quick search on SO seems to indicate that there is no PIVOT equivalent in MySQL.
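Without a PIVOT keyword, a commonly used workaround is conditional aggregation: group by team and pick each attribute with `MAX(CASE WHEN ... END)`. The sketch below is a hedged illustration of that technique, not something taken from the answers here. It uses the question's table and column names, but the sample rows and the `team_grid` helper are invented, and SQLite (through Python's `sqlite3`) stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT);
    CREATE TABLE attributes (
        id INTEGER PRIMARY KEY, attribute_name TEXT, attribute_desc TEXT
    );
    CREATE TABLE teams_information (
        id INTEGER PRIMARY KEY,
        team_id INTEGER REFERENCES teams(id),
        attribute_id INTEGER REFERENCES attributes(id),
        attribute_value TEXT
    );
    -- Placeholder rows, made up for this example.
    INSERT INTO teams VALUES (1, 'Team A', 'A'), (2, 'Team B', 'B');
    INSERT INTO attributes VALUES
        (1, 'city', 'home city'),
        (2, 'captain', 'captain full name'),
        (3, 'f_number', 'number of fans');
    INSERT INTO teams_information VALUES
        (1, 1, 1, 'City One'),
        (2, 1, 2, 'Captain One'),
        (3, 2, 1, 'City Two');
""")

PIVOT_QUERY = """
    SELECT t.id, t.name,
           -- Each CASE yields the value only on the matching attribute row;
           -- MAX() then collapses the group to that single non-NULL value.
           MAX(CASE WHEN a.attribute_name = 'city'     THEN ti.attribute_value END) AS city,
           MAX(CASE WHEN a.attribute_name = 'captain'  THEN ti.attribute_value END) AS captain,
           MAX(CASE WHEN a.attribute_name = 'f_number' THEN ti.attribute_value END) AS f_number
    FROM teams AS t
    LEFT JOIN teams_information AS ti ON ti.team_id = t.id
    LEFT JOIN attributes AS a ON a.id = ti.attribute_id
    GROUP BY t.id, t.name
    ORDER BY t.id
"""


def team_grid():
    """Return one row per team with one column per known attribute."""
    return conn.execute(PIVOT_QUERY).fetchall()
```

Missing attributes simply come back as NULL (`None` in Python), which matches the "some fields may be empty" requirement of the grid. The list of attribute columns is still hard-coded, so a new attribute means a changed query.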
I wrote a simple PHP script to generalize spencer's idea to solve my issue.
Here's the code:
<?php
require_once('includes/db.config.php'); //this file performs connection to mysql
/*
* Following function requires a table name ($table)
* and a number of service fields ($num). Given those parameters
* it returns the number of table fields (excluding service fields).
*/
function get_fields_number($table,$num,$conn)
{
$query = "SELECT * FROM $table";
$result = mysql_query($query,$conn);
return mysql_num_fields($result)-$num; //remember there are $num service fields
}
/*
* Following function requires a table name ($table) and an array
* containing a list of service fields names. Given those parameters,
* it returns the list of field names. That list is contained within an array and
* service fields are excluded.
*/
function get_fields_name($table,$service,$conn)
{
$query = "SELECT * FROM $table";
$result = mysql_query($query,$conn);
$name = array(); //Array to be returned
for ($i=0;$i<mysql_num_fields($result);$i++)
{
if(!in_array(mysql_field_name($result,$i),$service))
{
//currently selected field is not a service field
$name[] = mysql_field_name($result,$i);
}
}
return $name;
}
//Below $conn is db connection created in 'db.config.php'
$query = "SELECT `name` FROM `detail_arg` WHERE visibility = 0";
$res = mysql_query($query,$conn);
if($res===false)
{
$err_msg = mysql_real_escape_string(mysql_error($conn));
echo "{success:false,data:'".$err_msg."'}";
die();
}
$arg = array(); //list of argument names
while($row = mysql_fetch_assoc($res))
{
$arg[] = $row['name'];
}
//Following function writes the select subquery which is
//necessary to build a column containing a single attribute.
function make_subquery($attribute) //$attribute contains attribute name
{
$query = "";
$query.="(SELECT incident_detail.arg_value ";
$query.="FROM incident_detail ";
$query.="INNER JOIN detail_arg ";
$query.="ON incident_detail.arg_id = detail_arg.id AND detail_arg.name='".$attribute."' ";
$query.="WHERE incident.id = incident_detail.incident_id) ";
$query.="AS $attribute";
return $query;
}
/*
echo make_subquery("date"); //debug code
*/
$subquery = array(); //list of subqueries
for($i=0;$i<count($arg);$i++)
{
$subquery[] = make_subquery($arg[$i]);
}
$query = "SELECT "; //final query containing subqueries
$fields = get_fields_name("incident",array("id","visibility"),$conn);
//list of 'incident' table's fields
for($i=0;$i<count($fields);$i++)
{
$query.="incident.".$fields[$i].", ";
}
//insert the subqueries
$sub = implode(", ", $subquery);
$query .= $sub;
$query.=" FROM incident ORDER BY incident.id";
echo $query;
?>
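The same query-generation idea can be sketched outside PHP. The Python version below is an assumption-laden illustration rather than a port of the script: it keeps the script's table names (`incident`, `incident_detail`, `detail_arg`), but the column layout of `incident`, the sample rows, the `LIMIT 1` guard and the identifier check are additions made here. SQLite is used so the sketch runs on its own.

```python
import re
import sqlite3

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def make_subquery(attribute):
    """Build one correlated subquery that returns a single attribute as a column."""
    # The name ends up as a column alias, which cannot be a bound parameter,
    # so it is restricted to plain identifier characters first.
    if not IDENTIFIER.fullmatch(attribute):
        raise ValueError(f"unsafe attribute name: {attribute!r}")
    return (
        "(SELECT d.arg_value FROM incident_detail AS d "
        "JOIN detail_arg AS a ON a.id = d.arg_id "
        f"WHERE a.name = '{attribute}' AND d.incident_id = incident.id "
        f"LIMIT 1) AS {attribute}"
    )


def build_query(conn):
    """Assemble the final SELECT from the visible attribute names."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM detail_arg WHERE visibility = 0 ORDER BY id")]
    columns = ["incident.id"] + [make_subquery(name) for name in names]
    return "SELECT " + ", ".join(columns) + " FROM incident ORDER BY incident.id"


# Hypothetical schema and data, just enough to run the generated query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE incident (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE detail_arg (id INTEGER PRIMARY KEY, name TEXT, visibility INTEGER);
    CREATE TABLE incident_detail (incident_id INTEGER, arg_id INTEGER, arg_value TEXT);
    INSERT INTO incident VALUES (1, 'first'), (2, 'second');
    INSERT INTO detail_arg VALUES
        (1, 'location', 0), (2, 'severity', 0), (3, 'internal_note', 1);
    INSERT INTO incident_detail VALUES
        (1, 1, 'lobby'), (1, 2, 'low'), (2, 1, 'garage'), (2, 3, 'hidden');
""")
```

Running `conn.execute(build_query(conn)).fetchall()` then yields one row per incident with one column per visible attribute. Because attribute names are spliced into the SQL text, validating them as above (or whitelisting them) matters more than in a query that only uses bound parameters.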

How to get the intersection of two groups of items in the most efficient way?

I need to store a group of "tags" applied to objects. I was thinking of using something along the lines of :
But in order to implement Gmail's "Implicit Social Graph" algorithm (see this question), I need to be able to search for groups of tags that contain one or more specific tags.
So I guess my question is: how do I get the intersection of two groups of items in MySQL in the most efficient way?
find all sets containing one specific tag (given the value):
select tags_sets_id
from tags_has_sets, tags
where value = 'foo'
and tags_id = id;
find all sets containing either of two (or more) specific tags (given the values):
select distinct tags_sets_id
from tags_has_sets, tags
where value in ('foo', 'bar')
and tags_id = id;
find all sets containing both of exactly two specific tags:
select t1.tags_sets_id
from tags_has_sets t1, tags tags1,
tags_has_sets t2, tags tags2
where tags1.value = 'foo'
and tags2.value = 'bar'
and t1.tags_id = tags1.id
and t2.tags_id = tags2.id
and t1.tags_sets_id = t2.tags_sets_id;
Note that the last solution doesn't generalize, but you could conceivably build a generalized algorithm that generates an n-joined SQL statement on the fly.
Here is one last implementation that does generalize, although I don't know its performance characteristics compared to the generated-join approach (thanks to @ypercube for an excellent enhancement to my initial suggestion):
select tags_sets_id
from tags_has_sets, tags
where value in ('foo', 'bar', 'baz')
and id = tags_id
group by tags_sets_id
having count(*) = 3;
-- formerly: having group_concat(distinct value order by value)
-- ='bar,baz,foo';
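The generalized `GROUP BY ... HAVING COUNT` query can be parameterized for any number of tags. The sketch below assumes the `tags` and `tags_has_sets` columns implied by the queries above; the sample rows and the `sets_containing_all` helper are invented for illustration, and SQLite via Python's `sqlite3` replaces MySQL so the code is runnable as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
    CREATE TABLE tags_has_sets (
        tags_id      INTEGER REFERENCES tags(id),
        tags_sets_id INTEGER,
        PRIMARY KEY (tags_id, tags_sets_id)
    );
    -- Hypothetical data: set 10 = {foo, bar, baz}, set 20 = {foo, bar}, set 30 = {baz}
    INSERT INTO tags VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    INSERT INTO tags_has_sets VALUES
        (1, 10), (2, 10), (3, 10), (1, 20), (2, 20), (3, 30);
""")


def sets_containing_all(values):
    """Return the ids of the sets that contain every tag in values."""
    wanted = sorted(set(values))  # duplicates in the input must not inflate the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT ths.tags_sets_id
        FROM tags_has_sets AS ths
        JOIN tags AS t ON t.id = ths.tags_id
        WHERE t.value IN ({placeholders})
        GROUP BY ths.tags_sets_id
        HAVING COUNT(DISTINCT t.value) = ?
        ORDER BY ths.tags_sets_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```

`COUNT(DISTINCT t.value)` is used instead of `COUNT(*)` so the result stays correct even if the link table were to contain a duplicate (set, tag) pair; with a primary key on both columns, as assumed here, the two are equivalent.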