Get results from a MySQL database recursively - mysql

Consider the following:
Table - id, parentid
I'd like to pull all the children (not only direct children, but all of them, i.e. children of children of children etc.) of a specific parent.
So let's say the table contains the following rows: (2, 1), (3, 1), (4, 2), (5, 4)
Then for parentid = 1, the query would return ids 2, 3, 4 AND 5.
Is this possible?
If not (and I guess it's indeed not possible), what are my options?
I really don't want to use dozens of queries...
P.S. I can't change the database structure.
Also, as there might be hundreds of thousands of records in the table, I can't pull them all and do the whole thing using PHP instead.

This might help:
$parentId = 1; // the parent id
$arrAllChild = Array(); // array that will store all children
while (true) {
    $arrChild = Array(); // array for storing children in this iteration
    $q = 'SELECT `id` FROM `table` WHERE `parentid` IN (' . $parentId . ')';
    $rs = mysql_query ($q);
    while ($r = mysql_fetch_assoc($rs)) {
        $arrChild[] = $r['id'];
        $arrAllChild[] = $r['id'];
    }
    if (empty($arrChild)) { // break if no more children found
        break;
    }
    $parentId = implode(',', $arrChild); // generate comma-separated string of all children and execute the query again
}
print_r($arrAllChild);
You could also use recursion for this, but I think the loop above needs fewer queries, since it runs one query per level rather than one per node.
Hope it helps!
EDIT - I forgot to mention that you can implement the same logic in a MySQL stored procedure as well, except that you can't use arrays. The above example is implemented in PHP, as you might have already guessed.
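For comparison, here is a minimal sketch of the same level-by-level loop with bound parameters instead of string concatenation. It is written in Python against an in-memory SQLite database purely so that it runs as-is; the table name `tree` and the helper `all_descendants` are made up for the illustration, and the `seen` set is an extra guard against cyclic data that the PHP version does not have.

```python
import sqlite3


def all_descendants(conn, parent_id):
    """Collect every descendant id of parent_id, one query per tree level."""
    found = []
    frontier = [parent_id]   # ids whose children still have to be fetched
    seen = {parent_id}       # guards against cycles in the data
    while frontier:
        placeholders = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id FROM tree WHERE parentid IN ({placeholders}) ORDER BY id",
            frontier,
        ).fetchall()
        frontier = [r[0] for r in rows if r[0] not in seen]
        seen.update(frontier)
        found.extend(frontier)
    return found


# Sample rows from the question: (id, parentid)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parentid INTEGER)")
conn.executemany("INSERT INTO tree VALUES (?, ?)", [(2, 1), (3, 1), (4, 2), (5, 4)])

print(all_descendants(conn, 1))
```

The number of queries equals the depth of the tree plus one, not the number of nodes. A very wide level would need the IN list split into chunks, because database drivers cap the number of bound parameters.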

Not in one step.
I have done recursive queries in MySQL using PHP: looping through one level, collecting the data, modifying the query to use the results brought back in the last iteration, running the query again, etc.
MySQL is not very friendly for this sort of thing. MSSQL, Oracle, and PostgreSQL support it in a single query.
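The single-query form those databases offer is a recursive common table expression, and MySQL gained the same construct in version 8.0. A minimal sketch follows; it runs against SQLite through Python only because that needs no server, and the table name `tree` and CTE name `descendants` are assumptions for the example. On MySQL 8.0+ the WITH RECURSIVE statement should carry over, though the driver's placeholder style may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, parentid INTEGER)")
conn.executemany("INSERT INTO tree VALUES (?, ?)", [(2, 1), (3, 1), (4, 2), (5, 4)])

RECURSIVE_SQL = """
WITH RECURSIVE descendants(id) AS (
    -- anchor: direct children of the requested parent
    SELECT id FROM tree WHERE parentid = ?
    UNION ALL
    -- recursive step: children of rows found so far
    SELECT t.id FROM tree AS t JOIN descendants AS d ON t.parentid = d.id
)
SELECT id FROM descendants ORDER BY id
"""

ids = [row[0] for row in conn.execute(RECURSIVE_SQL, (1,))]
print(ids)
```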

Here is a query I wrote just now for a similar problem:
select if(e.id is not null, e.id, if(d.id is not null, d.id, if(c.id is not null, c.id, if(b.id is not null, b.id, a.id)))) as ID
from groups a
left join groups b on b.parent = a.id
left join groups c on c.parent = b.id
left join groups d on d.parent = c.id
left join groups e on e.parent = d.id
where a.parent = SOMETOPLEVELPARENTIDHERE;
This approach does have a fixed depth limit. I know from my own data that it happens to span at most five levels of depth. If the depth is fairly stable you can accommodate some growth by simply adding more left joins. Also, not sure how the query will perform with hundreds of thousands of records.
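To see what the fixed-depth join gives back, here is a small sketch that runs the same chain of left joins over the sample rows from the question. It uses Python with SQLite so it can be run directly, and the table name `nodes` is a placeholder. Each result row is one path below the top-level parent; picking the deepest non-null id, as the nested IF does, gives the last node of each path, while reading all five columns gives every descendant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent INTEGER)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", [(2, 1), (3, 1), (4, 2), (5, 4)])

# One row per path of up to five levels below the given parent.
rows = conn.execute(
    """
    SELECT a.id, b.id, c.id, d.id, e.id
    FROM nodes a
    LEFT JOIN nodes b ON b.parent = a.id
    LEFT JOIN nodes c ON c.parent = b.id
    LEFT JOIN nodes d ON d.parent = c.id
    LEFT JOIN nodes e ON e.parent = d.id
    WHERE a.parent = ?
    """,
    (1,),
).fetchall()

# Deepest id on each path (what the nested IF selects).
deepest = sorted(next(v for v in reversed(row) if v is not None) for row in rows)
# Every id that appears anywhere on any path.
everyone = sorted({v for row in rows for v in row if v is not None})

print(deepest, everyone)
```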

Related

"NOT IN" for Active Record

I have a MySQL query that I am trying to chain a "NOT IN" at the end of it.
Here is what it looks like in ruby using Active Record:
not_in = find_by_sql("SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6;").map(&:parent_dimension_id)
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
.where(relation_type_id: model_relation_id,
parent_dimension_id: sub_type_ids,
child_dimension_id: model_type)
.where.not(parent_dimension_id: not_in)
So the SQL query I'm trying to do looks like this:
INNER JOIN dimensions ON child_dimension_id = dimensions.id
WHERE relations.relation_type_id = 5
AND relations.parent_dimension_id
NOT IN(SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6);
Can someone confirm what I should use for that query?
Do I chain on where.not?
If you really do want
SELECT parent_dimension_id
FROM relations
WHERE relation_type_id = 6
as a subquery, you just need to convert that SQL to an ActiveRecord relation:
Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
then use that as a value in a where call the same way you'd use an array:
not_parents = Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
Relation.joins('...')
.where(relation_type_id: model_relation_id, ...)
.where.not(parent_dimension_id: not_parents)
When you use an ActiveRecord relation as a value in a where and that relation selects a single column:
r = M1.select(:one_column).where(...)
M2.where(:column => r)
ActiveRecord is smart enough to inline r's SQL as an in (select one_column ...) rather than doing two queries.
You could probably replace your:
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
with a simpler joins(:some_relation) if your associations are set up.
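At the SQL level, the difference between mapping to an array first and passing a relation comes down to two round trips versus one. The sketch below shows both forms returning the same rows. It is plain Python over an in-memory SQLite table, not ActiveRecord, and the sample rows are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE relations (id INTEGER PRIMARY KEY, relation_type_id INTEGER,"
    " parent_dimension_id INTEGER, child_dimension_id INTEGER)"
)
conn.executemany(
    "INSERT INTO relations (relation_type_id, parent_dimension_id, child_dimension_id)"
    " VALUES (?, ?, ?)",
    [(5, 10, 100), (5, 11, 101), (5, 12, 102), (6, 11, 200)],
)

# Two round trips: fetch the excluded ids, then bind them as a list.
excluded = [r[0] for r in conn.execute(
    "SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6")]
marks = ",".join("?" * len(excluded))
two_step = conn.execute(
    "SELECT id FROM relations WHERE relation_type_id = 5"
    f" AND parent_dimension_id NOT IN ({marks}) ORDER BY id",
    excluded,
).fetchall()

# One round trip: the subquery is inlined into the outer statement.
one_step = conn.execute(
    "SELECT id FROM relations WHERE relation_type_id = 5"
    " AND parent_dimension_id NOT IN"
    " (SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6)"
    " ORDER BY id"
).fetchall()

print(two_step == one_step)
```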
You can feed where clauses with values or arrays of values, in which case they will be translated into in (?) clauses.
Thus, the last part of your query could contain a mapping:
.where.not(parent_dimension_id:Relation.where(relation_type_id:6).map(&:parent_dimension_id))
Or you can prepare a statement
.where('parent_dimension_id not in (?)', Relation.where(relation_type_id:6).map(&:parent_dimension_id) )
which is essentially the same thing.

Conditional order by in MYSQL. Should affect part of row

The left table is a result of my query. And I need to sort it as the right table.
I need to order by p_id, if level >= 2. The blue box of right table is a target of order by.
Is it possible? Of course it is an example. Actual data is hundreds and really need to be sorted.
I searched a lot, but couldn't find the same case.
Edit: this table will be returned as a java.util.ArrayList. If this kind of 'order by' is not possible in SQL, is it possible on the java.util.ArrayList?
I'm sure it's not possible in one query in MySQL.
In your diagram on the right, the ordering has been done in two separate steps:
Sort by id
Sort each block by p_id if level >= 2
That's quite difficult to do in MySQL as you would need to identify the blocks and then iterate over them, sorting each block separately.
I've done something similar where ordering within blocks was required and then selecting from those ordered blocks. You can view that here but as I said, I think that that SQL code is horribly complicated involving 5 temporary tables. You would probably need fewer temp tables, but it would still be a very complicated procedure, quite slow and hard to maintain.
"Actual data is hundreds and really need to be sorted."
Is there any reason why you can't just sort it as you want in code?
$blockStart = 0;
$count = 0;
foreach ($dataArray as $data) {
    if ($data['level'] < 2) { // Block has finished
        sortBlock($dataArray, $blockStart, $count - 1);
        $blockStart = $count + 1; // The level < 2 row itself stays where it is
    }
    $count++;
}
sortBlock($dataArray, $blockStart, $count - 1);

// Takes the array by reference so that the caller sees the reordering
function sortBlock(array &$dataArray, $indexStart, $indexEnd) {
    // Sort the elements of $dataArray, between $indexStart and $indexEnd inclusive
    // by the value of p_id (nothing to do when $indexEnd <= $indexStart)
}
Trying to solve a general programming problem in MySQL when you could solve it in 1/10th of the programmer time (and probably have it perform faster as well) in Java is not a good path to follow.
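As a rough, runnable illustration of sorting block by block in application code, here is a short Python sketch. It assumes that rows with level below 2 keep their position and that each run of consecutive level >= 2 rows is ordered by p_id; the function name `sort_blocks` and the sample rows are invented for the example.

```python
def sort_blocks(rows):
    """Return rows with each contiguous run of level >= 2 rows sorted by p_id."""
    result, block = [], []
    for row in rows:
        if row["level"] >= 2:
            block.append(row)            # still inside the current block
        else:
            # A level < 2 row closes the block before it and stays in place.
            result.extend(sorted(block, key=lambda r: r["p_id"]))
            block = []
            result.append(row)
    result.extend(sorted(block, key=lambda r: r["p_id"]))  # trailing block
    return result


rows = [
    {"id": 1, "p_id": 0, "level": 1},
    {"id": 2, "p_id": 9, "level": 2},
    {"id": 3, "p_id": 7, "level": 2},
    {"id": 4, "p_id": 0, "level": 1},
    {"id": 5, "p_id": 8, "level": 2},
    {"id": 6, "p_id": 6, "level": 2},
]
print([r["id"] for r in sort_blocks(rows)])  # [1, 3, 2, 4, 6, 5]
```

The same single pass translates directly to a loop over a java.util.ArrayList, using subList and a comparator on p_id for each block.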
It is possible to do this in SQL, but it would be a very, very complicated query in MySQL. Here is the approach.
(1) Create a subquery that has the original ids and an indicator of whether something is in level 2 or not. The ids in this table are going to define the final order.
(2) Next, create a separate counter for each group in this above table. In other databases, you would use row_number(). In MySQL, this requires a correlated subquery. This provides the mapping from id to the new ordering.
(3) Next, create a counter for each group, but this time with the needed order (by id for the non-level2 group, by your rules for ordering).
(4) Join the tables together to get the matching.
(5) Order by the original id.
Here is an attempt:
select altord.*
from (select t.*,
             (select count(*)
              from t t2
              where t2.id <= t.id and
                    ((t2.level = 2 and t.level = 2) or (t2.level <> 2 and t.level <> 2))
             ) as seqnum
      from t
     ) ord join
     (select t.*,
             (select count(*)
              from t t2
              where (t2.id <= t.id and t2.level <> 2 and t.level <> 2) or
                    (t2.level = 2 and t.level = 2 and
                     (t2.p_id < t.p_id or (t2.p_id = t.p_id and t2.id <= t.id)))
             ) as seqnum
      from t
     ) altord
     on ord.seqnum = altord.seqnum and (ord.level = 2) = (altord.level = 2)
order by ord.id
I'm not sure if this SQL is correct, but the idea can be implemented in a single query.

Per-row dynamic sql

I have a database representing something like a bookstore. There's a table containing the categories that books can be in. Some categories are defined simply using another table that contains the category-item relationships. But there are also some categories that can be defined programmatically -- a category for a specific author can be defined using a query (SELECT item_id FROM items WHERE author = "John Smith"). So my categories table has a "query" column; if it's not null, I use this to get the items in the category, otherwise I use the category_items table.
Currently, I have the application (PHP code) make this decision, but this means lots of separate queries when we iterate over all the categories. Is there some way to incorporate this dynamic SQL into a join? Something like:
SELECT c.category, IF(c.query IS NULL, count(i.items), count(EXECUTE c.query))
FROM categories c
LEFT OUTER JOIN category_items i
ON c.category = i.category
EXECUTE requires a prepared statement, but I need to prepare a different statement for each row. Also, EXECUTE can't be used in expressions; it's just a top-level statement. Suggestions?
What happens when you want to list books by publisher? Country? Language? You'd have to throw them all into a single "category_items" table. How would you pick which dynamic query to execute? The query-within-a-query method is not going to work.
I think your concept of "category" is too broad, which is resulting in overly complicated SQL. I would replace "category" to represent only "genre" (for books). Genres are defined in their own table, and item_genres connects them to the items table. Books-by-author and books-by-genre should just be separate queries at the application level, rather than trying to do them both with the same (sort of) query at the database/SQL level. (If you have music as well as books, they probably shouldn't all be stored in a single "items" table because they're different concepts ... have different genres, author vs. artist, etc.)
I know this does not really solve your problem in the way you'd like, but I think you'll be happier not trying to do it that way.
Here's how I finally ended up solving this in the PHP client.
I decided to just keep the membership in the category_items table, and use the dynamic queries during submission to update this table.
This is the function in my script that's called to update an item's categories during submission or updating. It takes a list of user-selected categories (which can only be chosen from categories that don't have dynamic queries), and using this and the dynamic queries it figures out the difference between the categories that an item is currently in and the ones it should be in, and inserts/deletes as necessary to get them in sync. (Note that the actual table names in my DB are not the same as in my question, I was using somewhat generic terms.)
function update_item_categories($dbh, $id, $requested_cats) {
    $data = mysql_check($dbh, mysqli_query($dbh, "select id, query from t_ld_categories where query is not null"), 'getting dynamic categories');
    $clauses = array();
    while ($row = mysqli_fetch_object($data))
        $clauses[] = sprintf('select %d cat_id, (%d in (%s)) should_be_in',
                             $row->id, $id, $row->query);
    if (!$requested_cats) $requested_cats[] = -1; // Dummy entry that never matches cat_id
    $requested_cat_string = implode(', ', $requested_cats);
    $clauses[] = "select c.id cat_id, (c.id in ($requested_cat_string)) should_be_in
                  from t_ld_categories c
                  where member_type = 'lessons' and query is null";
    $subquery = implode("\nunion all\n", $clauses);
    $query = "select c.cat_id cat_id, should_be_in, (member_id is not null) is_in
              from ($subquery) c
              left outer join t_ld_cat_members m
              on c.cat_id = m.cat_id
              and m.member_id = $id";
    // printf("<pre>$query</pre>");
    $data = mysql_check($dbh, mysqli_query($dbh, $query), 'getting current category membership');
    $adds = array();
    $deletes = array();
    while ($row = mysqli_fetch_object($data)) {
        if ($row->should_be_in && !$row->is_in) $adds[] = "({$row->cat_id}, $id)";
        elseif (!$row->should_be_in && $row->is_in) $deletes[] = "(cat_id = {$row->cat_id} and member_id = $id)";
    }
    if ($deletes) {
        $delete_string = implode(' or ', $deletes);
        mysql_check($dbh, mysqli_query($dbh, "delete from t_ld_cat_members where $delete_string"), 'deleting old categories');
    }
    if ($adds) {
        $add_string = implode(', ', $adds);
        mysql_check($dbh, mysqli_query($dbh, "insert into t_ld_cat_members (cat_id, member_id) values $add_string"),
                    "adding new categories");
    }
}
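The core of that function is a set difference between the categories an item should be in and the ones it is currently in. Below is a stripped-down sketch of that idea in Python over SQLite. It uses the generic table names from the question (categories, category_items, items) rather than the real ones, the helper name `sync_item_categories` is made up, and it runs each stored query separately instead of building one UNION ALL statement. The stored queries are executed verbatim, so this only makes sense when that column is trusted.

```python
import sqlite3


def sync_item_categories(conn, item_id, requested):
    """Bring category_items for item_id in line with requested + dynamic categories."""
    wanted = set(requested)
    dynamic = conn.execute(
        "SELECT id, query FROM categories WHERE query IS NOT NULL"
    ).fetchall()
    for cat_id, query in dynamic:
        # Each stored query returns the item ids that belong to the category.
        if item_id in {r[0] for r in conn.execute(query)}:
            wanted.add(cat_id)
    current = {r[0] for r in conn.execute(
        "SELECT cat_id FROM category_items WHERE item_id = ?", (item_id,))}
    conn.executemany(
        "DELETE FROM category_items WHERE cat_id = ? AND item_id = ?",
        [(c, item_id) for c in current - wanted],
    )
    conn.executemany(
        "INSERT INTO category_items (cat_id, item_id) VALUES (?, ?)",
        [(c, item_id) for c in wanted - current],
    )
    return wanted


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, author TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, query TEXT);
CREATE TABLE category_items (cat_id INTEGER, item_id INTEGER);
INSERT INTO items VALUES (1, 'John Smith'), (2, 'Jane Doe');
INSERT INTO categories VALUES
    (1, 'Fiction', NULL),
    (2, 'History', NULL),
    (3, 'By John Smith', 'SELECT item_id FROM items WHERE author = ''John Smith''');
INSERT INTO category_items VALUES (2, 1);
""")

# Item 1 was in History; the user now picks Fiction, and the dynamic query adds category 3.
sync_item_categories(conn, 1, [1])
membership = sorted(r[0] for r in conn.execute(
    "SELECT cat_id FROM category_items WHERE item_id = 1"))
print(membership)
```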

SQL issue: one to many relationship and EAV model

Good evening guys,
I'm a newbie to web programming and I need your help with a problem involving an SQL query.
The database engine I'm using is MySQL and I access it via PHP. Here I'll describe a simplified version of my database, just to fix ideas.
Let's suppose to work with a database containing three tables: teams, teams_information, attributes. More precisely:
1) teams is a table containing some basic information about Italian football teams (soccer, not American football :D). It has three fields: id (int, primary key), name (varchar, team name), nickname (varchar, team nickname);
2) attributes is a table containing a list of possible information about a football team, such as city (the city where team plays its home match), captain (team captain's fullname), f_number (number of fans) and so on. This table is formed by three fields: id (int, primary key), attribute_name (varchar, an identifier for the attribute), attribute_desc (text, an explanation of the meaning of attribute). Each record of this table represents a single possible attribute of a football team;
3) teams_information is a table holding the information available about the teams listed in the teams table. This table contains four fields: id (int, primary key), team_id (int, a foreign key which identifies a team), attribute_id (int, a foreign key which identifies one of the attributes listed in the attributes table), attribute_value (varchar, the value of the attribute). Each record represents a single attribute of a single team. In general, different teams will have different amounts of information, so a large number of attributes will be available for some teams and only a small number for others.
Note that the relation between teams and teams_information is one to many, and the same relation exists between attributes and teams_information.
Given this model, my goal is to build a grid (maybe with ExtJS 4.1) that shows the user the list of Italian football teams. Each record of this grid will represent a single football team and will contain all possible attributes: some fields may be empty (because the corresponding attribute is unknown for that team), while the others will contain the values stored in the teams_information table for that team.
So the grid's fields are: id, team_name and one field for each of the attributes listed in the attributes table.
My question is: can I build such a grid using a SINGLE SQL query (maybe a proper SELECT query that fetches all the data I need from the database tables)?
Can anyone suggest how to write such a query (if it exists)?
Thanks in advance for helping me.
Regards.
Enrico.
The short answer to your question is no, there is no simple construct in MySQL to achieve the result set you are looking for.
But it is possible to carefully (painstakingly) craft such a query. Here is an example, I trust you will be able to decipher it. Basically, I'm using correlated subqueries in the select list, for each attribute I want returned.
SELECT t.id
     , t.name
     , t.nickname
     , ( SELECT v1.attribute_value
           FROM teams_information v1
           JOIN attributes a1
             ON a1.id = v1.attribute_id AND a1.attribute_name = 'city'
          WHERE v1.team_id = t.id ORDER BY 1 LIMIT 1
       ) AS city
     , ( SELECT v2.attribute_value
           FROM teams_information v2 JOIN attributes a2
             ON a2.id = v2.attribute_id AND a2.attribute_name = 'captain'
          WHERE v2.team_id = t.id ORDER BY 1 LIMIT 1
       ) AS captain
     , ( SELECT v3.attribute_value
           FROM teams_information v3 JOIN attributes a3
             ON a3.id = v3.attribute_id AND a3.attribute_name = 'f_number'
          WHERE v3.team_id = t.id ORDER BY 1 LIMIT 1
       ) AS f_number
  FROM teams t
 ORDER BY t.id
For 'multi-valued' attributes, you'd have to pull each instance of the attribute separately. (Use the LIMIT to specify whether you are retrieving the first one, the second one, etc.)
     , ( SELECT v4.attribute_value
           FROM teams_information v4 JOIN attributes a4
             ON a4.id = v4.attribute_id AND a4.attribute_name = 'nickname'
          WHERE v4.team_id = t.id ORDER BY 1 LIMIT 0,1
       ) AS nickname_1st
     , ( SELECT v5.attribute_value
           FROM teams_information v5 JOIN attributes a5
             ON a5.id = v5.attribute_id AND a5.attribute_name = 'nickname'
          WHERE v5.team_id = t.id ORDER BY 1 LIMIT 1,1
       ) AS nickname_2nd
     , ( SELECT v6.attribute_value
           FROM teams_information v6 JOIN attributes a6
             ON a6.id = v6.attribute_id AND a6.attribute_name = 'nickname'
          WHERE v6.team_id = t.id ORDER BY 1 LIMIT 2,1
       ) AS nickname_3rd
I use nickname as an example here because American soccer clubs frequently have more than one nickname, e.g. Chicago Fire Soccer Club has the nicknames 'The Fire', 'La Máquina Roja', 'Men in Red', 'CF97', et al.
NOT AN ANSWER TO YOUR QUESTION, BUT ...
Have I mentioned numerous times before how much I dislike working with EAV database implementations? What should IMO be a very simple query turns into an overly complicated beast of a potentially light-dimming query (LDQ).
Wouldn't it be much simpler to create a table where each "attribute" is a separate column? Then queries to return reasonable result sets would look more reasonable...
SELECT id, name, nickname, city, captain, f_number, ... FROM team
But what really makes me shudder is the prospect that some developer is going to decide that the LDQ should be "hidden" in the database as a view, to enable the "simpler" query.
If you go this route, PLEASE PLEASE PLEASE resist any urge you may have to store this query in the database as a view.
I'm going to take a slightly different route. Spencer's answer is fantastic, and it addresses the issue quite well, but there's still a large underlying problem.
The data that you are trying to display on the site is over-normalized in the database. I won't elaborate, since, again, Spencer's answer highlights the issue pretty well.
Rather, I'd like to recommend a solution that denormalizes the data a bit.
Convert all of your Team data into a single table with many columns. (If there is Player data that isn't covered in the question, that would be a second table, but I'll gloss over that for now.)
Sure, you'll have a whole bunch of columns, and a lot of the columns might be NULL for a lot of the rows. It's not normalized, and it's not pretty, but here's the huge advantage that you gain.
Your query becomes:
SELECT * FROM Teams
That's it. That gets displayed right to the website and you are done. You might have to go out of your way to realize this schema, but it would be totally worth the time investment.
I think what you're saying is that you want the rows in the attributes table to appear as columns in the result recordset. If this is correct, then in SQL you would use PIVOT.
A quick search on SO seems to indicate that there is no PIVOT equivalent in MySQL.
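Where PIVOT is missing, a common substitute is conditional aggregation: one MAX(CASE WHEN ... END) column per attribute over a single join, grouped by team. The sketch below builds such a query from the rows of the attributes table. It is Python over SQLite so it can be run as-is, the sample rows and the helper `quote_ident` are invented for the illustration, and it keeps only one value per attribute per team. On MySQL the generated aliases would be quoted with backticks instead of double quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT);
CREATE TABLE attributes (id INTEGER PRIMARY KEY, attribute_name TEXT);
CREATE TABLE teams_information (
    id INTEGER PRIMARY KEY, team_id INTEGER, attribute_id INTEGER, attribute_value TEXT);
INSERT INTO teams VALUES (1, 'Team A', 'A'), (2, 'Team B', 'B');
INSERT INTO attributes VALUES (1, 'city'), (2, 'captain');
INSERT INTO teams_information (team_id, attribute_id, attribute_value)
VALUES (1, 1, 'City X'), (1, 2, 'Captain A'), (2, 1, 'City Y');
""")


def quote_ident(name):
    """Quote a column alias so that arbitrary attribute names are safe to embed."""
    return '"' + name.replace('"', '""') + '"'


attrs = conn.execute("SELECT id, attribute_name FROM attributes ORDER BY id").fetchall()

# One aggregated column per attribute; teams without a value get NULL.
select_list = ["t.id", "t.name"] + [
    f"MAX(CASE WHEN i.attribute_id = {int(attr_id)} THEN i.attribute_value END)"
    f" AS {quote_ident(name)}"
    for attr_id, name in attrs
]
pivot_sql = (
    "SELECT " + ", ".join(select_list) + " FROM teams t"
    " LEFT JOIN teams_information i ON i.team_id = t.id"
    " GROUP BY t.id, t.name ORDER BY t.id"
)

grid = conn.execute(pivot_sql).fetchall()
print(grid)
```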
I wrote a simple PHP script to generalize spencer's idea to solve my issue.
Here's the code:
<?php
require_once('includes/db.config.php'); //this file performs connection to mysql
/*
* Following function requires a table name ($table)
* and a number of service fields ($num). Given those parameters
* it returns the number of table fields (excluding service fields).
*/
function get_fields_number($table,$num,$conn)
{
$query = "SELECT * FROM $table";
$result = mysql_query($query,$conn);
return mysql_num_fields($result)-$num; //remember there are $num service fields
}
/*
* Following function requires a table name ($table) and an array
* containing a list of service fields names. Given those parameters,
* it returns the list of field names. That list is contained within an array and
* service fields are excluded.
*/
function get_fields_name($table,$service,$conn)
{
$query = "SELECT * FROM $table";
$result = mysql_query($query,$conn);
$name = array(); //Array to be returned
for ($i=0;$i<mysql_num_fields($result);$i++)
{
if(!in_array(mysql_field_name($result,$i),$service))
{
//currently selected field is not a service field
$name[] = mysql_field_name($result,$i);
}
}
return $name;
}
//Below $conn is db connection created in 'db.config.php'
$query = "SELECT `name` FROM `detail_arg` WHERE visibility = 0";
$res = mysql_query($query,$conn);
if($res===false)
{
$err_msg = mysql_real_escape_string(mysql_error($conn));
echo "{success:false,data:'".$err_msg."'}";
die();
}
$arg = array(); //list of argument names
while($row = mysql_fetch_assoc($res))
{
$arg[] = $row['name'];
}
//Following function writes the select subquery which is
//necessary to build a column containing a single attribute.
function make_subquery($attribute) //$attribute contains attribute name
{
$query = "";
$query.="(SELECT incident_detail.arg_value ";
$query.="FROM incident_detail ";
$query.="INNER JOIN detail_arg ";
$query.="ON incident_detail.arg_id = detail_arg.id AND detail_arg.name='".$attribute."' ";
$query.="WHERE incident.id = incident_detail.incident_id) ";
$query.="AS $attribute";
return $query;
}
/*
echo make_subquery("date"); //debug code
*/
$subquery = array(); //list of subqueries
for($i=0;$i<count($arg);$i++)
{
$subquery[] = make_subquery($arg[$i]);
}
$query = "SELECT "; //final query containing subqueries
$fields = get_fields_name("incident",array("id","visibility"),$conn);
//list of 'incident' table's fields
for($i=0;$i<count($fields);$i++)
{
$query.="incident.".$fields[$i].", ";
}
//insert the subqueries
$sub = implode(", ", $subquery);
$query .= $sub;
$query.=" FROM incident ORDER BY incident.id";
echo $query;
?>

Performing Join with Multiple Criteria in Propel 1.5

This question follows on from the questions here and here.
I have recently upgraded to Propel 1.5, and have started using its Query features over Criteria. I have a query I cannot translate, however: a left join with multiple criteria.
SELECT * FROM person
LEFT JOIN group_membership ON
person.id = group_membership.person_id
AND group_id = 1
WHERE group_membership.person_id is null;
Its aim is to find all people not in the specified group. Previously I was using the following code to accomplish this:
$criteria->addJoin(array(
self::ID,
GroupMembershipPeer::GROUP_ID,
), array(
GroupMembershipPeer::PERSON_ID,
$group_id,
),
Criteria::LEFT_JOIN);
$criteria->add(GroupMembershipPeer::PERSON_ID, null, Criteria::EQUAL);
I considered performing a query for all people in that group, getting the primary keys and adding a NOT IN on the array, but there didn't seem a particularly easy way to get the primary keys from a find, and it didn't seem very elegant.
An article on codenugget.org details how to add extra criteria to a join, which I attempted:
$result = $this->leftJoin('GroupMembership');
$result->getJoin('GroupMembership')
->addCondition(GroupMembershipPeer::GROUP_ID, $group->getId());
return $result
->useGroupMembershipQuery()
->filterByPersonId(null)
->endUse();
Unfortunately, the 'useGroupMembershipQuery' overrides the left join. To solve this, I tried the following code:
$result = $this
    ->useGroupMembershipQuery('GroupMembership', Criteria::LEFT_JOIN)
    ->filterByPersonId(null)
    ->endUse();
$result->getJoin('GroupMembership')
    ->addCondition(GroupMembershipPeer::GROUP_ID, $group->getId());
return $result;
For some reason this results in a cross join being performed:
SELECT * FROM `person`
CROSS JOIN `group_membership`
LEFT JOIN group_membership GroupMembership ON
(person.ID=GroupMembership.PERSON_ID
AND group_membership.GROUP_ID=3)
WHERE group_membership.PERSON_ID IS NULL
Does anyone know why this might be doing this, or how one might perform this join successfully in Propel 1.5, without having to resort to Criteria, again?
Propel 1.6 supports multiple criteria on joins with addJoinCondition(). If you update the Symfony plugin, or move to sfPropelORMPlugin, you can take advantage of that. The query can then be written like this:
return $this
->leftJoin('GroupMembership')
->addJoinCondition('GroupMembership', 'GroupMembership.GroupId = ?', $group->getId())
->where('GroupMembership.PersonId IS NULL');
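Why the group condition has to sit in the ON clause rather than in WHERE can be checked directly at the SQL level. The sketch below runs both variants against a tiny in-memory SQLite database from Python; the sample people and memberships are invented, and it says nothing about how Propel itself builds the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE group_membership (person_id INTEGER, group_id INTEGER);
INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO group_membership VALUES (1, 1), (2, 2);
""")

# Condition in ON: people without a group-1 membership keep a NULL-extended row.
in_on = conn.execute("""
    SELECT person.id FROM person
    LEFT JOIN group_membership
      ON person.id = group_membership.person_id
     AND group_membership.group_id = 1
    WHERE group_membership.person_id IS NULL
    ORDER BY person.id
""").fetchall()

# Condition in WHERE: group_id = 1 and person_id IS NULL can never both hold.
in_where = conn.execute("""
    SELECT person.id FROM person
    LEFT JOIN group_membership
      ON person.id = group_membership.person_id
    WHERE group_membership.group_id = 1
      AND group_membership.person_id IS NULL
    ORDER BY person.id
""").fetchall()

print(in_on, in_where)
```

With the condition in ON, Bob (a member of group 2 only) and Cy (no memberships) are returned; with it in WHERE, nothing is.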