How to join a many-to-many relationship - MySQL

Entities:
Model
Category
Keyword
Model has a many-to-many relationship with Keyword, and Category also has a many-to-many relationship with Keyword.
The ORM generates the following tables:
model
category
keyword
keyword_model
keyword_category
Given a category, how can I get all models related to it? I would do it like this:
get all keyword IDs from keyword_category by category.id
join the result with the keyword_model table
the result of the join should be all relevant model IDs
Since Symfony2 works with entities rather than tables, it seems hard to write this as a MySQL query. I tried something like:
SELECT x,y FROM MyBundle:Category x, MyBundle:Model y
JOIN x.keywords
JOIN y.keywords
WHERE
x.id = " . $category . "
However, this is invalid syntax. Any ideas on how to get the models here?

You could try the following:
SELECT y
FROM MyBundle:Model y
WHERE EXISTS (
    SELECT x
    FROM MyBundle:Category x
    JOIN x.keywords xk
    WHERE xk MEMBER OF y.keywords
      AND x = :category
)
Or if your relations are bidirectional:
SELECT y
FROM MyBundle:Model y
JOIN y.keywords yk
JOIN yk.categories c
WHERE c = :category
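For comparison, the lookup described in the question (keyword ids for the category, then the matching model ids) can also be written directly against the join tables the ORM generated. The sketch below runs it in SQLite through Python's sqlite3 module. The column names (keyword_id, model_id, category_id) and the sample rows are assumptions, because the question does not show the generated columns.

```python
import sqlite3


def models_for_category(con, category_id):
    """Return the ids of all models that share at least one keyword with the category."""
    rows = con.execute(
        """
        SELECT DISTINCT km.model_id          -- DISTINCT: one row per model
        FROM keyword_category kc
        JOIN keyword_model km ON km.keyword_id = kc.keyword_id
        WHERE kc.category_id = ?
        ORDER BY km.model_id
        """,
        (category_id,),
    )
    return [model_id for (model_id,) in rows]


con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- hypothetical column names for the ORM-generated join tables
    CREATE TABLE keyword_model (keyword_id INTEGER, model_id INTEGER);
    CREATE TABLE keyword_category (keyword_id INTEGER, category_id INTEGER);
    INSERT INTO keyword_model VALUES (1, 10), (2, 10), (2, 20), (3, 30);
    INSERT INTO keyword_category VALUES (1, 100), (2, 100), (3, 200);
    """
)
print(models_for_category(con, 100))  # [10, 20]
```

Model 10 is reachable through two of the category's keywords. DISTINCT keeps it from being listed twice.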

Starting from a given category (I assume you have its id):
use Doctrine\Common\Collections\ArrayCollection; // at the top of the file
$category_repo = $this->getDoctrine()->getManager()->getRepository('YourBundleName:Category');
$category = $category_repo->findOneById($id); // $id is your category's id
$keywords = $category->getKeywords(); // getKeywords() is the getter your Category entity should define
$models = new ArrayCollection(); // or use a plain array
foreach ($keywords as $keyword) {
    // getModels() requires the Keyword entity to map the inverse side of the Model relation
    foreach ($keyword->getModels() as $model) {
        // a model linked through several of the category's keywords would otherwise be added twice
        if (!$models->contains($model)) {
            $models->add($model);
        }
    }
}
However, querying directly (in DQL or SQL) should perform better, because you run a single query instead of one query per object (lazy loading).
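The following sketch makes the "one query per object" point concrete. It counts the SELECT statements issued by a keyword-by-keyword loop that mimics lazy loading, and compares that with a single join. It uses Python's sqlite3 and observes statements through Connection.set_trace_callback. The join-table columns and the data are hypothetical. Doctrine's real query pattern may differ in detail, but the 1 + N shape is what matters.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE keyword_model (keyword_id INTEGER, model_id INTEGER);
    CREATE TABLE keyword_category (keyword_id INTEGER, category_id INTEGER);
    INSERT INTO keyword_model VALUES (1, 10), (2, 10), (2, 20), (3, 30), (4, 40);
    INSERT INTO keyword_category VALUES (1, 100), (2, 100), (4, 100);
    """
)

statements = []
con.set_trace_callback(statements.append)  # record every statement SQLite runs from now on


def count_selects():
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))


def models_lazy(category_id):
    """Mimic lazy loading: one query for the keywords, then one query per keyword."""
    models = []
    keyword_ids = [k for (k,) in con.execute(
        "SELECT keyword_id FROM keyword_category WHERE category_id = ?", (category_id,))]
    for keyword_id in keyword_ids:
        for (model_id,) in con.execute(
                "SELECT model_id FROM keyword_model WHERE keyword_id = ?", (keyword_id,)):
            if model_id not in models:
                models.append(model_id)
    return sorted(models)


def models_joined(category_id):
    """Same result with a single query."""
    return [m for (m,) in con.execute(
        "SELECT DISTINCT km.model_id FROM keyword_category kc "
        "JOIN keyword_model km ON km.keyword_id = kc.keyword_id "
        "WHERE kc.category_id = ? ORDER BY km.model_id", (category_id,))]


statements.clear()
lazy = models_lazy(100)
lazy_queries = count_selects()

statements.clear()
joined = models_joined(100)
joined_queries = count_selects()

print(lazy, lazy_queries)      # [10, 20, 40] 4
print(joined, joined_queries)  # [10, 20, 40] 1
```

With three keywords the loop needs four queries. The count grows with every keyword, while the join stays at one.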

Related

"NOT IN" for Active Record

I have a MySQL query that I am trying to chain a "NOT IN" at the end of it.
Here is what it looks like in Ruby using Active Record:
not_in = find_by_sql("SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6;").map(&:parent_dimension_id)
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
.where(relation_type_id: model_relation_id,
parent_dimension_id: sub_type_ids,
child_dimension_id: model_type)
.where.not(parent_dimension_id: not_in)
The SQL query I'm trying to produce looks like this:
INNER JOIN dimensions ON child_dimension_id = dimensions.id
WHERE relations.relation_type_id = 5
AND relations.parent_dimension_id
NOT IN(SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6);
Can someone confirm what I should use for that query?
Do I chain on where.not?
If you really do want
SELECT parent_dimension_id
FROM relations
WHERE relation_type_id = 6
as a subquery, you just need to convert that SQL to an ActiveRecord relation:
Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
then use that as a value in a where call the same way you'd use an array:
not_parents = Relation.select(:parent_dimension_id).where(:relation_type_id => 6)
Relation.joins('...')
.where(relation_type_id: model_relation_id, ...)
.where.not(parent_dimension_id: not_parents)
When you use an ActiveRecord relation as a value in a where and that relation selects a single column:
r = M1.select(:one_column).where(...)
M2.where(:column => r)
ActiveRecord is smart enough to inline r's SQL as an in (select one_column ...) rather than doing two queries.
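Here is a minimal illustration of the single-statement form, run in SQLite through Python's sqlite3. The excluded IDs come from a subquery that the database evaluates itself, which is the shape ActiveRecord generates when the value is a relation. The rows are invented: only the column names come from the question, and the dimensions join is left out.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- invented rows; only the column names come from the question
    CREATE TABLE relations (id INTEGER PRIMARY KEY, relation_type_id INTEGER,
                            parent_dimension_id INTEGER);
    INSERT INTO relations (relation_type_id, parent_dimension_id) VALUES
        (5, 1), (5, 2), (5, 3), (6, 2);
    """
)

# One statement: the NOT IN list is produced by a subquery inside the database.
rows = con.execute(
    """
    SELECT parent_dimension_id
    FROM relations
    WHERE relation_type_id = 5
      AND parent_dimension_id NOT IN (
          SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6)
    ORDER BY parent_dimension_id
    """
).fetchall()
print(rows)  # [(1,), (3,)]
```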
You could probably replace your:
joins('INNER JOIN dimensions ON child_dimension_id = dimensions.id')
with a simpler joins(:some_relation) if your associations are set up.
You can pass where clauses single values or arrays of values; an array is translated into an IN (...) clause.
Thus, the last part of your query could contain a mapping:
.where.not(parent_dimension_id:Relation.where(relation_type_id:6).map(&:parent_dimension_id))
Or you can use a SQL fragment with a placeholder:
.where('parent_dimension_id not in (?)', Relation.where(relation_type_id:6).map(&:parent_dimension_id) )
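which is essentially the same thing. Note that both of these variants first load the IDs into Ruby with a separate query (map triggers it). Passing the relation itself, as above, lets the database evaluate the subquery instead.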
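For illustration, here is what the two-step variants amount to, sketched with Python's sqlite3. The IDs are fetched by a first query and then sent back as a list of placeholders. not_in_clause is a hypothetical helper, not a Rails API. ActiveRecord builds this SQL itself, including quoting and the empty-array case.

```python
import sqlite3


def not_in_clause(column, values):
    """Build a 'column NOT IN (?, ?, ...)' fragment plus its bound parameters."""
    if not values:
        # 'NOT IN ()' is a syntax error in MySQL and PostgreSQL; nothing to exclude
        return "1 = 1", []
    placeholders = ", ".join("?" for _ in values)
    return f"{column} NOT IN ({placeholders})", list(values)


con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE relations (id INTEGER PRIMARY KEY, relation_type_id INTEGER,
                            parent_dimension_id INTEGER);
    INSERT INTO relations (relation_type_id, parent_dimension_id) VALUES
        (5, 1), (5, 2), (5, 3), (6, 2);
    """
)

# Query 1: load the ids into the application (what .map(&:parent_dimension_id) does)
excluded = [pid for (pid,) in con.execute(
    "SELECT parent_dimension_id FROM relations WHERE relation_type_id = 6")]

# Query 2: send them back as a literal list of bound values
condition, params = not_in_clause("parent_dimension_id", excluded)
rows = con.execute(
    f"SELECT parent_dimension_id FROM relations WHERE relation_type_id = 5 AND {condition} "
    "ORDER BY parent_dimension_id",
    params,
).fetchall()
print(excluded, rows)  # [2] [(1,), (3,)]
```

The result matches the subquery form. The difference is the extra round trip and the ID list travelling through the application.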

Return a different datatype from postgresql

I have the following query in PostgreSQL:
SELECT
project.project_id,
project.project_name,
category.category_name,
array_agg(row(skill.skill_name,projects_skills.projects_skills_id)) AS skills
FROM project
JOIN projects_skills ON project.project_id = projects_skills.project_id
JOIN skill ON projects_skills.skill_id = skill.skill_id
JOIN category ON project.category_id = category.category_id
GROUP BY project.project_name,project.project_id, category.category_name;
Of particular interest is the line below, which seems to return a pseudo-type tuple:
array_agg(row(skill.skill_name,projects_skills.projects_skills_id)) AS skills
I'm unable to create a view from this because of the pseudo-type. In addition, the row function seems to return a tuple set like this:
skills: '{"(Python,3)","(Node,4)","(Javascript,5)"}'
I could painfully parse it in JavaScript by replacing '(' with '[' and so on, but could I do something in Postgres to return it, preferably as an object?
One possible solution is to register a row type (once):
CREATE TYPE my_type AS (skill_name text, projects_skills_id int);
I am guessing text and int as data types. Use the actual data types of the underlying tables.
SELECT p.project_id, p.project_name, c.category_name
, array_agg((s.skill_name, ps.projects_skills_id)::my_type) AS skills
FROM project p
JOIN projects_skills ps ON p.project_id = ps.project_id
JOIN skill s ON ps.skill_id = s.skill_id
JOIN category c ON p.category_id = c.category_id
GROUP BY p.project_id, p.project_name, c.category_name;
There are many other options, depending on your version of Postgres and what you need exactly.
As well as the excellent suggestions in the comments to use JSON, and @Erwin's suggestion to use a registered composite type, you can use a two-dimensional array or a multivalues approach:
Just replace your line
array_agg(row(skill.skill_name::text,projects_skills.projects_skills_id::text)) AS skills
with the following:
Two-dimensional array, option 1
array_agg(array[skill.skill_name::text,projects_skills.projects_skills_id::text]) AS skills
-- skills will be '{{Python,3},{Node,4},{Javascript,5}}', thus
-- skills[1][1] = 'Python' and skills[1][2] = '3' -- id is text
Two-dimensional array, option 2
array[array_agg(skill.skill_name::text),array_agg(projects_skills.projects_skills_id::text)] AS skills
-- skills will be '{{Python,Node,Javascript},{3,4,5}}', thus
-- skills[1][1] = 'Python' and skills[2][1] = '3' -- id is text
Multivalues
array_agg(skill.skill_name) AS skill_names,
array_agg(projects_skills.projects_skills_id) AS skills_ids
-- skill_names = '{Python,Node,Javascript}' and skills_ids = '{3,4,5}', thus
-- skill_names[1] = 'Python' and skills_ids[1] = 3 -- id is integer
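Whichever array layout you pick, turning it into one object per skill is a short client-side step. The sketch below is in Python and assumes the driver hands Postgres arrays back as (nested) lists, which psycopg2 does. The dictionary keys are simply the column names from the query. The multivalues variant relies on both arrays listing the skills in the same order. Adding the same ORDER BY inside both array_agg calls makes that explicit.

```python
def skills_from_pairs(pairs):
    """Option 1 layout [[name, id_text], ...] -> one dict per skill, id as int."""
    return [{"skill_name": name, "projects_skills_id": int(sid)} for name, sid in pairs]


def skills_from_columns(names, ids):
    """Option 2 ([names, ids] rows) or multivalues (two parallel arrays)."""
    if len(names) != len(ids):
        raise ValueError("names and ids must have the same length")
    return skills_from_pairs(zip(names, ids))


option1 = [["Python", "3"], ["Node", "4"], ["Javascript", "5"]]
option2 = [["Python", "Node", "Javascript"], ["3", "4", "5"]]
multivalues = (["Python", "Node", "Javascript"], [3, 4, 5])

print(skills_from_pairs(option1))
print(skills_from_columns(*option2))
print(skills_from_columns(*multivalues))
```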

INNER JOIN Results from Select Statement using Doctrine QueryBuilder

Can you use Doctrine QueryBuilder to INNER JOIN a temporary table from a full SELECT statement that includes a GROUP BY?
The ultimate goal is to select the best version of a record. I have a viewVersion table that has multiple versions with the same viewId value but different timeMod. I want to find the version with the latest timeMod (and do a lot of other complex joins and filters on the query).
Initially people assume you can do a GROUP BY viewId and then ORDER BY timeMod. However, ORDER BY is applied after grouping and does not control which row MySQL uses for each group, so the non-aggregated columns come from an indeterminate row. There are a ton of answers out there (e.g. here) that explain the problem with using GROUP BY and offer a solution, but I am having trouble interpreting the Doctrine docs to find a way to implement the SQL with Doctrine QueryBuilder (if it's even possible). Why don't I just use DQL? I may have to, but I have a lot of dynamic filters and joins that are much easier to do with QueryBuilder, so I wanted to see if that's possible.
Sample MySQL to Reproduce in Doctrine QueryBuilder
SELECT vv.*
FROM view_version vv
#inner join only returns where the result sets overlap, i.e. one record
INNER JOIN (
SELECT MAX(timeMod) maxTimeMod, viewId
FROM view_version
GROUP BY viewId
) version ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
#join other tables for filter, etc
INNER JOIN view v ON v.id = vv.viewId
INNER JOIN content_type c ON c.id = v.contentTypeId
WHERE vv.siteId=1
AND v.contentTypeId IN (2)
ORDER BY vv.title ASC;
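The sketch below runs the core of this query against a few invented rows in SQLite, via Python's sqlite3, to show that the derived-table join keeps exactly the latest-timeMod row for each viewId. The view and content_type joins are left out. Only view_version and its column names come from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE view_version (id INTEGER PRIMARY KEY, viewId INTEGER, siteId INTEGER,
                               timeMod INTEGER, title TEXT);
    -- invented rows: two views with several versions each
    INSERT INTO view_version (viewId, siteId, timeMod, title) VALUES
        (1, 1, 100, 'About v1'), (1, 1, 200, 'About v2'),
        (2, 1, 150, 'Contact v1'), (2, 1, 300, 'Contact v3'), (2, 1, 250, 'Contact v2');
    """
)

latest = con.execute(
    """
    SELECT vv.viewId, vv.title
    FROM view_version vv
    INNER JOIN (
        SELECT MAX(timeMod) AS maxTimeMod, viewId   -- latest timeMod per view
        FROM view_version
        GROUP BY viewId
    ) version ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
    WHERE vv.siteId = 1
    ORDER BY vv.title ASC
    """
).fetchall()
print(latest)  # [(1, 'About v2'), (2, 'Contact v3')]
```

If two versions of the same view share the maximum timeMod, both are returned, so a tie-breaker (for example on id) may be needed.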
Theoretical Solution via Query Builder (not working)
I am thinking that the JOIN needs to inject a DQL statement, e.g.
$em = $this->getDoctrine()->getManager();
$viewVersionRepo = $em->getRepository('GutensiteCmsBundle:View\ViewVersion');
$queryMax = $viewVersionRepo->createQueryBuilder()
->addSelect('MAX(timeMod) AS timeModMax')
->addSelect('viewId')
->groupBy('viewId');
$queryBuilder = $viewVersionRepo->createQueryBuilder('vv')
// I tried putting the query in a parenthesis, to no avail
->join('('.$queryMax->getDQL().')', 'version', 'WITH', 'vv.viewId = version.viewId AND vv.timeMod = version.timeModMax')
// Join other Entities
->join('e.view', 'view')
->addSelect('view')
->join('view.contentType', 'contentType')
->addSelect('contentType')
// Perform random filters
->andWhere('vv.siteId = :siteId')->setParameter('siteId', 1)
->andWhere('view.contentTypeId IN(:contentTypeId)')->setParameter('contentTypeId', $contentTypeIds)
->addOrderBy('e.title', 'ASC');
$query = $queryBuilder->getQuery();
$results = $query->getResult();
My code (which may not match the above example perfectly) outputs:
SELECT e, view, contentType
FROM Gutensite\CmsBundle\Entity\View\ViewVersion e
INNER JOIN (
SELECT MAX(v.timeMod) AS timeModMax, v.viewId
FROM Gutensite\CmsBundle\Entity\View\ViewVersion v
GROUP BY v.viewId
) version WITH vv.viewId = version.viewId AND vv.timeMod = version.timeModMax
INNER JOIN e.view view
INNER JOIN view.contentType contentType
WHERE e.siteId = :siteId
AND view.contentTypeId IN (:contentTypeId)
ORDER BY e.title ASC
This Answer seems to indicate that it's possible in other contexts like IN statements, but when I try the above method in the JOIN, I get the error:
[Semantical Error] line 0, col 90 near '(SELECT MAX(v.timeMod)': Error: Class '(' is not defined.
A big thanks to @AdrienCarniero for his alternative query structure. It selects the highest version with a simple JOIN on rows where the entity's timeMod is less than the joined table's timeMod.
Alternative Query
SELECT view_version.*
FROM view_version
#left join any newer version of the same view (none found = best version)
LEFT JOIN view_version AS best_version ON best_version.viewId = view_version.viewId AND best_version.timeMod > view_version.timeMod
#join other tables for filter, etc
INNER JOIN view ON view.id = view_version.viewId
INNER JOIN content_type ON content_type.id = view.contentTypeId
WHERE view_version.siteId=1
# LIMIT to the best version (no newer version was joined)
AND best_version.timeMod IS NULL
AND view.contentTypeId IN (2)
ORDER BY view_version.title ASC;
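This alternative is a self anti-join: a version survives only when the LEFT JOIN finds no newer version of the same view. The sketch below uses SQLite via Python's sqlite3 with invented rows, and omits the view and content_type filters. It checks that the anti-join returns the same rows as the derived-table query from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE view_version (id INTEGER PRIMARY KEY, viewId INTEGER, siteId INTEGER,
                               timeMod INTEGER, title TEXT);
    INSERT INTO view_version (viewId, siteId, timeMod, title) VALUES
        (1, 1, 100, 'About v1'), (1, 1, 200, 'About v2'),
        (2, 1, 150, 'Contact v1'), (2, 1, 300, 'Contact v3'), (2, 1, 250, 'Contact v2');
    """
)

# Anti-join: keep a version only if no strictly newer version of the same view exists
anti_join = con.execute(
    """
    SELECT view_version.viewId, view_version.title
    FROM view_version
    LEFT JOIN view_version AS best_version
           ON best_version.viewId = view_version.viewId
          AND best_version.timeMod > view_version.timeMod
    WHERE view_version.siteId = 1
      AND best_version.timeMod IS NULL
    ORDER BY view_version.title ASC
    """
).fetchall()

# Derived-table version from the question, for comparison
derived = con.execute(
    """
    SELECT vv.viewId, vv.title
    FROM view_version vv
    INNER JOIN (SELECT MAX(timeMod) AS maxTimeMod, viewId
                FROM view_version GROUP BY viewId) version
            ON version.viewId = vv.viewId AND vv.timeMod = version.maxTimeMod
    WHERE vv.siteId = 1
    ORDER BY vv.title ASC
    """
).fetchall()

print(anti_join)  # [(1, 'About v2'), (2, 'Contact v3')]
```

Like the derived-table version, the anti-join keeps every tied row when two versions share the latest timeMod, because neither one is strictly newer.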
Using Doctrine QueryBuilder
$em = $this->getDoctrine()->getManager();
$viewVersionRepo = $em->getRepository('GutensiteCmsBundle:View\ViewVersion');
// the root alias is 'vv', so every reference to the root entity below uses vv
$queryBuilder = $viewVersionRepo->createQueryBuilder('vv')
    // Join Best Version: any newer version of the same view
    ->leftJoin('GutensiteCmsBundle:View\ViewVersion', 'bestVersion', 'WITH', 'bestVersion.viewId = vv.viewId AND bestVersion.timeMod > vv.timeMod')
    // Join other Entities
    ->join('vv.view', 'view')
    ->addSelect('view')
    ->join('view.contentType', 'contentType')
    ->addSelect('contentType')
    // Perform random filters
    ->andWhere('vv.siteId = :siteId')->setParameter('siteId', 1)
    // LIMIT Joined Best Version: keep rows for which no newer version exists
    ->andWhere('bestVersion.timeMod IS NULL')
    ->andWhere('view.contentTypeId IN(:contentTypeId)')->setParameter('contentTypeId', $contentTypeIds) // $contentTypeIds: array of ids defined elsewhere
    ->addOrderBy('vv.title', 'ASC');
$query = $queryBuilder->getQuery();
$results = $query->getResult();
In terms of performance, it really depends on the dataset. See this discussion for details.
TIP: The table should include indexes on both these values (viewId and timeMod) to speed up results. I don't know if it would also benefit from a single index on both fields.
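One way to check whether an index actually serves the self-join is to look at the query plan. The sketch below does this in SQLite, via Python's sqlite3, with a single composite index on (viewId, timeMod). The index name is made up, and MySQL's EXPLAIN output looks different. Treat this as an illustration of the idea rather than a MySQL tuning result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE view_version (id INTEGER PRIMARY KEY, viewId INTEGER,
                               timeMod INTEGER, title TEXT);
    -- hypothetical index name; one composite index on both join columns
    CREATE INDEX idx_view_version_view_time ON view_version (viewId, timeMod);
    """
)

plan = con.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT vv.id, vv.title
    FROM view_version vv
    LEFT JOIN view_version best
           ON best.viewId = vv.viewId AND best.timeMod > vv.timeMod
    WHERE best.timeMod IS NULL
    """
).fetchall()

# the last column of each row is the human-readable description of a plan step
details = [row[-1] for row in plan]
for detail in details:
    print(detail)
```

The lookup of newer versions (viewId equal, timeMod greater) maps directly onto the composite index. That is why one index on both columns, in that order, fits this join.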
A native SQL query using the original JOIN method may be better in some cases. However, building that query dynamically across a lot of code and getting the result mappings right is a pain. So this is at least an alternative solution that I hope helps others.

Complex Sql Query to RoR ORM

I have a very complex sql query that I would like to convert to RoR's ORM.
SELECT c.* FROM (SELECT companies.* FROM companies WHERE city = ? AND country = ?) AS c
INNER JOIN tagsForCompany AS tc ON c.id = tc.Company
INNER JOIN tags AS t ON t.id = tc.TID
WHERE t.Name REGEXP ?
I have defined the models like this:
companies.rb
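class Company < ActiveRecord::Base
  # ... Some code that doesn't matter
  # the join table and its key columns are not the Rails defaults, so name them
  has_and_belongs_to_many :tags,
    join_table: 'tagsForCompany', foreign_key: 'Company', association_foreign_key: 'TID'
  # ... Some other code
end
and tags.rb
class Tag < ActiveRecord::Base
  # has_and_belongs_to_many expects the plural association name
  has_and_belongs_to_many :companies,
    join_table: 'tagsForCompany', foreign_key: 'TID', association_foreign_key: 'Company'
end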
I need a function in the companies controller that searches for the companies like the query above.
Options:
First: find_by_sql()
Description: lets you run any SQL query you want.
http://apidock.com/rails/ActiveRecord/Base/find_by_sql/class
Second: a combination of the .where() and .joins() methods.
Note that .where() returns an ActiveRecord::Relation, not nil, even when no rows match, so chaining .joins() after it is safe. The query only runs when the relation is loaded.
Possible ways to use joins:
joins(:tags) creates an INNER JOIN
joins('LEFT JOIN foo...') lets you use LEFT OUTER JOINs
joins(tagsForCompanies: :tags) nested joins, if you have many-to-many associations.
See the API docs:
http://api.rubyonrails.org/classes/ActiveRecord/QueryMethods.html#method-i-where
http://guides.rubyonrails.org/active_record_querying.html#joining-tables
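Whichever option you choose, the placeholders should be real bind parameters, because a quoted "?" is a string literal, not a placeholder. The sketch below runs the query from the question in SQLite via Python's sqlite3 with bound values. SQLite has no built-in REGEXP, so the sketch registers one with create_function. DISTINCT is added so that a company matching several tags is listed once. The table contents are invented.

```python
import re
import sqlite3


def regexp(pattern, value):
    """SQLite evaluates 'X REGEXP Y' as regexp(Y, X): pattern first, then the value."""
    return value is not None and re.search(pattern, value) is not None


def search_companies(con, city, country, pattern):
    return con.execute(
        """
        SELECT DISTINCT c.*
        FROM (SELECT companies.* FROM companies WHERE city = ? AND country = ?) AS c
        INNER JOIN tagsForCompany AS tc ON c.id = tc.Company
        INNER JOIN tags AS t ON t.id = tc.TID
        WHERE t.Name REGEXP ?
        ORDER BY c.id
        """,
        (city, country, pattern),
    ).fetchall()


con = sqlite3.connect(":memory:")
con.create_function("REGEXP", 2, regexp)
con.executescript(
    """
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, city TEXT, country TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tagsForCompany (Company INTEGER, TID INTEGER);
    INSERT INTO companies VALUES (1, 'Acme', 'Berlin', 'DE'), (2, 'Globex', 'Berlin', 'DE'),
                                 (3, 'Initech', 'Paris', 'FR');
    INSERT INTO tags VALUES (1, 'ruby'), (2, 'rails'), (3, 'php');
    INSERT INTO tagsForCompany VALUES (1, 1), (1, 2), (2, 3), (3, 1);
    """
)
print(search_companies(con, "Berlin", "DE", "^r"))  # [(1, 'Acme', 'Berlin', 'DE')]
```

Assuming the associations are configured for the tagsForCompany table, the ActiveRecord counterpart would be roughly Company.joins(:tags).where(city: city, country: country).where('tags.Name REGEXP ?', pattern).distinct. The exact SQL Rails generates may differ.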

Per-row dynamic sql

I have a database representing something like a bookstore. There's a table containing the categories that books can be in. Some categories are defined simply using another table that contains the category-item relationships. But there are also some categories that can be defined programmatically -- a category for a specific author can be defined using a query (SELECT item_id FROM items WHERE author = "John Smith"). So my categories table has a "query" column; if it's not null, I use this to get the items in the category, otherwise I use the category_items table.
Currently, I have the application (PHP code) make this decision, but this means lots of separate queries when we iterate over all the categories. Is there some way to incorporate this dynamic SQL into a join? Something like:
SELECT c.category, IF(c.query IS NULL, count(i.items), count(EXECUTE c.query))
FROM categories c
LEFT OUTER JOIN category_items i
ON c.category = i.category
EXECUTE requires a prepared statement, but I would need to prepare a different statement for each row. Also, EXECUTE can't be used in expressions; it's only a top-level statement. Suggestions?
What happens when you want to list books by publisher? Country? Language? You'd have to throw them all into a single "category_items" table. How would you pick which dynamic query to execute? The query-within-a-query method is not going to work.
I think your concept of "category" is too broad, which results in overly complicated SQL. I would narrow "category" to mean only "genre" (for books). Genres are defined in their own table, and item_genres connects them to the items table. Books-by-author and books-by-genre should just be separate queries at the application level, rather than trying to do them both with the same (sort of) query at the database/SQL level. (If you have music as well as books, they probably shouldn't all be stored in a single "items" table because they're different concepts: they have different genres, author vs. artist, etc.)
I know this does not really solve your problem in the way you'd like, but I think you'll be happier not trying to do it that way.
Here's how I finally ended up solving this in the PHP client.
I decided to just keep the membership in the category_items table, and use the dynamic queries during submission to update this table.
This is the function in my script that's called to update an item's categories during submission or updating. It takes a list of user-selected categories (which can only be chosen from categories that don't have dynamic queries), and using this and the dynamic queries it figures out the difference between the categories that an item is currently in and the ones it should be in, and inserts/deletes as necessary to get them in sync. (Note that the actual table names in my DB are not the same as in my question, I was using somewhat generic terms.)
function update_item_categories($dbh, $id, $requested_cats) {
    $id = (int) $id; // $id is interpolated into SQL below, so force it to an integer
    // Categories whose membership is defined by a stored query
    $data = mysql_check($dbh, mysqli_query($dbh, "select id, query from t_ld_categories where query is not null"), 'getting dynamic categories');
    $clauses = array();
    while ($row = mysqli_fetch_object($data))
        $clauses[] = sprintf('select %d cat_id, (%d in (%s)) should_be_in',
                             $row->id, $id, $row->query);
    // Hand-picked categories: membership is whatever the user selected
    if (!$requested_cats) $requested_cats[] = -1; // Dummy entry that never matches cat_id
    $requested_cat_string = implode(', ', array_map('intval', $requested_cats)); // ints only, so request data can't inject SQL
    $clauses[] = "select c.id cat_id, (c.id in ($requested_cat_string)) should_be_in
                  from t_ld_categories c
                  where member_type = 'lessons' and query is null";
    $subquery = implode("\nunion all\n", $clauses);
    // Compare desired membership (should_be_in) with current membership (is_in)
    $query = "select c.cat_id cat_id, should_be_in, (member_id is not null) is_in
              from ($subquery) c
              left outer join t_ld_cat_members m
              on c.cat_id = m.cat_id
              and m.member_id = $id";
    // printf("<pre>$query</pre>");
    $data = mysql_check($dbh, mysqli_query($dbh, $query), 'getting current category membership');
    $adds = array();
    $deletes = array();
    while ($row = mysqli_fetch_object($data)) {
        if ($row->should_be_in && !$row->is_in) $adds[] = "({$row->cat_id}, $id)";
        elseif (!$row->should_be_in && $row->is_in) $deletes[] = "(cat_id = {$row->cat_id} and member_id = $id)";
    }
    if ($deletes) {
        $delete_string = implode(' or ', $deletes);
        mysql_check($dbh, mysqli_query($dbh, "delete from t_ld_cat_members where $delete_string"), 'deleting old categories');
    }
    if ($adds) {
        $add_string = implode(', ', $adds);
        mysql_check($dbh, mysqli_query($dbh, "insert into t_ld_cat_members (cat_id, member_id) values $add_string"),
                    "adding new categories");
    }
}
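The same sync logic can be sketched in Python's sqlite3, binding the IDs as parameters instead of interpolating them into the string. The schema is a simplified, hypothetical version: categories(id, query), cat_members(cat_id, member_id) and items(item_id, author). The member_type filter is dropped. The stored category queries are still spliced in as SQL text, exactly as in the PHP, so they must come from a trusted source.

```python
import sqlite3


def update_item_categories(con, item_id, requested_cats):
    """Sync one item's category memberships; returns (added, deleted) category ids."""
    clauses, params = [], []
    # Dynamic categories: membership is decided by the stored (trusted) query text
    for cat_id, query in con.execute(
            "SELECT id, query FROM categories WHERE query IS NOT NULL").fetchall():
        clauses.append(f"SELECT ? AS cat_id, (? IN ({query})) AS should_be_in")
        params += [cat_id, item_id]
    # Hand-picked categories: membership is whatever the user selected
    requested = list(requested_cats) or [-1]  # dummy id that never matches, as in the PHP
    marks = ", ".join("?" for _ in requested)
    clauses.append(f"SELECT c.id AS cat_id, (c.id IN ({marks})) AS should_be_in "
                   "FROM categories c WHERE c.query IS NULL")
    params += requested
    subquery = " UNION ALL ".join(clauses)
    # Compare desired membership (should_be_in) with current membership (is_in)
    rows = con.execute(
        f"SELECT s.cat_id, s.should_be_in, (m.member_id IS NOT NULL) AS is_in "
        f"FROM ({subquery}) AS s "
        "LEFT OUTER JOIN cat_members m ON s.cat_id = m.cat_id AND m.member_id = ?",
        params + [item_id],
    ).fetchall()
    adds = sorted(cat for cat, should, is_in in rows if should and not is_in)
    deletes = sorted(cat for cat, should, is_in in rows if not should and is_in)
    con.executemany("DELETE FROM cat_members WHERE cat_id = ? AND member_id = ?",
                    [(cat, item_id) for cat in deletes])
    con.executemany("INSERT INTO cat_members (cat_id, member_id) VALUES (?, ?)",
                    [(cat, item_id) for cat in adds])
    con.commit()
    return adds, deletes


con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, query TEXT);
    CREATE TABLE cat_members (cat_id INTEGER, member_id INTEGER);
    INSERT INTO items VALUES (1, 'John Smith'), (2, 'Jane Doe');
    INSERT INTO categories VALUES (10, NULL), (11, NULL);
    INSERT INTO cat_members VALUES (11, 1);  -- item 1 is currently in category 11
    """
)
# hypothetical dynamic category: every item by John Smith
con.execute("INSERT INTO categories VALUES (?, ?)",
            (12, "SELECT item_id FROM items WHERE author = 'John Smith'"))
con.commit()

print(update_item_categories(con, 1, [10]))  # ([10, 12], [11])
```

Running it a second time with the same selection changes nothing. This is the point of computing the difference rather than deleting and re-inserting everything.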