Best way in Doctrine to load only entities which have attached entities - mysql

I have an entity, let's call it Foo and a second one Bar
Foo can (but doesn't have to) have one or multiple Bar entries assigned. It looks something like this:
/**
* #ORM\OneToMany(targetEntity="Bar", mappedBy="foo")
* #ORM\OrderBy({"name" = "ASC"})
*/
private $bars;
I now would like to load in one case only Foo entities that have at least one Bar entity assigned. Previously, there was one foreach loop to traverse all Foo entries and if it had assigned entries, the Foo entry got assigned to an array.
My current implementation is in the FooRepository a function called findIfTheresBar which looks like this:
$qb = $this->_em->createQueryBuilder()
->select('e')
->from($this->_entityName, 'e')
/* some where stuff here */
->addOrderBy('e.name', 'ASC')
->join('e.bars', 'b')
->groupBy('e.id');
Is this the correct way to load such entries? Is there a better (faster) way? It kind of feels as if it should have a having(...) in the query.
EDIT:
I've investigated it a little further. The query should return 373 out of 437 entries.
Version 1: only using join(), this loaded 373 entries in 7.88ms
Version 2: using join() and having(), this loaded 373 entries in 8.91ms
Version 3: only using leftJoin(), this loaded all 437 entries (which isn't desired) in 8.05ms
Version 4: using leftJoin() and having(), this loaded 373 entries in 8.14ms
Since Version 1 which only uses an innerJoin as #Chausser pointed out, is the fastest, I will stick to that one.
Note: I'm not saying Version 1 will be the fastest in all scenarios and on every hardware, so kind of a follow up question, does anybody know about a performance comparison?

Please take a look at this answer for more information on how SQL JOINs work: https://stackoverflow.com/a/16598900/1307183
Using a join, which is an alias of innerJoin, is exactly what you want. This only returns records where entries exist in both Foo and Bar - aka where the association/attached entity exists. This calls INNER JOIN in SQL, which, if your database structure is defined correctly, is the absolute best and fastest way to get the data you want.
Using a leftJoin calls LEFT JOIN in SQL, which returns all records from Foo, even if there is no Bar associated with it (for example, where bar_id in your foo table would be null).
You have no reason to use having() in any of the above scenarios you described. If you want to filter further you would do that with a ->addWhere() function. Using the having() clause is something you would only want to do if you were selecting aggregate data in your original query (like SELECT SUM(field) AS sum_field).

Related

TypeORM: how to load relations with CreateQueryBuilder, without using JOINs?

I'm developing an API using NestJS & TypeORM to fetch data from a MySQL DB. Currently I'm trying to get all the instances of an entity (HearingTonalTestPage) and all the related entities (e.g. Frequency). I can get it using createQueryBuilder:
const queryBuilder = await this.hearingTonalTestPageRepo
.createQueryBuilder('hearing_tonal_test_page')
.innerJoinAndSelect('hearing_tonal_test_page.hearingTest', 'hearingTest')
.innerJoinAndSelect('hearingTest.page', 'page')
.innerJoinAndSelect('hearing_tonal_test_page.frequencies', 'frequencies')
.innerJoinAndSelect('frequencies.frequency', 'frequency')
.where(whereConditions)
.orderBy(`page.${orderBy}`, StringToSortType(pageFilterDto.ascending));
The problem here is that this will produce a SQL query (screenshot below) which will output a line per each related entity (Frequency), when I want to output a line per each HearingTonalTestPage (in the screenshot example, 3 rows instead of 12) without losing its relations data. Reading the docs, apparently this can be easily achieved using the relations option with .find(). With QueryBuilder I see some relation methods, but from I've read, under the hood it will produce JOINs, which of course I want to avoid.
So the 1 million $ question here is: is it possible with CreateQueryBuilder to load the relations after querying the main entities (something similar to .find({ relations: { } }) )? If yes, how can I achieve it?
I am not an expert, but I had a similar case and using:
const qb = this.createQueryBuilder("product");
// apply relations
FindOptionsUtils.applyRelationsRecursively(qb, ["createdBy", "updatedBy"], qb.alias, this.metadata, "");
return qb
.orderBy("product.id", "DESC")
.limit(1)
.getOne();
it worked for me, all relations are correctly loaded.
ref: https://github.com/typeorm/typeorm/blob/master/src/find-options/FindOptionsUtils.ts
You say that you want to avoid JOINs, and are seeking an analogue of find({relations: {}}), but, as the documentation says, find({relations: {}}) uses under the hood, expectedly, LEFT JOINs. So when we talk about query with relations, it can't be without JOIN's.
Now about the problem:
The problem here is that this will produce a SQL query (screenshot
below) which will output a line per each related entity (Frequency),
when I want to output a line per each HearingTonalTestPage
Your query looks fine. And the result of the query, also, is ok. I think that you expected to have as a result of the query something similar to json structure(when the relation field contains all the information inside itself instead of creating new rows and spread all its values on several rows). But that is how the SQL works. By the way, getMany() method should return 3 HearingTonalTestPage objects, not 12, so what the SQL query returns should not worry you.
The main question:
is it possible with CreateQueryBuilder to load the relations after
querying the main entities
I did't get what do you mean by saying "after querying the main entities". Can you provide more context?

Why is my query wrong?

before i use alias for table i get the error:
: Integrity constraint violation: 1052 Column 'id' in field list is ambiguous
Then i used aliases and i get this error:
unknown index a
I am trying to get a list of category name ( dependant to a translation) and the associated category id which is unique. Since i need to put them in a select, i see that i should use the lists.
$categorie= DB::table('cat as a')
->join('campo_cat as c','c.id_cat','=','a.id')
->join('campo as d','d.id','=','c.id_campo')
->join('cat_nome as nome','nome.id_cat','=','a.id')
->join('lingua','nome.id_lingua','=','lingua.id')
->where('lingua.lingua','=','it-IT')
->groupby('nome.nome')
->lists('nome.nome','a.id');
The best way to debug your query is to look at the raw query Laravel generates and trying to run this raw query in your favorite SQL tool (Navicat, MySQL cli tool...), so you can dump it to log using:
DB::listen(function($sql, $bindings, $time) {
Log::info($sql);
Log::info($bindings);
});
Doing that with yours I could see at least one problem:
->where('lingua.lingua','=','it-IT')
Must be changed to
->where('lingua.lingua','=',"'it-IT'")
As #jmail said, you didn't really describe the problem very well, just what you ended up doing to get around (part of) it. However, if I read your question right you're saying that originally you did it without all the aliases you got the 'ambiguous' error.
So let me explain that first: this would happen, because there are many parts of that query that use id rather than a qualified table`.`id.
if you think about it, without aliases you query looks a bit like this: SELECT * FROM `cat` JOIN `campo_cat` ON `id_cat` = `id` JOIN `campo` ON `id` = `id_campo`; and suddenly, MySQL doesn't know to which table all these id columns refer. So to get around that all you need to do is namespace your fields (i.e. use ... JOIN `campo` ON `campo`.`id` = `campo_cat`.`id_campo`...). In your case you've gone one step further and aliased your tables. This certianly makes the query a little simpler, though you don't need to actually do it.
So on to your next issue - this will be a Laravel error. And presumably happening because your key column from lists($valueColumn, $keyColumn) isn't found in the results. This is because you're referring to the cat.id column (okay in your aliased case a.id) in part of the code that's no longer in MySQL - the lists() method is actually run in PHP after Laravel gets the results from the database. As such, there's no such column called a.id. It's likely it'll be called id, but because you don't request it specifically, you may find that the ambiguous issue is back. My suggestion would be to select it specifically and alias the column. Try something like the below:
$categories = DB::table('cat as a')
->join('campo_cat as c','c.id_cat','=','a.id')
->join('campo as d','d.id','=','c.id_campo')
->join('cat_nome as nome','nome.id_cat','=','a.id')
->join('lingua','nome.id_lingua','=','lingua.id')
->where('lingua.lingua','=','it-IT')
->groupby('nome.nome')
->select('nome.nome as nome_nome','a.id as a_id') // here we alias `.id as a_id
->lists('nome_nome','a_id'); // here we refer to the actual columns
It may not work perfectly (I don't use ->select() so don't know whether you pass an array or multiple parameters, also you may need DB::raw() wrapping each one in order to do the aliasing) but hopefully you get my meaning and can get it working.

Why is Rails is adding `OR 1=0` to queries using the where clause hash syntax with a range?

The project that I'm working on is using MySQL on RDS (mysql2 gem specifically).
When I use a hash of conditions including a range in a where statement I'm getting a bit of an odd addition to my query.
User.where(id: [1..5])
and
User.where(id: [1...5])
Result in the following queries respectively:
SELECT `users`.* FROM `users` WHERE ((`users`.`id` BETWEEN 1 AND 5 OR 1=0))
SELECT `users`.* FROM `users` WHERE ((`users`.`id` >= 1 AND `users`.`id` < 5 OR 1=0))
The queries work perfectly fine since OR FALSE is effectively a no-op. I'm just wondering why Rails or ARel is adding this snippet into the query.
EDIT
It looks like the line that could explain this is line 26 in ActiveRecord::PredicateBuilder. Still no idea how the hash could be empty? at that point but maybe someone else does.
EDIT 2
This is intersting. I was looking into Filip's comment to see why he made it since it seems just like a clarification but he is correct that 1..5 != [1..5]. The former is an inclusive range from 1 to 5 where as the latter is an array whose first element is the former. I tried putting these into an ARel where call to see the SQL produced and the OR 1=0 is not there!
User.where(id: 1..5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" BETWEEN 1 AND 5)
User.where(id: 1...5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" >= 1 AND "users"."id" < 5)
While I still do not know why ARel is adding the OR 1=0 which will always be false and seemingly unnecessary. It may be due to how Arrays and Ranges are handled differently.
Building on the fact, which you've discovered, that [1..5] is not the correct way to specify the range... I have discovered why [1..5] behaves as it does. To get there, I first found that an empty array in a hash condition produces the 1=0 SQL condition:
User.where(id: []).to_sql
# => "SELECT \"users\".* FROM \"users\" WHERE 1=0"
And, if you check the ActiveRecord::PredicateBuilder::ArrayHandler code, you'll see that array values are always partitioned into ranges and other values.
ranges, values = values.partition { |v| v.is_a?(Range) }
This explains why you don't see the 1=0 when using non-range values. That is, the only way to get 1=0 from an array without including a range is to supply an empty array, which yields the 1=0 condition, as shown above. And when all the array has in it is a range you're going to get the range conditions (ranges) and, separately, an empty array condition (values) executed. My guess is that there isn't a good reason for this... it just simply is easier to let this be than to avoid it (since the result set is equivalent either way). If the partition code was a bit smarter then it wouldn't have to tack on the additional, empty values array and could skip the 1=0 condition.
As for where the 1=0 comes from in the first place... I think that comes from the database adapter, but I couldn't find exactly where. However, I would call it an attempt to fail to find a record. In other words, WHERE 1=0 isn't ever going to return any users, which makes sense over alternative SQL like WHERE id=null which will find any users whose id is null (realizing that this isn't really correct SQL syntax). And this is what I'd expect when attempting to find all Users whose id is in the empty set (i.e. we're not asking for nil ids or null ids or whatever). So, in my mind, leaving the bit about exactly where 1=0 comes from as a black box is OK. At least we now can reason about why the range inside of the array is causing it to show up!
UPDATE
I've also found that, even when using ARel directly, you can still get 1=0:
User.arel_table[:id].in([]).to_sql
# => "1=0"
This is strictly speaking a guess, since I did something similar in a project of my own (although I used AND 1).
For whatever reason, when generating a query, it is easier to always have a WHERE clause containing a no-op than it is to conditionally generate the WHERE clause at all. That is, if you don't include any where sections it will end up generating something still valid.
On the other hand, I'm not sure why it's taking this form: when I did it I use 1 [<AND (generated code)>...] it allowed arbitrary chaining, but I don't see how what you're seeing would allow it. None the less, I still think it likely to be a result of an algorithmic code generation scheme.
Check to see if you are using active_record-acts_as. That was the problem with me.
Add the line below to your Gemfile:
gem 'active_record-acts_as', :git => 'https://github.com/hzamani/active_record-acts_as.git'
This will just pull the latest version of the Gem that will hopefully be fixed. Worked for me.
I think you're seeing side effects of ruby personally.
I think the better way to do what you're doing would be with
2.0.0-p481#meri :008 > [*1..5]
=> [1, 2, 3, 4, 5]
User.where(id: [*1..5]).to_sql
"SELECT `users`.* FROM `users` WHERE `users`.`id` IN (1, 2, 3, 4, 5)"
As this creates an Array vs an Array with element 1 of class Range.
OR
use an explicit Range to trigger the BETWEEN in AREL.
# with end element, i.e. exclude_end=false
2.0.0-p481#meri :013 > User.where(id: Range.new(1,5)).to_sql
=> "SELECT `users`.* FROM `users` WHERE (`users`.`id` BETWEEN 1 AND 5)"
# without end element, i.e. exclude_end=true
2.0.0-p481#meri :022 > User.where(id: Range.new(1, 5, true)).to_sql
=> "SELECT `users`.* FROM `users` WHERE (`users`.`id` >= 1 AND `users`.`id` < 5)"
If you care about having control of the queries you generate and the full power of the SQL language and database features then I would suggest moving from ActiveRecord/Arel to Sequel.
I can honestly say there are a lot more quirks and infuriating times ahead for you with ActiveRecord, especially when you move beyond simple crud like queries. When you start trying to query your data in anger, perhaps needing to join a few join tables here and there and realize you really do need join conditions or union all type queries.
It is also significantly faster and more reliable in its query generation and result handling and much easier to compose the queries you want. It also has real documentation you can actually read unlike arel.
I just wish I had discovered it much earlier rather than persisting with the rails default data access layer.

SQL select everything with arbitrary IN clause

This will sound silly, but trust me it is for a good (i.e. over-engineered) cause.
Is it possible to write a SQL query using an IN clause which selects everything in that table without knowing anything about the table? Keep in mind this would mean you can't use a subquery that references the table.
In other words I would like to find a statement to replace "SOMETHING" in the following query:
SELECT * FROM table_a WHERE table_a.id IN (SOMETHING)
so that the results are identical to:
SELECT * FROM table_a
by doing nothing beyond changing the value of "SOMETHING"
To satisfy the curious I'll share the reason for the question.
1) I have a FactoryObject abstract class which grants all models that extend it some glorious factory method magic using two template methods: getData() and load()
2) Models must implement the template methods. getData is a static method that accepts ID constraints, pulls rows from the database, and returns a set of associative arrays. load is not static, accepts an associative array, and populates the object based on that array.
3) The non-abstract part of FactoryObject implements a getObject() and a getObjects() method. These call getData, create objects, and loads() the array responses from getData to create and return populated objects.
getObjects() requires ID constraints as an input, either in the form of a list or in the form of a subquery, which are then passed to getData(). I wanted to make it possible to pass in no ID constraints to get all objects.
The problem is that only the models know about their tables. getObjects() is implemented at a higher level and so it doesn't know what to pass getData(), unless there was a universal "return everything" clause for IN.
There are other solutions. I can modify the API to require getData to accept a special parameter and return everything, or I can implement a static getAll[ModelName]s() method at the model level which calls:
static function getAllModelObjects() {
return getObjects("select [model].id from [model]");
}
This is reasonable and may fit the architecture anyway, but I was curious so I thought I would ask!
Works on SQL Server:
SELECT * FROM table_a WHERE table_a.id IN (table_a.id)
Okay, I hate saying no so I had to come up with another solution for you.
Since mysql is opensource you can get the source and incorporate a new feature that understands the infinity symbol. Then you just need to get the mysql community to buy into the usefulness of this feature (steer the conversation away from security as much as possible in your attempts to do so), and then get your company to upgrade their dbms to the new version once this feature has been implemented.
Problem solved.
The answer is simple. The workaround is to add some criteria like these:
# to query on a number column
AND (-1 in (-1) OR sample_table.sample_column in (-1))
# or to query on a string column
AND ('%' in ('%') OR sample_table.sample_column in ('%'))
Therefore, in your example, two following queries should return the same result as soon as you pass -1 as the parameter value.
SELECT * FROM table_a;
SELECT * FROM table_a WHERE (-1 in (-1) OR table_a.id in (-1));
And whenever you want to filter something out, you can pass it as a parameter. For example, in the following query, the records with id of 1, 2 and 6 are filtered.
SELECT * FROM table_a WHERE (-1 in (1, 2, 6) OR table_a.id in (1, 2, 6));
In this case, we have a default value like -1 or % and we have a parameter that can be anything. If the parameter is the default value, nothing is filtered.
I suggest % character as the default value if you are querying over a text column or -1 if you are querying over the PK of the table. But it totally depends to you to substitute % or -1 with any reserved character or number that you decide on.
similiar to #brandonmoore:
select * from table_a where table_a.id not in ('0')
How about:
select * from table_a where table_a.id not ine ('somevaluethatwouldneverpossiblyexistintable_a.id')
EDIT:
As much as I would like to continue thinking of a way to solve your problem, I know there isn't a way to so I figure I'll go ahead and be the first person to tell you so I can at least get credit for the answer. It's truly a bittersweet victory though :/
If you provide more info though maybe I or someone else can help you think of another workaround.

Union (or Concat, etc..) with Constant values and projection

I've discovered a very nasty gotcha with Linq-to-sql, and i'm not sure what the best solution is.
If you take a simple L2S Union statement, and include L2S code in one side, and constants in the other, then the constants do not get included in the SQL Union and are only projected into the output after the SQL, resulting in SQL errors about the number of columns not mathching for the union.
As an example:
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
This will generate an error "All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
This is particularly insidious (and maddening) because there are obviously the same number of expressions in the list, but when you look at the SQL, you will notice that it does not generate a column for "First" in the second half of the Union. This is because "First" is inserted into the projection AFTER the query.
Ok, the easy solution is to just convert each part into Enumerables or Lists or something and then do the union in memory rather than SQL, and that's fine if you're dealing with a small amount of data. However, if you're working with a large set of data, which you then plan to further filter (in sql) before returning it this is not ideal.
I guess what i'm looking for is a way to force L2S to include the column in the SQL. Is that possible?
UPDATE:
While not an exact duplicate, this error is similar to This Question and has similar solutions. So i'm closing, but not deleting this question because it may help someone else come to posible solutions from a different way.
Unfortunately, L2S is too smart of it's own good sometimes.
I've decided that the only real solution is to use a stored proc. Hope this helps.
This is a bug in the Linq2SQL provider.
In LinqPad you can clearly see the bug.
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
Will server side produce something like this:
SELECT [t2].[Foo], [t2].[Roo]
FROM (
SELECT [t0].[Foo], #p0 AS [value]
FROM [dc].[Mytable] AS [t0]
UNION ALL
SELECT [t1].[Foo], [t1].[Roo]
FROM [dc].[Mytable] AS [t1]
) AS [t2]
This will be a problem because the union will name the second column "value" instead of "Roo", which will cause the outer query to fail.
If you, however, switch the order of the two tables
(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
.Union(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
So that the constant assignment within the generated T-SQL comes in the non-first table, then things may work because T-SQL ignores the column names of subsequent tables.
Note: The first table in a union decides both column name and type. So would be smart to get LinqPad anyway.