Union (or Concat, etc..) with Constant values and projection - linq-to-sql

I've discovered a very nasty gotcha with Linq-to-sql, and i'm not sure what the best solution is.
If you take a simple L2S Union statement, and include L2S code in one side, and constants in the other, then the constants do not get included in the SQL Union and are only projected into the output after the SQL, resulting in SQL errors about the number of columns not mathching for the union.
As an example:
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
This will generate an error "All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
This is particularly insidious (and maddening) because there are obviously the same number of expressions in the list, but when you look at the SQL, you will notice that it does not generate a column for "First" in the second half of the Union. This is because "First" is inserted into the projection AFTER the query.
Ok, the easy solution is to just convert each part into Enumerables or Lists or something and then do the union in memory rather than SQL, and that's fine if you're dealing with a small amount of data. However, if you're working with a large set of data, which you then plan to further filter (in sql) before returning it this is not ideal.
I guess what i'm looking for is a way to force L2S to include the column in the SQL. Is that possible?
UPDATE:
While not an exact duplicate, this error is similar to This Question and has similar solutions. So i'm closing, but not deleting this question because it may help someone else come to posible solutions from a different way.
Unfortunately, L2S is too smart of it's own good sometimes.

I've decided that the only real solution is to use a stored proc. Hope this helps.

This is a bug in the Linq2SQL provider.
In LinqPad you can clearly see the bug.
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
Will server side produce something like this:
SELECT [t2].[Foo], [t2].[Roo]
FROM (
SELECT [t0].[Foo], #p0 AS [value]
FROM [dc].[Mytable] AS [t0]
UNION ALL
SELECT [t1].[Foo], [t1].[Roo]
FROM [dc].[Mytable] AS [t1]
) AS [t2]
This will be a problem because the union will name the second column "value" instead of "Roo", which will cause the outer query to fail.
If you, however, switch the order of the two tables
(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
.Union(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
So that the constant assignment within the generated T-SQL comes in the non-first table, then things may work because T-SQL ignores the column names of subsequent tables.
Note: The first table in a union decides both column name and type. So would be smart to get LinqPad anyway.

Related

Best way in Doctrine to load only entities which have attached entities

I have an entity, let's call it Foo and a second one Bar
Foo can (but doesn't have to) have one or multiple Bar entries assigned. It looks something like this:
/**
* #ORM\OneToMany(targetEntity="Bar", mappedBy="foo")
* #ORM\OrderBy({"name" = "ASC"})
*/
private $bars;
I now would like to load in one case only Foo entities that have at least one Bar entity assigned. Previously, there was one foreach loop to traverse all Foo entries and if it had assigned entries, the Foo entry got assigned to an array.
My current implementation is in the FooRepository a function called findIfTheresBar which looks like this:
$qb = $this->_em->createQueryBuilder()
->select('e')
->from($this->_entityName, 'e')
/* some where stuff here */
->addOrderBy('e.name', 'ASC')
->join('e.bars', 'b')
->groupBy('e.id');
Is this the correct way to load such entries? Is there a better (faster) way? It kind of feels as if it should have a having(...) in the query.
EDIT:
I've investigated it a little further. The query should return 373 out of 437 entries.
Version 1: only using join(), this loaded 373 entries in 7.88ms
Version 2: using join() and having(), this loaded 373 entries in 8.91ms
Version 3: only using leftJoin(), this loaded all 437 entries (which isn't desired) in 8.05ms
Version 4: using leftJoin() and having(), this loaded 373 entries in 8.14ms
Since Version 1 which only uses an innerJoin as #Chausser pointed out, is the fastest, I will stick to that one.
Note: I'm not saying Version 1 will be the fastest in all scenarios and on every hardware, so kind of a follow up question, does anybody know about a performance comparison?
Please take a look at this answer for more information on how SQL JOINs work: https://stackoverflow.com/a/16598900/1307183
Using a join, which is an alias of innerJoin, is exactly what you want. This only returns records where entries exist in both Foo and Bar - aka where the association/attached entity exists. This calls INNER JOIN in SQL, which, if your database structure is defined correctly, is the absolute best and fastest way to get the data you want.
Using a leftJoin calls LEFT JOIN in SQL, which returns all records from Foo, even if there is no Bar associated with it (for example, where bar_id in your foo table would be null).
You have no reason to use having() in any of the above scenarios you described. If you want to filter further you would do that with a ->addWhere() function. Using the having() clause is something you would only want to do if you were selecting aggregate data in your original query (like SELECT SUM(field) AS sum_field).

Can MySQL create a result set by inserting members from a loop?

MySQL can do while-loops.
Can it do something like this
result_set = <EMPTY SET>
while (condition)
SELECT foo INTO bar;
if is_candidate
then
add (bar, baz) to result_set
end while
return result_set
Is this possible? If so how portable is it?
Context, since I'm sure people will say I'm doing it wrong:
the declarative way would be to SELECT whatever FROM table WHERE predicate(whatever) but in my case predicate has O(n) running time where n is the size of the table
so the overall query has quadratic running time,
but searching from known result rows to connected rows using is_candidate is O(1) and so my pseudocode above would be linear-time overall
(No, this data set isn't ideal for SQL, but it's what I've got to work with.)

SQLAlchemy db.text().bindparams() without clobbering

I just discovered that if you use the same name in bindparams twice in the same query, the second value clobbers the first one. For a contrived example:
db.session.query(MyTable).filter(
db.or_(
db.text("my_table.field = :value").bindparams(value=value1),
db.text("my_table.field = :value").bindparams(value=value2),
)
)
Here you would only get things with value2. value1 would not appear in the query.
Is there a general purpose way to fix this?
Btw, db.text() bits in my real query access nested jsonb properties, so please don't answer telling me to just use the column objects in this query in place of db.text().

Can we control LINQ expression order with Skip(), Take() and OrderBy()

I'm using LINQ to Entities to display paged results. But I'm having issues with the combination of Skip(), Take() and OrderBy() calls.
Everything works fine, except that OrderBy() is assigned too late. It's executed after result set has been cut down by Skip() and Take().
So each page of results has items in order. But ordering is done on a page handful of data instead of ordering of the whole set and then limiting those records with Skip() and Take().
How do I set precedence with these statements?
My example (simplified)
var query = ctx.EntitySet.Where(/* filter */).OrderByDescending(e => e.ChangedDate);
int total = query.Count();
var result = query.Skip(n).Take(x).ToList();
One possible (but a bad) solution
One possible solution would be to apply clustered index to order by column, but this column changes frequently, which would slow database performance on inserts and updates. And I really don't want to do that.
EDIT
I ran ToTraceString() on my query where we can actually see when order by is applied to the result set. Unfortunately at the end. :(
SELECT
-- columns
FROM (SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent1
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent2
WHERE (Extent1.ID = Extent2.ID) AND (Extent2.userId = :p__linq__4)
)
) AS Project2
limit 0,10 ) AS Limit1
LEFT OUTER JOIN (SELECT
-- columns
FROM table2 AS Extent3 ) AS Project3 ON Limit1.ID = Project3.ID
UNION ALL
SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent4
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent5
WHERE (Extent4.ID = Extent5.ID) AND (Extent5.userId = :p__linq__4)
)
) AS Project6
limit 0,10 ) AS Limit2
INNER JOIN table3 AS Extent6 ON Limit2.ID = Extent6.ID) AS UnionAll1
ORDER BY UnionAll1.ChangedDate DESC, UnionAll1.ID ASC, UnionAll1.C1 ASC
My workaround solution
I've managed to workaround this problem. Don't get me wrong here. I haven't solved precedence issue as of yet, but I've mitigated it.
What I did?
This is the code I've used until I get an answer from Devart. If they won't be able to overcome this issue I'll have to use this code in the end.
// get ordered list of IDs
List<int> ids = ctx.MyEntitySet
.Include(/* Related entity set that is needed in where clause */)
.Where(/* filter */)
.OrderByDescending(e => e.ChangedDate)
.Select(e => e.Id)
.ToList();
// get total count
int total = ids.Count;
if (total > 0)
{
// get a single page of results
List<MyEntity> result = ctx.MyEntitySet
.Include(/* related entity set (as described above) */)
.Include(/* additional entity set that's neede in end results */)
.Where(string.Format("it.Id in {{{0}}}", string.Join(",", ids.ConvertAll(id => id.ToString()).Skip(pageSize * currentPageIndex).Take(pageSize).ToArray())))
.OrderByDescending(e => e.ChangedOn)
.ToList();
}
First of all I'm getting ordered IDs of my entities. Getting only IDs is well performant even with larger set of data. MySql query is quite simple and performs really well. In the second part I partition these IDs and use them to get actual entity instances.
Thinking of it, this should perform even better than the way I was doing it at the beginning (as described in my question), because getting total count is much much quicker due to simplified query. The second part is practically very very similar, except that my entities are returned rather by their IDs instead of partitioned using Skip and Take...
Hopefully someone may find this solution helpful.
I haven't worked directly with Linq to Entities, but it should have a way to hook specific stored procedures into certain locations when needed. (Linq to SQL did.) If so, you could turn this query into a stored procedure, doing exacly what is required, and doing it efficiently.
Assuming from you comment the persisting the values in a List is not acceptable:
There's no way to completely minimize the iterations, as you intended (and as I would have tried too, living in hope). Cutting the iterations down by one would be nice. Is it possible to just get the Count once and cache/session it? Then you could:
int total = ctx.EntitySet.Count; // Hopefully you can not repeat doing this.
var result = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).Skip(n).Take(x).ToList();
Hopefully you can cache the Count somehow, or avoid needing it every time. Even if you can't, this is the best you can do.
Could you please create a sample illusrating the problem and send it to us (support * devart * com, subject "EF: Skip, Take, OrderBy")?
Hope we will be able to help you.
You can also contact us using our forums or contact form.
Are you absolutely certain the ordering is off? What does the SQL look like?
Can you reorder your code as follows and post the output?
// Redefine your queries.
var query = ctx.EntitySet.Where(/* filter */).OrderBy(e => e.ChangedDate);
var skipped = query.Skip(n).Take(x);
// let's look at the SQL, shall we?
var querySQL = query.ToTraceString();
var skippedSQL = skipped.ToTraceString();
// actual execution of the queries...
int total = query.Count();
var result = skipped.ToList();
Edit:
I'm absolutely certain. You can check my "edit" to see trace result of my query with skipped trace result that is imperative in this case. Count is not really important.
Yeah, I see it. Wow, that's a stumper. Might even be an outright bug. I note you're not using SQL Server... what DB are you using? Looks like it might be MySQl.
One way:
var query = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).ToList();
int total = query.Count;
var result = query.Skip(n).Take(x).ToList();
Convert it to a List before skipping. It's not too efficient, mind you...

linq-to-sql How can I get a few rows that don't match my existing rows?

I have a few rows of data pulled into business objects via linq-to-sql from large tables.
Now I want to get a few rows that don't match to test my comparison functions.
Using what I thought would work I get a NotSupportedException:
Local sequence cannot be used in LINQ to SQL implementation of query operators except the Contains() operator.
Here's the code:
//This table has a 2 field primary key, the other has a single
var AllNonMatches = from c in dc.Acaps
where !Matches.Rows.Any((row) => row.Key.Key == c.AppId & row.Key.Value == c.SeqNbr)
select c;
foreach (var item in AllNonMatches.Take(100)) //Exception here
{}
The table has a compound primary key: AppId and SeqNbr.
The Matches.Rows is defined as a dictionary of keyvaluepair(appid,seqnbr).
and the local sequence it is referring to appears to be the local dictionary.
Could you provide more information on the structure and the name(s) of the table(s) plz?
Not sure what you're trying to do...
edit:
Ok.. I think I get it now...
It appears you can't merge/join local tables (dictionary) with a SQL table.
If you can, I'm afraid I don't know how to do it.
The simplest solution I can think of is to put those results in a table ("Match" for instance) with foreign keys related to your table "Acaps" and then use linq-to-sql, like:
var AllNonMatches = dc.Acaps.Where(p=>p.Matchs==null).Take(100).ToList();
Sorry I couldn't come up with any better =(
What about this:
var AllNonMatches = from c in dc.Acaps
where !(Matches.Rows.ContainsKey(c.AppId) && Matches.Rows.ContainsValue(c.SeqNbr))
select c;
That will work fine. I have also used a bitwise AND operator (&&) - I think thats the right term to help improve performance over the standard AND operator.