Can we control LINQ expression order with Skip(), Take() and OrderBy()

Can we control LINQ expression order with Skip(), Take() and OrderBy() - mysql

I'm using LINQ to Entities to display paged results. But I'm having issues with the combination of Skip(), Take() and OrderBy() calls.
Everything works fine, except that OrderBy() is assigned too late. It's executed after result set has been cut down by Skip() and Take().
So each page of results has items in order. But ordering is done on a page handful of data instead of ordering of the whole set and then limiting those records with Skip() and Take().
How do I set precedence with these statements?
My example (simplified)
var query = ctx.EntitySet.Where(/* filter */).OrderByDescending(e => e.ChangedDate);
int total = query.Count();
var result = query.Skip(n).Take(x).ToList();
One possible (but a bad) solution
One possible solution would be to apply clustered index to order by column, but this column changes frequently, which would slow database performance on inserts and updates. And I really don't want to do that.
EDIT
I ran ToTraceString() on my query where we can actually see when order by is applied to the result set. Unfortunately at the end. :(
SELECT
-- columns
FROM (SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent1
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent2
WHERE (Extent1.ID = Extent2.ID) AND (Extent2.userId = :p__linq__4)
)
) AS Project2
limit 0,10 ) AS Limit1
LEFT OUTER JOIN (SELECT
-- columns
FROM table2 AS Extent3 ) AS Project3 ON Limit1.ID = Project3.ID
UNION ALL
SELECT
-- columns
FROM (SELECT -- columns
FROM ( SELECT
-- columns
FROM table1 AS Extent4
WHERE EXISTS (SELECT
-- single constant column
FROM table2 AS Extent5
WHERE (Extent4.ID = Extent5.ID) AND (Extent5.userId = :p__linq__4)
)
) AS Project6
limit 0,10 ) AS Limit2
INNER JOIN table3 AS Extent6 ON Limit2.ID = Extent6.ID) AS UnionAll1
ORDER BY UnionAll1.ChangedDate DESC, UnionAll1.ID ASC, UnionAll1.C1 ASC

My workaround solution
I've managed to workaround this problem. Don't get me wrong here. I haven't solved precedence issue as of yet, but I've mitigated it.
What I did?
This is the code I've used until I get an answer from Devart. If they won't be able to overcome this issue I'll have to use this code in the end.
// get ordered list of IDs
List<int> ids = ctx.MyEntitySet
.Include(/* Related entity set that is needed in where clause */)
.Where(/* filter */)
.OrderByDescending(e => e.ChangedDate)
.Select(e => e.Id)
.ToList();
// get total count
int total = ids.Count;
if (total > 0)
{
// get a single page of results
List<MyEntity> result = ctx.MyEntitySet
.Include(/* related entity set (as described above) */)
.Include(/* additional entity set that's neede in end results */)
.Where(string.Format("it.Id in {{{0}}}", string.Join(",", ids.ConvertAll(id => id.ToString()).Skip(pageSize * currentPageIndex).Take(pageSize).ToArray())))
.OrderByDescending(e => e.ChangedOn)
.ToList();
}
First of all I'm getting ordered IDs of my entities. Getting only IDs is well performant even with larger set of data. MySql query is quite simple and performs really well. In the second part I partition these IDs and use them to get actual entity instances.
Thinking of it, this should perform even better than the way I was doing it at the beginning (as described in my question), because getting total count is much much quicker due to simplified query. The second part is practically very very similar, except that my entities are returned rather by their IDs instead of partitioned using Skip and Take...
Hopefully someone may find this solution helpful.

I haven't worked directly with Linq to Entities, but it should have a way to hook specific stored procedures into certain locations when needed. (Linq to SQL did.) If so, you could turn this query into a stored procedure, doing exacly what is required, and doing it efficiently.

Assuming from you comment the persisting the values in a List is not acceptable:
There's no way to completely minimize the iterations, as you intended (and as I would have tried too, living in hope). Cutting the iterations down by one would be nice. Is it possible to just get the Count once and cache/session it? Then you could:
int total = ctx.EntitySet.Count; // Hopefully you can not repeat doing this.
var result = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).Skip(n).Take(x).ToList();
Hopefully you can cache the Count somehow, or avoid needing it every time. Even if you can't, this is the best you can do.

Could you please create a sample illusrating the problem and send it to us (support * devart * com, subject "EF: Skip, Take, OrderBy")?
Hope we will be able to help you.
You can also contact us using our forums or contact form.

Are you absolutely certain the ordering is off? What does the SQL look like?
Can you reorder your code as follows and post the output?
// Redefine your queries.
var query = ctx.EntitySet.Where(/* filter */).OrderBy(e => e.ChangedDate);
var skipped = query.Skip(n).Take(x);
// let's look at the SQL, shall we?
var querySQL = query.ToTraceString();
var skippedSQL = skipped.ToTraceString();
// actual execution of the queries...
int total = query.Count();
var result = skipped.ToList();
Edit:
I'm absolutely certain. You can check my "edit" to see trace result of my query with skipped trace result that is imperative in this case. Count is not really important.
Yeah, I see it. Wow, that's a stumper. Might even be an outright bug. I note you're not using SQL Server... what DB are you using? Looks like it might be MySQl.

One way:
var query = ctx.EntitySet.Where(/* filter */).OrderBy(/* expression */).ToList();
int total = query.Count;
var result = query.Skip(n).Take(x).ToList();
Convert it to a List before skipping. It's not too efficient, mind you...

Related

Statement in trigger is not "picking up" the condition in its Where clause

so I'm currently working on a MySQL trigger. I'm trying to assign values to two variables when a new record is inserted. Below are the queries:
SET mssgDocNo = (SELECT Document_ID FROM CORE_MSSG WHERE Message_ID = new.MSSG_ID);
SET mssgRegime = (SELECT CONCAT (Regime_Type, Regime_Code) FROM T_DOC WHERE CD_Message_ID = new.MSSG_ID);;
For some reason, the second SQL query is not picking up the 'new.MSSG_ID' condition while the first query in same trigger recognizes it. I really can't figure out what seems to be the problem.
When I replace the 'new.MSSG_ID' with a hard-coded value from the database in the second query it seems to work. I doubt the 'new.MSSG_ID' is the problem because it works perfectly fine in the first query.
I've tried pretty much anything I could think of. Would appreciate the help.

I would write these more simply as:
SELECT mssgDocNo := Document_ID
FROM CORE_MSSG
WHERE Message_ID = new.MSSG_ID;
SELECT mssgRegime := CONCAT(Regime_Type, Regime_Code)
FROM T_DOC
WHERE CD_Message_ID = new.MSSG_ID;
The SET is not necessary.
I did make one other change that might make it work. I removed the space after CONCAT. Some versions of MySQL have trouble parsing spaces after function calls.

MySQL order by problems

I have the following codes..
echo "<form><center><input type=submit name=subs value='Submit'></center></form>";
$val=$_POST['resulta']; //this is from a textarea name='resulta'
if (isset($_POST['subs'])) //from submit name='subs'
{
$aa=mysql_query("select max(reservno) as 'maxr' from reservation") or die(mysql_error()); //select maximum reservno
$bb=mysql_fetch_array($aa);
$cc=$bb['maxr'];
$lines = explode("\n", $val);
foreach ($lines as $line) {
mysql_query("insert into location_list (reservno, location) values ('$cc', '$line')")
or die(mysql_error()); //insert value of textarea then save it separately in location_list if \n is found
}
If I input the following data on the textarea (assume that I have maximum reservno '00014' from reservation table),
Davao - Cebu
Cebu - Davao
then submit it, I'll have these data in my location_list table:
loc_id || reservno || location
00001 || 00014 || Davao - Cebu
00002 || 00014 || Cebu - Davao
Then this code:
$gg=mysql_query("SELECT GROUP_CONCAT(IF((#var_ctr := #var_ctr + 1) = #cnt,
location,
SUBSTRING_INDEX(location,' - ', 1)
)
ORDER BY loc_id ASC
SEPARATOR ' - ') AS locations
FROM location_list,
(SELECT #cnt := COUNT(1), #var_ctr := 0
FROM location_list
WHERE reservno='$cc'
) dummy
WHERE reservno='$cc'") or die(mysql_error()); //QUERY IN QUESTION
$hh=mysql_fetch_array($gg);
$ii=$hh['locations'];
mysql_query("update reservation set itinerary = '$ii' where reservno = '$cc'")
or die(mysql_error());
is supposed to update reservation table with 'Davao - Cebu - Davao' but it's returning this instead, 'Davao - Cebu - Cebu'. I was previously helped by this forum to have this code working but now I'm facing another difficulty. Just can't get it to work. Please help me. Thanks in advance!

I got it working (without ORDER BY loc_id ASC) as long as I set phpMyAdmin operations loc_id ascending. But whenever I delete all data, it goes back as loc_id descending so I have to reset it. It doesn't entirely solve the problem but I guess this is as far as I can go. :)) I just have to make sure that the table column loc_id is always in ascending order. Thank you everyone for your help! I really appreciate it! But if you have any better answer, like how to set the table column always in ascending order or better query, etc, feel free to post it here. May God bless you all!

The database server is allowed to rewrite your query to optimize its execution. This might affect the order of the individual parts, in particular the order in which the various assignments are executed. I assume that some such reodering causes the result of the query to become undefined, in such a way that it works on sqlfiddle but not on your actual production system.
I can't put my finger on the exact location where things go wrong, but I believe that the core of the problem is the fact that SQL is intended to work on relations, but you try to abuse it for sequential programming. I suggest you retrieve the data from the database using portable SQL without any variable hackery, and then use PHP to perform any post-processing you might need. PHP is much better suited to express the ideas you're formulating, and no optimization or reordering of statements will get in your way there. And as your query currently only results in a single value, fetching multiple rows and combining them into a single value in the PHP code shouldn't increase complexety too much.
Edit:
While discussing another answer using a similar technique (by Omesh as well, just as the answer your code is based upon), I found this in the MySQL manual:
As a general rule, you should never assign a value to a user variable
and read the value within the same statement. You might get the
results you expect, but this is not guaranteed. The order of
evaluation for expressions involving user variables is undefined and
may change based on the elements contained within a given statement;
in addition, this order is not guaranteed to be the same between
releases of the MySQL Server.
So there are no guarantees about the order these variable assignments are evaluated, therefore no guarantees that the query does what you expect. It might work, but it might fail suddenly and unexpectedly. Therefore I strongly suggest you avoid this approach unless you have some relaibale mechanism to check the validity of the results, or really don't care about whether they are valid.

Linq2sql Optimizing Left join to get items that exist in only in 1 container

I want to get items from one container that don't exist in another. One container is IEnumerable, and another is an entity in DB. For example
IEnumberable<int> ids = new List<int>();
ids.Add(1);
ids.Add(2);
ids.Add(3);
using (MyObjectContext ctx = new MyObjectContext())
{
var filtered_ids = ids.Except(from u in ctx.Users select u.id);
}
This approach works, but I realized that underlying sql is something like SELECT id FROM [Users]. That is not what I want. Changing it to
var filtered_ids = ids.Except(from u in ctx.Users
where ids.Contains(u.id)
select u.id);
improves underlying query and adds WHERE [id] IN (...) which seems a way better.
I have 2 questions:
Is it possible to improve performance any further for this query?
As far as I remember there is a limit on how many parameters can be in IN . Will my current query work if I exceed the limit (which is not very likely to happen, but it's better to be prepare) ?

That query should be fine, provided proper indexes/primary keys are in place.
The upper limit on sql parameters accepted by sql server is around 2100. If you exceed the limit, you will be met with a sql exception instead of results.

Union (or Concat, etc..) with Constant values and projection

I've discovered a very nasty gotcha with Linq-to-sql, and i'm not sure what the best solution is.
If you take a simple L2S Union statement, and include L2S code in one side, and constants in the other, then the constants do not get included in the SQL Union and are only projected into the output after the SQL, resulting in SQL errors about the number of columns not mathching for the union.
As an example:
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
This will generate an error "All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists.
This is particularly insidious (and maddening) because there are obviously the same number of expressions in the list, but when you look at the SQL, you will notice that it does not generate a column for "First" in the second half of the Union. This is because "First" is inserted into the projection AFTER the query.
Ok, the easy solution is to just convert each part into Enumerables or Lists or something and then do the union in memory rather than SQL, and that's fine if you're dealing with a small amount of data. However, if you're working with a large set of data, which you then plan to further filter (in sql) before returning it this is not ideal.
I guess what i'm looking for is a way to force L2S to include the column in the SQL. Is that possible?
UPDATE:
While not an exact duplicate, this error is similar to This Question and has similar solutions. So i'm closing, but not deleting this question because it may help someone else come to posible solutions from a different way.
Unfortunately, L2S is too smart of it's own good sometimes.

I've decided that the only real solution is to use a stored proc. Hope this helps.

This is a bug in the Linq2SQL provider.
In LinqPad you can clearly see the bug.
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
Will server side produce something like this:
SELECT [t2].[Foo], [t2].[Roo]
FROM (
SELECT [t0].[Foo], #p0 AS [value]
FROM [dc].[Mytable] AS [t0]
UNION ALL
SELECT [t1].[Foo], [t1].[Roo]
FROM [dc].[Mytable] AS [t1]
) AS [t2]
This will be a problem because the union will name the second column "value" instead of "Roo", which will cause the outer query to fail.
If you, however, switch the order of the two tables
(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
.Union(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
So that the constant assignment within the generated T-SQL comes in the non-first table, then things may work because T-SQL ignores the column names of subsequent tables.
Note: The first table in a union decides both column name and type. So would be smart to get LinqPad anyway.

When using skip and take to page data, how can I get the total record count without a separate query?

I don't see how this is possible, but I really, really hate to run my query an extra time just to get the record count so I can build a pager. When I say a "pager" I simply mean the common gizmo with a link for each 10 records for example.

Assuming you are building a query in the selecting event, the best you could do is construct the full query, get and save the count, then take or skip it into the e.result.
And by best I mean, the easiest read code from a single query, rather than two. You'll still be running two separate evaluations on the database though. Use query analyser to see if the statements are a 'Select Count' then a 'Select take' or a dirty big select pared down by LINQ after the retrieve. I think LINQ does the former.

As far as I know it is not possible to return the total count and the items retrieved by skip and take at the same time.

I wrote a custom data source control and view, which caches the count for a short duration. I invalidate the cache whenever the criteria changes that would affect the number of results, but not when the data is paged, or when the data is sorted for instance.

I was concerned about this same question. Here is are the results of my experimenting in Linqpad--the actual behavior of Linq does not create a full new query to the SQL server:
This simple test, in one of my development databases:
var query = from p in HrPersons select p;
var x = query.Skip(20).Take(10).Dump();
var t = query.Count().Dump();
generates the following actual SQL queries:
SELECT [t1].[company] AS [Company], [t1].[processLevel] AS [ProcessLevel], [t1].[emplId] AS [EmplId], [t1].[sn] AS [Sn], [t1].[givenName] AS [GivenName], [t1].[middleInitial] AS [MiddleInitial], [t1].[nickName] AS [NickName], [t1].[formerName] AS [FormerName], [t1].[ssn] AS [Ssn], [t1].[cn] AS [Cn], [t1].[costCenter] AS [CostCenter], [t1].[title] AS [Title], [t1].[status] AS [Status], [t1].[batch] AS [Batch], [t1].[rowversion] AS [Rowversion], [t1].[id] AS [Id], [t1].[source] AS [Source]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[company], [t0].[processLevel], [t0].[emplId], [t0].[sn], [t0].[givenName], [t0].[middleInitial], [t0].[nickName], [t0].[formerName], [t0].[ssn], [t0].[cn], [t0].[costCenter], [t0].[title], [t0].[status], [t0].[batch], [t0].[rowversion], [t0].[id], [t0].[source]) AS [ROW_NUMBER], [t0].[company], [t0].[processLevel], [t0].[emplId], [t0].[sn], [t0].[givenName], [t0].[middleInitial], [t0].[nickName], [t0].[formerName], [t0].[ssn], [t0].[cn], [t0].[costCenter], [t0].[title], [t0].[status], [t0].[batch], [t0].[rowversion], [t0].[id], [t0].[source]
FROM [HrPerson] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p0 + #p1
ORDER BY [t1].[ROW_NUMBER]
GO
SELECT COUNT(*) AS [value]
FROM [HrPerson] AS [t0]
So while there is a second SQL query, it is a trivial one that only requests the total count. I believe this is reasonable and acceptable as a pattern.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008