Optimize Linq query with PredicateBuilder with N-N join - linq-to-sql

I'm using Linq to query MS CRM 2011 Web Services. I've got a query that results in very poor SQL, it fetches too much intermediary data and its performance is horrible!! I'm new to it, so it may very well be the way I'm using it...
I've got two entities linked via an N-N relationship: Product and SalesLink. I want to recover a bunch of Product from their SerialNumber along with all SalesLink associated to them.
This is the query I have using PredicateBuilder:
// Build inner OR predicate on Serial Number list
var innerPredicate = PredicateBuilder.False<Xrm.c_product>();
foreach (string sn in serialNumbers) {
string temp = sn; // This temp assignement is important!
innerPredicate = innerPredicate.Or(p => p.c_SerialNumber == temp);
}
// Combine predicate with outer AND predicate
var predicate = PredicateBuilder.True<Xrm.c_product>();
predicate = predicate.And(innerPredicate);
predicate = predicate.And(p => p.statecode == (int)CrmStateValueType.Active);
// Inner Join Query
var prodAndLinks = from p in orgContext.CreateQuery<Xrm.c_product>().AsExpandable()
.Where(predicate)
.AsEnumerable()
join link in orgContext.CreateQuery<Xrm.c_saleslink>()
on p.Id equals link.c_ProductSalesLinkId.Id
where link.statecode == (int)CrmStateValueType.Active
select new {
productId = p.Id
, productSerialNumber = p.c_SerialNumber
, accountId = link.c_Account.Id
, accountName = link.c_Account.Name
};
...
Using SQL profiler, I saw that it causes an intermediate SQL query that has no WHERE clause, looking like this:
select
top 5001 "c_saleslink0".statecode as "statecode"
...
, "c_saleslink0".ModifiedOnBehalfByName as "modifiedonbehalfbyname"
, "c_saleslink0".ModifiedOnBehalfByYomiName as "modifiedonbehalfbyyominame"
from
c_saleslink as "c_saleslink0" order by
"c_saleslink0".c_saleslinkId asc
This returns a huge amount of (useless) data. I think the join is done on the client side instead of on the DB side...
How should I improve this query? I runs in around 3 minutes and that's totally unacceptable.
Thanks.
"Solution"
Based on Daryl's answer to use QueryExpression instead of Linq to CRM, I got this which gets the exact same result.
var qe = new QueryExpression("c_product");
qe.ColumnSet = new ColumnSet("c_serialnumber");
var filter = qe.Criteria.AddFilter(LogicalOperator.Or);
filter.AddCondition("c_serialnumber", ConditionOperator.In, serialNumbers.ToArray());
var link = qe.AddLink("c_saleslink", "c_productid", "c_productsaleslinkid");
link.LinkCriteria.AddCondition("statecode", ConditionOperator.Equal, (int)CrmStateValueType.Active);
link.Columns.AddColumns("c_account");
var entities = serviceProxy.RetrieveMultiple(qe).Entities.ToList();;
var prodAndLinks = entities.Select(x => x.ToEntity<Xrm.c_product>()).Select(x =>
new {
productId = x.c_productId
, productSerialNumber = x.c_SerialNumber
, accountId = ((Microsoft.Xrm.Sdk.EntityReference)((Microsoft.Xrm.Sdk.AliasedValue)x["c_saleslink1.c_account"]).Value).Id
, accountName = ((Microsoft.Xrm.Sdk.EntityReference)((Microsoft.Xrm.Sdk.AliasedValue)x["c_saleslink1.c_account"]).Value).Name
}).ToList();
I really would have liked to find a solution using Linq, but it seems to Linq to CRM is just not there yet...

95% of the time when you're having performance issues with a complicated query in CRM, the easiest way to improve the performance is to run a straight SQL query against the database (assuming this is not CRM online of course). This may be one of the 5% of the time.
In your case, the major performance issue you're experiencing is due to the predicate builder forcing a CRM Server (not the SQL database) side join of data. If you used a Query Expression (which is what your link statement get's translated) you can specify a Condition Expression with an IN operator that would allow you to pass in your serialNumbers collection. You could also use FetchXml as well. Both of these methods would allow CRM to perform a SQL side join.
Edit:
This should get you 80% of the way with Query Expressions:
IOrganizationService service = GetService();
var qe = new QueryExpression("c_product");
var filter = qe.Criteria.AddFilter(LogicalOperator.Or);
filter.AddCondition("c_serialnumber", ConditionOperator.In, serialNumbers.ToArray());
var link = qe.AddLink("c_saleslink", "c_productid", "c_productsaleslinkid");
link.LinkCriteria.AddCondition("statecode", ConditionOperator.Equal, (int)CrmStateValueType.Active);
link.Columns.AddColumns("c_Account");
var entities = service.RetrieveMultiple(qe).Entities.ToList();

You will probably find you can get better control by not using Linq to Crm. You could try:
FetchXml, this is an xml syntax, similar in approach to tsql MSDN.
QueryExpression, MSDN.
You could issue a RetrieveRequest, blog.

Related

Do 2 queries need to be submitted to the database to get total count and paged section using EF / Linq to Entities? [duplicate]

Currently when I need to run a query that will be used w/ paging I do it something like this:
//Setup query (Typically much more complex)
var q = ctx.People.Where(p=>p.Name.StartsWith("A"));
//Get total result count prior to sorting
int total = q.Count();
//Apply sort to query
q = q.OrderBy(p => p.Name);
q.Select(p => new PersonResult
{
Name = p.Name
}.Skip(skipRows).Take(pageSize).ToArray();
This works, but I wondered if it is possible to improve this to be more efficient while still using linq? I couldn't think of a way to combine the count w/ the data retrieval in a single trip to the DB w/o using a stored proc.
The following query will get the count and page results in one trip to the database, but if you check the SQL in LINQPad, you'll see that it's not very pretty. I can only imagine what it would look like for a more complex query.
var query = ctx.People.Where (p => p.Name.StartsWith("A"));
var page = query.OrderBy (p => p.Name)
.Select (p => new PersonResult { Name = p.Name } )
.Skip(skipRows).Take(pageSize)
.GroupBy (p => new { Total = query.Count() })
.First();
int total = page.Key.Total;
var people = page.Select(p => p);
For a simple query like this, you could probably use either method (2 trips to the database, or using GroupBy to do it in 1 trip) and not notice much difference. For anything complex, I think a stored procedure would be the best solution.
Jeff Ogata's answer can be optimized a little bit.
var results = query.OrderBy(p => p.Name)
.Select(p => new
{
Person = new PersonResult { Name = p.Name },
TotalCount = query.Count()
})
.Skip(skipRows).Take(pageSize)
.ToArray(); // query is executed once, here
var totalCount = results.First().TotalCount;
var people = results.Select(r => r.Person).ToArray();
This does pretty much the same thing except it won't bother the database with an unnecessary GROUP BY. When you are not certain your query will contain at least one result, and don't want it to ever throw an exception, you can get totalCount in the following (albeit less cleaner) way:
var totalCount = results.FirstOrDefault()?.TotalCount ?? query.Count();
Important Note for People using EF Core >= 1.1.x && < 3.0.0:
At the time I was looking for solution to this and this page is/was Rank 1 for the google term "EF Core Paging Total Count".
Having checked the SQL profiler I have found EF generates a SELECT COUNT(*) for every row that is returned. I have tired every solution provided on this page.
This was tested using EF Core 2.1.4 & SQL Server 2014. In the end I had to perform them as two separate queries like so. Which, for me at least, isn't the end of the world.
var query = _db.Foo.AsQueryable(); // Add Where Filters Here.
var resultsTask = query.OrderBy(p => p.ID).Skip(request.Offset).Take(request.Limit).ToArrayAsync();
var countTask = query.CountAsync();
await Task.WhenAll(resultsTask, countTask);
return new Result()
{
TotalCount = await countTask,
Data = await resultsTask,
Limit = request.Limit,
Offset = request.Offset
};
It looks like the EF Core team are aware of this:
https://github.com/aspnet/EntityFrameworkCore/issues/13739
https://github.com/aspnet/EntityFrameworkCore/issues/11186
I suggest making two queries for the first page, one for the total count and one for the first page or results.
Cache the total count for use as you move beyond the first page.

Linq to Sql: Join, why do I need to load a collection

I have 2 tables that I need to load together all the time, the both must exist together in the database. However I am wondering why Linq to Sql demands that I have to load in a collection and then do a join, I only want to join 2 single tables where a record where paramid say = 5, example...
var data = _repo.All<TheData>(); //why do I need a collection/IQueryable like this?
var _workflow = _repo.All<WorkFlow>()
.Where(x => x.WFID== paramid)
.Join(data, x => x.ID, y => y.WFID, (x, y) => new
{
data = x,
workflow = y
});
I gues then I need to do a SingleOrDefault()? If the record is not null pass it back?
I Understand the Sql query comes out correctly, is there a better way to write this?
NOTE: I need to search a table called Participants to see if the loggedonuser can actually view this record, so I guess I should leave it as this? (this is main requirement)
var participant = _repo.All<Participants>();
.Any(x=> x.ParticipantID == loggedonuser.ID); //add this to above query...
The line var data = _repo.All<TheData>(); is something like saying 'start building query against the TheData table'.
This function returns you an IQueryable which will contain a definition of the query against your database.
So this doesn't mean you load the whole TheData table data with this line!
The query will be executed the moment you do something like .Count(), .Any(), First(), Single(), or ToList(). This is called deferred execution.
If you would end your query with SingleOrDefault() this will create a sql query that joins the two tables, add the filter and select the top most record or null(or throw an error if there are more!).
You could also use Linq instead of query extension methods.
It would look like:
var data = _repo.All<TheData>();
var _workflow = from w in _repo.All<WorkFlow>()
join t in _repo.All<TheData> on w.Id equals t.WFID
where x.WIFD = paramid
select new
{
data = t,
workflow = x
});

Could not format node 'Value' for execution as SQL

I've stumbled upon a very strange LINQ to SQL behaviour / bug, that I just can't understand.
Let's take the following tables as an example: Customers -> Orders -> Details.
Each table is a subtable of the previous table, with a regular Primary-Foreign key relationship (1 to many).
If I execute the follow query:
var q = from c in context.Customers
select (c.Orders.FirstOrDefault() ?? new Order()).Details.Count();
Then I get an exception: Could not format node 'Value' for execution as SQL.
But the following queries do not throw an exception:
var q = from c in context.Customers
select (c.Orders.FirstOrDefault() ?? new Order()).OrderDateTime;
var q = from c in context.Customers
select (new Order()).Details.Count();
If I change my primary query as follows, I don't get an exception:
var q = from r in context.Customers.ToList()
select (c.Orders.FirstOrDefault() ?? new Order()).Details.Count();
Now I could understand that the last query works, because of the following logic:
Since there is no mapping of "new Order()" to SQL (I'm guessing here), I need to work on a local list instead.
But what I can't understand is why do the other two queries work?!?
I could potentially accept working with the "local" version of context.Customers.ToList(), but how to speed up the query?
For instance in the last query example, I'm pretty sure that each select will cause a new SQL query to be executed to retrieve the Orders. Now I could avoid lazy loading by using DataLoadOptions, but then I would be retrieving thousands of Order rows for no reason what so ever (I only need the first row)...
If I could execute the entire query in one SQL statement as I would like (my first query example), then the SQL engine itself would be smart enough to only retrieve one Order row for each Customer...
Is there perhaps a way to rewrite my original query in such a way that it will work as intended and be executed in one swoop by the SQL server?
EDIT:
(longer answer for Arturo)
The queries I provided are purely for example purposes. I know they are pointless in their own right, I just wanted to show a simplistic example.
The reason your example works is because you have avoided using "new Order()" all together. If I slightly modify your query to still use it, then I still get an exception:
var results = from e in (from c in db.Customers
select new { c.CustomerID, FirstOrder = c.Orders.FirstOrDefault() })
select new { e.CustomerID, Count = (e.FirstOrder != null ? e.FirstOrder : new Order()).Details().Count() }
Although this time the exception is slightly different - Could not format node 'ClientQuery' for execution as SQL.
If I use the ?? syntax instead of (x ? y : z) in that query, I get the same exception as I originaly got.
In my real-life query I don't need Count(), I need to select a couple of properties from the last table (which in my previous examples would be Details). Essentially I need to merge values of all the rows in each table. Inorder to give a more hefty example I'll first have to restate my tabels:
Models -> ModelCategoryVariations <- CategoryVariations -> CategoryVariationItems -> ModelModuleCategoryVariationItemAmounts -> ModelModuleCategoryVariationItemAmountValueChanges
The -> sign represents a 1 -> many relationship. Do notice that there is one sign that is the other way round...
My real query would go something like this:
var q = from m in context.Models
from mcv in m.ModelCategoryVariations
... // select some more tables
select new
{
ModelId = m.Id,
ModelName = m.Name,
CategoryVariationName = mcv.CategoryVariation.Name,
..., // values from other tables
Categories = (from cvi in mcv.CategoryVariation.CategoryVariationItems
let mmcvia = cvi.ModelModuleCategoryVariationItemAmounts.SingleOrDefault(mmcvia2 => mmcvia2.ModelModuleId == m.ModelModuleId) ?? new ModelModuleCategoryVariationItemAmount()
select new
{
cvi.Id,
Amount = (mmcvia.ModelModuleCategoryVariationItemAmountValueChanges.FirstOrDefault() ?? new ModelModuleCategoryVariationItemAmountValueChange()).Amount
... // select some more properties
}
}
This query blows up at the line let mmcvia =.
If I recall correctly, by using let mmcvia = new ModelModuleCategoryVariationItemAmount(), the query would blow up at the next ?? operand, which is at Amount =.
If I start the query with from m in context.Models.ToList() then everything works...
Why are you looking into only the individual count without selecting anything related to the customer.
You can do the following.
var results = from e in
(from c in db.Customers
select new { c.CustomerID, FirstOrder = c.Orders.FirstOrDefault() })
select new { e.CustomerID, DetailCount = e.FirstOrder != null ? e.FirstOrder.Details.Count() : 0 };
EDIT:
OK, I think you are over complicating your query.
The problem is that you are using the new WhateverObject() in your query, T-SQL doesnt know anyting about that; T-SQL knows about records in your hard drive, your are throwing something that doesn't exist. Only C# knows about that. DON'T USE new IN YOUR QUERIES OTHER THAN IN THE OUTER MOST SELECT STATEMENT because that is what C# will receive, and C# knows about creating new instances of objects.
Of course is going to work if you use ToList() method, but performance is affected because now you have your application host and sql server working together to give you the results and it might take many calls to your database instead of one.
Try this instead:
Categories = (from cvi in mcv.CategoryVariation.CategoryVariationItems
let mmcvia =
cvi.ModelModuleCategoryVariationItemAmounts.SingleOrDefault(
mmcvia2 => mmcvia2.ModelModuleId == m.ModelModuleId)
select new
{
cvi.Id,
Amount = mmcvia != null ?
(mmcvia.ModelModuleCategoryVariationItemAmountValueChanges.Select(
x => x.Amount).FirstOrDefault() : 0
... // select some more properties
}
Using the Select() method allows you to get the first Amount or its default value. I used "0" as an example only, I dont know what is your default value for Amount.

Populate JOIN into a list in one database query

I am trying to get the records from the 'many' table of a one-to-many relationship and add them as a list to the relevant record from the 'one' table.
I am also trying to do this in a single database request.
Code derived from Linq to Sql - Populate JOIN result into a List almost achieves the intended result, but makes one database request per entry in the 'one' table which is unacceptable. That failing code is here:
var res = from variable in _dc.GetTable<VARIABLE>()
select new { x = variable, y = variable.VARIABLE_VALUEs };
However if I do a similar query but loop through all the results, then only a single database request is made. This code achieves all goals:
var res = from variable in _dc.GetTable<VARIABLE>()
select variable;
List<GDO.Variable> output = new List<GDO.Variable>();
foreach (var v2 in res)
{
List<GDO.VariableValue> values = new List<GDO.VariableValue>();
foreach (var vv in v2.VARIABLE_VALUEs)
{
values.Add(VariableValue.EntityToGDO(vv));
}
output.Add(EntityToGDO(v2));
output[output.Count - 1].VariableValues = values;
}
However the latter code is ugly as hell, and it really feels like something that should be do-able in a single linq query.
So, how can this be done in a single linq query that makes only a single database query?
In both cases the table is set to preload using the following code:
_dc = _db.CreateLinqDataContext();
var loadOptions = new DataLoadOptions();
loadOptions.LoadWith<VARIABLE>(v => v.VARIABLE_VALUEs);
_dc.LoadOptions = loadOptions;
I am using .NET 3.5, and the database back-end was generated using SqlMetal.
This link may help
http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
Look under join operators. You'll probably have to change from using extension syntax other syntax too. Like this,
var = from obj in dc.Table
from obj2 in dc.Table2
where condition
select

Is there any way to rewrite this LINQ query so that it only does one SQL query?

I have a LINQ query which is fetching a list of nested objects.
from c in ClientsRepository.LoadAll()
orderby c.Name
select new ComboBoxOptionGroup
{
Text = c.Name,
Options = from p in c.Projects
orderby p.Name
select new ComboBoxOption
{
Text = p.Name,
Value = p.ID.ToString()
}
}
Unfortunately this LINQ query results in num(clients)+1 SQL queries. Is there any elegant way of rewriting this so that it only results in one SQL query?
My idea would be to fetch all projects ordered by clients and to do the rest of the work in two nested foreach loops, but that seems to defy the elegance that was intended when LINQ was designed. Are there any better options?
If ClientsRepository.LoadAll() is a wrapper around a query, that actually goes to the database and retrieves the data (does not return IQueryable<T>) then you are stuck. If it returns a part of a query (does return IQueryable<T>) then it should be possible.
However without knowing where the data (or object) context is and your model it is hard.
Normally I would expect to do something like:
from client in dataContext.Clients
join p in dataContext.Projects on p.ClientId equals c.Id
into projects
orderby client.Name
select new ComboBoxOptionGroup {
Text = client.Name,
Option = from p in projects
orderby p.Name
select new ComboBoxOption {
Text = p.Name,
Value = p.ID.ToString()
}
}
Another possible alternative would be to use DataLoadOptions.
DataLoadOptions dlo = new DataLoadOptions();
dlo.LoadWith<Client>(c => c.Projects);
context.DeferredLoadingEnabled = false; // You may not need/want this line
context.LoadOptions = dlo;
I haven't tested this way of doing it, but you'll hopefully be able to leave your query alone and reduce the number of necessary queries.
You propably search for "join" operator: http://www.hookedonlinq.com/JoinOperator.ashx