I'm used to EF because it usually works just fine as long as you get to know it better, so you know how to optimize your queries. But.
What would you choose when you know you'll be working with large quantities of data? I know I wouldn't want to use EF in the first place and cripple my application. I would write highly optimised stored procedures and call those to get certain very narrow results (with many joins so they probably won't just return certain entities anyway).
So I'm a bit confused which DAL technology/library I should use? I don't want to use SqlConnection/SqlCommand way of doing it, since I would have to write much more code that's likely to hide some obscure bugs.
I would like to make bug surface as small as possible and use a technology that will accommodate my process not vice-a-versa...
Is there any library that gives me the possibility to:
provide the means of simple SP execution by name
provide automatic materialisation of returned data so I could just provide certain materialisers by means of lambda functions?
like:
List<Person> result = Context.Execute("StoredProcName", record => new Person{
Name = record.GetData<string>("PersonName"),
UserName = record.GetData<string>("UserName"),
Age = record.GetData<int>("Age"),
Gender = record.GetEnum<PersonGender>("Gender")
...
});
or even calling stored procedure that returns multiple result sets etc.
List<Question> result = Context.ExecuteMulti("SPMultipleResults", q => new Question {
Id = q.GetData<int>("QuestionID"),
Title = q.GetData<string>("Title"),
Content = q.GetData<string>("Content"),
Comments = new List<Comment>()
}, c => new Comment {
Id = c.GetData<int>("CommentID"),
Content = c.GetData<string>("Content")
});
Basically this last one wouldn't work, since this one doesn't have any knowledge how to bind both together... but you get the point.
So to put it all down to a single question: Is there a DAL library that's optimised for stored procedure execution and data materialisation?
Business Layer Toolkit might be exactly what's needed here. It's a lightweight ORM tool that supports lots of scenarios including multiple result sets although they seem very complicated to do.
Related
I have been playing about with LINQ-SQL, trying to get re-usable chunks of expressions that I can hot plug into other queries. So, I started with something like this:
Func<TaskFile, double> TimeSpent = (t =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below (LINQPad example):
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This produces the expected output, except, a query per row is generated for the plugged expression. This is visible within LINQPad. Not good.
Anyway, I noticed the CompiledQuery.Compile method. Although this takes a DataContext as a parameter, I thought I would include ignore it, and try the same Func. So I ended up with the following:
static Func<UserQuery, TaskFile, double> TimeSpent =
CompiledQuery.Compile<UserQuery, TaskFile, double>(
(UserQuery db, TaskFile t) =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Notice here, that I am not using the db parameter. However, now when we use this updated parameter, only 1 SQL query is generated. The Expression is successfully translated to SQL and included within the original query.
So my ultimate question is, what makes CompiledQuery.Compile so special? It seems that the DataContext parameter isn't needed at all, and at this point i am thinking it is more a convenience parameter to generate full queries.
Would it be considered a good idea to use the CompiledQuery.Compile method like this? It seems like a big hack, but it seems like the only viable route for LINQ re-use.
UPDATE
Using the first Func within a Where statment, we see the following exception as below:
NotSupportedException: Method 'System.Object DynamicInvoke(System.Object[])' has no supported translation to SQL.
Like the following:
.Where(t => TimeSpent(t) > 2)
However, when we use the Func generated by CompiledQuery.Compile, the query is successfully executed and the correct SQL is generated.
I know this is not the ideal way to re-use Where statements, but it shows a little how the Expression Tree is generated.
Exec Summary:
Expression.Compile generates a CLR method, wheras CompiledQuery.Compile generates a delegate that is a placeholder for SQL.
One of the reasons you did not get a correct answer until now is that some things in your sample code are incorrect. And without the database or a generic sample someone else can play with chances are further reduced (I know it's difficult to provide that, but it's usually worth it).
On to the facts:
Expression<Func<TaskFile, double>> TimeSpent = (t =>
t.TimeEntries.Sum(te => (te.DateEnded - te.DateStarted).TotalHours));
Then, we can use the above in a LINQ query like the below:
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
(Note: Maybe you used a Func<> type for TimeSpent. This yields the same situation as of you're scenario was as outlined in the paragraph below. Make sure to read and understand it though).
No, this won't compile. Expressions can't be invoked (TimeSpent is an expression). They need to be compiled into a delegate first. What happens under the hood when you invoke Expression.Compile() is that the Expression Tree is compiled down to IL which is injected into a DynamicMethod, for which you get a delegate then.
The following would work:
var q = TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent.Compile().DynamicInvoke()
});
This produces the expected output, except, a query per row is
generated for the plugged expression. This is visible within LINQPad.
Not good.
Why does that happen? Well, Linq To Sql will need to fetch all TaskFiles, dehydrate TaskFile instances and then run your selector against it in memory. You get a query per TaskFile likely because they contains one or multiple 1:m mappings.
While LTS allows projecting in memory for selects, it does not do so for Wheres (citation needed, this is to the best of my knowledge). When you think about it, this makes perfect sense: It is likely you will transfer a lot more data by filtering the whole database in memory, then by transforming a subset of it in memory. (Though it creates query performance issues as you see, something to be aware of when using an ORM).
CompiledQuery.Compile() does something different. It compiles the query to SQL and the delegate it returns is only a placeholder Linq to SQL will use internally. You can't "invoke" this method in the CLR, it can only be used as a node in another expression tree.
So why does LTS generate an efficient query with the CompiledQuery.Compile'd expression then? Because it knows what this expression node does, because it knows the SQL behind it. In the Expression.Compile case, it's just a InvokeExpression that invokes the DynamicMethod as I explained previously.
Why does it require a DataContext Parameter? Yes, it's more convenient for creating full queries, but it's also because the Expression Tree compiler needs to know the Mapping to use for generating the SQL. Without this parameter, it would be a pain to find this mapping, so it's a very sensible requirement.
I'm surprised why you've got no answers on this so far. CompiledQuery.Compile compiles and caches the query. That is why you see only one query being generated.
Not only this is NOT a hack, this is the recommended way!
Check out these MSDN articles for detailed info and example:
Compiled Queries (LINQ to Entities)
How to: Store and Reuse Queries (LINQ to SQL)
Update: (exceeded the limit for comments)
I did some digging in reflector & I do see DataContext being used. In your example, you're simply not using it.
Having said that, the main difference between the two is that the former creates a delegate (for the expression tree) and the latter creates the SQL that gets cached and actually returns a function (sort of). The first two expressions produce the query when you call Invoke on them, this is why you see multiple of them.
If your query doesn't change, but only the DataContext and Parameters, and if you plan to use it repeatedly, CompiledQuery.Compile will help. It is expensive to Compile, so for one off queries, there is no benefit.
TaskFiles.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t),
})
This isn't a LinqToSql query, as there is no DataContext instance. Most likely you are querying some EntitySet, which does not implement IQueryable.
Please post complete statements, not statement fragments. (I see invalid comma, no semicolon, no assignment).
Also, Try this:
var query = myDataContext.TaskFiles
.Where(tf => tf.Parent.Key == myParent.Key)
.Select(t => new {
t.TaskId,
TimeSpent = TimeSpent(t)
});
// where myParent is the source of the EntitySet and Parent is a relational property.
// and Key is the primary key property of Parent.
This really is an architectural question. I feel like I'm going about this the wrong way and wanted some input on best practices.
Let's say I have a Transactions table and a TransactionTypes table. Views will submit the appropriate transaction data which is processed in my controller. The problem is that the logic in the controller may be a bit complex and the TransactionType is not provided by the view inputs, but computed in the controller. (Which may be part of my problem).
For example, let's say that the View submits a ViewModel that would map to a TransactionType of "Withdrawal". However, the controller detects that it needs to change this to an Overdraft" as funds aren't sufficient. What I don't want to do is this:
transaction.TypeId =
DataContext.TransactionTypes.Single(x => x.type == "Overdraft").id;
... as I'll be embedding string literals in my code. Right?
OK, so I could map the values to strong types that would allow me to do this:
class TranTypes
{
public const long Deposit = 1;
public const long Withdrawal = 2;
public const long Overdraft = 3;
}
...
transaction.TypeId =
DataContext.TransactionTypes.Single(x => x.id == TranTypes.Overdraft);
Now, if my lookups change in the DB, I have one place that I can update the mappings and my controllers still have insight into the model.
But this feels awkward too.
I feel like what I really want is for the Linq To SQL auto-code generation to be able to generate the association so I can just refer to strongly-typed names (Deposit, Withdrawal, and Draft) and be assured that it will always return the current values for these in the database. Changes made to the lookup table during runtime would result in problems, but it still seems so much cleaner.
What should I be digesting to understand how best to structure this?
Thanks in advance for enlarging my brain. :-)
Dont worry about whether you have an embedded string or a strongy typed value - either is perfectly acceptable - which ever makes sense fror your database design.
What you should do, however, is write a single routine in a repository or helper class that you can then call from whatever controller or action requires it - if anything changes there is only one place to make the change.
One simple approach I've always liked is the Enum approach.
public enum TransactionType {
Overdraft
}
transaction.TypeId =
DataContext.TransactionTypes.Single(x => x.type == TransactionType.Overdraft.ToString()).id;
It's pretty simple, but I like it.
A more sophisticated approach (not sure if this works with Linq to SQL, but more sophisticated ORMs support it (like EF, DO .NET, LLBLGen, etc.) is to use inheritance in your data model, with discriminators.
That is, have a subclass of TransactionType called OverdraftTransactionType with a discriminator (the key) that identifies different types of TransactionTypes from each other.
Random link:
http://weblogs.asp.net/zeeshanhirani/archive/2008/08/16/single-table-inheritance-in-entity-framework.aspx
The following code illustrates a pattern I sometimes see, whereby an object is transformed implicitly as it is passed as a parameter across a number of method calls.
var o = new MyReferenceType();
DoSomeWorkAndPossiblyModifyO(o);
DoYetMoreWorkAndPossiblyFurtherModifyO(o);
//now use o...
This feels wrong to me (it hardly feels object oriented). Is it acceptable?
Based on your method names, I would argue that there is nothing implicit in the transformation. This pattern would be acceptable. If, on the other hand your methods had names like printO(o) or compareTo(o), but actually modified the Object o, the design would be bad.
It is acceptable but usually bad style.
The usual "good" approach is:
DoSomeWorkAndModify(&o); // explicit reference means we will be accepting changes
o = DoSomeWorkAndReturnModified(o); // much more elastic because you often want to keep original.
The approach you presented makes sense when o is huge, and making a copy of it in memory is out of question, or if it's a function you (and nobody else = private) use very frequently and don't want to bother with the & syntax. Otherwise it's laziness that results in some really difficult to detect bugs.
It depends entirely on what the methods actually do, besides modifying that object.
For instance, an object primarily related to keeping some state in memory might for instance not have anything related to persisting that state anywhere.
The methods could for instance load data from a database, and update the object with that information.
However! Since I program mostly in C# and thus .NET, which is a wholly object-oriented language, I would actually write your code like this:
var o = new MyReferenceType();
SomeOtherClass.DoSomeWorkAndPossiblyModifyO(o);
SomeOtherClass.DoYetMoreWorkAndPossiblyFurtherModifyO(o);
//now use o...
In which case the actual name of that other class (or those other classes if there's 2 involved) would give me a big clue as to what is actually happening and/or the context.
Example:
Person p = new Person();
DatabaseContext.FetchAllLazilyLoadedProperties(p);
DatabaseContext.Save(p); // updates primary key property with new ID
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I'm working on a highly-specialized search engine for my database. When the user submits a search request, the engine splits the search terms into an array and loops through. Inside the loop, each search term is examined against several possible scenarios to determine what it could mean. When a search term matches a scenario, a WHERE condition is added to the SQL query. Some terms can have multiple meanings, and in those cases the engine builds a list of suggestions to help the user to narrow the results.
Aside: In case anyone is interested to know, ambigous terms are refined by prefixing them with a keyword. For example, 1954 could be a year or a serial number. The engine will suggest both of these scenarios to the user and modify the search term to either year:1954 or serial:1954.
Building the SQL query and the refine suggestions in the same loop feels somehow wrong to me, but to separate them would add more overhead because I would have to loop through the same array twice and test all the same scenarios twice. What is the better course of action?
I'd probably factor out the two actions into their own functions. Then you'd have
foreach (term in terms) {
doThing1();
doThing2();
}
which is nice and clean.
No. It's not bad. I would think looping twice would be more confusing.
Arguably some of the tasks might be put into functions if the tasks are decoupled enough from each other, however.
I don't think it makes sense to add multiple loops for the sake of theoretical purity, especially given that if you're going to add a loop against multiple scenarios you're going from an O(n) -> O(n*#scenarios). Another way to break this out without falling into the "God Method" trap would be to have a method that runs a single loop and returns an array of matches, and another that runs the search for each element in the match array.
Using the same loop seems as a valid optimization to me, try to keep the code of the two tasks independent so this optimization can be changed if necessary.
Your scenario fits the builder pattern and if each operation is fairly complex then it would serve you well to break things up a bit. This is waaaaaay over engineering if all your logic fits in 50 lines of code, but if you have dependencies to manage and complex logic, then you should be using a proven design pattern to achieve separation of concerns. It might look like this:
var relatedTermsBuilder = new RelatedTermsBuilder();
var whereClauseBuilder = new WhereClauseBuilder();
var compositeBuilder = new CompositeBuilder()
.Add(relatedTermsBuilder)
.Add(whereClauseBuilder);
var parser = new SearchTermParser(compositeBuilder);
parser.Execute("the search phrase");
string[] related = relatedTermsBuilder.Result;
string whereClause = whereClauseBuilder.Result;
The supporting objects would look like:
public interface ISearchTermBuilder {
void Build(string term);
}
public class SearchTermParser {
private readonly ISearchTermBuilder builder;
public SearchTermParser(ISearchTermBuilder builder) {
this.builder = builder;
}
public void Execute(string phrase) {
foreach (var term in Parse(phrase)) {
builder.Build(term);
}
}
private static IEnumerable<string> Parse(string phrase) {
throw new NotImplementedException();
}
}
I'd call it a code smell, but not a very bad one. I would separate out the functionality inside the loop, putting one of the things first, and then after a blank line and/or comment the other one.
I would look to it as if it were an instance of the observer pattern: each time you loop you raise an event, and as many observers as you want can subscribe to it. Of course it would be overkill to do it as the pattern but the similarities tell me that it is just fine to execute two or three or how many actions you want.
I don't think it's wrong to make two actions in one loop. I'd even suggest to make two methods that are called from inside the loop, like:
for (...) {
refineSuggestions(..)
buildQuery();
}
On the other hand, O(n) = O(2n)
So don't worry too much - it isn't such a performance sin.
You could certainly run two loops.
If a lot of this is business logic, you could also create some kind of data structure in the first loop, and then use that to generate the SQL, something like
search_objects = []
loop through term in terms
search_object = {}
search_object.string = term
// suggestion & rules code
search_object.suggestion = suggestion
search_object.rule = { 'contains', 'term' }
search_objects.push(search_object)
loop through search_object in search_objects
//generate SQL based on search_object.rule
This at least saves you from having to do if/then/elses in both loops, and I think it is a bit cleaner to move SQL code creation outside of the first loop.
If the things you're doing in the loop are related, then fine. It probably makes sense to code "the stuff for each iteration" and then wrap it in a loop, since that;s probably how you think of it in your head.
Add a comment and if it gets too long, look at splitting it or using simple utility methods.
I think one could argue that this may not exactly be language-agnostic; it's also highly dependent on what you're trying to accomplish. If you're putting multiple tasks in a loop in such a way that they cannot be easily parallelized by the compiler for a parallel environment, then it is definitely a code smell.
Has anyone come up with a good way of performing full text searches (FREETEXT() CONTAINS()) for any number of arbitrary keywords using standard LinqToSql query syntax?
I'd obviously like to avoid having to use a Stored Proc or have to generate a Dynamic SQL calls.
Obviously I could just pump the search string in on a parameter to a SPROC that uses FREETEXT() or CONTAINS(), but I was hoping to be more creative with the search and build up queries like:
"pepperoni pizza" and burger, not "apple pie".
Crazy I know - but wouldn't it be neat to be able to do this directly from LinqToSql? Any tips on how to achieve this would be much appreciated.
Update: I think I may be on to something here...
Also: I rolled back the change made to my question title because it actually changed the meaning of what I was asking. I know that full text search is not supported in LinqToSql - I would have asked that question if I wanted to know that. Instead - I have updated my title to appease the edit-happy-trigger-fingered masses.
I've manage to get around this by using a table valued function to encapsulate the full text search component, then referenced it within my LINQ expression maintaining the benefits of delayed execution:
string q = query.Query;
IQueryable<Story> stories = ActiveStories
.Join(tvf_SearchStories(q), o => o.StoryId, i => i.StoryId, (o,i) => o)
.Where (s => (query.CategoryIds.Contains(s.CategoryId)) &&
/* time frame filter */
(s.PostedOn >= (query.Start ?? SqlDateTime.MinValue.Value)) &&
(s.PostedOn <= (query.End ?? SqlDateTime.MaxValue.Value)));
Here 'tvf_SearchStories' is the table valued function that internally uses full text search
Unfortunately LINQ to SQL does not support Full Text Search.
There are a bunch of products out there that I think could: Lucene.NET, NHibernate Search comes to mind. LINQ for NHibernate combined with NHibernate Search would probably give that functionality, but both are still way deep in beta.