I am using LINQ to SQL for data validation in my app.
How can I be sure that when I query my data context, the query won't hit the database?
I want to access only the data that has already been loaded and validate against that.
Let's say that concurrency is not an issue here.
If you want to be 100% sure, you need to pull the data into memory, e.g. with a ToList() at the end of your query.
Dispose the original DataContext after that and you can be sure the entities in the List<> will never hit the database again (deferred loading will just throw an exception instead).
However, you will then no longer be querying the DataContext, so this is not a complete answer to your question. If you run a new query against the DataContext, as far as I know it will always hit the database; LINQ to SQL has no built-in query cache.
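A minimal sketch of that approach, with a hypothetical MyDataContext and Customer entity (the names are made up for illustration):

    // Load and materialize everything you need up front.
    List<Customer> customers;
    using (var db = new MyDataContext())
    {
        customers = db.Customers
                      .Where(c => c.IsActive)
                      .ToList();          // the query runs here, exactly once
    }                                     // context disposed: nothing below can hit the DB

    // Validation now runs purely in memory (LINQ to Objects).
    var missingEmail = customers.Where(c => string.IsNullOrEmpty(c.Email)).ToList();

If you touch an unloaded association on one of those entities after the Dispose, you get an exception rather than a silent trip to the database.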
I'm building a small marketplace in a web application and I have to implement the search function. I know I can use the MATCH function in MySQL or pull in a library (like Apache Lucene), but that's not what my question is about. I'm wondering how to manage the set of results I get back from the search (a servlet will handle this), because not all the results should be sent to the client at once, so I would like to split them across several pages. What is more efficient: should I run the search against the database for every page the client requests, or should I keep the result set in a managed bean and read from it whenever the client asks for a new page of results? Thanks.
The question you should be asking is: how many results can you hold in memory? If you have a small dataset, by all means load it all, but you will have to define what "small" means for your application. The benefit is that you call the database once and then filter and page through the results in memory, which is faster per page.
The alternative, for larger datasets, is to query the database on every page request. The drawback is that you hit the database on each call, so you will need an optimised search query that returns results in small chunks (the SQL LIMIT clause). If you only want to hit the database once and still filter "in memory" across requests, you will have to put a caching layer between your application and your database; the results are cached and you page through the cached result. Ideally the cache sits in a different JVM so it does not share your heap space.
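The question is about a Java servlet and MySQL, but the per-page query idea is the same in any stack. A rough sketch of the "query per page" option using a parameterized LIMIT/OFFSET query (shown here in plain ADO.NET/C#, with a made-up products table; requires System.Data and System.Collections.Generic):

    // Fetch exactly one page of search results from the database.
    // 'conn' is an already-open IDbConnection (e.g. a MySQL connection).
    public static List<string> SearchPage(IDbConnection conn, string term, int page, int pageSize)
    {
        var titles = new List<string>();
        using (IDbCommand cmd = conn.CreateCommand())
        {
            cmd.CommandText = "SELECT title FROM products WHERE title LIKE @term " +
                              "ORDER BY title LIMIT @size OFFSET @skip";
            AddParam(cmd, "@term", "%" + term + "%");
            AddParam(cmd, "@size", pageSize);
            AddParam(cmd, "@skip", page * pageSize);

            using (IDataReader reader = cmd.ExecuteReader())
                while (reader.Read())
                    titles.Add(reader.GetString(0));
        }
        return titles;
    }

    private static void AddParam(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }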
There is no silver bullet here. You can only answer this based on your non-functional requirements.
I hope this helps.
We're using Linq to SQL here. At present, our repositories tend to new up their own DataContext objects, so if you're working with several repositories, you're working with several DataContexts.
I'm doing some refactoring to allow dependency injection in our codebase. I have assumed that we would want a "one DataContext per request" pattern, so I'm injecting the same DataContext (unique to the web request) into all the repositories.
But then something happened today. Following my refactoring, we got a ForeignKeyReferenceAlreadyHasValueException because a foreign key field was set instead of the corresponding association property being set. So far as I can tell from Google, setting the foreign key directly is wrong in Linq to SQL (i.e. our code was wrong to do this), but we never got the error until after I had done the refactoring.
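For illustration, the pattern that triggers the exception looks roughly like this (Order/Customer are made-up entity names):

    // Throws ForeignKeyReferenceAlreadyHasValueException once the association
    // has already been loaded/assigned on the tracked entity:
    order.CustomerID = newCustomer.CustomerID;   // setting the FK column directly

    // Supported in LINQ to SQL: set the association property and let the
    // framework keep the FK column in sync.
    order.Customer = newCustomer;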
So I just wanted to check - is one DataContext per request definitely the right way to go?
One DataContext per request is one way to go, not the only one, but usually a good one.
By using a single DataContext you can save all submits to the end of the request and submit all changes at once. SubmitChanges automatically encapsulates all changes in a transaction.
If you use multiple contexts you need to encapsulate your request in a transaction instead to make it possible to rollback changes if the request fails halfway through. You get a little more overhead using multiple contexts but that is usually not significant.
I have worked with both single and multiple DataContexts in different applications and both work well. If it requires too much rewriting to move to a single DataContext, you can keep multiple contexts, provided you don't have any other strong reason for a rewrite.
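For what it's worth, a minimal sketch of the per-request pattern (the repository and entity names are made up; how the context gets scoped to the request depends on your DI container):

    // One DataContext per web request, injected into every repository.
    using (var db = new MyDataContext())
    {
        var orders    = new OrderRepository(db);     // both repositories share
        var customers = new CustomerRepository(db);  // the same context

        Customer customer = customers.GetById(42);
        orders.Add(new Order { Customer = customer });

        // A single SubmitChanges at the end of the request: LINQ to SQL wraps
        // all pending inserts/updates/deletes in one transaction.
        db.SubmitChanges();
    }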
If you are using TransactionScope, you will find that the multiple-DataContext approach creates a new DB connection for every repository that news up its own DataContext while the TransactionScope's transaction is pending. We ran into this when a transaction scope wrapped a request that did work with a lot of repository objects and we ran out of connections in the pool.
So if you plan on using TransactionScope to manage complex business processes in a transaction, definitely go with the shared DataContext.
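In other words, with a shared context the scope only ever enlists one connection. A sketch (again with made-up repository names, and System.Transactions referenced):

    using (var scope = new TransactionScope())
    using (var db = new MyDataContext())
    {
        // Every repository reuses the same DataContext, so only one connection
        // is enlisted in the ambient transaction, instead of one per repository.
        new OrderRepository(db).Add(new Order());
        new InvoiceRepository(db).Add(new Invoice());

        db.SubmitChanges();
        scope.Complete();
    }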
Problem scope: I want to use EF 4.1 without any trade-offs in the speed and reliability of the Enterprise Library Data Access Block that I know and trust.
Thanks to lots of Stack Overflow links and blogs about EF performance tuning, I'm posting this approach, one among many, for using EF 4.1 in a way that matches the performance of ADO.NET / Enterprise Library Data Access Block (SqlDataReader).
The Project:
1. No LINQ to Entities / dynamic SQL. I love LINQ, I just mostly use it against objects.
2. 100% stored procedures and no tracking, no merge, and most importantly, never call .SaveChanges(). I just call the insert/update/delete proc: DbContext.StoredProcName(params). At this point we have eliminated several of the rapid-dev elements of EF, but the way it auto-creates a complex type for your stored proc is enough for me.
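For reference, the stored-procedure-only, no-tracking style looks roughly like this on a plain DbContext (the proc, parameter and type names are hypothetical; with EDMX function imports you would call the generated methods instead):

    using (var db = new MyDbContext())
    {
        // Read: materialize the proc's result set straight into a POCO/complex
        // type. SqlQuery results are not attached to the context, so there is
        // no change-tracking cost and no snapshot.
        List<CustomerDto> rows = db.Database
            .SqlQuery<CustomerDto>("EXEC dbo.GetCustomersByRegion @region",
                                   new SqlParameter("@region", "West"))
            .ToList();

        // Write: fire the proc directly; SaveChanges is never involved.
        db.Database.ExecuteSqlCommand(
            "EXEC dbo.UpdateCustomerEmail @id, @email",
            new SqlParameter("@id", 42),
            new SqlParameter("@email", "someone@example.com"));
    }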
The GetString and similar methods are part of an AbstractMapper that just walks the expected types and casts the DataReader values into them.
So this is the mark to beat as far as I'm concerned. It would be hard to adopt something I know to be slower.
[Timing results omitted: one EF configuration came out a lot slower than that baseline; the other was more like it.]
[Performance pie chart from a blog post on EF performance; image not reproduced here.]
Based on my results, that performance pie should put the change-tracking overhead at a lot more than 1%.
I tried pre-compiling the views and nothing gave as big a boost as turning off tracking. Why? Maybe somebody can chime in on that.
So, this part is not really a fair comparison to Enterprise Library, but I make one untimed call to the database to load the metadata, which as I understand it is loaded once per IIS app pool - essentially once in the life of your app.
I'm using EF this way with auto-generated stored procedures, and I used LINQ over the EDMX (XML) to auto-import all the function nodes and map them to the entities. Then I auto-generate a repository for each entity and an engine.
Since I never call SaveChanges, I don't bother taking the time to map to stored procs in the designer. It takes too long and it is way too easy to break it without knowing. So I just call the procs from the context.
Before I actually implement this in my new mission critical medical equipment delivery web application I would appreciate any observations and critiques.
Thanks!
Just a few remarks:
"Performance Pie - Based on my results that performance pie should increase the tracking overhead by a lot more than 1%. I tried pre-compiling the views and nothing got as big of a boost as no tracking! Why??"
The blog post is from 2008 and is therefore based on EF version 1 and on EntityObject-derived entities. With EF 4.1 you are using POCOs, and change tracking behaves very differently with POCOs. In particular, when a POCO object is loaded from the database into the object context, Entity Framework creates a snapshot of the original property values and stores it in the context. Change tracking relies on comparing the current entity values with the original snapshot values. Creating this snapshot is apparently expensive in terms of both performance and memory consumption. My observation is that it costs at least 50 percent (the query time without change tracking is half the query time with change tracking). You seem to have measured an even bigger impact.
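If you want to see that snapshot cost yourself, a quick and admittedly unscientific comparison on any DbContext looks something like this (the context and entity set are hypothetical):

    var sw = Stopwatch.StartNew();
    using (var db = new MyDbContext())
    {
        var tracked = db.Customers.ToList();                  // snapshot per entity
    }
    Console.WriteLine("With tracking:    {0} ms", sw.ElapsedMilliseconds);

    sw.Restart();
    using (var db = new MyDbContext())
    {
        var untracked = db.Customers.AsNoTracking().ToList(); // no snapshots
    }
    Console.WriteLine("Without tracking: {0} ms", sw.ElapsedMilliseconds);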
"The Project: 1. No linq to Entities/ dynamic sql. I love linq, I just try to use it against objects mostly. 2. 100% stored procedures and no tracking, no merge, and most importantly, never call .SaveChanges(). I just call the insert/ update/ delete proc DbContext.StoredProcName(params). At this point we have eliminated several of the rapid dev elements of EF but the way it auto creates a complex type for your stored proc is enough for me."
To me this looks like you are ignoring some of the main features that are the reason Entity Framework exists, and it is questionable whether you want to use EF for your purpose at all. If your main goal is a tool that helps materialize query results into complex objects, you could take a look at Dapper, which focuses on exactly that task with high performance in mind. (Dapper is the ORM used here at Stack Overflow.)
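A taste of what that looks like with Dapper (the connection string, proc and DTO are made up):

    using (var conn = new SqlConnection(connectionString))
    {
        // Dapper opens the connection if needed and maps each row of the
        // proc's result set onto CustomerDto by column name.
        List<CustomerDto> customers = conn.Query<CustomerDto>(
            "dbo.GetCustomersByRegion",
            new { region = "West" },
            commandType: CommandType.StoredProcedure).ToList();
    }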
A few days ago there was a question here with great answers about EF performance. It has since been migrated to "Programmers":
https://softwareengineering.stackexchange.com/questions/117357/is-entity-framework-suitable-for-high-traffic-websites
I have to get records from a table (about 40 columns), process each record, call a web service with the record, wait for its response, and update the record in the database.
Now, I see 2 options.
1. Linq to Sql
2. ADO.Net with Typed Dataset
(I'm leaving out the DataReader option because of all the extra work it would take.)
Option 2 closes the connection soon after fetching the data; I can process the data offline and submit the changes later, i.e. I don't have to keep the connection open for long. With option 1, in order to be able to submit changes at the end, I have to keep the connection open the whole time.
Do you think option 2 is always the best way whenever changes need to be submitted after a longer period of processing, or am I missing something?
Neither LINQ to SQL nor Entity Framework keeps the connection open while you are working with the fetched objects. If you want to take advantage of the change tracking capabilities of the various contexts, you need to keep the context object in scope, but that doesn't mean the connection to the database remains open during that period. In actuality, the connection is only open while you are iterating over the results (data binding) and when you call SubmitChanges/SaveChanges. Otherwise the connection is closed.
These technologies use ADO.Net DataReaders and command objects under the covers. There's still no concept of open cursors like you had back in the VB6 days.
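So the LINQ to SQL flow for your scenario could look like the following sketch (the context and entity names, and the CallWebService helper, are hypothetical):

    using (var db = new MyDataContext())
    {
        // The connection is open only while this query is enumerated by ToList().
        List<Order> records = db.Orders.Where(o => o.NeedsProcessing).ToList();

        foreach (var order in records)
        {
            // Long-running work / web service call: no DB connection is held here.
            order.Status = CallWebService(order);
        }

        // The connection is opened again only for the duration of SubmitChanges,
        // which sends all pending updates in a single transaction.
        db.SubmitChanges();
    }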
L2S and Entity Framework introduce a bit of overhead, but honestly the time you save is well worth it, and whenever you run a LINQ query against your objects the SQL is generated for you.
Plain old ADO.Net is old school - Linq is the way to go now.
"Plain old ADO.Net is old school - Linq is the way to go now."
Microsoft caters to newbies and markets its tooling to the masses as a "hobbyist" platform.
LINQ lets developers who have no business accessing a data source query one anyway.
Having deep knowledge, and brains enough to write your own optimized T-SQL, is a desirable skill, not something to laugh off as "old school".
As a software engineer, a person should move in the direction of writing software that performs as efficiently and as fast as possible. The whole "saving development time" argument for choosing a higher-level construct, because it is so simple to implement, speaks to how badly the program will perform when stacked up next to a senior engineer's version.
Don't get sold on the latest and greatest offerings that make coding "faster". If you are smart enough to do it yourself, then do it yourself and avoid having LINQ write your T-SQL for you.
There is a measure of scale in the business world where microseconds count, and being a valuable commodity to a company will put your deep knowledge to the test when your software needs to run as fast as you know how to make it.
In short, it is never out of style to have deep knowledge.
"An attempt has been made to Attach or Add an entity that is not new, perhaps having been loaded from another DataContext. This is not supported."
Having worked with LINQ to SQL for some time, I believe I know its limitations and that I follow the rules when I write new code. But it is frustrating to get this exception, as there is no indication of which object caused the violation. In complex data-manipulation scenarios with multiple DataContexts, I can only think of trial and error to narrow down the possible culprits. Is there a way to find out more?
If you attach an object to your DataContext, it must have been created by you using the new operator. If you read the object from a DataContext, you must save it to the same DataContext.
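For example (hypothetical Customer entity and context):

    Customer c;
    using (var db1 = new MyDataContext())
        c = db1.Customers.First();            // loaded by (and tracked in) db1

    using (var db2 = new MyDataContext())
        db2.Customers.Attach(c);              // typically throws the exception above

    // Fine: read, modify and save within the same context.
    using (var db3 = new MyDataContext())
    {
        var cust = db3.Customers.First();
        cust.Name = "Changed";
        db3.SubmitChanges();
    }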
To alleviate these kinds of problems, I always use a single DataContext, and I do everything on a "unit of work" basis.
Generally speaking, that means that, at any given time, I am reading the records I need, performing work on those records, and saving changes, all in a single unit of work, using a single DataContext. Since it is a unit of work, it doesn't bleed over into other DataContext objects.
If Linq to SQL is fighting you on this, I would examine your architecture and see if the way you are doing it is optimal, especially if you are finding it difficult to identify the object causing the error. In general it is difficult to share objects between DataContexts. You can do it (using things like "attach" and "detach"), but it's a pain in the ass.