Say I use LINQ to SQL to interact with the database from C#. What challenges might I face in terms of architecture, performance, type safety, object orientation, etc.?
Basically, LINQ to SQL generates a class for each table in your database, complete with relation properties and all, so you will have no problems with type safety. The use of C# partial classes allows you to add functionality to these objects without messing around with LINQ to SQL's autogenerated code. It works pretty well.
As tables map directly to classes and objects, you will either have to accept that your domain layer mirrors the database design directly, or you will have to build some form of abstraction layer above LINQ to SQL. The direct mirroring of tables can be especially troublesome with many-to-many relations, which are not directly supported - instead of Order.Products you get Order.OrderDetails.Select(od => od.Product).
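If you do want an Order.Products-style property, the partial class mechanism mentioned above can paper over the gap. A minimal sketch, assuming designer-generated Order, OrderDetail and Product entities (the Products property is my own addition, not generated code):

using System.Collections.Generic;
using System.Linq;

public partial class Order
{
    // Convenience view over the junction table; read-only navigation.
    public IEnumerable<Product> Products
    {
        get { return this.OrderDetails.Select(od => od.Product); }
    }
}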
Unlike most other ORMs, LINQ to SQL does not just dispense objects from the database and allow you to store or update objects by passing them back into the ORM. Instead, LINQ to SQL tracks the state of objects loaded from the database and allows you to change that saved state. It is difficult to explain and strange to get used to - I recommend you read some of Rick Strahl's blog posts on the subject.
Performance-wise, LINQ to SQL does pretty well. In benchmarking tests it shows speeds of about 90-95% of what a native SQL data reader would provide, and in my experience real-world usage is also pretty fast. Like all ORMs, LINQ to SQL is affected by the N+1 selects problem, but it provides good ways to specify lazy/eager loading depending on context.
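For the eager-loading side of that, here is a minimal sketch using DataLoadOptions (from System.Data.Linq), assuming a designer-generated DataContext dc with a Customer.Orders relation:

// Tell the context to fetch Orders in the same roundtrip as Customers.
var options = new DataLoadOptions();
options.LoadWith<Customer>(c => c.Orders);
dc.LoadOptions = options; // must be assigned before the first query executes

var customers = dc.Customers.Where(c => c.City == "London").ToList();
// Iterating customers[i].Orders now causes no additional queries.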
Also, by choosing LINQ to SQL you choose MSSQL - third-party providers for other databases do exist, but last time I checked, none of them appeared very complete.
All in all, LINQ to SQL is a good and fairly easy-to-learn ORM that performs well. If you need features beyond what LINQ to SQL offers, take a look at the newer Entity Framework - it has more features, but is also more complex.
We've had a few challenges, mainly from opening the query construction capability to programmers that don't understand how databases work. Here are a few smells:
// Bad scaling:
// query in a loop - causes n roundtrips
// when c roundtrips could have been performed.
List<OrderDetail> od = new List<OrderDetail>();
foreach (Customer cust in customers)
{
    foreach (Order o in cust.Orders)
    {
        od.AddRange(dc.OrderDetails.Where(x => x.OrderId == o.OrderId));
    }
}
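For contrast, a hedged sketch of fetching the same details in a single roundtrip (same hypothetical dc, customers and OrderDetail names as above); Contains on a local list translates to a SQL IN clause in LINQ to SQL:

// Collect the keys locally, then issue one query instead of one per order.
var orderIds = customers
    .SelectMany(c => c.Orders)
    .Select(o => o.OrderId)
    .ToList();

List<OrderDetail> od = dc.OrderDetails
    .Where(x => orderIds.Contains(x.OrderId))
    .ToList();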
// No separation of operations intended for execution in the database
// from operations intended to be executed locally.
var query =
    from c in dc.Customers
    where c.City.StartsWith(textBox1.Text)
    where DateTime.Parse(textBox2.Text) <= c.SignUpDate
    from o in c.Orders
    where o.OrderCode == (OrderCodes)Enum.Parse(typeof(OrderCodes), "Complete")
    select o;
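A hedged sketch of the same query with the local work pulled out up front, so it is obvious which parts actually run in the database (same hypothetical names as above):

// Evaluate local operations once, outside the query...
DateTime signUpCutoff = DateTime.Parse(textBox2.Text);
OrderCodes completeCode = (OrderCodes)Enum.Parse(typeof(OrderCodes), "Complete");

// ...so the query itself contains only what is translated to SQL.
var query =
    from c in dc.Customers
    where c.City.StartsWith(textBox1.Text)
    where signUpCutoff <= c.SignUpDate
    from o in c.Orders
    where o.OrderCode == completeCode
    select o;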
// Not understanding when results are pulled into memory -
// the first ToList() below causes a full table load.
List<Item> result = dc.Items.ToList().Skip(100).Take(20).ToList();
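The hedged fix is to keep the paging inside the query, so Skip/Take are translated into SQL and only the 20 rows come back (Id is a hypothetical key used for the explicit ordering that SQL paging needs):

// Paging happens in the database; only 20 rows are materialized.
List<Item> result = dc.Items
    .OrderBy(i => i.Id)
    .Skip(100)
    .Take(20)
    .ToList();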
Another problem is that one more level of separation from the table structures means indexes are even easier to ignore (that's a problem with any ORM though).
I have an application that allows users to filter applicants based on a very large set of criteria. Each criterion is represented by a boolean column, and the columns span multiple tables in the database. Instead of using Active Record models I thought it was best to use raw SQL and put the bulk of the work in the database. In order to do this I have to construct a rather complex SQL query based on the criteria the users selected and then run it through AR against the db. Is there a better way to do this? I want to maximize performance while also keeping the code maintainable and non-brittle. Any help would be greatly appreciated.
As @hazzit said, it is difficult to answer without more details, but here are my two cents on this. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search/filtering features, I often find using raw SQL overkill and not very maintainable.
The key question here is: can you break down your problem into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, ->{ where something: true }
scope :another_scope, ->( option ){ where an_option: option }
scope :using_arel, ->{ joins(:assoc).where Assoc.arel_table[:some_field].not_eq "foo" }
# cue a bunch of scopes
def self.search( options = {} )
  relation = all
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope( options[:another_option] ) unless options[:flag]
  # add logic as you need it
  relation
end
The beauty of this solution is that you declare a clean interface into which you can directly pour all the params from your checkboxes and fields, and that returns a relation. Breaking the query into multiple, reusable scopes helps keep the thing readable and maintainable; using a search class method ties it all together and allows thorough documentation... And all in all, using Arel helps secure the app against injections.
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
If this method is not suitable to your needs, there's another option: use a full-fledged search/filtering solution like Sunspot. This uses another store, separate from your db, that indexes defined parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially when it comes to high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system will have to retrieve all of the unfiltered data from the database, which will cause tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never actually be retrieved from disk, won't be handed over to RoR and won't then need to be filtered. Indexes exist for the sole purpose of avoiding expensive operations in order to speed things up. (Yes, they also help maintain data integrity.)
To make this work, however, you may need to help the database a bit to do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
That said, there actually are a few types of queries that a given database is not good at doing. Those are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among those.
I've recently gone through the process of revamping my database, normalising a lot of entities. Obviously I now have a few more tables than I had before. A lot of the data I use on the website is read-only, so this is simple to denormalise using a view, but there are entities that could benefit from denormalised retrieval and still need to be updated.
Here's an example.
A User may be a Member
A Member may have a Profile
A Member may have an Account
In addition I have 3 further lookup tables.
In total there are 3 tables for User and 4 tables for Member.
Ideally, I can create 2 views from the above tables.
However, User needs to be updated, as do the entities belonging to Member. Additionally there are 6 separate tables associated with Users/Members, e.g. FavouriteCategories, that also need to be retrieved and updated from time to time.
I'm struggling to come up with the best, most efficient way of doing this.
I could simply not use views and bring all the entities and lookups into the model, but then I would be reliant on EF to produce the retrieval queries. The material I've read suggests that EF is not at its best when dealing with joined data.
I could add both the view and tables, using the tables for updates only. This seems sloppy due to the duplication, complication of the model, as well as underutilising the EF model functionality.
Maybe I could use the read-only view for data retrieval and create stored procs for the updates. I believe that the process of using EF with stored procs is a bit of a hack, so I'd probably keep the stored procs distinct from EF and simply pass params and call the SPs via traditional methods. This again seems like a bit of a halfway house.
I'm not that experienced with .NET or EF, so I would appreciate some solid advice on either the methods I've referred to above or any better technique to achieve this. I don't want to go hacking the edmx file at this stage because... well, it's just wrong.
I have a few entities that would benefit from the right solution. The User example is amongst the simplest, so there's a lot to gain from the right approach.
Help and advice would be very much appreciated.
Do you want to use EF? If yes, use either the first approach - not using views at all and allowing EF to handle everything - or the last approach - using views and mapping stored procedures for insert, update and delete operations.
Combining mapped views for reading and mapped tables for modifications is possible as well but it is mostly the first solution (allowing EF to handle everything) with additional views for some query optimization.
You will not find cleaner approaches. The approaches mentioned are valid solutions to your problem. The only question is whether you want to write the SQL yourself (views and stored procedures) or let EF do it.
The worst approach is using EF for querying and manually calling stored procedures for updating, but even that can be useful in some cases.
I am writing an aspx application that will host 1000's of small customers in a communal SQL Server database. All entities will be created and loaded via Linq-To-Sql.
Surrogate keys (identity columns) will be used throughout the schema for all table relationships and so starting with a root customer object I should be able to navigate to exclusive sets of data for a particular customer using regular Linq queries (SQL joins).
However from a security standpoint the above is a bit fragile so I wish to add an extra layer of tenancy check as a security backstop. All entities in my entity model will have a non-indexed int TenantId field.
I am looking for critical comments about this solution from a performance perspective.
public partial class MyLinqEntity
{
    partial void OnLoaded() // linq-to-sql extensibility function
    {
        if (this.TenantId != (int)HttpContext.Current.Session["tenantId"])
            throw new ApplicationException("Logic error, LINQ query crossed tenantId data boundary");
    }

    partial void OnCreated() // linq-to-sql extensibility function
    {
        this.TenantId = (int)HttpContext.Current.Session["tenantId"];
    }
}
Sorry, these are mostly random thoughts…
I do not like all the objects depending on HttpContext.
Also, I don't know if looking in the session for each object is fast enough. I think you will be OK on speed, as a database lookup will normally be a lot slower than anything you do in-process.
I would tend to use a dependency injection framework to auto-create an object with session scope to do the check. However, if you don't need a dependency injection framework elsewhere, this will be overkill.
As all your database rows will have a TenantId column, I hope you can move the check into a LINQ to SQL "row read callback" so you don't have to put it into each object. I don't know LINQ to SQL well, but I expect you could hook into its query creation framework and add a "where TenantId = xx" to all database queries.
Having a "where TenantId = xx" in all queries will also let you partition the database by customer if needed, making things like table scans cheaper.
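To make that idea concrete, here is a hedged sketch of one way to centralise the filter without touching every entity: extend the designer-generated DataContext. MyDataContext, CurrentTenantId and the Tenant* properties are illustrative names of my own, not LINQ to SQL features:

using System.Linq;

public partial class MyDataContext
{
    public int CurrentTenantId { get; set; }

    // Filtered views of the tables; the predicate becomes part of the generated SQL.
    public IQueryable<Order> TenantOrders
    {
        get { return this.Orders.Where(o => o.TenantId == CurrentTenantId); }
    }

    public IQueryable<Customer> TenantCustomers
    {
        get { return this.Customers.Where(c => c.TenantId == CurrentTenantId); }
    }
}

// Usage sketch: query the filtered properties instead of the raw tables,
// e.g. var recent = dc.TenantOrders.Where(o => o.OrderDate > cutoff);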
If we abstract out the DataContext, then are L2S and L2O queries identical?
I already have a working prototype which demonstrates this, but it is very simple, and I wonder if it will hold up to more advanced querying.
Does anyone know?
No, they're not the same.
LINQ to Objects queries operate on IEnumerable<T> collections. The query iterates through the collection and executes a sequence of methods (for example, Contains, Where etc) against the items in the collection.
LINQ to SQL queries operate on IQueryable<T> collections. The query is converted into an expression tree by the compiler and that expression tree is then translated into SQL and passed to the database.
It's quite commonplace for LINQ to SQL to complain that a method can't be translated into SQL, even though that method works perfectly in a LINQ to Objects query. (In other cases, you may not see an exception, but the query results might be subtly different between LINQ to Objects and LINQ to SQL.)
For example, LINQ to SQL will choke on this simple query, whereas LINQ to Objects will be fine:
var query = from n in names
            orderby n.LastName.TrimStart(',', ' ').ToUpper(),
                    n.FirstName.TrimStart(',', ' ').ToUpper()
            select new { n.FirstName, n.LastName };
(It's often possible to workaround these limitations, but the fact that you can't guarantee that any arbitrary LINQ to Objects query will work as a LINQ to SQL query tells me that they're not the same!)
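One common workaround pattern, sketched here against a hypothetical dc.Names table rather than the in-memory names collection above: let SQL do what it can translate, then switch to LINQ to Objects with AsEnumerable() for the rest.

var query = dc.Names
    .Where(n => n.LastName != null)                         // runs in the database
    .AsEnumerable()                                         // from here on, in memory
    .OrderBy(n => n.LastName.TrimStart(',', ' ').ToUpper())
    .ThenBy(n => n.FirstName.TrimStart(',', ' ').ToUpper())
    .Select(n => new { n.FirstName, n.LastName });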
Frustratingly, all IQueryable<T> implementations are, essentially, leaky abstractions - and it is not safe to assume that something that works in LINQ-to-Objects will still work under any other provider. Apart from the obvious function mappings, things like:
LINQ-to-SQL can't possibly support all functions / overloads - those it does support are listed here: Data Types and Functions (LINQ to SQL)
plus it depends on the actual database server; Skip/Take etc work differently on SQL Server 2000 than 2005+, and not every such translation works on SQL Server 2000
EF doesn't support Single or Expression.Invoke (sub-expression invocation), or UDF usage
Astoria supports different use of Single/First; as I recall it supports Where(pred).Single() - but not Single(pred) (which is the preferred usage for LINQ-to-SQL)
So you can't really use IEnumerable<T> for your unit tests simulating a database, even via AsQueryable() - it simply isn't robust. Personally, I keep IQueryable<T> and Expression away from the repository interface for this reason - see Pragmatic LINQ.
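For illustration only (the interface and method names are mine, not from Pragmatic LINQ), the kind of repository boundary that keeps IQueryable<T> and Expression out of the contract looks something like this:

using System.Collections.Generic;

public interface ICustomerRepository
{
    // Explicit, intention-revealing methods; each implementation decides
    // how (and against which provider) the query is actually executed.
    IList<Customer> GetCustomersInCity(string city);
    Customer GetCustomerByName(string name);
}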
The query syntax is the same. If you use AsQueryable(), even the types are the same. But there are some differences:
some queries will only work on L2O and will result in a runtime error in L2S (e.g. if an expression tree contains a function that cannot be converted to SQL; this cannot be detected at compile time)
some queries return different results on L2S and L2O (example: Max([empty sequence]) will throw an exception in L2O but return null in L2S)
So in the end, you will have to test against a database to be sure, but I think L2O is pretty good for simple, fast unit-tests.
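As a rough sketch of that unit-test substitution (Customer is assumed here to be a plain class with a settable Name property), AsQueryable() over an in-memory list gives you the same query shape and types:

using System.Collections.Generic;
using System.Linq;

var fakeCustomers = new List<Customer>
{
    new Customer { Name = "Alice" },
    new Customer { Name = "Bob" }
}.AsQueryable();

var result = fakeCustomers.Where(c => c.Name.StartsWith("A")).ToList();
// Passing here does not guarantee the same expression will translate
// or behave identically against a real L2S provider, as noted above.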
In a controversial blog post today, Hackification pontificates on what appears to be a bug in the new LINQ To Entities framework:
Suppose I search for a customer:
var alice = data.Customers.First( c => c.Name == "Alice" );
Fine, that works nicely. Now let's see if I can find one of her orders:
var order = ( from o in alice.Orders
              where o.Item == "Item_Name"
              select o ).FirstOrDefault();
LINQ-to-SQL will find the child row. LINQ-to-Entities will silently return nothing.
Now let's suppose I iterate through all orders in the database:
foreach (var order in data.Orders)
{
    Console.WriteLine("Order: " + order.Item);
}
And now repeat my search:
var order = ( from o in alice.Orders
              where o.Item == "Item_Name"
              select o ).FirstOrDefault();
Wow! LINQ-to-Entities is suddenly telling me the child object exists, despite telling me earlier that it didn't!
My initial reaction was that this had to be a bug, but after further consideration (and backed up by the ADO.NET Team), I realized that this behavior was caused by the Entity Framework not lazy loading the Orders subquery when Alice is pulled from the datacontext.
This is because order is a LINQ-To-Object query:
var order = ( from o in alice.Orders
              where o.Item == "Item_Name"
              select o ).FirstOrDefault();
And is not accessing the datacontext in any way, while his foreach loop:
foreach( var order in data.Orders )
Is accessing the datacontext.
LINQ-to-SQL actually creates lazy-loaded properties for Orders, so that when accessed they perform another query; LINQ to Entities leaves it up to you to retrieve related data manually.
Now, I'm not a big fan of ORMs, and this is precisely the reason. I've found that in order to have all the data you want ready at your fingertips, they repeatedly execute queries behind your back; for example, that LINQ-to-SQL query above might run an additional query per row of Customers to get Orders.
However, the EF not doing this seems to majorly violate the principle of least surprise. While it is a technically correct way to do things (you should run a second query to retrieve orders, or retrieve everything from a view), it does not behave as you would expect from an ORM.
So, is this good framework design? Or is Microsoft over thinking this for us?
Jon,
I've been playing with LINQ to Entities also. It's got a long way to go before it catches up with LINQ to SQL. I've had to use LINQ to Entities for the Table-per-Type inheritance stuff. I found a good article recently which explains the whole "one company, two different ORM technologies" thing here.
However you can do lazy loading, in a way, by doing this:
// Lazy Load Orders
var alice2 = data.Customers.First(c => c.Name == "Alice");
// Should Load the Orders
if (!alice2.Orders.IsLoaded)
alice2.Orders.Load();
or you could just include the Orders in the original query:
// Include Orders in original query
var alice = data.Customers.Include("Orders").First(c => c.Name == "Alice");
// Should already be loaded
if (!alice.Orders.IsLoaded)
alice.Orders.Load();
Hope it helps.
Dave
So, is this good framework design? Or is Microsoft over thinking this for us?
Well, let's analyse that - all the thinking that Microsoft does so we don't have to arguably makes us lazier programmers. But in general, it does make us more productive (for the most part). So are they overthinking, or are they just thinking for us?
If LINQ-to-Sql and LINQ-to-Entities came from two different companies, it would be an acceptable difference - there's no law stating that all LINQ-To-Whatevers have to be implemented the same way.
However, they both come from Microsoft - and we shouldn't need intimate knowledge of their internal development teams and processes to know how to use two different things that, on their face, look exactly the same.
ORMs have their place, and do indeed fill a gap for people trying to get things done, but the ORM user must know exactly how their ORM gets things done - treating it as an impenetrable black box will only lead you to trouble.
Having lost a few days to this very problem, I sympathize.
The "fault," if there is one, is that there's a reasonable tendency to expect that a layer of abstraction is going to insulate you from these kinds of problems. Going from LINQ, to Entities, to the database layer, doubly so.
Having to switch from MS-SQL (using LinqToSQL) to MySQL (using LinqToEntities), for instance, one would figure that the LINQ, at least, would be the same, if only to save the cost of having to rewrite program logic.
Having to litter code with .Load() and/or LINQ with .Include() simply because the persistence mechanism under the hood changed seems slightly disturbing, especially with a silent failure. The LINQ layer ought to at least behave consistently.
A number of ORM frameworks use a proxy object to dynamically load the lazy object transparently, rather than just return null, though I would have been happy with a collection-not-loaded exception.
I tend not to buy into the they-did-it-deliberately-for-your-benefit excuse; other ORM frameworks let you annotate whether you want eager or lazy-loading as needed. The same could be done here.
I don't know much about ORMs, but as a user of LinqToSql and LinqToEntities I would hope that when you try to query Orders for Alice it does the extra query for you when you make the linq query (as opposed to not querying anything or querying everything for every row).
It seems natural to expect
from o in alice.Orders where o.Item == "Item_Name" select o
to work given that's one of the reasons people use ORM's in the first place (to simplify data access).
The more I read about LinqToEntities, the more I think LinqToSql fulfills most developers' needs adequately. I usually just need a one-to-one mapping of tables.
Even though you shouldn't have to know about Microsoft's internal development teams and processes, fact of the matter is that these two technologies are two completely different beasts.
The design decision for LINQ to SQL was, for simplicity's sake, to implicitly lazy-load collections. The ADO.NET Entity Framework team didn't want to execute queries without the user knowing so they designed the API to be explicitly-loaded for the first release.
LINQ to SQL has been handed over to the ADO.NET team, so you may see a consolidation of APIs in the future, or LINQ to SQL getting folded into the Entity Framework, or you may see LINQ to SQL atrophy from neglect and eventually become deprecated.