Multi-tenancy backstop check on LINQ-to-SQL entity load() - linq-to-sql

I am writing an aspx application that will host thousands of small customers in a shared SQL Server database. All entities will be created and loaded via LINQ-to-SQL.
Surrogate keys (identity columns) will be used throughout the schema for all table relationships, so starting from a root customer object I should be able to navigate to an exclusive set of data for a particular customer using regular LINQ queries (SQL joins).
However, from a security standpoint the above is a bit fragile, so I wish to add an extra layer of tenancy checking as a security backstop. All entities in my entity model will have a non-indexed int TenantId field.
I am looking for critical comments about this solution from a performance perspective.
public partial class MyLinqEntity
{
    partial void OnLoaded() // LINQ-to-SQL extensibility method
    {
        // Backstop: verify the loaded row belongs to the current tenant.
        if (this.TenantId != (int)HttpContext.Current.Session["tenantId"])
            throw new ApplicationException("Logic error, LINQ query crossed tenantId data boundary");
    }

    partial void OnCreated() // LINQ-to-SQL extensibility method
    {
        // Stamp new entities with the current tenant.
        this.TenantId = (int)HttpContext.Current.Session["tenantId"];
    }
}

Sorry, this is mostly random thoughts…
I do not like all the objects depending on HttpContext.
Also I don't know if looking in the session for each object is fast enough. I think you will be OK on speed, as a database lookup will normally be a lot slower than anything you do in process.
I would tend to use a dependency injection framework to auto-create an object with session scope to do the check. However, if you don't have a need for a dependency injection framework elsewhere, this will be overkill.
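A minimal sketch of that idea (the interface and class names here are invented for illustration): entities depend on a small tenant abstraction instead of HttpContext, and the session-backed implementation is registered with session scope in whichever DI container you pick.

using System.Web;

public interface ITenantContext
{
    int TenantId { get; }
}

// Session-backed implementation; only this class knows about HttpContext,
// so the entities themselves stay free of the web dependency.
public class SessionTenantContext : ITenantContext
{
    public int TenantId
    {
        get { return (int)HttpContext.Current.Session["tenantId"]; }
    }
}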
As all your database rows will have a tenantId column, I hope you can move the check into a LINQ-to-SQL "row read callback", so you don't have to put it into each object. I don't know LINQ-to-SQL, however I expect you could hook into its query creation framework and add a "where tenantId = xx" to all database queries.
Having a "where tenantId = xx" in all queries will also let you partition the database by customer if needed, making "table scans" etc. cheaper.
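A minimal sketch of that central-filter idea (the repository class is hypothetical; it builds the predicate as an expression tree so LINQ-to-SQL can translate it for any mapped class that has a TenantId column):

using System;
using System.Data.Linq;
using System.Linq;
using System.Linq.Expressions;

public class TenantScopedRepository
{
    private readonly DataContext _dc;
    private readonly int _tenantId;

    public TenantScopedRepository(DataContext dc, int tenantId)
    {
        _dc = dc;
        _tenantId = tenantId;
    }

    // Every read goes through here, so "where TenantId == xx" is appended
    // exactly once instead of being repeated in each caller.
    public IQueryable<T> Query<T>() where T : class
    {
        var e = Expression.Parameter(typeof(T), "e");
        var predicate = Expression.Lambda<Func<T, bool>>(
            Expression.Equal(
                Expression.Property(e, "TenantId"),
                Expression.Constant(_tenantId)),
            e);
        return _dc.GetTable<T>().Where(predicate);
    }
}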

Group chats and private chats, separate table or single table with type attribute?

Currently I have a table users, a table chats, however I want there to be "Group chats" and "Private chats (dm)".
A group chat needs more data than a private chat, for example: group name, picture, ....
What is the best way to approach this?
Do I make one table chats and put a type attribute in there that determines whether it is private, leaving some columns blank for private chats? Or would I make two tables, one for private chats and one for group chats?
This is a similar scenario to the general question "should you split sensitive columns into a new table", and the general answer is the same: it is going to depend largely on your data access code and your security framework.
What about a third option: why not just model a Private Chat as a Group Chat that only has 2 members in the group? Sometimes splitting the model into these types is a premature optimisation, especially in the context of a chat-style application. For instance, couldn't a private chat benefit from having an image in the same way that a group chat does? Could there not be some benefit to users being able to give their own private group a name?
You will find the whole development and management of your application a lot simpler if there is just one type of chat, and it is up to the user to decide how many people can join, or indeed whether other people can join the chat at all.
If you still want to explore the two conceptual types, here is an answer that might give you some indirect insights: https://stackoverflow.com/a/74398184/1690217 but ultimately we need additional information to justify selecting one structure over the other. Performance, security and general data governance are some considerations that have implications or impose caveats on implementation.
From a structural point of view, your Group Chats and Private Chats can both be implementations of a common Chat table; conceptually we could say that both forms inherit from Chat.
In relational databases we have 3 general options to model inheritance:
Table Per Hierarchy (TPH)
Use a single table with a discriminator column that determines for each row what the specific type is. Then in your application layer or via views you can query the specific fields that each type and scenario needs.
In TPH the base type is usually an abstract type definition.
Table Per Type (TPT)
The base type and each concrete type exists as their own separate tables. The FK from the inheriting tables is the PK and shares the same PK value as the corresponding record in the base table, creating a 1:0-1 relationship. This requires some slightly more complicated data access logic but it makes it harder to accidentally retrieve a Private Chat in a Group Chat context because the data needs to be queried explicitly from the correct table.
In TPT the base type is itself a concrete type and data records do not have to inherit into the extended types at all.
Simple Isolated Tables (No inheritance in the schema)
This is often the simplest approach: if your tables do have inheritance in the application logic, then the common properties are replicated in each table. This can result in a lot of redundant data access logic, but OO inheritance in the application layer following the DRY principle solves most of the code redundancy issues.
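Since the rest of this thread is .NET-flavoured, here is a C# sketch of the row shapes the three options produce (all column names are invented for illustration):

using System;

// TPH: one chats table; the discriminator says what each row is, and the
// group-only columns are simply null for private chats.
public class ChatRow
{
    public int ChatId { get; set; }
    public string ChatType { get; set; }        // discriminator: "Private" or "Group"
    public DateTime CreatedAt { get; set; }
    public string GroupName { get; set; }       // null unless ChatType == "Group"
    public string GroupPictureUrl { get; set; } // null unless ChatType == "Group"
}

// TPT: a base table plus an extension table sharing the same PK value,
// giving the 1:0-1 relationship described above.
public class ChatBaseRow
{
    public int ChatId { get; set; }             // PK
    public DateTime CreatedAt { get; set; }
}

public class GroupChatRow
{
    public int ChatId { get; set; }             // PK, and FK to ChatBaseRow.ChatId
    public string GroupName { get; set; }
    public string GroupPictureUrl { get; set; }
}

// Simple isolated tables: PrivateChatRow and GroupChatRow would each
// repeat the common columns (ChatId, CreatedAt, ...) with no shared table.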
This answer to How can you represent inheritance in a database? covers DB inheritance from a more academic and researched point of view.
From a performance point of view, there are benefits to isolating workloads if the usage pattern is significantly different. So if Group Chats have a different usage profile, either the frequency or type of queries is significantly different, or the additional fields in Group Chat would benefit from their own index profiles, then splitting the tables will allow your database engine to provide better index management and execution plan optimisations due to more accurate capture of table statistics.
From a security and compliance point of view, a single table implementation (TPH) can reduce the data access logic and therefore the overall attack surface of the code. But a good ORM or code generation strategy usually mitigates any issues that might be raised in this space. Conversely, TPT or simple tables make it easier to define database or schema level security policies and constraints.
Ultimately, which solution is best for you will come down to the effort required to implement and maintain the application logic for your choice.
I will sometimes use a mix of TPT and TPH in the same database but often lean towards TPT if I need inheritance within the data schema; this old post explains my reasoning against TPH: Database Design: Discriminator vs Separate Tables with regard to Constraints. My general rule is that if the type needs to be polymorphic, either to be considered of both types or for the type context to somehow change dynamically in the application runtime, then TPT or no inheritance is simpler to implement.
I use TPH when the differences between the types are minimal and not expected to reasonably diverge too much over the application lifetime, but also when the usage and implementations are going to be very similar.
TPT provides a way to express inheritance but also to maintain a branch of vastly different behaviours or interactions (on top of the base implementation). Many TPT implementations look as if they might as well have been separate tables; the desire to constrain the 1:1 link between the records is often a strong decider when choosing this architectural pattern. A good way to think about this model, even if you do not use inheritance at the application logic level, is that you can extend the base record to include the metadata and behaviours of any of the inheriting types. In fact, with TPT it is hard to constrain the data records such that you cannot extend into multiple types.
Due to this limitation, TPT can often be modelled from the application layer as not using OO inheritance at all.
TPT complements Composition over Inheritance.
TPH is often the default way to model a domain model that implements simple inheritance, but it introduces a problem in application logic if you need to change the type, and it is incompatible with the idea that a single record could be both types. There are simple workarounds for this, but historically it causes issues from a code maintenance point of view; it's a clash of concepts really, as TPH aligns with Inheritance more than Composition.
In the context of Chat, TPT can work from a Composition point of view. All chats have the same basic features and interactions, but Group Chat records can have extended metadata and behaviours. Unless you envision Private Chat having a lot of its own specific implementation there is not really a reason to extend the base concept of Chat to a Private Chat implementation if there is no difference in that implementation.
For that reason too, though, is there a need to differentiate between Private and Group chats at all from a database perspective? Your application runtime shouldn't be using blind SELECT * style queries to access the data in either case; it should be requesting the specific fields that it needs for the given context. Whether you use a field in the table or the name of the table to discriminate between the different concepts is less important than being able to justify the existence of, or the difference between, those concepts.
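As a toy illustration of that context-specific access (LINQ syntax to match the rest of this thread; dc and the field names are invented):

// The group-chat screen projects only the fields it needs, so a private
// chat's row shape never leaks into this context.
var groupList = dc.Chats
    .Where(c => c.ChatType == "Group")
    .Select(c => new { c.ChatId, c.GroupName, c.GroupPictureUrl });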

Performance of JPA mappings

Right now I am using JPA. What I would like to know is the following:
I have a lot of tables mapped to each other. When I look at the log, I see that a lot of information is being pulled out of the database after a simple query. What will happen when there are a lot of queries at a time? Or will it work fine? How can I increase performance?
There is an overhead that an ORM framework comes with. Because it's fairly high level, it sometimes needs to generate a lot of native SQL queries to get you what you want from one or two lines of JPQL or pure EntityManager operations.
However, JPA uses two caches - L1 and L2. One operates at the entity level, the other at the PersistenceUnit level. Therefore, you might see a lot of SQL queries generated, but after some time you should have some of the data cached.
If you're unhappy with the performance, you could try using lazy loaded collections or fetching the required data by yourself (you might be interested in Bozho's post regarding this matter).
Finally, if you see that the cache hasn't improved your performance and that the hand-made JPQL queries are not doing the job right - you can always revert to plain SQL queries. Beware that those queries bypass the JPA caches and might require you to do some flushes before you execute the native query (or invoke it at the beginning of the active transaction).
Whichever optimisation route you choose, first test it in your environment and ask yourself whether you need the optimisation at all. Do some heavy testing, performance tests and so on.
"Premature optimization is the root of all evil." D. Knuth
If you really have a LOT of entities mapped together, this could indeed lead to a performance problem. This will usually be the case if you have a lot of @OneToMany or @ManyToMany mappings:
@Entity
public class A {
    @OneToMany
    private List<B> listB;
    @ManyToMany
    private List<C> listC;
    @OneToMany
    private List<D> listD;
    ...
}
However one thing you could do, is using lazy fetching. This means that the loading of a field may be delayed until it is accessed for the first time. You could achieve this by using the fetch attribute:
@Entity
public class A {
    @OneToMany(fetch=FetchType.LAZY)
    private List<B> listB;
    @ManyToMany(fetch=FetchType.LAZY)
    private List<C> listC;
    @OneToMany(fetch=FetchType.LAZY)
    private List<D> listD;
    ...
}
In the above sample this means listB, listC and listD will not be fetched from the DB until the first access to each list.
The concrete implementation of the lazy fetching depends on your JPA provider.

How to make POCO work with Linq-to-SQL with complex relationships in DDD

I am struggling to find a way to make POCOs work with LINQ-to-SQL when my domain model is not table-driven - meaning that my domain objects do not match up with the database schema.
For example, in my domain layer I have an Appointment object which has a Recurrence property of type Recurrence. This is a base class with several subclasses each based on a specific recurrence pattern.
In my database, it makes no sense to have a separate AppointmentRecurrences table when there is always a one-to-one relationship between the Appointment record and its recurrence. So, the Appointments table has RecurrenceType and RecurrenceValue columns. RecurrenceType has a foreign key relationship to the RecurrenceTypes table because there is a one-to-many relationship between the recurrence type (pattern) and the Appointments table.
Unless there is a way to create the proper mapping between these two models in LINQ-to-SQL, I am left with manually resolving the impedance mismatch in code.
This becomes even more difficult when it comes to querying the database using the Specification pattern. For example, if I want to return a list of current appointments, I can easily create a Specification object that uses the following Expression: appt => appt.Recurrence.IsDue. However, this does not translate into the Linq-to-SQL space because the source type of the Expression is not one that L2S recognizes (e.g. it's not the L2S entity).
So how can I create the complex mapping in Linq-to-SQL to support my domain model?
Or, is there a better way to implement the Specification pattern in this case? I'd thought about using interfaces that would be implemented by both my domain object and the L2S entity (through partials), but that's not possible with the impedance mismatch of the two object graphs.
Suggestions?
Unfortunately, LINQ to SQL pretty much forces you into a class-per-table model; it does not support mapping a single entity class to several database tables.
Even more unfortunately, there are very few ORMs that will support more complicated mappings, and vanishingly few that do and offer decent LINQ support. The only one I'm even remotely sure of is NHibernate (our experiences with Entity Framework rate it really no better than L2S in this regard).
Also, trying to use the specification pattern in LINQ expressions is going to be quite the challenge.
Even with ORMs, and even with a really strong abstracting ORM like NHibernate, there is still a large impedance mismatch to overcome.
This post explains how to use the specification pattern with linq-to-sql. The specifications can be chained together which builds up an expression tree that can be used by your repository and therefore linq-to-sql.
I haven't tried implementing it yet, but the linq-to-entities version is on my to-do list for a project I am currently working on.
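For illustration, a chainable specification along those lines might look like the sketch below (not the linked post's code; note that LINQ-to-SQL can translate the Expression.Invoke composition used here, while classic LINQ-to-Entities historically could not):

using System;
using System.Linq.Expressions;

public class Specification<T>
{
    public Expression<Func<T, bool>> Predicate { get; private set; }

    public Specification(Expression<Func<T, bool>> predicate)
    {
        Predicate = predicate;
    }

    // Combine two specifications into a single expression tree that a
    // repository can pass straight to a Where() clause.
    public Specification<T> And(Specification<T> other)
    {
        var x = Expression.Parameter(typeof(T), "x");
        var body = Expression.AndAlso(
            Expression.Invoke(Predicate, x),
            Expression.Invoke(other.Predicate, x));
        return new Specification<T>(Expression.Lambda<Func<T, bool>>(body, x));
    }
}

A repository would then apply a composed specification with a plain query.Where(spec.Predicate).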

Challenges with the LINQ to SQL concept in .NET

Let's say I use LINQ to SQL to interact with the database from C#. What challenges might I face, in terms of architecture, performance, type safety, object orientation, etc.?
Basically LINQ to SQL generates a class for each table in your database, complete with relation properties and all, so you will have no problems with type safety. The use of C# partials allows you to add functionality to these objects without messing around with LINQ to SQL's autogenerated code. It works pretty well.
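For example (assuming a generated Customer entity with Name and CustomerId columns), your additions live in a separate file that the designer never regenerates:

// Generated file elsewhere: public partial class Customer { ... }
public partial class Customer
{
    // Your own members, safe from the code generator.
    public string DisplayName
    {
        get { return Name + " (" + CustomerId + ")"; }
    }
}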
As tables map directly to classes and objects, you will either have to accept that your domain layer mirrors the database design directly, or you will have to build some form of abstraction layer above LINQ to SQL. The direct mirroring of tables can be especially troublesome with many-to-many relations, which are not directly supported - instead of Order.Products you get Order.OrderDetails.Select(od => od.Product).
Unlike most other ORMs, LINQ to SQL does not just dispense objects from the database and allow you to store or update objects by passing them back into the ORM. Instead, LINQ to SQL tracks the state of objects loaded from the database and allows you to change that saved state. It is difficult to explain and strange to understand - I recommend you read some of Rick Strahl's blog posts on the subject.
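A rough illustration of that load-modify-save flow (the context and property names are made up):

using (var dc = new MyDataContext())
{
    // The DataContext starts tracking the object as soon as it materializes it.
    var customer = dc.Customers.Single(c => c.CustomerId == 42);

    // No ORM call here; the change is recorded against the tracked snapshot.
    customer.City = "Copenhagen";

    // SubmitChanges diffs the tracked state and issues the UPDATE.
    dc.SubmitChanges();
}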
Performance wise Linq-to-SQL does pretty good. In benchmarking tests it shows speeds of about 90-95% of what a native SQL reader would provide, and in my experience real world usage is also pretty fast. Like all ORMs Linq to SQL is affected by the N+1 selects problem, but it provides good ways to specify lazy/eager loading depending on context.
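For example, eager loading can be switched on per context with DataLoadOptions to sidestep N+1 (entity names are illustrative):

var dc = new MyDataContext();

// Fetch each customer's orders in the same roundtrip instead of issuing
// one extra query per customer on first access.
var options = new DataLoadOptions();
options.LoadWith<Customer>(c => c.Orders);
dc.LoadOptions = options; // must be assigned before the first query

var customers = dc.Customers.Where(c => c.City == "London").ToList();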
Also, by choosing Linq to SQL you choose MSSQL - there do exist third party solutions that allow you to connect to other databases, but last time I checked, none of them appeared very complete.
All in all, Linq to SQL is a good and somewhat easy to learn ORM, which performs okay. If you need features beyond what Linq to SQL is offering, take a look at the new entity framework - it has more features, but is also more complex.
We've had a few challenges, mainly from opening the query construction capability to programmers who don't understand how databases work. Here are a few smells:
//bad scaling
//Query in a loop - causes n roundtrips
// when c roundtrips could have been performed.
List<OrderDetail> od = new List<OrderDetail>();
foreach(Customer cust in customers)
{
    foreach(Order o in cust.Orders)
    {
        od.AddRange(dc.OrderDetails.Where(x => x.OrderId == o.OrderId));
    }
}
//no separation of
// operations intended for execution in the database
// from operations intended to be executed locally
var query =
    from c in dc.Customers
    where c.City.StartsWith(textBox1.Text)
    where DateTime.Parse(textBox2.Text) <= c.SignUpDate
    from o in c.Orders
    where o.OrderCode == (OrderCodes)Enum.Parse(typeof(OrderCodes), "Complete")
    select o;
//not understanding when results are pulled into memory
// causing a full table load
List<Item> result = dc.Items.ToList().Skip(100).Take(20).ToList();
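For contrast, keeping the paging inside the query lets the database return only the requested rows (LINQ to SQL needs an ordering for Skip to translate; ItemId is an assumed key):

// paging is translated to SQL, so only 20 rows cross the wire
List<Item> page = dc.Items.OrderBy(i => i.ItemId).Skip(100).Take(20).ToList();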
Another problem is that one more level of separation from the table structures means indexes are even easier to ignore (that's a problem with any ORM though).

LINQ To Entities and Lazy Loading

In a controversial blog post today, Hackification pontificates on what appears to be a bug in the new LINQ To Entities framework:
Suppose I search for a customer:
var alice = data.Customers.First( c => c.Name == "Alice" );
Fine, that works nicely. Now let’s see
if I can find one of her orders:
var order = ( from o in alice.Orders
              where o.Item == "Item_Name"
              select o ).FirstOrDefault();
LINQ-to-SQL will find the child row.
LINQ-to-Entities will silently return
nothing.
Now let’s suppose I iterate through
all orders in the database:
foreach( var order in data.Orders ) {
    Console.WriteLine( "Order: " + order.Item );
}
And now repeat my search:
var order = ( from o in alice.Orders
              where o.Item == "Item_Name"
              select o ).FirstOrDefault();
Wow! LINQ-to-Entities is suddenly
telling me the child object exists,
despite telling me earlier that it
didn’t!
My initial reaction was that this had to be a bug, but after further consideration (and backed up by the ADO.NET Team), I realized that this behavior was caused by the Entity Framework not lazy loading the Orders subquery when Alice is pulled from the datacontext.
This is because order is a LINQ-To-Object query:
var order = ( from o in alice.Orders
              where o.Item == "Item_Name"
              select o ).FirstOrDefault();
And is not accessing the datacontext in any way, while his foreach loop:
foreach( var order in data.Orders )
Is accessing the datacontext.
LINQ-To-SQL actually creates lazy-loaded properties for Orders, so that when accessed they perform another query; LINQ to Entities leaves it up to you to manually retrieve related data.
Now, I'm not a big fan of ORMs, and this is precisely the reason. I've found that in order to have all the data you want ready at your fingertips, they repeatedly execute queries behind your back; for example, that LINQ-to-SQL query above might run an additional query per row of Customers to get Orders.
However, the EF not doing this seems to majorly violate the principle of least surprise. While it is a technically correct way to do things (you should run a second query to retrieve orders, or retrieve everything from a view), it does not behave as you would expect from an ORM.
So, is this good framework design? Or is Microsoft over thinking this for us?
Jon,
I've been playing with LINQ to Entities also. It's got a long way to go before it catches up with LINQ to SQL. I've had to use LINQ to Entities for the Table per Type inheritance stuff. I found a good article recently which explains the whole "1 company, 2 different ORM technologies" thing here.
However you can do lazy loading, in a way, by doing this:
// Lazy Load Orders
var alice2 = data.Customers.First(c => c.Name == "Alice");
// Should Load the Orders
if (!alice2.Orders.IsLoaded)
alice2.Orders.Load();
or you could just include the Orders in the original query:
// Include Orders in original query
var alice = data.Customers.Include("Orders").First(c => c.Name == "Alice");
// Should already be loaded
if (!alice.Orders.IsLoaded)
alice.Orders.Load();
Hope it helps.
Dave
So, is this good framework design? Or is Microsoft over thinking this for us?
Well, let's analyse that - all the thinking that Microsoft does so we don't have to really makes us lazier programmers. But in general it does make us more productive (for the most part). So are they overthinking, or are they just thinking for us?
If LINQ-to-Sql and LINQ-to-Entities came from two different companies, it would be an acceptable difference - there's no law stating that all LINQ-To-Whatevers have to be implemented the same way.
However, they both come from Microsoft - and we shouldn't need intimate knowledge of their internal development teams and processes to know how to use two different things that, on their face, look exactly the same.
ORMs have their place, and do indeed fill a gap for people trying to get things done, but ORM users must know exactly how their ORM gets things done - treating it like an impenetrable black box will only lead you to trouble.
Having lost a few days to this very problem, I sympathize.
The "fault," if there is one, is that there's a reasonable tendency to expect that a layer of abstraction is going to insulate from these kinds of problems. Going from LINQ, to Entities, to the database layer, doubly so.
Having to switch from MS-SQL (using LinqToSql) to MySQL (using LinqToEntities), for instance, one would figure that the LINQ, at least, would be the same, if only to save the cost of having to rewrite program logic.
Having to litter code with .Load() and/or LINQ with .Include() simply because the persistence mechanism under the hood changed seems slightly disturbing, especially with a silent failure. The LINQ layer ought to at least behave consistently.
A number of ORM frameworks use a proxy object to dynamically load the lazy object transparently, rather than just return null, though I would have been happy with a collection-not-loaded exception.
I tend not to buy into the they-did-it-deliberately-for-your-benefit excuse; other ORM frameworks let you annotate whether you want eager or lazy-loading as needed. The same could be done here.
I don't know much about ORMs, but as a user of LinqToSql and LinqToEntities I would hope that when you try to query Orders for Alice, it does the extra query for you when you execute the LINQ query (as opposed to not querying anything, or querying everything for every row).
It seems natural to expect
from o in alice.Orders where o.Item == "Item_Name" select o
to work given that's one of the reasons people use ORM's in the first place (to simplify data access).
The more I read about LinqToEntities, the more I think LinqToSql fulfills most developers' needs adequately. I usually just need a one-to-one mapping of tables.
Even though you shouldn't have to know about Microsoft's internal development teams and processes, fact of the matter is that these two technologies are two completely different beasts.
The design decision for LINQ to SQL was, for simplicity's sake, to implicitly lazy-load collections. The ADO.NET Entity Framework team didn't want to execute queries without the user knowing so they designed the API to be explicitly-loaded for the first release.
LINQ to SQL has been handed over to the ADO.NET team, so you may see a consolidation of APIs in the future, or LINQ to SQL may get folded into the Entity Framework, or you may see LINQ to SQL atrophy from neglect and eventually become deprecated.