I have built an application with Laravel that ends up with rather deeply nested relationships, which I sometimes need to query. The database is MySQL.
For instance, I want to retrieve all Users who are allowed to read a Book. My data is structured as follows:
A User belongs to 0-n UserGroups through a UserMembership
A UserGroup has 0-n Rights
A Right relates to 1 Book and describes what action can be performed
After looking and browsing, I found that some people were recommending the following way to address nested relationships:
// class Book extends Model
public function readers() {
    $bookId = $this->id;
    return User::whereHas('memberships', function($m) use($bookId) {
        $m->whereHas('group', function($g) use($bookId) {
            $g->whereHas('rights', function($r) use($bookId) {
                $r->where('resource_id', $bookId)->where('action', 'read');
            });
        });
    });
}
I like that the code makes a lot of sense, but the performance is terrible. Execution time is 430ms on average for Book::find(967)->readers()->get().
I re-wrote the function as follows:
public function readersNew() {
    $bookId = $this->id;
    $g = Right::where('resource_id', $bookId)->where('action', 'read')->pluck('group_id');
    $uIds = UserMembership::whereIn('group_id', $g)->pluck('user_id');
    return User::whereIn('id', $uIds);
}
With this code I achieve an average execution time of 4ms, which is obviously much better. But it also looks much less "methodical" as a way of writing nested queries.
I would really like to understand:
why readers()->get() is so much slower than readersNew()->get()
what the best way is to write such requests
First of all, good job improving the performance without even knowing why it happens :)
Q1: why readers()->get() is so much slower than readersNew()->get()
Your readers()->get() function traverses the hierarchy from the top down, which is why it reads more naturally but is slower. Each whereHas becomes a correlated EXISTS subquery nested inside the previous one, so it behaves much like three nested foreach loops: it starts from every user, looks up that user's memberships, then for each membership the group it belongs to, then for each group its rights, and only at the innermost level matches resource_id and action.
Your readersNew()->get(), on the other hand, traverses the hierarchy from the bottom up, which is why it is faster. It first finds the target groups from the matching rights, then the memberships of those groups, and finally the users associated with those memberships.
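To make the two shapes concrete, here is a minimal sketch in Python with an in-memory SQLite database rather than Laravel and MySQL. The table and column names are assumptions based on the question. The first query has the nested correlated EXISTS form that chained whereHas calls compile to; the second function performs the bottom-up lookups of readersNew(). SQLite's planner is not MySQL's, so this illustrates the query shapes and that they return the same users, not the timings.

```python
import sqlite3

# Hypothetical schema mirroring the question's model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_groups (id INTEGER PRIMARY KEY);
    CREATE TABLE user_memberships (user_id INTEGER, group_id INTEGER);
    CREATE TABLE rights (group_id INTEGER, resource_id INTEGER, action TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO user_groups VALUES (10), (20);
    INSERT INTO user_memberships VALUES (1, 10), (2, 20), (3, 10), (3, 20);
    INSERT INTO rights VALUES (10, 967, 'read'), (20, 967, 'write'), (20, 5, 'read');
""")

# Top-down: one correlated EXISTS per relationship level, evaluated per user.
NESTED_EXISTS = """
    SELECT u.id FROM users u
    WHERE EXISTS (
        SELECT 1 FROM user_memberships m
        WHERE m.user_id = u.id AND EXISTS (
            SELECT 1 FROM user_groups g
            WHERE g.id = m.group_id AND EXISTS (
                SELECT 1 FROM rights r
                WHERE r.group_id = g.id
                  AND r.resource_id = ? AND r.action = 'read')))
    ORDER BY u.id
"""


def readers_nested(book_id):
    return [row[0] for row in conn.execute(NESTED_EXISTS, (book_id,))]


def readers_bottom_up(book_id):
    # Step 1: groups holding a 'read' right on the book.
    group_ids = [r[0] for r in conn.execute(
        "SELECT group_id FROM rights WHERE resource_id = ? AND action = 'read'",
        (book_id,))]
    if not group_ids:
        return []
    # Step 2: users that are members of those groups.
    marks = ",".join("?" * len(group_ids))
    user_ids = [r[0] for r in conn.execute(
        f"SELECT DISTINCT user_id FROM user_memberships WHERE group_id IN ({marks})",
        group_ids)]
    if not user_ids:
        return []
    # Step 3: the users themselves.
    marks = ",".join("?" * len(user_ids))
    return [r[0] for r in conn.execute(
        f"SELECT id FROM users WHERE id IN ({marks}) ORDER BY id", user_ids)]
```

Running EXPLAIN on the query Laravel actually generates (for example via DB::listen or toSql()) is the way to confirm what MySQL does with it in your own database.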
Q2 what the best way is to write such requests
The readersNew()->get() approach is the best; you could just rename the variables so that it reads better, if you like:
public function readersNew() {
    $bookId = $this->id;
    // Groups that hold a 'read' right on this book
    $targetGroupIds = Right::where(['resource_id' => $bookId, 'action' => 'read'])->pluck('group_id');
    // Users that are members of those groups
    $associatedUserIds = UserMembership::whereIn('group_id', $targetGroupIds)->pluck('user_id');
    return User::whereIn('id', $associatedUserIds);
}
I hope it helps
This is related (but fairly independent) to my question here: Why SELECT N + 1 with no foreign keys and LINQ?
I've tried using DataLoadOptions to force eager loading, but I'm not getting it to work.
I'm manually writing my LinqToSQL mappings and was first following this tutorial: http://www.codeproject.com/Articles/43025/A-LINQ-Tutorial-Mapping-Tables-to-Objects
Now I've found this tutorial: http://msdn.microsoft.com/en-us/library/bb386950.aspx
There's at least one major difference that I can spot. The first tutorial suggests returning ICollections and the second EntitySets. Since I'm having issues I tried to switch my code to return EntitySets, but then I ran into the issue of needing to reference System.Data.Linq in my Views and Controllers. I tried to do that, but didn't get it to work. I'm also not sure it's a good idea.
At this point, I just want to know which return type I'm supposed to use for a good design? Can I have a good design and still be able to force eager loading in specific cases?
A lot of trial and error finally led to the solution. It's fine to return ICollection or IList, or in some cases IEnumerable. Some think returning EntitySet or IQueryable is a bad idea, and I agree because it exposes too much of the data source/technology. Some think returning IEnumerable is a bad idea, and it seems like it depends. The problem is that it can be used for lazy loading, which may or may not be a good thing.
One recurring issue is that of returning paged results with a count of the total items outside the page. This can be solved by creating a CollectionPage<T> ( http://www.codetunnel.com/blog/post/104/how-to-properly-return-a-paged-result-set-from-your-repository )
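For readers who cannot follow the link, here is a rough sketch of what such a page container could look like, written in Python rather than C#. It is not the linked post's class: the items, total_items and current_page fields mirror the properties used in the repository code further down, while page_size, total_pages and get_page are assumptions added for illustration.

```python
from dataclasses import dataclass
from math import ceil
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class CollectionPage(Generic[T]):
    items: List[T]        # only the items on the requested page
    total_items: int      # count across all pages
    current_page: int     # 1-based page number
    page_size: int        # assumed extra field, used for total_pages

    @property
    def total_pages(self) -> int:
        return ceil(self.total_items / self.page_size)


def get_page(source: Sequence[T], page: int, page_size: int) -> CollectionPage[T]:
    # Same arithmetic as Skip(pageSize * (page - 1)).Take(pageSize).
    start = page_size * (page - 1)
    return CollectionPage(
        items=list(source[start:start + page_size]),
        total_items=len(source),
        current_page=page,
        page_size=page_size,
    )


# 23 items, 10 per page: page 3 holds the last three.
page = get_page(list(range(1, 24)), page=3, page_size=10)
```

The point of the container is that the caller gets the page and the total count from one repository call, without the repository having to expose a query object.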
More on what to return from repositories here:
http://www.codetunnel.com/blog/post/103/should-you-return-iqueryablet-from-your-repositories
http://www.shawnmclean.com/blog/2011/06/iqueryable-vs-ienumerable-in-the-repository-pattern/
IEnumerable vs IQueryable for Business Logic or DAL return Types
List, IList, IEnumerable, IQueryable, ICollection, which is most flexible return type?
Even more important, DataLoadOptions can do the eager loading! I've now restructured my code so much that I'm not 100% sure what I did wrong to cause DataLoadOptions not to work. As far as I've gathered, I should have got an exception if I tried to add it to the DataContext after the DataContext had been used, which I didn't. What I've found out, though, is to think in terms of the Unit of Work pattern. However, for my needs (and because I don't want to return EntitySet or IQueryable from my repositories) I'm not going to implement a cross-repository Unit of Work. Instead I'm just thinking about my repository methods as their own small Unit of Work. I'm sure there are bad things about this (for instance, it might cause more round-trips to the database in some update scenarios), and in the future I might reconsider. However, it's a simple, clean solution.
More info here:
https://stackoverflow.com/a/7941017/1312533
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
This is what I ended up with in my repository:
public class SqlLocalizedCategoriesRepository : ILocalizedCategoriesRepository
{
    private string connectionString;
    private HttpContextBase httpContext;

    // Injected with Inversion of Control
    public SqlLocalizedCategoriesRepository(string connectionString, HttpContextBase httpContext)
    {
        this.connectionString = connectionString;
        this.httpContext = httpContext;
    }

    public CollectionPage<Product> GetProductsByLocalizedCategory(string category, int countryId, int page, int pageSize)
    {
        // Set up a DataContext. Because DataContext implements IDisposable it should be disposed of.
        using (var context = new DataContext(connectionString))
        {
            // In this case I want all ProductSubs for the Products, so I eager load them with LoadWith.
            // There's also AssociateWith, which can filter what is eager loaded.
            var dlo = new System.Data.Linq.DataLoadOptions();
            dlo.LoadWith<Product>(p => p.ProductSubs);
            context.LoadOptions = dlo;

            // For logging queries, a must so you can see what LINQ to SQL generates
            context.Log = (StringWriter)httpContext.Items["linqToSqlLog"];

            // Query the DataContext.
            // Gets the category into memory. There might be some way to avoid that by combining it with the
            // next query, but in my case I'm going to need the Category anyway, so it's not worth doing:
            // I'm going to restructure this code to take a categoryId parameter instead of the category parameter.
            var cat = (from lc in context.GetTable<LocalizedCategory>()
                       where lc.CountryID == countryId && lc.Name == category
                       select lc.Category).First();

            // Generates a single query to get the relevant products, which with DataLoadOptions loads the
            // related ProductSubs. It's important that this is just a query and not loaded into memory,
            // since we're going to split it into pages.
            var products = (from p in context.GetTable<Product>()
                            where p.ProductCategories.Any(pm => pm.Category.CategoryID == cat.CategoryID)
                            select p);

            // Return the results
            var pageOfItems = new CollectionPage<Product>
            {
                Items = products.Skip(pageSize * (page - 1)).Take(pageSize).ToList(), // Gets the page of products into memory
                TotalItems = products.Count(), // Gets the total count of items belonging to the Category
                CurrentPage = page
            };
            return pageOfItems;
        }
    }
}
I asked a similar question a while back: Using the Data Mapper Pattern, Should the Entities (Domain Objects) know about the Mapper? However, it was generic and I'm really interested in how to accomplish a few things with Doctrine2 specifically.
Here's a simple example model: Each Thing can have a Vote from a User; a User may cast more than one Vote, but only the last Vote counts. Because other data (Message, etc.) is related to the Vote, when the second Vote is placed the original Vote can't just be updated; it needs to be replaced.
Currently Thing has this function:
public function addVote($vote)
{
    $vote->entity = $this;
}
And Vote takes care of setting up the relationship:
public function setThing(Model_Thing $thing)
{
    $this->thing = $thing;
    $thing->votes[] = $this;
}
It seems to me that ensuring a User only has the last Vote counted is something the Thing should ensure, and not some service layer.
So to keep that in the Model, the new Thing function:
public function addVote($vote)
{
    foreach ($this->votes as $v) {
        if ($v->user === $vote->user) {
            //remove vote
        }
    }
    $vote->entity = $this;
}
So how do I remove the Vote from within the Domain Model? Should I relax Vote::setThing() to accept a NULL? Should I involve some kind of service layer that Thing can use to remove the vote? Once the votes start accumulating, that foreach is going to be slow - should a service layer be used to allow Thing to search for a Vote without having to load the entire collection?
I'm definitely leaning toward using a light service layer; however, is there a better way to handle this type of thing with Doctrine2, or am I heading in the right direction?
I vote for the service layer. I've often struggled with trying to add as much logic on the Entity itself, and simply frustrated myself. Without access to the EntityManager, you're simply not able to perform query logic, and you'll find yourself using a lot of O(n) operations or lazy-loading entire relationship sets when you only need a few records (which is super lame when compared to all the advantages DQL offers).
If you need some assistance getting over the idea that the Anemic Domain Model is always an anti-pattern, see this presentation by Matthew Weier O'Phinney or this question.
And while I could be misinterpreting the terminology, I'm not completely convinced that Entities have to be the only objects allowed in your Domain Model. I would easily consider that the sum of Entity objects and their Services constitutes the Model. I think the anti-pattern arises when you end up writing a service layer that pays little to no attention to separation of concerns.
I've often flirted with the idea of having all my entity objects proxy some methods to the service layer:
public function addVote($vote)
{
    // Pass the entity itself; there is no $thing variable in this scope.
    $this->_service->addVoteToThing($vote, $this);
}
However, since Doctrine does not have any kind of callback event system on object hydration, I haven't found an elegant way to inject the service object.
My advice would be to put all the query logic into an EntityRepository and then make an interface out of it sort of like:
class BlogPostRepository extends EntityRepository implements IBlogPostRepository {}
That way you can use the interface in your unit tests for the service objects, and no dependency on the EntityManager is required.
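As a rough illustration of that advice applied to the voting example from the question, here is a sketch in Python rather than PHP/Doctrine. All names (VoteRepository, InMemoryVoteRepository, VoteService) are hypothetical. The service depends only on the repository interface, so a test can hand it an in-memory fake instead of anything backed by an EntityManager.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Vote:
    thing_id: int
    user_id: int
    value: int


class VoteRepository(Protocol):
    """The interface the service depends on (hypothetical)."""

    def find_by_thing_and_user(self, thing_id: int, user_id: int) -> Optional[Vote]: ...
    def remove(self, vote: Vote) -> None: ...
    def add(self, vote: Vote) -> None: ...


class InMemoryVoteRepository:
    """Test double; a real implementation would run a targeted query instead."""

    def __init__(self) -> None:
        self._votes: Dict[Tuple[int, int], Vote] = {}

    def find_by_thing_and_user(self, thing_id: int, user_id: int) -> Optional[Vote]:
        return self._votes.get((thing_id, user_id))

    def remove(self, vote: Vote) -> None:
        self._votes.pop((vote.thing_id, vote.user_id), None)

    def add(self, vote: Vote) -> None:
        self._votes[(vote.thing_id, vote.user_id)] = vote

    def count(self) -> int:
        return len(self._votes)


class VoteService:
    def __init__(self, votes: VoteRepository) -> None:
        self._votes = votes

    def add_vote_to_thing(self, vote: Vote) -> None:
        # Only the user's last vote counts: replace, do not update in place.
        previous = self._votes.find_by_thing_and_user(vote.thing_id, vote.user_id)
        if previous is not None:
            self._votes.remove(previous)
        self._votes.add(vote)


repo = InMemoryVoteRepository()
service = VoteService(repo)
service.add_vote_to_thing(Vote(thing_id=1, user_id=7, value=1))
service.add_vote_to_thing(Vote(thing_id=1, user_id=7, value=-1))
```

The lookup by thing and user is the part that would otherwise force the entity to load and loop over its whole votes collection.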
I'm used to EF because it usually works just fine as long as you get to know it better, so you know how to optimize your queries. But.
What would you choose when you know you'll be working with large quantities of data? I know I wouldn't want to use EF in the first place and cripple my application. I would write highly optimised stored procedures and call those to get certain very narrow results (with many joins so they probably won't just return certain entities anyway).
So I'm a bit confused which DAL technology/library I should use? I don't want to use SqlConnection/SqlCommand way of doing it, since I would have to write much more code that's likely to hide some obscure bugs.
I would like to keep the bug surface as small as possible and use a technology that will accommodate my process, not vice versa...
Is there any library that gives me the possibility to:
provide the means of simple SP execution by name
provide automatic materialisation of returned data so I could just provide certain materialisers by means of lambda functions?
like:
List<Person> result = Context.Execute("StoredProcName", record => new Person{
    Name = record.GetData<string>("PersonName"),
    UserName = record.GetData<string>("UserName"),
    Age = record.GetData<int>("Age"),
    Gender = record.GetEnum<PersonGender>("Gender")
    ...
});
or even calling stored procedure that returns multiple result sets etc.
List<Question> result = Context.ExecuteMulti("SPMultipleResults", q => new Question {
    Id = q.GetData<int>("QuestionID"),
    Title = q.GetData<string>("Title"),
    Content = q.GetData<string>("Content"),
    Comments = new List<Comment>()
}, c => new Comment {
    Id = c.GetData<int>("CommentID"),
    Content = c.GetData<string>("Content")
});
Basically this last one wouldn't work, since this one doesn't have any knowledge how to bind both together... but you get the point.
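One way the missing binding knowledge could be supplied is to pass key selectors alongside the materialisers. The sketch below is in Python rather than C#, uses plain lists of dictionaries to stand in for the two result sets, and is not any existing library's API: materialize_multi and its parameters are hypothetical, and it assumes the comment result set also carries a QuestionID column.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class Comment:
    id: int
    content: str


@dataclass
class Question:
    id: int
    title: str
    comments: List[Comment] = field(default_factory=list)


def materialize_multi(
    parent_rows: Iterable[Row],
    child_rows: Iterable[Row],
    make_parent: Callable[[Row], Any],
    make_child: Callable[[Row], Any],
    parent_key: Callable[[Row], Any],
    child_foreign_key: Callable[[Row], Any],
    attach: Callable[[Any, Any], None],
) -> List[Any]:
    # Group the materialised children by the key that points at their parent.
    children_by_key = defaultdict(list)
    for row in child_rows:
        children_by_key[child_foreign_key(row)].append(make_child(row))
    # Materialise each parent and attach its children.
    parents = []
    for row in parent_rows:
        parent = make_parent(row)
        for child in children_by_key.get(parent_key(row), []):
            attach(parent, child)
        parents.append(parent)
    return parents


# Stand-ins for the two result sets of the stored procedure.
question_rows = [{"QuestionID": 1, "Title": "A"}, {"QuestionID": 2, "Title": "B"}]
comment_rows = [
    {"CommentID": 10, "QuestionID": 1, "Content": "first"},
    {"CommentID": 11, "QuestionID": 1, "Content": "second"},
]

questions = materialize_multi(
    question_rows,
    comment_rows,
    make_parent=lambda q: Question(q["QuestionID"], q["Title"]),
    make_child=lambda c: Comment(c["CommentID"], c["Content"]),
    parent_key=lambda q: q["QuestionID"],
    child_foreign_key=lambda c: c["QuestionID"],
    attach=lambda q, c: q.comments.append(c),
)
```

The two extra lambdas are the price of keeping the materialisers free of any mapping configuration.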
So to put it all down to a single question: Is there a DAL library that's optimised for stored procedure execution and data materialisation?
Business Logic Toolkit (BLToolkit) might be exactly what's needed here. It's a lightweight ORM tool that supports lots of scenarios, including multiple result sets, although those seem very complicated to do.
This really is an architectural question. I feel like I'm going about this the wrong way and wanted some input on best practices.
Let's say I have a Transactions table and a TransactionTypes table. Views will submit the appropriate transaction data which is processed in my controller. The problem is that the logic in the controller may be a bit complex and the TransactionType is not provided by the view inputs, but computed in the controller. (Which may be part of my problem).
For example, let's say that the View submits a ViewModel that would map to a TransactionType of "Withdrawal". However, the controller detects that it needs to change this to an "Overdraft" as funds aren't sufficient. What I don't want to do is this:
transaction.TypeId =
    DataContext.TransactionTypes.Single(x => x.type == "Overdraft").id;
... as I'll be embedding string literals in my code. Right?
OK, so I could map the values to strong types that would allow me to do this:
class TranTypes
{
    public const long Deposit = 1;
    public const long Withdrawal = 2;
    public const long Overdraft = 3;
}
...
transaction.TypeId =
    DataContext.TransactionTypes.Single(x => x.id == TranTypes.Overdraft).id;
Now, if my lookups change in the DB, I have one place that I can update the mappings and my controllers still have insight into the model.
But this feels awkward too.
I feel like what I really want is for the Linq To SQL auto-code generation to be able to generate the association so I can just refer to strongly-typed names (Deposit, Withdrawal, and Overdraft) and be assured that it will always return the current values for these in the database. Changes made to the lookup table during runtime would result in problems, but it still seems so much cleaner.
What should I be digesting to understand how best to structure this?
Thanks in advance for enlarging my brain. :-)
Don't worry about whether you have an embedded string or a strongly typed value. Either is perfectly acceptable; use whichever makes sense for your database design.
What you should do, however, is write a single routine in a repository or helper class that you can then call from whatever controller or action requires it - if anything changes there is only one place to make the change.
One simple approach I've always liked is the Enum approach.
public enum TransactionType {
    Overdraft
}
transaction.TypeId =
    DataContext.TransactionTypes.Single(x => x.type == TransactionType.Overdraft.ToString()).id;
It's pretty simple, but I like it.
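The enum lookup can also be wrapped in the single routine suggested in the earlier answer, so that controllers never touch the lookup table themselves. A minimal sketch in Python with an in-memory SQLite table rather than C# and Linq to SQL follows; TransactionTypeRepository and id_of are hypothetical names, and the table layout (id, type) is assumed from the question.

```python
import sqlite3
from enum import Enum


class TransactionType(Enum):
    DEPOSIT = "Deposit"
    WITHDRAWAL = "Withdrawal"
    OVERDRAFT = "Overdraft"


class TransactionTypeRepository:
    """The one place that knows how a type name maps to a database id."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def id_of(self, transaction_type: TransactionType) -> int:
        row = self._conn.execute(
            "SELECT id FROM transaction_types WHERE type = ?",
            (transaction_type.value,),
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown transaction type: {transaction_type.value}")
        return row[0]


# Assumed lookup table with the ids used in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_types (id INTEGER PRIMARY KEY, type TEXT UNIQUE);
    INSERT INTO transaction_types VALUES (1, 'Deposit'), (2, 'Withdrawal'), (3, 'Overdraft');
""")

types = TransactionTypeRepository(conn)
overdraft_id = types.id_of(TransactionType.OVERDRAFT)
```

If the lookup rows change, only the enum values or this class need to follow; the calling code keeps using the enum members.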
A more sophisticated approach (not sure if this works with Linq to SQL, but more sophisticated ORMs such as EF, DO .NET, LLBLGen, etc. support it) is to use inheritance in your data model, with discriminators.
That is, have a subclass of TransactionType called OverdraftTransactionType with a discriminator (the key) that identifies different types of TransactionTypes from each other.
Random link:
http://weblogs.asp.net/zeeshanhirani/archive/2008/08/16/single-table-inheritance-in-entity-framework.aspx
I am new to domain models, POCO and DDD, so I am still trying to get my head around a few ideas.
One of the things I could not figure out yet is how to keep my domain models simple and storage-agnostic but still capable of performing some queries over its data in a rich way.
For instance, suppose that I have an entity Order that has a collection of OrderItems. I want to get the cheapest order item, for whatever reason, or maybe a list of order items that are not currently in stock. What I don't want to do is retrieve all order items from storage and filter later (too expensive), so I want to end up with a db query of the type "SELECT .. WHERE ITEM.INSTOCK=FALSE" somehow. I don't want to have that SQL query in my entity, or any variation of it that would tie me to a specific platform, like NHibernate queries or Linq2SQL. What is the common solution in that case?
Entities are the "units" of a domain. Repositories and services reference them, not vice versa. Think about it this way: do you carry the DMV in your pocket?
OrderItem is not an aggregate root; it should not be accessible through a repository. Its identity is local to an Order, meaning an Order will always be in scope when talking about OrderItems.
The difficulty of finding a home for the queries leads me to think of services. In this case, they would represent something about an Order that is hard for an Order itself to know.
Declare the intent in the domain project:
public interface ICheapestItemService
{
    OrderItem GetCheapestItem(Order order);
}

public interface IInventoryService
{
    IEnumerable<OrderItem> GetOutOfStockItems(Order order);
}
Declare the implementation in the data project:
public class CheapestItemService : ICheapestItemService
{
    private IQueryable<OrderItem> _orderItems;

    public CheapestItemService(IQueryable<OrderItem> orderItems)
    {
        _orderItems = orderItems;
    }

    public OrderItem GetCheapestItem(Order order)
    {
        var itemsByPrice =
            from item in _orderItems
            where item.Order == order
            orderby item.Price
            select item;
        return itemsByPrice.FirstOrDefault();
    }
}

public class InventoryService : IInventoryService
{
    private IQueryable<OrderItem> _orderItems;

    public InventoryService(IQueryable<OrderItem> orderItems)
    {
        _orderItems = orderItems;
    }

    public IEnumerable<OrderItem> GetOutOfStockItems(Order order)
    {
        return _orderItems.Where(item => item.Order == order && !item.InStock);
    }
}
This example works with any LINQ provider. Alternatively, the data project could use NHibernate's ISession and ICriteria to do the dirty work.
Domain objects should be independent of storage; you should use the Repository pattern, or a DAO, to persist the objects. That way you enforce separation of concerns: the object itself should not know how it is stored.
Ideally, it would be a good idea to put query construction inside of the repository, though I would use an ORM inside there.
Here's Martin Fowler's definition of the Repository Pattern.
As I understand this style of design, you would encapsulate the query in a method of an OrderItemRepository (or perhaps more suitably OrderRepository) object, whose responsibility is to talk to the DB on one side, and return OrderItem objects on the other side. The Repository hides details of the DB from consumers of OrderItem instances.
I would argue that it doesn't make sense to talk about "an Order that contains only the OrderItems that are not in stock". An "Order" (I presume) represents the complete list of whatever the client ordered; if you're filtering that list you're no longer dealing with an Order per se, you're dealing with a filtered list of OrderItems.
I think the question becomes whether you really want to treat Orders as an Aggregate Root, or whether you want to be able to pull arbitrary lists of OrderItems out of your data access layer as well.
You've said filtering items after they've come back from the database would be too expensive, but unless you're averaging hundreds or thousands of OrderItems for each order (or there's something else especially intensive about dealing with lots of OrderItems) you may be trying to optimize prematurely and making things more difficult than they need to be. I think if you can leave Order as the aggregate root and filter in your domain logic, your model will be cleaner to work with.
If that's genuinely not the case and you need to filter in the database, then you may want to consider having a separate OrderItem repository that would provide queries like "give me all of the OrderItems for this Order that are not in stock". You would then return those as an IList<OrderItem> (or IEnumerable<OrderItem>), since they're not a full Order, but rather some filtered collection of OrderItems.
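A separate OrderItem repository of that kind might look roughly like the following sketch, written in Python with an in-memory SQLite table rather than C# and an ORM. The class, method, table and column names are all assumptions for illustration. The filter runs in the database and the caller only ever sees plain OrderItem objects.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class OrderItem:
    id: int
    order_id: int
    price: float
    in_stock: bool


class OrderItemRepository:
    """Hides the SQL; consumers receive a filtered list, not an Order."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def out_of_stock_for_order(self, order_id: int) -> List[OrderItem]:
        rows = self._conn.execute(
            "SELECT id, order_id, price, in_stock FROM order_items "
            "WHERE order_id = ? AND in_stock = 0 ORDER BY id",
            (order_id,),
        )
        return [OrderItem(r[0], r[1], r[2], bool(r[3])) for r in rows]


# Assumed sample data: order 100 has one item in stock and two that are not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY, order_id INTEGER, price REAL, in_stock INTEGER);
    INSERT INTO order_items VALUES
        (1, 100, 9.5, 1), (2, 100, 3.0, 0), (3, 100, 7.25, 0), (4, 200, 1.0, 0);
""")

repository = OrderItemRepository(conn)
missing = repository.out_of_stock_for_order(100)
```

Returning a list rather than an Order keeps it clear that the result is a filtered view and not the aggregate itself.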
In the service layer.