This really is an architectural question. I feel like I'm going about this the wrong way and wanted some input on best practices.
Let's say I have a Transactions table and a TransactionTypes table. Views will submit the appropriate transaction data which is processed in my controller. The problem is that the logic in the controller may be a bit complex and the TransactionType is not provided by the view inputs, but computed in the controller. (Which may be part of my problem).
For example, let's say that the View submits a ViewModel that would map to a TransactionType of "Withdrawal". However, the controller detects that it needs to change this to an "Overdraft" as funds aren't sufficient. What I don't want to do is this:
transaction.TypeId =
DataContext.TransactionTypes.Single(x => x.type == "Overdraft").id;
... as I'll be embedding string literals in my code. Right?
OK, so I could map the values to strong types that would allow me to do this:
class TranTypes
{
public const long Deposit = 1;
public const long Withdrawal = 2;
public const long Overdraft = 3;
}
...
transaction.TypeId =
DataContext.TransactionTypes.Single(x => x.id == TranTypes.Overdraft);
Now, if my lookups change in the DB, I have one place that I can update the mappings and my controllers still have insight into the model.
But this feels awkward too.
I feel like what I really want is for the LINQ to SQL auto-code generation to be able to generate the association so I can just refer to strongly-typed names (Deposit, Withdrawal, and Overdraft) and be assured that it will always return the current values for these in the database. Changes made to the lookup table during runtime would result in problems, but it still seems so much cleaner.
What should I be digesting to understand how best to structure this?
Thanks in advance for enlarging my brain. :-)
Don't worry about whether you have an embedded string or a strongly typed value - either is perfectly acceptable - whichever makes sense for your database design.
What you should do, however, is write a single routine in a repository or helper class that you can then call from whatever controller or action requires it - if anything changes there is only one place to make the change.
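For example, a minimal sketch of such a helper, reusing the DataContext and table from the question (the class name, method name, and MyDataContext type are hypothetical):
public static class TransactionTypeLookup
{
    // The single place that knows how a type name maps to its lookup id.
    // If the lookup data changes, only this method needs attention.
    public static long GetId(MyDataContext db, string typeName)
    {
        // Single() fails fast if the lookup row is missing or duplicated.
        return db.TransactionTypes.Single(x => x.type == typeName).id;
    }
}
// From a controller:
// transaction.TypeId = TransactionTypeLookup.GetId(DataContext, "Overdraft");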
One simple approach I've always liked is the Enum approach.
public enum TransactionType {
Overdraft
}
transaction.TypeId =
DataContext.TransactionTypes.Single(x => x.type == TransactionType.Overdraft.ToString()).id;
It's pretty simple, but I like it.
A more sophisticated approach (not sure if this works with Linq to SQL, but more sophisticated ORMs like EF, DO .NET, and LLBLGen support it) is to use inheritance in your data model, with discriminators.
That is, have a subclass of TransactionType called OverdraftTransactionType with a discriminator (the key) that identifies different types of TransactionTypes from each other.
Random link:
http://weblogs.asp.net/zeeshanhirani/archive/2008/08/16/single-table-inheritance-in-entity-framework.aspx
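For what it's worth, LINQ to SQL does support single-table inheritance with a discriminator column; a minimal sketch (the column names and discriminator codes are illustrative):
using System.Data.Linq.Mapping;

[Table(Name = "TransactionTypes")]
[InheritanceMapping(Code = "W", Type = typeof(WithdrawalType), IsDefault = true)]
[InheritanceMapping(Code = "O", Type = typeof(OverdraftType))]
public class TransactionType
{
    [Column(IsPrimaryKey = true)] public long Id;
    [Column(IsDiscriminator = true)] public string Code; // decides which subclass materialises
}

public class WithdrawalType : TransactionType { }
public class OverdraftType : TransactionType { }
// db.GetTable<TransactionType>().OfType<OverdraftType>() then returns only overdraft rows.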
I know this could be opinion, but I'm looking for best practices.
As I understand it, IQueryable<T> extends IEnumerable<T>, so in my DAL, I currently have method signatures like the following:
IEnumerable<Product> GetProducts();
IEnumerable<Product> GetProductsByCategory(int categoryId);
Product GetProduct(int productId);
Should I be using IQueryable<T> here?
What are the pros and cons of either approach?
Note that I am planning on using the Repository pattern so I will have a class like so:
public class ProductRepository {
DBDataContext db = new DBDataContext(/* connection string */);
public IEnumerable<Product> GetProductsNew(int daysOld) {
return db.GetProducts()
.Where(p => p.AddedDateTime > DateTime.Now.AddDays(-daysOld));
}
}
Should I change my IEnumerable<T> to IQueryable<T>? What advantages/disadvantages are there to one or the other?
It depends on what behavior you want.
Returning an IList<T> tells the caller that they've received all of the data they've requested.
Returning an IEnumerable<T> tells the caller that they'll need to iterate over the result and it might be lazily loaded.
Returning an IQueryable<T> tells the caller that the result is backed by a Linq provider that can handle certain classes of queries, putting the burden on the caller to form a performant query.
While the latter gives the caller a lot of flexibility (assuming your repository fully supports it), it's the hardest to test and, arguably, the least deterministic.
One more thing to think about: where is your paging/sorting support? If you are providing paging support within your repository, returning IEnumerable<T> is fine. If you are paging outside of your repository (like in the controller or service layer) then you really want to use IQueryable<T> because you don't want to load the entire dataset into memory before it's paged.
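To make the paging point concrete, here is a hedged sketch (the repository call, page variables, and Product type are assumed):
// Composed on IQueryable<T>, paging folds into the SQL the provider emits,
// so only one page of rows crosses the wire.
IQueryable<Product> query = repository.GetProducts(); // no database hit yet
List<Product> page = query
    .OrderBy(p => p.Name)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList(); // the single database hit happens here

// The same Skip/Take over IEnumerable<T> runs in memory, so the repository
// has already streamed every earlier row out of the database.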
HUUUUGGGE difference. I see this quite a bit.
You build up an IQueryable before it hits the database. The IQueryable only hits the DB once an eager function is called (.ToList() for example) or you actually try to pull values out. IQueryable = lazy.
An IEnumerable won't translate your lambda into SQL: once you're working with IEnumerable, further operators run in memory (LINQ to Objects) against the rows the database returns. Strictly speaking it is still deferred until you enumerate it - the real difference is where the filtering happens. IEnumerable = in-memory.
As for which to use with the Repository pattern, I believe it's the materialised kind - I usually see ILists being passed, but someone else will need to iron that out for you. EDIT - You usually see IEnumerable instead of IQueryable because you don't want layers past your Repository A) determining when the database hit will happen or B) adding any logic to the joins outside the Repository.
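A small illustration of where the work happens (the Products table and UnitPrice column are hypothetical):
// Stays IQueryable: the Where is translated into the SQL statement.
IQueryable<Product> q = db.Products.Where(p => p.UnitPrice > 10m);

// Drops to IEnumerable: the Where runs in memory (LINQ to Objects),
// so the database is asked for every product row.
IEnumerable<Product> e = db.Products.AsEnumerable().Where(p => p.UnitPrice > 10m);

// Neither line above touches the database by itself; enumeration does:
List<Product> cheap = q.ToList(); // SELECT ... WHERE [UnitPrice] > @p0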
There is a very good LINQ video that I enjoy a lot - it hits more than just IEnumerable vs IQueryable, and it really has some fantastic insight.
http://channel9.msdn.com/posts/matthijs/LINQ-Tips-Tricks-and-Optimizations-by-Scott-Allen/
You can use IQueryable and accept that someone could create a scenario where a SELECT N+1 could happen. This is a disadvantage, along with the fact that you may end up with code specific to your repository implementation in the layers above your repository. The advantage is that you are allowing common operations like paging and sorting to be expressed outside of your repository, relieving it of those concerns. It is also more flexible if you need to join the data with other database tables, as the query remains an expression, and so can be added to before it is resolved and hits the database.
The alternative is to lock down your repository so that it returns materialised lists by calling ToList(). With the example of paging and sorting, you will need to pass in skip, take and a sort expression as parameters to the methods of your repository, and use the parameters to return only a window of results. This means that the repository is taking on the responsibility of paging and sorting, and all of the projection of your data.
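A sketch of that locked-down shape, assuming a Products table on the data context (parameter names are illustrative; requires System.Linq and System.Linq.Expressions):
// The repository owns paging and sorting, and hands back materialised data.
public IList<Product> GetProductsByCategory<TKey>(
    int categoryId, int skip, int take,
    Expression<Func<Product, TKey>> sortBy)
{
    return db.Products
             .Where(p => p.CategoryId == categoryId)
             .OrderBy(sortBy)
             .Skip(skip)
             .Take(take)
             .ToList(); // the query resolves here, inside the repository
}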
This is a bit of a judgement call: do you give your application the power of LINQ and keep the repository simple, or do you control your data access? For me it depends on the number of queries associated with each entity and combination of entities, and where I want to manage that complexity.
I'm working on a highly-specialized search engine for my database. When the user submits a search request, the engine splits the search terms into an array and loops through. Inside the loop, each search term is examined against several possible scenarios to determine what it could mean. When a search term matches a scenario, a WHERE condition is added to the SQL query. Some terms can have multiple meanings, and in those cases the engine builds a list of suggestions to help the user to narrow the results.
Aside: In case anyone is interested to know, ambiguous terms are refined by prefixing them with a keyword. For example, 1954 could be a year or a serial number. The engine will suggest both of these scenarios to the user and modify the search term to either year:1954 or serial:1954.
Building the SQL query and the refine suggestions in the same loop feels somehow wrong to me, but to separate them would add more overhead because I would have to loop through the same array twice and test all the same scenarios twice. What is the better course of action?
I'd probably factor out the two actions into their own functions. Then you'd have
foreach (term in terms) {
doThing1();
doThing2();
}
which is nice and clean.
No. It's not bad. I would think looping twice would be more confusing.
However, some of the tasks might arguably be put into functions if they are decoupled enough from each other.
I don't think it makes sense to add multiple loops for the sake of theoretical purity, especially given that if you're going to add a loop against multiple scenarios you're going from an O(n) -> O(n*#scenarios). Another way to break this out without falling into the "God Method" trap would be to have a method that runs a single loop and returns an array of matches, and another that runs the search for each element in the match array.
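A sketch of that two-method split (the TermMatch shape and TestScenarios helper are hypothetical):
// The shape of one classified term.
class TermMatch
{
    public string Term;
    public List<string> Meanings = new List<string>(); // e.g. "year", "serial"
    public bool IsAmbiguous { get { return Meanings.Count > 1; } }
}

// Pass 1: the one loop that tests every scenario, exactly once per term.
static List<TermMatch> Classify(IEnumerable<string> terms)
{
    var matches = new List<TermMatch>();
    foreach (string term in terms)
        matches.Add(TestScenarios(term)); // hypothetical scenario tester
    return matches;
}

// Pass 2 works from the matches, so no scenario is re-tested; a sibling
// method can build the WHERE conditions from the same list.
static IEnumerable<string> BuildSuggestions(IEnumerable<TermMatch> matches)
{
    return matches.Where(m => m.IsAmbiguous)
                  .SelectMany(m => m.Meanings.Select(k => k + ":" + m.Term));
}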
Using the same loop seems like a valid optimization to me; try to keep the code of the two tasks independent so this optimization can be changed if necessary.
Your scenario fits the builder pattern, and if each operation is fairly complex then it would serve you well to break things up a bit. This is waaaaaay over-engineering if all your logic fits in 50 lines of code, but if you have dependencies to manage and complex logic, then you should be using a proven design pattern to achieve separation of concerns. It might look like this:
var relatedTermsBuilder = new RelatedTermsBuilder();
var whereClauseBuilder = new WhereClauseBuilder();
var compositeBuilder = new CompositeBuilder()
.Add(relatedTermsBuilder)
.Add(whereClauseBuilder);
var parser = new SearchTermParser(compositeBuilder);
parser.Execute("the search phrase");
string[] related = relatedTermsBuilder.Result;
string whereClause = whereClauseBuilder.Result;
The supporting objects would look like:
public interface ISearchTermBuilder {
void Build(string term);
}
public class SearchTermParser {
private readonly ISearchTermBuilder builder;
public SearchTermParser(ISearchTermBuilder builder) {
this.builder = builder;
}
public void Execute(string phrase) {
foreach (var term in Parse(phrase)) {
builder.Build(term);
}
}
private static IEnumerable<string> Parse(string phrase) {
throw new NotImplementedException();
}
}
I'd call it a code smell, but not a very bad one. I would separate out the functionality inside the loop, putting one task first and then, after a blank line and/or a comment, the other.
I would look at it as if it were an instance of the observer pattern: each time you loop you raise an event, and as many observers as you want can subscribe to it. Of course it would be overkill to implement it as the full pattern, but the similarities tell me that it is just fine to execute two or three or however many actions you want.
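In C#, the event-shaped version of that loop is nearly free (the processor and handler names are made up):
class TermProcessor
{
    // Every interested party subscribes to the same per-term event.
    public event Action<string> TermSeen;

    public void Process(IEnumerable<string> terms)
    {
        foreach (string term in terms)
        {
            var handlers = TermSeen;
            if (handlers != null) handlers(term); // one loop, any number of observers
        }
    }
}
// var p = new TermProcessor();
// p.TermSeen += AddWhereCondition;  // hypothetical handlers
// p.TermSeen += CollectSuggestion;
// p.Process(terms);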
I don't think it's wrong to make two actions in one loop. I'd even suggest to make two methods that are called from inside the loop, like:
for (...) {
refineSuggestions(..)
buildQuery();
}
On the other hand, O(n) = O(2n)
So don't worry too much - it isn't such a performance sin.
You could certainly run two loops.
If a lot of this is business logic, you could also create some kind of data structure in the first loop, and then use that to generate the SQL, something like
search_objects = []
loop through term in terms
    search_object = {}
    search_object.string = term
    // suggestion & rules code
    search_object.suggestion = suggestion
    search_object.rule = { 'contains', 'term' }
    search_objects.push(search_object)

loop through search_object in search_objects
    // generate SQL based on search_object.rule
This at least saves you from having to do if/then/elses in both loops, and I think it is a bit cleaner to move SQL code creation outside of the first loop.
If the things you're doing in the loop are related, then fine. It probably makes sense to code "the stuff for each iteration" and then wrap it in a loop, since that's probably how you think of it in your head.
Add a comment and if it gets too long, look at splitting it or using simple utility methods.
I think one could argue that this may not exactly be language-agnostic; it's also highly dependent on what you're trying to accomplish. If you're putting multiple tasks in a loop in such a way that they cannot be easily parallelized by the compiler for a parallel environment, then it is definitely a code smell.
While building my DAL repository, I stumbled upon a concept called Pipes and Filters. I read about it here, here and saw a screencast from here. I am still not sure how to go about implementing this pattern. Theoretically it all sounds good, but how do we really implement this in an enterprise scenario?
I would appreciate any resources, tips, or examples explaining this pattern in the context of the data mappers/ORMs mentioned in the question.
Thanks in advance!!
Ultimately, LINQ on IEnumerable<T> is a pipes and filters implementation. IEnumerable<T> is a streaming API - meaning that data is lazily returned as you ask for it (via iterator blocks), rather than everything being loaded at once and returned in a big buffer of records.
This means that your query:
var qry = from row in source // IEnumerable<T>
where row.Foo == "abc"
select new {row.ID, row.Name};
is:
var qry = source.Where(row => row.Foo == "abc")
.Select(row => new {row.ID, row.Name});
as you enumerate over this, it will consume the data lazily. You can see this graphically with Jon Skeet's Visual LINQ. The only things that break the pipe are things that force buffering: OrderBy, GroupBy, etc. For high volume work, Jon and I worked on Push LINQ for doing aggregates without buffering in such scenarios.
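You can watch the pipe stream by hand-rolling one stage as an iterator block (a sketch):
static class Pipes
{
    // Each stage pulls one item at a time from the stage before it;
    // nothing is buffered unless an operator forces it to be.
    public static IEnumerable<T> WhereLogged<T>(
        this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            Console.WriteLine("inspecting " + item);
            if (predicate(item)) yield return item;
        }
    }
}
// foreach (int n in Enumerable.Range(1, 4).WhereLogged(x => x % 2 == 0))
//     Console.WriteLine("got " + n);
// prints: inspecting 1, inspecting 2, got 2, inspecting 3, inspecting 4, got 4
// - items flow through the pipe one at a time.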
IQueryable<T> (exposed by most ORM tools - LINQ-to-SQL, Entity Framework, LINQ-to-NHibernate) is a slightly different beast; because the database engine is going to do most of the heavy lifting, the chances are that most of the steps are already done - all that is left is to consume an IDataReader and project this to objects/values - but that is still typically a pipe (IQueryable<T> implements IEnumerable<T>) unless you call .ToArray(), .ToList() etc.
With regard to use in the enterprise... my view is that it is fine to use IQueryable<T> to write composable queries inside the repository, but they shouldn't leave the repository - as that would make the internal operation of the repository subject to the caller, so you would be unable to properly unit test / profile / optimize / etc. I've taken to doing clever things in the repository, but returning lists/arrays. This also means my repository stays unaware of the implementation.
This is a shame - as the temptation to "return" IQueryable<T> from a repository method is quite large; for example, this would allow the caller to add paging/filters/etc - but remember that they haven't actually consumed the data yet. This makes resource management a pain. Also, in MVC etc you'd need to ensure that the controller calls .ToList() or similar, so that it isn't the view that is controlling data access (otherwise, again, you can't unit test the controller properly).
A safe (IMO) use of filters in the DAL would be things like:
public Customer[] List(string name, string countryCode) {
using(var ctx = new CustomerDataContext()) {
IQueryable<Customer> qry = ctx.Customers.Where(x=>x.IsOpen);
if(!string.IsNullOrEmpty(name)) {
qry = qry.Where(cust => cust.Name.Contains(name));
}
if(!string.IsNullOrEmpty(countryCode)) {
qry = qry.Where(cust => cust.CountryCode == countryCode);
}
return qry.ToArray();
}
}
Here we've added filters on-the-fly, but nothing happens until we call ToArray. At this point, the data is obtained and returned (disposing the data-context in the process). This can be fully unit tested. If we did something similar but just returned IQueryable<T>, the caller might do something like:
var custs = customerRepository.GetCustomers()
.Where(x=>SomeUnmappedFunction(x));
And all of a sudden our DAL starts failing (cannot translate SomeUnmappedFunction to TSQL, etc). You can still do a lot of interesting things in the repository, though.
The only pain point here is that it might push you to have a few overloads to support different calling patterns (with/without paging, etc). Until optional/named parameters arrive, I find the best answer here is to use extension methods on the interface; that way, I only need one concrete repository implementation:
class CustomerRepository : ICustomerRepository {
public Customer[] List(
string name, string countryCode,
int? pageSize, int? pageNumber) {...}
}
interface ICustomerRepository {
Customer[] List(
string name, string countryCode,
int? pageSize, int? pageNumber);
}
static class CustomerRepositoryExtensions {
public static Customer[] List(
this ICustomerRepository repo,
string name, string countryCode) {
return repo.List(name, countryCode, null, null);
}
}
Now we have virtual overloads (as extension methods) on ICustomerRepository - so our caller can use repo.List("abc","def") without having to specify the paging.
Finally - without LINQ, using pipes and filters becomes a lot more painful. You'll be writing some kind of text based query (TSQL, ESQL, HQL). You can obviously append strings, but it isn't very "pipe/filter"-ish. The "Criteria API" is a bit better - but not as elegant as LINQ.
I'm currently working on a class that calculates the difference between two objects. I'm trying to decide what the best design for this class would be. I see two options:
1) Single-use class instance. Takes the objects to diff in the constructor and calculates the diff for that.
public class MyObjDiffer {
public MyObjDiffer(MyObj o1, MyObj o2) {
// Calculate diff here and store results in member variables
}
public boolean areObjectsDifferent() {
// ...
}
public Vector getOnlyInObj1() {
// ...
}
public Vector getOnlyInObj2() {
// ...
}
// ...
}
2) Re-usable class instance. Constructor takes no arguments. Has a "calculateDiff()" method that takes the objects to diff, and returns the results.
public class MyObjDiffer {
public MyObjDiffer() { }
public DiffResults getResults(MyObj o1, MyObj o2) {
// calculate and return the results. Nothing is stored in this class's members.
}
}
public class DiffResults {
public boolean areObjectsDifferent() {
// ...
}
public Vector getOnlyInObj1() {
// ...
}
public Vector getOnlyInObj2() {
// ...
}
}
The diffing will be fairly complex (details don't matter for the question), so there will need to be a number of helper functions. If I take solution 1 then I can store the data in member variables and don't have to pass everything around. It's slightly more compact, as everything is handled within a single class.
However, conceptually, it seems weird that a "Differ" would be specific to a certain set of results. Option 2 splits the results from the logic that actually calculates them.
EDIT: Option 2 also provides the ability to make the "MyObjDiffer" class static. Thanks kitsune, I forgot to mention that.
I'm having trouble seeing any significant pro or con to either option. I figure this kind of thing (a class that just handles some one-shot calculation) has to come up fairly often, and maybe I'm missing something. So, I figured I'd pose the question to the cloud. Are there significant pros or cons to one or the other option here? Is one inherently better? Does it matter?
I am doing this in Java, so there might be some restrictions on the possibilities, but the overall question of design is probably language-agnostic.
Use Object-Oriented Programming
Use option 2, but do not make it static.
The Strategy Pattern
This way, an instance of MyObjDiffer can be passed to anyone that needs a Strategy for computing the difference between objects.
If, down the road, you find that different rules are used for computation in different contexts, you can create a new strategy to suit. With your code as it stands, you'd extend MyObjDiffer and override its methods, which is certainly workable. A better approach would be to define an interface, and have MyObjDiffer as one implementation.
Any decent refactoring tool will be able to "extract an interface" from MyObjDiffer and replace references to that type with the interface type at some later time if you want to delay the decision. Using "Option 2" with instance methods, rather than class procedures, gives you that flexibility.
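A sketch of that extraction (shown in C# syntax for brevity; the Java version is nearly identical):
// Callers depend on the strategy interface, not on a concrete differ.
public interface IObjDiffer
{
    DiffResults GetResults(MyObj o1, MyObj o2);
}

public class MyObjDiffer : IObjDiffer
{
    public DiffResults GetResults(MyObj o1, MyObj o2)
    {
        // the default field-by-field comparison goes here
        throw new NotImplementedException();
    }
}

// A differently-tuned strategy slots in later without touching any caller:
public class HashBasedObjDiffer : IObjDiffer
{
    public DiffResults GetResults(MyObj o1, MyObj o2)
    {
        // e.g. compare hashes first to short-circuit the expensive path
        throw new NotImplementedException();
    }
}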
Configure an Instance
Even if you never need to write a new comparison method, you might find that specifying options to tailor the behavior of your basic method is useful. If you think about using the "diff" command to compare text files, you'll remember how many different options there are: whitespace- and case-sensitivity, output options, etc. The best analog to this in OO programming is to consider each diff process as an object, with options set as properties on that object.
You want solution #2 for a number of reasons. And you don't want it to be static.
While static seems like fun, it's a maintenance nightmare when you come up with either (a) a new algorithm with the same requirements, or (b) new requirements.
A first-class object (without much internal state) allows you to evolve into a class hierarchy of different differs -- some slower, some faster, some with more memory, some with less memory, some for old requirements, some for new requirements.
Some of your differs may wind up with complicated internal state or memory, or incremental diffing or hash-code-based diffing. All kinds of possibilities might exist.
A reusable object allows you to pick your differ at application start-up time using a properties file.
In the long run, you want to minimize the number of new operations that are scattered throughout your application. You'd like to have your new operations focused in places where you can find and control them. To change from old differ algorithm to new differ algorithm, you'd like to do the following.
Write the new subclass.
Update a properties file to start using the new subclass.
And be completely confident that there wasn't some hidden d= new MyObjDiffer( x, y ) tucked away that you didn't know about.
You want to use d= theDiffer.getResults( x, y ) everywhere.
What the Java libraries do is have a DifferFactory that's static. The factory checks the properties and emits the correct Differ.
DifferFactory df= new DifferFactory();
MyObjDiffer mod= df.getDiffer();
mod.getResults( x, y );
The Factory typically caches the single copy -- it doesn't have to physically read the properties every time getDiffer is called.
This design gives you ultimate flexibility in the future. And it looks like other parts of the Java libraries.
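A sketch of what that factory might look like (in C# syntax, using System.Configuration; the settings key and the IObjDiffer interface are hypothetical):
public class DifferFactory
{
    private static IObjDiffer cached; // the cached single copy mentioned above

    public IObjDiffer GetDiffer()
    {
        if (cached == null) // not thread-safe; fine for a sketch
        {
            // "differ.class" is a made-up config key naming the subclass.
            string typeName = ConfigurationManager.AppSettings["differ.class"];
            cached = (IObjDiffer)Activator.CreateInstance(Type.GetType(typeName));
        }
        return cached;
    }
}
// Swapping algorithms is then a config change plus one new subclass -
// no scattered new MyObjDiffer(x, y) calls to hunt down.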
I can't really say I have firm reasons why it's the 'best' approach, but I usually write classes for objects that you can have a 'conversation' with. So it would be like your 'single use' option 1, except that by calling a setter, you would 'reset' it for another use.
Rather than supplying the implementation (which is pretty obvious), here's a sample invocation:
MyComparer cmp = new MyComparer(obj1, obj2);
boolean match = cmp.isMatch();
cmp.setSubjects(obj3,obj4);
List uniques1 = cmp.getOnlyIn(MyComparer.FIRST);
cmp.setSubject(MyComparer.SECOND,obj5);
List uniques = cmp.getOnlyIn(MyComparer.SECOND);
... and so on.
This way, the caller gets to decide whether they want to instantiate lots of objects, or keep reusing the one.
It's particularly useful if the object needs a lot of setup. Let's say your comparer takes options. There could be many. Set it up once, then use it many times.
// set up cmp with options and the master object
MyComparer cmp = new MyComparer();
cmp.setIgnoreCase(true);
cmp.setIgnoreTrailingWhitespace(false);
cmp.setSubject(MyComparer.FIRST,canonicalSubject);
// find items that are in the testSubjects objects,
// but not in the master.
List extraItems = new ArrayList();
for (Iterator it=testSubjects.iterator(); it.hasNext(); ) {
cmp.setSubject(MyComparer.SECOND,it.next());
extraItems.addAll(cmp.getOnlyIn(MyComparer.SECOND));
}
Edit: BTW I called it MyComparer rather than MyDiffer because it seemed more natural to have an isMatch() method than an isDifferent() method.
I'd take numero 2 and reflect on whether I should make this static.
Why are you writing a class whose only purpose is to calculate the difference between two objects? That sounds like a task either for a static function or a member function of the class.
I would go for a static factory method, something like:
Diffs diffs = Diffs.calculateDifferences(foo, bar);
In this way, it's clear when you're calculating the differences, and there is no way to misuse the object's interface.
I like the idea of explicitly starting the work rather than having it occur on instantiation. Also, I think the results are substantial enough to warrant their own class. Your first design isn't as clean to me. Someone using this class would have to understand that after performing the calculation some other class members are now holding the results. Option 2 is more clear about what is happening.
It depends on how you're going to use diffs. In my mind, it makes sense to treat diffs as a logical entity because it needs to support some operations like 'getDiffString()', or 'numHunks()', or 'apply()'. I might take your first one and do it more like this:
public class Diff
{
public Diff(String path1, String path2)
{
// get diff
if (same)
throw new EmptyDiffException();
}
public String getDiffString()
{
}
public int numHunks()
{
}
public boolean apply(String path1)
{
// try to apply diff as patch to file at path1. Return
// whether the patch applied successfully or not.
}
public boolean merge(Diff diff)
{
// similar to apply(), but merge this diff with another diff
}
}
Using a diff object like this also might lend itself to things like keeping a stack of patches, or serializing to a compressed archive, maybe an "undo" queue, and so on.