What is your understanding of the Repository Pattern?

I'm in the process of catching up on technical documentation for a project I completed some months ago, and one I'm coming close to finishing. I used repositories to abstract out the data access layer in both and was writing a short summary of the pattern on our wiki at work.
It was whilst writing this summary that I realised I took a slightly different approach the second time.
One used an explicit InsertOnSubmit method coupled with a Unit of Work and an implicit update with the UoW tracking changes. The other had a Save method which inserted new entries and updated existing (without a UoW).
Which approach would you typically favour? Consider the usual CRUD scenarios: where should the responsibility for each of them lie?
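For readers comparing the two styles, here is a rough Java sketch of both. Everything in it is hypothetical (the `UnitOfWork`, `SavingCustomerRepository` and `Customer` names, and the map standing in for a database table), and the change tracking is deliberately naive, so treat it as an illustration of the difference rather than how either project was built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity shared by both styles.
class Customer {
    final int id;
    String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Style 1: explicit insert, implicit update. Nothing reaches the store
// until submitChanges() is called on the unit of work.
class UnitOfWork {
    private final Map<Integer, String> table; // stands in for a database table
    private final List<Customer> pendingInserts = new ArrayList<>();
    private final List<Customer> tracked = new ArrayList<>();

    UnitOfWork(Map<Integer, String> table) {
        this.table = table;
    }

    void insertOnSubmit(Customer c) {
        pendingInserts.add(c);
    }

    Customer load(int id) {
        Customer c = new Customer(id, table.get(id));
        tracked.add(c); // loaded entities are tracked so edits can be written back
        return c;
    }

    void submitChanges() {
        for (Customer c : pendingInserts) {
            table.put(c.id, c.name);
        }
        pendingInserts.clear();
        // Simplification: every tracked entity is rewritten, changed or not.
        for (Customer c : tracked) {
            table.put(c.id, c.name);
        }
    }
}

// Style 2: a single save() inserts or updates immediately, with no unit of work.
class SavingCustomerRepository {
    private final Map<Integer, String> table;

    SavingCustomerRepository(Map<Integer, String> table) {
        this.table = table;
    }

    void save(Customer c) {
        table.put(c.id, c.name); // insert if absent, update otherwise
    }
}

class RepositoryStylesDemo {
    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();

        UnitOfWork uow = new UnitOfWork(table);
        uow.insertOnSubmit(new Customer(1, "Ada"));
        System.out.println(table.containsKey(1)); // false: not submitted yet
        uow.submitChanges();
        System.out.println(table.get(1)); // Ada

        new SavingCustomerRepository(table).save(new Customer(2, "Bob"));
        System.out.println(table.get(2)); // Bob
    }
}
```

In the first style the caller decides when changes become visible; in the second, each `save` call is its own write.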

I think whether a repository uses Unit of Work, caching, or any other related concepts should be left to the implementation. I prefer for the interface to resemble a data store which is aligned with the domain model at hand. So that a customer repository would look something like this:
interface ICustomerRepository
{
    Customer Load(int id);
    IEnumerable<Customer> Search(CustomerQuery q);
    void Save(Customer c);
    void Delete(Customer c);
}
This can be easily implemented by something like NHibernate, or NHibernate with NHibernate.Linq, or a direct SQL library, or even an XML or flat-file store. If possible, I like to keep the concept of a transaction outside of the repository, or at a more global scope, so that operations on several repositories may be part of a single transaction.
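To illustrate that such an interface says nothing about the backing store, here is a minimal in-memory implementation sketched in Java. The `CustomerQuery` field and the `InMemoryCustomerRepository` class are assumptions made for the example; the interface above does not say what a query contains.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Customer {
    final int id;
    final String name;

    Customer(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical query object: a single "name contains" criterion.
class CustomerQuery {
    final String nameContains;

    CustomerQuery(String nameContains) {
        this.nameContains = nameContains;
    }
}

interface CustomerRepository {
    Customer load(int id);
    List<Customer> search(CustomerQuery q);
    void save(Customer c);
    void delete(Customer c);
}

// One possible implementation; an ORM- or file-backed one would satisfy the same interface.
class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Integer, Customer> store = new HashMap<>();

    @Override
    public Customer load(int id) {
        return store.get(id); // null when no such customer exists
    }

    @Override
    public List<Customer> search(CustomerQuery q) {
        List<Customer> matches = new ArrayList<>();
        for (Customer c : store.values()) {
            if (c.name.contains(q.nameContains)) {
                matches.add(c);
            }
        }
        return matches;
    }

    @Override
    public void save(Customer c) {
        store.put(c.id, c); // inserts new customers and replaces existing ones
    }

    @Override
    public void delete(Customer c) {
        store.remove(c.id);
    }
}

class CustomerRepositoryDemo {
    public static void main(String[] args) {
        CustomerRepository repo = new InMemoryCustomerRepository();
        repo.save(new Customer(1, "Ada Lovelace"));
        repo.save(new Customer(2, "Alan Turing"));
        System.out.println(repo.search(new CustomerQuery("Ada")).size()); // 1
    }
}
```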

Related

Question about Domain Objects, A Service Layer, and Using Linq2SQL and ASP.net MVC with the Repository Pattern

First off, apologies for the long description of my brainspace below. I'm still wrapping my head around lots of these new ideas, so I'm sure I'm describing something incorrectly. Please feel free to correct me where I'm wrong.
We are in the R&D phase of a new ASP.net MVC2 site and want to ensure that we can 1) decouple our data store from our application, 2) allow for our application to be tested via unit tests and 3) allow us to change out our datastore or use something other than Linq2SQL down the line.
This seemingly simple goal has opened up a whole new world to me that includes the Repository pattern, IoC, DI, and all sorts of other things that are making my head swim. Here's what is so far coming into focus, or at least what I believe is a somewhat correct plan to reach our goals:
We will have a number of ISpecificRepository interfaces that define the contract between users of the interface and the underlying data store.
The SpecificRepository implementations will query specific datastores and return POCO representing our domain objects (or collections of them).
Our Service Layer will perform the application specific business logic using an instance of ISpecificRepository passed to the various service methods and pass these POCO domain objects back to our presentation layer.
As mentioned, we are planning on using Linq2SQL to implement our specific repositories for the application and have decided to decouple our service layer from this implementation by creating the POCO for our domain objects and create a mapping to and from these objects to the LINQ generated entities. In the service layer, we can then create business logic to query the repository, add data, and do whatever else we need to do for each use case. This seems fine but my concern is that since we're using Linq2SQL, our specific Linq repository implementation will now have to house all of the many Get queries that the service layer requires to implement the business logic efficiently.
I'm curious as to whether this somehow breaks the Repository pattern since we're now housing application specific logic not in the service layer but in the repository instead.
The reason I feel that we need to do it this way is so that I can write more efficient Linq queries on my specific Linq repository using various DataLoadOptions, etc. without returning IQueryable from my repository up to my service layer, where it would seem that sort of logic actually belongs. Also, all of the example IRepository interfaces I've seen seem very lightweight and only provide a few methods to GetByID, GetAll, Find, Insert, Delete, and SubmitChanges to the underlying data store. In my case, it sounds like my specific repositories will be doing a great deal more than that.
Thanks for reading this far. Any and all help that can clarify my misconceptions would be greatly appreciated.
-Mustafa

"our specific Linq repository implementation will now have to house all of the many Get queries that the service layer requires to implement the business logic efficiently. I'm curious as to whether this somehow breaks the Repository pattern"
Not at all. A Repository is a collection of domain entities. If I have a Repository of Accounts, it is perfectly reasonable to want Accounts.ThatAreOverdue().
I personally prefer fluent naming. Accounts.ThatAreOverdue() feels better than AccountRepository.GetOverdue() .. but I suppose that is a point of preference.
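A small Java sketch of what a fluently named query on such a repository could look like. The `Account` fields and the `today` parameter are assumptions added so the example is deterministic; the answer only names `Accounts.ThatAreOverdue()`.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Account {
    final String owner;
    final LocalDate dueDate;
    final boolean paid;

    Account(String owner, LocalDate dueDate, boolean paid) {
        this.owner = owner;
        this.dueDate = dueDate;
        this.paid = paid;
    }
}

// The repository reads like a collection of domain entities.
class Accounts {
    private final List<Account> all;

    Accounts(List<Account> all) {
        this.all = all;
    }

    // An account is overdue when it is unpaid and its due date has passed.
    List<Account> thatAreOverdue(LocalDate today) {
        List<Account> overdue = new ArrayList<>();
        for (Account a : all) {
            if (!a.paid && a.dueDate.isBefore(today)) {
                overdue.add(a);
            }
        }
        return overdue;
    }
}

class AccountsDemo {
    public static void main(String[] args) {
        Accounts accounts = new Accounts(Arrays.asList(
                new Account("Ada", LocalDate.of(2024, 2, 1), false),
                new Account("Bob", LocalDate.of(2024, 2, 1), true),
                new Account("Cy", LocalDate.of(2024, 4, 1), false)));
        for (Account a : accounts.thatAreOverdue(LocalDate.of(2024, 3, 1))) {
            System.out.println(a.owner); // Ada
        }
    }
}
```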
"Also, all of the example IRepository interfaces I've seen seem very lightweight and only provide a few methods to GetByID, GetAll, Find, Insert, Delete, and SubmitChanges to the underlying data store."
A Repository interface can be thin. Find is meant to be used with the Specification pattern. Encapsulate the criteria in another object. The implementation of the criteria can be passed Linq2Sql objects from which to query - but it will be more difficult to re-use the criteria classes against in-memory domain objects (versus in database, where Linq2Sql is involved).
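A minimal Java sketch of a thin `find` combined with the Specification pattern. The `Specification` interface, the two criteria classes and the `Account` fields are hypothetical, and the criteria run against in-memory objects only; translating them into database queries is the harder part mentioned above and is not attempted here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Account {
    final String owner;
    final int balance;
    final int daysOverdue;

    Account(String owner, int balance, int daysOverdue) {
        this.owner = owner;
        this.balance = balance;
        this.daysOverdue = daysOverdue;
    }
}

// The criteria live in their own objects instead of in repository methods.
interface Specification<T> {
    boolean isSatisfiedBy(T candidate);

    // Combines two specifications; both must hold.
    default Specification<T> and(Specification<T> other) {
        return candidate -> isSatisfiedBy(candidate) && other.isSatisfiedBy(candidate);
    }
}

class Overdue implements Specification<Account> {
    @Override
    public boolean isSatisfiedBy(Account a) {
        return a.daysOverdue > 0;
    }
}

class BalanceAtLeast implements Specification<Account> {
    private final int minimum;

    BalanceAtLeast(int minimum) {
        this.minimum = minimum;
    }

    @Override
    public boolean isSatisfiedBy(Account a) {
        return a.balance >= minimum;
    }
}

// The repository interface stays thin: one find method covers many queries.
class AccountRepository {
    private final List<Account> accounts;

    AccountRepository(List<Account> accounts) {
        this.accounts = accounts;
    }

    List<Account> find(Specification<Account> spec) {
        List<Account> matches = new ArrayList<>();
        for (Account a : accounts) {
            if (spec.isSatisfiedBy(a)) {
                matches.add(a);
            }
        }
        return matches;
    }
}

class SpecificationDemo {
    public static void main(String[] args) {
        AccountRepository repo = new AccountRepository(Arrays.asList(
                new Account("Ada", 500, 10),
                new Account("Bob", 50, 5),
                new Account("Cy", 900, 0)));
        for (Account a : repo.find(new Overdue().and(new BalanceAtLeast(100)))) {
            System.out.println(a.owner); // Ada
        }
    }
}
```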
"Our Service Layer will perform the application specific business logic using an instance of ISpecificRepository passed to the various service methods and pass these POCO domain objects back to our presentation layer."
Are you saying that your logic will all be in Services and the "domain objects" will be bags of properties and bound to in the view?
I don't think I'd recommend that.
If the same object that is used in the application logic is also used in the view, then you have tightly coupled the two application layers and experience says that causes problems. It will be very difficult to maintain coherence in the Services and Domain through changes if the View uses the same objects. The View will need pieces of data and they will inevitably get stuck onto places they don't really belong in the domain.

How Long to keep a LINQ-to-SQL DataContext Open?

I'm new to linq-to-sql (and sql for that matter), and I've started to gather evidence that maybe I'm not doing things the right way, so I wanted to see what you all have to say.
In my staff allocation application I allow the user to create assignments between employees and projects. At the beginning of the application, I open up a linq-to-sql data context to my management database. Throughout the program, I never let that data context go. As a matter of fact, most of the form constructors take this data context as one of their arguments.
I kinda thought that this was the way to do things until I read through another SO question where the asker was discussing repetitively re-creating the data context throughout his program and then "attaching" the entities to the new data contexts as needed. This would help me get around the problem I've been having wherein things are "sneaking" into my database.
So where would you use the first style (and don't be ashamed to say never), and where would you use the second style?

If you are writing a web application in, say, ASP.NET MVC, or a web service, you will be recreating the DataContext each time, as the application is "stateless" between page GETs and POSTs.
If you are writing a Winforms or WPF application, you can do it the same way, although holding a DataContext open can be easier to do, since Winforms applications are stateful (i.e. you have a container for the DataContext to live).
In general, it is probably sensible to open a DataContext each time you need to complete a "unit of work." The DataContext itself is fairly lightweight, so opening one for each "transaction" is not that big of a deal. This practice is also consistent with software layers in enterprise applications (i.e. Database, DAL, Service Layer, Repository, etc.), and helps to enforce separation of concerns between the requisite layers.

The generally recommended way of doing things is to create a new DataContext for each atomic operation. DataContexts are actually quite cheap to instantiate, and are very well suited to rapid turnover.
As a general rule of thumb, I tend to instantiate a DataContext, perform a CRUD operation, then dispose of it again. This could be the updating of a single entity, or inserting a load of objects. Do whatever makes the most sense for your scenario.
Just be careful if you're passing entities from your context around, as exceptions will be thrown if you try to enumerate or retrieve related data - it's best practice to transform the LINQ entities into independent objects (for example, a Person LINQ entity could be transformed into a PersonResult, which is consumed by the logic layer of your solution).
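The same idea sketched in Java, since the pattern is not specific to LINQ to SQL. `DataContext`, `PersonEntity` and `PersonQueries` here are hypothetical stand-ins written for this example, not real library classes; the point is only the shape: open a context, copy what you need into independent objects, and dispose of the context.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a short-lived data context.
class DataContext implements AutoCloseable {
    private boolean open = true;

    void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("context has been disposed");
        }
    }

    List<PersonEntity> people() {
        ensureOpen();
        return Arrays.asList(new PersonEntity(this, "Ada"));
    }

    @Override
    public void close() {
        open = false;
    }
}

// Entity tied to its context: related data is loaded lazily through it.
class PersonEntity {
    private final DataContext context;
    final String name;

    PersonEntity(DataContext context, String name) {
        this.context = context;
        this.name = name;
    }

    List<String> projects() {
        context.ensureOpen(); // fails once the owning context is gone
        return Arrays.asList("Apollo");
    }
}

// Independent object handed to the logic layer; it holds no context reference.
class PersonResult {
    final String name;
    final List<String> projects;

    PersonResult(String name, List<String> projects) {
        this.name = name;
        this.projects = projects;
    }
}

class PersonQueries {
    // One context per unit of work: create, query, copy, dispose.
    static List<PersonResult> loadPeople() {
        try (DataContext ctx = new DataContext()) {
            List<PersonResult> results = new ArrayList<>();
            for (PersonEntity e : ctx.people()) {
                results.add(new PersonResult(e.name, new ArrayList<>(e.projects())));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadPeople().get(0).projects); // [Apollo]
    }
}
```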
Hope that helps!

Data Repository - business objects?

I'm reading the book "ASP.NET 3.5 Social Networking - Andrew Siemer" and I got confused when he uses Repositories to access the data.
Here is the idea of his code:
public interface IAccountRepository
{
    Account GetAcountByID(int acId);
    void SaveAccount(Account account);
    List<Account> GetAllAccounts();
}
public class AccountRepositoryLINQ : IAccountRepository
{
    // Interface members must be declared public in the implementing class.
    public Account GetAcountByID(int acId){
        ..... LINQ query .....
        ...... return.....
    }
    public void SaveAccount(Account account){
        ..... LINQ .....
    }
    public List<Account> GetAllAccounts(){
        ..... LINQ query .....
        ...... return.....
    }
}
The class "Account" is the one generated automatically on the "LINQ to SQL Classes".
Some of the problems I see:
1º
I code my business layer, GUI, etc... and later in time the table Accounts in the database is changed (example: change the name of one column), then I need to rebuild the "LINQ to SQL Classes" and all my code layers will need to be recoded because my "Account" object changed.
2º
If I need to have other repositories (MySQL, Oracle, XML, other), what "Account" class will I use?
What to do?
Shouldn't I use a custom Account class? This will be used in all application layers.
How do I do the mapping from LINQ to my custom Account class?
Using a simple "myClass.Name = linqClass.Name;"?
Isn't this consuming machine resources if I need to "map" all the classes?
Isn't there an easier/lighter way to do it?
Is this the correct approach? Are there other ways?

Good instinct..
My suggestion is to abstract away the LinqToSQL objects, and create a set of Business Domain Objects. Then the Repository can query for the needed data and map them to the Domain objects that your application uses, and return those. Now your Data Access layer is decoupled from your application, and you can now do all of the things you listed.
The mapping can be a pain, so look at tools like Automapper to accomplish this.
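A hand-rolled version of that mapping, sketched in Java to show where the work ends up. `AccountRow` stands in for a designer-generated class and `MappingAccountRepository` is a hypothetical name; a mapping library would replace the two private methods.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a generated class; its fields mirror the table columns.
class AccountRow {
    int accountId;
    String userName;
    String emailAddress;
}

// Hand-written domain object used by every other layer.
class Account {
    final int id;
    final String name;
    final String email;

    Account(int id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

interface AccountRepository {
    Account getById(int id);
    void save(Account account);
}

class MappingAccountRepository implements AccountRepository {
    private final Map<Integer, AccountRow> table = new HashMap<>(); // stands in for the database

    @Override
    public Account getById(int id) {
        AccountRow row = table.get(id);
        return row == null ? null : toDomain(row);
    }

    @Override
    public void save(Account account) {
        table.put(account.id, toRow(account));
    }

    // A renamed column only affects these two methods, not the callers.
    private static Account toDomain(AccountRow row) {
        return new Account(row.accountId, row.userName, row.emailAddress);
    }

    private static AccountRow toRow(Account account) {
        AccountRow row = new AccountRow();
        row.accountId = account.id;
        row.userName = account.name;
        row.emailAddress = account.email;
        return row;
    }
}

class MappingDemo {
    public static void main(String[] args) {
        AccountRepository repo = new MappingAccountRepository();
        repo.save(new Account(7, "ada", "ada@example.com"));
        System.out.println(repo.getById(7).email); // ada@example.com
    }
}
```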

I have a love-hate relationship with LINQ to SQL classes myself, but I thought I'd play devil's advocate :-), firstly addressing the points you made:
"1º I code my business layer, GUI, etc... and later in time the table Accounts in the database is changed (example: change the name of one column), then I need to rebuild the "LINQ to SQL Classes" and all my code layers will need to be recoded because my "Account" object changed."
The general approach is that you'd add behaviour to the partial classes generated by LINQ to SQL; these files won't be replaced when you refresh a table from the data context. If you change the name of the column and don't want to change the rest of your code, just update the class in the designer to use the old column name.
Even if you used POCOs for persistence with NHibernate for instance you'd still need to change the mapping so I don't really see this as an issue.
"2º If I need to have other repositories (MySQL, Oracle, XML, other), what "Account" class will I use?"
Personally I'd call YAGNI on this one. If you really anticipate needing support for multiple databases, LINQ to SQL might not be the best solution to start with in any case (simply to keep your infrastructure consistent across the application); tools like NHibernate would have far better support for such situations.
Moving on to adding a custom account class, mapping code can be taken care of by tools like AutoMapper, though this might mean you give up things like lazy loading (which may or may not be a big deal to you).
In the end it can be quite empowering to have full control over your entities (e.g. not having to use a parameterless constructor, control over instantiation, simple user types that map to one or two columns), and if you feel that your application might benefit from this it's probably the way to go. But you will pay the price in the repository implementation, which will be complicated by mapping code and by handling whether things need to be updated, deleted or inserted.
A good middle ground might be to simply code to an interface (e.g. IAccount), which should define the properties and methods you expect from an account. Your repository would then become:
IAccount GetById(int accountId);
You'll then give yourself freedom over what the implementation is (i.e. whether it's implemented by a LINQ to SQL class or a projection / mapping) and if you do opt for a custom class in future it'd be a simple case of moving the implementation to that class and altering the repository implementation.
In the end it's down to the application. If you think it's going to end up a huge application with extremely complex business logic, I would by all means opt for a segregated domain layer that at least tries to be persistence ignorant. If, however, it isn't, and opting for the repository pattern is simply a means to achieve good testability and a simple abstraction above your data access, I don't see why explicitly referencing LINQ to SQL classes and using them as a simple domain layer is such a big deal.

I personally use a combination of NHibernate and FluentNHibernate and separate my domain (business objects) from everything else. Other layers, like a GUI, send messages to my domain, which has a handler with repositories injected into it; the handler hydrates the object(s) in question and performs the business logic. The interfaces in the repositories above are a nice way to decouple if you want to use other implementations of repositories or data access.

How should I refactor my code to remove unnecessary singletons?

I was confused when I first started to see anti-singleton commentary. I have used the singleton pattern in some recent projects, and it was working out beautifully. So much so, in fact, that I have used it many, many times.
Now, after running into some problems, reading this SO question, and especially this blog post, I understand the evil that I have brought into the world.
So: How do I go about removing singletons from existing code?
For example:
In a retail store management program, I used the MVC pattern. My Model objects describe the store, the user interface is the View, and I have a set of Controllers that act as liaison between the two. Great. Except that I made the Store into a singleton (since the application only ever manages one store at a time), and I also made most of my Controller classes into singletons (one mainWindow, one menuBar, one productEditor...). Now, most of my Controller classes get access to the other singletons like this:
Store managedStore = Store::getInstance();
managedStore.doSomething();
managedStore.doSomethingElse();
//etc.
Should I instead:
1. Create one instance of each object and pass references to every object that needs access to them?
2. Use globals?
3. Something else?
Globals would still be bad, but at least they wouldn't be pretending.
I see #1 quickly leading to horribly inflated constructor calls:
someVar = SomeControllerClass(managedStore, menuBar, editor, sasquatch, ...)
Has anyone else been through this yet? What is the OO way to give many individual classes access to a common variable without it being a global or a singleton?

Dependency Injection is your friend.
Take a look at these posts on the excellent Google Testing Blog:
Singletons are Pathological Liars (but you probably already understand this if you are asking this question)
A talk on Dependency Injection
Guide to Writing Testable Code
Hopefully someone has made a DI framework/container for the C++ world? Looks like Google has released a C++ Testing Framework and a C++ Mocking Framework, which might help you out.

It's not the Singleton-ness that is the problem. It's fine to have an object that there will only ever be one instance of. The problem is the global access. Your classes that use Store should receive a Store instance in the constructor (or have a Store property / data member that can be set) and they can all receive the same instance. Store can even keep logic within it to ensure that only one instance is ever created.
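A short Java sketch of that arrangement. `ProductEditor` and `MainWindow` are named after the controllers in the question, but their methods and the `stock` field are made up for the example; there is still exactly one `Store`, it just arrives through constructors instead of through `getInstance()`.

```java
class Store {
    private int stock = 0;

    void addStock(int units) {
        stock += units;
    }

    int stock() {
        return stock;
    }
}

// Each controller receives the Store it works on instead of looking it up globally.
class ProductEditor {
    private final Store store;

    ProductEditor(Store store) {
        this.store = store;
    }

    void receiveDelivery(int units) {
        store.addStock(units);
    }
}

class MainWindow {
    private final Store store;

    MainWindow(Store store) {
        this.store = store;
    }

    String title() {
        return "Units in stock: " + store.stock();
    }
}

class StoreApp {
    public static void main(String[] args) {
        Store store = new Store(); // the only instance, created once at startup
        ProductEditor editor = new ProductEditor(store);
        MainWindow window = new MainWindow(store);

        editor.receiveDelivery(5);
        System.out.println(window.title()); // Units in stock: 5
    }
}
```

In a test, each controller can be given its own fresh `Store` without touching any global state.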

My way to avoid singletons derives from the idea that "application global" doesn't mean "VM global" (i.e. static). Therefore I introduce an ApplicationContext class which holds much of the former static singleton information that should be application global, like the configuration store. This context is passed into all structures. If you use any IoC container or service manager, you can use this to get access to the context.
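One way such a context object might look, sketched in Java. The members of `ApplicationContext` and the `MenuBar` consumer are assumptions for illustration; the answer does not say what the real class contains beyond a configuration store.

```java
import java.util.HashMap;
import java.util.Map;

class Store {
    final String name;

    Store(String name) {
        this.name = name;
    }
}

// Holds the objects that are global to one application, but not static.
class ApplicationContext {
    private final Store store;
    private final Map<String, String> configuration;

    ApplicationContext(Store store, Map<String, String> configuration) {
        this.store = store;
        this.configuration = configuration;
    }

    Store store() {
        return store;
    }

    String setting(String key) {
        return configuration.get(key);
    }
}

// Consumers take the context as a single constructor argument.
class MenuBar {
    private final ApplicationContext context;

    MenuBar(ApplicationContext context) {
        this.context = context;
    }

    String caption() {
        return context.setting("title") + ": " + context.store().name;
    }
}

class ContextDemo {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("title", "Store Manager");
        ApplicationContext context = new ApplicationContext(new Store("Downtown"), config);
        System.out.println(new MenuBar(context).caption()); // Store Manager: Downtown
    }
}
```

Two contexts can coexist in one process, which is what separates this from a static singleton.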

There's nothing wrong with using a global or a singleton in your program. Don't let anyone get dogmatic on you about that kind of crap. Rules and patterns are nice rules of thumb. But in the end it's your project and you should make your own judgments about how to handle situations involving global data.
Unrestrained use of globals is bad news. But as long as you are diligent, they aren't going to kill your project. Some objects in a system deserve to be singleton. The standard input and outputs. Your log system. In a game, your graphics, sound, and input subsystems, as well as the database of game entities. In a GUI, your window and major panel components. Your configuration data, your plugin manager, your web server data. All these things are more or less inherently global to your application. I think your Store class would pass for it as well.
It's clear what the cost of using globals is. Any part of your application could be modifying it. Tracking down bugs is hard when every line of code is a suspect in the investigation.
But what about the cost of NOT using globals? Like everything else in programming, it's a trade off. If you avoid using globals, you end up having to pass those stateful objects as function parameters. Alternatively, you can pass them to a constructor and save them as a member variable. When you have multiple such objects, the situation worsens. You are now threading your state. In some cases, this isn't a problem. If you know only two or three functions need to handle that stateful Store object, it's the better solution.
But in practice, that's not always the case. If every part of your app touches your Store, you will be threading it to a dozen functions. On top of that, some of those functions may have complicated business logic. When you break that business logic up with helper functions, you have to -- thread your state some more! Say for instance you realize that a deeply nested function needs some configuration data from the Store object. Suddenly, you have to edit 3 or 4 function declarations to include that store parameter. Then you have to go back and add the store as an actual parameter to everywhere one of those functions is called. It may be that the only use a function has for a Store is to pass it to some subfunction that needs it.
Patterns are just rules of thumb. Do you always use your turn signals before making a lane change in your car? If you're the average person, you'll usually follow the rule, but if you are driving at 4am on an empty highway, who gives a crap, right? Sometimes it'll bite you in the butt, but that's a managed risk.

Regarding your inflated constructor call problem, you could introduce parameter classes or factory methods to alleviate it.
A parameter class moves some of the parameter data to its own class, e.g. like this:
var parameterClass1 = new MenuParameter(menuBar, editor);
var parameterClass2 = new StuffParameters(sasquatch, ...);
var ctrl = new MyControllerClass(managedStore, parameterClass1, parameterClass2);
It sort of just moves the problem elsewhere though. You might want to housekeep your constructor instead. Only keep parameters that are important when constructing/initiating the class in question and do the rest with getter/setter methods (or properties if you're doing .NET).
A factory method is a method that creates all the instances you need of a class and has the benefit of encapsulating the creation of said objects. Factory methods are also quite easy to refactor towards from Singleton, because they're similar to the getInstance methods that you see in Singleton patterns. Say we have the following non-threadsafe simple singleton example:
// The Rather Unfortunate Singleton Class
public class SingletonStore {
    private static SingletonStore _singleton
        = new SingletonStore();
    private SingletonStore() {
        // Do some privatised constructing in here...
    }
    public static SingletonStore getInstance() {
        return _singleton;
    }
    // Some methods and stuff to be down here
}
// Usage:
// var singleInstanceOfStore = SingletonStore.getInstance();
It is easy to refactor this towards a factory method. The solution is to remove the static reference:
public class StoreWithFactory {
    public StoreWithFactory() {
        // If the constructor is private or public doesn't matter
        // unless you do TDD, in which you need to have a public
        // constructor to create the object so you can test it.
    }
    // The method returning an instance of Singleton is now a
    // factory method.
    public static StoreWithFactory getInstance() {
        return new StoreWithFactory();
    }
}
// Usage:
// var myStore = StoreWithFactory.getInstance();
Usage is still the same, but you're not bogged down with having a single instance. Naturally you would move this factory method to its own class, as the Store class shouldn't concern itself with its own creation (and coincidentally you would follow the Single Responsibility Principle as an effect of moving the factory method out).
From here you have many choices, but I'll leave that as an exercise for yourself. It is easy to over-engineer (or overheat) on patterns here. My tip is to only apply a pattern when there is a need for it.

Okay, first of all, the "singletons are always evil" notion is wrong. You use a Singleton whenever you have a resource which won't or can't ever be duplicated. No problem.
That said, in your example, there's an obvious degree of freedom in the application: someone could come along and say "but I want two stores."
There are several solutions. The one that occurs first of all is to build a factory class; when you ask for a Store, it gives you one named with some universal name (eg, a URI.) Inside that store, you need to be sure that multiple copies don't step on one another, via critical regions or some method of ensuring atomicity of transactions.
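A possible shape for such a factory, sketched in Java. `StoreFactory`, `storeFor` and the example URIs are hypothetical; the sketch only covers handing out one `Store` per name, not the transactional safety inside each store that the answer also calls for.

```java
import java.util.concurrent.ConcurrentHashMap;

class Store {
    final String uri;

    Store(String uri) {
        this.uri = uri;
    }
}

class StoreFactory {
    private final ConcurrentHashMap<String, Store> stores = new ConcurrentHashMap<>();

    // ConcurrentHashMap.computeIfAbsent is atomic, so two threads asking
    // for the same name always receive the same Store instance.
    Store storeFor(String uri) {
        return stores.computeIfAbsent(uri, Store::new);
    }
}

class StoreFactoryDemo {
    public static void main(String[] args) {
        StoreFactory factory = new StoreFactory();
        Store a = factory.storeFor("store://downtown");
        Store b = factory.storeFor("store://downtown");
        Store c = factory.storeFor("store://airport");
        System.out.println(a == b); // true
        System.out.println(a == c); // false
    }
}
```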

Miško Hevery has a nice article series on testability, among other things the singleton, where he isn't only talking about the problems, but also how you might solve it (see 'Fixing the flaw').

I like to encourage the use of singletons where necessary while discouraging the use of the Singleton pattern. Note the difference in the case of the word. The singleton (lower case) is used wherever you only need one instance of something. It is created at the start of your program and is passed to the constructor of the classes that need it.
class Log
{
public:
    void logmessage(...)
    { // do some stuff
    }
};
class Database
{
    Log &_log;
public:
    Database(Log &log) : _log(log) {}
    void Open(...)
    {
        _log.logmessage(whatever);
    }
};
int main()
{
    Log log;
    Database db(log); // the one Log instance is passed to the classes that need it
    // do some more stuff
}
Using a singleton gives all of the capabilities of the Singleton anti-pattern, but it makes your code more easily extensible and testable (in the sense of the word defined in the Google testing blog). For example, we may decide that we also need the ability to log to a web service at times; with the singleton we can easily do that without significant changes to the code.
By comparison, the Singleton pattern is another name for a global variable. It is never used in production code.

Proper Logging in OOP context

Here is a problem I've struggled with ever since I first started learning object-oriented programming: how should one implement a logger in "proper" OOP code?
By this, I mean an object that has a method that we want every other object in the code to be able to access; this method would output to console/file/whatever, which we would use for logging--hence, this object would be the logger object.
We don't want to establish the logger object as a global variable, because global variables are bad, right? But we also don't want to have to pass the logger object in the parameters of every single method we call in every single object.
In college, when I brought this up to the professor, he couldn't actually give me an answer. I realize that there are actually packages (for say, Java) that might implement this functionality. What I am ultimately looking for, though, is the knowledge of how to properly and in the OOP way implement this myself.

You do want to establish the logger as a global variable, because global variables are not bad. At least, they aren't inherently bad. A logger is a great example of the proper use of a globally accessible object. Read about the Singleton design pattern if you want more information.

There are some very well thought out solutions. Some involve bypassing OO and using another mechanism (AOP).
Logging doesn't really lend itself too well to OO (which is okay, not everything does). If you have to implement it yourself, I suggest just instantiating "Log" at the top of each class:
private final Log log = new Log(this);
and all your logging calls are then trivial: log.print("Hey");
Which makes it much easier to use than a singleton.
Have your logger figure out what class you are passing in and use that to annotate the log. Since you then have an instance of log, you can then do things like:
log.addTag("Bill");
And log can add the tag bill to each entry so that you can implement better filtering for your display.
log4j and chainsaw are a perfect out of the box solution though--if you aren't just being academic, use those.

A globally accessible logger is a pain for testing. If you need a "centralized" logging facility, create it on program startup and inject it into the classes/methods that need logging.
How do you test methods that use something like this:
public class MyLogger
{
public static void Log(String Message) {}
}
How do you replace it with a mock?
Better:
public interface ILog
{
void Log(String message);
}
public class MyLog : ILog
{
public void Log(String message) {}
}
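To show what the interface buys you in a test, here is a Java rendering of the same idea with a recording fake in place of a mock. `OrderService` and `RecordingLog` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

interface Log {
    void log(String message);
}

// Production implementation.
class ConsoleLog implements Log {
    @Override
    public void log(String message) {
        System.out.println(message);
    }
}

// Test double: remembers every message so a test can inspect them.
class RecordingLog implements Log {
    final List<String> messages = new ArrayList<>();

    @Override
    public void log(String message) {
        messages.add(message);
    }
}

// The class under test receives its logger instead of calling a static method.
class OrderService {
    private final Log log;

    OrderService(Log log) {
        this.log = log;
    }

    void placeOrder(String item) {
        log.log("Order placed: " + item);
    }
}

class InjectedLogDemo {
    public static void main(String[] args) {
        RecordingLog fake = new RecordingLog();
        new OrderService(fake).placeOrder("tea");
        System.out.println(fake.messages); // [Order placed: tea]

        new OrderService(new ConsoleLog()).placeOrder("coffee"); // prints to the console
    }
}
```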

I've always used the Singleton pattern to implement a logging object.

You could look at the Singleton pattern.

Create the logger as a singleton class and then access it using a static method.

I think you should use AOP (aspect-oriented programming) for this, rather than OOP.

In practice a singleton / global method works fine, in my opinion. Preferably the global thing is just a framework to which you can connect different listeners (observer pattern), e.g. one for console output, one for database output, one for Windows EventLog output, etc.
Beware of overdesign, though; I find that in practice a single class with just global methods can work quite nicely.
Or you could use the infrastructure the particular framework you work in offers.
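A bare-bones Java sketch of the "global framework plus listeners" idea from this answer. `GlobalLogger` and `LogListener` are hypothetical names; a real version would add levels, formatting and error handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Observer: anything that wants to receive log messages.
interface LogListener {
    void onMessage(String message);
}

// Globally reachable entry point that only fans messages out to listeners.
final class GlobalLogger {
    private static final List<LogListener> LISTENERS = new CopyOnWriteArrayList<>();

    private GlobalLogger() {
    }

    static void addListener(LogListener listener) {
        LISTENERS.add(listener);
    }

    static void log(String message) {
        for (LogListener listener : LISTENERS) {
            listener.onMessage(message);
        }
    }
}

class GlobalLoggerDemo {
    public static void main(String[] args) {
        List<String> inMemory = new ArrayList<>();
        GlobalLogger.addListener(System.out::println); // console output
        GlobalLogger.addListener(inMemory::add);       // stands in for a database or file sink
        GlobalLogger.log("application started");
        System.out.println(inMemory.size()); // 1
    }
}
```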

The Enterprise Library Logging Application Block that comes from Microsoft's patterns & practices group is a great example of implementing a logging framework in an OOP environment. They have some great documentation on how they have implemented their logging application block and all the source code is available for your own review or modification.
There are other similar implementations: log4net, log4j, log4cxx.
The way they have implemented the Enterprise Library Logging Application Block is to have a static Logger class with a number of different methods that actually perform the log operation. If you were looking at patterns, this would probably be one of the better uses of the Singleton pattern.

I am all for AOP together with log4*. This really helped us.
Google gave me this article for instance. You can try to search more on that subject.

(IMHO) how 'logging' happens isn't part of your solution design, it's more part of whatever environment you happen to be running in - like System and Calendar in Java.
Your 'good' solution is one that is as loosely coupled to any particular logging implementation as possible so think interfaces. I'd check out the trail here for an example of how Sun tackled it as they probably came up with a pretty good design and laid it all out for you to learn from!

use a static class, it has the least overhead and is accessible from all project types within a simple assembly reference
note that a Singleton is equivalent, but involves unnecessary allocation
if you are using multiple app domains, beware that you may need a proxy object to access the static class from domains other than the main one
also if you have multiple threads you may need to lock around the logging functions to avoid interlacing the output
IMHO logging alone is insufficient, that's why I wrote CALM
good luck!

Maybe inserting Logging in a transparent way would rather belong in the Aspect Oriented Programming idiom. But we're talking OO design here...
The Singleton pattern may be the most useful, in my opinion: you can access the Logging service from any context through a public, static method of a LoggingService class.
Though this may seem a lot like a global variable, it is not: it's properly encapsulated within the singleton class, and not everyone has access to it. This enables you to change the way logging is handled even at runtime, but protects the working of the logging from 'villain' code.
In the system I work on, we create a number of Logging 'singletons', in order to be able to distinguish messages from different subsystems. These can be switched on/off at runtime, filters can be defined, writing to file is possible... you name it.

I've solved this in the past by adding an instance of a logging class to the base class(es) (or interface, if the language supports that) for the classes that need to access logging. When you log something, the logger looks at the current call stack and determines the invoking code from that, setting the proper metadata about the logging statement (source method, line of code if available, class that logged, etc.) This way a minimal number of classes have loggers, and the loggers don't need to be specifically configured with the metadata that can be determined automatically.
This does add considerable overhead, so it is not necessarily a wise choice for production logging, but aspects of the logger can be disabled conditionally if you design it in such a way.
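A rough Java sketch of the approach described above, under the assumption that the caller is read from a freshly created `Throwable`'s stack trace. `StackAwareLogger`, `Component` and `OrderProcessor` are hypothetical names, and capturing a stack trace on every call is exactly the overhead just mentioned.

```java
import java.util.ArrayList;
import java.util.List;

class StackAwareLogger {
    private final List<String> lines = new ArrayList<>();

    void log(String message) {
        // Frame 0 is this log() method; frame 1 is the method that called it.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        lines.add(caller.getClassName() + "." + caller.getMethodName() + ": " + message);
    }

    List<String> lines() {
        return lines;
    }
}

// Base class that gives every subclass access to the logger.
abstract class Component {
    protected final StackAwareLogger log;

    Component(StackAwareLogger log) {
        this.log = log;
    }
}

class OrderProcessor extends Component {
    OrderProcessor(StackAwareLogger log) {
        super(log);
    }

    void process() {
        log.log("processing"); // no class or method name passed in
    }
}

class StackAwareLoggerDemo {
    public static void main(String[] args) {
        StackAwareLogger logger = new StackAwareLogger();
        new OrderProcessor(logger).process();
        System.out.println(logger.lines().get(0));
    }
}
```

The logged line is prefixed with the calling class and method, so individual classes need no per-class logger configuration.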
Realistically, I use commons-logging most of the time (I do a lot of work in Java), but there are aspects of the design I described above that I find beneficial. The benefits of having a robust logging system that someone else has already spent significant time debugging have outweighed the need for what could be considered a cleaner design (that's obviously subjective, especially given the lack of detail in this post).
I have had issues with static loggers causing permgen memory issues (at least, I think that's what the problem is), so I'll probably be revisiting loggers soon.

To avoid global variables, I propose to create a global REGISTRY and register your globals there.
For logging, I prefer to provide a singleton class or a class which provides some static methods for logging.
Actually, I'd use one of the existing logging frameworks.

One other possible solution is to have a Log class which encapsulates the logging/stored procedure. That way you can just instantiate a new Log(); whenever you need it without having to use a singleton.
This is my preferred solution, because the only dependency you need to inject is the database if you're logging via database. If you're using files potentially you don't need to inject any dependencies. You can also entirely avoid a global or static logging class/function.
