What is the purpose of a Data Access Layer? [closed]

I started a project a long time ago and created a Data Access Layer project in my solution, but I have never developed anything in it. What is the purpose of a data access layer? Are there any good sources from which I could learn more about data access layers?

In two words: Loose Coupling
To keep the code you use to pull data from your data store (database, flat files, web services, whatever) separate from business logic and presentation code. This way, if you have to change data stores, you don't end up rewriting the whole thing.
These days, various ORM frameworks are kind of blending the DAL with other layers. This typically makes development easier, but changing data stores can be painful. To be fair, changing data stores like that is pretty uncommon.

There are two primary purposes of a Data Access Layer:
1. Abstract the actual database engine or other data store, so that your applications can switch from using, say, Oracle to using MS SQL Server.
2. Abstract the logical data model, so that your business layer is decoupled from this knowledge and is agnostic of it, giving you the ability to modify the logical data model without impacting the business layer.
Most answers here have given the first reason. In my mind, it is the second that is far more important. Essentially, your business layer should not be aware of the logical data model in use. Today, with ORMs and LINQ, #2 seems to go out the window, and people tend to forget about it (or fail to see the fine lines that do, and should, exist).
Essentially, to get a good understanding of the purpose and function of a Data Layer, you need to see things from the Business Layer's perspective, keeping in mind that the Business layer should be agnostic of the logical data model of your data store.
So, each time the business layer needs data, it should ask for the data it needs in a very simple, logical-data-model-agnostic way. It would make a call into the Data Access Layer such as:
GetOrdersForCustomer(42)
And it gets back exactly the data it needs, without being aware of which tables store this information, what relationships exist, and so on.
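To make that concrete, here is a minimal C# sketch of what such a call might look like behind a DAL interface. All the names (IOrderData, OrderSummary, the Orders/OrderLines tables) are invented for illustration, and Microsoft.Data.SqlClient stands in for whatever store you actually use:

using System;
using System.Collections.Generic;
using Microsoft.Data.SqlClient;

public record OrderSummary(int OrderId, DateTime PlacedOn, decimal Total);

// The business layer depends only on this interface and DTO.
public interface IOrderData
{
    IReadOnlyList<OrderSummary> GetOrdersForCustomer(int customerId);
}

public sealed class SqlOrderData : IOrderData
{
    private readonly string _connectionString;
    public SqlOrderData(string connectionString) => _connectionString = connectionString;

    public IReadOnlyList<OrderSummary> GetOrdersForCustomer(int customerId)
    {
        // The tables, the join, and the SQL dialect are visible only here.
        const string sql = @"
            SELECT o.OrderId, o.PlacedOn, SUM(l.Quantity * l.UnitPrice)
            FROM Orders o
            JOIN OrderLines l ON l.OrderId = o.OrderId
            WHERE o.CustomerId = @customerId
            GROUP BY o.OrderId, o.PlacedOn";

        var orders = new List<OrderSummary>();
        using var conn = new SqlConnection(_connectionString);
        conn.Open();
        using var cmd = new SqlCommand(sql, conn);
        cmd.Parameters.AddWithValue("@customerId", customerId);
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            orders.Add(new OrderSummary(reader.GetInt32(0), reader.GetDateTime(1), reader.GetDecimal(2)));
        return orders;
    }
}

The business layer holds an IOrderData and calls GetOrdersForCustomer(42); a schema change or a switch to another store touches only this one class.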
I've written an article on my blog that goes into more details.
The Purpose and function of a Data Access Layer

A data access layer follows the idea of "separation of concerns": all of the logic required for your business logic to interact with your data store (database) is isolated to a single set of classes (a layer). This allows you to more easily change the backend physical data storage technology (moving from XML files to a database, or from SQL Server to Oracle or MySQL, for example) without a large impact (and, if done right, zero impact) on your business logic.
There are a lot of tools that will help you build your data layer. If you search for the phrase "object relational mapper" or "ORM" you should find some more detailed information.

Data access layers make a lot of sense when many different parts of your application need to access data the same way.
It also makes sense when you need to access the same data in many different ways. For example, consider how word processors can read many different file types and silently convert them into the application's internal format.
Keep in mind that a DAL can also be very counter productive. If you are building a system where data access performance is critical, separating it from the business logic can make some vital optimizations impossible.

The DAL should abstract your database from the rest of your project -- basically, there should be no SQL in any code other than the DAL, and only the DAL should know the structure of the database.
The purpose is mainly to insulate the rest of your app from database changes, and to make it easier to extend and support your app because you will always know where to go to modify database-interaction code.

The purpose is to abstract out the database access details that other parts of your application need not be concerned about.

A data access layer is used to abstract the storage and retrieval of data away from its representation. You can read more about this kind of abstraction in the 1994 book Design Patterns (Gamma, Helm, Johnson, and Vlissides).

The purpose is to abstract the data storage and retrieval mechanism away from data usage and manipulation.
Benefits:
Underlying storage can change (switching from Oracle to MS SQL Server, for example), and you need a way to localize those changes
Schema changes - see above
You want a way to run disconnected from your db (demo mode): add file serialization/deserialization to the DAL (see the sketch below)
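For the demo-mode point, here is a rough C# sketch of how the swap could look: the application codes against a small store interface, and a file-backed implementation replaces the database-backed one when running disconnected. The interface and type names are invented for the example:

using System.Collections.Generic;
using System.IO;
using System.Text.Json;

public record Customer(int Id, string Name);

// The app codes against this interface whether online or in demo mode.
public interface ICustomerStore
{
    IReadOnlyList<Customer> GetAll();
    void SaveAll(IReadOnlyList<Customer> customers);
}

// Demo-mode implementation: serializes to a local JSON file instead of a DB.
public sealed class JsonFileCustomerStore : ICustomerStore
{
    private readonly string _path;
    public JsonFileCustomerStore(string path) => _path = path;

    public IReadOnlyList<Customer> GetAll() =>
        File.Exists(_path)
            ? JsonSerializer.Deserialize<List<Customer>>(File.ReadAllText(_path)) ?? new List<Customer>()
            : new List<Customer>();

    public void SaveAll(IReadOnlyList<Customer> customers) =>
        File.WriteAllText(_path, JsonSerializer.Serialize(customers));
}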

I recommend you read up here: http://msdn.microsoft.com/en-us/practices/default.aspx
Using a DAL will help you isolate your data access from your presentation and business logic. I use it a lot so that I can easily swap out (through reflection and dynamically loading assemblies) data providers.
Read up, lots of good info there.
Also, look into the Data Access Application Block (part of the Enterprise Library) if you are planning on using .NET. It can be a big help.
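As a rough illustration of that reflection-based provider swapping (the config key and type names below are hypothetical, and the real Data Access Application Block does far more than this), a factory might look like:

using System;
using System.Configuration;

public static class DataProviderFactory
{
    // Reads a fully qualified type name, e.g.
    // "MyApp.Data.SqlOrderData, MyApp.Data.SqlServer", from app.config and
    // instantiates it, so providers swap without recompiling the callers.
    // The provider type must expose a public parameterless constructor.
    public static T Create<T>(string settingKey) where T : class
    {
        string typeName = ConfigurationManager.AppSettings[settingKey]
            ?? throw new InvalidOperationException($"No provider configured under '{settingKey}'.");
        // Type.GetType loads the assembly named after the comma if needed.
        Type type = Type.GetType(typeName, throwOnError: true);
        return (T)Activator.CreateInstance(type);
    }
}

Usage would be something like: var orders = DataProviderFactory.Create<IOrderData>("orderDataProvider");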

Something which hasn't been brought up that I thought I'd add is that having a DAL allows you to improve the security of your system. For instance, the DB and DAL could run on server(s) inaccessible to the public while the business logic can run on a public facing server such that the public server can't run raw SQL on the DB. This could help mitigate a lot of damage should the public server be compromised.


How to migrate data from mongodb to mysql?

I am currently working on an analytics-like application. It has an AngularJS app that communicates with a Spring REST client app, in which the user creates a token (trackingID) and puts a generated script with this ID on his website to collect information about visitors' actions through another Spring REST tracking app. For the tracking app I use MongoDB to collect visitor actions and visitor info, for fast insertion; the REST client app uses MySQL for user/account details.
My question is how to migrate the Mongo data from the tracking app to MySQL, perhaps to gain the possibility of joins, as the easiest and fastest way to analyze the data with any kind of filter from the AngularJS client app. Should I manually create workers that periodically transfer data from the last checkpoint up to the present state from Mongo to MySQL, or are there existing tools that can be set up for this transfer?
There is no official library to do this.
But you can use the mongoexport feature of MongoDB to export the data in CSV format and mysqlimport to import it into MySQL.
Here are links to the documentation: MySQL Import and MongoDB Export.
One more method you can try: write a program in your favorite language that reads from MongoDB and writes into MySQL.
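A minimal C# sketch of such a worker, assuming the official MongoDB .NET driver and MySqlConnector; the database, collection, table, and field names are invented, and the incremental "from the last checkpoint" bookkeeping is only hinted at in a comment:

using MongoDB.Bson;
using MongoDB.Driver;
using MySqlConnector;

// Periodic worker sketch: copy tracking events from MongoDB into MySQL.
var mongo = new MongoClient("mongodb://localhost:27017");
var actions = mongo.GetDatabase("tracking").GetCollection<BsonDocument>("actions");

using var mysql = new MySqlConnection("Server=localhost;Database=analytics;Uid=app;Pwd=secret");
mysql.Open();

// For incremental runs, replace Empty with a filter such as
// Builders<BsonDocument>.Filter.Gt("timestamp", lastSyncedUtc).
foreach (var doc in actions.Find(FilterDefinition<BsonDocument>.Empty).ToEnumerable())
{
    using var cmd = new MySqlCommand(
        "INSERT INTO visitor_actions (tracking_id, action, occurred_at) VALUES (@t, @a, @o)", mysql);
    cmd.Parameters.AddWithValue("@t", doc["trackingId"].AsString);
    cmd.Parameters.AddWithValue("@a", doc["action"].AsString);
    cmd.Parameters.AddWithValue("@o", doc["timestamp"].ToUniversalTime());
    cmd.ExecuteNonQuery();
}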
MySQL 5.7 has a new JSON data type that can be very convenient.
You can create a table in MySQL to receive the JSON messages as-is, and then use SQL to query it, or do post-processing to load the data into a structured set of database tables.
Check this out: https://dev.mysql.com/doc/refman/5.7/en/json.html
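A small sketch of that approach from C# via MySqlConnector (the table name, connection string, and JSON paths are made up; requires MySQL 5.7+):

using System;
using MySqlConnector;

// Land the raw JSON events as-is, query them later with JSON functions.
using var conn = new MySqlConnection("Server=localhost;Database=analytics;Uid=app;Pwd=secret");
conn.Open();

using (var create = new MySqlCommand(
    "CREATE TABLE IF NOT EXISTS raw_events (id BIGINT AUTO_INCREMENT PRIMARY KEY, doc JSON NOT NULL)", conn))
    create.ExecuteNonQuery();

using (var insert = new MySqlCommand("INSERT INTO raw_events (doc) VALUES (@doc)", conn))
{
    insert.Parameters.AddWithValue("@doc", "{\"trackingId\":\"abc123\",\"action\":\"click\"}");
    insert.ExecuteNonQuery();
}

// ->> extracts an unquoted value by JSON path, so no fixed schema is needed.
using var query = new MySqlCommand(
    "SELECT doc->>'$.trackingId' FROM raw_events WHERE doc->>'$.action' = 'click'", conn);
using var reader = query.ExecuteReader();
while (reader.Read())
    Console.WriteLine(reader.GetString(0));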
I realise this question is a few years old - but recently I've had a number of people enquiring whether a tool I developed (https://virtual.blue/apps/json-converter) can do exactly what the OP is asking (convert MongoDB to SQL) so I am guessing it is still something people want. Keep reading to find out why I am honestly not surprised by this.
The short answer to whether the tool can help you is: perhaps. If your existing data relationships are not too complicated, and your database is not enormous, it may well be worth a try.
However, I thought it might help to try and explain what the issues are with this kind of conversion, since all the answers I have seen so far are along the lines of "try tool X" or "first convert to format Y and then you can slurp it into MySQL using utility Z", i.e. with no thought given to whether what you get at the end will make sense in terms of data relationships and integrity.
For example, you could just stick your entire database dump in a single field of a single SQL table (ok space limitations might prevent this in reality, but hopefully you get my point). Then your database would be "in MySQL format", but it would be absolutely no use to anyone.
The point is, what you actually want is a fully defined database model, correctly encapsulating all of the intrinsic data relationships. ("Database normalization" as it is known.) If your conversion process gets those relationships wrong, then you have a broken model, and any queries you try to run over it are likely to return nonsense. Unfortunately there is no magic tool that is just going to "know" the best way to represent your data in MySQL, and closing your eyes and shovelling it into a bunch of random tools is unlikely to miraculously get you what you want.
And herein lies the fundamental problem with the "NoSQL" philosophy (fad). They sold people the bogus notion of "non-relational data". My first thought when I heard this was, "How does that work? Surely all data is relational?" By the looks of things we are steadily getting more and more evidence that my instincts were right. ("NoSQL? Why stop there? I go with 'NoDatabase'. It returns no results at all, but it sure is fast!")
The NoSQL madness throws several important fundamental engineering principles to the wind. We shouted "don't hard code!", "DRY!" (Don't Repeat Yourself) because these actions infuse inflexibility into systems. Traditional wisdom makes precisely the same flexibility argument when it advises "create a fully described model with all the data relationships represented". Then you can execute any arbitrary query over it and expect meaningful results. "Yes but there are a whole bunch of queries we are never going to need to run," says the NoSQL proponent. But surely we learnt our lesson on things we are "never going to need to do"? ("I hard code liberally, because I know I am never going to want to change my code." Hmm...)
The arguments about speed are largely moot. Say it turns out you are frequently doing a complex 9 table join, with unsurprisingly sluggish performance. So create an index. Cache it. Swap some disk space for speed. The NoSQL philosophy is to swap data integrity for speed, which makes no sense at all.
When you generate your fast lookup index (cache/table/map/whatever) what you are really doing is creating a view over your model. If your model changes, you can readily update your view. Going from a model to a view is easy - it's a one to many operation and you are on the right side of entropy.
However, when you went with MongoDB you effectively decided to create views without bothering to describe your fundamental model. Now you discover there are queries you want to run, but can't - and so it's no wonder you want to move over to SQL and actually have your data modelled correctly. The problem is you now want to go from a view to a model. Now you're on the wrong side of entropy. Your view is a lossy representation of the model's fundamental relationships. You can't expect a tool to "translate" your database, because you are asking it to insert new relationships which were not originally defined. These are real world relationships that are not machine-guessable. The tool cannot know what relationships were intended.
In short the only way you can do this reliably is to get your hands dirty. An intelligent human, with complete understanding of the system you are modelling needs to sit down and carefully come up with (possibly a substantial amount of) code which effectively picks through the data and resolves all of the insufficiently represented data relationships. If your data is complex then it's going to be a headache and there is no way to cheat.
If your data is still relatively simple then I would suggest making the conversion as soon as possible, before it becomes difficult. In this case my tool (https://virtual.blue/apps/json-converter) may be able to help.
(They really should have asked a Physicist before they came up with all this nonsense...!)
You can download a trial version of Studio 3T for Mongo and export your database to SQL (or JSON) directly.

Do stored procedures improve performance? [closed]

I am a newbie to web development.
I have an application server where my ASP.NET code resides. My application server communicates to a MySQL instance which is on a different server.
I was wondering whether it is good practice to move computation from the application server to the database server by using stored procedures and views, or whether I should keep all the logic in the application server and query the database only to retrieve data directly from the tables, without stored procedures or views.
I am a strong advocate of putting database logic into the database and not splitting it between the application and the server. This means that I prefer to wrap all database calls in stored procedures and views.
The driving reasons are maintenance, security, and functionality, not performance, although performance is often better on the server side.
The number one reason is to isolate the application from changes in the underlying data structure. So, if the data structure changes, the application does not (always) break.
Other reasons that come to mind:
The same logic gets used for the same thing. That is, one piece of code doesn't define "foobar" one way while another piece defines it another way.
Auditing and logging are implemented within stored procedures rather than using triggers.
Database tables are off-limits to all users, unless they go through the defined interface.
A newer version and older version can often co-exist.
Admittedly, for a one-off, quick-and-dirty application these issues may not be important. However, I think it is a good idea to have well defined interfaces (APIs) between different components of a system, and databases and the application layer are a prime example where such APIs are quite useful.
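For illustration, calling such a wrapper procedure from the ASP.NET side might look like the sketch below; "get_customer_orders" is a hypothetical procedure name, and MySqlConnector is assumed since the question mentions MySQL:

using System;
using System.Data;
using MySqlConnector;

// The app only knows the procedure's name and parameters (its API),
// never the tables behind it.
using var conn = new MySqlConnection("Server=dbhost;Database=shop;Uid=app;Pwd=secret");
conn.Open();

using var cmd = new MySqlCommand("get_customer_orders", conn)
{
    CommandType = CommandType.StoredProcedure
};
cmd.Parameters.AddWithValue("@p_customer_id", 42);

using var reader = cmd.ExecuteReader();
while (reader.Read())
    Console.WriteLine($"{reader.GetInt32(0)}: {reader.GetDecimal(1)}");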
I agree with Gordon on separating out a "layer" of code between the application and the actual database. I dispute how practical stored routines are for that purpose.
PHP (etc) is far more expressive than SProcs.
One SProc can execute multiple queries faster because it is closer to the server. This can be an overwhelming performance gain if the client and server are on opposite sides of the country.
Error checking is clumsy in SProcs.
PHP recompiles only when the code changes; SProcs recompile once per connection; Perl always recompiles; etc.
VIEWs are sometimes poorly optimized, so I avoid them.
The secret to a good design for the "layer" is in the compromise between the forces tugging on either side. One example: Can you completely hide a schema change from the app? Even if you split one table into two?
A really bad example was when the UI did pagination by using page numbers. The layer thought in terms of OFFSET and LIMIT, and fed that to the MySQL back-end. Then came an item with 216K pages (yes, that many!). They found out that OFFSET+LIMIT is not a good way to implement "next page", and fixing it required changes to all layers of the system.
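For reference, the usual alternative is keyset ("seek") pagination: remember the last key the user saw and filter past it, so deep pages cost the same as page one. A C# sketch against MySQL, with invented table and column names (not necessarily the fix that team applied):

using System.Collections.Generic;
using MySqlConnector;

public static class Paging
{
    // Seek past the last key shown instead of counting OFFSET rows,
    // so page 216,000 costs the same as page 1.
    public static List<(long Id, string Title)> NextPage(MySqlConnection conn, long lastSeenId, int pageSize)
    {
        using var cmd = new MySqlCommand(
            "SELECT id, title FROM items WHERE id > @lastSeenId ORDER BY id LIMIT @pageSize", conn);
        cmd.Parameters.AddWithValue("@lastSeenId", lastSeenId);
        cmd.Parameters.AddWithValue("@pageSize", pageSize);

        var page = new List<(long, string)>();
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            page.Add((reader.GetInt64(0), reader.GetString(1)));
        return page;
    }
}

The trade-off is that you can no longer jump to an arbitrary page number, which is exactly why fixing it touched the UI as well as the layer.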

Any drawbacks of building a website based on a JSON API for the Data Access Layer

For instance, in e-commerce websites we generally have two interfaces: one with which the customer interacts and places orders, and one with which company employees interact to manage orders and customers.
Suppose we divide this website into two different websites, that is, two entirely separate projects, not dependent on each other. The only thing common to both will be the database: both websites will use the same database. What, then, would be a good option for the Data Access Layer?
Each website has its own database access code and entities.
Link both websites through a centralized layer which exposes reads/writes to the database via a JSON-based API.
In my opinion, the second option would be better, as it removes the duplicated dependency on the database: any changes made to the database need not be made in two places. And there are many other benefits.
But my only concern is: how much could it hamper the performance of the overall system? In that case we are serializing and de-serializing objects and also making use of HTTP connections.
Could someone please shed some light on the benefits and drawbacks of an API-backed Data Access Layer in comparison to each site having its own database access code?
People disagree about the best architecture for this sort of thing, but one common and popular architectural guideline suggests that you avoid integrating two products at the database layer at all costs. It is simpler to have two separate apps and databases which can change independently of each other; if you need to reference data from one in the other, you should have some sort of event pipeline between the two, configured on the ESB.
And you should probably have more than two back-end databases anyway -- unless you have an incredibly simple system with only the two classes of objects you mentioned, you'll probably find that you have more than two bounded contexts.
Also, if your performance requirements increase, then you'll probably want to look at splitting the read and write sides of your services and databases, connecting the two sides through an eventing system of some sort (maybe event sourcing).
Before you decide what to do you should read Implementing Domain Driven Design by Vaughn Vernon. And, the paper on CQRS by Martin Fowler. And the paper on event sourcing, also from Dr Fowler. For extra points you should also read Fowler on Microservices architecture.
Finally, on JSON -- and I'm a big fan -- you should only use it at the repository interface if you're either using JavaScript on both the back end (which is a great idea if you're using io.js and Koa) and the front end (Backbone & Marionette, please), or using a data source that natively emits JSON. If you have to parse it, then it's only going to slow you down, so use some format native to the data source and its consumers; that way you'll be as fast as possible.
An API-centric approach makes more sense, as the data is standardised, and it gives you more flexibility by being usable from any language and for one or multiple interfaces.
Performance wise this would greatly depend on the quality and implementation of the technology stack behind the API. You could also look at caching certain data on the frontend to improve page load time.
The guys over at moltin have already built a platform like this and I've had great success using it. There's already a backend dashboard and the response times are pretty fast too!

Is Data Mapper a more modern trend than Active Record

I've come across a couple of ORMs that recently announced they are planning to move their implementation from Active Record to Data Mapper. My knowledge of this subject is very limited. So, a question for those who know better: is Data Mapper newer than Active Record? Was it around when the Active Record movement started? How do the two relate to each other?
Lastly, since I'm not a database person and know little about this subject, should I follow an ORM that's moving to the Data Mapper implementation? In other words, what's in it for me as someone writing software (not a data person)?
The DataMapper is not more modern or newer, but just more suited for an ORM.
The main reason people change is because ActiveRecord does not make for a good ORM. An AR wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data. So by definition, an AR is a 1:1 representation of a database record, which makes it particularly suited for simple CRUD.
Some ARs added fetching of related data, which made people believe AR is an ORM. It is not. The point of an ORM is to tackle the object-relational impedance mismatch between your database structure and your domain objects. When using AR, you don't solve this impedance mismatch, because your AR represents a database row and not a proper OO design. You are tying your db layout to your objects. Some of the object-relational behavioral patterns can still be applied though (for instance lazy loading).
Another reason why AR is often criticised is because it intermingles two concerns: business logic and db access logic. This leads to unwanted coupling and can result in less maintainability and flexibility in larger applications. There is no isolation between the two layers. Coupling always leads to less flexibility.
A DataMapper on the other hand moves data between objects and a database while keeping them independent of each other and the mapper itself. While more difficult to implement, it allows for much more flexible design in your application. Your domain objects no longer have to match the db structure. DAL and Domain layer are decoupled.
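A compact C# sketch of the structural difference (all class and member names are invented):

// Active Record: the object is a row and saves itself; domain logic and
// DB access live in the same class.
public class ArUser
{
    public int Id;
    public string Email = "";
    public void Save() { /* INSERT/UPDATE the users table right here */ }
}

// Data Mapper: the domain object knows nothing about storage...
public class User
{
    public int Id { get; init; }
    public string Email { get; init; } = "";
}

// ...and a separate mapper moves it between the object model and the tables,
// absorbing the impedance mismatch on its own.
public interface IUserMapper
{
    User? FindById(int id);
    void Insert(User user);
    void Update(User user);
}

With the mapper in place, User can follow the domain design while the mapper absorbs the mismatch; with Active Record, the class must mirror the row.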
Even though the post is 8 years old, the question is still valid in 2018.
Active Record is an anti-pattern; beware of it. It creates very tight coupling between code and database. It might not be a problem for small, simple projects. However, I would strongly recommend avoiding it in anything bigger.
A good OOP design is done in layers: input layer, service layer, repository layer, data mapper, and DB - just a simple example. You should not mix the input layer with the DB. How can that happen? For example, in Laravel, you can use a Validator rule like this:
'email' => 'exists:staff,email'
It checks whether the email exists in the table staff.
This is complete OOP nonsense. It ties your top layer to a DB column name. I cannot imagine a better example of bad OOP design.
The bottom line - if you are creating a simple site with 2-3 tables, like a blog, Active record might not be a problem. For anything bigger, go for Data Mapper and be careful about OOP principles such as IoC, SoC, etc.

Database responsibility

I'm getting started with databases. I've been playing around with MySQL and Informix, but I have never had a real-life project.
What is the real responsibility of a database? Should we add stored procedures and functions to the database, or just let it be a data repository with no logic?
What is the real responsibility of a Database?
A database at its core is a system to store and retrieve data. A CSV file on disk + suitable tools (e.g. Excel) is a simple example of this. In addition, a database might provide additional capabilities, such as transaction control, data integrity, and security.
Should we add stored procedures and functions to the database, or just let it be a data repository with no logic?
What do you want from the database? If all you want is a "bit bucket", then by all means, store it in a plain file on disk and call it "the database". If you want a bit more than that, use a product that suits your needs. If you want to be able to query it using a 4GL like SQL, use MySQL. If you want transaction control, security, advanced query features, etc etc, use another DBMS if appropriate. Whatever product you choose, however, take advantage of that product. Otherwise you're wasting your time and money. Sure, you'll never use all of the features (only a subset will be useful to you), but if you use very few of them, you may as well downgrade to a simpler product.
If you're using Oracle, you can store procedures and functions (even better, whole packages) right there in the database alongside the data. The real question is, what do you need to write in those procedures and functions - business logic or presentation logic?
Personally, I usually prefer to keep business logic close to the data, whereas presentation logic is custom-made for each interface.
It is possible to create an API layer over your data so that no matter how your applications access your database, they will get a consistent view of it, and they will all modify it using a consistent mechanism. In other words, instead of writing the business logic multiple times (once for each interface), you write it once and once only, then re-use it everywhere.
There are two reasons I've heard why business logic should not be stored in the database:
1. Maintainability: it's hard to change. I never really understood this one. How hard is it to type CREATE OR REPLACE PACKAGE? I suspect it's just the burden of having to learn "yet another language".
2. Database independence: what works in Oracle won't work elsewhere. This is a biggie, and better minds than I have written about this one. Basically, if you really need it to be "database agnostic", you won't be able to use any of the advanced features of the database you bought, so you may as well just use the simplest/cheapest one you can find; in which case, you don't need it to work on every database anyway!
Generally it's considered good practice to not place business logic in your database. The main reason is maintainability. It is ok to use stored procedures still, but including business logic within those stored procedures makes your application harder to debug and update.
Including business logic in your database will also effectively tie you to using that one DBMS, and not allow the data layer to remain independent from your application. For example, you may encounter performance and scalability problems with one DB once your application is live, but due to business logic scattered throughout the db, migrating to a more scalable database will be time consuming at best.
If business logic is kept in application code (eg java or c#) and the data layer is abstracted using a data abstraction layer, and an ORM if language permits, then interchanging databases is much less problematic.
We should be striving for separation of concerns, and keeping business logic out of the db helps achieve that.
Edit: there are also performance concerns which may dictate that stored procedures are a good place to keep business logic. Containing logic within the data tier (i.e. the sproc) in some cases reduces the many round trips between the data abstraction layer and the database, which can give a performance boost. I've worked on systems built like this in the past, for this reason, but I've always found them difficult to maintain. The problem is that you can look through the classes and procedures, see the business logic, and think that's all of it; you won't see how a particular bug or process can be occurring until you find the stored procedure and discover the other half of the business operation (a real pain when the sproc is 1000 lines long!).
As with many things, where you place your business logic depends on the particular problem you're trying to solve.
We have a lot of data around us which can be of great use. Ordered collections of information help businesses make better decisions, and databases are ordered storage for that information.
Responsibility: in a common scenario, there is a lot of information around; an ordered collection of information relating to an entity is data, and an ordered collection of data relating to a group of entities is a database. A DBMS is the software that manages such databases. The responsibility of the database is organizing information.
Stored procedures and functions are more like the business processes that you require in order to collect the data you desire.
To begin:
Select a database from {PostgreSQL, MySQL, SQL Server (Express edition)} and install it.
Learn about Codd's rules and normal forms. Good resource
Start learning SQL and write queries.
Understand the basics involved in schema creation.
Learn about procedural language implementations in databases.
Ask questions on SO.