Is Data Mapper a more modern trend than Active Record

I've come across a couple of ORMs that recently announced they are planning to move their implementation from Active Record to Data Mapper. My knowledge of this subject is very limited. So, a question for those who know better: is Data Mapper newer than Active Record? Was it around when the Active Record movement started? How do the two relate to each other?
Lastly, since I'm not a database person and know little about this subject, should I follow an ORM that's moving to a Data Mapper implementation? In other words, what's in it for me as someone writing software (not a data person)?

The DataMapper is not more modern or newer, but just more suited for an ORM.
The main reason people change is that Active Record does not make for a good ORM. An AR wraps a row in a database table or view, encapsulates the database access, and adds domain logic to that data. So by definition, an AR is a 1:1 representation of a database record, which makes it particularly suited for simple CRUD.
Some ARs added fetching of related data, which made people believe AR is an ORM. It is not. The point of an ORM is to tackle the object-relational impedance mismatch between your database structure and your domain objects. When using AR, you don't solve this impedance mismatch, because your AR represents a database row rather than a proper OO design. You are tying your DB layout to your objects. Some of the object-relational behavioral patterns can still be applied, though (for instance lazy loading).
Another reason AR is often criticised is that it intermingles two concerns: business logic and DB access logic. This coupling leaves no isolation between the two layers, which results in less maintainability and flexibility in larger applications.
A DataMapper on the other hand moves data between objects and a database while keeping them independent of each other and the mapper itself. While more difficult to implement, it allows for much more flexible design in your application. Your domain objects no longer have to match the db structure. DAL and Domain layer are decoupled.
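To make the difference concrete, here is a minimal sketch in Python (the class names, the users table, and the use of SQLite are just for illustration, not any particular ORM's API): the Active Record object carries its own persistence, while with a Data Mapper the domain object stays plain and a separate mapper moves it to and from the database.

import sqlite3

# --- Active Record style: the object IS a row and knows how to persist itself ---
class UserRecord:
    def __init__(self, conn, email):
        self.conn = conn              # database access lives inside the domain object
        self.id = None
        self.email = email

    def save(self):                   # the domain object speaks SQL directly
        cur = self.conn.execute(
            "INSERT INTO users (email) VALUES (?)", (self.email,))
        self.id = cur.lastrowid


# --- Data Mapper style: plain domain object, all mapping isolated in a mapper ---
class User:
    def __init__(self, email):
        self.id = None
        self.email = email            # no knowledge of the database at all


class UserMapper:
    def __init__(self, conn):
        self.conn = conn

    def insert(self, user):           # all SQL is confined to the mapper
        cur = self.conn.execute(
            "INSERT INTO users (email) VALUES (?)", (user.email,))
        user.id = cur.lastrowid

    def find(self, user_id):
        row = self.conn.execute(
            "SELECT id, email FROM users WHERE id = ?", (user_id,)).fetchone()
        if row is None:
            return None
        user = User(row[1])
        user.id = row[0]
        return user


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

mapper = UserMapper(conn)
alice = User("alice@example.com")
mapper.insert(alice)                  # the domain object never touches the DB itself
print(mapper.find(alice.id).email)    # alice@example.com

With the mapper in between, the shape of User can diverge from the table layout without the domain code noticing.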

Even though the post is 8 years old, the question is still valid in 2018.
Active Record is an anti-pattern; beware of it. It creates very tight coupling between code and database. That might not be a problem for small, simple projects, but I would strongly recommend avoiding it in anything bigger.
A good OOP design is done in layers: input layer, service layer, repository layer, data mapper, and DB, to give a simple example. You should not mix the input layer with the DB. How can that happen? For example, in Laravel you can use a Validator rule like this:
'email' => 'exists:staff,email'
It checks whether the email exists in the staff table.
This is complete OOP nonsense. It ties your top layer to a DB table and column name. I cannot imagine a better example of bad OOP design.
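For contrast, here is a rough sketch (plain Python rather than Laravel, and all names invented) of keeping that rule behind a repository interface, so the top layer depends on an abstraction instead of a table and column name:

from abc import ABC, abstractmethod

class StaffRepository(ABC):
    """Boundary between the service layer and the database."""
    @abstractmethod
    def email_exists(self, email: str) -> bool: ...

class SqlStaffRepository(StaffRepository):
    def __init__(self, conn):
        self.conn = conn

    def email_exists(self, email):
        # The table and column names are known only here, in the data layer
        row = self.conn.execute(
            "SELECT 1 FROM staff WHERE email = ?", (email,)).fetchone()
        return row is not None

class RegistrationValidator:
    def __init__(self, staff_repo: StaffRepository):
        self.staff_repo = staff_repo   # injected, easy to replace with a fake in tests

    def validate(self, data):
        errors = []
        if not self.staff_repo.email_exists(data.get("email", "")):
            errors.append("email does not belong to any staff member")
        return errors

The validator can be unit-tested with an in-memory fake of StaffRepository, and renaming the staff table touches only the SQL implementation, not the top layer.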
The bottom line: if you are creating a simple site with 2-3 tables, like a blog, Active Record might not be a problem. For anything bigger, go for Data Mapper and be careful about OOP principles such as IoC, SoC, etc.

Related

Ways to structure an application that has two clear parts

I am in a project that has an infinite number of tables. We have to come up with a solution that brings scalability to the platform, and we can't seem to figure out what a really good one would be.
The platform is a job-seeking site, so it has two clear parts: candidates and companies.
We've been thinking and have come up with these possible solutions to restructure the current database, as it is a monster.
2 APIs, 2 databases: This would take a lot of database migration work, but would define very clearly the different parts of the platform.
2 APIs, 1 database: Doing this, the database work would be reduced to normalizing what we have now, but we would still have the two parts of the platform logically separated.
1 API, 1 database: Normalize the database and do everything in the same API, trying to logically separate everything, making it scalable but at the same time accessible from one part to the other.
Right now I am leaning towards the 1 API, 1 database solution, but we would like to hear from some experienced users before making the final choice.
Thank you!
I was in a situation kind of like yours some years ago. I will try to express my thoughts on how we handled it. All this might sound opinionated, but each and every task is different, and therefore so are the implementations.
The two largest problems I notice:
Having an infinite number of tables is the first sign that your current database schema design is a Big Ball of Mud.
Acknowledging that you have a monster database indicates that you better start refactoring it to smaller pieces. Yes I know it's never easy.
It would add a lot more value to your question if you would show us some of the architectural details/parts of your codebase, so we could give better suited ideas.
Please forgive me for linking Domain Driven Design related information sources. I know that DDD is not about any technological fluff, however the strategy you need to choose is super important and I think it brings value to this post.
Know your problem domain
Before you start taking your database apart, you should clearly understand how your problem domain works. To put it simply: the problem domain is the set of business problems you are trying to solve with the strategy you are going to apply.
Pick your strategy
The most important thing here is: the business value your strategy brings. The proposed strategy in this case is to make clear distinctions between your database objects.
Be tactical!
We chose the strategy; now we need to define the tactics applied to this refactoring. Our tactics here should be clearly set out, for example:
Separate the related database objects that belong together; this defines explicit boundaries.
Make sure the connections between the regrouped database objects remain intact and are working. I'm talking about cross table/object references here.
Let's get technical - the database
How to break things
I personally would split up your current schema to three individual separate parts:
Candidates
Companies
Common tables
Reasoning
By strategically splitting up these database objects you consciously separate these concerns. This separation gives you a new thing: a tactical boundary.
Each of your newly separated schemas now has its own context and its own boundaries. For example, there is the Candidates schema's bounded context. It groups together business concepts/rules/etc. The same applies to the Companies schema.
The only difference is the Common tables schema. This could serve as a shared kernel (a bridge, if you like) between your other schemas, containing all the shared tables that every other schema needs to reach.
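As a rough illustration of the idea, here is a small sketch using SQLite attached databases to stand in for separate schemas (all table and column names are invented); the shared kernel holds the reference data both contexts reach into:

import sqlite3

conn = sqlite3.connect(":memory:")               # "candidates" schema lives in main
conn.execute("ATTACH DATABASE ':memory:' AS companies")
conn.execute("ATTACH DATABASE ':memory:' AS common")

# Shared kernel: reference data every bounded context needs to reach
conn.execute("CREATE TABLE common.cities (id INTEGER PRIMARY KEY, name TEXT)")

# Each bounded context keeps its own tables behind its own boundary
conn.execute("CREATE TABLE candidates (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER)")
conn.execute("CREATE TABLE companies.companies (id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER)")

# Cross-boundary references go through the shared kernel only
conn.execute("INSERT INTO common.cities (name) VALUES ('Madrid')")
conn.execute("INSERT INTO candidates (name, city_id) VALUES ('Ana', 1)")
rows = conn.execute("""
    SELECT ca.name, ci.name
    FROM candidates AS ca JOIN common.cities AS ci ON ci.id = ca.city_id
""").fetchall()
print(rows)   # [('Ana', 'Madrid')]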
Outcome
All that has been said could bring you up to a level where you can:
Backup/restore faster and more conveniently
Scale database instances separately
Easily set/monitor the access of database objects defined per schema
The API
How to glue things
This is the point where it gets really tricky; however, implementing an API is really dependent on your business use case. I personally would design two different public APIs.
Example
For Candidates
For Companies
The same design principles apply here as well. The only difference is that I think there is no added business value in adding an API for the Common tables. It could be just a simple database schema which both of these main APIs could query or send commands to.
In my humble opinion, separating the databases results in some content-management difficulties. Both of these separate parts will contain exactly the same tables, like job positions, cities, business areas, etc. How will you maintain these tables? Will you insert the country "Zimbabwe" into both of them? What if their primary keys are not equal? At some point you will need to use data from these separated databases, and then which record of "Zimbabwe" will be used? I'm not talking about performance, but using the same database for these two parts will make life easier for you. Also, we are in the cloud age and you can scale your single database service/server/droplet as you want. For clarity of modules, you can define naming conventions. For example, if a table is used by both parts, add the prefix "common_"; if a table is only used by candidates, use "candidate_"; etc.
For the API, you can use the same methodology, too. Define 3 different API parts: common, candidates, and companies. But in this case, you should code a well-tested authentication and authorization layer for your API.
If I were you, I'd choose the 1 API, 1 Database.
If it fails, splitting 1 API into 2 APIs or 1 database into 2 databases is much easier than merging them (humble opinion...).

EAV vs Document DB vs XML Column for a system with arbitrary entities?

Despite reading an awful lot of varying opinions and advice online and on SO, I still cannot really decide on the best solution for my current requirements.
In essence I need to make a system where objects can be arbitrarily defined with any number of properties. The application tracks the whereabouts and state of these objects, and I cannot possibly know at compile time what the full gamut of these objects will be (besides, this will be sold to many companies to track whatever they want).
The data itself WILL have some form of relations. The biggest of these will be the notion of a location hierarchy; think Country->Province->Town->Postcode->Building->Room->Locker->Object.
There will be some other Parent->Child relations in the data too. For example an instance of a car has an instance of an engine, has an instance of a piston.
The history of objects and data will be important. What the state of the object has been at various times and places will be a heavily used feature of this system. Being able to retrieve the full history for reporting will be important too.
The options as I see them:
EAV - Entity Attribute Value (or a hybrid of it) in SQL
Pros:
It's relational and normalised
Querying is powerful
The relational and hierarchical parts of the data fit this paradigm
History achieved by storing dates against properties
Cons:
Query complexity
90% of the time every attribute for an object will be required, giving a serious number of joins
Most likely there will be pivots everywhere
Others?
Relational with XML catch-all columns:
Pros:
Relational goodness, ORMs etc, etc (all of above)
How to store the history?
Cons:
The vast majority of the attributes will be in this XML column (say >70%)
Slow queries?
Others?
Document DB
Pros:
Open Schema
History is as simple as retrieving older docs
Cons:
I've got a fair amount of relational data!
Query support (I'm not well enough versed to say what the pros and cons are of each Document DB tech)
Others?
As you can tell, most of my experience has come in the form of relational DBs (SQL). Add to that, I have already prototyped a similar solution with an EAV/relational hybrid in SQL and found it an utter pain when things got even remotely complex.
I'm tech agnostic; I have front-to-back-end experience in lots of techs and am not averse to learning anything new.
What are your thoughts on my situation? The long and short of it is that each of the above is a valid way to solve the problem, but I'm keen to hear what other people think and have experienced, so I can try to avoid any costly blind spots.
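For concreteness, this is roughly the EAV shape I prototyped and the kind of per-attribute join/pivot that became painful (a stripped-down sketch with made-up names; the real thing also had typed value columns and history dates):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entities   (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE values_    (entity_id INTEGER, attribute_id INTEGER,
                         value TEXT, recorded_at TEXT);
""")

# Reassembling one object means one join (or subquery) per attribute...
query = """
SELECT e.id,
       colour.value AS colour,
       weight.value AS weight
FROM entities e
LEFT JOIN values_ colour ON colour.entity_id = e.id
     AND colour.attribute_id = (SELECT id FROM attributes WHERE name = 'colour')
LEFT JOIN values_ weight ON weight.entity_id = e.id
     AND weight.attribute_id = (SELECT id FROM attributes WHERE name = 'weight')
WHERE e.type = 'locker_item'
"""
# ...and since ~90% of reads need every attribute, the pivot grows with the
# attribute count, which is exactly what made the prototype painful.
rows = conn.execute(query).fetchall()   # no data inserted here, so this is just []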

How to migrate data from mongodb to mysql?

I am currently working on an analytics-like application. It has an AngularJS app which communicates with a Spring REST client app, from which the user creates a token (trackingID) and puts a generated script with this ID on his website to collect information about visitors' actions through another Spring REST tracking app. For the tracking app I am using MongoDB to collect visitor actions/visitor info for fast insertion, but the REST client app uses MySQL with user/account details.
My question is how to migrate the Mongo data from the tracking app to MySQL, perhaps to gain the possibility of joins as an easy and fast way to analyze the data with any kind of filters from the AngularJS client app. Should I manually create workers that periodically transfer data from the last point up to the present state from Mongo to MySQL, or are there existing tools that can be set up for this transfer?
There is no official library to do this.
But you can use the mongoexport tool from MongoDB to export the data in CSV format and mysqlimport to import it into MySQL.
Here are links to the documentation: MySQL import and MongoDB Export.
One more method you can try: write a program in your favorite language that reads from MongoDB and writes into MySQL.
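For example, here is a rough sketch in Python of that approach, assuming the pymongo and mysql-connector-python drivers and made-up collection/table/field names (batching, type mapping, and error handling are left out):

from pymongo import MongoClient
import mysql.connector

mongo = MongoClient("mongodb://localhost:27017")
events = mongo["tracking"]["visitor_actions"]          # source collection

mysql_conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="analytics")
cursor = mysql_conn.cursor()

# A periodic worker would remember the last migrated _id and only fetch newer docs
last_seen = None
query = {"_id": {"$gt": last_seen}} if last_seen else {}

for doc in events.find(query):
    cursor.execute(
        "INSERT INTO visitor_actions (mongo_id, tracking_id, action, created_at) "
        "VALUES (%s, %s, %s, %s)",
        (str(doc["_id"]), doc.get("trackingId"),
         doc.get("action"), doc.get("createdAt")))

mysql_conn.commit()
cursor.close()
mysql_conn.close()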
MySQL 5.7 has a new JSON data type, that can be very convenient.
You can create a table at MySQL to receive the JSON messages AS IS, and then use SQL to query it or do a post processing to load the data in a structured set of database tables.
Check this out: https://dev.mysql.com/doc/refman/5.7/en/json.html
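A rough sketch of that approach (again assuming the mysql-connector-python driver and invented names; it needs MySQL 5.7+ for the JSON column and functions):

import json
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app", password="secret", database="analytics")
cur = conn.cursor()

# Land the Mongo documents as-is in a JSON column...
cur.execute("""
    CREATE TABLE IF NOT EXISTS raw_events (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        doc JSON NOT NULL
    )""")
cur.execute("INSERT INTO raw_events (doc) VALUES (%s)",
            (json.dumps({"trackingId": "abc123", "action": "click"}),))
conn.commit()

# ...and query (or later normalize) them with the JSON functions
cur.execute("""
    SELECT JSON_UNQUOTE(JSON_EXTRACT(doc, '$.action'))
    FROM raw_events
    WHERE JSON_UNQUOTE(JSON_EXTRACT(doc, '$.trackingId')) = 'abc123'
""")
print(cur.fetchall())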
I realise this question is a few years old - but recently I've had a number of people enquiring whether a tool I developed (https://virtual.blue/apps/json-converter) can do exactly what the OP is asking (convert MongoDB to SQL) so I am guessing it is still something people want. Keep reading to find out why I am honestly not surprised by this.
The short answer to whether the tool can help you is: perhaps. If your existing data relationships are not too complicated, and your database is not enormous, it may well be worth a try.
However, I thought it might help to try and explain what the issues are with this kind of conversion, since all the answers I have seen so far are along the lines of "try tool X" or "first convert to format Y and then you can slurp it into MySQL using utility Z", i.e. with no thought given to whether what you get at the end is going to make sense in terms of data relationships and integrity.
For example, you could just stick your entire database dump in a single field of a single SQL table (ok space limitations might prevent this in reality, but hopefully you get my point). Then your database would be "in MySQL format", but it would be absolutely no use to anyone.
The point is, what you actually want is a fully defined database model, correctly encapsulating all of the intrinsic data relationships. ("Database normalization" as it is known.) If your conversion process gets those relationships wrong, then you have a broken model, and any queries you try to run over it are likely to return nonsense. Unfortunately there is no magic tool that is just going to "know" the best way to represent your data in MySQL, and closing your eyes and shovelling it into a bunch of random tools is unlikely to miraculously get you what you want.
And herein lies the fundamental problem with the "NoSQL" philosophy (fad). They sold people the bogus notion of "non-relational data". My first thought when I heard this was, "How does that work? Surely all data is relational?" By the looks of things we are steadily getting more and more evidence that my instincts were right. ("NoSQL? Why stop there? I go with 'NoDatabase'. It returns no results at all, but it sure is fast!")
The NoSQL madness throws several important fundamental engineering principles to the wind. We shouted "don't hard code!", "DRY!" (Don't Repeat Yourself) because these actions infuse inflexibility into systems. Traditional wisdom makes precisely the same flexibility argument when it advises "create a fully described model with all the data relationships represented". Then you can execute any arbitrary query over it and expect meaningful results. "Yes but there are a whole bunch of queries we are never going to need to run," says the NoSQL proponent. But surely we learnt our lesson on things we are "never going to need to do"? ("I hard code liberally, because I know I am never going to want to change my code." Hmm...)
The arguments about speed are largely moot. Say it turns out you are frequently doing a complex 9 table join, with unsurprisingly sluggish performance. So create an index. Cache it. Swap some disk space for speed. The NoSQL philosophy is to swap data integrity for speed, which makes no sense at all.
When you generate your fast lookup index (cache/table/map/whatever) what you are really doing is creating a view over your model. If your model changes, you can readily update your view. Going from a model to a view is easy - it's a one to many operation and you are on the right side of entropy.
However, when you went with MongoDB you effectively decided to create views without bothering to describe your fundamental model. Now you discover there are queries you want to run, but can't - and so it's no wonder you want to move over to SQL and actually have your data modelled correctly. The problem is you now want to go from a view to a model. Now you're on the wrong side of entropy. Your view is a lossy representation of the model's fundamental relationships. You can't expect a tool to "translate" your database, because you are asking it to insert new relationships which were not originally defined. These are real world relationships that are not machine-guessable. The tool cannot know what relationships were intended.
In short the only way you can do this reliably is to get your hands dirty. An intelligent human, with complete understanding of the system you are modelling needs to sit down and carefully come up with (possibly a substantial amount of) code which effectively picks through the data and resolves all of the insufficiently represented data relationships. If your data is complex then it's going to be a headache and there is no way to cheat.
If your data is still relatively simple then I would suggest making the conversion as soon as possible, before it becomes difficult. In this case my tool (https://virtual.blue/apps/json-converter) may be able to help.
(They really should have asked a Physicist before they came up with all this nonsense...!)
You can download a trial version of Studio 3T for Mongo and export your database to SQL (or JSON) directly

Any drawbacks of building a website based on a JSON API for the Data Access Layer

For instance, in e-commerce websites we generally have two interfaces: one with which the customer interacts and places orders, and one with which company employees interact to manage orders, customers, etc.
Suppose we divide this website into two different websites; that means two different projects altogether, not dependent on each other. The only thing common to both websites will be the database: both websites will be using the same database. Then what would be a good option for the Data Access Layer?
Each website has its own database access code and entities.
Link both websites with a centralized layer which exposes reads/writes to the database through a JSON-based API.
In my opinion, the second option would be better, as it removes the duplicated dependency on the database: any changes made to database access need not be made in two places. And there are many other benefits.
But my only concern is how much it could hamper the performance of the overall system, because in that case we are serializing and deserializing objects and also making use of HTTP connections.
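To make the second option concrete, this is roughly what I have in mind (a minimal sketch; Flask, SQLite, and the endpoint names are just for illustration): one small service owns all database access, and both websites talk to it over HTTP/JSON.

import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)

def db():
    conn = sqlite3.connect("shop.db")
    conn.row_factory = sqlite3.Row
    return conn

# The only place in the whole system that knows the schema
@app.route("/orders/<int:order_id>")
def get_order(order_id):
    row = db().execute(
        "SELECT id, customer_id, status, total FROM orders WHERE id = ?",
        (order_id,)).fetchone()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify(dict(row))

@app.route("/orders/<int:order_id>/status", methods=["PUT"])
def update_status(order_id):
    conn = db()
    conn.execute("UPDATE orders SET status = ? WHERE id = ?",
                 (request.json["status"], order_id))
    conn.commit()
    return jsonify({"ok": True})

# Both the customer site and the employee site would call these endpoints
# instead of opening their own database connections.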
Could someone please shed some light on the benefits and drawbacks of an API-backed Data Access Layer compared to each website having its own database access code?
People disagree about the best architecture for this sort of thing, but one common and popular architectural guideline suggests that you avoid integrating two products at the database layer at all costs. It is simpler to have two separate apps and databases which can change independently of each other, and if you need to reference data from one in the other you should have some sort of event pipeline between the two, configured on the ESB.
And, you should probably have more than two back end databases anyway -- unless you have an incredibly simple system with only the two classes of objects you mentioned, you'll probably find that you have more than two bounded domains.
Also, if your performance requirements increase then you'll probably want to look at splitting the read and write sides of your services and databases, connecting the two sides through an eventing system of some sort, (maybe event-sourcing).
Before you decide what to do you should read Implementing Domain Driven Design by Vaughn Vernon. And, the paper on CQRS by Martin Fowler. And the paper on event sourcing, also from Dr Fowler. For extra points you should also read Fowler on Microservices architecture.
Finally, on JSON -- and I'm a big fan -- but you should only use it at the repository interface if you're either using javascript on the back end (which is a great idea if you're using io.js and Koa) and the front end (backbone & marionette, please), or if you're using a data-source that natively emits json. If you have to parse it then it's only going to slow you down so use some format native to the data-source and its consumers, that way you'll be as fast as possible.
An API centric approach makes more sense as the data is standardised and gives you more flexibility by being usable in any language for one or multiple interfaces.
Performance wise this would greatly depend on the quality and implementation of the technology stack behind the API. You could also look at caching certain data on the frontend to improve page load time.
The guys over at moltin have already built a platform like this and I've had great success using it. There's already a backend dashboard and the response times are pretty fast too!

Database responsibility

I'm starting with Databases. I've been playing around with MySQL and Informix, but never had a real life project.
What is the real responsibility of a database? Should we add stored procedures and functions to the database, or just let it be a data repository with no logic?
What is the real responsibility of a Database?
A database at its core is a system to store and retrieve data. A CSV file on disk + suitable tools (e.g. Excel) is a simple example of this. In addition, a database might provide additional capabilities, such as transaction control, data integrity, and security.
Should we add stored procedures and functions to the database, or just let it be a data repository with no logic?
What do you want from the database? If all you want is a "bit bucket", then by all means, store it in a plain file on disk and call it "the database". If you want a bit more than that, use a product that suits your needs. If you want to be able to query it using a 4GL like SQL, use MySQL. If you want transaction control, security, advanced query features, etc etc, use another DBMS if appropriate. Whatever product you choose, however, take advantage of that product. Otherwise you're wasting your time and money. Sure, you'll never use all of the features (only a subset will be useful to you), but if you use very few of them, you may as well downgrade to a simpler product.
If you're using Oracle, you can store procedures and functions (even better, whole packages) right there in the database alongside the data. The real question is, what do you need to write in those procedures and functions - business logic or presentation logic?
Personally, I usually prefer to keep business logic close to the data, whereas presentation logic is custom-made for each interface.
It is possible to create an API layer over your data so that no matter how your applications access your database, they will get a consistent view of it, and they will all modify it using a consistent mechanism. In other words, instead of writing the business logic multiple times (once for each interface), you write it once and once only, then re-use it everywhere.
There are two reasons I've heard why business logic should not be stored in the database:
1. Maintainability: it's hard to change. I never really understood this one. How hard is it to type CREATE OR REPLACE PACKAGE? I suspect it's just the burden of having to learn "yet another language".
2. Database independence: what works in Oracle won't work elsewhere. This is a biggie, and better minds than I have written about this one. Basically, if you really need it to be "database agnostic", you won't be able to use any of the advanced features of the database you bought, so you may as well just use the simplest/cheapest one you can find; in which case, you don't need it to work on every database anyway!
Generally it's considered good practice to not place business logic in your database. The main reason is maintainability. It is ok to use stored procedures still, but including business logic within those stored procedures makes your application harder to debug and update.
Including business logic in your database will also effectively tie you to using that one DBMS, and not allow the data layer to remain independent from your application. For example, you may encounter performance and scalability problems with one DB once your application is live, but due to business logic scattered throughout the db, migrating to a more scalable database will be time consuming at best.
If business logic is kept in application code (e.g. Java or C#) and the data layer is abstracted using a data abstraction layer, and an ORM if the language permits, then interchanging databases is much less problematic.
We should be striving for separation of concerns, and keeping business logic out of the db helps achieve that.
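A minimal sketch of what that separation can look like (the names and the credit-limit rule are invented; any DBMS or driver could sit behind the repository):

from abc import ABC, abstractmethod

# Data layer boundary: the only code that knows which DBMS is underneath
class OrderRepository(ABC):
    @abstractmethod
    def outstanding_balance(self, customer_id: int) -> float: ...

    @abstractmethod
    def save_order(self, customer_id: int, amount: float) -> None: ...

# Business layer: pure rules, no SQL, testable without a database
class OrderService:
    CREDIT_LIMIT = 1000.0

    def __init__(self, orders: OrderRepository):
        self.orders = orders

    def place_order(self, customer_id: int, amount: float) -> bool:
        # The business rule lives here rather than in a stored procedure
        if self.orders.outstanding_balance(customer_id) + amount > self.CREDIT_LIMIT:
            return False
        self.orders.save_order(customer_id, amount)
        return True

Because OrderService only ever sees the OrderRepository interface, the same rule is reused by every interface to the system, and the DBMS underneath can be swapped without touching it.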
Edit: There are also performance concerns which may dictate that stored procedures are a good place to keep business logic. Containing logic within the data tier (i.e. the sproc) in some cases reduces the many round trips between the data abstraction layer and the database, which can give a performance boost. I've worked on systems like this in the past, for this reason, but I've always found them difficult to maintain. The problem is that you can look through the classes and procedures, see the business logic, and think that's all of it; you will not see how a particular bug or process can be occurring until you find the stored procedure and discover the other half of the business operation (a real pain when the sproc is 1000 lines!).
As with many things, where you place your business logic depends on the particular problem you're trying to solve.
We have a lot of data around us which can be of great use. An ordered collection of information helps businesses make better decisions. Databases are ordered storage of information.
Responsibility: In a common scenario, there is a lot of information around us. An ordered collection of information relating to an entity is data, and an ordered collection of data, relating to a group of entities, is a database. The system that manages a collection of these databases is a DBMS. The responsibility of the database is organizing information.
Stored procedures and functions are more like the business processes that you require in order to collect the data you need.
First starting point:
Pick a database from {PostgreSQL, MySQL, SQL Server (Express edition)} and install it.
Learn about Codd's rules and normal forms (good resource).
Start learning SQL and write queries.
Understand the basics involved in schema creation.
Learn the procedural language implementation in your database.
Ask doubts on SO.