Should I pass in or encapsulate a connection in a DAO? - language-agnostic

Is it better to encapsulate the connection inside a DAO, i.e. have the DAO create or retrieve the connection and then close it, or is it better to pass the connection into the DAO and handle the details in code external to the DAO?
Follow-up: How do you manage closing connections if you encapsulate the connection inside the DAO?

The DAO should do CRUD operations and hide those operations from the callers. So you should encapsulate the connection.
On the other hand, if the upper layers coordinate several DAOs (e.g. in a single transaction), you could instead pass the connection into the DAOs, and close it at the same level where you opened it, not in the DAOs.
Bottom line is... it really depends on the responsibility that each layer of your application has. Should the callers care where the DAOs are retrieving the data or not? If not, then encapsulate the connections.
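
As a rough sketch of the two options, here is one way they might look in Python, using the standard-library sqlite3 module purely for illustration. The class names, the users table, and register_pair are all hypothetical, not part of any particular framework.

```python
import sqlite3
from contextlib import closing


class EncapsulatedUserDao:
    """Opens and closes its own connection on every call."""

    def __init__(self, db_path):
        self._db_path = db_path  # callers never see a connection object

    def add_user(self, name):
        with closing(sqlite3.connect(self._db_path)) as conn:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


class InjectedUserDao:
    """Uses a connection owned by the caller; never commits or closes it."""

    def __init__(self, conn):
        self._conn = conn

    def add_user(self, name):
        self._conn.execute("INSERT INTO users (name) VALUES (?)", (name,))

    def count_users(self):
        return self._conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def register_pair(conn, first, second):
    """Caller-level transaction spanning two DAO calls on one connection."""
    dao = InjectedUserDao(conn)
    with conn:  # both inserts commit together or not at all
        dao.add_user(first)
        dao.add_user(second)
```

The encapsulated version answers the follow-up as well: the connection is closed in the same method that opened it. The injected version leaves opening, committing, and closing to whoever coordinates the transaction.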

I think you've answered your own question. The basic design pattern says the DAO should create or retrieve the Connection (say, via a Factory) and hide it from callers such as Service tier classes.
http://java.sun.com/blueprints/corej2eepatterns/Patterns/DataAccessObject.html
Do you see any merits in keeping this external?

Coming at this from a pure usability and standards viewpoint, I think you want the DAO to take care of the connections. That is, after all, a main function of data access.
Consider your use: would you want the presentation/business layer code that uses the DAO to know enough about the database to create a connection to pass to the DAO? What if you need to move the database or rename it? At that point it's very nice to have the connections encapsulated.
Using a DAO that manages its own connections also makes for a more terse use of the objects in the calling code, boosting the overall readability, IMO.

I think the key point of a DAO is that you can swap out the implementation without the rest of the application knowing or caring. I've actually done this on a project. The DAO interface remains the same but connection details change so you CAN'T have it visible externally.


How to connect to an arbitrary database using FaaS?

I just did some reading about serverless computing and FaaS. If we use FaaS to access an arbitrary database, we need to establish and close a database connection on every invocation. In, let's say, a Node application, we would usually establish the connection once and reuse it for multiple requests.
Correct?
I have a hosted MongoDB at mlab and thought about implementing a REST API with Google's Cloud Functions service, but I don't know how to handle the database connection efficiently.
Things will surely get clearer while coding and testing, but I would like to know my chances of success before spending a lot of time.
Thanks
Stefan
Serverless platforms reuse the underlying containers between distinct function invocations whenever possible. Hence you can set up a database connection pool in the global function scope and reuse it for subsequent invocations, as long as the container stays warm. GCP has a guide on this that uses MySQL, but I imagine the same applies to MongoDB.
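
A minimal sketch of that idea, written in Python rather than Node and with sqlite3 standing in for a real MongoDB or MySQL client. The names handler and get_connection are hypothetical; a real platform defines its own entry-point signature.

```python
import sqlite3

_conn = None  # module scope: survives between invocations on a warm instance


def get_connection():
    """Create the connection on first use, then reuse it."""
    global _conn
    if _conn is None:
        # sqlite3 is only a stand-in for a real database driver here.
        _conn = sqlite3.connect(":memory:")
    return _conn


def handler(request):
    """Hypothetical entry point; the platform would call this per request."""
    conn = get_connection()
    return conn.execute("SELECT 1").fetchone()[0]
```

A cold start pays the connection cost once. Later invocations on the same instance skip it, but nothing guarantees the instance stays warm, so the code should not assume the connection already exists.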

Should I close mySQL connection in-between method calls?

So this question is a matter of good idea/bad idea. I am using a MySQL connection many times in a short amount of time. I have created my own method calls to update values, insert, delete, etc.
I am reusing the same connection for each of these methods, but I am opening and closing the connection at each call. The problem being that I need to check to make sure that the connection is not open before I try to open it again.
So, the question is: Is there danger in just leaving the MySQL connection open in between method calls? I'd like to just leave it open and possibly improve speed while I am at it.
Thanks for any advice!
Generally speaking, no, you shouldn't close it if you're just going to open it again in the same class / library / code scope.
This depends on the tooling / connection library you're using. If you're using connection pooling, some libraries will not actually close the connection (immediately) but return it to the pool.
The only comment I'll make about reusing a connection is that if you're using connection-specific variables, they will still be valid on the same connection and may cause problems later if another query uses one of them while it holds a value from a past query that is no longer relevant. However, this would also raise questions about the suitability of the variable in the first place.
Opening a connection in MySQL is fairly light (compared with other databases); however, you shouldn't create extra work if you can avoid it.
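
To illustrate what "return it to the pool" means, here is a toy pool in Python. SimplePool is a hypothetical class written for this answer, and sqlite3 stands in for a MySQL driver; real pooling libraries add health checks, timeouts, and thread-safety guarantees that this sketch leaves out.

```python
import queue
import sqlite3
from contextlib import contextmanager


class SimplePool:
    """Keeps a fixed set of open connections and hands them out on request."""

    def __init__(self, db_path, size=2):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(sqlite3.connect(db_path))

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # "closing" just returns it to the pool

    def close_all(self):
        """Really close the idle connections, e.g. at shutdown."""
        while not self._idle.empty():
            self._idle.get_nowait().close()
```

With something like this, each method can still "open" and "close" around its own work without paying for a new connection every time.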

Class should support an interface but this requires adding logic to the class in an intrusive way. Can we prevent this?

I have a C++ application that loads lots of data from a database, then executes algorithms on that data (these algorithms are quite CPU- and data-intensive, which is why I load all the data beforehand), then saves all the data that has been changed back to the database.
The database part is nicely separated from the rest of the application. In fact, the application does not need to know where the data comes from. The application could even be started from files (in this case a separate file module loads the files into the application and at the end saves all data back to the files).
Now:
- The database layer only wants to save the changed instances back to the database (not the full data), therefore it needs to know what has been changed by the application.
- On the other hand, the application doesn't need to know where the data comes from, hence it does not want to feel forced to keep a change-state per instance of its data.
To keep my application and its datastructures as separate as possible from the layer that loads and saves the data (could be database or could be file), I don't want to pollute the application data structures with information about whether instances were changed since startup or not.
But to make the database layer as efficient as possible, it needs a way to determine which data has been changed by the application.
Duplicating all data and comparing the data while saving is not an option since the data could easily fill several GB of memory.
Adding observers to the application data structures is not an option either since performance within the application algorithms is very important (and looping over all observers and calling virtual functions may cause an important performance bottleneck in the algorithms).
Any other solution? Or am I trying to be too 'modular' if I don't want to add logic to my application classes in an intrusive way? Is it better to be pragmatic in these cases?
How do ORM tools solve this problem? Do they also force application classes to keep a kind of change-state, or do they force the classes to have change-observers?
If you can't copy the data and compare, then clearly you need some kind of record somewhere of what has changed. The question, then, is how to update those records.
ORM tools can (if they want) solve the problem by keeping flags in the objects, saying whether the data has been changed or not, and if so what. It sounds as though you're making raw data structures available to the application, rather than objects with neatly encapsulated mutators that could update flags.
So an ORM doesn't normally require applications to track changes in any great detail. The application generally has to say which object(s) to save, but the ORM then works out what needs persisting to the DB in order to do that, and might apply optimizations there.
I guess that means that in your terms, the ORM is adding observers to the data structures in some loose sense. It's not an external observer, it's the object knowing how to mutate itself, but of course there's some overhead to recording what has changed.
One option would be to provide "slow" mutators for your data structures, which update flags, and also "fast" direct access, and a function that marks the object dirty. It would then be the application's choice whether to use the potentially-slower mutators that permit it to ignore the issue, or the potentially-faster mutators which require it to mark the object dirty before it starts (or after it finishes, perhaps, depending what you do about transactions and inconsistent intermediate states).
You would then have two basic situations:
- I'm looping over a very large set of objects, conditionally making a single change to a few of them. Use the "slow" mutators, for application simplicity.
- I'm making lots of different changes to the same object, and I really care about the performance of the accessors. Use the "fast" mutators, which perhaps directly expose some array in the data. You gain performance in return for knowing more about the persistence model.
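
Here is a small sketch of that slow/fast split. It is in Python rather than C++ only to keep it short, and Record, set_value, mark_dirty, and save_changed are hypothetical names, not taken from any ORM.

```python
class Record:
    """Hypothetical data object with an opt-in dirty flag."""

    __slots__ = ("values", "dirty")

    def __init__(self, values):
        self.values = list(values)  # raw storage, exposed for the fast path
        self.dirty = False

    def set_value(self, index, value):
        """'Slow' mutator: updates the flag on every write."""
        self.values[index] = value
        self.dirty = True

    def mark_dirty(self):
        """For the 'fast' path: flag once, then write to values directly."""
        self.dirty = True


def save_changed(records, write):
    """Persist only dirty records via the write callback, then clear flags."""
    saved = 0
    for record in records:
        if record.dirty:
            write(record)
            record.dirty = False
            saved += 1
    return saved
```

The persistence layer only ever looks at the flag, so it stays independent of whether the application used the slow mutator or the fast direct access plus mark_dirty.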
There are only two hard problems in Computer Science: cache invalidation and naming things.
Phil Karlton

Is it a bad idea to open a separate MySQL connection inside Rails' environment?

I'm in a situation where I need to make a call to a stored procedure from Rails. I can do it, but it either breaks the MySQL connection, or is a pseudo hack that requires weird changes to the stored procs. Plus the pseudo hack can't return large sets of data.
Right now my solution is to use system() and call the mysql command line directly. I'm thinking that a less sad solution would be to open my own MySQL connection independent of Active Record's connection.
I don't know of any reasons why this would be bad. But I also don't know the innards of MySQL well enough to know it's 100% safe.
It would solve my problem neatly, in that the controller actions that need to call a stored proc would open a fresh database connection, make the call and close it. I might sacrifice some performance, but if it works that's good enough. It also solves the issue of multiple users in the same process (we use mongrel, currently) in edge Rails, where it's now finally thread safe, as the hack requires two SQL queries and I don't think I can guarantee I'm using the same database connection via Active Record.
So, is this a bad idea and/or dangerous?
Ruby on Rails generally eschews stored procedures, or implementing any other business logic in the database. One might say that you're not following "the Rails way" to be calling a stored proc in the first place.
But if you must call the stored proc, IMO opening a second connection from Ruby must be preferable to shelling out with system(). The latter method would open a second connection to MySQL anyway, plus it would incur the overhead of forking a process to run the mysql client.
You should check out "Enterprise Recipes with Ruby and Rails" by Maik Schmidt. It has a chapter on calling stored procedures from Rails.
MySQL can handle more than one connection per request, though it will increase the load on the database server. You should open the second connection in a 'lazy' manner, only when you are sure you need it on a given request.
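
For what "lazy" could look like, here is a language-neutral sketch in Python (not Rails code). LazyConnection is a hypothetical helper, and sqlite3 only stands in for a MySQL client in the usage below.

```python
import sqlite3


class LazyConnection:
    """Opens the underlying connection only on first use."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument factory supplied by the caller
        self._conn = None

    @property
    def opened(self):
        return self._conn is not None

    def get(self):
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def close(self):
        """Close the connection if one was ever opened."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

A request would create one of these up front, for example LazyConnection(lambda: sqlite3.connect(":memory:")), call get() only in the code path that runs the stored procedure, and call close() at the end. Requests that never reach that path never open the second connection.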
Anyway, if performance were important in this application, you wouldn't be using Rails! >:-)
(joking!)
Considering how firmly RoR is intertwined with its own view of DBMS usage, you probably should open a second connection to the database for any interaction it doesn't manage for you, just for SoC purposes if nothing else. It sounds from your description like it's the simplest approach as well, which is usually a strong positive sign.
Applications in other languages (esp. PHP) open multiple connections regularly (which doesn't make it desirable, but at least it demonstrates that MySQL won't object).
We've since tried the latest mysql gem from github and even that doesn't solve the problem.
We've patched the mysql adapter in Rails and that actually does work. All it does is make sure the MySQL connection has no more results before continuing on.
I'm not accepting this answer, yet, because I don't feel 100% that the fix is a good one. We haven't done quite enough testing. But I wanted to put it out there for anyone else looking at this question.

What's the best way to access the database inside a class in PHP?

I have a session class that needs to store session information in a MySQL database. Obviously I will need to query the database in the methods of this class. In general I may need to connect to more than one database simultaneously and may or may not be connected to that database already.
Given that, what's the best way to access databases for the session class, or any class for that matter? Would creating a class to manage connections make sense?
I'd advise checking out this presentation; among other things, it talks about best practices when accessing a database:
http://laurat.blogs.com/talks/best_practices.pdf
Database Connections are a prime example of when and where you can safely use a Singleton pattern; however, if you know that the Session Object will be a global object and it will be the only place that you need to create Database Connections, you could pretty safely store the db connections as instance members of the Session Class.
Yes, I would use a DBAL. Either you can write your own, or you can use an existing solution like PDO. Even if using an existing solution, you may want to write a wrapper class that uses the singleton pattern so that a single connection can be shared with all parts of your code.
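
A rough sketch of such a wrapper, shown in Python instead of PHP to keep it short. ConnectionManager is a hypothetical class, and sqlite3 stands in for PDO or another DBAL. It is keyed by a name so that more than one database can be open at once, as the question requires.

```python
import sqlite3


class ConnectionManager:
    """Process-wide registry: at most one open connection per database name."""

    _instance = None

    def __init__(self):
        self._connections = {}

    @classmethod
    def instance(cls):
        """Return the single shared manager, creating it on first call."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def get(self, name, db_path=":memory:"):
        """Return the connection for name, opening it on first request."""
        if name not in self._connections:
            self._connections[name] = sqlite3.connect(db_path)
        return self._connections[name]

    def close_all(self):
        for conn in self._connections.values():
            conn.close()
        self._connections.clear()
```

The session class would then ask ConnectionManager.instance().get("sessions") whenever it needs to query, without caring whether that database is already connected.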