Testing framework for data access tier - MySQL

Is there any testing framework for the data access tier? I'm using a MySQL database.

If you are using an ORM (such as Hibernate), then testing the DAL is easy: specify a test configuration that points at an in-memory SQLite database, and execute all your DAL tests against that SQLite instance. Of course, you need to handle schema definition and data population properly first.
DbUnit will help you here.
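As a minimal illustration of that setup in Python with SQLAlchemy (an assumed stack chosen for brevity; with Hibernate you would point a test persistence configuration at an equivalent in-memory URL):

    # Sketch: run DAL tests against a throwaway in-memory SQLite database.
    from sqlalchemy import create_engine, text

    # Fresh in-memory database per test run; no server required.
    engine = create_engine("sqlite:///:memory:")

    with engine.begin() as conn:
        # Schema definition and data population come first, as noted
        # above (DbUnit plays this role on the Java side).
        conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
        conn.execute(text("INSERT INTO users (name) VALUES ('alice')"))

    with engine.connect() as conn:
        # The DAL test itself: exercise a query and assert on the result.
        row = conn.execute(text("SELECT name FROM users WHERE id = 1")).one()
        assert row.name == "alice"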

Why do you need a database test tool?
Use your services (or DAOs) to populate the database. Otherwise you're going to duplicate your fixture state in your tests and your domain logic in your fixtures. This will result in worse maintainability (most notably readability).
If you get weary of inventing test data, consider tools like QuickCheck (there are ports for all major languages).
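For example, with Hypothesis (the Python QuickCheck port), generated data replaces hand-written fixtures; the tiny in-memory DAO below is a hypothetical stand-in for a real data layer:

    # Sketch of QuickCheck-style data generation with Hypothesis.
    from hypothesis import given, strategies as st

    _rows = {}  # fake storage standing in for the real database

    def save_user(name):      # hypothetical DAO method
        user_id = len(_rows) + 1
        _rows[user_id] = name
        return user_id

    def load_user(user_id):   # hypothetical DAO method
        return _rows[user_id]

    @given(name=st.text(min_size=1, max_size=50))
    def test_user_round_trip(name):
        # Hypothesis invents the names, so we never hand-write test data.
        assert load_user(save_user(name)) == name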


Doctrine, SQLite and Enums

We have an application running on Symfony 2.8 with a package named "liip/functional-test-bundle". We plan on using PHPUnit to run functional tests on our application, which uses MySQL for its database.
The 'functional test bundle' package allows us to use the entities as a schema builder for an in-memory SQLite database, which is very handy because:
It requires zero configuration to run
It's extremely fast to run tests
Our tests can be run independently of each other and of the development data
Unfortunately, some of our entities use enums, which SQLite does not support, and our technical lead has opted to keep the existing enums while refraining from introducing new ones.
Ideally we need this in the project sooner rather than later, so the team can start writing new tests in the future to help maintain the stability of the application.
I have 3 options at this point, but I need help choosing the correct one and performing it correctly:
Convince the technical lead that enums are a bad idea and that lookup tables could be used instead (which may cost time when the workload is already high)
Switch to using MySQL for the testing database (this will require additional configuration for our tests to run, and may be slower)
Have Doctrine detect when enums are used on a SQLite driver and swap them out for strings (I would have no idea how to do this, but this is, in my opinion, the ideal solution; a sketch of the idea follows below)
Which action is the best, and how should I carry it out?
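For what option 3 amounts to, here is the same idea sketched with Python's SQLAlchemy, purely for illustration (the project's actual stack is Doctrine, where the analogous lever would be a custom DBAL type or a platform type mapping):

    # Illustration only: option 3 expressed in SQLAlchemy terms. The
    # column is a native ENUM on MySQL but degrades to a plain string
    # whenever the SQLite driver is in use.
    from sqlalchemy import Column, Enum, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    STATUS = Enum("active", "disabled", name="status_enum").with_variant(
        String(20), "sqlite"  # SQLite has no ENUM: fall back to text
    )

    class Account(Base):
        __tablename__ = "account"
        id = Column(Integer, primary_key=True)
        status = Column(STATUS, nullable=False)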

SQLite3 database per customer

Scenario:
Building a commercial app consisting of a RESTful backend in Symfony2 and a frontend in AngularJS
This app will never be used by many customers (if I get to sell 100, that would be fantastic; hopefully many more, but in any case it will never be massive)
I want a multi-tenant structure for the database, with one schema per customer (they store sensitive information about their customers)
I'm aware of the problems with updating schemas, but I will have to live with that.
Today I have a MySQL demo database that I will clone each time a new customer purchases the app.
There is no relationship between my customers, so I don't need to communicate with multiple shards for any query
For one customer, several devices can be using the app at the same time, but there won't be massive write operations on the DB
My question
Trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
However, I wonder whether it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and whether it's common practice to have one dedicated SQLite3 database PER CLIENT. I've never used SQLite, and I have no idea whether updating a schema and replicating the changes across all the databases works the same way as in other RDBMSs.
Is this a correct scenario for SQLite?
Any suggestions (e.g. a tutorial) on how to achieve this?
[I wonder] whether it's common practice to have one dedicated SQLite3 database PER CLIENT
Only if the database is deployed along with the application, like on a phone. Otherwise I've never heard of such a thing.
I've never used SQLite, and I have no idea whether updating a schema and replicating the changes across all the databases works the same way as in other RDBMSs
SQLite is a SQL database and responds to ALTER TABLE and the like. As for updating all the schemas: you'll have to re-run the update against every database.
Schema syncing is usually handled by an outside utility; your ORM will typically have something. Some are server-agnostic, some only support specific servers. There are also dedicated database change management tools such as Sqitch.
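As a concrete illustration of "re-run the update against every database", here is a minimal Python sketch that applies one schema change to a directory of per-tenant SQLite files (paths and DDL are illustrative; a real project would delegate this to a migration tool like Sqitch):

    # Sketch: roll one schema change out to every per-tenant SQLite file.
    import glob
    import sqlite3

    # The schema change to apply; DDL and paths are illustrative.
    MIGRATION = "ALTER TABLE customer ADD COLUMN vat_number TEXT"

    for path in glob.glob("tenants/*.sqlite"):
        # sqlite3's connection context manager commits on success.
        with sqlite3.connect(path) as conn:
            conn.execute(MIGRATION)  # fails loudly if already applied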
However, I wonder whether it's also a good idea to switch from MySQL to SQLite3 as the main database for the application, and
SQLite's main advantage is that it doesn't require you to install and run a server. That makes sense for quick projects, or where you have to deploy the database alongside the application, as with a phone app. For a server-based application, there's no problem with running a database server, and SQLite's very restricted set of SQL features becomes a disadvantage. It will also likely run slower than a server database for anything but the simplest queries.
Trying to set up some functional tests for the backend API, I read about having a dedicated SQLite database for loading test data, which seems like a good idea.
Under no circumstances should you test against a different database than the one you run in production. Databases do not all implement SQL the same way (MySQL is particularly bad about this), and your tests will not reflect reality. Running a MySQL instance for testing is not much work.
This separate-schema idea claims three advantages:
Extensibility (you can add fields whenever you like)
Security (a query cannot accidentally show data for the wrong tenant)
Parallel Scaling (you can potentially split each schema onto a different server)
What they're proposing is equivalent to having a separate, customized copy of the code for every tenant. You wouldn't do that; it's obviously a maintenance nightmare. Code at least has the advantage of version control systems with branching and merging. I know of only one database change management tool that supports branching: Sqitch.
Let's imagine you've made a custom change to tenant 5's schema. Now you have a general schema change you'd like to apply to all of them. What if the change to 5 conflicts with this? What if the change to 5 requires special data migration different from everybody else? Now let's imagine you've made custom changes to ten schemas. A hundred. A thousand? Nightmare.
Different schemas will require different queries. The application will have to know which schema each tenant is using, there will have to be some sort of schema version map you'll need to maintain. And every different possible query for every different possible schema will have to be maintained in the application code. Nightmare.
Yes, putting each tenant in a separate schema is more secure, but that only protects against writing bad queries or including a query builder (which is a bad idea anyway). There are better ways to mitigate the problem, such as the view filter suggested in the docs. There are many other ways an attacker can access tenant data that this doesn't address: gaining a database connection, gaining access to the filesystem, sniffing network traffic. I don't see the small security gain being worth the maintenance nightmare.
As for scaling, the article is ten years out of date. There are far, far better ways to achieve parallel scaling than to coarsely put schemas on different servers; there are entire databases dedicated to this idea. Fortunately, you don't need any of this! Scaling won't be a problem for you until you have tens of thousands to millions of tenants. The idea of front-loading your design with a schema maintenance nightmare for a hypothetical big parallel-scaling problem is putting the cart so far before the horse, it's already at the pub having a pint.
If you want to use a relational database, I would recommend PostgreSQL. It has a very rich SQL implementation, it's fast and scales well, and it has something that renders this whole idea of separate schemas moot: a built-in JSON type. This can be used to implement the "extensibility" mentioned in the article. Each table can have a meta column using the JSON type into which you can throw any extra data you like. The application does not need special queries; the meta column is always there. PostgreSQL's JSON operators make working with the metadata very easy and efficient.
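For illustration, a minimal sketch of the meta-column idea using psycopg2 (connection string, table, and field names are all assumptions, not from the question):

    # Sketch: per-tenant extra fields live in a jsonb column instead of
    # per-tenant schemas. All names here are illustrative.
    import psycopg2
    from psycopg2.extras import Json

    conn = psycopg2.connect("dbname=app")  # connection string is assumed
    cur = conn.cursor()

    # One shared table; tenant-specific extras go into the jsonb column.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS product (
            id     serial PRIMARY KEY,
            tenant integer NOT NULL,
            name   text NOT NULL,
            meta   jsonb NOT NULL DEFAULT '{}'
        )
    """)

    # Tenant 5 needs an extra "warranty_months" field: no schema change.
    cur.execute(
        "INSERT INTO product (tenant, name, meta) VALUES (%s, %s, %s)",
        (5, "Widget", Json({"warranty_months": 24})),
    )

    # PostgreSQL's JSON operators query the extra field directly.
    cur.execute(
        "SELECT name FROM product WHERE (meta->>'warranty_months')::int >= %s",
        (12,),
    )
    print(cur.fetchall())
    conn.commit()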
You could also look into a NoSQL database. There are plenty to choose from and many support custom schemas and parallel scaling. However, it's likely you will have to change your choice of framework to use one that supports NoSQL.

Performing a join across multiple heterogeneous databases, e.g. PostgreSQL and MySQL

There's a project I'm working on, kind of a distributed database thing.
I started by creating the conceptual schema, and I've partitioned the tables such that I may require to perform joins between tables in MySQL and PostgreSQL.
I know I can write some sort of middleware that will break down the SQL queries and issue sub-queries targeting individual DBs, then merge the results, but I'd like to do this using SQL if possible.
My search so far has yielded this (the Federated storage engine for MySQL), but it seems to work only between MySQL databases.
If it's possible, I'd appreciate some pointers on what to look at, preferably in Python.
Thanks.
It might take some time to set up, but PrestoDB is a valid open-source solution to consider.
See https://prestodb.io/
You connect to Presto with JDBC and send it the SQL; it interprets the configured connections, dispatches sub-queries to the different sources, then does the final work on the Presto node before returning the result.
From the Postgres side, you can try using a foreign data wrapper such as mysql_fdw. Queries with joins can then be run through various Postgres clients, such as psql, pgAdmin, psycopg2 (for Python), etc.
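A hedged sketch of what the foreign-data-wrapper route looks like end to end (all names, credentials, and tables are illustrative; see the mysql_fdw docs for the real options):

    # Sketch: once a MySQL table is exposed to PostgreSQL through a
    # foreign data wrapper, the cross-database join is plain SQL.
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()

    # One-time server-side setup, assuming mysql_fdw is installed:
    #   CREATE EXTENSION mysql_fdw;
    #   CREATE SERVER mysql_srv FOREIGN DATA WRAPPER mysql_fdw
    #       OPTIONS (host '127.0.0.1', port '3306');
    #   CREATE USER MAPPING FOR CURRENT_USER SERVER mysql_srv
    #       OPTIONS (username 'app', password 'secret');
    #   CREATE FOREIGN TABLE mysql_orders (id int, customer_id int)
    #       SERVER mysql_srv OPTIONS (dbname 'shop', table_name 'orders');

    # After that, Postgres plans the join across both databases itself.
    cur.execute("""
        SELECT c.name, count(*)
        FROM customers c        -- native PostgreSQL table
        JOIN mysql_orders o     -- foreign table backed by MySQL
          ON o.customer_id = c.id
        GROUP BY c.name
    """)
    print(cur.fetchall())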
This is not possible with SQL.
One option is to write your own "middleware", as you hinted at. To do that in Python, you would use the standard DB-API drivers for both databases, write individual queries, and then merge their results. An ORM like SQLAlchemy will go a long way to help with that.
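A minimal sketch of that merge step (the drivers pymysql/psycopg2 and all table names are assumptions):

    # Sketch: query each database with its own DB-API driver and do the
    # "join" in Python.
    import psycopg2
    import pymysql

    pg = psycopg2.connect("dbname=app")
    my = pymysql.connect(host="localhost", user="app", database="shop")

    with pg.cursor() as cur:
        cur.execute("SELECT id, name FROM customers")
        names = dict(cur.fetchall())  # customer_id -> name

    with my.cursor() as cur:
        cur.execute("SELECT customer_id, total FROM orders")
        # Merge step: look each MySQL row's key up in the PostgreSQL map.
        merged = [(names[cid], total)
                  for cid, total in cur.fetchall() if cid in names]

    print(merged)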
The other option is to use an integration layer. There are many options out there; however, none that I know of is written in Python. Mule ESB, Apache ServiceMix, WSO2, and JBoss MetaMatrix are some of the more popular ones.
You can colocate the data on a single RDBMS node (either PostgreSQL or MySQL for example).
Two main approaches:
Read-only: use read replicas of both source systems, then use a process to copy the data to a new writeable converged node; OR
Primary: choose one of the two databases as the primary and move the data from the other into it using a conversion process (e.g. ETL or off-the-shelf table-level replication)
Then you can just run the query on the one RDBMS with JOINs as usual.
BONUS: you can also do log reading from an RDBMS that can ship its logs through Kafka. You can make this as complex as required.
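As a toy illustration of the "primary" approach, here is a sketch that copies one MySQL table into PostgreSQL so that subsequent joins run on a single node (drivers and table names are assumptions; a real conversion would use an ETL or replication tool):

    # Sketch: minimal conversion step moving a table to the primary node.
    import psycopg2
    import pymysql

    src = pymysql.connect(host="localhost", user="app", database="shop")
    dst = psycopg2.connect("dbname=app")

    with src.cursor() as read, dst.cursor() as write:
        read.execute("SELECT id, customer_id, total FROM orders")
        write.executemany(
            "INSERT INTO orders (id, customer_id, total) VALUES (%s, %s, %s)",
            read.fetchall(),
        )
    dst.commit()  # joins can now run entirely inside PostgreSQL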

MySQL: How do I test my database architecture (foreign key consistency, stored procedures, etc)

I'm just about to design a larger database architecture. It will contain a set of tables, several views, and quite a few stored procedures. Since it's a larger database and at a very early stage of development (actually, it's still only in the design stage), I feel the need for a test suite to verify integrity during refactoring.
I'm quite familiar with testing concepts as far as application logic is concerned, both on server side (mainly PHPUnit) and client side (Selenium and the Android test infrastructure).
But how do I test my database architecture?
Are there similar testing strategies and tools for databases in general, and MySQL in particular?
How do I verify that my views, stored procedures, triggers and God-knows-what are still valid after I change an underlying table?
Do I have to wrap the database with, say, a PHP layer to enable testing of database logic (stored procedures, triggers, etc)?
There are two sides of database testing.
One is oriented toward testing the database from the business-logic point of view and should not concern itself with persisted data. At that level there is a well-known technique: the ORM. The algorithm is simple: describe a model and create a set of test cases or criteria to check that all cascading actions perform as they should (I mean, if we create a Product and link it to a Category, then after saving the session we should get all entities written to the DB with all required relations between them). More to say: some ORMs already provide a unit-testing module (for example, NHibernate), and some offer an even cooler tool, giving the easiest and fastest way to create database schemas, models, and test cases: for example, Fluent NHibernate.
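The round-trip test described above, sketched in Python with SQLAlchemy (the answer names NHibernate; the technique is the same, and all model names are illustrative):

    # Sketch: save a Product linked to a Category and assert the cascade
    # persisted both entities with the relation intact.
    from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
    from sqlalchemy.orm import declarative_base, relationship, Session

    Base = declarative_base()

    class Category(Base):
        __tablename__ = "category"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        products = relationship("Product", back_populates="category",
                                cascade="all")

    class Product(Base):
        __tablename__ = "product"
        id = Column(Integer, primary_key=True)
        name = Column(String, nullable=False)
        category_id = Column(Integer, ForeignKey("category.id"))
        category = relationship("Category", back_populates="products")

    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Category(name="Books", products=[Product(name="SQL 101")]))
        session.commit()
        # The cascade should have written both rows, relation included.
        saved = session.query(Product).one()
        assert saved.category.name == "Books"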
The second is oriented toward testing the database schema itself. For that purpose you can look at a good library, DbUnit. A quote from the official site:
DbUnit is a JUnit extension (also usable with Ant) targeted at database-driven projects that, among other things, puts your database into a known state between test runs. DbUnit has the ability to export and import your database data to and from XML datasets. Since version 2.0, DbUnit can also work with very large datasets when used in streaming mode. DbUnit can also help you to verify that your database data match an expected set of values.
Finally, I highly recommend reading the article "Evolutionary Database Design" on Martin Fowler's site. It's a bit dated (2003), but still well worth reading.
To test a database, some of the things you will need are:
A test database containing all your data test cases, initial data, and so on. This will enable you to test from a known start position each time.
A set of transactions (INSERT, DELETE, UPDATE) that move your database through the states you want to test. These can themselves be stored in the test database.
Your set of tests, expressed as queries on the database, which do the actual checking of the results of your actions. These results will be verified by your test suite.
A database can throw exceptions, but if you are getting exceptions, you likely have much more serious problems with your database and data. You can test the error behaviour of the database in a similar fashion, but except for corner cases this should rarely be necessary, as modern database engines are quite robust at their task of serving data.
You should not need to wrap your database with a PHP layer: if you follow the structure above, it should be possible to keep your complete database test suite in the DML and DDL of the actual database, alongside your normal test suite.
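A compact sketch of that three-part structure (fixture, transaction, checking query), using Python's built-in sqlite3 for brevity; against a MySQL test instance the structure is identical:

    # Sketch: known start state, transaction under test, query as the test.
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # 1. Known start position: fixture schema and data.
    conn.executescript("""
        CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
        INSERT INTO account VALUES (1, 100), (2, 0);
    """)

    # 2. The transaction that moves the database through the tested state.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()

    # 3. The test itself, expressed as a query on the database.
    total, = conn.execute("SELECT sum(balance) FROM account").fetchone()
    assert total == 100  # money moved, none created or destroyed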

Database choices

I have a prickly design issue regarding the choice of database technologies to use for a group of new applications. The final suite of applications would have the following database requirements...
Central databases (more than one) using MySQL (it must be MySQL because of justhost.com).
An application will be written that accesses the multiple MySQL databases on the web host. This application will also write to a local serverless database (SQLite/Firebird/VistaDB/whatever).
Different flavors of this application will be created for Windows (.NET), Windows Mobile, Android if possible, and iPhone if possible.
So, the design task is to minimise the amount of code needed to achieve this. That is going to be tricky, since the languages used are already C#/Java (Android) and Objective-C (iPhone). I'm not too worried about that, but can the work required to implement the various database access layers be minimised?
The serverless database will hold data similar to the MySQL server's, so some kind of inheritance in the DAL would be useful.
I'm looking at Hibernate/NHibernate, and there is LINQ-to-whatever. So many choices!
Get a better host. Seriously: SQL Server hosts don't cost that much more; the difference is possibly an hour of development time per month, and even that is a generous estimate.
Otherwise, throw out the stuff you do not need and standardize on one language. If the data is accessed over the internet, check out OData for exposing it; it's a nice language-independent protocol.
The rest is architecture. And LINQ (to SQL) sucks, compared to NHibernate ;)
but can the database access layer be reused?
Yes, it can, but you have to carefully create a loosely coupled data layer with no dependencies on other parts.
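To make "loosely coupled" concrete, here is a small sketch of the shape in Python (the question's real stacks, C#/Java/Objective-C, would express the same thing with interfaces; all names are illustrative):

    # Sketch: the application codes against an abstract repository, so
    # the MySQL and SQLite backends are interchangeable details.
    from abc import ABC, abstractmethod

    class UserRepository(ABC):
        @abstractmethod
        def find(self, user_id: int) -> dict: ...

    class MySqlUserRepository(UserRepository):
        def __init__(self, conn):
            self.conn = conn  # server-side database on the web host

        def find(self, user_id):
            cur = self.conn.cursor()
            cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()
            return {"id": row[0], "name": row[1]}

    class SqliteUserRepository(UserRepository):
        def __init__(self, conn):
            self.conn = conn  # local serverless database

        def find(self, user_id):
            cur = self.conn.execute(
                "SELECT id, name FROM users WHERE id = ?", (user_id,))
            row = cur.fetchone()
            return {"id": row[0], "name": row[1]}

    # Application code depends only on UserRepository, never on a backend.
    def greet(repo: UserRepository, user_id: int) -> str:
        return f"Hello, {repo.find(user_id)['name']}!"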