Pattern for compile-time discovery of unsafe/wrong queries - MySQL

My team is developing a large java application which extensively queries a MySQL database (in different classes and modules).
I'd like to know if there is a pattern that allows me to be notified at compile time when a query refers to a wrong table structure (for instance, if I remove or add a field on a table and a query string still refers to it), in order to prevent runtime errors.
This should also work for JOIN queries.
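For instance, with plain JDBC the query lives in a string the compiler never checks; a minimal sketch of the failure mode (table and column names illustrative):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class FragileQuery {
    // Compiles even after the "name" column is dropped from the "test"
    // table; the error only shows up at runtime as an SQLException.
    static long countMissingNames(Connection conn) throws Exception {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "select count(*) from test where name is null")) {
            rs.next();
            return rs.getLong(1);
        }
    }
}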

Querydsl is similar to LiquidForm and supports both JPA/Hibernate and SQL-based backends.
For the SQL-based version we currently support MySQL (5.? tested), Oracle (10g tested) and HSQLDB.
In a nutshell, a query like this
select count(*) from test where name is null
would become
long count = query.from(test).where(test.name.isNull()).count();
Querydsl SQL uses code generation to reflect SQL schemas into Java classes.
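For illustration, a minimal sketch of how the generated classes might be used, assuming code generation has produced a QTest class for a "test" table (Querydsl 3.x-era API):

import com.mysema.query.sql.MySQLTemplates;
import com.mysema.query.sql.SQLQuery;
import java.sql.Connection;

public class CountNullNames {
    static long count(Connection conn) {
        QTest test = QTest.test; // generated query class for the "test" table
        SQLQuery query = new SQLQuery(conn, new MySQLTemplates());
        // If the "name" column is dropped and the classes are regenerated,
        // test.name no longer exists and this line fails to compile.
        return query.from(test).where(test.name.isNull()).count();
    }
}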

There's an open-source tool called DODS (Data Object Design Studio) that could do what you want. The DODS tool was originally part of the Enhydra Java application server project, and since the company backing that project went kablooey in 2002, DODS has been hosted and maintained at ObjectWeb. Anyway, it's open-source (LGPL).
http://forge.objectweb.org/projects/dods
The concept is that you describe your schema in an XML file, and DODS generates Java POJO classes with which you can query and manipulate the database tables. Of course every time you change your schema, you need to run DODS again to re-generate the ORM classes, and recompile your app against them.
But the result is that if a table or column disappears and your app still queries it, you do get a compile-time error, because your code now calls a generated class or method that no longer exists.
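To illustrate with hypothetical names (the exact shape of the generated API varies by DODS version), querying through a generated class looks roughly like this:

// CustomerQuery and CustomerDO stand in for classes DODS would generate
// from a "Customer" table described in the XML schema file.
CustomerQuery query = new CustomerQuery();
query.setQueryName("Acme");             // generated setter for the "name" column
CustomerDO[] rows = query.getDOArray(); // runs the query, returns data objects
// Remove the "name" column from the XML descriptor and re-run DODS:
// setQueryName() disappears, and the line above no longer compiles.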

I would say that the simple answer is "no". The more complete answer is "yes, to some degree", depending on your willingness to jump through hoops.
Unless you have a Java representation of your database schema, you will never get compile-time notification that your queries are wrong (such classes can be generated). You must also use those classes to build your queries, so the approach you use today (query strings) has to be abandoned.
Building queries from Java classes additionally requires some tricks. LiquidForm uses the required tricks to build JPA queries, but I have not seen a similar library for constructing plain SQL queries (LiquidForm is new and quite brilliant), so you would have to build such a library yourself.
As you can see, getting compile-time warnings when constructing SQL is hard, but not impossible (only nearly impossible). And even if you build what I suggest, your Java representation of the schema must be regenerated immediately after every schema change, so the class generation would have to be wired into your IDE or build tool.
I would rather suggest having good unit tests that notice when your queries become illegal as a result of a schema change. This is the most common way to achieve what you want. And should you decide to "upgrade" to JPA, you could use LiquidForm to get what you want.
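A minimal sketch of such a test, with a hypothetical catalogue of the application's query strings; prefixing each query with EXPLAIN makes MySQL validate the referenced tables and columns without executing anything:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.junit.Test;

public class QuerySchemaTest {
    // Hypothetical catalogue of the application's parameterless queries.
    private static final String[] QUERIES = {
        "select count(*) from test where name is null",
    };

    @Test
    public void queriesStillMatchSchema() throws Exception {
        // Connection details are illustrative; point this at a test database
        // that is migrated in lockstep with production.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "test", "test");
             Statement st = conn.createStatement()) {
            for (String sql : QUERIES) {
                // Fails with "Unknown column ..." if the schema changed.
                st.executeQuery("EXPLAIN " + sql).close();
            }
        }
    }
}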

Is it possible to create a SQL database using a Core Data schema

I have a core data schema file with relationships between the entities.
I need to create a SQL database and would like to know if it can be created automatically (MySql or MS-SQL) using only this file.
Looking at the SQLite DB I see that the relationships are not mapped in any logical way.
First, your assessment that the relationships are "not mapped in any logical way" is not correct. If you look carefully at the Core Data generated database, you will discover that the relationships are mapped exactly as in any other old relational database schema, i.e. with foreign keys referring to rows in other tables.
Also, the naming conventions in these SQLite databases are very transparent (e.g., entity and attribute names start with Z, etc.).
That being said, I would strongly discourage you from hacking the Core Data generated database file, or even from using it to inform another database schema, the reason being that these are undocumented features that could change at any time without notice and thus break any code you write based on them.
IMO, the most practical thing to do is to recreate the model in the usual MySQL schema format and update it manually whenever you change the managed object model.
If you would like to automate the process, there is a rich set of APIs provided for interpreting and parsing NSManagedObjectModel, including classes like NSEntityDescription, NSAttributeDescription, etc. You could write a framework that iterates through your entities and attributes and generates a text file that is a readable schema for MySQL, complete with information about indexing, versions, etc.
If you go down that route, please make sure to notify us and do post your framework on Github for the benefit of others.
If you use Core Data, you can create an SQL-based database using a schema file, but its structure is entirely controlled by the Core Data framework. Apple specifically tells us as developers to leave it alone and not to edit it using libsqlite or any other method. If you do, then Core Data won't have anything to do with you!
In terms of making your own DB using one of Apple's schema files, I'm sure it is possible, but you'd have to know the inner workings of the Core Data framework to even attempt it.
In terms of making your own SQLite DB then you have to sort out all the relationships and mapping yourself.
I think that mixing and matching Core Data resources and custom-built SQLite databases is probably a headache waiting to happen. I have used both methods and find that Core Data is brilliant (especially with iCloud), as long as you're OK with your app being limited to Apple platforms only.

What's the appropriate way to test code that uses MySQL-specific queries internally

I am collecting data and storing it in a MySQL database using Java. Additionally, I use Maven for building the project, TestNG as a test framework, and Spring-Jdbc for accessing the database. I've implemented a DAO layer that encapsulates the access to the database. Besides adding data using the DAO classes, I want to execute some queries which aggregate the data and store the results in some other tables (like materialized views).
Now, I would like to write some test cases which check whether the DAO classes are working as they should. Therefore, I thought of using an in-memory database which would be populated with some test data. Since I am also using MySQL-specific SQL queries for aggregating data, I ran into some trouble:
First, I thought of simply using the embedded-database functionality provided by Spring-Jdbc to instantiate an embedded database, and decided on the H2 implementation. There I ran into trouble because the aggregation queries use MySQL-specific features (e.g. time-manipulation functions like DATE()). Another disadvantage of this approach is that I would need to maintain two DDL files: the actual DDL file defining the tables in MySQL (where I define the encoding and add comments to tables and columns, both MySQL-specific features), and a test DDL file defining the same tables but without comments etc., since H2 does not support comments.
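For reference, the Spring-Jdbc setup in question looks roughly like this (script name illustrative):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabase;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;

public class EmbeddedH2Setup {
    static JdbcTemplate testTemplate() {
        // Spins up an in-memory H2 instance and runs the H2-flavoured DDL;
        // MySQL-specific constructs in the script or in queries fail here.
        EmbeddedDatabase db = new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScript("test-schema.sql") // the second, H2-compatible DDL file
                .build();
        return new JdbcTemplate(db);
    }
}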
I've found a description of using MySQL as an embedded database within test cases (http://literatitech.blogspot.de/2011/04/embedded-mysql-server-for-junit-testing.html). That sounded really promising to me. Unfortunately, it didn't work: a MissingResourceException occurred: "Resource '5-0-21/Linux-amd64/mysqld' not found". It seems the driver is not able to find the database daemon on my local machine, but I don't know where to look for a solution to that issue.
Now I am a little bit stuck, and I am wondering whether I should have designed the architecture differently. Does someone have tips on how to set up an appropriate system? I have two other options in mind:
1. Instead of using an embedded database, go with a native MySQL instance and set up a database that is used only for the test cases. This option sounds slow; and I might want to set up a CI server later on, where an embedded database seems more appropriate since the tests run faster.
2. Erase all the MySQL-specific stuff from the SQL queries and use H2 as the embedded database for testing. If this is the right choice, I would need another way to test the SQL queries that aggregate the data into the materialized views.
Or is there a third option I don't have in mind?
I would appreciate any hints.
I've created a Maven plugin exactly for this purpose: jcabi-mysql-maven-plugin. It starts a local MySQL server in the pre-integration-test phase and shuts it down in the post-integration-test phase.
If it is not possible to get the in-memory MySQL database to work, I suggest using the H2 database for the "simple" tests and a dedicated MySQL instance for testing the MySQL-specific queries.
Additionally, the tests against the real MySQL database can be configured as integration tests in a separate Maven profile so that they are not part of the regular Maven build. On the CI server you can create an additional job that runs the MySQL tests periodically, e.g. daily or every few hours. With such a setup you can keep and test your MySQL-specific queries while your regular build does not slow down, and you can still run a normal build even when the test database is unavailable.
There is a nice Maven plugin for integration tests called maven-failsafe-plugin. It provides pre- and post-integration-test phases that can be used to set up test data before the tests and to clean up the database afterwards.
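A minimal sketch of such an integration test; the failsafe plugin picks the class up by its *IT naming convention (table name and credentials are illustrative):

import static org.junit.Assert.assertEquals;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.Test;

public class AggregationQueryIT {
    @Test
    public void aggregationQueryRunsOnRealMySql() throws Exception {
        // Points at the dedicated MySQL test instance, so MySQL-specific
        // functions such as DATE() work exactly as in production.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/testdb", "test", "test");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "select count(*) from measurements where DATE(created_at) = CURDATE()")) {
            rs.next();
            assertEquals(0, rs.getLong(1)); // empty test schema: no rows yet
        }
    }
}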

Why does LinqPad create Fields instead of Properties?

I recently took on a project of creating a tool for LinqPad that would Dump query results into CSV format, in order to use the tool on massive databases for quick results. One thing I wanted out of the tool is for it to work in both Visual Studio and LinqPad. Thus, whether I was using LINQ to SQL in VS2010 or LinqPad, I could dump results quickly to a CSV file and then open it in Excel to view the results.
The biggest hiccup in the project came from how LinqPad builds its DataContexts vs. how Visual Studio builds them. The best information I could find on how LinqPad does it comes from here. Basically, what I found from my project was that VS2010 creates properties for its DataContexts, but LinqPad creates fields. Thus, when using reflection:
LinqPad:
dataContextType.GetProperties() // returns an empty array
dataContextType.GetFields()     // returns the fields of the LinqPad-created DataContext
VS 2010 LinqToSQL:
dataContextType.GetProperties() // returns the properties of the VS-created DataContext
dataContextType.GetFields()     // returns an empty array
So why does LinqPad use fields instead of properties in its DataContexts? Wouldn't it have been more feasible to copy the Visual Studio LINQ to SQL pattern?
Update
Based on a comment I decided to ask the same question within the LinqPad forum as well.
This is a good question. The main reason LINQPad uses fields to map columns is performance when building the typed DataContext that backs database-connected queries.
We're not talking about the speed of executing the properties themselves (there's actually very little overhead in executing simple accessors, and the JIT may even inline them). The overhead is in building the typed DataContext via Reflection.Emit. A field is simply that: one item of metadata. A property requires emitting a field definition, a property definition, and two accessor methods (each with IL to get/set the underlying field). Because users may point LINQPad at databases with upwards of 1000 tables and functions, this can add up in terms of the time taken to build the assembly, as well as its size (increasing HDD activity and working set).
You have raised an interesting issue in the lack of unification between PropertyInfo and FieldInfo in the reflection object model. It would be nice if there were an interface that unified fields and (non-indexed) properties.

Database and logic layer for ASP.NET MVC application

I'm going to start a new project which will be small initially but may grow big over the years. I'm strongly convinced that I'm going to use ASP.NET MVC with jQuery for the UI. I want to go with MySQL as the database for several reasons, but I'm worried about a few things.
I'm totally new to Linq, but it seems easier to use once you are familiar with it.
The first thing is that accessing data should be easy. So I thought I should use MySQL to Linq, but somewhere I read that it is not directly supported, although the MySQL .NET connector adds support for Entity Framework. I don't know the pros and cons of that. DbLinq is something I have also heard of. I would love to implement the repository pattern, as it allows applying filters in the logic layer rather than in the data access layer. Will that be possible if I use Entity Framework?
I'm also concerned about performance. Someone told me that Entity Framework fetches a lot of data and then filters it. Is that right?
So the questions basically are:
1. Is MySQL to Linq possible? If yes, where can I get more details on it?
2. What are the pros and cons of using Entity Framework or DbLinq with MySQL?
3. Will it be easy to access data using Entity Framework or DbLinq with MySQL?
4. Will I be able to implement the repository pattern, which allows applying filters in the logic layer rather than the data access layer (when using Entity Framework with MySQL)?
5. Does it fetch a lot of data from the database and then apply filters on it?
If that sounds like too many questions, then just let me know what you, as someone experienced in this area, would do in this situation (with your reasons); that should answer my question.
As a fan of ALT.NET, I would recommend using NHibernate for your project instead of Entity Framework; you may google its advantages over EF, and I am convinced you'll choose it.
Based on the points you've mentioned, I would seriously consider going with MS SQL instead of MySQL initially, and implementing LINQ-to-SQL instead of Entity Framework. Here's why:
The fact that you are anticipating a lot of growth tells me that you need to think about where you plan to end up, rather than where to start. I have considerably more experience with MS SQL than I do with MySQL, but if you're talking about starting with the community version of MySQL and upgrading later, you're going to incur a significant expense anyway with the Enterprise version.
I have heard there is a version of LINQ that supports MySQL, but, unless things have changed recently, it is still in beta. I am completing an 18-month web-based project that used ASP.NET MVC 1.0, LINQ-to-SQL, JavaScript, jQuery, AJAX, and MS SQL. I implemented the repository pattern, view models, interfaces, unit tests and integration tests using WatiN. The combination of technologies worked very well for me, and I plan to go with the same combination for a personal project I'm developing.
When you get MS SQL with a hosting plan, you typically have the ability to create multiple databases from that single instance. MySQL hosting plans may look like they give you more storage because they offer multiple MySQL databases, but that's only because those plans provision each database separately rather than carving them out of one shared instance.
I won't use the Entity Framework for my ASP.NET MVC projects, because I wasn't crazy about ADO.NET in the first place. I don't want to have to open a connection, create a command object, populate a parameter collection, issue the execute method, and then iterate through a forward-only reader object to get my data. Once you see how LINQ-to-SQL simplifies the process, you won't want to go back either. In the project I mentioned earlier, I have over 60 tables in the database with about 200 foreign key relationships. Because I used LINQ-to-SQL with the repository pattern in my data layer, I was able to build the application without a single stored procedure. LINQ-to-SQL automatically protects against SQL injection attacks and supports optimistic and pessimistic concurrency checking.
I don't know what your project is, but you don't want to get into a situation where you're going to have trouble scaling the application later. Code for the end result, not for the starting point, and you'll save yourself a lot of headaches later.

What is the best way to build a data layer across multiple databases?

First a bit about the environment:
We use a program called Clearview to manage service relationships with our customers, including call center and field service work. In order to better support clients and our field technicians, we also developed a website that provides access to the service records in Clearview, plus reporting. Over time, our need to customize the behavior and add new features led to more and more things being tied to this website and its database.
At this point we're dealing with things like a Company being defined partly in the Clearview database and partly in the website database. For good measure, we're also starting to tie the scripting for our phone system into the same website, which will require talking to the phone system's own database as well.
All of this is set up and working... BUT we don't have a good data layer to work with it all. We moved to Linq to SQL and now have two DBMLs that we can use, along with some custom classes I wrote before I'd ever heard of Linq, and some of the old-style ADO datasets. So yeah, basically things are a mess.
What I want is a data layer that provides a single front end for our applications, and on the back end manages everything into the correct database.
I had heard something about Entity Framework allowing classes to be built from multiple sources, but it turns out there can only be one database. So the question is, how could I proceed with this?
I'm currently thinking of getting the Linq To SQL classes all set for each database, then manually writing Linq compatible front ends that tie those together. Seems like a lot of work, and given Linq's limitations (such as not being able to refresh) I'm not sure it's a good idea.
Could I do something with Entity Framework that would turn out better? Should I look into another tool? Am I crazy?
The Entity Framework does give a certain measure of database independence, insofar as you can build an entity model from one database, and then connect it to a different database by using a different entity connect string. However, as you say, it's still just one database, and, moreover, it's limited to databases which support the Entity Framework. Many do, but not all of them. You could use multiple entity models within a single application in order to combine multiple databases using the Entity Framework. There is some information on this on the ADO.NET team blog. However, the Entity Framework support for doing this is, at best, in an early stage.
My approach to this problem is to abstract my use of the Entity Framework behind the Repository pattern. The most immediate benefit of this, for me, is to make unit testing very simple; instead of trying to mock my Entity model, I simply substitute a mock repository which returns IQueryables. But the same pattern is also really good for combining multiple data sources, or data sources for which there is no Entity Framework provider, such as a non-data-services-aware Web service.
So I'm not going to say, "Don't use the Entity Framework." I like it, and use it, myself. In view of recent news from Microsoft, I believe it is a better choice than LINQ to SQL. But it will not, by itself, solve the problem you describe. Use the Repository pattern.
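The shape of that pattern is language-agnostic; a minimal sketch with hypothetical types (shown here in Java; in .NET the finder would typically return IQueryable<T>):

import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// One front end for callers, regardless of which store answers the query.
interface CustomerRepository {
    List<Customer> find(Predicate<Customer> filter);
}

// The production implementation would delegate to the ORM (or to several
// data sources); the test double below just serves an in-memory list,
// which is what makes unit testing so simple.
class InMemoryCustomerRepository implements CustomerRepository {
    private final List<Customer> data;

    InMemoryCustomerRepository(List<Customer> data) {
        this.data = data;
    }

    public List<Customer> find(Predicate<Customer> filter) {
        List<Customer> out = new ArrayList<>();
        for (Customer c : data) {
            if (filter.test(c)) {
                out.add(c);
            }
        }
        return out;
    }
}

class Customer {
    final String name;
    Customer(String name) { this.name = name; }
}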
If you want to use tools like Linq2Sql or EF and don't want to manage multiple DBMLs (or whatever they're called in EF or other tools), you could create views in your website database that reference back to the Clearview or phone system's DB.
This allows you to decouple your website from their database structure. I believe Linq2Sql and EF can use a view as the source for an entity; if they can't, look at NHibernate.
This will also let you have composite entities that are pulled from the various data sources. There are some limitations on updating views in SQL Server; however, you can define your own INSTEAD OF trigger(s) on the view, which can then perform the actual insert, update, and delete statements.
L2S works perfectly with views in my project. You only need one small trick:
1. Add the secondary DB's table to the current DB as a view.
2. In the designer, add a primary key attribute to an id field on the view.
3. Only now add an association to whatever other table you want in the original DB.
Now you will see the view available for navigation.