Automated ETL / Database Migration Solution - mysql

I'm looking for an ETL solution that we can configure by hand and then deploy to run autonomously. This is basic transformation; it need not be feature-heavy. The key points are free or open-source software that could be tailored to suit specific needs.
In fact, this could be reduced to a simple DB migration tool that will run on a Linux server. Essentially the same as the above, but we probably won't need to validate or transform the data at all besides renaming columns.
I forgot to mention that this is going to have to be very cross-platform. I'd like to be able to deploy it to a server, as well as test it on OS X and Windows.

Try Pentaho or Talend. Pentaho has a nice job-scheduling paradigm as well as the ETL workbench (Kettle). I haven't used Talend, but I've heard good things and I imagine it carries similar functionality.
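That said, if the job really does reduce to copying tables and renaming a few columns, a full ETL suite may be overkill: a small JDBC program run from cron is cross-platform and easy to tailor. A minimal sketch, assuming the MySQL Connector/J driver is on the classpath (the table and column names here are made up):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SimpleMigration {
        public static void main(String[] args) throws Exception {
            // Hypothetical source and target databases
            try (Connection src = DriverManager.getConnection(
                     "jdbc:mysql://localhost/legacy_db", "user", "pass");
                 Connection dst = DriverManager.getConnection(
                     "jdbc:mysql://localhost/new_db", "user", "pass");
                 Statement read = src.createStatement();
                 ResultSet rs = read.executeQuery(
                     "SELECT id, order_dt, cust_nm FROM legacy_orders");
                 PreparedStatement write = dst.prepareStatement(
                     "INSERT INTO orders (id, ordered_at, customer_name) "
                     + "VALUES (?, ?, ?)")) {
                while (rs.next()) {
                    write.setLong(1, rs.getLong("id"));
                    write.setTimestamp(2, rs.getTimestamp("order_dt")); // renamed
                    write.setString(3, rs.getString("cust_nm"));        // renamed
                    write.addBatch();
                }
                write.executeBatch(); // a real job would batch in chunks
            }
        }
    }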

Related

What database for Data Acquisition?

I have to develop a database that will be used for data acquisition, mainly measurements from a micrometer, which will be compared against a reference table inside the DB. The platform is OS X. I have been looking at Valentina-DB, SQLite and even MySQL.
My main requirement is: the database will be used by factory workers who may not have a lot of experience using software. Therefore, the front-end has to be extremely easy to use. This includes installation of the database and the front-end.
What are my options when it comes to custom GUI apps?
Most databases have no GUI front-end suitable for use "by factory workers for data acquisition", so you need to program it yourself.
One approach would be to use a Java Swing GUI and some Java-based database, maybe Apache Derby. You could put everything into runnable jars and talk to the database directly (no network setup, no authentication); Java is available for OS X from the Oracle website. This seems relatively easy to set up and would also run under Windows if that's desired later. This is not the only possible approach, but it is something that is likely to work.
There are many possible alternative approaches.
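For a concrete picture, here is a minimal sketch of the embedded-Derby part of that approach (the table and columns are hypothetical; the Swing front-end would sit on top of this):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class AcquisitionStore {
        public static void main(String[] args) throws Exception {
            // ;create=true creates the database directory on first run;
            // no server process, no network, no authentication involved
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:derby:acquisitionDB;create=true")) {
                try (Statement st = conn.createStatement()) {
                    // first run only; real code would check whether the table exists
                    st.executeUpdate("CREATE TABLE measurements ("
                        + "id INT GENERATED ALWAYS AS IDENTITY, "
                        + "value_mm DOUBLE, taken_at TIMESTAMP)");
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO measurements (value_mm, taken_at) "
                        + "VALUES (?, CURRENT_TIMESTAMP)")) {
                    ps.setDouble(1, 12.345);
                    ps.executeUpdate();
                }
            }
        }
    }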

Is there any database system I can use without installation?

I am designing a small program that uses a database to manage its data. I want a database I can bundle into my application, so that users can use it without installing any database system.
So, is there any database that suits my project's requirements?
Programming Language: Java
Platform: cross-platform (Linux, Windows, Mac)
There are three reasons why I want to use an embedded database system:
there is not much data in my system
users should be able to use my system directly, without setting up any database
when I copy my system from computer A to computer B, it should keep all its data
Besides a database, I also considered managing the data with an XML file, but I don't think XML is a good idea, because it is not easy to update or delete data.
Also, this is my first time using Stack Overflow, so there may be some customs I'm not familiar with. If I've offended anyone, please forgive me.
I use H2 in Java, for the following reasons:
H2 is really small (about 1 MB), easy to copy, and easy to use with Maven, Gradle, etc.
H2 is a pure Java DB; when I want to write unit tests for my DAO code, it is easy to get started.
H2 can simulate the Oracle and MySQL dialects. After building a demo with H2, it is easy to move all the Java code over to a big DB system.
In the H2 JDBC URL, it is easy to configure an init SQL script from a file. With this feature, it is easy to create a clean database with only the necessary data in it (see the sketch below).
When you want to share your data, you can simply ship your database file with your product, which is difficult with Oracle or MySQL.
Another real example is Atlassian Confluence. You can download and install Confluence and start it with H2 for a trial; when you decide to use it in business, set it up to connect to MySQL or Oracle.
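As a small sketch of the dialect and init-script features mentioned above (the script name and location are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class H2Demo {
        public static void main(String[] args) throws Exception {
            // In-memory DB in MySQL compatibility mode, seeded from init.sql
            // on connect; default H2 credentials are "sa" with empty password
            String url = "jdbc:h2:mem:demo;MODE=MySQL;"
                       + "INIT=RUNSCRIPT FROM 'classpath:init.sql'";
            try (Connection conn = DriverManager.getConnection(url, "sa", "")) {
                // ... run the application or tests against conn ...
            }
        }
    }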
SQLite is a common choice. You can embed the database core functionality as a library in your app. The only local resource required is plain vanilla files on the normal file system -- no drivers, daemons etc.
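For comparison with the H2 example above, a minimal sketch of using SQLite from Java, assuming the xerial sqlite-jdbc driver is on the classpath (the table is hypothetical):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SqliteDemo {
        public static void main(String[] args) throws Exception {
            // app.db is created in the working directory if it does not exist;
            // copying that one file to another machine carries all the data
            try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db");
                 Statement st = conn.createStatement()) {
                st.executeUpdate("CREATE TABLE IF NOT EXISTS notes "
                    + "(id INTEGER PRIMARY KEY, body TEXT)");
                st.executeUpdate("INSERT INTO notes (body) VALUES ('hello')");
            }
        }
    }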

Migrating subsets of production data back to dev

In our rails app we sometimes have db entries created by users that we'd like to make part of our dev environment, without exporting the whole table. So, we'd like to be able to have a special 'dev and testing' dump.
Any recommended best practices? mysqldump seems pretty cumbersome, and we'd like to pull in rails associations as well, so maybe a rake task would make more sense.
Ideas?
You could use an ETL tool like Pentaho Kettle. Once you have the initial transformation set up the way you want, you can easily run it with different parameters in the future. This way you can also keep all your associations. I wrote a little blurb about Pentaho for another question here.
If you provide a rough schema I could probably help you get started on what your transformation would look like.
I had a similar need and ended up creating a plugin for it. It was developed for Rails 2.x and worked fine for me, but I haven't had much use for it lately.
The documentation is lacking, but it's pretty simple. You basically install the plugin and then have a method to_sql available on all your models. Options are explained in README.
You can try it out and let me know if you have any issues, I'll try to help.
I'd go after it using a Rails runner script. That will allow your code to access the same things your Rails app would, including the database initializations. ActiveRecord will be able to take advantage of the model relationships you've defined.
Create some "transfer" tables in your production database and copy the desired data into those using the "runner" script. From there you could serialize the data, or use a dump tool, since you'll be dealing with a reduced amount of records. Reverse the process in the development environment to move the data into the database.
I had a need to populate the database in one of my apps from remote web logs, and wrote a runner script that fires off periodically via cron, FTPs the data from my site, and inserts it.

continuous integration with mysql

My entire environment (Java, JS, and PHP) is set up with our continuous integration server (Hudson).
But how do I get our database into the mix?
I would like to deploy fresh MySQL databases for unit testing, development, and QA.
And then I'd like to diff development against production and have an update script that would be used for releasing.
I would look at Liquibase (http://www.liquibase.org/). It's an open source, Java-based DB migration tool that can be integrated into your build script and can handle DB diffing. I've used it to manage DB updates on a project with a lot of success.
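As a sketch, a fresh database can be rebuilt from the changelog at the start of a test run through Liquibase's Java API (the changelog path and connection details are placeholders, and the exact API varies between Liquibase versions):

    import java.sql.Connection;
    import java.sql.DriverManager;

    import liquibase.Liquibase;
    import liquibase.database.Database;
    import liquibase.database.DatabaseFactory;
    import liquibase.database.jvm.JdbcConnection;
    import liquibase.resource.ClassLoaderResourceAccessor;

    public class CiDbSetup {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/ci_test", "user", "pass");
            Database db = DatabaseFactory.getInstance()
                .findCorrectDatabaseImplementation(new JdbcConnection(conn));
            Liquibase liquibase = new Liquibase(
                "db/changelog.xml", new ClassLoaderResourceAccessor(), db);
            liquibase.update(""); // apply all pending changesets
        }
    }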
You could write a script in Ant to do all that stuff and execute it during the build.
Perhaps investigate database migrations such as migrate4j.
Write a script that sets up your test database. Run it from your build tool, whatever that is, before your tests run. I do this manually and it works pretty well; I'm still integrating it into Maven. Shouldn't be too much trouble.
Isn't HyperSQL in-memory DB (http://hsqldb.org/) better for running your tests?
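A minimal sketch of what that looks like (the table is hypothetical; HSQLDB's default account is SA with an empty password):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class InMemoryDbTest {
        public static void main(String[] args) throws Exception {
            // "mem:" databases live only in RAM and vanish with the JVM,
            // so every test run starts from a clean schema
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hsqldb:mem:testdb", "SA", "");
                 Statement st = conn.createStatement()) {
                st.executeUpdate("CREATE TABLE users "
                    + "(id INT PRIMARY KEY, name VARCHAR(50))");
                // ... exercise the code under test against conn ...
            }
        }
    }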
For managing migrations to your database schema between releases, you could do a lot worse than to use Scala Migrations:
http://opensource.imageworks.com/?p=scalamigrations
It's an open source tool that I've found to integrate well in a Java development ecosystem, and has extra appeal if any of your team have been looking at ways to introduce Scala.
It should also be able to build you a database from scratch, for testing purposes.
Give http://code.google.com/p/mysql-php-migrations/ a try!
Very PHP oriented, but seems to work well for most purposes.

ETL Tool for transferring an old Firebird Database to a new, organized Firebird Database

After looking at a lot of questions, I found no real answer for this.
I redesigned a database for our customer.
With Microsoft Access I found a good tool for getting the old table data into my new, well-formed database structure. It is really easy, but it takes a lot of time (because the old data has to be handled with a lot of care).
Are there any open source tools that offer facilities like those of Microsoft Access?
To clarify: I "just" want to reorganize old Firebird database data into a new, "best-practice" structure.
Edit:
It would be really nice if I could get a log file or something similar, to have some documentation of the changes.
Update:
After checking some of the tools on that Wikipedia page, I found no real logging mechanism.
How do you document the changes to a database? Simply by writing them down?
Result:
So I didn't get a real answer; I am still searching for a nice tool. Thank you guys for the hints and your thoughts regarding this question. I want to award the bounty to Kenneth Cochran because he pointed me to ETL. Thank you!
Talend's Open Source ETL supports FireBird. Very cool tool.
http://www.talend.com/download.php?src=DataGovernanceBlog
It sounds like what you're asking for is an ETL(extract, transform, load) tool.
Wikipedia has a list of open source tools that may help with this. I've not used any of them personally.
Well, I used the Pentaho suite for ETL, using their Kettle tool.
It's quite easy to use and should be more than enough for your purposes.
And it's open source.
Give it a look.
I advise you to use a tool like IBExpert or Database Workbench, which are the best tools for Firebird.
For migrating from Firebird 1.5 to Firebird 2.1: you just have to make a backup of your database with the Firebird 1.5 server and restore it with the Firebird 2.1 server.
I've used Excel in the past to document data model changes - each worksheet used the application version in order to sync with our tags in CVS. Everything was logged in it - columns that were removed, as well as minor alterations to datatypes like varchar(10) to varchar(20), etc., along with a note describing why the change was made.
Personally, I've only ever scripted things like this as DDL/DML scripts, broken into separate scripts for table creation, constraint dropping, index drops, the DML itself, constraint application, index application, and removing orphaned tables.
If you want a basic ETL tool, that is client based (and cheap at $300), look at Advanced Query Tool. It mainly queries any type of ODBC connection(including Excel files set up that way), but also has some extended features, including moving data. And has a command line interface. http://www.querytool.com/
I've used it instead of Informatica for one-off jobs, but I've also used it to extract from Excel to another file for business users, for a few months, scheduled from my desktop.