Preserving data integrity in a multi-step application - mysql

I'm currently working on a PHP web application with Symfony 2/Doctrine, using MySQL as the DBMS.
The application has multiple steps (about 12). At the end of each step, I store some data in the database and go on to the next step.
The user can return to a specific step with a 'go back' button. If they do, I need to update the stored data. For example, if a user is on step 6 and returns to step 1, I need to clear some column values.
My SQL model is light: 3 tables, one of which has a state column holding the current step (step 1, step 2, etc.). I don't know how to implement this.
Maybe it's a good idea to create stored procedures and call them before each save. The way I see it, the stored procedure would clean up my tables (perform an UPDATE) to restore them to a given step.
Any ideas?
Thanks

This sounds like an application design problem. If you are working with a framework, my advice is to stay away from stored procedures and use your framework and its ORM (Doctrine, here) to interact with the database.
My suggestion would be to use a state machine. You need to:
1) Define all your steps
2) Define all possible transitions from one step to another
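A minimal sketch of the state-machine idea, written in Python rather than PHP for brevity. It assumes a hypothetical mapping from each step to the columns that step writes (`STEP_COLUMNS`, `WizardRecord` and the column names are all illustrative); in the Symfony app the record would be a Doctrine entity, and a library like the ones below would hold the transition table. Going back clears every column written after the target step, so the stored data never runs ahead of the state column.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: the columns each step writes when it is saved.
STEP_COLUMNS = {
    1: ["name"],
    2: ["address"],
    3: ["payment_method"],
    4: ["confirmation_code"],
}
LAST_STEP = max(STEP_COLUMNS)


@dataclass
class WizardRecord:
    """Stand-in for the row holding the 'state' column and the step data."""
    state: int = 1
    data: dict = field(default_factory=dict)


def can_transition(current: int, target: int) -> bool:
    """Allowed moves: forward by exactly one step, or back to any earlier step."""
    if not 1 <= target <= LAST_STEP:
        return False
    return target == current + 1 or target < current


def transition(record: WizardRecord, target: int) -> None:
    if not can_transition(record.state, target):
        raise ValueError(f"illegal transition {record.state} -> {target}")
    if target < record.state:
        # Going back: clear the columns written by every step after the target,
        # so the stored data never runs ahead of the 'state' column.
        for step in range(target + 1, LAST_STEP + 1):
            for column in STEP_COLUMNS[step]:
                record.data.pop(column, None)
    record.state = target


def complete_step(record: WizardRecord, values: dict) -> None:
    """Save the current step's columns, then move to the next step."""
    expected = set(STEP_COLUMNS[record.state])
    if set(values) != expected:
        raise ValueError(f"step {record.state} expects columns {sorted(expected)}")
    record.data.update(values)
    if record.state < LAST_STEP:
        transition(record, record.state + 1)
```

Whether the target step's own data is kept (to prefill its form) or cleared too is a design choice; this sketch keeps it, since saving that step again overwrites it anyway.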
If you tell us more about the context of your problem, we might be able to give you better advice. There are some good implementations of the state pattern for several frameworks.
For Symfony 2, I found these libraries:
https://github.com/yohang/Finite
https://github.com/winzou/StateMachineBundle

Related

Pentaho Kettle insert Error Handling of step

I am new to Pentaho's GeoKettle (Spoon), and I am currently loading rows from an Excel file into my database. Now I want to avoid duplicates in my database table, so I want to insert only the rows that aren't there yet (to have only unique records in the table).
As far as I know, there are two ways to do that. The first one I tried was the Insert/Update step (with the Update functionality disabled), in which I defined all the columns that have to be equal for a record to count as already present. But it does not work: all records are still inserted into the database.
That is why I am trying the (according to Pentaho) much faster option: a "Table Output" step with an "Update" error-handling step, as shown in the picture.
As shown in the picture, the arrow pointing from "Table Output" to "Update" is black, but for error handling of the step I need a red dotted one, and I do not know how to create it. In tutorials, a little window often pops up with 2 options, like in this picture:
But I do not get that popup. To create a hop, I have to select both steps and right-click on one of them.
So in what ways can I create such a red dotted arrow? In the end, it has to look like this:
Thank you so much in advance!!
There is a problem with your setup, or with your version of PDI. Error handling for steps was introduced in V4 but only fully implemented for all steps around V6.
Download a fresh PDI from SourceForge; V7.1 is a really robust and stable release. Unzip it and test.
By the way, what you want to achieve is known as the CRUD pattern: Create, Read, Update, Delete. The step that does this is Merge Rows (diff), in the Joins family. You tell the step which columns to check, and it produces a new column with the value identical, changed, new, or deleted. You can then redirect the flow through a Switch / Case step to perform the appropriate action. Further information here (V4).
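To make the flag values concrete, here is a rough Python sketch of what Merge Rows (diff) computes. It is a simplified illustration, not the step's implementation: the real step compares two key-sorted streams row by row, while this version uses dictionaries. `merge_rows_diff`, `rows_to_insert` and the `flag` field name are hypothetical; "reference" plays the role of the rows already in the table and "compare" the incoming Excel rows.

```python
def merge_rows_diff(reference, compare, key_fields, value_fields, flag_field="flag"):
    """Flag rows as identical / changed / new / deleted by comparing two row lists.

    reference: rows already in the target table; compare: incoming rows.
    """
    def key(row):
        return tuple(row[k] for k in key_fields)

    ref_by_key = {key(row): row for row in reference}
    cmp_by_key = {key(row): row for row in compare}
    result = []
    for k, row in cmp_by_key.items():
        old = ref_by_key.get(k)
        if old is None:
            flag = "new"            # only in the incoming data
        elif all(old[f] == row[f] for f in value_fields):
            flag = "identical"      # same key, same values
        else:
            flag = "changed"        # same key, different values
        result.append({**row, flag_field: flag})
    for k, row in ref_by_key.items():
        if k not in cmp_by_key:
            result.append({**row, flag_field: "deleted"})  # only in the table
    return result


def rows_to_insert(flagged, flag_field="flag"):
    """The Switch / Case part: keep only rows that are not in the table yet."""
    return [row for row in flagged if row[flag_field] == "new"]
```

For the question as asked (insert only unique records), only the "new" branch of the Switch / Case needs a Table Output; the other flags can go to a Dummy step or to an Update step if changes should be applied.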

Sybase to MySQL automatic exportation

I have two databases: Sybase and MySQL. I need to export records to MySQL when they are inserted into Sybase, or export them in a scheduled event.
I've tried the OUTPUT statement, but it cannot be used in triggers or procedures.
Any suggestion to solve this problem?
(Disclaimer: I've done similar things before, but I would by no means consider the answer below the state of the art, just one possible approach. Google something like 'cross-database replication' or 'cross-RDBMS replication' to see who has done this before.)
First of all, I would check whether you can get an ETL tool to do the job without too much work. There are free open-source ones, and even something like Microsoft SSIS might work on non-Microsoft databases.
If not, I would split this into several steps:
1) Find an appropriate Sybase output command that exports a subset of rows from one or more tables. By subset I mean you need to be able to add a WHERE clause, not just do a full table dump.
2) Use an appropriate MySQL import script/command to load the data produced by step #1. You may need to go back and forth between the two until you have something that works manually.
3) Write a Sybase trigger that inserts lookup keys into a to-export table. At a minimum, store the table name and the source Sybase table's keys for each inserted row. Use generic column names like key1_char, key2_char rather than the actual column names; that makes it easier to extend to other source tables as needed. Keep trigger processing as light as possible. What about updates, by the way?
4) Write a scheduled batch on the Sybase side to run step #1 for the rows flagged in #3.
5) Write a scheduled batch on the MySQL side to import the results of #4 via #2. Or kick it off from #4.
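A small sketch of the flagging part (step #3), runnable as-is. SQLite stands in for Sybase here, so the trigger syntax is SQLite's rather than Sybase's; the `customer` table and the `pending_keys` helper are hypothetical, while `to_export`, `key1_char` and `key2_char` follow the naming suggested above. If updates matter, an AFTER UPDATE trigger could write to the same queue table.

```python
import sqlite3

# Source-side schema. SQLite stands in for Sybase; Sybase's own trigger
# syntax differs, but the idea (a light trigger feeding a queue) is the same.
SOURCE_SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);

-- Generic queue table: table name plus key columns (step #3).
CREATE TABLE to_export (
    queue_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT NOT NULL,
    key1_char  TEXT NOT NULL,
    key2_char  TEXT
);

-- Keep the trigger light: record which row was inserted, not its data.
CREATE TRIGGER customer_to_export AFTER INSERT ON customer
BEGIN
    INSERT INTO to_export (table_name, key1_char)
    VALUES ('customer', CAST(NEW.id AS TEXT));
END;
"""


def create_source_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SOURCE_SCHEMA)
    return conn


def pending_keys(conn: sqlite3.Connection, table_name: str) -> list:
    """Keys flagged for export, in the order the rows were inserted."""
    rows = conn.execute(
        "SELECT key1_char FROM to_export WHERE table_name = ? ORDER BY queue_id",
        (table_name,),
    )
    return [key for (key,) in rows]
```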
Another approach is to do the #3 flagging as needed, but use it to drive a single scheduled batch that SELECTs data from Sybase and INSERTs it into MySQL directly.
You'll have to take the data from Sybase's SELECT and bind it manually to the MySQL INSERT. But you probably get finer control over what's going on, and you don't have to juggle two batches. That's what I think a clever ETL tool would already be doing on your behalf. Any reasonably capable scripting language like PHP, Python or Ruby should handle it easily. That finer control matters especially if you have things like surrogate/auto-generated keys.
Keep in mind that in both cases you'll have to either delete the to-export rows you've successfully inserted or flag them as done.
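The single-batch variant can be sketched in a few lines. Two in-memory SQLite databases stand in for Sybase (source) and MySQL (target), and the queue rows are inserted directly instead of by a trigger; `make_dbs` and `export_batch` are hypothetical names. `INSERT OR REPLACE` is SQLite syntax; on MySQL you would use something like `INSERT ... ON DUPLICATE KEY UPDATE` instead.

```python
import sqlite3


def make_dbs():
    src = sqlite3.connect(":memory:")  # stand-in for Sybase
    src.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE to_export (queue_id INTEGER PRIMARY KEY AUTOINCREMENT,
                                table_name TEXT, key1_char TEXT);
    """)
    dst = sqlite3.connect(":memory:")  # stand-in for MySQL
    dst.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    return src, dst


def export_batch(src, dst) -> int:
    """Copy flagged customer rows to the target, then delete their queue entries."""
    queued = src.execute(
        "SELECT queue_id, key1_char FROM to_export "
        "WHERE table_name = 'customer' ORDER BY queue_id"
    ).fetchall()
    done = []
    for queue_id, key in queued:
        row = src.execute(
            "SELECT id, name FROM customer WHERE id = ?", (int(key),)
        ).fetchone()
        if row is not None:
            # Bind the source SELECT's values to the target INSERT by hand.
            dst.execute("INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)", row)
        # A row deleted from the source since it was flagged is just dequeued.
        done.append(queue_id)
    dst.commit()  # commit the target first...
    # ...then delete the queue entries, so a crash in between re-sends rows
    # on the next run instead of losing them.
    src.executemany("DELETE FROM to_export WHERE queue_id = ?", [(q,) for q in done])
    src.commit()
    return len(done)
```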

Multiple Pentaho Transformations 'Variables?'

I am using Pentaho Data Integration Software.
I am currently running a Pentaho job as an ETL. I extract data from multiple places and load it into a single database table. The schema of every source is exactly the same, so apart from the database connections and a single 'variable' that stores where the data came from, the transformation is identical for each one. So I have a job that runs each of these transformations.
The problem comes when I want to make a change: I need to change 6 transformations every time. What I want is to set something like a variable in Pentaho that tells it to run a single transformation 6 times, with different database connections and perhaps a single variable.
Is this possible?
Thanks in advance.
If I have understood your question correctly, you need to run a single transformation (one KTR file) multiple times in a loop (assuming there is only one database type).
PDI provides a step called "Copy Rows to Result", where you can store the credentials of your databases in multiple rows; the job then runs the transformation once per row, each time with different connection details (6 times in your case).
Note: I have assumed that you have only one database type, e.g. MySQL, but with different credentials.
Hope this helps :) I would be happy to provide sample code if you need it.
Well, why don't you use a job that passes the host/user/password as variables? That way your whole data flow will be generic.
Hope this answer will lead you into the right direction!
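Both answers boil down to one parameterized transformation executed once per set of connection variables. In PDI that is roughly "Copy rows to result" (or job variables) feeding a Transformation job entry set to execute for every input row, with the connection fields using variables such as `${DB_HOST}` (a hypothetical name). The Python sketch below only models that control flow; `Source`, `run_transformation`, `run_job` and the injected `extract` callable are all illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Source:
    """One set of connection variables (hypothetical names)."""
    name: str       # value for the 'where did this come from' column
    host: str
    user: str
    password: str


def run_transformation(source: Source, extract: Callable[[Source], List[Row]]) -> List[Row]:
    """The single shared transformation: identical logic for every source."""
    return [{**row, "source": source.name} for row in extract(source)]


def run_job(sources: List[Source], extract: Callable[[Source], List[Row]]) -> List[Row]:
    """Run the same transformation once per source, like executing it for every input row."""
    output: List[Row] = []
    for source in sources:
        output.extend(run_transformation(source, extract))
    return output
```

A change to the data flow now happens in one place (`run_transformation`), and adding a seventh source means adding one more row of connection details.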

Create and use data tables in NetBeans

I have a table with:
vegetable name -- calcium content -- potassium content -- vitamins -- fibers -- price (etc.)
Let's say there are 5 entries (rows) in the table, and I have to feed in the data manually at first, as a one-time initial load.
My requirements/problems are:
1) On a GUI, when I select a vegetable name from a drop-down menu, its contents should be displayed, and all of them except 'price' should be added up to give a final score.
2) On the GUI, if I select the 'vegetable name' and one of the other properties (like 'fibers'), only that value should be displayed. E.g. query: spinach, fiber? Answer: spinach-fiber = 20 units, or spinach-vitamins = 40 units, etc.
I also want help deciding what type of database I should use here and how to populate it so the program can access the data later. I believe it's a small, simple data table, so what is the most efficient way to do this?
Specific help with code would be very helpful, as I am completely new to Java and NetBeans.
Also, can I have a separate GUI for users to add/append further data to the same table? If so, how is it done?
I am using NetBeans 7.1.2.
After some searching, I found information about MySQL data tables in NetBeans (http://netbeans.org/kb/docs/ide/java-db.html).
I have created the table and entered data, but I do not know how to access it for questions 1 and 2 above. I'm also not sure it is the right kind of data table for such a simple use case.
It seems you need to learn about JDBC first. To clarify: connecting to a database inside the IDE is generally for development/administrative duties, and you WON'T be using it in your Swing program.
For example, if you need to load a set of test data to test a function, you would typically use either MySQL Workbench or load it via the IDE. However, your program will not connect this way when it runs.
What you need to learn is how to connect to a database from a front end, how to execute a query, and how to display the results. At this point I would suggest getting a couple of books on JDBC, or just searching Google for introductory JDBC tutorials.
Learn JDBC without thinking too much about the front end. Work through a couple of examples, and once you are familiar with JDBC, move on to the front end.
You might want to spend time learning basic SQL as well, as you will need it to query your database properly. I am assuming you have not done any SQL.
Here is a link to a reasonably good site with information you might want to use: http://infolab.stanford.edu/~ullman/fcdb/oracle/or-jdbc.html
Just remember that the IDE (NetBeans) basically uses JDBC to let you connect to and manipulate data. So although it is based on JDBC, the IDE's database explorer is NOT the tool you will use when programming your Swing interface.
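The SQL behind questions 1 and 2 (and the "add data" form) is small. The sketch below uses Python's built-in sqlite3 module purely so it runs without a server; in Java the same steps map onto JDBC's `DriverManager.getConnection`, a `PreparedStatement` with `?` placeholders, and reading the `ResultSet`. The `vegetable` table, its column names and the helper functions are hypothetical, mirroring the table in the question.

```python
import sqlite3

# Hypothetical schema mirroring the table in the question.
PROPERTIES = ("calcium", "potassium", "vitamins", "fibers", "price")


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE vegetable (
        name TEXT PRIMARY KEY,
        calcium REAL, potassium REAL, vitamins REAL, fibers REAL, price REAL)""")
    return conn


def add_vegetable(conn, name, calcium, potassium, vitamins, fibers, price):
    # Parameterized INSERT: the same call could back a separate "add data" form.
    conn.execute("INSERT INTO vegetable VALUES (?, ?, ?, ?, ?, ?)",
                 (name, calcium, potassium, vitamins, fibers, price))
    conn.commit()


def score(conn, name):
    """Question 1: sum of all contents except price (None if not found)."""
    row = conn.execute(
        "SELECT calcium + potassium + vitamins + fibers FROM vegetable WHERE name = ?",
        (name,)).fetchone()
    return None if row is None else row[0]


def property_value(conn, name, prop):
    """Question 2: one property of one vegetable (None if not found)."""
    if prop not in PROPERTIES:
        # Column names cannot be bound as parameters, so whitelist them.
        raise ValueError(f"unknown property: {prop}")
    row = conn.execute(f"SELECT {prop} FROM vegetable WHERE name = ?", (name,)).fetchone()
    return None if row is None else row[0]
```

Keeping the queries in plain functions like these, separate from any GUI code, also follows the advice above: get the database part working first, then wire it to a drop-down.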

Entity Framework 4.1 Custom Database Initializer strategy

I would like to implement a custom database initialization strategy so that I can:
generate the database if it does not exist
if the model changes, create only the new tables
if the model changes, create only the new fields, without dropping the table and losing the data.
Thanks in advance
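You need to implement the IDatabaseInitializer interface.
E.g.:
using System.Data.Entity;
public class MyInitializer : IDatabaseInitializer<MyDbContext>
{
    public void InitializeDatabase(MyDbContext context)
    {
        // Create the database if it does not exist yet.
        if (!context.Database.Exists())
        {
            context.Database.Create();
        }
        // Your logic for detecting and applying model changes goes here.
    }
}
And then set your initializer at application startup (the type argument must be the same context type the initializer is declared for):
Database.SetInitializer<MyDbContext>(new MyInitializer());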
Here's an example
You will have to execute the commands that alter the database manually (IObjectContextAdapter is in System.Data.Entity.Infrastructure):
((IObjectContextAdapter)context).ObjectContext.ExecuteStoreCommand("ALTER TABLE dbo.MyTable ADD NewColumn VARCHAR(20) NULL");
You can use a tool like SQL Compare to script changes.
There is a reason why this doesn't exist yet. It is very complex, and moreover the IDatabaseInitializer interface is not really designed for it (there is no way to make such initialization database-agnostic). Your question is "too broad" to be answered to your satisfaction. Judging by your reaction to @Eranga's correct answer, you simply expect somebody to tell you step by step how to do it, but we will not; that would mean writing the initializer for you.
What do you need in order to do what you want?
You must have very good knowledge of SQL Server. You must know how SQL Server stores information about databases, tables, columns and relations; that is, you must understand the sys views and know how to query them to get the current database structure.
You must have very good knowledge of EF. You must know how EF stores mapping information, and you must be able to explore the metadata to get information about the expected tables, columns and relations.
Once you have the old and new database descriptions, you must be able to write code that correctly detects the changes and creates SQL DDL commands to change your database. Even though this looks like the simplest part of the whole process, it is actually the hardest, because SQL Server has many internal rules that your commands must not violate. Sometimes you really need to drop a table to make your changes, and if you don't want to lose data you must first move it to a temporary table and move it back after recreating the table. Sometimes you are changing constraints, which can require temporarily turning constraints off, etc. There is a good reason why tools that do this at the SQL level (comparing two databases) are probably all commercial.
Even the ADO.NET team hasn't implemented this, and they will not implement it in the future. Instead, they are working on something called migrations.
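The easy, additive part of that process (new tables, new nullable columns) can be sketched as follows; the sketch deliberately only reports the hard cases described above instead of trying to handle them. It is an illustration in Python, not EF code: SQLite's `sqlite_master` and `PRAGMA table_info` stand in for SQL Server's sys views (such as `sys.tables` and `sys.columns`), and the hard-coded `EXPECTED` dictionary stands in for EF's mapping metadata. All table, column and function names are hypothetical.

```python
import sqlite3

# Hypothetical "expected model": table -> {column: SQL type}.
# In EF this would come from the mapping metadata; here it is hard-coded.
EXPECTED = {
    "product": {"id": "INTEGER", "name": "TEXT", "price": "REAL"},
    "category": {"id": "INTEGER", "title": "TEXT"},
}


def current_schema(conn):
    """Read tables and columns from the catalog (SQLite's stand-in for sys views)."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
    return {t: {row[1]: row[2] for row in conn.execute(f'PRAGMA table_info("{t}")')}
            for t in tables}


def additive_ddl(current, expected):
    """Emit only CREATE TABLE / ADD COLUMN; report everything else instead of guessing."""
    statements, unsupported = [], []
    for table, columns in expected.items():
        if table not in current:
            cols = ", ".join(f"{c} {t}" for c, t in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols})")
            continue
        for col, col_type in columns.items():
            if col not in current[table]:
                # Added as nullable: existing rows have no value for it.
                statements.append(f"ALTER TABLE {table} ADD COLUMN {col} {col_type}")
            elif current[table][col].upper() != col_type.upper():
                unsupported.append(
                    f"{table}.{col}: type change {current[table][col]} -> {col_type}")
    for table, columns in current.items():
        for col in columns:
            if table in expected and col not in expected[table]:
                unsupported.append(f"{table}.{col}: column removed from model")
    return statements, unsupported
```

Everything that lands in `unsupported` (type changes, dropped columns, and, in a real implementation, keys and constraints) is exactly where a hand-written, tested upgrade script is needed.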
Edit:
It is true that ObjectContext can give you the script for database creation; that is exactly what the default initializers use. But how could that help you? Are you going to parse that script to see what changed? Are you going to execute it on another connection so you can use the same code as for the current database to read its structure?
Yes, you can create a new database, move the data from the old database to the new one, delete the old one and rename the new one, but that is the most stupid solution you can imagine, and no database administrator will ever allow it. Even this solution still requires analyzing the changes to create correct data-transfer scripts.
Automatic upgrade is the wrong way. You should always prepare the upgrade script manually with the help of some tools, test it, and then execute it manually or as part of an installation script/package. You must also back up your database before making any changes.
The best way to achieve this is probably with migrations:
http://nuget.org/List/Packages/EntityFramework.SqlMigrations
Good blog posts here and here.