Approaches for moving data from one database to another - MySQL

There are two databases, MAIN and TEMP, used in a website. The TEMP database holds data fetched from MAIN for inserts/updates, and on publishing the data is moved back to the MAIN database. What are possible approaches for error handling while publishing?
I can think of the two approaches below:
Rollback script - if an error occurs during an insert/update, the rollback script can restore the previous state.
Third DB concept - introduce a third database identical to MAIN, run the inserts/updates against it first, and only if that succeeds execute the same commands against the MAIN database; otherwise there is no need to update MAIN.
I am not sure which of the two approaches is better. Is there any other approach?
Suggestions would be really helpful.

Use a transaction to move/update the data from TEMP to MAIN. You either want it to work, or not, right? Presumably leaving the state in TEMP if it doesn't?
The only case I can see where you might want to do anything different is if you deliberately want to NOT leave the data in TEMP when a publish fails (if, for example, there's no sensible way to follow such a case up). In that case you could consider having two transactions: one that removes the data from TEMP, followed by a second that adds it to MAIN only if the first succeeds. If either of those transactions fails, an error is reported and the whole thing has to be restarted.
Using a third DB doesn't help. You could still succeed with the attempt against the third DB and fail with MAIN, and keeping the third DB up to date with MAIN means you immediately double all your work.
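As a concrete illustration, a minimal sketch of a transactional publish, assuming both schemas live on the same MySQL server, the tables are InnoDB, and the names (temp.articles, main.articles and their columns) are made up:
START TRANSACTION;
-- copy the staged rows into the live table, updating rows that already exist
INSERT INTO main.articles (id, title, body)
SELECT id, title, body FROM temp.articles
ON DUPLICATE KEY UPDATE title = VALUES(title), body = VALUES(body);
COMMIT;
-- on any error, issue ROLLBACK instead and both databases stay exactly as they were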

Related

Locking a table with rails

I've got a table that I want to run a pretty long running migration on (~20 mins). During this time the contents of the table should not be changed at all. However, the rails frontend to this table (and many others) will remain up while the migration is running and there is a very real chance that someone will try to modify some data (it's fine if that call ends up throwing an error though).
We use MySQL and allow for 10 connections in our connection pool. Am I right in assuming that it is not enough to wrap this migration in a transaction, but that I would have to lock down the table itself as well?
If you really want to make sure no modifications at all happen to the table, the safest thing is to lock the table at the MySQL level.
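A minimal sketch of that, assuming a made-up table name; while the write lock is held, other sessions' reads and writes against this table simply wait:
LOCK TABLES my_table WRITE;
-- run the long migration on my_table here
UNLOCK TABLES;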
If, however, you just want to make sure that no competing writes/overwrites happen, you could also use optimistic locking. One thing to mention is that this could mean the import script will complain and some saves might fail, because the front end might have changed the record between the read and the write.
Assuming that would be okay and you could just repeat those individual writes, this is how it would work:
By convention you add an integer column called lock_version to the table in question, and then you're magically set in the way we love from Rails.
There's a bit more to it, which I encourage you to read about in the linked documentation; we can discuss it in the comments if you like.
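A minimal sketch of the schema change and of the check it buys you (everything except the lock_version convention is a made-up example):
ALTER TABLE widgets ADD COLUMN lock_version INT NOT NULL DEFAULT 0;
-- Rails then issues updates of roughly this shape; zero affected rows means
-- another process changed the record first, and ActiveRecord raises a StaleObjectError
UPDATE widgets
SET name = 'new value', lock_version = lock_version + 1
WHERE id = 42 AND lock_version = 3;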

MySQL table locking for a multi user JSP/Servlets site

Hi, I am developing a site with JSP/Servlets running on Tomcat for the front end, and a MySQL database for the backend, accessed through JDBC.
Many users of the site can access and write to the database at the same time. My question is:
Do I need to explicitly take locks before each write/read access to the DB in my code?
Or does Tomcat handle this for me?
Also, do you have any suggestions on how best to implement this? I have written a significant amount of JDBC code already without taking any locks :/
I think you are thinking about transactions when you say "locks". At the lowest level, your database server already ensures that parallel reads and writes won't corrupt your tables.
But if you want to ensure consistency across tables, you need to employ transactions. Simply put, what transactions provide you is an all-or-nothing guarantee. That is, if you want to insert an Order in one table and related OrderItems in another table, what you need is an assurance that if the insertion of OrderItems fails (in step 2), the changes made to the Order table (step 1) will also get rolled back. This way you'll never end up in a situation where a row in the Order table has no associated rows in OrderItems.
This, of course, is a very simplified representation of what a transaction is. You should read more about it if you are serious about database programming.
In Java, you usually do transactions roughly with the following steps (see the sketch after this list):
Set autocommit to false on your JDBC connection
Do several inserts and/or updates using the same connection
Call conn.commit() when all the inserts/updates that go together are done
If there is a problem somewhere during step 2, call conn.rollback()
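At the SQL level, which is what those JDBC calls ultimately drive, the all-or-nothing pattern looks roughly like this (table and column names are invented for illustration):
START TRANSACTION;
INSERT INTO orders (id, customer_id) VALUES (1001, 42);
INSERT INTO order_items (order_id, product_id, qty) VALUES (1001, 7, 2);
COMMIT;
-- if either insert fails, issue ROLLBACK instead and neither row is kept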

How to separate these two processes?

I have an asp.net website that stores events inside a database table. Then I have a Windows service app that reads those events and performs appropriate actions. Currently it's possible for the two processes to insert and remove records from the same table at the same time.
What is a better pattern for developing such a system, to ensure the two are never working on the same table simultaneously?
I'm not sure about a pattern, but I'd build a WCF service and let both processes use that to access the data. Then share a common lock object between all methods that alter (or read) the table contents.
For this scenario I use a pattern that ensures the data cannot be updated concurrently.
I always add a special column to the table, usually 'LastModified' of type 'timestamp'. When inserting or updating a row I always set this column.
When I come to update a record, I make sure that the stored procedure checks the value I am passing in against the one stored in the database. If these are different then another user or process has altered this row, and I raise a concurrency error.
This can be propagated up to the calling process or handled in your service.
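A minimal sketch of that check, written inline rather than in a stored procedure (the events table, the columns and the literal values are made up):
UPDATE events
SET status = 'processed', LastModified = CURRENT_TIMESTAMP
WHERE id = 42
  AND LastModified = '2024-01-15 10:23:01';  -- the value read earlier
-- zero affected rows means another process modified the row first: raise a concurrency error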
This could be an architecture problem more than anything else.
Why would you need two processes that delete records?
You generally don't need two different processes to CRUD data in the same tables. One thing you can do is wrap the database/tables with a service, then let all processes that require working with the data use that service. The service can then take care of the serialization of calls. Either way, there will be only 1 process working with the DB directly.
Additionally, it sounds to me like you're in an event-sourcing type of architecture, which makes me wonder why you'd need to delete records in the first place...

MySQL table modified timestamp

I have a test server that uses data from a test database. When I'm done testing, it gets moved to the live database.
The problem is, I have other projects that rely on the data now in production, so I have to run a script that grabs the data from the tables I need, deletes the data in the test DB and inserts the data from the live DB.
I have been trying to figure out a way to improve this model. The problem isn't so much in the migration, since the data only gets updated once or twice a week (without any action on my part). The problem is having the migration take place only when it needs to. I would like to have my migration script include a quick check against the live tables and the test tables and, if need be, make the move. If there haven't been updates, the script quits.
This way, I can include the update script in my other scripts and not have to worry if the data is in sync.
I can't use timestamps. For one, I have no control over the tables on the live side once it goes live, and it also seems a bit silly to bulk up the tables just for convenience.
I tried doing a "SHOW TABLE STATUS FROM livedb" but because the tables are all InnoDB, there is no "Update Time", plus, it appears that the "Create Time" was this morning, leading me to believe that the database is backed up and re-created daily.
Is there any other property in the table that would show which of the two is newer? A "Newest Row Date" perhaps?
In short: Make the development-live updating first-class in your application. Instead of depending on the database engine to supply you with the necessary information to enable you to make a decision (to update or not to update ... that is the question), just implement it as part of your application. Otherwise, you're trying to fit a round peg into a square hole.
Without knowing what your data model is, and without understanding at all what your synchronization model is, you have a few options:
Match primary keys against live database vs. the test database. When test > live IDs, do an update.
Use timestamps in a table to determine if it needs to be updated
Use the md5 hash of a database table and modification date (UTC) to determine if a table has changed.
Long story short: Database synchronization is very hard. Implement a solution which is specific to your application. There is no "generic" solution which will work ideally.
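For example, the primary-key and checksum checks above might be done like this in MySQL (the database and table names are made up):
-- compare the highest primary key in each copy
SELECT (SELECT MAX(id) FROM livedb.customers) AS live_max,
       (SELECT MAX(id) FROM testdb.customers) AS test_max;
-- or let MySQL checksum whole tables; differing values mean the copies have diverged
CHECKSUM TABLE livedb.customers, testdb.customers;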
If you have an autoincrement in your tables, you could compare the maximum autoincrement values to see if they're different.
But which version of MySQL are you using?
Rather than rolling your own, you could use a preexisting solution for keeping databases in sync. I've heard good things about SQLYog's SJA (see here). I've never used it myself, but I've been very impressed with their other programs.

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here are my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
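A minimal sketch of that rename script in PostgreSQL (foo_old is a name picked here for illustration; because the DDL is transactional, the whole thing rolls back as a unit if anything fails):
BEGIN;
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE foo_new RENAME TO foo;
COMMIT;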
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
DELETE FROM my_table;        -- "my_table" stands in for the real table name
INSERT INTO my_table ...;    -- bulk-load the new week's data here
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data; anything afterwards will run on the new data. The database will actually clean up the old rows once the last user is done with them. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshotting," and I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
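A rough sketch of that pattern in PostgreSQL, with invented table names, and with one fixed child table instead of the auto-created ones that the trigger in the article builds dynamically:
CREATE TABLE measurements (
    id    serial,
    taken date NOT NULL,
    value numeric
);
CREATE TABLE measurements_2024_01 (
    CHECK (taken >= DATE '2024-01-01' AND taken < DATE '2024-02-01')
) INHERITS (measurements);
CREATE OR REPLACE FUNCTION route_measurement() RETURNS trigger AS $$
BEGIN
    INSERT INTO measurements_2024_01 VALUES (NEW.*);
    RETURN NULL;  -- the row goes into the child table, not the parent
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER measurements_insert
BEFORE INSERT ON measurements
FOR EACH ROW EXECUTE PROCEDURE route_measurement();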
Which database server are you using? SQL Server 2005 and above provide an isolation level called "Snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot isolation would be perfect in your case.
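In T-SQL that looks roughly like this (the database name is a placeholder, and enabling the option is a one-time setup step):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
-- perform all of the weekly updates here; readers keep seeing the pre-transaction data
COMMIT TRANSACTION;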
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 Standard statement called MERGE which will basically allow you to
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
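For example, a MERGE along these lines covers all three cases (SQL Server-flavoured syntax; the delete-what's-missing clause is a SQL Server extension, and the table and column names are invented):
MERGE INTO products AS t
USING weekly_import AS s
    ON t.product_id = s.product_id
WHEN MATCHED THEN
    UPDATE SET name = s.name, price = s.price
WHEN NOT MATCHED THEN
    INSERT (product_id, name, price) VALUES (s.product_id, s.name, s.price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;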
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database systems will probably have similar implementations - it's a SQL:2003 standard statement, after all.
Marc
Use different table names (mytable_[yyyy]_[wk]) and a view for providing you with a constant name (mytable). Once a new table is completely imported, update your view so that it uses that table.
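In MySQL that switchover might look like this (the week-stamped names are just examples):
CREATE TABLE mytable_2024_10 LIKE mytable_2024_09;
-- ... bulk-load the new week's data into mytable_2024_10 ...
CREATE OR REPLACE VIEW mytable AS SELECT * FROM mytable_2024_10;
Queries against mytable keep seeing the previous week's table until the CREATE OR REPLACE VIEW statement completes, so readers never see a half-loaded table.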