Nested transaction rollback between two savepoints? - mysql

For the three SQL types (MySql, SQLite and PostgreSQL) I want / need to handle save points identically.
Now I have my application to change different entries in the database in one big transaction and need some nested transactions for special behavior of the program.
So the question is, if i create something like:
BEGIN TRANSACTION;
--random insert/update statements
SET SAVEPOINT sp1;
--more random inserts/updates
SET SAVEPOINT sp2;
--inserts n stuff
(yes the syntax may not be correct, its just an example)
So i want to know if it is possible to do a rollback between the two save points sp1 and sp2 without rolling back the inserts/updates after sp2?

Savepoints will not do what you want. When you roll back to a savepoint, everything after that savepoint is rolled back, irrespective of whether later savepoints were created.
Think of savepoints like a "stack". You can't pull something out of the middle of the stack, you have to remove everything down to the layer you want.
You are probably looking for autonomous transactions. None of the databases you want to use support them. In PostgreSQL you can work around this using the dblink module to make a new connection to the database and do work with it; see http://www.postgresql.org/docs/current/static/dblink.html . I don't know what solutions MySQL or SQLite offer, but Google will help now that you know the term you are looking for.
I recommend that you find a way to work around this application design requirement if possible. Have your application use two database connections and two transactions to do what you need, taking care of co-ordinating the two as required.

Related

MySQL query synchronization/locking question

I have a quick question that I can't seem to find online, not sure I'm using the right wording or not.
Do MySql database automatically synchronize queries or coming in at around the same time? For example, if I send a query to insert something to a database at the same time another connection sends a query to select something from a database, does MySQL automatically lock the database while the insert is happening, and then unlock when it's done allowing the select query to access it?
Thanks
Do MySql databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE table
SET update_count = update_count + 1,
update_time = NOW()
WHERE id = something
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. The transaction will block SELECT operations while it is in progress. Teaching you to use transactions is beyond the scope of a Stack Overflow answer.
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
MyISAM, the original engine from MySQL 3.0, doesn't have any of these features and locks the whole database on any INSERT operation to avoid conflict. It works like you thought it did.
When creating a database in MySQL you have your choice of engine, but using InnoDB should be your default. There's really no reason at all to use MyISAM as any of the interesting features of that engine (e.g. full-text indexes) have been ported over to InnoDB.

MySQL Transaction

It could be a dumb question, and tried to search for it and found nothing.
I been using mysql for years(not that to long) but i never had tried mysql transactions.
Now my question is, what would happen if i issue an insert or delete statement from multiple clients using transactions? does it would lock the table and prevent other client to perform there query?
what would happen if other client issue a transaction query while the other client still have unfinished transaction?
I appreciate for any help will come.
P.S. most likely i will use insert using a file or csv it could be a big chunk of data or just a small one.
MySQL automatically performs locking for single SQL statements to keep clients from interfering with each other, but this is not always sufficient to guarantee that a database operation achieves its intended result, because some operations are performed over the course of several statements. In this case, different clients might interfere with each other.
Source: http://www.informit.com/articles/article.aspx?p=2036581&seqNum=12

Is it possible to exclude certain queries/procedures from transaction rollbacks in MySQL?

The Setup
While working on some rather complex procedures I've started logging debug information into a _debug table, via a stored logging procedure: P_Log('message'), which just calls a simple INSERT query into the _debug table.
The complex procedures contain transactions, which are rolled back if an error is encountered. The problem is that any debug information that was logged during the course of the transaction is also rolled back. This is of course a little counter productive, since you want to be able to see the debug logs precisely when the procedure -does- fail.
The Question
Is there any way I can insert into _debug without having the inserts rolled back? The log is really only to be used in development, and I would only ever write to it, so I don't care if it would violate how transactions are intended to be used.
And just out of curiosity, how is this normally handled? it seems like being able to write arbitrary log information from inside transactions, to check states of variables, etc, regardless of said transactions being rolled back, would be absolutely crucial for debugging errors. What's the best practice here?
Possible alternatives
storing logs in variables and only writing them at the end of the procedure.
the problem with this is that I want to be able to insert an arbitrary number of debug entries. creating a text variable and parcing that later would work, but seems very hacky.
Using some built-in log in mysql
I'd actually fine with this, if it means I can write arbitrary text to it at will, but I haven't been able to find something like this so far.
The simplest way would be to change your logs table to MyISAM.
It does not support transactions and will completely ignore them. Also MyISAM is a bit faster when you only insert and select from it.
The only other solution that I know of is to create a separate connection for the logs.

Copying data from PostgreSQL to MySQL

I currently have a PostgreSQL database, because one of the pieces of software we're using only supports this particular database engine. I then have a query which summarizes and splits the data from the app into a more useful format.
In my MySQL database, I have a table which contains an identical schema to the output of the query described above.
What I would like to develop is an hourly cron job which will run the query against the PostgreSQL database, then insert the results into the MySQL database. During the hour period, I don't expect to ever see more than 10,000 new rows (and that's a stretch) which would need to be transferred.
Both databases are on separate physical servers, continents apart from one another. The MySQL instance runs on Amazon RDS - so we don't have a lot of control over the machine itself. The PostgreSQL instance runs on a VM on one of our servers, giving us complete control.
The duplication is, unfortunately, necessary because the PostgreSQL database only acts as a collector for the information, while the MySQL database has an application running on it which needs the data. For simplicity, we're wanting to do the move/merge and delete from PostgreSQL hourly to keep things clean.
To be clear - I'm a network/sysadmin guy - not a DBA. I don't really understand all of the intricacies necessary in converting one format to the other. What I do know is that the data being transferred consists of 1xVARCHAR, 1xDATETIME and 6xBIGINT columns.
The closest guess I have for an approach is to use some scripting language to make the query, convert results into an internal data structure, then split it back out to MySQL again.
In doing so, are there any particular good or bad practices I should be wary of when writing the script? Or - any documentation that I should look at which might be useful for doing this kind of conversion? I've found plenty of scheduling jobs which look very manageable and well-documented, but the ongoing nature of this script (hourly run) seems less common and/or less documented.
Open to any suggestions.
Use the same database system on both ends and use replication
If your remote end was also PostgreSQL, you could use streaming replication with hot standby to keep the remote end in sync with the local one transparently and automatically.
If the local end and remote end were both MySQL, you could do something similar using MySQL's various replication features like binlog replication.
Sync using an external script
There's nothing wrong with using an external script. In fact, even if you use DBI-Link or similar (see below) you probably have to use an external script (or psql) from a cron job to initiate repliation, unless you're going to use PgAgent to do it.
Either accumulate rows in a queue table maintained by a trigger procedure, or make sure you can write a query that always reliably selects only the new rows. Then connect to the target database and INSERT the new rows.
If the rows to be copied are too big to comfortably fit in memory you can use a cursor and read the rows with FETCH, which can be helpful if the rows to be copied are too big to comfortably fit in memory.
I'd do the work in this order:
Connect to PostgreSQL
Connect to MySQL
Begin a PostgreSQL transaction
Begin a MySQL transaction. If your MySQL is using MyISAM, go and fix it now.
Read the rows from PostgreSQL, possibly via a cursor or with DELETE FROM queue_table RETURNING *
Insert them into MySQL
DELETE any rows from the queue table in PostgreSQL if you haven't already.
COMMIT the MySQL transaction.
If the MySQL COMMIT succeeded, COMMIT the PostgreSQL transaction. If it failed, ROLLBACK the PostgreSQL transaction and try the whole thing again.
The PostgreSQL COMMIT is incredibly unlikely to fail because it's a local database, but if you need perfect reliability you can use two-phase commit on the PostgreSQL side, where you:
PREPARE TRANSACTION in PostgreSQL
COMMIT in MySQL
then either COMMIT PREPARED or ROLLBACK PREPARED in PostgreSQL depending on the outcome of the MySQL commit.
This is likely too complicated for your needs, but is the only way to be totally sure the change happens on both databases or neither, never just one.
BTW, seriously, if your MySQL is using MyISAM table storage, you should probably remedy that. It's vulnerable to data loss on crash, and it can't be transactionally updated. Convert to InnoDB.
Use DBI-Link in PostgreSQL
Maybe it's because I'm comfortable with PostgreSQL, but I'd do this using a PostgreSQL function that used DBI-link via PL/Perlu to do the job.
When replication should take place, I'd run a PL/PgSQL or PL/Perl procedure that uses DBI-Link to connect to the MySQL database and insert the data in the queue table.
Many examples exist for DBI-Link, so I won't repeat them here. This is a common use case.
Use a trigger to queue changes and DBI-link to sync
If you only want to copy new rows and your table is append-only, you could write a trigger procedure that appends all newly INSERTed rows into a separate queue table with the same definition as the main table. When you want to sync, your sync procedure can then in a single transaction LOCK TABLE the_queue_table IN EXCLUSIVE MODE;, copy the data, and DELETE FROM the_queue_table;. This guarantees that no rows will be lost, though it only works for INSERT-only tables. Handling UPDATE and DELETE on the target table is possible, but much more complicated.
Add MySQL to PostgreSQL with a foreign data wrapper
Alternately, for PostgreSQL 9.1 and above, I might consider using the MySQL Foreign Data Wrapper, ODBC FDW or JDBC FDW to allow PostgreSQL to see the remote MySQL table as if it were a local table. Then I could just use a writable CTE to copy the data.
WITH moved_rows AS (
DELETE FROM queue_table RETURNING *
)
INSERT INTO mysql_table
SELECT * FROM moved_rows;
In short you have two scenarios:
1) Make destination pull the data from source into its own structure
2) Make source push out the data from its structure to destination
I'd rather try the second one, look around and find a way to create postgresql trigger or some special "virtual" table, or maybe pl/pgsql function - then instead of external script, you'll be able to execute the procedure by executing some query from cron, or possibly from inside postgres, there are some possibilities of operation scheduling.
I'd choose 2nd scenario, because postgres is much more flexible, and manipulating data some special, DIY ways - you will simply have more possibilities.
External script probably isn't a good solution, e.g. because you will need to treat binary data with special care, or convert dates&times from DATE to VARCHAR and then to DATE again. Inside external script, various text-stored data will be probably just strings, and you will need to quote it too.

Which isolation level to use in a basic MySQL project?

Well, I got an assignment [mini-project] in which one of the most important issues is the database consistency.
The project is a web application, which allows multiple users to access and work with it. I can expect concurrent querying and updating requests into a small set of tables, some of them connected one to the other (using FOREIGN KEYS).
In order to keep the database as consistent as possible, we were advised to use isolation levels. After reading a bit (maybe not enough?) about them, I figured the most useful ones for me are READ COMMITTED and SERIALIZABLE.
I can divide the queries into three kinds:
Fetching query
Updating query
Combo
For the first one, I need the data to be consistent of course, I don't want to present dirty data, or uncommitted data, etc. Therefore, I thought to use READ COMMITTED for these queries.
For the updating query, I thought using SERIALIZABLE will be the best option, but after reading a bit, i found myself lost.
In the combo, I'll probably have to read from the DB, and decide whether I need/can update or not, these 2-3 calls will be under the same transaction.
Wanted to ask for some advice in which isolation level to use in each of these query options. Should I even consider different isolation levels for each type? or just stick to one?
I'm using MySQL 5.1.53, along with MySQL JDBC 3.1.14 driver (Requirements... Didn't choose the JDBC version)
Your insights are much appreciated!
Edit:
I've decided I'll be using REPEATABLE READ which seems like the default level.
I'm not sure if it's the right way to do, but I guess REPEATABLE READ along with LOCK IN SHARE MODE and FOR UPDATE to the queries should work fine...
What do you guys think?
I would suggest READ COMMITTED. It seems natural to be able to see other sessions' committed data as soon as they're committed.
Its unclear why MySQL has a default of REPEATABLE READ.
I think you worry too much about the isolation level.
If you have multiple tables to update you need to do:
START TRANSACTION;
UPDATE table1 ....;
UPDATE table2 ....;
UPDATE table3 ....;
COMMIT;
This is the important stuff, the isolation level is just gravy.
The default level of repeatable read will do just fine for you.
Note that select ... for update will lock the table, this can result in deadlocks, which is worse than the problem you may be trying to solve.
Only use this if you are deleting rows in your DB.
To be honest I rarely see rows being deleted in a DB, if you are just doing updates, then just use normal selects.
Anyway see: http://dev.mysql.com/doc/refman/5.0/en/innodb-transaction-model.html