We have just migrated from MySQL to PostgreSQL. A particular set of rows is heavily updated every minute. For the whole period the product ran on MySQL we had no issues, but after moving to PostgreSQL we hit a lot of deadlocks.
Table structure:
CREATE TABLE tab (col1 int, col2 int, col3 int, PRIMARY KEY (col1));
No secondary index.
Deadlocking query:
UPDATE tab SET col2 = col2 + 1 WHERE col3 = xx;
(Yes, the WHERE clause matches more than one row.)
My question: how does MySQL handle this situation so as to avoid deadlocks? (I'm asking on the assumption that PostgreSQL's problem with this query is that concurrent updates visit the rows in a different order each time.)
I might have faced deadlocks in MySQL too, but definitely not to the extent I see with PostgreSQL.
I have already gone through the question posted at https://dba.stackexchange.com/questions/151813/why-can-mysql-handle-multiple-updates-concurrently-and-postgresql-cant, but the answer there was not very convincing, as the author mostly complained about the update architecture of PostgreSQL and HOT updates.
I want to know what difference in architecture enables MySQL to avoid this problem.
At a guess, MySQL (presumably with InnoDB tables) is probably doing the updates in a consistent order each time, while PostgreSQL's access is generally unordered. This makes sense, given that InnoDB uses index-organized tables while PostgreSQL uses heaps.
PostgreSQL unfortunately does not support UPDATE ... ORDER BY. You can take the row locks in a consistent order before you UPDATE, at the cost of an extra round-trip, e.g.
BEGIN;
SELECT 1 FROM tab WHERE col3 = xx ORDER BY col1 FOR UPDATE;
UPDATE tab SET col2=col2+1 WHERE col3=xx;
COMMIT;
(I'd love to have UPDATE ... ORDER BY support in PostgreSQL. Patches welcome!)
Related
I have a quick question that I can't seem to find an answer to online; I'm not sure I'm using the right wording.
Do MySQL databases automatically synchronize queries coming in at around the same time? For example, if I send a query to insert something into a database at the same time another connection sends a query to select something from it, does MySQL automatically lock the database while the insert is happening, and then unlock it when it's done, allowing the select query to access it?
Thanks
Do MySQL databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
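For illustration, a minimal sketch of the two-connection case (the accounts table and its columns are hypothetical, not from the question):
-- Connection 1:
INSERT INTO accounts (id, balance) VALUES (1, 100);

-- Connection 2, arriving at around the same time:
SELECT balance FROM accounts WHERE id = 1;
-- Returns either no row (SELECT ordered first) or balance = 100
-- (INSERT ordered first), never a half-written row.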
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE mytable
SET update_count = update_count + 1,
    update_time = NOW()
WHERE id = something;
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. While a transaction is in progress, other sessions' SELECTs see the data as it was before your changes, never your intermediate state. Teaching you to use transactions is beyond the scope of a Stack Overflow answer.
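Still, the shape of such a transaction is simple; a minimal sketch (table, columns, and amounts are hypothetical):
START TRANSACTION;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
UPDATE accounts SET balance = balance + 50 WHERE id = 2;
COMMIT;
-- Other sessions see the old balances until COMMIT,
-- then both changes at once.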
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git, you know how several updates to the same base can happen in parallel, and as long as they all merge together cleanly there's no conflict. The database works like that as well: you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict, the commit is successful. If there's trouble, the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important part of meeting the ACID standard.
MyISAM, the older default engine in MySQL, doesn't have any of these features and locks the whole table on any write operation to avoid conflict. It works like you thought it did.
When creating a table in MySQL you have your choice of engine, but InnoDB should be your default. There's really no reason at all to use MyISAM, as the interesting features of that engine (e.g. full-text indexes) have been ported over to InnoDB.
I've come across a situation where I need to select a huge amount of data (say 100k records that look like ID | {"points":"9","votes":"2","breakdown":"0,0,0,1,1"}), process it in PHP, and then put it back. The question is about putting it back efficiently. I've seen a solution using INSERT ... ON DUPLICATE KEY UPDATE, and a solution using UPDATE with CASE. Are there any other solutions? Which would be the most efficient way to update a huge data array?
The better choice is a simple UPDATE.
When you push data in via INSERT and rely on duplicate-key handling, your DB does more work: try to insert, verify constraints, detect the duplicate key, update the row, verify constraints again.
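For concreteness, the two approaches from the question look roughly like this (the scores table is made up for illustration and is assumed to have a primary key on id):
-- INSERT ... ON DUPLICATE KEY UPDATE: inserts new rows, updates existing ones
INSERT INTO scores (id, points, votes)
VALUES (1, 9, 2), (2, 4, 7)
ON DUPLICATE KEY UPDATE
    points = VALUES(points),
    votes = VALUES(votes);

-- UPDATE with CASE: updates only, no insert path
UPDATE scores
SET points = CASE id WHEN 1 THEN 9 WHEN 2 THEN 4 END,
    votes = CASE id WHEN 1 THEN 2 WHEN 2 THEN 7 END
WHERE id IN (1, 2);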
Update
I ran tests on my local PC comparing INSERT ... ON DUPLICATE KEY UPDATE and UPDATE statements against a table with 43k rows.
The first approach worked about 40% faster.
But both finished in under 1.5 s. I suppose your PHP code will be the bottleneck of your approach, so you should not worry about the speed of the MySQL statements. Of course, this holds only if your table is not huge, with dozens of millions of rows.
Update 2
My local PC runs MySQL 5.6 in the default configuration.
RAM: 8 GB
I'm using a 3rd-party ETL application (Pentaho/Kettle/Spoon), so unfortunately I'm not sure of the exact SQL query, but I can try different manual queries.
I'm just wondering why ... MySQL seems to allow multiple processes at once to run "insert, but if found, update" queries.
MS SQL does not ... it "locks" the rows when one query is doing an insert/update ... and throws an error if another query tries to insert/update the same data.
I guess this makes sense ... but I'm just a bit annoyed that MySQL allows this, and MS SQL does not.
Is there any way to get around this?
I just want the fastest way possible to insert/update a list of 1000 records into a table. In the past I divided this into 20 processes, each inserting/updating 50 records ... this worked in parallel because none of the 1000 records are duplicates of each other ... only some of them are duplicates of rows already in the table ... so they can be inserted/updated in any order, as long as it happens.
Any thoughts? Thanks
MySQL used the MyISAM storage engine by default (InnoDB became the default in 5.5), which does not support transactions. SQL Server supports transactions, as you've observed, though you can tweak the isolation levels to do risky things like reading uncommitted data (very rarely a good idea).
If you want your MySQL tables to have transaction support, you need to explicitly create them with the option ENGINE=INNODB. Older versions also supported ENGINE=BDB, the Berkeley Database engine. See the MySQL docs for more details on InnoDB:
http://dev.mysql.com/doc/refman/5.7/en/innodb-storage-engine.html
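For example, a minimal transactional table (the table itself is illustrative):
CREATE TABLE orders (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    status VARCHAR(20) NOT NULL
) ENGINE=INNODB;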
I have a 50 GB MySQL database (80 tables) that I need to delete some contents from.
I have a reference table that contains the list of product ids that need to be deleted from the other tables.
Now, the other tables can be 2 GB each, containing the items that need to be deleted.
My question is: since it is not a small database, what is the safest way to delete
the data in one shot in order to avoid problems?
What is the best method to verify that all the intended data was deleted?
Probably this doesn't help anymore, but you should keep it in mind when creating the database. In MySQL (depending on the table storage type, for instance InnoDB) you can specify relations, called foreign key constraints. These relations mean that if you delete a row from one table (for instance products), you can automatically update or delete rows in other tables that reference it (such as product_storage). Such relations guarantee a 100% consistent state. However, they can be hard to add in hindsight. If you plan to do this more often, it is definitely worth researching whether you can add them to your database; they will save you a lot of work (all kinds of queries become simpler).
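A minimal sketch of such a constraint, using the tables mentioned above (column names are assumed):
CREATE TABLE products (
    id INT PRIMARY KEY
) ENGINE=INNODB;

CREATE TABLE product_storage (
    id INT PRIMARY KEY,
    product_id INT NOT NULL,
    FOREIGN KEY (product_id) REFERENCES products (id) ON DELETE CASCADE
) ENGINE=INNODB;

-- Deleting a product now removes its product_storage rows automatically:
DELETE FROM products WHERE id = 42;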
Without these relations you can't be 100% sure. You'd have to go over all the tables, note which columns you want to check, and write a bunch of SQL queries to make sure there are no entries left.
As Thirler has pointed out, it would be nice if you had foreign keys. Without them, burnall's approach of using transactions can ensure that no inconsistencies creep in.
Regardless of how you do it, this could take a long time, even hours, so please be prepared for that.
As pointed out earlier, foreign keys would be nice in this place. But regarding question 1, you could run the changes within a transaction from the MySQL prompt. This assumes you are using a transaction-safe storage engine like InnoDB. You can convert from MyISAM to InnoDB if you need to (a one-liner shown after the block below). Anyway, something like this:
START TRANSACTION;
-- ...perform changes...
-- ...verify the changes...
COMMIT;
-- or, if something looks wrong:
ROLLBACK;
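The MyISAM-to-InnoDB conversion mentioned above is a one-liner (table name is illustrative):
ALTER TABLE mytable ENGINE=InnoDB;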
Is it acceptable to have any downtime?
When working with PostgreSQL databases >250 GB, we use this technique on production servers to perform database changes. If the outcome isn't as expected, we just roll back the transaction. Of course there is a penalty, as the I/O system has to do a bit of work.
// John
I agree with Thirler that using foreign keys is preferable. They guarantee referential integrity and consistency of the whole database.
But I can believe that life sometimes requires trickier logic.
So you could use manual queries like
DELETE FROM a WHERE id IN (SELECT id FROM `keys`);
You could delete all records at once, by ranges of keys, or using LIMIT in DELETE. A proper index is a must.
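For instance, a batched variant (the batch size is arbitrary; repeat until no rows are affected; `keys` is backticked because KEYS is a reserved word in MySQL):
DELETE FROM a
WHERE id IN (SELECT id FROM `keys`)
LIMIT 10000;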
To verify consistency you need a function or a query. For example:
DELIMITER //
CREATE FUNCTION check_consistency() RETURNS BOOLEAN
READS SQL DATA
BEGIN
    -- TRUE when no child row references a missing parent;
    -- extend with one NOT EXISTS clause per table to check
    RETURN NOT EXISTS (SELECT 1 FROM child WHERE id NOT IN (SELECT id FROM parent))
       AND NOT EXISTS (SELECT 1 FROM child2 WHERE id NOT IN (SELECT id FROM parent));
END//
DELIMITER ;
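Calling it afterwards is a one-liner:
SELECT check_consistency();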
Also, something else to look into is partitioning of MySQL tables. For more information check out the reference manual:
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
It comes down to being able to divide a table (for example) into different partitions by datetime values or key ranges.
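A sketch of range partitioning by datetime (table, partition names, and ranges are illustrative):
CREATE TABLE logs (
    id INT NOT NULL,
    created DATETIME NOT NULL
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Dropping a whole partition is far cheaper than deleting its rows one by one:
ALTER TABLE logs DROP PARTITION p2015;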
I have a hefty db server with lots of very similar InnoDB databases. A query that I run often simply updates a timestamp on one row in a small table. This takes like 1-2 ms most of the time. Occasionally, at night, probably while backups and maatkit replication tools are running, one or more of these queries may show "Updating" for several minutes. During this time, other queries, including maatkit queries, seem to be proceeding normally, and no other queries seem to be executing. I have been unable to explain or fix this.
We are using MySQL 4.1.22 and Gentoo (kernel 2.6.21) on a pair of 4-way Xeons with 16 GB of RAM and RAIDed drives for storage. Replication is in place and operating well, with maatkit confirming replication nightly. InnoDB is using most of the RAM and the CPUs are typically 70-80% idle. The table in question has about 100 rows of about 200 bytes each. I've tried with and without an index on the WHERE clause column, with no discernible change. No unusual log messages have been found (I checked system messages and MySQL errors).
Has anybody else heard of this? Solved something like this? Any ideas of how to investigate?
When making DML operations, InnoDB places locks on rows and index gaps.
The problem is that it locks all rows examined, not only those affected.
Say, if you run this query:
UPDATE mytable
SET value = 10
WHERE col1 = 1
AND col2 = 2
then the locking will depend on the index used for the query:
If an index on (col1, col2) was used, then only the affected rows will be locked
If an index on col1 alone was used, all rows with col1 = 1 will be locked
If an index on col2 alone was used, all rows with col2 = 2 will be locked
If no index was used, all rows and index gaps will be locked (including gaps in the PRIMARY KEY, so that even an INSERT into an AUTO_INCREMENT column will block)
To make things worse, EXPLAIN in MySQL (prior to 5.6) does not work on DML operations, so you'll have to guess which index was used, since the optimizer can pick whichever it considers best.
So it may be that your replication tools and your updates lock the records concurrently (and as you can see, this can happen even if the WHERE conditions do not overlap).
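Given the list above, one mitigation (an illustrative sketch; the index name is made up) is a composite index matching the WHERE clause, so that only the affected rows are locked:
ALTER TABLE mytable ADD INDEX idx_col1_col2 (col1, col2);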
If you can get at the server while this query is hanging, try running SHOW INNODB STATUS. Part of the mess of data you get from that is the status of all active connections/queries on InnoDB tables, including samples of the lock data. If your query is hanging because of another transaction, it will be indicated in there.
As well, you mention that it seems to happen during backups. Are you using mysqldump for that? It locks tables while the dump is active so that the dumped data is consistent.
Using some of the information offered in the responses, we continued investigating and found some disturbing behavior on our server. A simple CHECK TABLE on any table in any database caused simple update queries in other databases and other tables to lock. I don't have any idea why this would happen, but we could not reproduce it on MySQL v5.1, so we intend to upgrade our database server.
I don't think maatkit's mk-table-checksum does a CHECK TABLE, but it is having a similar effect. Turning off this script reduced the problems significantly, but we believe that we cannot live without it.
I'm going to mark this as the answer to my question. Thanks for the help.