Are Concurrent SQL inserts into the same table transactionally safe? - mysql

I have a simple table in MySQL whose raison d'ĂȘtre is to store logs.
The table has an auto-incremented sequence, and none of the other columns have referential integrity to other tables. There are no unique keys or indexes on any of the other columns. The auto-increment column is the primary key.
Will concurrent INSERTs ever interfere with each other? I define interference to mean losing data.
I am using autocommit=true for this insert.

You'll never lose data just because you do simultaneous inserts. If you use transactions, you might "lose" some IDs - but no actual data. (Imagine that you start a transaction, insert a few rows and then do a rollback. InnoDB will have allocated the auto_increment IDs, but there are no rows with those IDs because you did the rollback).
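A minimal sketch of that effect, assuming a hypothetical InnoDB log table (the table and column names are illustrative, not from the question):

-- Hypothetical log table with an auto-increment primary key
CREATE TABLE app_log (
  id      BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  message VARCHAR(255)
) ENGINE=InnoDB;

-- Session 1: IDs are allocated, then the rows are rolled back
START TRANSACTION;
INSERT INTO app_log (message) VALUES ('row a'), ('row b');  -- allocates ids 1 and 2
ROLLBACK;                                                   -- rows discarded, ids not reused

-- Session 2 (or the same session later, with autocommit=true)
INSERT INTO app_log (message) VALUES ('row c');  -- typically gets id 3, leaving a gap
SELECT id, message FROM app_log;                 -- only 'row c' is present; no data was lost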
Since you don't need indexes, you should have a look at the ARCHIVE table engine. It's insanely fast, and your tables get much smaller, which in turn makes the table scans MUCH faster when you read the table later.
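A minimal sketch of such a log table using ARCHIVE (names are illustrative; note that ARCHIVE only supports INSERT and SELECT, and only allows an index on the AUTO_INCREMENT column):

-- Hypothetical append-only log table on the ARCHIVE engine
CREATE TABLE app_log_archive (
  id        BIGINT NOT NULL AUTO_INCREMENT,
  logged_at DATETIME NOT NULL,
  message   VARCHAR(255),
  KEY (id)
) ENGINE=ARCHIVE;

INSERT INTO app_log_archive (logged_at, message) VALUES (NOW(), 'something happened');
-- UPDATE and DELETE are not supported; rows are compressed on disk as they are written.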

From the MySQL manual for the MyISAM storage engine:
"MyISAM supports concurrent inserts..."

Yes. For InnoDB, more information here

Related

Create index and then insert or insert and then create index?

I'm inserting a large volume of data into a table in MySQL. I need to create an index to access the data quickly; however, I would like to know if there is a difference (in performance) between these scenarios:
Create an index and then insert all data
Insert all data and then create an index
thanks in advance!
For the InnoDB storage engine, it will be faster to specify the clustered index (i.e. the PRIMARY KEY) on the table before inserting data.
This is because if a clustered index (PRIMARY KEY) is not defined on the table, InnoDB will use a hidden 6-byte auto-incremented counter as the clustered index. If a PRIMARY KEY is later specified, the entire table will need to be rebuilt.
For secondary indexes (i.e. non-clustered indexes) with InnoDB, it is usually faster to insert data without secondary indexes defined, and then build the secondary indexes after the data is loaded.
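A hedged sketch of that loading order for InnoDB (the table and index names are assumptions for illustration): define only the PRIMARY KEY up front, bulk-load the rows, then build the secondary indexes in one pass.

-- 1. Create the table with only the clustered index (PRIMARY KEY) defined
CREATE TABLE events (
  id      BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT NOT NULL,
  created DATETIME NOT NULL,
  payload VARCHAR(255)
) ENGINE=InnoDB;

-- 2. Bulk-load the data with no secondary indexes to maintain
-- INSERT INTO events (user_id, created, payload) VALUES (...), (...), ...;

-- 3. Build the secondary indexes once, after the load
ALTER TABLE events ADD INDEX idx_user (user_id), ADD INDEX idx_created (created);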
FOLLOWUP
As far as the speed of loading a table (in particular, a table that is truncated/emptied and then reloaded), dropping and re-creating indexes is a well-known technique for speeding up processing, not just with MySQL, but with other RDBMSs such as Oracle.
There isn't a guarantee that the processing will be faster; as with most things database, we need tests to determine which is faster.
For a table containing millions of rows to which we're adding only a couple dozen (or a couple hundred) rows, dropping and rebuilding the indexes is likely going to be a lot slower, because of all of the extra work to re-index all of the existing rows. It would be faster to do the index maintenance while the rows are being inserted.
In terms of speeding up a load, the "drop and recreate indexes" technique isn't going to give us the kind of dramatic improvements we get from other changes. For example, it won't be anywhere near the improvement we would see by using LOAD DATA in place of INSERT statements, nor using multi-row INSERT statements vs a series of singleton INSERT statements.
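For a rough sense of those alternatives, here is a hedged sketch (table, column, and file names are made up for illustration):

-- Singleton inserts: one statement (and, with autocommit, one commit) per row -- slowest
INSERT INTO events (user_id, created) VALUES (1, NOW());
INSERT INTO events (user_id, created) VALUES (2, NOW());

-- Multi-row insert: many rows per statement, much less per-statement overhead
INSERT INTO events (user_id, created) VALUES (1, NOW()), (2, NOW()), (3, NOW());

-- LOAD DATA: usually the fastest way to bulk-load rows from a file
LOAD DATA INFILE '/tmp/events.csv'
INTO TABLE events
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
(user_id, created);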

Does InnoDB lock the whole table for a delete which uses part of a composed key?

I have a MySQL table (call it 'my_table') with a composed primary key with 4 columns (call them 'a', 'b', 'c' and 'd').
At least one time I encountered a deadlock on parallel asynchronous EJB calls calling 'DELETE FROM my_table where a=? and b=?' with different values, so I started to look into how InnoDB table locking works.
I've found no clear documentation on how table locking works with composed keys. Is the whole table locked by the delete, despite the fact that there's no overlap among the actual rows being deleted?
Do I need to do a select to recover the values for c and d and delete batches using the whole primary key?
This is in the context of a complex application which works with 4 different databases. Only MySQL seems to have this issue.
InnoDB never locks the entire table for DML statements. (Unless the DML is hitting all rows.)
There are other locks for DDL statements, such as when ALTER TABLE is modifying/adding columns/indexes/etc. (Some of these have been greatly sped up in MySQL 8.0.)
There is nothing special about a composite key with respect to locking.
There is a thing called a "gap lock". For various reasons, the "gap" between two values in the index will be locked. This prevents potential conflicts, such as two transactions inserting the same new value (one that does not yet exist) where a uniqueness constraint applies.
Since the PRIMARY KEY is a unique key, you may have hit something like that.
If practical, do SHOW ENGINE INNODB STATUS; to see whether the lock is "gap" or not.
Another thing that can happen is that a lock can start out being weak, then escalate to "eXclusive". This can lead to a deadlock.
Do I need to do a select to recover the values for c and d and delete batches using the whole primary key?
I think you need to explain more precisely what you are doing. Provide the query. Provide SHOW CREATE TABLE.
InnoDB's lock handling is possibly unique to MySQL. It has some quirks. Sometimes it is a bit greedy about what it locks; to compensate, it is possibly faster than the competition.
In any case, check for deadlocks (and timeouts) and deal with them. The hope is that these problems are rare enough that having to deal with them is not too much of a performance burden.
DELETE FROM my_table where a=? and b=? means that potentially a large number of rows are being deleted. That means that the undo log and MVCC need to do a lot of work. Hence, I recommend trying not to delete (or update) more than 1K rows at a time.
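A minimal sketch of that chunking idea, reusing the query from the question (the LIMIT value and the looping are assumptions; the loop itself would live in application code):

-- Delete at most 1000 matching rows per statement...
DELETE FROM my_table WHERE a = ? AND b = ? LIMIT 1000;
-- ...and repeat the statement until it reports 0 rows affected, so each
-- transaction stays small and the undo log / MVCC overhead stays bounded.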

Why my mysql table has to optimize frequently

I have a MySQL table with 12 columns, one primary key and two unique keys. I have more or less 86000 rows/records in this table.
I use this MySQL code:
INSERT INTO table (col2, col3, ..., col12) VALUES ($val2, $val3, ..., $val12) ON DUPLICATE KEY UPDATE col2=VALUES(col2), col3=VALUES(col3), ..., col12=VALUES(col12)
When I view the structure of this table from cpanel phpmyadmin, I can see 'Optimize Table' link just below the index information of the table. If I click the link, the table is optimized.
But my question is why I see the 'Optimize Table' link so frequently in this table (it reappears within 3-4 days), while the other tables of this database do not show the optimize table link (they show the link once a month, or even once every two months or more).
As I am not deleting rows from this table, just inserting (and updating when a duplicate key is found), why is optimization required so frequently?
Short answer: switch to InnoDB.
The MyISAM storage engine uses B-trees for indexes and creates index files. Every time you insert a lot of data, these indexes are changed, and that is why you need to optimize your table to reorganize the indexes and regain some space.
MyISAM's indexing mechanism takes much more space compared to InnoDB.
Read the link below
http://www.mysqlperformanceblog.com/2010/12/09/thinking-about-running-optimize-on-your-innodb-table-stop/
There are a lot of other advantages to Innodb over MyISAM but that is another topic.
I will explain how inserting records affects a MyISAM table and explain what optimizing does, so you'll understand why inserting records has such a large effect.
Data
With MyISAM, when you insert records, data is simply appended to the end of the data file.
Running optimize on a MyISAM table defrags the data, physically reordering it to match the order of the primary key index. This speeds up sequential record reads (and table scans).
Indexes
Inserting records also adds leaves to the B-Tree nodes in the index. If a node fills up, it must be split, in effect rebuilding at least that page of the index.
When optimizing a MyISAM table, the indexes are flattened out, allowing room for more expansion (insertion) before having to rebuild an index page. This flatter index also speeds searches.
Statistics
MySQL also stores statistics for each index about key distribution, and the query optimizer uses this information to help develop a good execution plan. Inserting (or deleting) many records causes these statistics to become out of date.
Optimizing the table recalculates the statistics after the defragmenting and rebuilding of the indexes.
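For reference, a hedged sketch of the maintenance statements involved (the table name is hypothetical):

OPTIMIZE TABLE app_log;  -- defragments the data file and rebuilds the indexes
ANALYZE TABLE app_log;   -- only recalculates the index statistics; much cheaper than OPTIMIZE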
vs. Appending
When you are appending data (adding a record with a higher primary key value such as with auto_increment), that data will not need to be later defragged since it will already be in the proper physical order. Also, when appending (inserting sequentially) into an index, the nodes are kept flat, so there's no rebuilding to be done there either.
vs. InnoDB
InnoDB suffers from the same issues when inserting, but since data is kept in order by primary key due to its clustered index, you take the hit up front (at the time it's inserted) for keeping the data in order, rather than having to defrag it later. Still, optimizing InnoDB does optimize the data by flattening out the B-tree nodes and freeing up unused (deleted) keys, which improves sequential reads (table scans), and secondary indexes are similar to indexes in MyISAM, so they get rebuilt to flatten them out.
Conclusion
I'm not trying to make a case to stick with MyISAM. InnoDB has superior read performance due to the clustered indexes, and better update and append performance due to the record level locking versus MyISAM's table locking (assuming concurrent users). Also, InnoDB has ACID.
Still, my goal was to answer your direct question and provide some technical details rather than conjecture and hearsay.
Neither database storage engine automatically optimizes itself.

Is forking possible w/ InnoDB & unique records?

I am considering moving my MyISAM table to InnoDB. I have a lot of tables w/ columns set to unique values and I use perl. If I switch to InnoDB (and thus take advantage of row-level locking rather than table-level locking) and use forking, will I encounter problems with duplicate entries? (ie, since I will be inserting many rows simultaneously into the table)
As long as you have UNIQUE indexes in place, no rows violating these constraints will be allowed.
You might, however, run into some concurrency issues when doing inserts within transactions. If two duplicate rows are inserted in two different, concurrent transactions, one of them will fail.
Uniqueness can be achieved by creating unique indexes. In this case the DB engine takes care of it. Also, proper use of transactions helps you avoid concurrency issues.
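A minimal sketch of that setup, with assumed table and column names: the UNIQUE index guarantees no duplicates regardless of how many forked workers insert concurrently, and the insert form decides what happens on a collision.

CREATE TABLE emails (
  id    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) NOT NULL,
  hits  INT NOT NULL DEFAULT 1,
  UNIQUE KEY uq_email (email)
) ENGINE=InnoDB;

-- Option 1: silently skip rows that already exist
INSERT IGNORE INTO emails (email) VALUES ('a@example.com');

-- Option 2: update the existing row instead of failing
INSERT INTO emails (email) VALUES ('a@example.com')
ON DUPLICATE KEY UPDATE hits = hits + 1;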

When should you choose to use InnoDB in MySQL?

I am rather confused by the hurt-mongering here.
I know how to create them (see below), but I have no idea why. What are they for?
create table orders (order_no int not null auto_increment, FK_cust_no int not null,
foreign key(FK_cust_no) references customer(cust_no), primary key(order_no)) type=InnoDB;
create table orders (order_no int not null auto_increment, FK_cust_no int not null,
foreign key(FK_cust_no) references customer(cust_no), primary key(order_no));
InnoDB is a storage engine in MySQL. There are quite a few of them, and they all have their pros and cons. InnoDB's greatest strengths are:
Support for transactions (giving you support for the ACID property).
Row-level locking. Having a more fine-grained locking mechanism gives you higher concurrency compared to, for instance, MyISAM.
Foreign key constraints. Allowing you to let the database ensure the integrity of the state of the database, and the relationships between tables.
Always. Unless you need to use MySQL's full-text search or InnoDB is disabled in your shared webhost.
InnoDB:
The InnoDB storage engine in MySQL.
InnoDB is a high-reliability and high-performance storage engine for MySQL. Key advantages of InnoDB include:
Its design follows the ACID model, with transactions featuring commit, rollback, and crash-recovery capabilities to protect user data.
Row-level locking (without escalation to coarser granularity locks) and Oracle-style consistent reads increase multi-user concurrency and performance.
InnoDB tables arrange your data on disk to optimize common queries based on primary keys. Each InnoDB table has a primary key index called the clustered index that organizes the data to minimize I/O for primary key lookups.
To maintain data integrity, InnoDB also supports FOREIGN KEY referential-integrity constraints.
You can freely mix InnoDB tables with tables from other MySQL storage engines, even within the same statement. For example, you can use a join operation to combine data from InnoDB and MEMORY tables in a single query.
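A small sketch of mixing engines in one query (table names are made up for illustration):

CREATE TABLE orders_innodb (id INT PRIMARY KEY, cust_id INT) ENGINE=InnoDB;
CREATE TABLE cust_cache (cust_id INT PRIMARY KEY, cust_name VARCHAR(50)) ENGINE=MEMORY;

SELECT o.id, c.cust_name
FROM orders_innodb AS o
JOIN cust_cache AS c ON c.cust_id = o.cust_id;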
InnoDB Limitations:
No full-text indexing (in MySQL versions below 5.6)
Cannot be compressed for fast, read-only access
More details:
Refer to this link
I think you are confused about two different issues, when to use InnoDB instead of MyISAM, and when to use foreign key (FK) constraints.
As for the first issue, there have been many answers that do a great job of explaining the differences between MyISAM and InnoDB. I'll just reiterate that, in tvanfosson's quote from an article, MyISAM is better suited for systems with mostly reads. This is because it uses table-level locking instead of row-level like InnoDB, so MyISAM can't handle high concurrency as well, plus it's missing features that help with data integrity, such as transactions and foreign keys (again, already mentioned by others).
You don't have to use FK constraints in your data model. If you know what the relationships between your tables are, and your application is free of bugs, then you'll get by without FKs just fine. However, using FKs gives you extra insurance at the database layer, because then MySQL won't let your application insert bad data based on the constraints that you created.
In case you aren't clear on why to use primary keys (PK), making a column such as id_order for example the PK of the orders table means that MySQL won't let you INSERT the same value of id_order more than once because every row in a PK column must be unique.
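A tiny sketch of that behavior (the table name is illustrative):

CREATE TABLE orders_demo (id_order INT NOT NULL PRIMARY KEY, id_customer INT) ENGINE=InnoDB;
INSERT INTO orders_demo (id_order, id_customer) VALUES (100, 1);  -- succeeds
INSERT INTO orders_demo (id_order, id_customer) VALUES (100, 2);  -- fails: duplicate entry for the PK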
A FK would be used on a table that has a dependency on another table, for example, order_items would have a dependency on orders (below). id_order_items is the PK of order_items and you could make id_order_items the FK of the orders table to establish a one-to-many relationship between orders and order_items. Likewise, id_item could be a FK in the order_items table and a PK in the items table to establish a one-to-many relationship between order_items and items.
Then, what the FK constraint does is prevent you from adding an id_item value to the order_items table that isn't in the items table, or from adding an id_order_items to orders that isn't in the order_items table.
All a FK does is ensure data integrity, and it also helps convey relationships among your tables to other developers who didn't write the system (and yourself months later when you forget!), but mainly it's for data integrity.
Extra credit: so why use transactions? Well, you already mentioned a quote that says they are useful for banking systems, but they are useful in far more situations than that.
Basically, in a relational database, especially if it's normalized, a routine operation such as adding an order, updating an order, or deleting an order often touches more than one table and/or involves more than one SQL statement. You could even end up touching the same table multiple times (as the example below does). By the way, Data Manipulation Language (DML) statements (INSERT/UPDATE/DELETE) only involve one table at a time.
An example of adding an order:
I recommend an orders table and an order_items table. This makes it so you can have a PK on id_order in the orders table, which means id_order cannot be repeated in orders. Without the one-to-many orders-to-order_items relationship, you'd have to have multiple rows in the orders table for every order that had multiple items associated with it (you need an items table too for this e-commerce system, by the way). This example is going to add an order, touching two tables in doing so with four different INSERT statements.
(no key constraints for illustration purposes)
-- insert #1
INSERT INTO orders (id_order, id_order_items, id_customer)
VALUES (100, 4, 1)
-- insert #2
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 1)
-- insert #3
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 2)
-- insert #4
INSERT INTO order_items (id_order_items, id_item)
VALUES (4, 3)
So what if the insert #1 and insert #2 queries run successfully, but the insert #3 statement does not? You'd end up with an order that was missing an item, and that would be garbage data. If, in that case, you want to roll back all the queries so the database is in the same state it was in before adding the order, and then start over, well, that's exactly what transactions are for. You group queries into a transaction when you want either all of them done or, in case of an exception, none of them at all.
So, like PK/FK constraints, transactions help ensure data integrity.
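As a sketch, the same four inserts grouped into one transaction (error handling and the decision to roll back would live in application code):

START TRANSACTION;
INSERT INTO orders (id_order, id_order_items, id_customer) VALUES (100, 4, 1);
INSERT INTO order_items (id_order_items, id_item) VALUES (4, 1);
INSERT INTO order_items (id_order_items, id_item) VALUES (4, 2);
INSERT INTO order_items (id_order_items, id_item) VALUES (4, 3);
COMMIT;  -- all four rows become visible together
-- If any statement fails, issue ROLLBACK instead and none of the rows are kept.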
In your example, you create foreign keys. Foreign keys are only supported for InnoDB tables, not for MyISAM tables.
You may be interested in this article from Database Journal which discusses the InnoDB table type in MySQL.
Excerpt:
Last month we looked at the HEAP table type, a table type which runs entirely in memory. This month we look at setting up the InnoDB table type, the type of most interest to serious users. The standard MyISAM table type is ideal for website use, where there are many reads in comparison to writes, and no transactions. Where these conditions do not apply (and besides websites, they do not apply often in the database world), the InnoDB table is likely to be the table type of choice. This article is aimed at users who are familiar with MySQL, but have only used the default MyISAM table type.
I wouldn't be put off by the other question. Keep proper backups of your database, of any type -- and don't drop tables by accident ;-) -- and you'll be ok whatever table type you choose.
In general, for me the most important point is that InnoDB offers per-row locking, while MyISAM locks per table. On big tables with a lot of writes this can make a big performance difference.
On the other hand, MyISAM tables have a simpler file structure; copying and repairing tables at the file level is much easier.
A comment has a command to convert your databases to InnoDB here.
Everywhere! Deprecate MyISAM; InnoDB is the way to go. It's not only about performance, but also data integrity and ACID transactions.
A supplement to Machine and knoopx's answer about transactions:
The default MySQL table type, MyISAM, does not support transactions. BerkeleyDB and InnoDB are the transaction-safe table types available in open source MySQL, version 3.23.34 and greater.
The definition of a transaction and a banking example:
A transaction is a sequence of individual database operations that are grouped together. A good example of where transactions are useful is in banking.
Source of the citations