Ensure `INSERT`s are concurrent for a specific MyISAM table?

I have a MyISAM table that basically contains a log. A cluster of machines does single-record INSERTs on this table at a rate of 50 per second tops, but the same table is also SELECTed from by a web application, and indexed to accommodate this. There are no UPDATEs or DELETEs, though.
So from what I've gathered, I should be using concurrent inserts. (Right?) MyISAM will normally do this for me without any extra work. (Is this correct?)
But what I can't find is a way to guarantee that a given INSERT is processed concurrently. I know that I can set the global variable concurrent_insert to 2, but I'd rather not set this globally.
So my questions are:
Is there some way I'm missing to guarantee a concurrent insert?
If not, is there a command I can use to see whether a table meets the concurrent insert requirements? (I believe just knowing whether a table has holes should be enough?) Because I will also settle for being able to just monitor the table.
And I'm also curious, is there some other database system that can handle this kind of load better? I'm totally okay with a NoSQL solution, if that happens to be the case. As long as I can talk to it from Ruby and C.

Why don't you want to set concurrent_insert=2 globally? That would give you what you want.
Another option you may want to consider for this type of MyISAM table is INSERT DELAYED:
http://dev.mysql.com/doc/refman/5.1/en/insert-delayed.html
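For example (a sketch; access_log and its columns are hypothetical names; note that concurrent_insert is a global-only variable with no per-session equivalent, and INSERT DELAYED was deprecated in MySQL 5.6 and removed in 5.7):

-- Allow concurrent inserts even when the table has holes:
SET GLOBAL concurrent_insert = 2;

-- Queue the row with the delayed-insert handler and return immediately:
INSERT DELAYED INTO access_log (logged_at, message)
VALUES (NOW(), 'example entry');

-- To monitor for holes, check the Data_free column:
SHOW TABLE STATUS LIKE 'access_log';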


Can I INSERT into table while UPDATING multiple different rows with MariaDB or MySQL?

I am creating a custom analytics system and currently in the database designing process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record if someone visited a page associated with that click (basically, to record funnels).
So what I'm planning to do is to create additional columns, search for the ID of the specific row, and then update those columns with the exact time of the visit.
My first question: is this generally a recommended way to design the database? If not, what would be a better design?
My only concern is that while rows are being updated the table will be locked and inserts will be blocked, slowing down the user experience.
My second question: is this something I should worry about? Does the table get locked while updating, slowing down inserts and hurting performance?
InnoDB doesn't lock the whole table for inserts while you're performing an update; it uses row-level locking, so your users won't experience any weird hanging.
It's an MVCC-compliant engine, designed to handle concurrent access to underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level; however, the default (REPEATABLE READ) is excellent and does the job well.
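If you ever need a different level, it is one statement per session (READ COMMITTED below is purely illustrative):

-- Applies to subsequent transactions in this session only:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;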
If a table is being modified by multiple connections (not visitors to your site, but connections established to MySQL by a scripting language or some other service) and there are many inserts/updates/deletes, MySQL can throw an error saying a deadlock occurred.
A deadlock occurs when two transactions each hold a lock that the other needs, so neither can proceed; InnoDB resolves it by rolling one of them back with an error (1213). It's an indication that you should retry the failed transaction.
I'm suggesting that you take care of these scenarios in the language of your choice (in particular, catch deadlock errors and retry the transaction) when handling a MySQL server that's under heavier I/O.
~6 inserts a second isn't a lot. Make sure you're allowing MySQL access to sufficient system resources; for InnoDB, check the value of innodb_buffer_pool_size, or google a bit to see what it is and how to use it to make your database run fast.
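For instance (the 1G figure below is an illustrative value, not a recommendation for your hardware):

-- Current size, in bytes:
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Example my.cnf entry; size it to hold your working set:
-- innodb_buffer_pool_size = 1G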
Good luck!
At a mere ~5.8/second, there won't be much problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
The id of the item being Liked/Clicked/etc.
A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
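A minimal sketch of that side table (all names here are hypothetical):

-- Hot single-row counter updates live here, away from the main items table:
CREATE TABLE item_clicks (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY,  -- id of the item being Liked/Clicked
    clicks  INT UNSIGNED NOT NULL DEFAULT 0     -- the counter
) ENGINE=InnoDB;

-- Create-or-increment in a single statement:
INSERT INTO item_clicks (item_id, clicks)
VALUES (123, 1)
ON DUPLICATE KEY UPDATE clicks = clicks + 1;

-- JOIN back in when displaying info that includes the counter:
SELECT i.title, COALESCE(c.clicks, 0) AS clicks
FROM items i
LEFT JOIN item_clicks c ON c.item_id = i.item_id
WHERE i.item_id = 123;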
As already pointed out, the row is locked, not the table.

MySQL query synchronization/locking question

I have a quick question whose answer I can't seem to find online; I'm not sure whether I'm using the right wording or not.
Do MySQL databases automatically synchronize queries coming in at around the same time? For example, if I send a query to insert something into a database at the same time another connection sends a query to select something from it, does MySQL automatically lock the database while the insert is happening, and then unlock it when it's done, allowing the select query to access it?
Thanks
Do MySQL databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE some_table
SET update_count = update_count + 1,
    update_time = NOW()
WHERE id = something;
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. Plain SELECTs from other connections won't see the transaction's changes until it commits (they read a consistent snapshot), though locking reads such as SELECT ... FOR UPDATE can be blocked by it. Teaching you to use transactions is beyond the scope of a Stack Overflow answer.
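Just to show the shape of it, though (a sketch with hypothetical table and column names):

-- Both updates become visible to other connections atomically, at COMMIT:
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;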
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
MyISAM, the default engine in early versions of MySQL (it replaced the original ISAM engine in 3.23), doesn't have any of these features, and locks the whole table on any write operation to avoid conflict. It works like you thought it did.
When creating a table in MySQL you have your choice of engine, but InnoDB should be your default. There's really no reason at all to use MyISAM, as the interesting features of that engine (e.g. full-text indexes, available in InnoDB since MySQL 5.6) have been ported over to InnoDB.

MySQL Database optimization

I have a table which is frequently updated (insert/delete). I also have a script to periodically count how many records are stored in the table. How can I optimize the performance?
Do nothing: Just use the COUNT function.
Create another field to store the number of records: whenever a new record is added, we increment that field, and we decrement it on deletion.
If your database's main function is storing (frequent inserting/updating), switch the storage engine to InnoDB, which handles concurrent INSERT and UPDATE queries better thanks to row-level locking. Be aware, though, that InnoDB does not keep an exact row count the way MyISAM does, so a COUNT(*) without a WHERE clause has to scan rather than return a stored value.
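The switch itself is one statement (click_log is a placeholder name), but it rebuilds the whole table, so schedule it for a maintenance window:

ALTER TABLE click_log ENGINE=InnoDB;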
Method #2 is pretty much the standard way of doing it (if your table is incredibly huge and COUNT is giving you performance issues). You could also store the COUNT value in a MEMORY table which would make retrieval exceedingly fast.
Increment/decrement as you see fit.
If you need accurate numbers, I would build this into the app that updates the database, or use triggers to keep the counts up to date. As others have mentioned, the counts could be kept in a MEMORY table, or in a Redis instance if you want performance and persistence. There are row counts in the INFORMATION_SCHEMA.TABLES table, but they're not precise for InnoDB (off by ±20% or more).
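A sketch of the trigger approach, with all names hypothetical (the counted table is called log_table here):

CREATE TABLE row_counts (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    cnt        BIGINT NOT NULL DEFAULT 0
) ENGINE=InnoDB;  -- a MEMORY table is faster but empties on restart

-- Seed with the current count once:
INSERT INTO row_counts SELECT 'log_table', COUNT(*) FROM log_table;

-- Keep it current on every insert and delete:
CREATE TRIGGER log_table_ins AFTER INSERT ON log_table
FOR EACH ROW UPDATE row_counts SET cnt = cnt + 1 WHERE table_name = 'log_table';

CREATE TRIGGER log_table_del AFTER DELETE ON log_table
FOR EACH ROW UPDATE row_counts SET cnt = cnt - 1 WHERE table_name = 'log_table';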

mysql deletion efficiency

I have a table with a large amount of data. The data needs to be updated frequently: delete old data and add new data. I have two options:
whenever there is a deletion event, delete the entry immediately;
mark the entries as deleted and use a cron job to actually delete them at off-peak times.
Is there any efficiency difference between the two options? Or is there a better solution?
Both delete and update can fire triggers; this may affect performance (check whether that's your case).
Updating a row is usually faster than deleting (due to indexes etc.)
However, in deleting a single row in an operation, the performance impact shouldn't be that big. If your measurements show that the database spends significant time deleting rows, then your mark-and-sweep approach might help. The key word here is probably measured - unless the single deletes are significantly slower than updates, I wouldn't bother.
You should use low_priority_updates - http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_low_priority_updates. This will give higher priority to your selects than insert/delete/update operations. I used this in production and got a pretty decent speed improvement. The only problem I see with it is that you will lose more data in case of a crashing server.
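It can be set server-wide or requested per statement; both forms below are standard MySQL syntax, and log_table is a placeholder:

-- Server-wide, for table-locking engines such as MyISAM:
SET GLOBAL low_priority_updates = 1;

-- Or per statement:
DELETE LOW_PRIORITY FROM log_table WHERE is_deleted = 1;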
With MyISAM, deleted rows are simply marked as deleted internally, and the space they occupied is reused for later inserts rather than being reclaimed immediately.
Still, if this is a problem, and you are deleting many rows, consider using DELETE QUICK. This tells MyISAM not to merge index leaves on delete, just to leave them marked as deleted, so the space can be reused.
To recover the unused index space, simply OPTIMIZE TABLE nightly.
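Put together, the pattern looks like this (table and column names are hypothetical):

-- Fast path during the day: skip the index-leaf merging.
DELETE QUICK FROM log_table WHERE created_at < NOW() - INTERVAL 30 DAY;

-- Nightly, reclaim the unused index space:
OPTIMIZE TABLE log_table;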
In this case, there's no need to implement the functionality in your application that MySQL will do internally.

How to apply an index to a MySQL / MyISAM table without locking?

I have a production table with a very critical column (date) that is missing an index. Are there any ways to apply said index without user impact?
The table currently gets around 5-10 inserts every second, so full table locking is out; redirecting those inserts to an alternative table/database, even temporarily, has also been ruled out (for corporate-politics reasons). Any other ways?
As far as I know this is not possible with MyISAM. With 5-10 INSERTs per second you should consider InnoDB anyway, unless you're not reading that much.
Are you using replication, preferably in a master-master setup? (You should!) If that is the case, you could run CREATE INDEX on the standby server, switch roles, do the same again, and then switch back. Be sure to keep the statement out of replication (when using master-master) to avoid replicating the CREATE INDEX to the active node.
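One way to keep the statement out of replication is to disable binary logging for your session while it runs (this requires the SUPER privilege; the table and index names below are hypothetical):

SET SESSION sql_log_bin = 0;  -- this session's statements won't be written to the binlog
ALTER TABLE events ADD INDEX idx_date (`date`);
SET SESSION sql_log_bin = 1;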
Depending on whether you use that table primarily to archive logs or similar, you might as well look into the ARCHIVE storage engine.