Mysql Lock times in slow query log - mysql

I have an application that has been running fine for quite awhile, but recently a couple of items have started popping up in the slow query log.
All the queries are complex and ugly multi join select statements that could use refactoring. I believe all of them have blobs, meaning they get written to disk. The part that gets me curious is why some of them have a lock time associated with them. None of the queries have any specific locking protocols set by the application. As far as I know, by default you can read against locks unless explicitly specified.
so my question: What scenarios would cause a select statement to have to wait for a lock (and thereby be reported in the slow query log)? Assume both INNODB and MYISAM environments.
Could the disk interaction be listed as some sort of lock time? If yes, is there documentation around that says this?
thanks in advance.

MyISAM will give you concurrency problems, an entire table is completely locked when an insert is in progress.
InnoDB should have no problems with reads, even while a write/transaction is in progress due to it's MVCC.
However, just because a query is showing up in the slow-query log doesn't mean the query is slow - how many seconds, how many records are being examined?
Put "EXPLAIN" in front of the query to get a breakdown of the examinations going on for the query.
here's a good resource for learning about EXPLAIN (outside of the excellent MySQL documentation about it)

I'm not certain about MySql, but I know that in SQL Server select statements do NOT read against locks. Doing so will allow you to read uncommitted data, and potentially see duplicate records or miss a record entirely. The reason for this is because if another process is writing to the table, the database engine may decide it's time to reorganize some data and shifts it around on disk. So it moves a record you already read to the end and you see it again, or it moves one from the end up higher where you've already past.
There's a guy on the net somewhere who actually wrote a couple of scripts to prove that this happens and I tried them once and it only took a few seconds before a duplicate showed up. Of course, he designed the scripts in a fashion that would make it more likely to happen, but it proves that it definitely can happen.
This is okay behaviour if your data doesn't need to be accurate and can certainly help prevent deadlocks. However, if you're working on an application dealing with something like people's money then that's very bad.
In SQL Server you can use the WITH NOLOCK hint to tell your select statement to ignore locks. I'm not sure what the equivalent in MySql would be but maybe someone else here will say.

Related

Can I INSERT into table while UPDATING multiple different rows with MariaDB or MySQL?

I am creating a custom analytics system and currently in the database designing process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means that I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record if someone visited a page associated with that click. (basically to record funnels)
So what I'm planning to do is to create additional columns and search for the ID of the specific row then update that column with the exact time of the visit.
My first question: is this generally a recommended approach to design the database like that? If not, how else is it worth to design the database?
My only concern is that while updating rows the Table will be locked, and can't do inserts, therefore slowing down the user experience.
My second question: is this something I should worry about, that the table gets locked while updating, and thus slowing down inserts? Does it hurt performance?
InnoDB doesn't lock the table for insert if you're performing the update. Your users won't experience any weird hanging.
It's an MVCC compliant engine, designed to handle concurrent access to underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level, however the default (REPEATABLE READ) is excellent and does the job more than well.
If a table is being modified by multiple users (not users that connect to your site but connections established towards MySQL via a scripting language or some other service) and there's many inserts/updates/deletes - MySQL can throw an error saying a deadlock occurred.
A deadlock is a warning, not an error, that more than 1 thread tried to access an occupied resource (such as two threads tried to update the same row at the same time, but only 1 will be allowed to do so). It's an indication you should repeat the query.
I'm suggesting that you take care of all possible scenarios in the language of your choice when it comes to handling MySQL that's under heavier I/O.
~6 inserts a second isn't a lot, make sure you're allowing MySQL to access sufficient system resources. For InnoDB, check the value of innodb_buffer_pool_size or google a bit to see what it is and how to use it to make your database run fast.
Good luck!
At a mere 5.6/second, there won't be much problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
The id of the item being Liked/Clicked/etc.
A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
As already pointed out, the row is locked, not the table.

Extreme low-priority SELECT query in MySQL

Is it possible to issue an (expensive, but low-priority) SELECT query to mySQL in such a way that if an UPDATE query appears in the queue, mySQL will immediately terminate the query, and re-append it to the end of the queue?
If re-appending to the queue is not possible, I'm happy with simply killing the SELECT query.
No, not really.
I am not sure exactly what you need, but my guess is that you need to either optimize the SELECT to not lock an entire table, or get the replication going and do the SELECT on the slave rather than the master.
You could theoretically find out what the MySQL process ID is of the SELECT query, and in your application send a KILL before you do any update.
Well, sort of maybe.
A client runs an application which occasionally throws out queries that completely kill performance for everything else on the server. We have monitoring and if we've got a suitable person ready to react, we can deal to that query manually, and we learn about the problems in the app by doing things that way.
But to prevent major outages if noone is on the ball, we have an automated script which terminates long running queries, so the server does recover in the event that noone is available to intervene within 15 minutes.
Far from ideal, but that's where things are currently at with this project, and it does prevent the occasional extended outages that used to occur. We can only move just so fast with fixing up the problem queries.
Anyway, you could run something similar, that looks at the running queries and recognises when you have an update waiting on one of your large selects, and in that event it kills the select. Doing this sort of check a few times a minute is not overly expensive. I'd want to do a bit of testing before running.
So, whether you can solve your problem this way depends on what your tolerance is for how long an update can be delayed. Running this every minute (as we do) is no problem at all. Running it every second would noticeably add to the overall load. You'd need to test how far you can reasonably go in between those points.
This approach means some delay before the select gets pushed out of the way, but it saves you having to build this logic into potentially many different places in your application.
--
Regarding breaking up your query, you're most likely better off restricting the chunks by id range from one or more tables in your query rather than by offset and limit.
--
There may also be good solutions available based on partitioning your tables so that the queries don't collide as badly. Make sure you have a very good grasp on what you are doing for this though.

How to predict MySQL tipping points?

I work on a big web application that uses a MySQL 5.0 database with InnoDB tables. Twice over the last couple of months, we have experienced the following scenario:
The database server runs fine for weeks, with low load and few slow queries.
A frequently-executed query that previously ran quickly will suddenly start running very slowly.
Database load spikes and the site hangs.
The solution in both cases was to find the slow query in the slow query log and create a new index on the table to speed it up. After applying the index, database performance returned to normal.
What's most frustrating is that, in both cases, we had no warning about the impending doom; all of our monitoring systems (e.g., graphs of system load, CPU usage, query execution rates, slow queries) told us that the database server was in good health.
Question #1: How can we predict these kinds of tipping points or avoid them altogether?
One thing we are not doing with any regularity is running OPTIMIZE TABLE or ANALYZE TABLE. We've had a hard time finding a good rule of thumb about how often (if ever) to manually do these things. (Since these commands LOCK tables, we don't want to run them indiscriminately.) Do these scenarios sound like the result of unoptimized tables?
Question #2: Should we be manually running OPTIMIZE or ANALYZE? If so, how often?
More details about the app: database usage pattern is approximately 95% reads, 5% writes; database executes around 300 queries/second; the table used in the slow queries was the same in both cases, and has hundreds of thousands of records.
The MySQL Performance Blog is a fantastic resource. Namely, this post covers the basics of properly tuning InnoDB-specific parameters.
I've also found that the PDF version of the MySQL Reference Manual to be essential. Chapter 7 covers general optimization, and section 7.5 covers server-specific optimizations you can toy with.
From the sound of your server, the query cache may be of IMMENSE value to you.
The reference manual also gives you some great detail concerning slow queries, caches, query optimization, and even disk seek analysis with indexes.
It may be worth your time to look into multi-master replication, allowing you to lock one server entirely and run OPTIMIZE/ANALYZE, without taking a performance hit (as 95% of your queries are reads, the other server could manage the writes just fine).
Section 12.5.2.5 covers OPTIMIZE TABLE in detail, and 12.5.2.1 covers ANALYZE TABLE in detail.
Update for your edits/emphasis:
Question #2 is easy to answer. From the reference manual:
OPTIMIZE:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows. [...] You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data table.
And ANALYZE:
ANALYZE TABLE analyzes and stores the key distribution for a table. [...] MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.
OPTIMIZE is good to run when you have the free time. MySQL optimizes well around deleted rows, but if you go and delete 20GB of data from a table, it may be a good idea to run this. It is definitely not required for good performance in most cases.
ANALYZE is much more critical. As noted, having the needed table data available to MySQL (provided with ANALYZE) is very important when it comes to pretty much any query. It is something that should be run on a common basis.
Question #1 is a bit more of a trick. I would watch the server very carefully when this happens, namely disk I/O. My bet would be that your server is thrashing either your swap or the (InnoDB) caches. In either case, it may be query, tuning, or load related. Unoptimized tables could cause this. As mentioned, running ANALYZE can immensely help performance, and will likely help out too.
I haven't found any good way of predicting MySQL "tipping points" -- and I've run into a few.
Having said that, I've found tipping points are related to table size. But not merely raw table size, rather how big the "area of interest" is to a query. For example, in a table of over 3 million rows and about 40 columns, about three-quarters integers, most queries that would easily select a portion of them based on indices are fast. However, when one value in a query on one indexed column means two-thirds of the rows are now "interesting", the query is now about 5-times slower than normal. Lesson: try to arrange your data so such a scan isn't necessary.
However, such behaviour now gives you a size to look for. This size will be heavily dependant on your server setup, the MySQL server variables and the table's schema and data.
Similarly, I've seen reporting queries run in reasonable time (~45 seconds) if the period is two weeks, but take half-an-hour if the period is extended to four weeks.
Use slow query log that will help you to narrow down the queries you want to optimize.
For time critical queries it sometimes better to keep stable plan by using hints.
It sounds like you have a frustrating situation and maybe not the best code review process and development environment.
Whenever you add a new query to your code you need to check that it has the appropriate indexes ready and add those with the code release.
If you don't do that your second option is to constantly monitor the slow query log and then go beat the developers; I mean go add the index.
There's an option to enable logging of queries that didn't use an index which would be useful to you.
If there are some queries that "works and stops working" (but are "using and index") then it's likely that the query wasn't very good in the first place (low cardinality in the index; inefficient join; ...) and the first rule of evaluating the query carefully when it's added would apply.
For question #2 - On InnoDB "analyze table" is basically free to run, so if you have bad join performance it doesn't hurt to run it. Unless the balance of the keys in the table are changing a lot it's unlikely to help though. It almost always comes down to bad queries. "optimize table" rebuilds the InnoDB table; in my experience it's relatively rare that it helps enough to be worth the hassle of having the table unavailable for the duration (or doing the master-master failover stuff while it's running).

How to tell if a MySQL process is stuck?

I have a long-running process in MySQL. It has been running for a week. There is one other connection, to a replication master, but I have halted slave processing so there's effectively nothing else going on.
How can I tell if this process is still working? I knew it would take a long time which is why I put it on its own database instance, but this is longer than I anticipated. Obviously, if it is still doing work, I don't want to kill it. If it is zombied, then I don't know how to get the work done that it's supposed to be doing.
It's in the "Sending data" state. The table is an InnoDB one but without any FK references that are used by the query. The InnoDB status shows no errors or locks since the query started.
Any thoughts are appreciated.
Try "SHOW PROCESSLIST" to see what's active.
Of course if you kill it, it may then want to take just as much time rolling it back.
You need to kill it and come up with better indices.
I did a job for a guy. Had a table with about 35 million rows. His batch process, like yours, had been running a week, with no end in sight. I added some indexes, made some changes to the order and methods of his batch process, and got the whole thing down to about two and a half hours. On a slower machine.
Given what you've said, it's not stuck. However, the is absolutely no guarantee that it will actually finish in anything resembling a reasonable amount of time. Adding indicies will almost certainly help, and depending on the type of query refactoring it into a series of queries that use temp tables could possibly give you a huge performance boost. I wouldn't suggest waiting around for it to maybe finish.
For better performance on a database that size, you may want to look at a document based database such as mongoDB. It will take more hard drive space to store the database, but depending on your current schema, you may get much better performance.

MySQL slow query log - how slow is slow?

What do you find is the optimal setting for mysql slow query log parameter, and why?
I recommend these three lines
log_slow_queries
set-variable = long_query_time=1
log-queries-not-using-indexes
The first and second will log any query over a second. As others have pointed out a one second query is pretty far gone if you are a shooting for a high transaction rate on your website, but I find that it turns up some real WTFs; queries that should be fast, but for whatever combination of data it was run against was not.
The last will log any query that does not use an index. Unless your doing data warehousing any common query should have the best index you can find so pay attention to its output.
Although its certainly not for production, this last option
log = /var/log/mysql/mysql.log
will log all queries, which can be useful if you are trying to tune a specific page or action.
Whatever time /you/ feel is unacceptably slow for a query on your systems.
It depends on the kind of queries you run and the kind of system; a query taking several seconds might not matter if it's some back-end reporting system doing complex data-mining etc where a delay doesn't matter, but might be completely unacceptable on a user-facing system which is expected to return results promptly.
Set it to whatever you like. The only problem is that in a stock MySQL, it can only be set in increments of 1 second, which is too slow for some people.
Most heavily used production servers execute far too many queries to log them all. The slow log is a way of filtering the log so that we can see the ones which take a long time (most queries are likely to be executed almost instantly). It's a bit of a blunt instrument.
Set it to 1 sec if you like, you're probably not going to run out of disc space or create a performance problem by doing that.
It's really about the risk of enabling the slow log- don't do it if you feel it's likely to cause further disc or performance problems.
Of course you could enable the slow log on a non-production server and put simulated load through, but that is never quite the same.
Peter Zaitsev posted a nice article about using the slow query log. One thing he notes is important is to also consider how often a certain query is used. Reports run once a day are not important to be fast. But something that is run very often might be a problem even if it takes half a second. And you cant detect that without the microslow patch.
Not only is it a blunt instrument as far as resolution is concerned, but also it is MySQL-instance wide, so that if you have different databases with differing performancy requirements you're kind of out of luck. Obviously there are ways around that, but it's important to keep that in mind when setting your slow log setting.
Aside from performance requirements of your application, another factor to consider is what you're trying to log. Are you using the log to catch queries that would threaten the stability of your db instance (ones that cause deadlocks or Cartesian joins, for instance) or queries that affect the performance for specific users and that might require a little tuning? That will influence where you set your threshold.