Extreme low-priority SELECT query in MySQL

Is it possible to issue an (expensive, but low-priority) SELECT query to MySQL in such a way that if an UPDATE query appears in the queue, MySQL will immediately terminate the SELECT and re-append it to the end of the queue?
If re-appending to the queue is not possible, I'm happy with simply killing the SELECT query.

No, not really.
I am not sure exactly what you need, but my guess is that you need to either optimize the SELECT to not lock an entire table, or get the replication going and do the SELECT on the slave rather than the master.
You could theoretically find out what the MySQL process ID is of the SELECT query, and in your application send a KILL before you do any update.
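A minimal sketch of that approach, assuming a hypothetical table big_table and an arbitrary 30-second threshold (both are made up for the example):

    -- find the connection running the expensive SELECT
    SELECT id, time, info
    FROM information_schema.processlist
    WHERE command = 'Query'
      AND info LIKE 'SELECT%big_table%'
      AND time > 30;

    -- then, using the id returned above (say 1234):
    KILL QUERY 1234;   -- ends the statement but keeps the connection open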

Well, sort of maybe.
A client runs an application which occasionally throws out queries that completely kill performance for everything else on the server. We have monitoring, and if we've got a suitable person ready to react, we can deal with that query manually; we also learn about problems in the app by doing things that way.
But to prevent major outages if no one is on the ball, we have an automated script which terminates long-running queries, so the server does recover in the event that no one is available to intervene within 15 minutes.
Far from ideal, but that's where things currently stand with this project, and it does prevent the occasional extended outages that used to occur. We can only move so fast with fixing up the problem queries.
Anyway, you could run something similar that looks at the running queries and recognises when you have an update waiting on one of your large selects, and in that event kills the select. Doing this sort of check a few times a minute is not overly expensive. I'd want to do a bit of testing before running it.
So, whether you can solve your problem this way depends on what your tolerance is for how long an update can be delayed. Running this every minute (as we do) is no problem at all. Running it every second would noticeably add to the overall load. You'd need to test how far you can reasonably go in between those points.
This approach means some delay before the select gets pushed out of the way, but it saves you having to build this logic into potentially many different places in your application.
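As a rough illustration of the check such a script could run on each pass (a sketch only; the thresholds and the simplistic UPDATE/SELECT matching are assumptions, not the script we actually use):

    -- long-running SELECTs, but only when at least one UPDATE is queued behind them
    SELECT s.id AS select_id, s.time, s.info
    FROM information_schema.processlist AS s
    WHERE s.command = 'Query'
      AND s.info LIKE 'SELECT%'
      AND s.time > 60
      AND EXISTS (
            SELECT 1
            FROM information_schema.processlist AS u
            WHERE u.command = 'Query'
              AND u.info LIKE 'UPDATE%'
              AND u.time > 5          -- the UPDATE has already been waiting a while
          );
    -- for each select_id returned, the script would then issue: KILL QUERY <select_id>;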
--
Regarding breaking up your query, you're most likely better off restricting the chunks by id range from one or more tables in your query rather than by offset and limit.
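For example (table and column names here are invented; only the shape of the predicate matters):

    -- chunk by id range: each chunk seeks straight to its starting key
    SELECT id, status, total
    FROM orders
    WHERE id >= 100000 AND id < 200000;

    -- versus paging by offset, which must scan and throw away all the skipped rows
    SELECT id, status, total
    FROM orders
    ORDER BY id
    LIMIT 100000, 100000;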
--
There may also be good solutions available based on partitioning your tables so that the queries don't collide as badly. Make sure you have a very good grasp on what you are doing for this though.

Related

MySQL server very high load

I run a website with ~500 real-time visitors, ~50k daily visitors and ~1.3 million total users. I host my server on AWS, where I use several instances of different kinds. When I started the website the different instances cost roughly the same. When the website started to gain users, the RDS instance (MySQL DB) CPU constantly kept hitting the roof and I had to upgrade it several times; it now accounts for the main part of the performance and monthly cost (around 95% of ~$2.8k/month). I currently use a database server with 16 vCPU and 64 GiB of RAM, and I also use Multi-AZ Deployment to protect against failures. I wonder if it is normal for the database to be that expensive, or if I have done something terribly wrong?
Database Info
At the moment my database has 40 tables; most of them have around 100k rows, some have ~2 million, and one has 30 million.
I have a system that archives rows older than 21 days, once they are not needed anymore.
Website Info
The website mainly uses PHP, but also some Node.js and Python.
Most of the functions of the website work like this:
Start transaction
Insert row
Get last inserted id (lastrowid)
Do some calculations
Update the inserted row
Update the user
Commit transaction
I also run around 100 bots which poll the database at 10-30 second intervals; they also insert into and update the database sometimes.
Extra
I have done several things to try to lower the load on the database, such as enabling the database cache, using a Redis cache for some queries, removing very slow queries, and upgrading the storage type to "Provisioned IOPS SSD". But nothing seems to help.
These are the changes I have made to the settings parameters:
I have thought about creating a MySQL cluster of several smaller instances, but I don't know if this would help, and I also don't know if it works well with transactions.
If you need any more information, please ask; any help with this issue is greatly appreciated!
In my experience, as soon as you ask the question "how can I scale up performance?" you know you have outgrown RDS (edit: I admit my experience that leads me to this opinion may be outdated).
It sounds like your query load is pretty write-heavy. Lots of inserts and updates. You should increase the innodb_log_file_size if you can on your version of RDS. Otherwise you may have to abandon RDS and move to an EC2 instance where you can tune MySQL more easily.
I would also disable the MySQL query cache. On every insert/update, MySQL has to scan the query cache to see if there are any results cached that need to be purged. This is a waste of time if you have a write-heavy workload. Increasing your query cache to 2.56GB makes it even worse! Set the cache size to 0 and the cache type to 0.
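A sketch of those settings; on RDS you would normally change them in the DB parameter group rather than with SET GLOBAL (which needs privileges RDS doesn't grant), and innodb_log_file_size isn't dynamic on the MySQL versions in question:

    -- turn the query cache off entirely
    SET GLOBAL query_cache_size = 0;
    SET GLOBAL query_cache_type = 0;
    -- innodb_log_file_size has to be changed in the parameter group / my.cnf
    -- and only takes effect after a restart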
I have no idea what queries you run, or how well you have optimized them. MySQL's optimizer is limited, so it's frequently the case that you can get huge benefits from redesigning SQL queries. That is, changing the query syntax, as well as adding the right indexes.
You should do a query audit to find out which queries are accounting for your high load. A great free tool to do this is pt-query-digest (https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html), which can give you a report based on your slow query log. Download the RDS slow query log with the download-db-log-file-portion CLI command (http://docs.aws.amazon.com/cli/latest/reference/rds/download-db-log-file-portion.html).
Set your long_query_time=0, let it run for a while to collect information, then change long_query_time back to the value you normally use. It's important to collect all queries in this log, because you might find that 75% of your load is from queries under 2 seconds, but they are run so frequently that it's a burden on the server.
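For example (the value you restore afterwards is whatever you normally use; 1 second here is just a placeholder, and on RDS this is again a parameter-group change):

    SET GLOBAL long_query_time = 0;   -- temporarily log every query
    -- ...let the workload run for a while, then put the threshold back:
    SET GLOBAL long_query_time = 1;   -- placeholder for your usual value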
After you know which queries are accounting for the load, you can make some informed strategy about how to address them:
Query optimization or redesign
More caching in the application
Scale out to more instances
I think the answer is "you're doing something wrong". It is very unlikely you have reached an RDS limitation, although you may be hitting limits on some parts of it.
Start by enabling detailed monitoring. This will give you some OS-level information which should help determine what your limiting factor really is. Look at your slow query logs and database stats - you may have some queries that are causing problems.
Once you understand the problem - which could be bad queries, I/O limits, or something else - then you can address them. RDS allows you to create multiple read replicas, so you can move some of your read load to slaves.
You could also move to Aurora, which should give you better I/O performance. Or use PIOPS (or allocate more disk, which should increase performance). You are using SSD storage, right?
One other suggestion - if your calculations (step 4 above) take a significant amount of time, you might want to look at breaking the work into two or more transactions.
A query_cache_size of more than 50M is bad news. You are writing often -- many times per second per table? That means the QC needs to be scanned many times/second to purge the entries for the table that changed. This is a big load on the system when the QC is 2.5GB!
query_cache_type should be DEMAND if you can justify it being on at all. And in that case, pepper the SELECTs with SQL_CACHE and SQL_NO_CACHE.
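For instance (the queries themselves are made up; only the hints matter):

    SET GLOBAL query_cache_type = DEMAND;   -- cache nothing unless explicitly asked

    -- cache only the SELECTs you know are worth it:
    SELECT SQL_CACHE name, price FROM products WHERE id = 42;

    -- and bypass the cache for anything against frequently-written tables:
    SELECT SQL_NO_CACHE COUNT(*) FROM orders WHERE created_at > NOW() - INTERVAL 1 HOUR;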
Since you have the slowlog turned on, look at the output with pt-query-digest. What are the first couple of queries?
Since your typical operation involves writing, I don't see an advantage in using read-only slaves.
Are the bots running at random times? Or do they all start at the same time? (The latter could cause terrible spikes in CPU, etc.)
How are you "archiving" "old" records? It might be best to use PARTITIONing and "transportable tablespaces". Use PARTITION BY RANGE and 21 partitions (plus a couple of extras).
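A sketch of what daily partitions could look like (table and column names are invented; in practice you would script adding a new partition and dropping or exporting the oldest one each day):

    CREATE TABLE events (
        id          BIGINT NOT NULL AUTO_INCREMENT,
        created_day DATE   NOT NULL,
        payload     VARCHAR(255),
        PRIMARY KEY (id, created_day)   -- the partitioning column must be part of the key
    )
    PARTITION BY RANGE (TO_DAYS(created_day)) (
        PARTITION p20230101 VALUES LESS THAN (TO_DAYS('2023-01-02')),
        PARTITION p20230102 VALUES LESS THAN (TO_DAYS('2023-01-03')),
        -- ...one partition per day: 21 days' worth plus a couple of spares...
        PARTITION pmax VALUES LESS THAN MAXVALUE
    );

    -- "archiving" then becomes dropping (or exporting) the oldest partition:
    ALTER TABLE events DROP PARTITION p20230101;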
Your typical transaction seems to work with one row. Can it be modified to work with 10 or 100 all at once? (More than 100 is probably not cost-effective.) SQL is much more efficient in doing lots of rows at once versus lots of queries of one row each. Show us the SQL; we can dig into the details.
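For example, batching the writes (column and table names invented):

    -- one multi-row INSERT instead of 100 single-row statements
    INSERT INTO ledger (user_id, amount, note) VALUES
        (101,  9.99, 'order 1'),
        (102,  4.50, 'order 2'),
        (103, 12.00, 'order 3');     -- ...and so on, up to ~100 rows per statement

    -- likewise one UPDATE driven by an IN list (or a join) instead of many
    UPDATE users SET balance = balance - 5 WHERE id IN (101, 102, 103);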
It seems strange to insert a new row, then update it, all in one transaction. Can't you completely compute it before doing the insert? Hanging onto the inserted_id for so long probably interferes with others doing the same thing. What is the value of innodb_autoinc_lock_mode?
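You can check it with:

    SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';
    -- 0 = traditional, 1 = consecutive, 2 = interleaved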
Do the "users" interactive with each other? If so, in what way?

MySQL query fast only the first time it runs

I have a MySQL SELECT query which is fast (<0.1 sec), but only the first time I run it. It joins 3 tables together (using indices) and has a relatively simple WHERE clause. When I run it by hand in phpMyAdmin (always changing the numbers in the WHERE so that it isn't cached) it is always fast, but when I have PHP run several copies of it in a row, the first one is fast and the others hang for ~400 sec. My only guess is that somehow MySQL is running out of memory for the connection and then has to do expensive paging.
My general question is how I can fix this behavior, but my specific questions are: without actually closing and restarting the connection, how can I make these queries coming from PHP be seen as separate, just like the queries coming from phpMyAdmin; how can I tell MySQL to flush any memory when the request is done; and does this sound like a memory issue to you?
Well, I found the answer, at least in my case, and I'm putting it here for anyone in the future who runs into a similar issue. The query I was running returned a lot of results, and MySQL's query cache was causing a lot of overhead. When you run a query, MySQL will save it and its output so that it can answer future identical requests quickly. All I had to do was add SQL_NO_CACHE and the speed was back to normal. Just look out if your incoming query is large or the results are very large, because it can take considerable resources for MySQL to decide when to kick things out of the cache.

MySQL lock times in slow query log

I have an application that has been running fine for quite a while, but recently a couple of items have started popping up in the slow query log.
All the queries are complex and ugly multi-join SELECT statements that could use refactoring. I believe all of them have BLOBs, meaning they get written to disk. The part that gets me curious is why some of them have a lock time associated with them. None of the queries have any specific locking protocols set by the application. As far as I know, by default you can read against locks unless explicitly specified.
So my question: what scenarios would cause a SELECT statement to have to wait for a lock (and thereby be reported in the slow query log)? Assume both InnoDB and MyISAM environments.
Could the disk interaction be listed as some sort of lock time? If yes, is there documentation around that says this?
thanks in advance.
MyISAM will give you concurrency problems: an entire table is completely locked while an insert is in progress.
InnoDB should have no problems with reads, even while a write/transaction is in progress, due to its MVCC.
However, just because a query is showing up in the slow-query log doesn't mean the query is slow - how many seconds, how many records are being examined?
Put "EXPLAIN" in front of the query to get a breakdown of the examinations going on for the query.
here's a good resource for learning about EXPLAIN (outside of the excellent MySQL documentation about it)
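For example (a made-up query, just to show which parts of the output to read):

    EXPLAIN
    SELECT o.id, c.name
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    WHERE o.created_at > '2023-01-01';
    -- in the result, watch the type, key, rows and Extra columns:
    -- type=ALL with a large rows estimate usually means a missing index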
I'm not certain about MySQL, but I know that in SQL Server SELECT statements do NOT read against locks. Doing so would allow you to read uncommitted data, and potentially see duplicate records or miss a record entirely. The reason is that if another process is writing to the table, the database engine may decide it's time to reorganize some data and shifts it around on disk. So it moves a record you already read to the end and you see it again, or it moves one from the end up to a spot you've already passed, so you miss it.
There's a guy on the net somewhere who actually wrote a couple of scripts to prove that this happens and I tried them once and it only took a few seconds before a duplicate showed up. Of course, he designed the scripts in a fashion that would make it more likely to happen, but it proves that it definitely can happen.
This is okay behaviour if your data doesn't need to be accurate and can certainly help prevent deadlocks. However, if you're working on an application dealing with something like people's money then that's very bad.
In SQL Server you can use the WITH NOLOCK hint to tell your SELECT statement to ignore locks. I'm not sure what the equivalent in MySQL would be, but maybe someone else here will say.
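For what it's worth, the closest MySQL analogue I know of is the session's transaction isolation level rather than a per-query hint (the table name below is made up):

    SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    SELECT COUNT(*) FROM orders;   -- may see uncommitted ("dirty") rows, like NOLOCK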

How to tell if a MySQL process is stuck?

I have a long-running process in MySQL. It has been running for a week. There is one other connection, to a replication master, but I have halted slave processing so there's effectively nothing else going on.
How can I tell if this process is still working? I knew it would take a long time which is why I put it on its own database instance, but this is longer than I anticipated. Obviously, if it is still doing work, I don't want to kill it. If it is zombied, then I don't know how to get the work done that it's supposed to be doing.
It's in the "Sending data" state. The table is an InnoDB one but without any FK references that are used by the query. The InnoDB status shows no errors or locks since the query started.
Any thoughts are appreciated.
Try "SHOW PROCESSLIST" to see what's active.
Of course if you kill it, it may then want to take just as much time rolling it back.
You need to kill it and come up with better indices.
I did a job for a guy. Had a table with about 35 million rows. His batch process, like yours, had been running a week, with no end in sight. I added some indexes, made some changes to the order and methods of his batch process, and got the whole thing down to about two and a half hours. On a slower machine.
Given what you've said, it's not stuck. However, there is absolutely no guarantee that it will actually finish in anything resembling a reasonable amount of time. Adding indices will almost certainly help, and depending on the type of query, refactoring it into a series of queries that use temp tables could possibly give you a huge performance boost. I wouldn't suggest waiting around for it to maybe finish.
For better performance on a database that size, you may want to look at a document-based database such as MongoDB. It will take more hard drive space to store the database, but depending on your current schema, you may get much better performance.

MySQL slow query log - how slow is slow?

What do you find is the optimal setting for mysql slow query log parameter, and why?
I recommend these three lines
log_slow_queries
set-variable = long_query_time=1
log-queries-not-using-indexes
The first and second will log any query over a second. As others have pointed out, a one-second query is pretty far gone if you are shooting for a high transaction rate on your website, but I find that it turns up some real WTFs: queries that should be fast, but for whatever combination of data they were run against, were not.
The last will log any query that does not use an index. Unless you're doing data warehousing, any common query should have the best index you can find, so pay attention to its output.
Although it's certainly not for production, this last option
log = /var/log/mysql/mysql.log
will log all queries, which can be useful if you are trying to tune a specific page or action.
Whatever time /you/ feel is unacceptably slow for a query on your systems.
It depends on the kind of queries you run and the kind of system; a query taking several seconds might not matter if it's some back-end reporting system doing complex data-mining etc where a delay doesn't matter, but might be completely unacceptable on a user-facing system which is expected to return results promptly.
Set it to whatever you like. The only problem is that in stock MySQL, it can only be set in increments of 1 second, which is too coarse for some people.
Most heavily used production servers execute far too many queries to log them all. The slow log is a way of filtering the log so that we can see the ones which take a long time (most queries are likely to be executed almost instantly). It's a bit of a blunt instrument.
Set it to 1 sec if you like, you're probably not going to run out of disc space or create a performance problem by doing that.
It's really about the risk of enabling the slow log: don't do it if you feel it's likely to cause further disc or performance problems.
Of course you could enable the slow log on a non-production server and put simulated load through, but that is never quite the same.
Peter Zaitsev posted a nice article about using the slow query log. One thing he notes as important is to also consider how often a certain query is run. Reports run once a day don't need to be fast. But something that is run very often might be a problem even if it only takes half a second. And you can't detect that without the microslow patch.
Not only is it a blunt instrument as far as resolution is concerned, it is also MySQL-instance-wide, so if you have different databases with differing performance requirements you're kind of out of luck. Obviously there are ways around that, but it's important to keep that in mind when setting your slow log threshold.
Aside from performance requirements of your application, another factor to consider is what you're trying to log. Are you using the log to catch queries that would threaten the stability of your db instance (ones that cause deadlocks or Cartesian joins, for instance) or queries that affect the performance for specific users and that might require a little tuning? That will influence where you set your threshold.