MySQL Inconsistent Performance - mysql

I'm running a MySQL query that joins various tables of 500,000+ rows. Sometimes it takes a second, other times around 15 seconds! This is on my local machine. I have experienced similarly varied times on other intensive queries before; does anyone know why this is?
Thanks
Thanks for the replies - I am using appropriate indexes, inner and left joins, and a WHERE clause covering a one-week range out of a possible two-year period of invoices. If I keep varying that range (so presumably query results are not cached) and re-running, the time varies a lot, even if the number of rows retrieved is similar. The server is not busy. A few scheduled queries run every minute, but they are not intensive and take around 200 ms.
The explain plan shows that a table of around 2000 rows is always fully scanned. So maybe these rows are sometimes cached, or maybe indexes are cached - I didn't know indexes could be cached. I will try again with caching turned off.
Editing again - the query cache is in fact off; I'm using InnoDB, so it looks like increasing innodb_buffer_pool_size is the way to go.

Same query each time?
It's hard to tell, based on what you've posted. If we assume that the schema and data aren't changing, I'd guess that there's something else running on your machine when the queries are long that would explain the difference. It could be that the state of memory is different, so paging is going on; an anti-virus program is running; some other service has started. It's impossible to answer.

Try running an
OPTIMIZE TABLE
on the tables involved. That should help refresh the statistics the query planner uses.
You have not given us much information; if you're using MyISAM tables, it may be a matter of locks.
Are you using ANSI INNER JOINs? A little basic, but don't use implicit "cross joins" - the joins written with a comma, like
SELECT * FROM t1, t2 WHERE t1.id_t1=t2.id_t1
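For comparison, the same query written as an explicit ANSI INNER JOIN (t1, t2 and id_t1 are just the placeholder names from the example above):
SELECT * FROM t1 INNER JOIN t2 ON t1.id_t1 = t2.id_t1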
Last things you may want to try: increase your InnoDB buffers (innodb_buffer_pool_size), your key_buffer_size (MyISAM), and possibly the query cache buffers.
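As a rough sketch of what that looks like (the values here are purely illustrative; on MySQL 5.7+ innodb_buffer_pool_size can be changed at runtime, on older versions it goes in my.cnf and needs a restart):
SET GLOBAL innodb_buffer_pool_size = 2 * 1024 * 1024 * 1024;  -- InnoDB data and index cache
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;               -- MyISAM index cache
-- size these to your data set and available RAM, not to these example numbers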

Here are some common reasons (bar your server simply being too busy):
The slow query is hitting the hard drive. In the fast case, the indexes and data are already cached in MySQL or the OS file cache.
Retrieving the data gets blocked by updates/inserts; for MyISAM tables, the whole table gets locked in some cases whenever someone inserts/updates data in it.
Table statistics are out of date and/or the wrong index gets selected. Running ANALYZE or OPTIMIZE on the table can help.
You have the query cache enabled: fetching the result of a cached query is fast, fetching it when it's not in the cache might be slow. Try turning off the query cache to check whether the query is always slow when it's not fetched from the cache.
In any case, you should show the output of EXPLAIN on your queries to verify indexes are getting used properly - even if they're not, queries can be fast if everything is in RAM but grind to a halt if they need to hit the hard drive.
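A minimal sketch of what to look at (the invoices-style table and column names here are made up for illustration):
EXPLAIN
SELECT i.id, i.total, c.name
FROM invoices i
INNER JOIN customers c ON c.id = i.customer_id
WHERE i.invoice_date BETWEEN '2015-01-01' AND '2015-01-07';
-- In the output, check "type" (ALL = full table scan), "key" (which index was chosen)
-- and "rows" (estimated rows examined) for every table in the join.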

Related

MySQL server very high load

I run a website with ~500 real-time visitors, ~50k daily visitors and ~1.3 million total users. I host my server on AWS, where I use several instances of different kinds. When I started the website, the different instances cost roughly the same. When the website started to gain users, the RDS instance (MySQL DB) CPU constantly kept hitting the roof; I had to upgrade it several times, and now it has started to take up the main part of the performance and monthly cost (around 95% of ~$2.8k/month). I currently use a database server with 16 vCPUs and 64 GiB of RAM, and I also use Multi-AZ Deployment to protect against failures. I wonder if it is normal for the database to be that expensive, or if I have done something terribly wrong?
Database Info
At the moment my database has 40 tables; most of them have ~100k rows, some have ~2 million, and one has 30 million.
I have a system that archives rows older than 21 days once they are no longer needed.
Website Info
The website mainly uses PHP, but also some Node.js and Python.
Most of the functions of the website work like this (a rough SQL sketch of the pattern follows the list):
Start transaction
Insert row
Get last inserted id (lastrowid)
Do some calculations
Update the inserted row
Update the user
Commit transaction
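A bare-bones sketch of that flow in SQL, with made-up table and column names, just to make the pattern concrete:
START TRANSACTION;
INSERT INTO orders (user_id, amount) VALUES (42, 0);
SELECT LAST_INSERT_ID();                                       -- the application keeps this id and does its calculations
UPDATE orders SET amount = 99.50 WHERE id = LAST_INSERT_ID();  -- update the inserted row
UPDATE users SET balance = balance - 99.50 WHERE id = 42;      -- update the user
COMMIT;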
I also run around 100 bots which poll the database at 10-30 second intervals; they also insert into/update the database sometimes.
Extra
I have done several things to try to lower the load on the database, such as enabling the database cache, using a Redis cache for some queries, trying to remove very slow queries, and upgrading the storage type to "Provisioned IOPS SSD". But nothing seems to help.
These are the changes I have made to the settings parameters:
I have thought about creating a MySQL cluster of several smaller instances, but I don't know if this would help, and I also don't know if it works well with transactions.
If you need any more information, please ask; any help on this issue is greatly appreciated!
In my experience, as soon as you ask the question "how can I scale up performance?" you know you have outgrown RDS (edit: I admit my experience that leads me to this opinion may be outdated).
It sounds like your query load is pretty write-heavy. Lots of inserts and updates. You should increase the innodb_log_file_size if you can on your version of RDS. Otherwise you may have to abandon RDS and move to an EC2 instance where you can tune MySQL more easily.
I would also disable the MySQL query cache. On every insert/update, MySQL has to scan the query cache to see if there are any results cached that need to be purged. This is a waste of time if you have a write-heavy workload. Increasing your query cache to 2.56GB makes it even worse! Set the cache size to 0 and the cache type to 0.
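On a self-managed server that would look roughly like the statements below; on RDS the same parameters are set through the DB parameter group instead:
SET GLOBAL query_cache_size = 0;  -- free the memory used by the cache
SET GLOBAL query_cache_type = 0;  -- stop checking/purging the cache on every write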
I have no idea what queries you run, or how well you have optimized them. MySQL's optimizer is limited, so it's frequently the case that you can get huge benefits from redesigning SQL queries. That is, changing the query syntax, as well as adding the right indexes.
You should do a query audit to find out which queries are accounting for your high load. A great free tool to do this is pt-query-digest (https://www.percona.com/doc/percona-toolkit/2.2/pt-query-digest.html), which can give you a report based on your slow query log. Download the RDS slow query log with the download-db-log-file-portion CLI command (http://docs.aws.amazon.com/cli/latest/reference/rds/download-db-log-file-portion.html).
Set your long_query_time=0, let it run for a while to collect information, then change long_query_time back to the value you normally use. It's important to collect all queries in this log, because you might find that 75% of your load is from queries under 2 seconds, but they are run so frequently that it's a burden on the server.
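Roughly, the collection step looks like this (self-managed syntax shown; on RDS these are parameter-group settings, and the threshold values are only examples):
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0;   -- log every query while sampling
-- ... let it run long enough to capture a representative workload ...
SET GLOBAL long_query_time = 2;   -- then restore your normal threshold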
After you know which queries are accounting for the load, you can make some informed strategy about how to address them:
Query optimization or redesign
More caching in the application
Scale out to more instances
I think the answer is "you're doing something wrong". It is very unlikely you have reached an RDS limitation, although you may be hitting limits on some parts of it.
Start by enabling detailed monitoring. This will give you some OS-level information which should help determine what your limiting factor really is. Look at your slow query logs and database stats - you may have some queries that are causing problems.
Once you understand the problem - which could be bad queries, I/O limits, or something else - then you can address them. RDS allows you to create multiple read replicas, so you can move some of your read load to slaves.
You could also move to Aurora, which should give you better I/O performance. Or use PIOPS (or allocate more disk, which should increase performance). You are using SSD storage, right?
One other suggestion - if your calculations (step 4 above) take a significant amount of time, you might want to look at breaking the work into two or more transactions.
A query_cache_size of more than 50M is bad news. You are writing often -- many times per second per table? That means the QC needs to be scanned many times/second to purge the entries for the table that changed. This is a big load on the system when the QC is 2.5GB!
query_cache_type should be DEMAND if you can justify it being on at all. And in that case, pepper the SELECTs with SQL_CACHE and SQL_NO_CACHE.
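With query_cache_type=DEMAND, only statements marked SQL_CACHE are cached; for example (the table names are placeholders):
SELECT SQL_CACHE country, COUNT(*) FROM users GROUP BY country;   -- rarely-changing result, worth caching
SELECT SQL_NO_CACHE * FROM orders WHERE user_id = 42;             -- volatile data, skip the cache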
Since you have the slowlog turned on, look at the output with pt-query-digest. What are the first couple of queries?
Since your typical operation involves writing, I don't see an advantage in using read-only slaves.
Are the bots running at random times? Or do they all start at the same time? (The latter could cause terrible spikes in CPU, etc.)
How are you "archiving" "old" records? It might be best to use PARTITIONing and "transportable tablespaces". Use PARTITION BY RANGE and 21 partitions (plus a couple of extras).
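A sketch of what daily range partitioning could look like for such a table (the table, columns and dates are invented; in practice you would create 21+ daily partitions, and the partition key has to be part of the primary key):
CREATE TABLE events (
  id BIGINT NOT NULL AUTO_INCREMENT,
  created_at DATE NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p20170101 VALUES LESS THAN (TO_DAYS('2017-01-02')),
  PARTITION p20170102 VALUES LESS THAN (TO_DAYS('2017-01-03')),
  PARTITION pmax      VALUES LESS THAN MAXVALUE
);
-- "archiving" a day then becomes a near-instant metadata operation:
ALTER TABLE events DROP PARTITION p20170101;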
Your typical transaction seems to work with one row. Can it be modified to work with 10 or 100 all at once? (More than 100 is probably not cost-effective.) SQL is much more efficient in doing lots of rows at once versus lots of queries of one row each. Show us the SQL; we can dig into the details.
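As an illustration of the batching point, with invented names - one multi-row statement instead of many single-row round trips:
-- instead of three (or a hundred) separate INSERTs:
INSERT INTO events (user_id, amount) VALUES
  (1, 10.0),
  (2, 12.5),
  (3,  7.0);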
It seems strange to insert a new row, then update it, all in one transaction. Can't you completely compute it before doing the insert? Hanging onto the inserted_id for so long probably interferes with others doing the same thing. What is the value of innodb_autoinc_lock_mode?
Do the "users" interactive with each other? If so, in what way?

Where should I focus: optimizing the query, changing the database config, or something else?

I took over a project and have 2 MyISAM tables.
table1 with approx. 1M rows, and
table2 with approx. 100K rows.
In the project these tables are accessed often, and at first it seemed OK.
After I installed the project on Windows 8.1 for local development, I found that every day, the first time I access the site, my query takes 14 seconds. A bit too much.
Afterwards it takes less than 0.1 seconds.
Now, since on dev this, combined with another query, runs into a PHP timeout exception, I'm wondering whether it's recommended to do anything about it or not. On production it does not seem to occur (or is hard to reproduce).
I have heard of things like a warm cache or query optimization, but I don't know exactly what is meant by that.
What do experts like you do in this case?
I had another question set up here trying to see whether I can optimize the query.
Changing to InnoDB doesn't seem to have an impact.
The "first" time you run a query, two things may or may not happen:
Lots of disk I/O may be done to fetch the index blocks and/or data blocks from disk. (If other queries happened to have fetched those blocks, the blocks may be cached already.) (14s vs 0.1s is more than I usually see for this cold/warm cache difference.)
If the "Query cache" was on, the first SELECT and its resultset were stored in the QC. The second call may have found it there and returned the result almost instantly. (Usually this is ~1ms, not the 100ms you mentioned.) The QC can be bypassed for a single query by saying SELECT SQL_NO_CACHE ....
Since it is annoying you daily, you may as well go through the exercise of trying to optimize the query. If the tables are growing daily, it may get slower and slower over time. Note that if production needs to be restarted for any reason, that query may time out there. So, yes, try to optimize it.
A million rows is beginning to be "big".
The characteristics of this indicate that you are I/O-bound only initially. So it does not indicate that key_buffer_size and innodb_buffer_pool_size are too low.
If you want to discuss the performance of a particular query, start a new thread and provide SHOW CREATE TABLE and EXPLAIN SELECT ....

Reporting with MySQL - Simplest Query taking too long

I have a MySQL Table on an Amazon RDS Instance with 250 000 Rows. When I try to
SELECT * FROM tableName
without any conditions (just for testing; the normal query specifies the columns I need, but I need most of them), the query takes between 20 and 60 seconds to execute. This will be the base query for my report, and the report should run in under 60 seconds, so I think this will not work out (it times out the moment I add the joins). The report runs without any problems in our smaller test environments.
Could it be that the query is taking so long because MySQL is trying to lock the table and waiting for all writes to finish? There might be quite a lot of writes on this table. I am doing the query on a MySQL slave, since I do not want to lock up the production system with my queries.
I have no feel for how many rows are a lot for a relational DB. Are 250,000 rows with ~30 columns (varchar, date and integer types) a lot?
How can I speed up this query (hardware, software, query optimization, ...)?
Can I tell MySQL that I do not care that the data might be inconsistent (it is a snapshot from a reporting database)?
Is there a chance that this query will run under 60 seconds, or do I have to adjust my goals?
Remember that MySQL has to prepare your result set and transport it to your client. In your case, this could be 200MB of data it has to shuttle across the connection, so 20 seconds is not bad at all. Most libraries, by default, wait for the entire result to be received before forwarding it to the application.
To speed it up, fetch only the columns you need, or do it in chunks with LIMIT. SELECT * is usually a sign that someone's being super lazy and not optimizing at all.
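If you go the chunked route, paging on an indexed column ("keyset" pagination) usually holds up better than large OFFSETs as you go deeper; a sketch with placeholder column names:
SELECT id, customer_id, created_at, amount
FROM tableName
ORDER BY id
LIMIT 10000;                 -- first chunk
SELECT id, customer_id, created_at, amount
FROM tableName
WHERE id > 10000             -- last id seen in the previous chunk
ORDER BY id
LIMIT 10000;                 -- next chunk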
If your library supports streaming resultsets, use that, as then you can start getting data almost immediately. It'll allow you to iterate on rows as they come in without buffering the entire result.
A table with 250,000 rows is not too big for MySQL at all.
However, waiting for those rows to be returned to the application does take time. That is network time, and there are probably a lot of hops between you and Amazon.
Unless your report is really going to process all the data, check the performance of the database with a simpler query, such as:
select count(*) from table;
EDIT:
Your problem is unlikely to be due to the database. It is probably due to network traffic. As mentioned in another answer, streaming might solve the problem. You might also be able to play with the data formats to get the total size down to something more reasonable.
A last-resort step would be to save the data in a text file, compress the file, move it over, and uncompress it. Although this sounds like a lot of work, you might get 5x-10x compression on the data, saving oodles of time on the transmission, and still see a large improvement in overall performance.
I got updated specs from my client and was able to reduce the number of users returned to 250, which (with a lot of JOINs) goes through in 60 seconds.
So maybe the answer really is: try not to dump a whole table with a query; fetch only the exact data you need. The client has SQL access and will have to update his queries so that only relevant users are returned.
You should never really use * as a wildcard. Choose the fields that you actually want and then create an index on those fields combined.
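For example (the column names here are invented), if the report only needs a handful of columns, a composite index over exactly those columns lets MySQL answer the query from the index alone:
ALTER TABLE tableName ADD INDEX idx_report (customer_id, created_at, amount);
-- "Using index" in EXPLAIN's Extra column confirms the query is covered by the index:
EXPLAIN SELECT customer_id, created_at, amount FROM tableName WHERE customer_id = 42;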
If you have thousands of rows, another option is to implement pagination.
If the result data is used directly for a report, no one can look at more than 100 rows in a single shot anyway.

Mysql query run twice is much faster the second time even with SQL_NO_CACHE

I am trying to profile two different queries that do the same thing to find which one is faster. For testing, I have put SQL_NO_CACHE into both queries to prevent the query cache from messing up the timing.
Query A is consistently 50ms.
Query B is 100ms the first time it is run and 10ms if I run it a second time shortly after.
Why is Query B faster the second time? The query cache should not be speeding up the queries. Could it be that the first run of query B loads the data from disk into memory so that the second query is running in memory and faster? Is there a way to test this? I tried to test this myself by doing select * from the table before I ran Query B, but it still exhibited the same behavior. Is SQL_NO_CACHE perhaps not working to disable the query cache?
Query B looks something like this:
SELECT SQL_NO_CACHE foo,bar FROM table1 LEFT JOIN table2 ON table1.foo=table2.foo WHERE bar=1
Depending on the storage engine you're using, yes it is most probably being loaded from a data cache and not a query cache.
MyISAM provides no storage engine level caching for data, and only caches indexes. However, the operating system often serves up data from its own caches which may well be speeding up your query execution.
You can try benchmarking the query in a real scenario: just log that specific query every time it's executed (along with its execution time).
Depending on the size of your indexes and your table type, it may be that the indexes are not in memory the first time the query is run, so MySQL has to pull them in from disk, causing a significant slowdown. The next time, most of what MySQL needs may already be in memory, resulting in the performance gain.
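One rough way to test this is to compare the cache-miss counters before and after each run (these are standard status variables; the InnoDB ones apply to InnoDB tables, the Key_* ones to MyISAM indexes):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';  -- _reads = misses that went to disk, _read_requests = total lookups
SHOW GLOBAL STATUS LIKE 'Key_read%';                 -- Key_reads = MyISAM index blocks read from disk
-- If the miss counters barely move on the second run, the data was already cached.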
Is your app making a connection and doing the authentication handshake on the first query? If so, the second query will already have an open, authenticated connection to run on. Try running it a third time and see if the second and third runs take about the same time.

How to predict MySQL tipping points?

I work on a big web application that uses a MySQL 5.0 database with InnoDB tables. Twice over the last couple of months, we have experienced the following scenario:
The database server runs fine for weeks, with low load and few slow queries.
A frequently-executed query that previously ran quickly will suddenly start running very slowly.
Database load spikes and the site hangs.
The solution in both cases was to find the slow query in the slow query log and create a new index on the table to speed it up. After applying the index, database performance returned to normal.
What's most frustrating is that, in both cases, we had no warning about the impending doom; all of our monitoring systems (e.g., graphs of system load, CPU usage, query execution rates, slow queries) told us that the database server was in good health.
Question #1: How can we predict these kinds of tipping points or avoid them altogether?
One thing we are not doing with any regularity is running OPTIMIZE TABLE or ANALYZE TABLE. We've had a hard time finding a good rule of thumb about how often (if ever) to manually do these things. (Since these commands LOCK tables, we don't want to run them indiscriminately.) Do these scenarios sound like the result of unoptimized tables?
Question #2: Should we be manually running OPTIMIZE or ANALYZE? If so, how often?
More details about the app: database usage pattern is approximately 95% reads, 5% writes; database executes around 300 queries/second; the table used in the slow queries was the same in both cases, and has hundreds of thousands of records.
The MySQL Performance Blog is a fantastic resource. Namely, this post covers the basics of properly tuning InnoDB-specific parameters.
I've also found the PDF version of the MySQL Reference Manual to be essential. Chapter 7 covers general optimization, and section 7.5 covers server-specific optimizations you can toy with.
From the sound of your server, the query cache may be of IMMENSE value to you.
The reference manual also gives you some great detail concerning slow queries, caches, query optimization, and even disk seek analysis with indexes.
It may be worth your time to look into multi-master replication, allowing you to lock one server entirely and run OPTIMIZE/ANALYZE, without taking a performance hit (as 95% of your queries are reads, the other server could manage the writes just fine).
Section 12.5.2.5 covers OPTIMIZE TABLE in detail, and 12.5.2.1 covers ANALYZE TABLE in detail.
Update for your edits/emphasis:
Question #2 is easy to answer. From the reference manual:
OPTIMIZE:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows. [...] You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data table.
And ANALYZE:
ANALYZE TABLE analyzes and stores the key distribution for a table. [...] MySQL uses the stored key distribution to decide the order in which tables should be joined when you perform a join on something other than a constant. In addition, key distributions can be used when deciding which indexes to use for a specific table within a query.
OPTIMIZE is good to run when you have the free time. MySQL optimizes well around deleted rows, but if you go and delete 20GB of data from a table, it may be a good idea to run this. It is definitely not required for good performance in most cases.
ANALYZE is much more critical. As noted, having the needed table statistics available to MySQL (which is what ANALYZE provides) is very important for pretty much any query. It is something that should be run on a regular basis.
Question #1 is a bit more of a trick. I would watch the server very carefully when this happens, namely disk I/O. My bet would be that your server is thrashing either your swap or the (InnoDB) caches. In either case, it may be query, tuning, or load related. Unoptimized tables could cause this. As mentioned, running ANALYZE can immensely help performance, and will likely help here too.
I haven't found any good way of predicting MySQL "tipping points" -- and I've run into a few.
Having said that, I've found tipping points are related to table size. But not merely raw table size, rather how big the "area of interest" is to a query. For example, in a table of over 3 million rows and about 40 columns (about three-quarters of them integers), most queries that select a portion of the rows via indexes are fast. However, when one value in a query on one indexed column means two-thirds of the rows are now "interesting", the query becomes about 5 times slower than normal. Lesson: try to arrange your data so such a scan isn't necessary.
However, such behaviour now gives you a size to look for. This size will be heavily dependent on your server setup, the MySQL server variables, and the table's schema and data.
Similarly, I've seen reporting queries run in reasonable time (~45 seconds) if the period is two weeks, but take half-an-hour if the period is extended to four weeks.
Use the slow query log; it will help you narrow down the queries you want to optimize.
For time-critical queries, it is sometimes better to keep a stable plan by using index hints.
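For example, a sketch of pinning the index choice with a hint (the table and index names are invented):
SELECT order_id, total
FROM orders FORCE INDEX (idx_orders_created_at)
WHERE created_at >= '2010-01-01' AND status = 'open';
-- FORCE INDEX makes the optimizer use the named index (falling back to a table scan only if it can't);
-- USE INDEX and IGNORE INDEX are the softer variants.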
It sounds like you have a frustrating situation and maybe not the best code review process and development environment.
Whenever you add a new query to your code you need to check that it has the appropriate indexes ready and add those with the code release.
If you don't do that, your second option is to constantly monitor the slow query log and then go beat the developers; I mean, go add the index.
There's an option to enable logging of queries that don't use an index, which would be useful to you.
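That option is log_queries_not_using_indexes; with the slow query log enabled, such queries get logged regardless of how fast they run:
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL log_queries_not_using_indexes = 'ON';
-- min_examined_row_limit can be raised to keep trivially small scans out of the log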
If there are some queries that "work and then stop working" (but are "using an index"), then it's likely that the query wasn't very good in the first place (low cardinality in the index; inefficient join; ...) and the first rule, evaluating the query carefully when it's added, would apply.
For question #2 - on InnoDB, "analyze table" is basically free to run, so if you have bad join performance it doesn't hurt to run it. Unless the distribution of the keys in the table is changing a lot, it's unlikely to help, though. It almost always comes down to bad queries. "optimize table" rebuilds the InnoDB table; in my experience it's relatively rare that it helps enough to be worth the hassle of having the table unavailable for the duration (or doing the master-master failover stuff while it's running).