As I'm looking to optimize some of my mysql queries and improve performance, I'm realizing that I'm just not sure what sort of response times I should be aiming for.
Obviously lower is better, but what do you normally target as a 'slow' query for a consumer site?
My queries are currently running between 0.00 to 0.70 seconds.
I assume 0.00 is pretty good, but what about 0.70 seconds? I assume I should be looking for improved performance on that? If so, what do you usually aim for?
-----A bit more info about my query to answer your questions (though I really am interested in yours & what you do ----
My queries are on joined tables with between 1-3 million rows (sometimes more).
I run my queries on the same box which has 4gb of RAM and an SATA drive.
That box runs 6 different databases two of which are regularly being queried, one regularly being updated.
The times I provided are for the 'read' database which is rarely written to.
For web requests, I am running between 1 & 3 queries (but I cache the data using memcache as well for performance).
The times I gave were from the mysql response from command line terminal to the machine (so I assume that is actually runtime).
The queries have many joins, between 4 & 7 depending on the query.
it depends - if you're in the same machine as the DB or in another machine.
it also depende which kind of query, how many joins, how many rows, do you use index to pull the data, So there is no good answer: "A Query should take this and that long".
I'm running some complicated queries, between two machines, and it takes between 780 and 850ms- and that means nothing to you....
(probably on the same machine they will be shorter..., over an internet connection - longer ect...)
It depends. Normally on smallish queries (under 50 000 rows queried), I'd expect < 150-200ms, but it depends on the hardware and environment.
In your example, you mention 700ms. What is the query doing? How much RAM is in the box? What disks are you using? These all affect the outcome.
It really depends on how many queries you are executing on a page and what acceptable response times are for your application.
A 100 second query might be okay for a data mining application, whereas a 2 second query is probably unacceptable for a web application.
When working on web applications, I have a debugging feature that I can turn out that outputs queries and execution times at the bottom of the page. Anything .5-1 second gets flagged with yellow and anything higher gets flagged with red. I have a link that I can click to get explain output for quick optimization.
Of course, if you have a web page with dozens of queries running at .3 seconds each, you have a problem.
If you are good at reading the explain for a query, then you can get an idea of how well you have optimized your queries. Red flags are things like temporary tables and lack of indexes. As far as absolute times, I think you will have a hard time finding a hard and fast rule on what 'fast' is.
Related
This is a general question that applies to MySQL, Oracle DB or whatever else might be out there.
I know for MySQL there is LIMIT offset,size; and for Oracle there is 'ROW_NUMBER' or something like that.
But when such 'paginated' queries are called back to back, does the database engine actually do the entire 'select' all over again and then retrieve a different subset of results each time? Or does it do the overall fetching of results only once, keeps the results in memory or something, and then serves subsets of results from it for subsequent queries based on offset and size?
If it does the full fetch every time, then it seems quite inefficient.
If it does full fetch only once, it must be 'storing' the query somewhere somehow, so that the next time that query comes in, it knows that it has already fetched all the data and just needs to extract next page from it.
In that case, how will the database engine handle multiple threads? Two threads executing the same query?
I am very confused :(
I desagree with #Bill Karwin. First of all, do not make assumptions in advance whether something will be quick or slow without taking measurements, and complicate the code in advance to download 12 pages at once and cache them because "it seems to me that it will be faster".
YAGNI principle - the programmer should not add functionality until deemed necessary.
Do it in the simplest way (ordinary pagination of one page), measure how it works on production, if it is slow, then try a different method, if the speed is satisfactory, leave it as it is.
From my own practice - an application that retrieves data from a table containing about 80,000 records, the main table is joined with 4-5 additional lookup tables, the whole query is paginated, about 25-30 records per page, about 2500-3000 pages in total. Database is Oracle 12c, there are indexes on a few columns, queries are generated by Hibernate.
Measurements on production system at the server side show that an average time (median - 50% percentile) of retrieving one page is about 300 ms. 95% percentile is less than 800 ms - this means that 95% of requests for retrieving a single page is less that 800ms, when we add a transfer time from the server to the user and a rendering time of about 0.5-1 seconds, the total time is less than 2 seconds. That's enough, users are happy.
And some theory - see this answer to know what is purpose of Pagination pattern
Yes, the query is executed over again when you run it with a different OFFSET.
Yes, this is inefficient. Don't do that if you have a need to paginate through a large result set.
I'd suggest doing the query once, with a large LIMIT — enough for 10 or 12 pages. Then save the result in a cache. When the user wants to advance through several pages, then your application can fetch the 10-12 pages you saved in the cache and display the page the user wants to see. That is usually much faster than running the SQL query for each page.
This works well if, like most users, your user reads only a few pages and then changes their query.
Re your comment:
By cache I mean something like Memcached or Redis. A high-speed, in-memory key/value store.
MySQL views don't store anything, they're more like a macro that runs a predefined query for you.
Oracle supports materialized views, so that might work better, but querying the view would have the overhead of interpreting an SQL query.
A simpler in-memory cache should be much faster.
My company has a mySQL server used by a team of analysts (usually 3-4 at a time). Lately the queries have slowed down, with some of them taking even days, for a database with tables up to 1 billion rows (10^9 records).
Server main features: Linux OS-64 GB of memory- 3 Terabytes of hard drive.
We know nothing of fine tuning, so any tool/rule of thumb to find out what is causing the trouble or at least to narrow it down, would be welcome.
Going to Workbench studio>Table inspector I found these key values for the DB that we use the most:
DB size: ~500 Gbytes
Largest table size: ~80 Gbytes
Index length (for largest table): ~230 Gbytes. This index relies on 6 fields.
Almost no MyISAM tables, all InnoDB
Ideally I would like to fine tune the server (better), the DB (worse), or both (in the future), in the simplest possible way, to speed it up.
My questions:
Are these values (500, 80, 230 GB) normal and manageable for a
medium size server?
Is it normal to have indexes of this size -230Gb-, way larger than the table itself?
What parameters/strategy can be tweaked to fix this? I'm thinking memory logs, or buying server RAM, but happy to investigate any sensible answers.
Many thanks.
If you're managing a MySQL instance of this scale, it would be worth your time to read High Performance MySQL which is the best book on MySQL tuning. I strongly recommend you get this book and read it.
Your InnoDB buffer pool is probably still at its default size, not taking advantage of the RAM on your Linux system. It doesn't matter how much RAM you have if you haven't configured MySQL to use it!
There are other important tuning parameters too. MySQL 5.7 Performance Tuning Immediately After Installation is a great introduction to the most important tuning options.
Indexes can be larger than the table itself. The factor of nearly 4 to 1 is unusual, but not necessarily bad. It depends on what indexes you need, and there's no way to know that unless you consider the queries you need to run against this data.
I did a presentation How to Design Indexes, Really a few years ago (it's just as relevant to current versions of MySQL). Here's the video: https://www.youtube.com/watch?v=ELR7-RdU9XU
Here's the order you want to check things:
1) Tune your indexes. Pick a commonly-used slow query and analyze it. Learn about EXPLAIN ANALYZE so that you can tell if your query is using indexes properly. It is entirely possible that your tables are not indexed correctly, and your days-long queries might run in minutes. Literally. Without proper indexes, your queries will be doing full table scans in order to do joins, and with billions of rows, that's going to be very, very slow.
A good introduction to indexes is at http://use-the-index-luke.com/ but there are zillions of books and articles on the topic.
1a) Repeat #1 with other slow queries. See if you can improve them. If you've worked on a number of slow queries and you're not able to speed them up, then proceed to server tuning.
2) Tune your server. Bill Karwin's links will be helpful there.
3) Look at increasing hardware/RAM. This should only be last resort.
Spend time with #1. It is likely to return the best bang for the buck. There is much you can do to improve things without spending a dime. You'll also learn how to write better queries and create better indexes and prevent these problems in the future.
Also: Listen to Bill Karwin and his knowledge. He is an Expert with a capital E.
In a survey of 600 rather random tables (a few were much bigger than yours), your 230GB:80GB ratio would be at about the 99th percentile. Please provide SHOW CREATE TABLE so we can discuss whether you are "doing something wrong", or it is simply an extreme situation. (Rarely is a 6-column index advisable. And if it is a single index adding up to 230GB, something is 'wrong'.)
I've seen bigger tables run fine in smaller machines. If you are doing mostly "point queries", there is virtually no size limitation. If you are using UUIDs, you are screwed. That is, it really depends on the data, the queries, the schema, the phase of the moon, your karma, etc.
A cross-join can easily get to a trillion things to do. A join with eq_ref is often not much slower than a query with no joins.
"You can't tune your way out of a performance problem." "Throwing hardware at a performance problem either wastes money, or delays the inevitable." Instead, let's see the "queries that are slowing down", together with EXPLAIN SELECT ... and SHOW CREATE TABLE.
Is this a Data Warehouse application? Do you have Summary Tables?
Here is my Cookbook on creating indexes . But it might be faster if you show us your code.
And I can provide another Tuning Analysis .
EXPLAIN SELECT ..... is a critical part of information needed to investigate your request for assistance.
SHOW CREATE TABLE for each table involved would also be helpful.
At this point in time, neither are visible in the data available from user......
I will try to answer your question but keep in mind that I am no MySQL expert.
1) It is quite large DB with large table, but nothing fairly sized server couldn't handle. But it really depends on the workload you have.
2) The index size greater than table itself is interesting but it will probably be size of all indexes on that table. In that case it is completely normal.
3) 64 GB of RAM in your server means that there will be probably lot of disk operations going on and it will definitely slow you down. So adding some memory will surely help. Maybe check how the server behaves when the query is running with iotop. And compare it with information from top to see if the server is waiting on disks.
My application is using JPA with Hibernate and I see that hibernate generates some interesting SQL queries with a lot of joins in my log files. The application does not have a lot of users right now and I am worried that some of the queries being generated by hibernate are going to cause problems when the database grows in size.
I have run some of the sql queries generated by hibernate through the EXPLAIN command to look at the query plans that are generated.
Is the output of EXPLAIN dependent on the size of the database? When my database grows in size will the query planner generate different plans for the same SQL queries?
At what point in the development / deployment cycle should I be looking at SQL query plans for sql queries generated by hibernate? When is the right time to use EXPLAIN.
How can the output of explain be used to determine if a query will become a problem, when the database is so small that every query no matter how complex looking runs in under 0.5 seconds?
I am using Postgres 9.1 as the database for my application but I am interested in the general answer to the above questions.
Actually, #ams you are right in your comment - it is generally pointless to use explain with tiny amounts of data.
If a table only has 10 rows then it's quite likely all in one page and it costs (roughly) the same to read one row as all 10. Going to an index first and then fetching the page will be more expensive than just reading the lot and ignoring what you don't want. PostgreSQL's planner has configured costs for things like index reads, table reads, disk accesses vs cache accesses, sorting etc. It sizes these according to the (approximate) size of the tables and distribution of values within them. What it doesn't do (as of the pending 9.2 release) is account for cross-column or cross-table correlations. It also doesn't offer manual hints that let you override the planner's choices (unlike MS-SQL or Oracle).
Each RDBMS' planner has different strengths and weaknesses but I think it's fair to say that MySQL's is the weakest (particularly in older releases).
So - if you want to know how your system will perform with 100 concurrent users and billions of rows you'll want to generate test data and load for a sizeable fraction of that. Worse, you'll want to have roughly the same distribution of values too. If most clients have about 10 invoices but a few have 1000 then that's something your test data will need to reflect. If you need to maintain performance across multiple RDBMS then repeat testing across all of them.
This is all separate from the overall performance of the system of course, which depends on the size and capabilities of your server vs its required load. A system can cope with a steady increase in load and then suddenly you will see performance drop rapidly as cache sizes are exceeded etc.
HTH
1 Is the output of EXPLAIN dependent on the size of the database? When my database grows in size will the query planner generate
different plans for the same SQL queries?
It all depends on your data and the statistics about the data. Many performance problems occur because lack of statistics, when somebody forgot to ANALYZE or turned auto_vacuum (incl. analyze) off.
2 At what point in the development / deployment cycle should I be looking at SQL query plans for sql queries generated by hibernate?
When is the right time to use EXPLAIN.
Hibernate has a habit of sending lots and lots of queries to the database, even for simple joins. Turn your querylog on, and keep an eye on that one. Later on, you could run an auto-explain on all queries from your log.
3 How can the output of explain be used to determine if a query will become a problem, when the database is so small that every query
no matter how complex looking runs in under 0.5 seconds?
No, because it all depends on the data. When 95% of your users are male, an index on gender won't be used when searching for a man. When you're looking for a woman, the index makes sense and will be used. A functional index on records where gender = female, is even better: It's useless to index something that will never benefit from an index and the index will be much smaller.
The only thing you can do to predict the usage of indexes, is testing with set enable_seqscan = off; that will show that it is possible to use some index, but that's all.
Can someone tell me whether the number of MySQL procs (average) at 0.39 is a lot??? I've gotten it down from 1.18 - 1.24 range. Our shared hosting provider says their limit is .4. Our site is not real big, though does make use of 5 DBs and has several pages that connect to the various DBs. To me it doesn't seem like a lot, but I have no frame of reference. Site traffic has pretty much been the same for the last 9 months. We’re averaging 108/visits per day and 5,234 page views a month. Most DB tables are small, the two largest are around 6,200 records and the other is around 350,000 records. The large one I was planning on archiving some old un-necessary data from several years ago and re-indexing to help performance, though not sure if this is an issue.
It seems like kind of a lot to me, yeah, for the kind of traffic you're talking about. I suspect you have some slow queries going on that would benefit from better indexing, or maybe some lengthy maintenance jobs. You might want to look into MySQL's slow query logging faculty.
At what point does a MySQL database start to lose performance?
Does physical database size matter?
Do number of records matter?
Is any performance degradation linear or exponential?
I have what I believe to be a large database, with roughly 15M records which take up almost 2GB. Based on these numbers, is there any incentive for me to clean the data out, or am I safe to allow it to continue scaling for a few more years?
The physical database size doesn't matter. The number of records don't matter.
In my experience the biggest problem that you are going to run in to is not size, but the number of queries you can handle at a time. Most likely you are going to have to move to a master/slave configuration so that the read queries can run against the slaves and the write queries run against the master. However if you are not ready for this yet, you can always tweak your indexes for the queries you are running to speed up the response times. Also there is a lot of tweaking you can do to the network stack and kernel in Linux that will help.
I have had mine get up to 10GB, with only a moderate number of connections and it handled the requests just fine.
I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn't help it might be time to implement a master/slave configuration.
In general this is a very subtle issue and not trivial whatsoever. I encourage you to read mysqlperformanceblog.com and High Performance MySQL. I really think there is no general answer for this.
I'm working on a project which has a MySQL database with almost 1TB of data. The most important scalability factor is RAM. If the indexes of your tables fit into memory and your queries are highly optimized, you can serve a reasonable amount of requests with a average machine.
The number of records do matter, depending of how your tables look like. It's a difference to have a lot of varchar fields or only a couple of ints or longs.
The physical size of the database matters as well: think of backups, for instance. Depending on your engine, your physical db files on grow, but don't shrink, for instance with innodb. So deleting a lot of rows, doesn't help to shrink your physical files.
There's a lot to this issues and as in a lot of cases the devil is in the details.
The database size does matter. If you have more than one table with more than a million records, then performance starts indeed to degrade. The number of records does of course affect the performance: MySQL can be slow with large tables. If you hit one million records you will get performance problems if the indices are not set right (for example no indices for fields in "WHERE statements" or "ON conditions" in joins). If you hit 10 million records, you will start to get performance problems even if you have all your indices right. Hardware upgrades - adding more memory and more processor power, especially memory - often help to reduce the most severe problems by increasing the performance again, at least to a certain degree. For example 37 signals went from 32 GB RAM to 128GB of RAM for the Basecamp database server.
I'm currently managing a MySQL database on Amazon's cloud infrastructure that has grown to 160 GB. Query performance is fine. What has become a nightmare is backups, restores, adding slaves, or anything else that deals with the whole dataset, or even DDL on large tables. Getting a clean import of a dump file has become problematic. In order to make the process stable enough to automate, various choices needed to be made to prioritize stability over performance. If we ever had to recover from a disaster using a SQL backup, we'd be down for days.
Horizontally scaling SQL is also pretty painful, and in most cases leads to using it in ways you probably did not intend when you chose to put your data in SQL in the first place. Shards, read slaves, multi-master, et al, they are all really shitty solutions that add complexity to everything you ever do with the DB, and not one of them solves the problem; only mitigates it in some ways. I would strongly suggest looking at moving some of your data out of MySQL (or really any SQL) when you start approaching a dataset of a size where these types of things become an issue.
Update: a few years later, and our dataset has grown to about 800 GiB. In addition, we have a single table which is 200+ GiB and a few others in the 50-100 GiB range. Everything I said before holds. It still performs just fine, but the problems of running full dataset operations have become worse.
I would focus first on your indexes, than have a server admin look at your OS, and if all that doesn't help it might be time for a master/slave configuration.
That's true. Another thing that usually works is to just reduce the quantity of data that's repeatedly worked with. If you have "old data" and "new data" and 99% of your queries work with new data, just move all the old data to another table - and don't look at it ;)
-> Have a look at partitioning.
2GB and about 15M records is a very small database - I've run much bigger ones on a pentium III(!) and everything has still run pretty fast.. If yours is slow it is a database/application design problem, not a mysql one.
It's kind of pointless to talk about "database performance", "query performance" is a better term here. And the answer is: it depends on the query, data that it operates on, indexes, hardware, etc. You can get an idea of how many rows are going to be scanned and what indexes are going to be used with EXPLAIN syntax.
2GB does not really count as a "large" database - it's more of a medium size.
I once was called upon to look at a mysql that had "stopped working". I discovered that the DB files were residing on a Network Appliance filer mounted with NFS2 and with a maximum file size of 2GB. And sure enough, the table that had stopped accepting transactions was exactly 2GB on disk. But with regards to the performance curve I'm told that it was working like a champ right up until it didn't work at all! This experience always serves for me as a nice reminder that there're always dimensions above and below the one you naturally suspect.
Also watch out for complex joins. Transaction complexity can be a big factor in addition to transaction volume.
Refactoring heavy queries sometimes offers a big performance boost.
A point to consider is also the purpose of the system and the data in the day to day.
For example, for a system with GPS monitoring of cars is not relevant query data from the positions of the car in previous months.
Therefore the data can be passed to other historical tables for possible consultation and reduce the execution times of the day to day queries.
Performance can degrade in a matter of few thousand rows if database is not designed properly.
If you have proper indexes, use proper engines (don't use MyISAM where multiple DMLs are expected), use partitioning, allocate correct memory depending on the use and of course have good server configuration, MySQL can handle data even in terabytes!
There are always ways to improve the database performance.
It depends on your query and validation.
For example, i worked with a table of 100 000 drugs which has a column generic name where it has more than 15 characters for each drug in that table .I put a query to compare the generic name of drugs between two tables.The query takes more minutes to run.The Same,if you compare the drugs using the drug index,using an id column (as said above), it takes only few seconds.
Database size DOES matter in terms of bytes and table's rows number. You will notice a huge performance difference between a light database and a blob filled one. Once my application got stuck because I put binary images inside fields instead of keeping images in files on the disk and putting only file names in database. Iterating a large number of rows on the other hand is not for free.
No it doesnt really matter. The MySQL speed is about 7 Million rows per second. So you can scale it quite a bit
Query performance mainly depends on the number of records it needs to scan, indexes plays a high role in it and index data size is proportional to number of rows and number of indexes.
Queries with indexed field conditions along with full value would be returned in 1ms generally, but starts_with, IN, Between, obviously contains conditions might take more time with more records to scan.
Also you will face lot of maintenance issues with DDL, like ALTER, DROP will be slow and difficult with more live traffic even for adding a index or new columns.
Generally its advisable to cluster the Database into as many clusters as required (500GB would be a general benchmark, as said by others it depends on many factors and can vary based on use cases) that way it gives better isolation and gives independence to scale specific clusters (more suited in case of B2B)