What is the meaning of execution time of a query in MySQL? - mysql

I want to know what is actually meant by the execution time of a query in MySQL, and how to reduce it using indexing techniques.

"Execution time" is simply a stopwatch. That is, looking at a clock on the wall. It is a reasonable metric for (1) stressing impatience of user waiting for data, and (2) burden on the server.
There are hundreds of web pages going into details about indexing. And probably there are books written on the subject. This forum can help you with one query at a time.
I'll plug my index cookbook which is a distillation of tips for the thousands of questions I have answered about indexing.
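The "stopwatch" idea can be sketched in a few lines. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection so that it runs without a server, and the table `t` and the helper `timed_query` are made-up names.

```python
import sqlite3
import time


def timed_query(conn, sql, params=()):
    """Run a query and return (rows, elapsed_seconds) measured by a wall clock."""
    start = time.perf_counter()
    # Fetching is part of the measurement: the user waits for that too.
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed


# Hypothetical demo table; an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO t (val) VALUES (?)", ((i % 100,) for i in range(10000)))

rows, elapsed = timed_query(conn, "SELECT COUNT(*) FROM t WHERE val = ?", (42,))
print(f"{rows[0][0]} matching rows in {elapsed * 1000:.3f} ms")
```

Measured this way, the number includes everything between sending the query and holding the result, which is what the waiting user experiences.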

Execution time depends on many factors, such as wait statistics, I/O, compiles, recompiles, and network time.
Indexing, on the other hand, is a way to optimize database performance by minimizing the number of disk accesses required when a query is processed.
An index (or database index) is a data structure used to quickly locate and access the data in a database table.
So basically these are two different things, and you should not treat them as one.
Here are some useful links that may make the topic clearer; have a look when you have some quiet time:
https://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
https://dzone.com/articles/measuring-query-execution-time-what-is-most-accura

Related

Getting the cost of every MySQL query

We have a web application backed by MySQL serving hundreds of queries per second. I'm looking for a way to measure the "cost" of every query in production. I'm imagining some option where, for every query, MySQL returns the query results along with the CPU and I/O cost of executing that query.
The end goal is to aggregate those costs by endpoint (e.g. "/search") and by the logged-in user ID. That way, when we're having issues with site, we can quickly see if there's a particular action or user ID that is using up a large chunk of our MySQL resources.
Close but not quite (AFAICT):
This answer comes close: https://stackoverflow.com/a/12880997/163832
It describes the precision and accuracy problems with EXPLAIN and recommends an alternative that measures what actually happened rather than estimating what will happen.
The alternative does seem better for my use case, but there are still problems:
I looked at the available stats and can't find ones that measure CPU or I/O.
I don't think I can afford to do FLUSH STATUS and then SHOW SESSION STATUS ... on every query.
This doesn't work when many queries are running concurrently.
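For the aggregation goal described above, one application-side approximation is to time each query in the client and charge the elapsed time to the endpoint and user. This is a rough sketch only: it measures wall-clock time as seen by the application, not the CPU or I/O cost asked about, and `run_tracked`, `top_consumers`, and the `items` table are hypothetical names. SQLite stands in for the MySQL connection so the example runs without a server.

```python
import sqlite3
import time
from collections import defaultdict

# Accumulated seconds and query counts keyed by (endpoint, user_id).
totals = defaultdict(lambda: {"seconds": 0.0, "queries": 0})


def run_tracked(conn, endpoint, user_id, sql, params=()):
    """Execute a query and charge its elapsed time to (endpoint, user_id)."""
    start = time.perf_counter()
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Record the time even if the query raised an error.
        bucket = totals[(endpoint, user_id)]
        bucket["seconds"] += time.perf_counter() - start
        bucket["queries"] += 1


def top_consumers(n=5):
    """Return the n keys with the largest accumulated elapsed time."""
    return sorted(totals.items(), key=lambda kv: kv[1]["seconds"], reverse=True)[:n]


conn = sqlite3.connect(":memory:")  # stand-in for the production MySQL connection
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])

run_tracked(conn, "/search", 17, "SELECT * FROM items WHERE name = ?", ("a",))
run_tracked(conn, "/search", 17, "SELECT * FROM items")
run_tracked(conn, "/profile", 42, "SELECT COUNT(*) FROM items")

for (endpoint, user_id), stats in top_consumers():
    print(endpoint, user_id, stats["queries"], f"{stats['seconds']:.6f}s")
```

Because each call is timed independently in the client, this keeps working when many queries run concurrently, at the price of including network and queueing time in the figure.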

Why does MySQL not automatically create a temporary index for appropriate queries?

I realize this is a sort of meta-programming question, but I'm assuming there are enough experienced people here to give a decent answer.
I was just building a query again, to retrieve some data from a table.
SELECT pl.field1, pl.field2
FROM table pl
LEFT JOIN table2 dp on pl.field1 = dp.field1
WHERE dp.field1 IS NULL
Executing this query took ages (1800+ seconds).
After I got sick of waiting, and made the effort to EXPLAIN the query, it turned out that a full table scan was done.
I created an index on dp.field1 and the query was almost instant thereafter; creating that index took less than a second.
Judging from the EXPLAIN, this wasn't too difficult to determine. Why can't, or won't, MySQL do this automatically? Spending just a second to create that index will make the query instant, so MySQL could theoretically create a temporary index, use it to do the query and then remove it again, which would still be orders of magnitude faster than the alternative.
I'm expecting the usual answers of 'to make sure you design a good schema' or 'mysql just does what you tell it to do', but I'm wondering if there might be a technical reason why this is a bad idea.
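The situation in the question can be reproduced in miniature. The sketch below is an assumption-laden illustration, not MySQL itself: it uses Python's sqlite3 module, renames the tables to `table1` and `table2` (since `table` is a reserved word), and compares SQLite's EXPLAIN QUERY PLAN output before and after creating the index. SQLite's own transient "automatic index" feature is switched off so that the un-indexed plan resembles the full scan described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite can build transient automatic indexes for joins; disable that so the
# plan without an index is a plain nested scan, as in the question.
conn.execute("PRAGMA automatic_index = OFF")

# Hypothetical stand-ins for the question's tables.
conn.execute("CREATE TABLE table1 (field1 INTEGER, field2 TEXT)")
conn.execute("CREATE TABLE table2 (field1 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(i, f"row{i}") for i in range(1000)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(i,) for i in range(0, 1000, 2)])

QUERY = """
SELECT pl.field1, pl.field2
FROM table1 pl
LEFT JOIN table2 dp ON pl.field1 = dp.field1
WHERE dp.field1 IS NULL
"""


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = plan(QUERY)
rows_before = sorted(conn.execute(QUERY).fetchall())

# The fix from the question: index the join column of the right-hand table.
conn.execute("CREATE INDEX idx_table2_field1 ON table2 (field1)")

plan_after = plan(QUERY)
rows_after = sorted(conn.execute(QUERY).fetchall())

print("before:", plan_before)
print("after: ", plan_after)
```

The result set is identical in both runs; only the access path for `table2` changes from a scan to an index search.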
For columns with low cardinality it is not a good idea to use a B-Tree Index. B-Trees become degenerated for low cardinalities and do in fact increase query time in comparison to a full table scan.
So always creating a B-Tree index is not a good idea. At the very least, MySQL would have to consider cardinality, and maybe several other things as well.
Quite simply - because the idea doesn't really scale using the current design of RDBMS engines.
It's okay for a single user, but databases are designed to support many concurrent users, and having each user's query also run a speculative optimization step ("can I speed up this query by creating an index?"), and creating that index, which in some circumstances is a very expensive operation, would become slow at any degree of scale. Having the index be "single use" would be wasteful of both computation time and disk space, but having lots of permanent indices in turn would slow down the query optimizer by having to investigate many indices for a given query. It would also slow down data modification operations.
Admittedly, on modern hardware, these concerns are a lot less significant - basic design of RDBMS engines dates back to the days when disk space was expensive, CPUs were several orders of magnitude slower, and memory was an unimaginable luxury.
I'm only speaking for MySQL because there may be a database system out there that automatically modifies your database design.
The simple answer is, MySQL simply does what you tell it to do.
MySQL cannot predict the future. Only you can. You know much more about your data than MySQL does. MySQL keeps some statistics, but it's guessing the best way to execute your query on very sparse information (that is sometimes outdated) before it actually tries to do so. Once it starts executing, it doesn't change its plan, no matter how wrong the guess was.
The methods that it uses to guess are all very well documented. It's our job to provide the indexes that will provide the most benefit, and even, at times, hint that it should use those indexes.
If you tell MySQL to perform a query that requires a table scan, it assumes you know that it's going to do a table scan, because it told you in its documentation that it would. It simply obeys.
Database systems that don't allow the DBA to make decisions don't scale well. There are always tradeoffs to be made, and you're the one to make them. MySQL is a hammer, not a carpenter.

MySQL query speed or rows read

Sorry for lots of useless text. Most important stuff is told on last 3 paragraphs :D
Recently we had a MySQL problem on one of our client's servers. Out of the blue, something started sending the CPU usage of the mysql process sky-high. This led us to finding and optimizing bad queries, and here is the problem.
I was thinking that optimization means speeding up queries (the total time needed for a query to execute). But after I had optimized several queries with that goal, my colleague started complaining that some queries read too many rows, even all rows from a table (as shown with EXPLAIN).
After rewriting a query I noticed that if I want a query to read fewer rows, query speed suffers; if the query is written for speed, more rows are read.
And that didn't make sense to me: fewer rows read, but execution time is longer.
And that made me wonder what should be done. Of course it would be perfect to have a fast query which reads the fewest rows. But since that doesn't seem to be possible for me, I'm searching for some answers. Which approach should I take - speed or fewer rows read? What are the pros and cons when a query is fast but reads more rows, and when fewer rows are read but speed suffers? What happens on the server in each case?
After googling, all I could find were articles and discussions about how to improve speed, but none covered the different cases I mentioned above.
I'm looking forward to seeing even personal choices of course with some reasoning.
Links which could direct me right way are welcome too.
I think your problem depends on how you are limiting the amount of rows read. If you read less rows by implementing more WHERE clauses that MySQL needs to run against, then yes, performance will take a hit.
I would look at perhaps indexing some of your columns that make your search more complex. Simple data types are faster to lookup than complex ones. See if you are searching toward indexed columns.
Without more data, I can give you some hints:
Be sure your tables are properly indexed. Create the appropriate indexes for each of your tables. Also drop the indexes that are not needed.
Decide the best approach for each query. For example, if you use group by only to deduplicate rows, you are wasting resources; it is better to use select distinct (on an indexed field).
"Divide and conquer". Can you split your process in two, three or more intermediate steps? If the answer is "yes", then: Can you create temporary tables for some of these steps? I've split processes using temp tables, and they are very useful to speed things up.
The count of rows read reported by EXPLAIN is an estimate anyway -- don't take it as a literal value. Notice that if you run EXPLAIN on the same query multiple times, the number of rows read changes each time. This estimate can even be totally inaccurate, as there have been bugs in EXPLAIN from time to time.
Another way to measure query performance is SHOW SESSION STATUS LIKE 'Handler%' as you test the query. This will tell you accurate counts of how many times the SQL layer made requests for individual rows to the storage engine layer. For examples, see my presentation, SQL Query Patterns, Optimized.
There's also an issue of whether the rows requested were already in the buffer pool (I'm assuming you use InnoDB), or did the query have to read them from disk, incurring I/O operations. A small number of rows read from disk can be orders of magnitude slower than a large number of rows read from RAM. This doesn't necessarily account for your case, but it points out that such a scenario can occur, and "rows read" doesn't tell you if the query caused I/O or not. There might even be multiple I/O operations for a single row, because of InnoDB's multi-versioning.
Insight into the difference between logical row request vs. physical I/O reads is harder to get. In Percona Server, enhancements to the slow query log include the count of InnoDB I/O operations per query.

Storing and analysis of historical data - What kind of Database?

I'm currently designing a system that watches the ranks / views of YouTube videos, LOTS of YouTube videos (> 500.000 and growing), on a daily basis.
I'm currently considering storing this in a MySQL database, but what disturbs me, is that the table would grow into billions and trillions of rows, which I don't think would perform well.
I need to analyse this data, for example:
Which videos grew a lot in the time between X and Y
Plot the clicks per day
Plot the clicks per week ...
some more things I don't know yet about
So, what came into my web 2.0 mind was, is there a way a NoSQL database could handle this better? I didn't quite learn these (almost) new databases and don't know what they are capable of.
What would your advice be, what type of database to use?
Relational or not? If not, which NoSQL database?
PS: first priority is the fast evaluation and insertion of the results, second is high availability (or just replication)
It is very difficult to recommend a database system, because it always depends. However, considering that Facebook is built on MySQL, performance is probably not a limit for you with MySQL.
What is helpful, and you have probably done it already, is sketching out what your table structure should look like. Then also think about the queries you would like to run against the tables.
If you have the right indexes (the main and crucial factor that query speed relies on), you will not have to worry about performance in MySQL. What you should keep in mind (as I had to learn by experience) is that MySQL has some interesting rules for how it uses indexes. Here are a few examples I had to figure out over time:
if you use an index for a range scan, the index cannot be used for ORDER BY anymore
a range column has to be the last one in a concatenated index for the full index to be used; the same goes for ORDER BY
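The second rule (range column last) can be seen in a query plan. The sketch below is only an illustration under stated assumptions: it runs on SQLite through Python's sqlite3 module rather than on MySQL, and the tables `eq_first` and `range_first` are hypothetical. Both engines use B-tree indexes with the same leftmost-prefix behaviour, but the plan text shown is SQLite's, not MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: same columns, opposite column order in the composite index.
conn.execute("CREATE TABLE eq_first (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE TABLE range_first (a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_a_b ON eq_first (a, b)")      # equality column, then range column
conn.execute("CREATE INDEX idx_b_a ON range_first (b, a)")   # range column first


def plan(table):
    """Return the plan details for an equality-plus-range query on `table`."""
    sql = f"EXPLAIN QUERY PLAN SELECT c FROM {table} WHERE a = 5 AND b > 10"
    return [row[3] for row in conn.execute(sql)]


plan_eq = plan("eq_first")
plan_range = plan("range_first")

# With (a, b) both conditions narrow the index search; with (b, a) only the
# range on b can, and the a = 5 condition is checked row by row afterwards.
print("index (a, b):", plan_eq)
print("index (b, a):", plan_range)
```

Putting the equality column first lets the index locate one contiguous run of entries; once a range condition is applied to a column, later columns in the index can no longer narrow the search.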
For more information, a useful link on mysqlperformanceblog.com: http://www.mysqlperformanceblog.com/2009/09/12/3-ways-mysql-uses-indexes/
In general, if the structure of the database is well thought out and the indexing is good, in my experience it does not actually matter whether you have 10.000 rows or 10 billion; the query time will be about the same.

How big can a MySQL database get before performance starts to degrade

At what point does a MySQL database start to lose performance?
Does physical database size matter?
Do number of records matter?
Is any performance degradation linear or exponential?
I have what I believe to be a large database, with roughly 15M records which take up almost 2GB. Based on these numbers, is there any incentive for me to clean the data out, or am I safe to allow it to continue scaling for a few more years?
The physical database size doesn't matter. The number of records doesn't matter.
In my experience the biggest problem that you are going to run into is not size, but the number of queries you can handle at a time. Most likely you are going to have to move to a master/slave configuration so that the read queries can run against the slaves and the write queries run against the master. However, if you are not ready for this yet, you can always tweak your indexes for the queries you are running to speed up the response times. There is also a lot of tweaking you can do to the network stack and kernel in Linux that will help.
I have had mine get up to 10GB, with only a moderate number of connections and it handled the requests just fine.
I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn't help it might be time to implement a master/slave configuration.
In general this is a very subtle issue and not trivial whatsoever. I encourage you to read mysqlperformanceblog.com and High Performance MySQL. I really think there is no general answer for this.
I'm working on a project which has a MySQL database with almost 1TB of data. The most important scalability factor is RAM. If the indexes of your tables fit into memory and your queries are highly optimized, you can serve a reasonable number of requests with an average machine.
The number of records does matter, depending on what your tables look like. It makes a difference whether you have a lot of varchar fields or only a couple of ints or longs.
The physical size of the database matters as well: think of backups, for instance. Depending on your engine, your physical db files only grow and don't shrink, for instance with InnoDB. So deleting a lot of rows doesn't help to shrink your physical files.
There's a lot to this issue, and as in a lot of cases the devil is in the details.
The database size does matter. If you have more than one table with more than a million records, then performance does indeed start to degrade. The number of records does of course affect performance: MySQL can be slow with large tables. If you hit one million records you will get performance problems if the indices are not set right (for example no indices for fields in WHERE clauses or ON conditions in joins). If you hit 10 million records, you will start to get performance problems even if you have all your indices right. Hardware upgrades - adding more memory and more processor power, especially memory - often help to reduce the most severe problems by increasing the performance again, at least to a certain degree. For example, 37signals went from 32 GB of RAM to 128 GB of RAM for the Basecamp database server.
I'm currently managing a MySQL database on Amazon's cloud infrastructure that has grown to 160 GB. Query performance is fine. What has become a nightmare is backups, restores, adding slaves, or anything else that deals with the whole dataset, or even DDL on large tables. Getting a clean import of a dump file has become problematic. In order to make the process stable enough to automate, various choices needed to be made to prioritize stability over performance. If we ever had to recover from a disaster using a SQL backup, we'd be down for days.
Horizontally scaling SQL is also pretty painful, and in most cases leads to using it in ways you probably did not intend when you chose to put your data in SQL in the first place. Shards, read slaves, multi-master, et al, they are all really shitty solutions that add complexity to everything you ever do with the DB, and not one of them solves the problem; only mitigates it in some ways. I would strongly suggest looking at moving some of your data out of MySQL (or really any SQL) when you start approaching a dataset of a size where these types of things become an issue.
Update: a few years later, and our dataset has grown to about 800 GiB. In addition, we have a single table which is 200+ GiB and a few others in the 50-100 GiB range. Everything I said before holds. It still performs just fine, but the problems of running full dataset operations have become worse.
I would focus first on your indexes, then have a server admin look at your OS, and if all that doesn't help it might be time for a master/slave configuration.
That's true. Another thing that usually works is to just reduce the quantity of data that's repeatedly worked with. If you have "old data" and "new data" and 99% of your queries work with new data, just move all the old data to another table - and don't look at it ;)
-> Have a look at partitioning.
2GB and about 15M records is a very small database - I've run much bigger ones on a Pentium III(!) and everything still ran pretty fast. If yours is slow, it is a database/application design problem, not a MySQL one.
It's kind of pointless to talk about "database performance", "query performance" is a better term here. And the answer is: it depends on the query, data that it operates on, indexes, hardware, etc. You can get an idea of how many rows are going to be scanned and what indexes are going to be used with EXPLAIN syntax.
2GB does not really count as a "large" database - it's more of a medium size.
I once was called upon to look at a mysql that had "stopped working". I discovered that the DB files were residing on a Network Appliance filer mounted with NFS2 and with a maximum file size of 2GB. And sure enough, the table that had stopped accepting transactions was exactly 2GB on disk. But with regards to the performance curve I'm told that it was working like a champ right up until it didn't work at all! This experience always serves for me as a nice reminder that there're always dimensions above and below the one you naturally suspect.
Also watch out for complex joins. Transaction complexity can be a big factor in addition to transaction volume.
Refactoring heavy queries sometimes offers a big performance boost.
A point to consider is also the purpose of the system and the data in the day to day.
For example, in a system that monitors cars by GPS, querying the positions of a car from previous months is not relevant.
Therefore that data can be moved to separate historical tables for occasional consultation, which reduces the execution time of the day-to-day queries.
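Moving old rows into a history table might look like the sketch below. It is a minimal illustration with made-up names (`positions`, `positions_history`, `archive_before`) and it uses Python's sqlite3 module in place of MySQL so that it runs without a server; the same INSERT ... SELECT followed by DELETE pattern works in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the GPS example: one live table, one history table.
conn.execute("CREATE TABLE positions (car_id INTEGER, recorded_on TEXT, lat REAL, lon REAL)")
conn.execute("CREATE TABLE positions_history (car_id INTEGER, recorded_on TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?, ?)",
    [
        (1, "2023-01-15", 40.0, -3.7),
        (1, "2023-02-20", 40.1, -3.6),
        (2, "2023-06-01", 41.4, 2.2),
        (2, "2023-06-02", 41.5, 2.1),
    ],
)


def archive_before(conn, cutoff):
    """Move rows older than `cutoff` (ISO date string) into the history table."""
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO positions_history SELECT * FROM positions WHERE recorded_on < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM positions WHERE recorded_on < ?", (cutoff,)
        ).rowcount
    return moved


moved = archive_before(conn, "2023-06-01")
live = conn.execute("SELECT COUNT(*) FROM positions").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM positions_history").fetchone()[0]
print(moved, live, archived)
```

Day-to-day queries then touch only the small live table, while the history table stays available for the occasional lookup.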
Performance can degrade with as few as a few thousand rows if the database is not designed properly.
If you have proper indexes, use proper engines (don't use MyISAM where multiple DMLs are expected), use partitioning, allocate correct memory depending on the use and of course have good server configuration, MySQL can handle data even in terabytes!
There are always ways to improve the database performance.
It depends on your query and validation.
For example, I worked with a table of 100,000 drugs that has a generic name column with more than 15 characters for each drug. I wrote a query to compare the generic names of drugs between two tables, and it took several minutes to run. If you compare the same drugs using an indexed id column instead (as said above), it takes only a few seconds.
Database size DOES matter, in terms of both bytes and the number of rows per table. You will notice a huge performance difference between a light database and a blob-filled one. Once my application got stuck because I put binary images inside fields instead of keeping the images in files on disk and putting only the file names in the database. Iterating over a large number of rows, on the other hand, is not free either.
No, it doesn't really matter. MySQL's speed is about 7 million rows per second, so you can scale it quite a bit.
Query performance mainly depends on the number of records it needs to scan. Indexes play a big role in that, and index data size is proportional to the number of rows and the number of indexes.
Queries with conditions on indexed fields that match a full value generally return in about 1 ms, but starts-with, IN, BETWEEN, and of course contains conditions may take longer as there are more records to scan.
You will also face a lot of maintenance issues with DDL: statements such as ALTER and DROP become slow and difficult under heavy live traffic, even just for adding an index or a new column.
Generally it's advisable to split the database into as many clusters as required (500GB would be a general benchmark; as others have said, it depends on many factors and can vary by use case). That gives better isolation and lets you scale specific clusters independently (more suited to B2B).