I'm trying to use the MySQL performance reports to find what is bottlenecking my read and write performance in specific situations, but the reports are cluttered with loads of old statistics about other queries against my tables.
I want to clear all the performance data so I can get a fresh look.
The Clear Event Tables button doesn't actually seem to clear anything.
How do I do this?
Using MySQL Workbench
Go to Performance schema setup
Click Clear Event Tables
Refresh the Reports page. All events will be cleared.
(This does not answer the question as asked. Instead, it addresses the broader question of "how do I improve query performance".)
Here's a simple-minded way to get useful metrics, even for fast queries:
FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';
If the Handler numbers roughly match the table size, that is a good indication of a table scan; if they match the resultset size, it indicates some action (e.g. a sort) on the resultset.
A table scan probably means that no index was being used. Figuring out why is something that metrics probably can't tell you. But, sometimes there is a way to find that out:
EXPLAIN FORMAT=JSON SELECT ...
This will give you (in newer versions) the "cost" of various options. This may help you understand why one index is used over another.
optimizer_trace is another tool.
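As a rough sketch of how to use it (exact output varies by version; the SELECT in the middle is whatever statement you are investigating):
SET optimizer_trace = 'enabled=on';
SELECT ...;   -- the query you are investigating
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';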
But nothing will give you any clue that INDEX(a,b), which you don't have, would be better than INDEX(a), which you do have. And that is one of the main points in my index cookbook.
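As a minimal hypothetical illustration (table and column names are made up), this is the sort of change the counters alone won't suggest:
-- only INDEX(a) exists, so the b condition is checked row by row
SELECT * FROM t WHERE a = 1 AND b = 2;
-- a composite index lets MySQL resolve both conditions from the index
ALTER TABLE t ADD INDEX a_b (a, b);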
Here's another example of what is hard to deduce from numbers. There was a production server with MySQL chewing up 100% of the CPU. The slowlog pointed to a simple SELECT that was being performed a lot. It had
WHERE DATE(indexed_column) = '2011-11-11'
Changing that dropped the CPU to a mere 2%:
WHERE indexed_column >= '2011-11-11'
AND indexed_column < '2011-11-11' + INTERVAL 1 DAY
The table was fully cached in RAM. (Hence, high CPU, low I/O.) The query had to do a full table scan, applying the DATE function to that indexed column for each row. After changing the code, the index did what it is supposed to do.
Related
I realize this is a sort of meta-programming question, but I'm assuming there are enough experienced people here to give a decent answer.
I was just building a query again, to retrieve some data from a table.
SELECT pl.field1, pl.field2
FROM table pl
LEFT JOIN table2 dp on pl.field1 = dp.field1
WHERE dp.field1 IS NULL
Executing this query took ages (1800+ seconds).
After I got sick of waiting, and made the effort to EXPLAIN the query, it turned out that a full table scan was done.
I created an index on dp.field1 and the query was almost instant thereafter; creating that index took less than a second.
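(For reference, the index was roughly this; the index name is a guess:)
CREATE INDEX idx_field1 ON table2 (field1);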
Judging from the EXPLAIN, this wasn't too difficult to determine. Why can't, or won't, MySQL do this automatically? Spending just a second to create that index will make the query instant, so MySQL could theoretically create a temporary index, use it to do the query and then remove it again, which would still be orders of magnitude faster than the alternative.
I'm expecting the usual answers of 'to make sure you design a good schema' or 'mysql just does what you tell it to do', but I'm wondering if there might be a technical reason why this is a bad idea.
For columns with low cardinality it is not a good idea to use a B-tree index. With very few distinct values, a B-tree index degenerates and can in fact increase query time compared with a full table scan.
So automatically creating a B-tree index is not always a good idea. At the very least the server would have to consider cardinality too, and probably several other things as well.
Quite simply - because the idea doesn't really scale using the current design of RDBMS engines.
It's okay for a single user, but databases are designed to support many concurrent users. Having each user's query also run a speculative optimization step ("can I speed up this query by creating an index?"), and then creating that index, which in some circumstances is a very expensive operation, would become slow at any degree of scale. Making the index "single use" would waste both computation time and disk space, while keeping lots of permanent indices would slow down the query optimizer, which has to evaluate many candidate indices for each query, and would also slow down data modification operations.
Admittedly, on modern hardware these concerns are a lot less significant: the basic design of RDBMS engines dates back to the days when disk space was expensive, CPUs were several orders of magnitude slower, and memory was an unimaginable luxury.
I'm only speaking for MySQL because there may be a database system out there that automatically modifies your database design.
The simple answer is, MySQL simply does what you tell it to do.
MySQL cannot predict the future. Only you can. You know much more about your data than MySQL does. MySQL keeps some statistics, but it's guessing the best way to execute your query on very sparse information (that is sometimes outdated) before it actually tries to do so. Once it starts executing, it doesn't change its plan, no matter how wrong the guess was.
The methods that it uses to guess are all very well documented. It's our job to provide the indexes that will provide the most benefit, and even, at times, hint that it should use those indexes.
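For example, an explicit hint looks something like this (table, column and index names here are made up; the syntax is MySQL's FORCE INDEX):
SELECT id, name
FROM customers FORCE INDEX (idx_country)
WHERE country = 'NL';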
If you tell MySQL to perform a query that requires a table scan, it assumes you know that it's going to do a table scan, because it told you in its documentation that it would. It simply obeys.
Database systems that don't allow the DBA to make decisions don't scale well. There are always tradeoffs to be made, and you're the one to make them. MySQL is a hammer, not a carpenter.
I'm looking for some advice regarding a multilingual MySQL Database Structure which can handle huge amounts of data.
We are using the following method at the moment:
Articles     <-   Article_translations   ->   Languages
id                id                          id
date              language_id (fk)            locale
category          article_id (fk)
                  content
OK, let's just say we've got something like 100,000 articles and 5 languages... well, you see the problem. The larger the data, the slower the database (just a guess here, but the complex JOIN queries which are absolutely necessary probably won't be O(log(n)) but rather something like O(n^2)).
Our current solution is to split Article_translations into per-locale tables named [locale]_article_translations (e.g. en_us_article_translation), in which case we would need an easy way to keep the structure of those tables in sync. Is this an appropriate way to solve the problem, or are there better ones? If it is a good solution, is there something out there which could help monitor changes (structural only, no data sync!) and synchronize those structures?
You are half right: larger data means a slower database, but if the DB doesn't have a good design it will be slow even with small data.
I can't tell you the single best way or best solution; remember that you need to try multiple things to find the "best solution". I can recommend some tools and some tips that could help you.
First, check your indexes and index types, not only the PK and FK. You also need to consider which type of index you need, e.g. do you need a FULLTEXT index, or a B-tree/hash index? (A small sketch follows below.)
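For instance, a FULLTEXT index on the article body might look like this (names follow the schema above; FULLTEXT on InnoDB requires MySQL 5.6+, otherwise MyISAM):
CREATE FULLTEXT INDEX ft_content ON article_translations (content);
SELECT article_id
FROM article_translations
WHERE MATCH(content) AGAINST('database performance');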
Also check your storage engine: MyISAM or InnoDB? You said that you split the table; check this post about splitting.
Your queries will also be faster if you avoid things like '%word%' in LIKE clauses; remember that a bad query can make a huge difference in response time.
You can use SHOW CREATE TABLE, DESCRIBE SELECT ... or EXPLAIN to see what's going on, or use BENCHMARK() to get an approximate timing for a function you are trying to improve (examples below).
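A quick sketch of those commands (the table name follows the schema above; note that BENCHMARK() times an expression, not a full query):
SHOW CREATE TABLE article_translations;
EXPLAIN SELECT ...;
SELECT BENCHMARK(1000000, MD5('some string'));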
Some tools for MySQL: I'd recommend taking a look at these programs, which will help you with the performance side.
Mysqlslap (like BENCHMARK, but you can customize the results much more).
SysBench (tests CPU performance, I/O performance, mutex contention, memory speed, database performance).
Mysqltuner (analyzes general statistics, storage engine statistics and performance metrics).
mk-query-profiler (performs analysis of a SQL statement).
mysqldumpslow (good for finding which queries are causing problems).
Assuming you have already tuned your query properly:
Check the query execution plan against a large set of data.
Make sure any DB-level parameters are sized for a "large set" of data rather than tuned at row level.
See whether your tables are denormalized (or normalized) enough.
I would suggest the options below, although I am not sure which version of MySQL you are using:
Partitioning at the DB level
A fast hard disk in the DB server
I would suggest using partitioning first, and then you might consider upgrading the hard disk.
Partitioning
Partitioning is data splitting provided at the database level.
Based on your query usage you can divide the data, for example by language in your case (see the sketch after this list).
The good things about DB-level partitioning are that:
it can be treated as a single table from the application side;
depending on data volume and frequency, it can be rearranged at the DB level with no impact on the apps.
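A rough sketch of what list partitioning by language could look like (hypothetical partition names; note that MySQL requires the partitioning column to be part of every unique key, so the primary key may need to include language_id):
ALTER TABLE article_translations
PARTITION BY LIST (language_id) (
  PARTITION p_en VALUES IN (1),
  PARTITION p_de VALUES IN (2),
  PARTITION p_rest VALUES IN (3, 4, 5)
);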
Hard disk quality
Hard disk quality is also important for handling large sets of data.
Even if the query is tuned as well as possible, if you deal with lots of data in a single query you need fast data access. But this is costly.
Possible Duplicate:
Can select * usage ever be justified?
Curious to hear this from folks with more DBA insight: what performance implications does an application face when you see a query like:
select * from some_large_table;
You have to do a full table scan since no index is being hit, and I believe if we're talking O notation, we're speaking O(N) here where N is the size of the table. Is this typically considered not optimal behavior? What if you really do need everything from the table at certain times? Yes we have tools such as pagination etc, but I'm talking strictly from a database perspective here. Is this type of behavior normally frowned upon?
What happens if you don't specify columns is that the DB engine has to query the table metadata for the column list. This lookup is really fast, but it causes a minor performance hit. As long as you're not doing a sloppy SELECT * with a JOIN statement or nested queries, you should be fine. Just be aware of the small cost of letting the DB engine run a query to find the columns.
MySQL opens a server-side cursor to read the table. The client may fetch none or all of the records, and performance on the client side depends only on the number of records it actually fetched. The server-side execution can actually be faster than a query with conditions, since those also involve some index reading. Only if the client fetches all the records is it equivalent to a full table scan.
Selecting more columns than you need (select *) is always bad. Don't do more than you have to.
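For example (column names are hypothetical; the point is to name only what the application uses):
SELECT id, title, published_at
FROM some_large_table;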
If you're selecting from the whole table, it doesn't matter if you have an index.
Some other issues you're going to run into are around how you want to lock the table. If this is a busy application you might not want to prevent locking entirely, because of the inconsistent data that might be returned; but if you lock too tightly it could slow the query further. O(n) is considered acceptable in computer science terms, but in databases we measure in time and number of reads/writes. This is a huge number of reads and will probably take a long time to execute, so in practice it's unacceptable.
I have a large MySQL MyISAM table with 1.5 million rows, 4.5 GB in size and still growing every day.
I have done all the necessary indexing and performance has been greatly optimized. Yet the database occasionally breaks down (showing a 500 Internal Server error), usually due to query overload. Whenever there is a breakdown, the table starts to work very slowly and I have to do a silly but effective task: copy the entire table over to a new table and replace the old one with the new one!!
You may ask why such a stupid action. Why not repair or optimize the table? I've tried that, but the time to repair or optimize may be more than the time to simply duplicate the table, and more importantly the new table performs much faster.
A newly built table usually works very well. But over time it becomes sluggish (maybe after a month) and eventually leads to another breakdown (500 Internal Server error). That's when everything slows down significantly and I need to repeat the silly process of replacing the table.
For your info:
- The data in the table seldom get deleted. So there isn't a lot of overhead in the table.
- Under optimal condition, each query takes 1-3 secs. But when it becomes sluggish, the same query can take more than 30 seconds.
- The table has 24 fields, 7 are int, 3 are text, 5 are varchar and the rest are smallint. It's used to hold articles.
If you can explain what causes the sluggishness, or you have suggestions on how to improve the situation, feel free to share. I will be very thankful.
Consider moving to InnoDB. One of its advantages is that it's crash safe. If you need full text capabilities, you can achieve that by implementing external tools like Sphinx or Lucene.
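The conversion itself is a single statement (assuming the table is called articles; it rebuilds the whole table, so expect it to take a while on 4.5 GB):
ALTER TABLE articles ENGINE=InnoDB;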
Partitioning is a common strategy here. You might be able to partition the articles by what month they were committed to the database (for example) and then have your query account for returning results from the month of interest (how you partition the table would be up to you and your application's design/behavior). You can union results if you will need your results to come from more than one table.
Even better, depending on your MySQL version, partitioning may be supported by your server. See this for details.
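As a hypothetical sketch of range partitioning by month (column and partition names are made up; MySQL requires the partitioning column to be part of every unique key):
ALTER TABLE articles
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p2012_01 VALUES LESS THAN (TO_DAYS('2012-02-01')),
  PARTITION p2012_02 VALUES LESS THAN (TO_DAYS('2012-03-01')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);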
I once had a MySQL database table containing 25 million records, which made even a simple COUNT(*) query take minutes to execute. I ended up making partitions, separating the data into a couple of tables. What I'm asking is: is there any pattern or design technique for handling this kind of problem (huge numbers of records)? Are MSSQL or Oracle better at handling lots of records?
P.S.
The COUNT(*) problem stated above is just an example case. In reality the app does CRUD functionality and some aggregate queries (for reporting), but nothing really complicated. It's just that it takes quite a while (minutes) to execute some of these queries because of the table volume.
See Why MySQL could be slow with large tables and COUNT(*) vs COUNT(col)
Make sure you have an index on the column you're counting. If your server has plenty of RAM, consider increasing MySQL's buffer size. Make sure your disks are configured correctly -- DMA enabled, not sharing a drive or cable with the swap partition, etc.
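For instance (hypothetical table and column names), a count restricted to a range of an indexed column can often be answered from the index alone:
CREATE INDEX idx_created_at ON big_table (created_at);
SELECT COUNT(*) FROM big_table WHERE created_at >= '2012-01-01';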
What you're asking with "SELECT COUNT(*)" is not easy.
In MySQL, the MyISAM non-transactional engine optimises this by keeping a record count, so SELECT COUNT(*) will be very quick.
However, if you're using a transactional engine, SELECT COUNT(*) is basically saying:
Exactly how many records exist in this table in my transaction?
To do this, the engine needs to scan the entire table; it probably knows roughly how many records exist in the table already, but to get an exact answer for a particular transaction, it needs a scan. This isn't going to be fast using MySQL innodb, it's not going to be fast in Oracle, or anything else. The whole table MUST be read (excluding things stored separately by the engine, such as BLOBs)
Having the whole table in ram will make it a bit faster, but it's still not going to be fast.
If your application relies on frequent, accurate counts, you may want to make a summary table which is updated by a trigger or some other means.
If your application relies on frequent, less accurate counts, you could maintain summary data with a scheduled task (which may impact performance of other operations less).
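A minimal sketch of the trigger-maintained summary approach (all names are hypothetical):
CREATE TABLE row_counts (
  table_name VARCHAR(64) PRIMARY KEY,
  row_count BIGINT NOT NULL
);
CREATE TRIGGER big_table_ai AFTER INSERT ON big_table
FOR EACH ROW UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'big_table';
CREATE TRIGGER big_table_ad AFTER DELETE ON big_table
FOR EACH ROW UPDATE row_counts SET row_count = row_count - 1 WHERE table_name = 'big_table';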
Many performance issues around large tables relate to indexing problems, or a lack of indexing altogether. I'd definitely make sure you are familiar with indexing techniques and the specifics of the database you plan to use.
With regard to your slow count(*) on the huge table, I would assume you were using the InnoDB table type in MySQL. I have some tables with over 100 million records using MyISAM under MySQL and count(*) is very quick.
With regards to MySQL in particular, there are even slight indexing differences between InnoDB and MyISAM tables which are the two most commonly used table types. It's worth understanding the pros and cons of each and how to use them.
What kind of access to the data do you need? I've used HBase (based on Google's BigTable) loaded with a vast amount of data (~30 million rows) as the backend for an application which could return results within a matter of seconds. However, it's not really appropriate if you need "real time" access - i.e. to power a website. Its column-oriented nature is also a fairly radical change if you're used to row-oriented DBMS.
Is count(*) on the whole table actually something you do a lot?
InnoDB will have to do a full table scan to count the rows, which is obviously a major performance issue if counting all of them is something you actually want to do. But that doesn't mean that other operations on the table will be slow.
With the right indexes, MySQL will be very fast at retrieving data from tables much bigger than that. The problem with indexes is that they can hurt insert speeds, particularly for large tables as insert performance drops dramatically once the space required for the index reaches a certain threshold - presumably the size it will keep in memory. But if you only need modest insert speeds, MySQL should do everything you need.
Any other database will have similar tradeoffs between retrieve speed and insert speed; they may or may not be better for your application. But I would look first at getting the indexes right, and maybe rewriting your queries, before you try other databases. For what it's worth, we picked MySQL originally because we found it performed best.
Note that MyISAM tables in MySQL store the total size of the table. They maintain this because it's useful to the optimiser in some cases, but a side effect is that count(*) on the whole table is really fast. That doesn't necessarily mean they're faster than InnoDB at anything else.
I answered a similar question in This Stackoverflow Posting in some detail, describing the merits of the architectures of both systems. To some extent it was done from a data warehousing point of view but many of the differences also matter on transactional systems.
However, 25 million rows is not a VLDB and if you are having performance problems you should look to indexing and tuning. You don't need to go to Oracle to support a 25 million row database - you've got about 3 orders of magnitude to go before you're truly in VLDB territory.
You are asking for a book's worth of answers, and I therefore propose you get a good book on databases. There are many.
To get you started, here are some database basics:
First, you need a great data model based not just on what data you need to store but on usage patterns. Good database performance starts with good schema design.
Second, place indices on columns based upon expected lookup AND update needs, as update performance is often overlooked.
Third, don't put functions in WHERE clauses if at all possible (see the sketch after this list).
Fourth, use an -ahem- RDBMS engine that is of quality design. I would respectfully submit that while it has improved greatly in the recent past, mysql does not qualify. (Apologies to those who wish to argue it has finally made the grade in recent times.) There is no longer any need to choose between high-price and quality; Postgres (aka PostgreSql) is available open-source and is truly fantastic - and has all the plug-ins available to meet your needs.
Finally, learn what you are asking a database engine to do - gain some insight into internals - so you can better judge what kinds of things are expensive and why.
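To illustrate the point about functions in WHERE clauses (table and column names are hypothetical), the first form below cannot use an index on order_date while the second can:
SELECT * FROM orders WHERE YEAR(order_date) = 2012;
SELECT * FROM orders WHERE order_date >= '2012-01-01' AND order_date < '2013-01-01';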
I'm going to second #Mark Baker, and say that you need to build indices on your tables.
For queries other than the one you selected, you should also be aware that using constructs such as IN() is faster than a series of OR statements in the query. There are lots of little steps you can take to speed up individual queries.
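A small hypothetical example of the IN() vs. OR point (names are made up):
SELECT id, name FROM products WHERE id IN (3, 7, 19);
-- equivalent, but usually optimised less well
SELECT id, name FROM products WHERE id = 3 OR id = 7 OR id = 19;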
Indexing is key to performance with this number of records, but how you write the queries can make a big difference as well. Specific performance-tuning methods vary by database, but in general: avoid returning more records or fields than you actually need, make sure all join fields are indexed (as well as common WHERE clause fields), and avoid cursors (although I think this is less true in Oracle than SQL Server; I don't know about MySQL).
Hardware can also be a bottleneck especially if you are running things besides the database server on the same machine.
Performance tuning is a very technical subject and can't really be answered well in a format like this. I suggest you get a performance tuning book and read it. Here is a link to one for mySQL
http://www.amazon.com/High-Performance-MySQL-Optimization-Replication/dp/0596101716