MySQL was shut down in the middle of an indexing operation.
It still works but some of the queries seem much slower than before.
Is there anything particular we can check?
Is it possible for an index to be left half-built?
Thanks much
As I suggested in my comment, you could try a repair on the relevant table(s).
That said, there's a section of the MySQL manual dedicated to this precise topic, which details how to use the REPAIR <table> statement and indeed dump/re-import.
If this doesn't make any difference, you may need to check the database settings (if it's an InnoDB-engined table/database, it'll love being able to be resident in memory, for example) and perhaps see which specific indexes are being used via an EXPLAIN on the queries that are causing pain.
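As a rough sketch of those steps (the table name and query below are placeholders; REPAIR TABLE only applies to MyISAM/ARCHIVE/CSV tables, whereas an InnoDB table can be rebuilt in place instead):

    CHECK TABLE my_table;                  -- look for corruption left by the interrupted operation
    REPAIR TABLE my_table;                 -- MyISAM-family tables only
    ALTER TABLE my_table ENGINE=InnoDB;    -- rebuilds an InnoDB table and its indexes
    EXPLAIN SELECT * FROM my_table WHERE some_column = 'value';  -- see which index the slow query uses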
There are also commercial tools such as New Relic that'll show what specific queries are being sluggish in quite a lot of detail as well as monitoring other aspects of your system, which may be worth exploring if this is a commercial project/web site.
We have some large indexes that we suspect are not being used in our Rails site, and would like to drop them to save space and computation. However, doing so could be catastrophic if it turns out they were being used. How can we confirm they are not being used?
One option is to log all queries for a time and run 'explain plan' on any of them that use the table in question. But I've heard 'explain plan' can occasionally be inaccurate. We would also have to collect queries for a few hours to be sure, which is quite a lot of log to store and process.
If there was a way to temporarily disable an index, we'd be willing to do that, as long as we could quickly enable it if problems arose. But I don't see a way to do that universally; you can only specify an 'ignore index' hint on individual SQL statements.
Short Answer:
With MySQL 5.6 it is possible to do this by using the PERFORMANCE_SCHEMA and ps_helper.
ps_helper is a series of views and routines that present the data in PERFORMANCE_SCHEMA in more useful ways. The VIEW you are after is this one: http://www.markleith.co.uk/ps_helper/#schema_unused_indexes
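A minimal sketch of querying it, assuming ps_helper has been installed into a schema named ps_helper (later MySQL versions ship an equivalent view in the sys schema, which filters by an object_schema column in the same way):

    SELECT * FROM ps_helper.schema_unused_indexes
    WHERE object_schema = 'your_database';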
More detailed:
The idea of disabling an index is called 'invisible indexes' in Oracle. MySQL doesn't support them, but I'd love to see this feature as well - I filed http://bugs.mysql.com/bug.php?id=70299 a couple of months ago on it.
Removing unused indexes is very important, as it can help optimizer performance. I have a war-story of using the ps_helper + unused_indexes view here: http://www.tocker.ca/2013/09/05/migrating-from-postgresql-to-mysql.html
There's only one procedure for this: Testing, testing, testing and benchmarking.
The primary function of indexes, apart from ensuring uniqueness, is to expedite data access. If all operations were O(1), there would be no need for indexes in the first place.
You need another instance of your application where you can experiment with adding, removing, and adjusting your indexes. It's impossible to replicate both real-world hardware and real-world loads exactly, but you can come pretty close if you pay careful attention to how your hardware is configured and how your application is exercised, and can produce a load that's reasonably similar.
If you have application logs that are sufficiently detailed, you can sometimes replay these operations. Read operations are easier to replay than writes, but both can be simulated if you've got enough time to invest in this.
For any application running at scale, you want to know where performance falls off a cliff. So long as your production load is well below this level you'll be okay. If you don't know where the cliff is, you might hit it without any warning.
Remember that indexes not only take up space, which is a minor issue, but the size of the index has an effect on how expensive it is to update, making writes more costly. It's ideal to have only the ones you need, but it's almost impossible to identify which are actually used. There are many that might be used in theory, but never are, and some that shouldn't be used, but which are because the query optimizer is a little dumb sometimes.
I'm still struggling with the performance of my MySQL database using the InnoDB engine, especially the performance of data insertion, and to a lesser extent the performance of running queries.
I've been googling for information, how-tos and so on, but most of what I found was rather advanced material. Can I find somewhere on the net some basic information for "newbies", a starting point for performance optimization? The first, most important steps for InnoDB optimization, explained in a less complicated way.
I'm using the Windows platform
I used to manage a couple very large MySQL Databases (like, 1TB+). They were huge, unforgiving beasts with an endless appetite to cause me stomach problems.
I read everything I could find on MySQL Performance Tuning and innodb. Here's a summary of what helped me:
The book High Performance MySQL is good, but only gets you so far.
The blog MySQL Performance Blog (this link is to their posts tagged 'innodb') was the most useful overall resource I found on the net. They go into detail on a lot of innodb tuning issues. It gets 'ranty' at times, but overall it's great. Here's another link there on InnoDB Performance Optimization Basics that's good.
The last main thing I did to learn it was to simply read the MySQL Docs themselves. I read how every last parameter works, changed them on my server and then did some basic profiling. After a while you figure out what works by running big queries and seeing what happens. Here's a good place to start:
InnoDB Performance Tuning and Troubleshooting
In the end, it's just experimenting and working through things until you gain enough knowledge to know what works.
For newbies: innodb_flush_log_at_trx_commit=0, if you can afford to lose up to 1 second of your work if the server crashes. This is the performance vs. reliability tradeoff, but it will improve your write performance hugely. If you can afford a battery-backed write cache, use it.
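For illustration only, a hedged sketch of setting it (the variable is dynamic, so it can also be changed at runtime without a restart):

    -- in my.cnf / my.ini under [mysqld]:
    --   innodb_flush_log_at_trx_commit = 0
    SET GLOBAL innodb_flush_log_at_trx_commit = 0;  -- trades up to ~1 second of durability for faster writes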
Specifically on Windows, and for write performance, MariaDB 5.3 might be a better idea than stock MySQL from Oracle, since MariaDB is better able to utilize asynchronous IO on Windows. I wrote a note about it some time ago here; on a standard synthetic benchmark it performs up to 500% better than stock MySQL 5.5 (see the pictures at the end of the note).
However, the first and foremost thing that kills performance is disk flushing. This is solvable if you relax durability with the *innodb_flush_log_at_trx_commit* parameter, or with a battery-backed write cache. You might also consider using larger transactions, as they reduce the number of disk flushes.
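To illustrate the larger-transactions point, wrapping a batch of inserts in a single transaction means one log flush at COMMIT instead of one per statement (table and column names here are placeholders):

    START TRANSACTION;
    INSERT INTO my_table (col) VALUES (1);
    INSERT INTO my_table (col) VALUES (2);
    -- ... many more rows ...
    COMMIT;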
Try the MySQL Primer script: http://day32.com/MySQL/
I didn't use the 'net, I used books. :)
The book I used to learn MySQL is "Beginning MySQL" from Wrox Press, by Robert Sheldon and Geoff Moes. Chapter 15 goes into some basics of optimization. I liked this book a lot, think it would be good reading, and it has been my #1 reference. But it isn't very storage-engine specific.
I have another book, Pro MySQL from Apress, that goes into a lot more detail about particular storage engines, but it is also much harder to read. Still a good reference though.
I'd like to know whether having 200+ MySQL databases on the same server is any kind of issue. None of them will probably be heavily used; I'm just wondering if there is any issue with having so many databases.
Thanks in advance
Not necessarily. Shared-hosting services will usually have many hundreds of databases on each server, all relatively small. Just be sure you're not confusing "Databases" with "Tables," as is a problem for those new to that area of development.
No issues; they just take up some disk space. If you don't need them, you can delete them, or take a backup of them and then delete them.
Multiple tables is the problem, not databases.
Having too many tables will result in very poor performance, as they'll need to be closed and reopened. With some engines (MyISAM) this also blows away some of the cache, which makes for very poor performance.
Whether you put them in multiple databases or a single one makes no difference from a performance point of view.
It does, however, make permissions management much easier.
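For example, with one database per application you can scope privileges cleanly (user, password, and database names below are placeholders):

    CREATE USER 'app_one_user'@'localhost' IDENTIFIED BY 'changeme';
    GRANT SELECT, INSERT, UPDATE, DELETE ON app_one.* TO 'app_one_user'@'localhost';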
Creating objects like tables and indexes is fairly essential, even if the code has to be authorized or created by the DBA. What other areas normally carried out by DBAs should the accomplished developer be aware of?
A developer is responsible for doing everything that makes his code a) correct and b) fast.
This of course includes creating indexes and tables.
Making a DBA responsible for indexes is a bad idea. What if the code runs slowly? Who is to be blamed: a developer with bad code or a DBA with a bad index?
A DBA should handle supporting database operations like making backups and building the infrastructure, and report any lack of resources.
He or she should not be the sole person making the decisions that affect the performance of the whole database system.
Relational databases, as of now, are not yet at the point that would allow splitting responsibility so that developers make the queries right and the DBA makes them fast. That's a myth.
If there is a lack of resources (say, an index makes some query fast at the expense of some DML operation being slow), this should be reported by the DBA, not fixed.
Now it is decision-making time. What do we need more, a fast query or a fast insert?
This decision should be made by the program manager (and not the DBA or developer).
And when the decision is made, the developer should be given the new task: "make the SELECT query as fast as possible, taking into account that you don't have this index", or "make the INSERT query as fast as possible, taking into account that you will have this index".
A developer should know everything about how a database works, when it works normally.
A DBA should know everything about how to make a database to work normally.
The latter includes the ability to make a backup, to restore from a backup, and to detect and report resource contention.
The ins and outs of database storage and optimization are huge. Knowing how to index and partition tables well is invaluable knowledge.
Also, how to read a query execution plan. SQL is such a cool language in that it will tell you exactly how it's going to run your code, so long as you ask nicely. This is absolutely essential in optimizing your code and finding bottlenecks.
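A minimal illustration of "asking nicely", in MySQL terms (the table and column names are hypothetical):

    EXPLAIN SELECT o.id, o.total
    FROM orders o
    WHERE o.customer_id = 42;
    -- the output shows the chosen index, the estimated rows examined, and the join type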
Database maintenance (backups, shrinking files, etc) is always important to keep your server running smoothly. It's something that's often overlooked, too.
Developers should know all about triggers and stored procedures--getting the database to work for you. These things can help automate so many tasks, and often developers overlook them and try to handle it all app side, when they should really be handled by something that thinks in sets.
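As a small, hedged example of letting the database do the work (all table and column names here are made up), a trigger can maintain an audit trail without any application-side code:

    CREATE TRIGGER orders_after_update
    AFTER UPDATE ON orders
    FOR EACH ROW
      INSERT INTO orders_audit_log (order_id, changed_at)
      VALUES (NEW.id, NOW());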
Which brings me to the most important point: database developers need to think in sets. Too often I hear, "For each row, I want to...", and this sets off an alarm in my head. You should be thinking about how the whole set interacts and the actions you want to take on entire columns.
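To make that concrete (names are illustrative only), prefer a single set-based statement over a per-row loop in application code:

    UPDATE orders
    SET status = 'archived'
    WHERE created_at < '2013-01-01';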
Optimization. Your code should always use as few resources as you can manage.
I would recommend developing an understanding of the security architecture for the relevant DBMS.
Doing so could facilitate your development of secure code.
With SQL Server specifically in mind for example:
Understand why your “managed code” (such as .NET CLR) should not be granted elevated privileges. What would be the implications of doing so?
What is Cross-Database ownership chaining? How does it work?
Understand execution context.
How does native SQL Server encryption work?
How can you sign a stored procedure? Why would you even want to do this?
Etc.
As a general rule, the more you understand about the engine you are working with, the more performance you can squeeze from it.
One thing that currently springs to mind is how to navigate and understand the information that database "system" tables/views give you, e.g. in SQL Server, the views under the master database. These views hold information such as current logins, lists of tables and partitions, etc., which is all useful in trying to track down things such as hung logins or whether users are currently connected.
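For instance, a hedged sketch for SQL Server that lists who is currently connected, using a standard dynamic management view:

    SELECT session_id, login_name, status, host_name
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1;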
The relationships between your tables. You should always have a recent printout and soft copy of your database schema. You need to know the primary keys, foreign keys, and required and auto-filled columns; without that, I don't think you can write efficient queries or make sure your database is carrying only what it needs.
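One quick way to refresh that picture in MySQL is to pull the foreign-key relationships out of information_schema (replace the schema name with your own):

    SELECT table_name, column_name, referenced_table_name, referenced_column_name
    FROM information_schema.key_column_usage
    WHERE table_schema = 'your_database'
      AND referenced_table_name IS NOT NULL;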
I think everyone else covered it.
Having a good understanding of the architecture of your database system will definitely be helpful. Can you draw a diagram by heart to show components of your DBMS and their interactions?
Why don't databases automatically index tables based on query frequency? Do any tools exist to analyze a database and the queries it is receiving, and automatically create, or at least suggest which indexes to create?
I'm specifically interested in MySQL, but I'd be curious for other databases as well.
That is one of the best questions I have seen on Stack Overflow. Unfortunately I don't have an answer. Google's BigTable does automatically index the right columns, but BigTable doesn't allow arbitrary joins, so the problem space is much smaller.
The only answer I can give is this:
One day someone asked, "Why can't the computer just analyze my code and compile & statically type the pieces of code that run most often?"
People are solving this problem today (e.g. Tamarin in FF3.1), and I think "auto-indexing" relational databases is the same class of problem, but it isn't as much a priority. A decade from now, manually adding indexes to a database will be considered a waste of time. For now, we are stuck with monitoring slow queries and running optimizers.
There are database optimizers that can be enabled or attached to databases to suggest (and in some cases perform) indexes that might help things out.
However, it's not actually a trivial problem, and when these aids first came out users sometimes found it actually slowed their databases down due to inferior optimizations.
Lastly, there's a LOT of money in the industry for database architects, and they prefer the status quo.
Still, databases are becoming more intelligent. If you use SQL Server Profiler with Microsoft SQL Server you'll find ways to speed your server up. Other databases have similar profilers, and there are third-party utilities to do this work.
But if you're the one writing the queries, hopefully you know enough about what you're doing to index the right fields. If not then having the right indexes is likely the least of your problems...
-Adam
MS SQL 2005 also maintains an internal reference of suggested indexes to create based on usage data. It's not as complete or accurate as the Tuning Advisor, but it is automatic. Research dm_db_missing_index_groups for more information.
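A hedged sketch of querying those DMVs (SQL Server 2005+; the column selection below is just one common pattern):

    SELECT d.statement AS table_name,
           d.equality_columns, d.inequality_columns, d.included_columns,
           s.user_seeks, s.avg_user_impact
    FROM sys.dm_db_missing_index_groups AS g
    JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle
    JOIN sys.dm_db_missing_index_details AS d ON d.index_handle = g.index_handle
    ORDER BY s.user_seeks * s.avg_user_impact DESC;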
There is a script, on I think an MS SQL blog, for suggesting indexes in SQL 2005, but I can't find the exact one right now! It's just the thing, judging from the description as I recall. Here's a link to some more info: http://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
PS: just for SQL Server 2005+
There are tools out there for this.
For MS SQL, use the SQL Profiler (to record activity against the database), and the Database Engine Tuning Advisor (SQL 2005) or the Index Tuning Wizard (SQL 2000) to analyze the activities and recommend indexes or other improvements.
Yes, some engines DO support automatic indexing. One such example for MySQL is Infobright; their engine does not support "conventional" indexes and instead implicitly indexes everything - it is a column-based storage engine.
The behaviour of such engines tends to be very different from what developers expect (and yes, you need to be a DEVELOPER to even be thinking about using Infobright; it is not a plug-in replacement for a standard engine).
I agree with what Adam Davis says in his comment. I'll add that if such a mechanism existed to create indexes automatically, the most common reaction to this feature would be, "That's nice... How do I turn it off?"
Part of the reason may be that indexes don't just give a small speedup. If you don't have a suitable index on a large table, queries can run so slowly that the application is entirely unusable, and if it is interacting with other software it may simply not work. So you really need the indexes to be right before you start trying to use the application.
Also, rather than building an index in the background, and slowing things down further while it's being built, it is better to have the index defined before you start adding significant amounts of data.
I'm sure we'll get more tools that take sample queries and work out what indexes are necessary; we will probably also eventually get databases that do as you suggest and monitor performance and add the indexes they think are necessary, but I don't think they will be a replacement for starting off with the right indexes.
It seems that MySQL doesn't have a user-friendly profiler. Maybe you want to try something like this, a PHP class based on the MySQL profiler.
Amazon's SimpleDB has automatic indexing on all columns based on your usage:
http://aws.amazon.com/simpledb/
It has other limitations though:
It's a key-value store, not an RDB. Obviously that means slow joins (and no built-in join support).
It has a 10gb limit on table size. There are libraries that will handle partitioning big data for you although this locks you into that library's way of doing things, which can have its own problems.
It stores all values as strings, even numbers, which makes sorting a column containing 1, 9, and 10 come out as 1, 10, 9 unless you use a library that hacks around this by zero-padding. This also impacts negative numbers.
The 10gb limit is bigger than many might assume, so you could proceed with this for a simple site that you plan on rewriting if it ever hits big.
It's unfortunate this kind of automatic indexing didn't make it into DynamoDB, which appears to have replaced it - they don't even mention SimpleDB in their product list anymore; you have to find it through old links.
Google App Engine does that (see the index.yaml file).