The question says it all. Which one is faster? And when should we use a view instead of a subquery, and vice versa, when it comes to speed optimisation?
I do not have a specific situation in mind, but I was thinking about this while trying some things with views in MySQL.
A smart optimizer will come up with the same execution plan either way. But if there were to be a difference, it would be because the optimizer was for some reason not able to correctly predict how the view would behave, meaning a subquery might, in some circumstances, have an edge.
But that's beside the point; this is a correctness issue. Views and subqueries serve different purposes. You use views to provide code re-use or security. Reaching for a subquery when you should use a view, without understanding the security and maintenance implications, is folly. Correctness trumps performance.
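One way to check the "same plan either way" claim on your own schema is to run the query both ways and compare rows and plans. The sketch below uses Python's built-in sqlite3 module as a stand-in, because it runs without a server; SQLite's planner is not MySQL's, and the orders table, the big_orders view and the index name are made up for illustration. On MySQL you would compare the EXPLAIN output of the two statements instead.

```python
import sqlite3

# SQLite stands in for MySQL here; all table, view and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer);
    INSERT INTO orders (customer, total) VALUES
        ('ann', 10.0), ('bob', 25.0), ('ann', 40.0), ('cid', 5.0);
    CREATE VIEW big_orders AS
        SELECT id, customer, total FROM orders WHERE total > 8;
""")

via_view = "SELECT id, total FROM big_orders WHERE customer = 'ann' ORDER BY id"
via_subquery = """
    SELECT id, total
    FROM (SELECT id, customer, total FROM orders WHERE total > 8) AS big
    WHERE customer = 'ann' ORDER BY id
"""

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

view_rows = conn.execute(via_view).fetchall()
subquery_rows = conn.execute(via_subquery).fetchall()

print("view plan:    ", plan(via_view))
print("subquery plan:", plan(via_subquery))
print("same rows:", view_rows == subquery_rows)
```

If the two plans differ on your real data, that is the case the answer describes, where the optimizer did not see through the view.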
Neither is particularly efficient in MySQL. In any case, MySQL does no caching of data in views, so the view simply adds another step in query execution. This makes views slower than subqueries. Check out this blog post for some extra info: http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/
One possible alternative (if you can deal with slightly outdated data) is materialized views. Check out Flexviews for more info and an implementation.
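To make the "slightly outdated data" trade-off concrete, here is a minimal sketch of a hand-rolled materialized view: a summary table that is rebuilt on demand. It is not how Flexviews works internally, and it uses Python's sqlite3 module rather than MySQL so that it runs as-is; the sales and sales_by_region tables are hypothetical.

```python
import sqlite3

# Hypothetical schema: a base table plus a snapshot table that plays the
# role of a materialized view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total INTEGER);
    INSERT INTO sales VALUES ('east', 10), ('east', 5), ('west', 7);
""")

def refresh_summary(conn):
    # Replace the snapshot contents; the with-block commits both statements together.
    with conn:
        conn.execute("DELETE FROM sales_by_region")
        conn.execute(
            "INSERT INTO sales_by_region "
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )

def read_summary(conn):
    # Reads only the snapshot table, never the base table.
    return dict(conn.execute("SELECT region, total FROM sales_by_region"))

refresh_summary(conn)
before = read_summary(conn)

conn.execute("INSERT INTO sales VALUES ('west', 100)")
stale = read_summary(conn)   # snapshot has not seen the new row yet

refresh_summary(conn)
fresh = read_summary(conn)   # snapshot now includes the new row

print(before, stale, fresh)
```

Reads against the snapshot are cheap, but they only reflect the base table as of the last refresh, which is the price the answer mentions.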
Related
We have some large indexes that we suspect are not being used in our Rails site, and would like to drop them to save space and computation. However, doing so could be catastrophic if it turns out they were being used. How can we confirm they are not being used?
One option is to log all queries for a time and run 'explain plan' on any of them that use the table in question. But I've heard 'explain plan' can occasionally be inaccurate. We would also have to collect queries for a few hours to be sure, which is quite a lot of log to store and process.
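The log-and-explain approach can be sketched roughly as follows. This is an illustration under loud assumptions: it uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN instead of MySQL's EXPLAIN, the users table and index names are invented, and the "log" is just a list of strings. As noted above, a plan is a prediction, so treat the result as a hint and not as proof.

```python
import sqlite3

# Hypothetical schema with two secondary indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT);
    CREATE INDEX idx_users_email ON users (email);
    CREATE INDEX idx_users_city ON users (city);
""")

# Stand-in for queries collected from the application's query log.
logged_queries = [
    "SELECT id FROM users WHERE email = 'a@example.com'",
    "SELECT email FROM users WHERE id = 7",
]

def indexes_on(conn, table):
    # Names of all indexes recorded in the catalog for this table.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return {name for (name,) in rows}

def indexes_in_plans(conn, queries, candidates):
    # Collect every candidate index whose name appears in some plan step.
    used = set()
    for sql in queries:
        for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
            detail = row[3]
            used.update(name for name in candidates if name in detail)
    return used

all_indexes = indexes_on(conn, "users")
used = indexes_in_plans(conn, logged_queries, all_indexes)
unused = all_indexes - used
print("never chosen by the planner:", sorted(unused))
```

The substring match on index names is deliberately crude; it would need tightening if one index name is a prefix of another.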
If there were a way to temporarily disable an index, we'd be willing to do that, as long as we could quickly re-enable it if problems arose. But I don't see a way to do that universally; you can only specify an 'ignore index' hint on individual SQL statements.
Short Answer:
With MySQL 5.6 it is possible to do this by using the PERFORMANCE_SCHEMA and ps_helper.
ps_helper is a series of views and routines that present the data in PERFORMANCE_SCHEMA in more useful ways. The VIEW you are after is this one: http://www.markleith.co.uk/ps_helper/#schema_unused_indexes
More detailed:
The idea of disabling an index is called 'invisible indexes' in Oracle. MySQL doesn't support them, but I'd love to see this feature as well - I filed http://bugs.mysql.com/bug.php?id=70299 a couple of months ago on it.
Removing unused indexes is very important, as it can help optimizer performance. I have a war-story of using the ps_helper + unused_indexes view here: http://www.tocker.ca/2013/09/05/migrating-from-postgresql-to-mysql.html
There's only one procedure for this: Testing, testing, testing and benchmarking.
The primary function of indexes, apart from ensuring uniqueness, is to expedite data access. If all operations were O(1) there would be no need for indexes in the first place.
You need to have another instance of your application where you can experiment with adding, removing, and adjusting your indexes. It's impossible to replicate both real-world hardware and real-world loads, but you can come pretty close if you pay careful attention to how your hardware is configured and how your application is exercised, and can produce a load that's reasonably similar.
If you have application logs that are sufficiently detailed, sometimes you can replay these operations. Read operations are easier to replay than writes, but both can be simulated if you've got enough time to invest in this.
For any application running at scale, you want to know where performance falls off a cliff. So long as your production load is well below this level you'll be okay. If you don't know where the cliff is, you might hit it without any warning.
Remember that indexes not only take up space, which is a minor issue, but the size of the index has an effect on how expensive it is to update, making writes more costly. It's ideal to have only the ones you need, but it's almost impossible to identify which are actually used. There are many that might be used in theory, but never are, and some that shouldn't be used, but which are because the query optimizer is a little dumb sometimes.
When composing a view, should I stick with the base tables, or can I feel confident that including a view within a view will not hurt performance? I want to include the view because it would allow me to change one base view if I have a change to the table design, as opposed to updating every single view that depends on the changed table. It just seems like the smarter thing to do, but I want to make sure I am not doing something that is considered bad practice or that hurts performance.
In SQL Server views get resolved at compile time. There is a very small performance impact during the compilation. There is no impact on the actual query execution. That assumes, however, that the same plan will be selected. If you nest views that contain complex joins, you might run into a situation where you access a table more often than necessary. The optimizer won't be able to figure that out and the system will end up doing a lot more work than necessary. So be careful to put only views into a query that do not contain more tables than you would have accessed by writing the query without the view.
Historically, some platforms have had "trouble" optimizing queries that incorporated multiple levels of views. I say "trouble", because most of the time even the poorly optimized queries were fast enough for me. (Almost all the time. But I try not to live near the bleeding edge.)
Several years ago, I decided I'd use views whenever it made sense to. Thoughtful use of views can greatly simplify complex databases; we all know that. But I decided to trust the optimizer to do a good enough job, and to trust the developers to release upgrades that made the optimizer better before my queries buried the server.
So if I thought a view would reduce the mental load on me, I created a view. If I needed to query a view of a view of a view, I just did it.
So far, that decision has proven to be a good one for me. I've never killed a server with a query, and I still understand my tables and views. (I still look at execution plans and test performance before I move a query to production, though.)
I'm struggling with MySQL index optimization for some queries that should be simple but are taking forever. Rather than post the specific problem, I wanted to ask if there is an automated way of dealing with these.
I searched around but couldn't find anything. Surely, if query/ index optimization is just following a set of steps, then someone must have written an app to automate it for a given query... or am I not appreciating the complexities involved?
Well, I can offer a SQL indexing tutorial. Let us know if you succeed with automation ;)
Not so sure about MySQL, but there are tools for Oracle and SQL Server. They cover the trivial cases, but they tend to give a false sense of safety regarding non-trivial cases. Nor do they consider the overall workload very well; they are usually limited to suggesting indexes for particular statements.
If it were that simple, you'd have an automated index builder within MySQL.
Actually there is a query optimizer built into MySQL, and it transparently rewrites your queries into what it considers the most optimal form before executing them. It doesn't always work all that well though, and has its own quirks. Knowing these helps avoid some common pitfalls (like using dependent subqueries).
There are tools that, given the query log, can show you which indexes are not used, and by enabling logging of queries that do not use an index you can see which ones need an index. The problem is that indexes are expensive and you cannot just index everything, and which indexes you need depends on your queries.
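As a rough illustration of "see which queries do not use an index", the sketch below flags logged queries whose plan reads a whole table. It is an assumption-laden stand-in: it uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than MySQL's own logging of queries that do not use indexes, and the posts table and its index are made up.

```python
import sqlite3

# Hypothetical schema: author is indexed, title is not.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT);
    CREATE INDEX idx_posts_author ON posts (author);
""")

# Stand-in for a captured query log.
query_log = [
    "SELECT title FROM posts WHERE author = 'kim'",
    "SELECT id FROM posts WHERE title = 'hello'",
]

def full_scans(conn, queries):
    # Flag queries with a plan step that scans a table without naming an index.
    flagged = []
    for sql in queries:
        steps = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
        if any(s.startswith("SCAN") and "INDEX" not in s for s in steps):
            flagged.append(sql)
    return flagged

needs_index = full_scans(conn, query_log)
for sql in needs_index:
    print("full table scan:", sql)
```

A flagged query is only a candidate: whether it deserves an index still depends on how often it runs and how much the extra index costs on writes.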
I was considering using MySQL views to provide an abstraction when pulling data from the DB. As I was looking for material on this, I came across this article, which ends with:
MySQL has long way to go getting queries with VIEWs properly optimized.
The article is from 2007. Is this still applicable? That is, has MySQL solved these issues?
MySQL views work fine functionally, but they perform badly in the majority of cases.
This is because introducing even quite a simple view tends to cause a much worse query plan to be used by the optimiser. This makes using views impractical in the general case.
Often the server will materialise the entire view as a temporary table, which is not helpful for good performance (unless it happens to have only a very small number of rows in it).
So if you thought the explain plan was ok without a view, with a view it can turn terrible. This is still unresolved in the latest dev version of MySQL as far as I know.
You can have views, but don't expect decent (i.e. acceptable) performance.
Why don't databases automatically index tables based on query frequency? Do any tools exist to analyze a database and the queries it is receiving, and automatically create, or at least suggest which indexes to create?
I'm specifically interested in MySQL, but I'd be curious for other databases as well.
That is the best question I have seen on Stack Overflow. Unfortunately I don't have an answer. Google's BigTable does automatically index the right columns, but BigTable doesn't allow arbitrary joins, so the problem space is much smaller.
The only answer I can give is this:
One day someone asked, "Why can't the computer just analyze my code and compile & statically type the pieces of code that run most often?"
People are solving this problem today (e.g. Tamarin in FF3.1), and I think "auto-indexing" relational databases is the same class of problem, but it isn't as much of a priority. A decade from now, manually adding indexes to a database will be considered a waste of time. For now, we are stuck with monitoring slow queries and running optimizers.
There are database optimizers that can be enabled or attached to databases to suggest (and in some cases perform) indexes that might help things out.
However, it's not actually a trivial problem, and when these aids first came out users sometimes found it actually slowed their databases down due to inferior optimizations.
Lastly, there's a LOT of money in the industry for database architects, and they prefer the status quo.
Still, databases are becoming more intelligent. If you use SQL Server Profiler with Microsoft SQL Server, you'll find ways to speed your server up. Other databases have similar profilers, and there are third-party utilities to do this work.
But if you're the one writing the queries, hopefully you know enough about what you're doing to index the right fields. If not then having the right indexes is likely the least of your problems...
-Adam
MS SQL 2005 also maintains an internal reference of suggested indexes to create based on usage data. It's not as complete or accurate as the Tuning Advisor, but it is automatic. Research dm_db_missing_index_groups for more information.
There is a script, on an MS SQL blog I think, for suggesting indexes in SQL 2005, but I can't find the exact script right now! It's just the thing from the description, as I recall. Here's a link to some more info: http://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
PS: this applies only to SQL Server 2005 and later.
There are tools out there for this.
For MS SQL, use the SQL Profiler (to record activity against the database), and the Database Engine Tuning Advisor (SQL 2005) or the Index Tuning Wizard (SQL 2000) to analyze the activities and recommend indexes or other improvements.
Yes, some engines DO support automatic indexing. One such example for MySQL is Infobright. Their engine does not support "conventional" indexes and instead implicitly indexes everything; it is a column-based storage engine.
The behaviour of such engines tends to be very different from what developers expect. (And yes, you need to be a DEVELOPER to even be thinking about using Infobright; it is not a plug-in replacement for a standard engine.)
I agree with what Adam Davis says in his comment. I'll add that if such a mechanism existed to create indexes automatically, the most common reaction to this feature would be, "That's nice... How do I turn it off?"
Part of the reason may be that indexes don't just give a small speedup. If you don't have a suitable index on a large table, queries can run so slowly that the application is entirely unusable, and if it is interacting with other software it possibly won't work at all. So you really need the indexes to be right before you start trying to use the application.
Also, rather than building an index in the background, and slowing things down further while it's being built, it is better to have the index defined before you start adding significant amounts of data.
I'm sure we'll get more tools that take sample queries and work out what indexes are necessary; also probably we will eventually get databases that do as you suggest and monitor performance and add indexes they think are necessary, but I don't think they will be a replacement for starting off with the right indexes.
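A tool that "takes sample queries and works out what indexes are necessary" could, in its most naive form, just count which columns show up in WHERE clauses. The sketch below does only that. It is a toy heuristic and not how any real advisor works: the regular expressions, the min_hits threshold and the generated index names are all assumptions, and it ignores joins, column order, selectivity and write cost, which is exactly why the answers above call the real problem non-trivial.

```python
import re
from collections import Counter

# Very rough patterns: the first table after FROM, and any identifier that
# is directly followed by a comparison operator.
TABLE = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)
PREDICATE = re.compile(r"(\w+)\s*(?:=|<|>)")

def filtered_columns(sql):
    # Return (table, column) pairs for columns compared in the WHERE clause.
    table = TABLE.search(sql)
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    if table is None or len(parts) < 2:
        return []
    return [(table.group(1), col) for col in PREDICATE.findall(parts[1])]

def suggest_indexes(queries, min_hits=2):
    # Suggest an index for every column filtered on at least min_hits times.
    hits = Counter()
    for sql in queries:
        hits.update(filtered_columns(sql))
    return [
        f"CREATE INDEX idx_{table}_{col} ON {table} ({col})"
        for (table, col), count in hits.most_common()
        if count >= min_hits
    ]

# Hypothetical sample of logged queries.
query_log = [
    "SELECT * FROM users WHERE email = 'a@example.com'",
    "SELECT * FROM users WHERE email = 'b@example.com'",
    "SELECT * FROM users WHERE email = 'c@example.com' AND age > 30",
    "SELECT * FROM orders WHERE user_id = 4",
    "SELECT * FROM users",
]

suggestions = suggest_indexes(query_log)
for statement in suggestions:
    print(statement)
```

The output is a list of suggestions to review, not statements to run blindly.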
It seems that MySQL doesn't have a user-friendly profiler. Maybe you want to try something like this, a PHP class based on the MySQL profiler.
Amazon's SimpleDB has automatic indexing on all columns based on your usage:
http://aws.amazon.com/simpledb/
It has other limitations though:
It's a key-value store, not an RDB. Obviously that means slow joins (and no built-in join support).
It has a 10 GB limit on table size. There are libraries that will handle partitioning big data for you, although this locks you into that library's way of doing things, which can have its own problems.
It stores all values as strings, even numbers, which makes sorting a column containing 1, 9, and 10 come out as 1, 10, 9 unless you use a library that hacks around this by zero-padding. This also impacts negative numbers.
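The string-sorting problem in the last point, and the zero-padding workaround, can be shown in a few lines of plain Python. The width of 6 digits and the offset used for negative numbers are arbitrary choices for this sketch, not anything SimpleDB or a particular library prescribes.

```python
# Numbers stored as strings sort character by character, not by value.
values = [1, 9, 10]
naive = sorted(str(v) for v in values)

# Zero-padding to a fixed width makes string order match numeric order
# for non-negative numbers.
padded = sorted(str(v).zfill(3) for v in values)

# Negative numbers need more than padding. One common trick (assumed here)
# is to add a fixed offset so every stored value is non-negative.
WIDTH = 6
OFFSET = 10 ** 5  # supports values from -100000 up to 899999

def encode(n):
    return str(n + OFFSET).zfill(WIDTH)

def decode(s):
    return int(s) - OFFSET

mixed = [-5, 3, -20, 10]
round_trip = [decode(s) for s in sorted(encode(n) for n in mixed)]

print(naive)
print(padded)
print(round_trip)
```

The catch is that every reader and writer has to agree on the same width and offset, which is the lock-in the previous point warns about.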
The 10 GB limit is bigger than many might assume, so you could proceed with this for a simple site that you plan on rewriting if it ever hits big.
It's unfortunate this kind of automatic indexing didn't make it into DynamoDB, which appears to have replaced it. They don't even mention SimpleDB in their product list anymore; you have to find it through old links to it.
Google App Engine does that (see the index.yaml file).