BACKGROUND
I'm working with a MySQL InnoDB database with 60+ tables and I'm creating different views in order to make dynamic queries fast and easier in the code. I have a couple of views with INNER JOINS (without many-to-many relationships) of 20 to 28 tables SELECTING 100 to 120 columns with row count below 5,000 and it works lighting fast.
ACTUAL PROBLEM
I'm creating a master view with INNER JOINS (without many-to-many relationships) of 34 tables and SELECTING about 150 columns with row count below 5,000 and it seems like it's too much. It takes forever to do a single SELECT. I'm wondering if I hit some kind of view-size limit and if there is any way of increasing it, or any tricks that would help me pass through this apparent limit.
It's important to note that I'm NOT USING Aggregate functions because I know about their negative impact on performance, which, by the way I'm very concerned about.
MySql does not use the "System R algorithm" (used by Postgresql, Oracle, and SQL Server, I think), which considers not only different merge algorithms (MySQL only has nested-loop, although you can fake a hash join by using a hash index), but also the possible ways of joining the tables and possible index combinations. The result seems to be that parsing of queries - and query execution - can be very quick upto a point, but performance can dramatically drop off as the optimizer chooses the wrong path through the data.
Take a look at your explain plans and try to see if a) the drop in performance is due to the number of columns you are returning (just do SELECT 1 or something) or b) if it is due to the optimizer choosing a table scan instead of index usage.
A view is just a named query. When you refer to a view in MySQL it just replaces the name with the actual query and run it.
It seems that you confuse it with materialized views, which are tables you create from a query. Afterwards you can query that table, and does not have to do the original query again.
Materialized views are not implemented in MySQL.
To improve the performance try to use the keyword explain to see where you can optimize your query/view.
Related
I have a view (say 'v') that is the combination of 10 tables using several Joins and complex calculations. In that view, there are around 10 Thousand rows.
And then I select 1 row based on row as WHERE id = 23456.
Another possible way to use a larger query in which I can cut short the dataset to 1% before the complex calculation starts.
Question: Are SQL views optimized in some form?
MySQL Views are just syntactic sugar. There is not special optimization. Think of views as being textually merged; then optimized. That is, you could get the same optimizations (or not) by manually writing the equivalent SELECT.
If you would like to discuss the particular query further, please provide SHOW CREATE TABLE/VIEW and EXPLAIN SELECT .... It may be that you are missing a useful 'composite' index.
This is somewhat of a conceptual question. In terms of query optimization and speed, I am wondering which route would have the best performance and be the fastest. Suppose I am using JFreeChart (this bit is somewhat irrelevant). The entire idea of using JFreeChart with a MYSQL database is to query for two values, an X and a Y. Suppose the database is full of many different tables, and usually, the X and the Y come from two different tables. Would it be faster, in the query for the chart, to use joins and unions to get the two values...or... first create a table with the joined/union values, and then run queries on this new table (no joins or unions needed)? This would all be in one code mind you. So, overall: joins and unions to get X and Y values, or create a temporary table joining the values and then querying the chart with those.
It would, of course, be faster to pre-join the data and select from a single table than to perform a join. This assumes that you're saving one lookup per row and are properly using indexes in the first place.
However, even though you get performance improvements from dernormalization in such a manner, it's not commonly done. A few of the reason why it's not common include:
Redundant data takes up more space
With redundant data, you have to update both copies whenever something changes
JOINs are fast
JOINs on multiple rows can improve (they don't always require a lookup per row) with such things as the new Batched Key Access joins in MySQL 5.6, but it only helps with some queries, hence you have to tell MySQL which join type you want. It's not automatic.
This is not a problem but it belongs to site optimization. I have 110K records of hotels. When I use SELECT something query it will pulled out data from 110k records.
If I search a hotel list with more than 3 star rating, price between 100 - 300 $ and within Mexico City. Suppose I got 45 matching results.
Is there any other way when I add more refinement, it will pulled out data from just only the 45 matching and not go with the 110K data?
The key is indexes my friend... make sure you have indexes of all items used in the WHERE and this will reduce cardinality when selecting...
On a side not... 110k rows is still an extremely small data set for MySQL so shouldn't pose much of a performance issue if you haven't got correct indexing on the table anyway.
It is more depend on how often your data updates.
See.
The MySQL Query Cache
Query Caching in MySQL
Caching question MySQL or Filesystem
I am saying that is there any other way when I add more refinement, it
will pulled out data from just only the 45 matching and not go with
the 110K data.
Then make view of those 45 rows and apply query to it.
Create a view using query
Create view refined as select * from ....
And after that add more select queries to that view
like
Select * from refined where ...
Firs of all, i tend to agree with Brian, indexes matter.
Check what kind(s) of queries are most frequent, and construct multi-column indexes on the table accordingly. Note that the order of columns in the indexes does matter (as the index is a tree, first column appears in tree root, so if your query does not use that column - the whole tree is useless).
Enable slow query log to see what queries actually take long (if any), or not use indexes, so you can improve indexes over time.
Having said this, query cache is a real performance boost, if your table data is mostly read. Here is a useful article on mysql query cache.
I have multiple views in my database that I am trying to perform a JOIN on when certain queries get very complex. As a worst case I would have to join 3 views with the following stats:
View 1 has 60K+ rows with 26 fields.
View 2 has 60K+ rows with 15 fields.
View 3 has 80K+ rows with 8 fields.
Joining views 1 and 2 seem to be no problem, but anytime I try to join the third view the query hangs. I'm wondering if there are any best practices I should be following to keep these querys from hanging. I've tried to use the smallest fields possible (medium/small ints where possible, ect).
We are using MySQL 5.0.92 community edition with MyISAM tables. Not sure if InnoDB would be more efficient.
As a last resort I thinking of splitting the one query into two, hitting views 1 & 2 with the first query, and then view 3 separately with the 3rd. Is there any downside to this other than making 2 queries?
Thanks.
You need to use EXPLAIN to understand why the performance is poor.
I wouldn't think you need to worry about MyISAM vs. InnoDB for this particular read performance just yet. MyISAM versus InnoDB
I am going to post my comments as an answer:
1) Take a look at the EXPLAIN command and see what it says.
2) Check the performance of the individual views. Are they as fast as you think on their own?
3) The columns you are using in your WHERE or JOIN clauses, do the underlying tables have indexes that apply to them? Something to have in mind:
A composite index (an index with more than one column) with columns
(a, b) would not help when you query only for b. It helps with a, and
a + b, but not with only b. That's why the single index you added
improved the situation
4) Are you using the all the columns and all the views? If you don't wouldn't it be simpler to take a look at the views and come up with a query instead?
If its possible to get what how the original VIEWs are defined, then use that as a basis to create your own single query might be a better approach... Way back, another person had similar issues on their query. He needed to get back to the raw table of one such view to ensure it had proper indexes to accept the optimization of the query he was trying to perform. Remember a view is a subset of something else and does not have an index to work with. So, if you can't take advantage of an index at the root table of a view, you could see such a performance hit.
Suppose I have a table A, creating a view V from that table.
Then I do several queries from V. I wonder if V will be re-constructed each time I query? or it will be constructed only 1 time, and being saved somewhere in memory by DBMS for next queries (which I think similar to query from a table)?
In general, no. V is a transient set of rows that is computed when requested by a query. Because you can apply additional WHERE and ORDER BY criteria when querying from a view, the execution plan for two queries against the same view could conceivably be quite different. The database generally cannot reuse the results of a previous query against a view to satisfy the next query against that view.
That said, there is a relatively new technology in some engines called Materialized Views. I have never used them myself, but my understanding is that these views are pre-computed based on updates that are made to the underlying tables. So with Materialize Views you do get improved SELECT performance, but at the expense of decrease INSERT, UPDATE, and DELETE performance.
You should also be aware that multi-column indexes can be used to precompute certain selections and sort orders involving individual tables. If you issue a query against a table that can be satisfied using a compound index (only the columns in the index are required by the query, and the sort order matches the index) then the table itself need never be read, only the index.
Views in MySQL are not a de facto caching solution.
MySQL runs the query against the base tables every time you query a view on those base tables. The results of the query are not stored for the view.
As a result, there is no need to "refresh" the view as there is when using materialized views in Oracle Microsoft SQL Server. Even the SQL in a MySQL view definition is re-evaluated every time you query the view.
If you need something like materialized views in MySQL, one tool that might help is FlexViews. This stores the results of a query in an ordinary base table, and then monitors changes recorded in MySQL's binary log, applying relevant changes to the base table. This tool can be quite useful, but it has some caveats:
FlexViews is written in PHP, and as such it has some performance limitations. Depending on your write traffic load, FlexViews may not be able to keep up.
It doesn't support every possible type of SELECT query.
FlexViews-managed materialized view tables are not updateable. That is, you can UPDATE this view table, but the change will not apply to the base tables.
According to Pinal Dave, a view must be refreshed in order to reflect changes made to its referenced table(s). I'm not sure this makes a view of a simple 1-table query any more efficient than querying the table directly (it probably doesn't) but I think it means that views containing complex joins and subqueries may be more efficient than their non-view counterparts.
Pinal Dave has more to say about the other limitations of SQL views (or features, if you like). Maybe you can learn something useful there.
Mysql Views do not support Indexes. (as like in Oracle, where you can create index in Oracle Views) But mysql views can use the indexes in underlying table when created with Merge Algorithm.
If you have to use views, then adjust your JOIN BUFFER.
Using, Something like this
set global join_buffer_size=314572800;
Do profile the differences before and after changing the buffer size.
I have seen after increasing join buffers, the view query executes in same time (in ms) as the table of the same size will do.