I have the following query:
SELECT t.*, a.hits AS ahits
FROM t, a
WHERE (t.TRACK LIKE 'xxx')
AND a.A_ID = t.A_ID
ORDER BY t.hits DESC, a.hits DESC
which runs very frequently. Table t has about 15 million rows and a about 3 million.
When I ran EXPLAIN on the query, the output noted that it always creates a temporary table. Creating that temporary table takes quite a while, and it happens many times.
So I am wondering: if I create a view from the query, say:
CREATE VIEW v_t_a AS
SELECT t.*, a.hits AS ahits
FROM t, a
WHERE a.A_ID = t.A_ID
And change my code to:
SELECT * FROM v_t_a WHERE TRACK LIKE 'xxx' ORDER BY hits DESC, ahits DESC
Will it improve the performance? Will it remove the create temp table time?
Thank you so much for your suggestions!
It is dangerous to assume MySQL will optimize your VIEWs the same way more advanced database systems would. As with subqueries and derived tables, MySQL 5.0 falls down and performs very inefficiently in many cases.
MySQL has two ways of handling VIEWs: query merge, in which the VIEW is simply expanded like a macro, and temporary table, in which the VIEW is materialized into a temporary table (without indexes!) that is then used in the rest of the query execution.
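For illustration, here is how the two algorithms can be requested explicitly (a sketch using the tables from the question; MySQL silently falls back to TEMPTABLE when a view cannot be merged, e.g. if it uses GROUP BY, DISTINCT or aggregates):
-- Merge: the view is expanded into the outer query like a macro,
-- so the base tables' indexes remain usable.
CREATE ALGORITHM = MERGE VIEW v_t_a_merge AS
SELECT t.*, a.hits AS ahits
FROM t JOIN a ON a.A_ID = t.A_ID;
-- Temptable: the view is materialized first into an unindexed
-- temporary table, which the outer query then has to scan.
CREATE ALGORITHM = TEMPTABLE VIEW v_t_a_tmp AS
SELECT t.*, a.hits AS ahits
FROM t JOIN a ON a.A_ID = t.A_ID;
Running EXPLAIN on a query against each view then shows whether you got the merge plan or a derived temporary table.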
No optimizations from the outer query seem to be pushed down into the query used to build the temporary table, and if you join together more than one view that uses the temporary-table algorithm, you can run into serious trouble because those tables get no indexes at all.
So be very careful about using MySQL VIEWs in your application, especially ones that require the temporary-table execution method. VIEWs can be used with very little performance overhead, but only when used with caution.
MySQL still has a long way to go in properly optimizing queries with VIEWs.
A VIEW internally JOINs the two tables every time you query it!
To prevent this, create a MATERIALIZED VIEW...
It is a view that is more like a TABLE... you can query it directly like any other table.
But you have to write some TRIGGERS to update it automatically if any underlying table's data changes...
See this: http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
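The idea in MySQL terms is roughly this (a minimal sketch; the orders/order_totals tables and the amount column are made up for illustration):
-- The "materialized view" is just an ordinary, indexable table.
CREATE TABLE order_totals (
  customer_id INT PRIMARY KEY,
  total DECIMAL(10,2) NOT NULL
);
-- A trigger on the underlying table keeps it in sync.
DELIMITER //
CREATE TRIGGER trg_orders_ai AFTER INSERT ON orders
FOR EACH ROW
BEGIN
  INSERT INTO order_totals (customer_id, total)
  VALUES (NEW.customer_id, NEW.amount)
  ON DUPLICATE KEY UPDATE total = total + NEW.amount;
END//
DELIMITER ;
You would need similar AFTER UPDATE and AFTER DELETE triggers to keep the table fully consistent.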
It's rare that doing exactly the same operations in a view will be more efficient than doing it as a query.
Views are more for managing the complexity of queries than for performance; they simply perform the same actions at the back end as the query would have.
One exception to this is materialised query tables, which actually create a separate long-lived table for the query so that subsequent queries are more efficient. I have no idea whether MySQL has such a thing; I'm a DB2 man myself :-)
But you could possibly implement such a scheme yourself if the performance of the query is an issue.
It depends greatly on the rate of change of the table. If the data is changing so often that a materialised query would have to be regenerated every time anyway, it won't be worth it.
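For reference, this is roughly what a materialised query table looks like in DB2 (hypothetical tables; in MySQL you would roll your own with a regular table plus a scheduled refresh):
-- DB2: a long-lived table built from a query, refreshed on demand.
CREATE TABLE order_summary AS
  (SELECT customer_id, SUM(amount) AS total_spent
   FROM orders
   GROUP BY customer_id)
DATA INITIALLY DEFERRED REFRESH DEFERRED;
-- Re-run the underlying query when the data has drifted enough.
REFRESH TABLE order_summary;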
I have about 4 tables; one of them has about 10 million rows (growing by about 500k rows per month). The tables are fully optimized. My query is like this:
SELECT COUNT(id), ... FROM table1
INNER JOIN table2 ON ...
INNER JOIN table3 ON ...
INNER JOIN table4 ON ...
WHERE [ different conditions every time ]
This query takes about 1 minute to execute (which is far too long), and what I want is to cache the first part of the query (everything but the WHERE) and then, once the cache is built, apply the WHERE to the cached result.
The general idea is to execute the first part of the query every morning (for example) and put it in a cache, so that execution time is minimal when users run their own queries (with the WHERE clause).
I think it's possible because, as a benchmark, I executed the query without the WHERE (about 1 minute execution time), then ran it with the WHERE clause and got a very short execution time, so the approach seems to work.
But I need help at this point: I don't know how to improve performance, how to cache the query without the WHERE, or whether you have a better solution...
Thank you in advance for your attention
I think you can prepare a materialized view for this query.
WHAT IS A MATERIALIZED VIEW?
A materialized view (MV) is the pre-calculated (materialized) result of a query. Unlike a simple VIEW, the result of a materialized view is stored somewhere, generally in a table. Materialized views are used when an immediate response is needed and the query the materialized view is based on would take too long to produce a result. Materialized views have to be refreshed once in a while; how often, and how up to date the contents must be, depends on the requirements. Basically, a materialized view can be refreshed immediately or deferred, and it can be refreshed fully or to a certain point in time. MySQL does not provide materialized views by itself, but it is easy to build materialized views yourself.
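A minimal sketch of that approach for the query above (column names and join conditions are placeholders, since they are elided in the question):
-- Precompute the expensive join once, minus the per-user WHERE clause.
CREATE TABLE mv_report AS
SELECT t1.id, t2.x, t3.y -- hypothetical column list
FROM table1 t1
INNER JOIN table2 t2 ON t2.t1_id = t1.id -- placeholder join conditions
INNER JOIN table3 t3 ON t3.t2_id = t2.id;
ALTER TABLE mv_report ADD INDEX idx_mv_x (x);
-- Refresh every morning (via cron or CREATE EVENT):
TRUNCATE TABLE mv_report;
INSERT INTO mv_report
SELECT t1.id, t2.x, t3.y
FROM table1 t1
INNER JOIN table2 t2 ON t2.t1_id = t1.id
INNER JOIN table3 t3 ON t3.t2_id = t2.id;
-- Users then run their conditions against the small, indexed table:
SELECT COUNT(id) FROM mv_report WHERE x = 42; -- placeholder condition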
For the query cache to work, queries must match exactly. Consider creating a table, complete with indexes, then inserting data from the first query with INSERT ... SELECT, and using that table.
You could also use CREATE TABLE ... SELECT to make the table the first time.
Here is a screenshot of MySQL's EXPLAIN output for a common query:
http://cl.ly/3r34251M320A1P2s3e1Y
I have 3 different tables I have to join together to extract the data I want. This is the main CI model code, using ActiveRecord:
$this->db->select('articles.id, articles.title, articles.link, articles.updated_time, articles.created_time, shows.network, shows.show_id, shows.name');
$this->db->from('articles')->order_by('updated_time', 'desc')->offset($offset)->limit($limit);
$this->db->join('labels', 'articles.remote_id = labels.articleid');
$this->db->join('shows', 'shows.show_id = labels.showid');
Can anyone suggest any ways to improve the schema or query performance?
Thanks
What makes your query slow is MySQL's use of temporary tables and filesort, which means it can't use an index efficiently and creates a temporary table (often on disk!).
It usually happens when you join tables using one index and sort by another index, or filter on yet another index.
The first thing you can do is read up on this issue and see if you can at least avoid temporary tables going to disk.
How mysql uses temp tables
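For example, something along these lines may help (assuming the join columns from the question and no existing secondary indexes; check your actual EXPLAIN output first):
-- Let the joins be resolved by index lookups...
ALTER TABLE labels ADD INDEX idx_labels_article_show (articleid, showid);
ALTER TABLE shows ADD INDEX idx_shows_show_id (show_id); -- may already be the PK
-- ...and let ORDER BY updated_time DESC ... LIMIT read rows in index
-- order from the driving table, avoiding the filesort.
ALTER TABLE articles ADD INDEX idx_articles_updated (updated_time);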
I often break my complicated queries into temporary tables so I can comment them, which helps me follow the steps.
Does this differ greatly from how MySQL handles nested joins internally?
e.g. SELECT * FROM t1, t2, t3, t4 WHERE t1.id = t2.id AND t2.id2 = t3.id2 AND t4.id3 = t3.id3
Do indexes defined on t1 get "carried over" to whatever internal table MySQL creates to hold intermediate results?
Is there any major difference, performance-wise, between explicitly defining temporary tables and using one single query full of nested joins?
Indexes are not carried over to temporary tables you explicitly create. They're just like regular tables, except that they'll disappear when you no longer have the session open. The database has no knowledge that the data in the table originally came from a query on some other tables that had indexes; for one thing you might have inserted/deleted/updated rows in the temp table since the query.
I would expect MySQL to make use of any indexes it thought would be beneficial when you run joins, but the index has to be on a table actually in the join.
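In practice that means indexing the intermediate table yourself (a sketch using the tables from the example above):
-- The intermediate result starts out with no indexes at all.
CREATE TEMPORARY TABLE tmp_t1_t2 AS
SELECT t1.*, t2.id2
FROM t1 JOIN t2 ON t2.id = t1.id;
-- Add whatever index the next step needs.
ALTER TABLE tmp_t1_t2 ADD INDEX idx_tmp_id2 (id2);
-- Later steps can then join efficiently against the temp table.
SELECT t3.* FROM tmp_t1_t2 JOIN t3 ON t3.id2 = tmp_t1_t2.id2;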
I'm joining a number of tables and want to create some tables or views that are easier to query against, to do quick analysis of our data. What are the implications of creating a new table or a new view with the combined data?
Currently the tables I'm joining are static, but this code may be moved to our live tables in the future.
This is a slight oversimplification, but a view is basically a saved query on a table returning a result (in rows and columns), which you can then query as if it were its own table.
As of MySQL 5.0, views weren't all that great, because MySQL executed the underlying query every time the view was used, so there really wasn't much point to them (although they could be useful for code reuse). That may have changed since 5.0, though.
A table stores the data.
A view is a stored query, like SELECT * FROM table, saved in the database for later use.
You could have a view joining two tables and then select from that view without a JOIN clause, yet still get a joined result.
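For example (hypothetical tables):
-- Define the join once.
CREATE VIEW v_customer_orders AS
SELECT c.name, o.order_date, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id;
-- Query it like a plain table; the join runs under the hood each time.
SELECT name, amount FROM v_customer_orders WHERE order_date >= '2012-01-01';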
Be careful with views, as they don't necessarily use the indexes on the underlying tables correctly!
See this article for more information
http://www.mysqlperformanceblog.com/2007/08/12/mysql-view-as-performance-troublemaker/
In addition to Rob's explanation:
You can grant privileges not only on tables but also on views. That way you can give access to only a selected subset of a database's data.
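A quick sketch (names are made up; assumes the user already exists):
-- The view exposes only two columns of the underlying table.
CREATE VIEW v_staff_public AS
SELECT name, department FROM staff; -- e.g. salary stays hidden
-- The reporting user may read the view but not the base table.
GRANT SELECT ON mydb.v_staff_public TO 'report_user'@'localhost';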
I have a table with around 1.5 million+ rows that I need to delete. Previously, I was using a temporary table, and this caused the transaction log to grow quite quickly. The problem is, once I have finished one result set, I need to move on to another with a further 1.5 million+ rows. The performance of this is rather slow, and I'm wondering whether I should use a table variable rather than writing a table to the temp database.
EDIT
I use the temporary table when I select the initial 1.5 million+ records.
Side-stepping the table variable vs. temp table question, you're probably better off batching your deletes into smaller groups inside a while loop. That's your best bet for keeping the transaction log size reasonable.
Something like:
while (1=1)
begin
    delete top (1000)
    from YourTable
    where ... -- your delete criteria

    if @@rowcount < 1000 break
end /* while */
In general, I prefer using table variables over temp tables, if only because they're easier to use. I find few cases where the use of temp tables is warranted. You don't talk about how you're using temp tables in your routines, but I suggest benchmarking the two options.
A table variable is often not suitable for such large result sets, being more appropriate for small numbers of rows. You'd likely find that the table variable's data gets written to tempdb anyway because of its size.
Personally I have found table variables to be much slower than temporary tables when dealing with large result sets. In an example mentioned at the end of this article on SQL Server Central, with 1 million rows in each table, the query using the temporary table took less than a sixth of the time to complete.
Personally I've found table variables to often suffer performance-wise when I have to join them to real tables in a query.
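For comparison, a minimal T-SQL sketch of the two options (in older SQL Server versions the table variable's indexing is limited to inline constraints, and it carries no statistics):
-- Table variable: no statistics; indexing limited to declared constraints.
DECLARE @ids TABLE (id INT PRIMARY KEY);

-- Temp table: lives in tempdb like a real table; you can index it after
-- loading, and the optimizer gets statistics for better join plans.
CREATE TABLE #ids (id INT NOT NULL);
CREATE INDEX ix_ids ON #ids (id);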
If performance is slow, it may be at least partly due to the settings of the database itself. Is it set to grow automatically? What's its recovery model?