I often break my complicated queries into temporary tables so that I can comment each step and follow the logic more easily.
Does this differ greatly from how MySQL handles nested joins internally?
e.g. select * from t1,t2,t3,t4 where t1.id=t2.id and t2.id2 = t3.id2 and t4.id3 = t3.id3
Do any indexes defined on t1 get "carried over" to whatever internal table MySQL creates to hold intermediate results?
Is there any major difference performance-wise between explicitly defining temporary tables and using a single query full of nested joins?
Indexes are not carried over to temporary tables you explicitly create. They're just like regular tables, except that they'll disappear when you no longer have the session open. The database has no knowledge that the data in the table originally came from a query on some other tables that had indexes; for one thing you might have inserted/deleted/updated rows in the temp table since the query.
I would expect mysql to make use of any indexes it thought would be beneficial when you run joins, but the index has to be on a table in the join.
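To make the point concrete, here is a minimal MySQL sketch that reuses the t1, t2 and t3 names from the question. The temporary table name `step1` and the index name are made up for illustration. A temporary table built from a query starts with no indexes, so any index you want on it has to be added by hand.

```sql
-- The temporary table is populated from a query; it inherits no indexes from t1 or t2.
CREATE TEMPORARY TABLE step1 AS
SELECT t1.id, t2.id2
FROM t1
JOIN t2 ON t1.id = t2.id;

-- Shows the definition of step1; no keys are listed because none were copied over.
SHOW CREATE TABLE step1;

-- If a later step joins on id2, the index must be created explicitly.
ALTER TABLE step1 ADD INDEX idx_step1_id2 (id2);

-- EXPLAIN reports which indexes (if any) MySQL chooses for the next join.
EXPLAIN
SELECT *
FROM step1
JOIN t3 ON step1.id2 = t3.id2;
```

Comparing this EXPLAIN output with the one for the single-query version is probably the most direct way to see whether the temporary-table approach costs you anything.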
I'm experimenting with various indexing settings for my mysql database.
I wonder, though: by removing or adding indexes, is there any possibility of damaging data rows in any way? Obviously I realise that if I make any application queries fail, that can cause bad rows. I'm talking more about the structural queries themselves.
Or will I simply affect the efficiency of the database?
I just want to know whether it's safe to experiment or whether I need to be cautious.
The data isn't in phpMyAdmin, it's in MySQL. Adding or removing an index will not affect your data integrity by default. The exception is adding a unique index with the IGNORE keyword (ALTER IGNORE TABLE), which discards rows that duplicate the unique key.
That said, you should always have a backup of your data. It's easy to run a test like:
CREATE TABLE t1 LIKE t;
INSERT INTO t1 SELECT * FROM t;
ALTER TABLE t1 ADD INDEX ...;
Then compare the difference in tables (perhaps a COUNT is fine in your case).
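One way to do that comparison, as a rough sketch: it assumes the copy is named t1 as in the steps above, and that both tables have a primary key column called `id` (a hypothetical name; substitute your own key).

```sql
-- Row counts should match before and after the index change.
SELECT
    (SELECT COUNT(*) FROM t)  AS original_rows,
    (SELECT COUNT(*) FROM t1) AS copy_rows;

-- Rows present in t but missing from t1 (expect an empty result).
-- Assumption: `id` is the primary key of both tables.
SELECT t.id
FROM t
LEFT JOIN t1 ON t1.id = t.id
WHERE t1.id IS NULL;
```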
Adding/removing indexes is safe in terms of the rows in your table. However as you note, too many indexes or poorly constructed indexes can be (very) detrimental to performance. Likewise, adding indexes on large tables can be a very expensive process, and can bring a MySQL server to its knees, so you're better off not "experimenting" on production tables.
Here is a screenshot of mysql explain command on a common query:
http://cl.ly/3r34251M320A1P2s3e1Y
I have 3 different tables I have to join together to extract the data I want. This is the main CI model code using activerecord
$this->db->select('articles.id, articles.title, articles.link, articles.updated_time, articles.created_time, shows.network,shows.show_id, shows.name');
$this->db->from('articles')->order_by('updated_time','desc')->offset($offset)->limit($limit);
$this->db->join('labels', 'articles.remote_id = labels.articleid');
$this->db->join('shows', 'shows.show_id = labels.showid');
Can anyone suggest any ways to improve the schema or query performance?
Thanks
What makes your query slow is MySQL's use of a temporary table and filesort: it can't use an index efficiently for the sort, so it creates a temporary table (often on disk!).
This usually happens when you join tables using one index and then sort by, or filter on, a column covered by a different index.
First thing you can do is read about this issue and see if you can, at least, avoid the use of temporary disk tables.
How mysql uses temp tables
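As a rough illustration, the active record calls in the question should produce SQL along the lines below (the LIMIT and OFFSET values are placeholders, and the ORDER BY column is qualified here for clarity). Running it under EXPLAIN shows whether "Using temporary; Using filesort" appears in the Extra column. The index names are made up, and whether these indexes actually help depends on the data and on the join order MySQL picks, so treat them as candidates to test, not as a guaranteed fix.

```sql
EXPLAIN
SELECT articles.id, articles.title, articles.link,
       articles.updated_time, articles.created_time,
       shows.network, shows.show_id, shows.name
FROM articles
JOIN labels ON articles.remote_id = labels.articleid
JOIN shows  ON shows.show_id = labels.showid
ORDER BY articles.updated_time DESC
LIMIT 20 OFFSET 0;

-- Candidate indexes to try (hypothetical names; skip any that already exist):
ALTER TABLE articles ADD INDEX idx_articles_updated (updated_time);
ALTER TABLE labels   ADD INDEX idx_labels_article_show (articleid, showid);
ALTER TABLE shows    ADD INDEX idx_shows_show_id (show_id);
```

Re-run the EXPLAIN after each change and compare the Extra column to see whether the temporary table and filesort have gone away.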
I have the following query:
SELECT t.*, a.hits AS ahits
FROM t, a
WHERE (t.TRACK LIKE 'xxx')
AND a.A_ID = t.A_ID
ORDER BY t.hits DESC, a.hits DESC
which runs very frequently. Table t has around 15M+ rows and a has around 3M+ rows.
When I ran EXPLAIN on the above query, it showed that a temp table is always created. I noticed that creating a temp table for this query takes quite a while, and it happens many times.
Thus, I am wondering if I create a view using the above say:
CREATE VIEW v_t_a AS
SELECT t.*, a.hits AS ahits
FROM t, a
WHERE a.A_ID = t.A_ID
And change my code to:
SELECT * FROM v_t_a WHERE TRACK LIKE 'xxx' ORDER BY hits DESC, ahits DESC
Will it improve the performance? Will it remove the create temp table time?
Thank you so much for your suggestions!
It is very dangerous to assume MySQL will optimize your VIEWs the same way more advanced database systems would. As with subqueries and derived tables, MySQL 5.0 will fail to do so and perform very inefficiently in many cases.
MySQL has two ways of handling the VIEWS – query merge, in which case VIEW is simply expanded as a macro or Temporary Table in which case VIEW is materialized to temporary tables (without indexes !) which is later used further in query execution.
There do not seem to be any optimizations applied from the outer query to the query used for temporary table creation, and if you use more than one Temporary Table view and join them together you may have serious issues, because such tables do not get any indexes.
So be very careful implementing MySQL VIEWs in your application, especially ones which require temporary table execution method. VIEWs can be used with very small performance overhead but only in case they are used with caution.
MySQL has a long way to go in getting queries with VIEWs properly optimized.
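A hedged sketch of how to check which method MySQL picks for the view from the question: the ALGORITHM clause asks for the merge method, and EXPLAIN on a query against the view shows the resulting plan. A view this simple (a plain join with no aggregates, DISTINCT or UNION) is normally eligible for merging, but that is worth confirming on your own version.

```sql
-- Ask MySQL to expand the view into the outer query instead of materializing it.
CREATE ALGORITHM = MERGE VIEW v_t_a AS
SELECT t.*, a.hits AS ahits
FROM t
JOIN a ON a.A_ID = t.A_ID;

-- If the view is merged, the plan lists the base tables t and a directly.
-- A derived-table row in the plan suggests the view was materialized instead.
EXPLAIN
SELECT *
FROM v_t_a
WHERE TRACK LIKE 'xxx'
ORDER BY hits DESC, ahits DESC;
```

Even when the view is merged, the plan should be the same as for the original query, so the temporary table needed for the ORDER BY across two tables is unlikely to disappear.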
A view internally joins the two tables every time you query it.
To avoid this, create a materialized view.
It is a view that behaves more like a table: you can query it directly, like any other table.
MySQL has no built-in materialized views, though, so you have to write some triggers to keep it up to date whenever the underlying table data changes.
See this : http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
It's rare that doing exactly the same operations in a view will be more efficient than doing it as a query.
The views are more to manage complexity of queries rather than performance, they simply perform the same actions at the back end as the query would have.
One exception to this is materialised query tables which actually create a separate long-lived table for the query so that subsequent queries are more efficient. I have no idea whether MySQL has such a thing, I'm a DB2 man myself :-)
But, you could possibly implement such a scheme yourself if performance of the query is an issue.
It depends greatly on the rate of change of the table. If the data is changing so often that a materialised query would have to be regenerated every time anyway, it won't be worth it.
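A minimal sketch of such a hand-rolled scheme in MySQL, using the tables from the question. The names `mv_t_a`, `mv_t_a_new`, `mv_t_a_old` and the index name are hypothetical, and this version refreshes by rebuilding the whole table, which only makes sense if the data changes slowly relative to how often it is read.

```sql
-- Build the table once from the join in the question.
CREATE TABLE mv_t_a AS
SELECT t.*, a.hits AS ahits
FROM t
JOIN a ON a.A_ID = t.A_ID;

-- Unlike a view's temporary table, this one can be indexed.
-- Assumption: TRACK is a type that can be indexed without a prefix length.
ALTER TABLE mv_t_a ADD INDEX idx_mv_track (TRACK);

-- Periodic refresh: rebuild a copy, then swap it in with one atomic rename.
CREATE TABLE mv_t_a_new LIKE mv_t_a;
INSERT INTO mv_t_a_new
SELECT t.*, a.hits AS ahits
FROM t
JOIN a ON a.A_ID = t.A_ID;
RENAME TABLE mv_t_a TO mv_t_a_old, mv_t_a_new TO mv_t_a;
DROP TABLE mv_t_a_old;
```

Queries then read from mv_t_a like any other table, at the cost of seeing data only as fresh as the last refresh.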
I have a table where I have around 1.5 million+ results that I need to delete. Previously, I was using a temporary table and this caused the transaction log to increase in size quite quickly. Problem is, once I have done one result set, I need to move onto another where there is another 1.5 million+ results. The performance of this is rather slow and I'm wondering if I should use a table variable rather than writing a table to the temp database.
EDIT
I use the temporary table when I select the initial 1.5million+ records.
Side-stepping the table variable vs. temp table question, you're probably better off batching your deletes into smaller groups inside of a while loop. That's your best bet for keeping the transaction log size reasonable.
Something like:
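while (1=1) begin
    -- delete at most 1000 rows per statement so each transaction stays small
    delete top(1000)
    from YourTable
    where ...
    -- a short batch means nothing is left to delete
    if @@rowcount < 1000 break
end /* while */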
In general, I prefer using table variables over temp tables, if only because they're easier to use. I find few cases where the use of temp tables is warranted. You don't talk about how you're using temp tables in your routines, but I suggest benchmarking the two options.
A table variable is often not suitable for such large resultsets, being more appropriate for small numbers. You'd likely find that the table variable's data would be written to tempdb anyway due to its size.
Personally I have found table variables to be much slower than temporary tables when dealing with large resultsets. In an example mentioned at the end of this article on SQL Server Central, using 1 million rows in a table of each type, the query using the temporary table took less than a sixth of the time to complete.
Personally I've found table variables to often suffer performance-wise when I have to join them to real tables in a query.
If the performance is slow it may be at least partly the settings on the database itself. Is it set to automatically grow? What's the recovery model of it?
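Both settings can be read from the SQL Server catalog views. A small sketch, to be run in the database in question:

```sql
-- Recovery model of the current database (SIMPLE, FULL or BULK_LOGGED).
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

-- Autogrowth settings per file: growth = 0 means the file does not autogrow;
-- otherwise growth is a percentage (is_percent_growth = 1) or a number of 8 KB pages.
SELECT name, type_desc, size, growth, is_percent_growth
FROM sys.database_files;
```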
I'm new to SQL 2008. As I look at Common Table Expressions (the WITH keyword), how does the performance compare to using a temp table? Besides syntax / readability, are there any other benefits to using CTEs?
I have not done extensive measuring, but temp tables are stored in the temp database. CTEs are not, so in most simple cases they should be faster. But in some cases you might create big temp tables and create indexes on them to speed up further calculations. That's not possible with CTEs, so in such cases they are probably slower. As usual, though, I don't think there is a general answer: it always depends on your query and the resulting query plan.
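To illustrate the difference, here is a small T-SQL sketch of the same query written both ways. The table `dbo.Orders` and its columns `CustomerId` and `Total` are hypothetical, as are the CTE, temp table and index names.

```sql
-- CTE: a named subquery that lives only for this statement; it cannot be indexed.
WITH big_orders AS (
    SELECT CustomerId, Total
    FROM dbo.Orders
    WHERE Total > 1000
)
SELECT CustomerId, COUNT(*) AS order_count
FROM big_orders
GROUP BY CustomerId;

-- Temp table: stored in tempdb, can be indexed and reused by later statements.
SELECT CustomerId, Total
INTO #big_orders
FROM dbo.Orders
WHERE Total > 1000;

CREATE CLUSTERED INDEX ix_big_orders_customer ON #big_orders (CustomerId);

SELECT CustomerId, COUNT(*) AS order_count
FROM #big_orders
GROUP BY CustomerId;

DROP TABLE #big_orders;
```

For a single pass like this the CTE is usually the simpler choice. The temp table tends to pay off only when the intermediate result is large and read several times, which is best confirmed by comparing the actual query plans.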