I have a Drupal 6 application that requires more joins than MySQL's 61-table join limit allows. I understand that this is an excessive number, but the query is run only once a day, and the results are cached for further reference.
Are there any MySQL configuration parameters that could help, or any other approaches short of changing the logic behind collecting the data?
My approach would be to split the humongous query into smaller, simpler queries, and use temporary tables to store the intermediate steps. I use this approach frequently and it helps me a lot (sometimes it is even faster to create some temp tables than to join all the tables in one big query).
Something like this:
drop table if exists temp_step01;
create temporary table temp_step01
select t1.*, t2.someField
from table1 as t1 inner join table2 as t2 on t1.id = t2.table1_id;
-- Add the appropriate indexes to optimize the subsequent queries
alter table temp_step01
add index idx_1 (field1);
-- Create all the temp tables that you need, and finally show the results
select sXX.*
from temp_stepXX as sXX;
Remember: Temporary tables are visible only to the connection that creates them. If you need to make the result visible to other connections, you'll need to create a "real" table (of course, that is only worthwhile for the last step of your process).
Related
I am trying to inner join multiple tables (table_A, table_B and table_C) with a table_X. table_X is selected from another table (table_Y) using LIKE, and it takes a very long time to create. How do I do the task efficiently?
Currently, I run the following query for table_A, and then repeat the process for table_B and table_C.
SELECT * FROM
Table_A INNER JOIN
(SELECT ID FROM table_Y where ID LIKE "%keyword%") as table_X
USING (ID)
Since table_X takes a lot of time to create, I would like to select from table_A, table_B and table_C in one query. How do I do it?
Several things to note:
My expected result is three separate tables, not one combined table.
I do not have permission to create a temporary table in the database.
A query returns a result set, not a table, and a query can only return a single result set.
You will need three separate queries to get your desired results.
If your core goal is to reduce the cost of your table_X subquery, you could create a temporary table to store the results of the table_X subquery, and then join to that table in your queries against table_A, table_B, and table_C.
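As a rough sketch of that idea (assuming you do have permission to create temporary tables, and using tmp_table_x as a made-up name), the expensive LIKE scan runs once and each of the three queries joins to its stored result:

```sql
-- Run the expensive scan once and keep the matching IDs.
-- tmp_table_x is a hypothetical name, not something from the question.
DROP TEMPORARY TABLE IF EXISTS tmp_table_x;
CREATE TEMPORARY TABLE tmp_table_x AS
SELECT ID FROM table_Y WHERE ID LIKE '%keyword%';

-- Three separate queries, three separate result sets.
-- Each query references the temporary table only once.
SELECT a.* FROM table_A AS a INNER JOIN tmp_table_x AS x USING (ID);
SELECT b.* FROM table_B AS b INNER JOIN tmp_table_x AS x USING (ID);
SELECT c.* FROM table_C AS c INNER JOIN tmp_table_x AS x USING (ID);

-- Clean up explicitly in case the connection is reused.
DROP TEMPORARY TABLE IF EXISTS tmp_table_x;
```

All statements have to run on the same connection, since that is the only place the temporary table exists.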
Edit, things to keep in mind:
True TEMPORARY tables are visible only to the connection in which they are created, and are dropped automatically when the connection is closed. They will persist, however, if the connection is reused (from a connection pool, for example), so it is still good practice to drop them explicitly. True temporary tables also have limits on how they can be used, most noticeably that they can be referenced only once in any given query (no self joins, no joins to multiple references, and no unions in which multiple parts reference the same table).
Assuming you have the proper permissions, you can create normal tables that you intend to drop when finished; but care must be taken because such tables can generally be seen by all connections and a disconnect will not "clean up" such tables. They can perform better, and do not have the limitations of true temporary tables, but you need to weigh the risks vs the benefits.
If you do not have any create table permissions, most of your data processing is happening client side, and you do not expect enormous results from the costly subquery, you could collect the subquery results first and use them in dynamic construction of the later queries.
very pseudo code:
query: SELECT ID FROM table_Y WHERE [expensive condition(s)];
code: convert ID values received into a comma separated list
query: SELECT [stuff] FROM Table_A WHERE ID IN ([ID values from expensive query]);
query: SELECT [other_stuff] FROM Table_B WHERE ID IN ([ID values from expensive query]);
query: SELECT [more_stuff] FROM Table_C WHERE ID IN ([ID values from expensive query]);
I create a temporary table variable (#table) and then inner join it with another table. It takes a long time to display the result. When I try again with #table, it's normal. What's wrong with it?
If you store too much data in a temp table or a table variable, performance suffers. Table variables offer only limited indexing, and with temp tables developers often forget to add proper indexes, so the join ends up scanning the complete table, which slows the query.
Another important point: avoid joins on varchar columns.
I have two tables T1 and T2 and want to update one field of T1 from T2 where T2 holds massive data.
What is more efficient?
Updating T1 in a for loop iteration over the values
or
Left join it with T2 and update.
Please note that I'm updating these tables from a shell script.
In general, the JOIN will perform much better than a loop. The size should not be an issue if the tables are properly indexed.
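A minimal sketch of the join-based update in MySQL's multi-table UPDATE syntax follows. The column names (id, t1_id, some_field) are assumptions, since the question does not give the schema:

```sql
-- Copy some_field from T2 into the matching rows of T1 in one statement.
-- id, t1_id and some_field are hypothetical column names.
UPDATE T1
INNER JOIN T2 ON T2.t1_id = T1.id
SET T1.some_field = T2.some_field;
```

An INNER JOIN leaves T1 rows without a match in T2 untouched; with a LEFT JOIN those rows would have the field set to NULL. An index on the T2 join column (t1_id here) is what keeps the lookup cheap.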
There is no simple answer as to which will be more efficient; it depends on the table size and on how much data you are going to update in one go.
Suppose you are using the InnoDB engine and frequently update 1,000 or more rows in one go through a join of two heavy tables. That is not a good idea on a production server, because the update will hold locks on your table for some time and other operations on the server can be affected by that locking.
Option 1: If you are updating a few rows based on properly indexed fields (preferably the primary key), you can go with the join.
Option 2: If you are updating a large amount of data based on a join of multiple tables, the option below will be better:
Step 1: Create a stored procedure.
Step 2: Keep the results of the query below in a cursor.
Suppose you want to copy the corresponding field2 data of table2 into field1 of table1:
SELECT a.primary_key, b.field2 FROM table1 a JOIN table2 b ON a.primary_key = b.foreign_key WHERE [place condition here if any...];
Step 3: Update the rows one by one, based on the primary key, using the values stored in the cursor.
Step 4: Call this stored procedure from your script.
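The four steps could look roughly like this. It is only a sketch: the procedure name and the column types (INT, VARCHAR(255)) are assumptions, the optional WHERE condition is left out, and DELIMITER is a command of the mysql command-line client rather than part of SQL itself.

```sql
DELIMITER //

-- Step 1: the stored procedure (update_field1_from_table2 is a hypothetical name).
CREATE PROCEDURE update_field1_from_table2()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_pk INT;                -- assumed type of table1.primary_key
    DECLARE v_field2 VARCHAR(255);   -- assumed type of table2.field2

    -- Step 2: the join result is held in a cursor.
    DECLARE cur CURSOR FOR
        SELECT a.primary_key, b.field2
        FROM table1 a
        JOIN table2 b ON a.primary_key = b.foreign_key;

    -- FETCH raises NOT FOUND when the cursor is exhausted.
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur;

    read_loop: LOOP
        FETCH cur INTO v_pk, v_field2;
        IF done = 1 THEN
            LEAVE read_loop;
        END IF;

        -- Step 3: one small update per row, keyed on the primary key.
        UPDATE table1
        SET field1 = v_field2
        WHERE primary_key = v_pk;
    END LOOP;

    CLOSE cur;
END //

DELIMITER ;

-- Step 4: call it from the script.
CALL update_field1_from_table2();
```

Each single-row update holds its locks only briefly when autocommit is on, which is the point of this approach; the price is that the whole run takes longer than one joined UPDATE.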
I searched the Internet and Stack Overflow for my problem, but couldn't find a good solution.
I have a table (MySQL, MyISAM) containing 300,000 rows (one column is a blob field).
I must use:
DELETE FROM tablename WHERE id IN (1,4,7,88,568,.......)
There are nearly 30,000 IDs in the IN list.
It takes nearly 1 hour. It also does not make the .MYD file smaller, even though I delete 10% of the rows, so I run the OPTIMIZE TABLE command, which also takes a long time (I have to run it, because disk space matters to me).
What's a way to improve performance when deleting the data as above and recovering the space? (Increasing a buffer size? Which one? Or something else?)
With IN, MySQL will scan all the rows in the table and match the record against the IN clause. The list of IN predicates will be sorted, and all 300,000 rows in the database will get a binary search against 30,000 ids.
If you do this with JOIN on a temporary table (no indexes on a temp table), assuming id is indexed, the database will do 30,000 binary lookups on a 300,000 record index.
So, 300,000 binary searches against 30,000 records, or 30,000 binary searches against 300,000 records... which is faster? The second one is faster, by far.
Also, delaying the index rebuilding with DELETE QUICK will result in much faster deletes. All records will simply be marked deleted, both in the data file and in the index, and the index will not be rebuilt.
Then, to recover space and rebuild the indexes at a later time, run OPTIMIZE TABLE.
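Put together, the temporary-table join and DELETE QUICK might look like the sketch below. The name ids_to_delete and the INT column type are assumptions, and only the five IDs shown in the question are inserted; the rest of the list would be loaded the same way.

```sql
-- Hypothetical helper table holding the IDs to remove.
CREATE TEMPORARY TABLE ids_to_delete (id INT NOT NULL);

-- Load the IDs (only the ones shown in the question are listed here).
INSERT INTO ids_to_delete (id) VALUES (1), (4), (7), (88), (568);

-- Multi-table DELETE: rows of tablename with a matching id are removed.
-- QUICK asks MyISAM not to merge index leaves during the delete.
DELETE QUICK t
FROM tablename AS t
INNER JOIN ids_to_delete AS d ON d.id = t.id;

-- Later, at a quiet time: reclaim the space and rebuild the indexes.
OPTIMIZE TABLE tablename;

DROP TEMPORARY TABLE IF EXISTS ids_to_delete;
```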
The size of the list in your IN() statement may be the cause. You could add the IDs to a temporary table and join to do the deletes. Also, as you are using MyISAM you can use the DELETE QUICK option to avoid the index hit whilst deleting:
For MyISAM tables, if you use the QUICK keyword, the storage engine
does not merge index leaves during delete, which may speed up some
kinds of delete operations.
I think the best approach to make it faster is to create a new table, insert into it the rows that you don't want to delete, and then replace the original table with the new one.
Something like this:
CREATE TABLE NewTable LIKE My_Table;
INSERT INTO NewTable SELECT * FROM My_Table WHERE ... ;
Then you can use RENAME TABLE to rename the copy to the original name
RENAME TABLE My_Table TO My_Table_old, NewTable TO My_Table ;
And then finally drop the original table
DROP TABLE My_Table_old;
Try this:
Create a table named temptable with a single column, id.
Insert the IDs into that table: 1, 4, 7, 88, 568, ......
Use a delete join, something like:
DELETE a FROM originaltable AS a INNER JOIN temptable AS b ON a.id = b.id;
It's just an idea. The query is not tested, so check the syntax before running it.
I have a large table with what are essentially log entries. For most of my queries, I need a table with the most recent entries, so I created a 'view' from the following query:
SELECT t1.store_id, t1.code_id, t1.working, t1.expiration, t1.details, t1.price
FROM code_stores t1
LEFT OUTER JOIN code_stores t2
ON (t1.store_id = t2.store_id AND t1.code_id = t2.code_id AND t1.id < t2.id)
WHERE t2.store_id IS NULL
Then I use this 'view' in my query. Unfortunately this is leading to slow queries, so I'd like to cache the results of this view somehow. Since this view will only change a few times a day (when I can run a query to update the cache table), I would like to create a temporary table that stores the results of this view, and update this throughout the day.
How do I go about doing this? I read about materialized views, but it appears that they don't work with MySQL. More help would be greatly appreciated.
Following your idea of using a temporary table, I suggest using a trigger so that the table is updated each time the code_stores table changes.
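One way this could look is sketched below. It rests on several assumptions that are not in the question: the cache is a regular table rather than a TEMPORARY one (so other connections can read it), rows are only ever inserted into code_stores, a newly inserted row always has the highest id for its (store_id, code_id) pair, and neither of those two columns is NULL. The names code_stores_latest, uq_store_code and code_stores_after_insert are made up.

```sql
-- One-time setup: fill the cache from the existing "latest entry" query.
CREATE TABLE code_stores_latest AS
SELECT t1.store_id, t1.code_id, t1.working, t1.expiration, t1.details, t1.price
FROM code_stores t1
LEFT OUTER JOIN code_stores t2
  ON (t1.store_id = t2.store_id AND t1.code_id = t2.code_id AND t1.id < t2.id)
WHERE t2.store_id IS NULL;

-- One cached row per (store_id, code_id); REPLACE relies on this key.
ALTER TABLE code_stores_latest
  ADD UNIQUE KEY uq_store_code (store_id, code_id);

-- Keep the cache current: each new log entry replaces the cached row
-- for its (store_id, code_id) pair.
CREATE TRIGGER code_stores_after_insert
AFTER INSERT ON code_stores
FOR EACH ROW
  REPLACE INTO code_stores_latest
    (store_id, code_id, working, expiration, details, price)
  VALUES
    (NEW.store_id, NEW.code_id, NEW.working, NEW.expiration, NEW.details, NEW.price);
```

Queries that used the view would then read from code_stores_latest instead. If rows in code_stores can also be updated or deleted, this trigger alone is not enough and the cache would need additional handling.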