I am trying to inner join multiple tables (table_A, table_B and table_C) with a table_X. table_X is selected from another table (table_Y) using LIKE, and it takes a very long time to create. How can I do this efficiently?
Currently, I run the following query for table_A, and then repeat the process for table_B and table_C.
SELECT * FROM
Table_A INNER JOIN
(SELECT ID FROM table_Y where ID LIKE "%keyword%") as table_X
USING (ID)
Since table_X takes a lot of time to create, I would like to select from table_A, table_B and table_C in one query. How do I do it?
Several things to note:
My expected result is three separate tables, not one combined table.
I do not have permission to create a temporary table in the database.
A query returns a result set, not a table, and a query can only return a single result set.
You will need three separate queries to get your desired results.
If your core goal is to reduce the cost of your table_X subquery, you could create a temporary table to store the results of the table_X subquery, and then join to that table in your queries against table_A, table_B, and table_C.
Edit, things to keep in mind:
True TEMPORARY tables are visible only to the connection in which they are created, and are automatically dropped when that connection is closed. They will persist, however, if the connection is reused (from a connection pool, for example), so it is still good practice to drop them explicitly. True temporary tables also have limits on how they can be used, most noticeably that they can only be referenced once in any given query (no self joins, no joins to multiple references, and no unions in which several parts reference the same table).
Assuming you have the proper permissions, you can create normal tables that you intend to drop when finished; but care must be taken because such tables can generally be seen by all connections and a disconnect will not "clean up" such tables. They can perform better, and do not have the limitations of true temporary tables, but you need to weigh the risks vs the benefits.
If you do not have any create table permissions, most of your data processing is happening client side, and you do not expect enormous results from the costly subquery, you could collect the subquery results first and use them in dynamic construction of the later queries.
very pseudo code:
query: SELECT ID FROM table_Y WHERE [expensive condition(s)];
code: convert ID values received into a comma separated list
query: SELECT [stuff] FROM Table_A WHERE ID IN ([ID values from expensive query]);
query: SELECT [other_stuff] FROM Table_B WHERE ID IN ([ID values from expensive query]);
query: SELECT [more_stuff] FROM Table_C WHERE ID IN ([ID values from expensive query]);
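The pseudo code above could be sketched in Python roughly as follows. This is only an illustration: it uses the standard-library sqlite3 module as a stand-in for a MySQL connection, the sample rows and the helper names fetch_matching_ids and select_by_ids are made up, and it binds the IDs as parameters instead of pasting a comma separated string into the SQL text. With a very large ID list you may run into the driver's or server's limit on statement size or parameter count.

```python
import sqlite3


def fetch_matching_ids(conn, keyword):
    """Run the expensive LIKE filter once and return the matching IDs."""
    rows = conn.execute(
        "SELECT ID FROM table_Y WHERE ID LIKE ?", (f"%{keyword}%",)
    ).fetchall()
    return [row[0] for row in rows]


def select_by_ids(conn, table, ids):
    """Select the rows of one table whose ID is in the precomputed list."""
    # Table names cannot be bound as parameters, so check against a whitelist.
    if table not in {"table_A", "table_B", "table_C"}:
        raise ValueError(f"unexpected table: {table}")
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT * FROM {table} WHERE ID IN ({placeholders})"
    return conn.execute(sql, ids).fetchall()


# Hypothetical sample data, only so that the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_Y (ID TEXT);
    CREATE TABLE table_A (ID TEXT, a TEXT);
    CREATE TABLE table_B (ID TEXT, b TEXT);
    CREATE TABLE table_C (ID TEXT, c TEXT);
    INSERT INTO table_Y VALUES ('x-keyword-1'), ('other'), ('keyword-2');
    INSERT INTO table_A VALUES ('x-keyword-1', 'a1'), ('other', 'a2');
    INSERT INTO table_B VALUES ('keyword-2', 'b1');
    INSERT INTO table_C VALUES ('other', 'c1');
""")

# The expensive query runs once; its result is reused for all three tables.
ids = fetch_matching_ids(conn, "keyword")
results = {t: select_by_ids(conn, t, ids) for t in ("table_A", "table_B", "table_C")}
```

Each entry of results is a separate result set, which matches the requirement of three separate outputs rather than one combined table.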
Related
I have a use case where I have a customer table which joins with a couple of other tables. It's a many-to-one relationship, so joining with these two tables results in 100+ rows for each customer. The SELECT statement with over 10,000 customers doesn't perform well.
I was wondering if there is a way to loop over each customer, append the result of the SELECT statement and then provide the output in a MySQL stored procedure. This can be done using a CURSOR and a temporary table but is there a way to do it without a temporary table?
I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
Data in it looks like this:
I need to delete all "deletable" record pairs. An example is given in red.
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (it can only have the values 0 and 1, so one record must have 0 and the other must have 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or should I forget about a delete query and instead create a new table containing all the non-deletable records, and then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the logic on the rows to keep is:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
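To see the rebuild approach end to end, here is a small sketch using Python's standard-library sqlite3 module as a stand-in for MySQL. The sample rows and the reduced column list are invented for illustration. SQLite has no TRUNCATE TABLE, so the sketch uses DELETE FROM where MySQL would use TRUNCATE, and the final DROP of the helper table is an extra cleanup step that is not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registration (txid TEXT, vinvoutnr INTEGER, vin INTEGER, netvalue REAL);
    CREATE INDEX idx_reg ON registration (txid, vinvoutnr, vin);
    INSERT INTO registration VALUES
        ('tx1', 0, 0,  1.0),   -- pairs with the next row: deletable
        ('tx1', 0, 1, -1.0),
        ('tx1', 1, 0,  2.0),   -- no partner with vin = 1: keep
        ('tx2', 0, 1,  3.0);   -- no partner with vin = 0: keep
""")

conn.executescript("""
    -- 1. Copy the rows to keep into a helper table.
    CREATE TABLE temp_registration AS
    SELECT r.*
    FROM registration r
    WHERE NOT EXISTS (
        SELECT 1
        FROM registration r2
        WHERE r2.txid = r.txid
          AND r2.vinvoutnr = r.vinvoutnr
          AND r2.vin <> r.vin
    );

    -- 2. Empty the original table (TRUNCATE TABLE registration in MySQL).
    DELETE FROM registration;

    -- 3. Re-populate it from the helper table, then clean up.
    INSERT INTO registration SELECT * FROM temp_registration;
    DROP TABLE temp_registration;
""")

kept = conn.execute(
    "SELECT txid, vinvoutnr, vin FROM registration ORDER BY txid, vinvoutnr, vin"
).fetchall()
```

With the sample data, only the two unpaired rows remain in registration.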
Given that you expect to remove the majority of your data it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.
I have two tables T1 and T2 and want to update one field of T1 from T2 where T2 holds massive data.
What is more efficient?
Updating T1 in a for loop iteration over the values
or
Left join it with T2 and update.
Please note that I'm updating these tables from a shell script.
In general, a JOIN will perform much better than a loop. The size should not be an issue if the tables are properly indexed.
There is no simple answer as to which will be more efficient; it depends on the table size and on how much data you are going to update in one go.
Suppose you are using the InnoDB engine and frequently update 1,000 or more rows in one go through a join of two heavy tables. That is not a good idea on a production server, because it will lock your table for some time, and this locking can also affect other operations on that server.
Option 1: If you are updating only a few rows, based on properly indexed fields (preferably the primary key), then you can go with the join.
Option 2: If you are updating a large amount of data based on a join of multiple tables, then the following approach will be better:
Step1: Create a stored procedure.
Step2: Keep below query results in a cursor.
Suppose you want to update field1 of table1 with the corresponding field2 data of table2:
SELECT a.primary_key,b.field2 FROM table1 a JOIN table2 b ON a.primary_key=b.foreign_key WHERE [place CONDITION here IF any...];
Step3: Now update all rows one by one based on primary key using stored values in cursor.
Step4: You can call this stored procedure from your script.
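The same idea can also be driven from a client script instead of a stored procedure. The sketch below is an assumption-laden illustration, not the stored procedure itself: it uses Python's standard-library sqlite3 module as a stand-in for MySQL, invents a few sample rows, and commits in batches (BATCH_SIZE is a hypothetical tuning knob) so that no single transaction holds its locks for long.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (primary_key INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE table2 (foreign_key INTEGER, field2 TEXT);
    INSERT INTO table1 VALUES (1, NULL), (2, NULL), (3, 'unchanged');
    INSERT INTO table2 VALUES (1, 'one'), (2, 'two');
""")

# Step 2: collect (primary key, new value) pairs with a single join.
pairs = conn.execute("""
    SELECT a.primary_key, b.field2
    FROM table1 a
    JOIN table2 b ON a.primary_key = b.foreign_key
""").fetchall()

# Step 3: update row by row via the primary key, committing after each batch.
BATCH_SIZE = 1000  # hypothetical value; tune for your workload
for start in range(0, len(pairs), BATCH_SIZE):
    batch = pairs[start:start + BATCH_SIZE]
    conn.executemany(
        "UPDATE table1 SET field1 = ? WHERE primary_key = ?",
        [(value, key) for key, value in batch],
    )
    conn.commit()

rows = conn.execute(
    "SELECT primary_key, field1 FROM table1 ORDER BY primary_key"
).fetchall()
```

Rows of table1 without a match in table2 are left untouched, because the inner join never returns them.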
I have a Drupal 6 application that requires more joins than MySQL's 61-table join limit allows. I understand that this is an excessive number, but the query is run only once a day, and the results are cached for further reference.
Are there any MySQL configuration parameters that could be of help, or any other approaches short of changing the logic behind collecting the data?
My approach would be to split the humongous query into smaller, simpler queries, and use temporary tables to store the intermediate steps. I use this approach frequently and it helps me a lot (sometimes it is even faster to create some temp tables than to join all the tables in one big query).
Something like this:
drop table if exists temp_step01;
create temporary table temp_step01
select t1.*, t2.someField
from table1 as t1 inner join table2 as t2 on t1.id = t2.table1_id;
-- Add the appropriate indexes to optimize the subsequent queries
alter table temp_step01
add index idx_1 (field1);
-- Create all the temp tables that you need, and finally show the results
select sXX.*
from temp_stepXX as sXX;
Remember: Temporary tables are visible only to the connection that creates them. If you need to make the result visible to other connections, you'll need to create a "real" table (of course, that is only worth doing for the last step of your process).
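As a runnable illustration of chaining two such steps, here is a sketch using Python's standard-library sqlite3 module in place of MySQL. The table table3, its label column, and the sample rows are hypothetical. SQLite spells the statements slightly differently (CREATE TEMPORARY TABLE ... AS SELECT, and CREATE INDEX instead of ALTER TABLE ... ADD INDEX), but the staging idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, field1 TEXT);
    CREATE TABLE table2 (table1_id INTEGER, someField TEXT);
    CREATE TABLE table3 (field1 TEXT, label TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table3 VALUES ('a', 'alpha'), ('b', 'beta');

    -- Step 1: materialize the first join.
    DROP TABLE IF EXISTS temp_step01;
    CREATE TEMPORARY TABLE temp_step01 AS
        SELECT t1.*, t2.someField
        FROM table1 AS t1
        INNER JOIN table2 AS t2 ON t1.id = t2.table1_id;

    -- Index the column that the next step joins on.
    CREATE INDEX idx_1 ON temp_step01 (field1);

    -- Step 2: join the intermediate result with the next table.
    DROP TABLE IF EXISTS temp_step02;
    CREATE TEMPORARY TABLE temp_step02 AS
        SELECT s1.id, s1.someField, t3.label
        FROM temp_step01 AS s1
        INNER JOIN table3 AS t3 ON s1.field1 = t3.field1;
""")

# Final step: read the results from the last temporary table.
result = conn.execute("SELECT * FROM temp_step02 ORDER BY id").fetchall()
```

Each step joins only a few tables, so no single statement comes near the join limit.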
I have a left join query that shows all the fields from a primary table (tblMarkers) and the values from a second table (tblLocations) where there is a matching record.
tblLocations does not have a record for every id in tblMarkers
$query ="SELECT `tblMarkers`.*,`tblLocation`.*,`tblLocation`.`ID` AS `markerID`
FROM
`tblMarkers`
LEFT JOIN `tblLocation` ON `tblMarkers`.`ID` = `tblLocation`.`ID`
WHERE
`tblMarkers`.`ID` = $id";
I am comfortable with using UPDATE to update the tblMarkers fields, but how do I update a record in tblLocations, or INSERT one if it does not exist there yet?
Also, how do I lock the record I am working on to prevent someone else from doing an update at the same time?
Can I also use UPDATE tblMarkers * or do I have to list every field in the UPDATE statement?
Unfortunately, you might have to implement some validation in your outside script. There is an IF statement in SQL, but I'm not sure if you can trigger different commands based on its outcome.
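A minimal sketch of what that validation in the outside script might look like, written in Python with the standard-library sqlite3 module standing in for MySQL. The columns lat and lng and the function name save_location are hypothetical, since the question does not list the fields of tblLocation. Note that another connection could still insert the same ID between the check and the write, so this is not a complete answer to the locking question.

```python
import sqlite3


def save_location(conn, marker_id, lat, lng):
    """Update the tblLocation row for marker_id, or insert it if it is missing."""
    with conn:  # commits on success, rolls back if an exception is raised
        exists = conn.execute(
            "SELECT 1 FROM tblLocation WHERE ID = ?", (marker_id,)
        ).fetchone()
        if exists:
            conn.execute(
                "UPDATE tblLocation SET lat = ?, lng = ? WHERE ID = ?",
                (lat, lng, marker_id),
            )
        else:
            conn.execute(
                "INSERT INTO tblLocation (ID, lat, lng) VALUES (?, ?, ?)",
                (marker_id, lat, lng),
            )


# Hypothetical schema and calls, only to exercise both branches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblLocation (ID INTEGER PRIMARY KEY, lat REAL, lng REAL)")
save_location(conn, 7, 1.0, 2.0)   # no row yet: takes the INSERT branch
save_location(conn, 7, 3.0, 4.0)   # row exists: takes the UPDATE branch
rows = conn.execute("SELECT ID, lat, lng FROM tblLocation").fetchall()
```

After both calls there is still a single row for the marker, holding the values from the second call.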
Locking
In terms of locking, you have two options. For MyISAM tables, you can only lock the entire table, using LOCK TABLES: http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
LOCK TABLES users WRITE;
For InnoDB tables, there is no explicit 'lock' statement for single rows; however, you can use transactions (for example, SELECT ... FOR UPDATE inside a transaction) to get exclusive rights to the rows during the operation. http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html
Update
There might be some shorthand notation, but I think you have to list every field in your query. Alternatively, you can always read the entire row, delete it and insert again using shorthand INSERT query. It all depends on how many fields you've got.