I have two tables with the following structure and example content. Table one has membership_no set to the correct values, but table two has some incorrect values in its membership_no column. I need to query both tables, find the rows where the membership_no values are not equal, and update table two's membership_no column with the value from table one.
Table One:
id membership_no
====================
800960 800960
800965 800965
Table Two:
id membership_no
====================
800960 800970
800965 800975
My update query so far; it is not catching all of the incorrect values from table two.
UPDATE
tabletwo
INNER JOIN
tableone ON tabletwo.id = tableone.id
SET
tabletwo.membership_no = tableone.membership_no;
EDIT: Including SHOW CREATE and SELECT queries for unmatched membership_no column values.
Table One SHOW:
CREATE TABLE `n2z7m3_kiduka_accounts_j15` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`membership_no` int(11) NOT NULL,
...
`membershipyear` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=800987 DEFAULT CHARSET=utf8
Table Two SHOW:
CREATE TABLE `n2z7m3_kiduka_accounts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`membership_no` int(11) NOT NULL,
...
`membershipyear` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `user_id` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=801072 DEFAULT CHARSET=utf8
SELECT query for unmatched membership_no column values:
SELECT
u.name,
a.membership_no as 'Joomla 1.5 accounts table',
j.membership_no as 'Joomla 3.0 accounts table'
FROM
n2z7m3_kiduka_accounts_j15 AS a
INNER JOIN n2z7m3_users AS u ON a.user_id = u.id
INNER JOIN n2z7m3_kiduka_accounts AS j ON a.user_id = j.membership_no
and a.membership_no != j.membership_no
ORDER BY u.name;
While Tim's answer is perfectly valid, another variation is to move the filter condition into the ON clause:
UPDATE tabletwo
INNER JOIN
tableone ON tabletwo.id = tableone.id AND tabletwo.membership_no <> tableone.membership_no
SET
tabletwo.membership_no = tableone.membership_no;
This means you don't have a WHERE filter, so the join is evaluated for all rows, but it acts only on those with differing membership_no values. Because it is an INNER JOIN, a row is updated only when both tables match; non-matching rows are skipped.
EDIT:
If you suspect you still have a problem, what does the MySQL command return; do you have a specific error notice? With 80k rows, the command may take a while to actually process, so are you giving it time to complete, or is PHP or the system aborting it because the execution time expired? (Increase the execution time limits in PHP and MySQL and rerun the query, just to see whether that lets it complete successfully.)
Suggestion
As another suggestion, I think your UNIQUE KEY should also be your AUTO_INCREMENT key, so for both tables:
DROP INDEX `user_id` ON <table>; # removes the current unique index
then
CREATE UNIQUE INDEX `id` ON <table> (`id`); # adds a unique index to the AUTO_INCREMENT column
You just need to add a WHERE clause:
UPDATE
tabletwo
INNER JOIN
tableone
ON tabletwo.id = tableone.id
SET
tabletwo.membership_no = tableone.membership_no
WHERE tabletwo.membership_no <> tableone.membership_no
Related
I am trying to update one table based on another in the most efficient way.
Here is the table DDL of what I am trying to update
Table1
CREATE TABLE `customersPrimary` (
`id` int NOT NULL AUTO_INCREMENT,
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `groupID-IDInGroup` (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Table2
CREATE TABLE `customersSecondary` (
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Both tables are practically identical, but customersSecondary is by design a staging table for customersPrimary. The big difference is the primary keys: table 1 has an auto-incrementing primary key, table 2 has a composite primary key.
In both tables the combination of groupID and IDInGroup are unique.
Here is the query I want to optimize
UPDATE customersPrimary
INNER JOIN customersSecondary ON
(customersPrimary.groupID = customersSecondary.groupID
AND customersPrimary.IDInGroup = customersSecondary.IDInGroup)
SET
customersPrimary.name = customersSecondary.name,
customersPrimary.address = customersSecondary.address
This query works but scans EVERY row in customersSecondary.
Adding
WHERE customersPrimary.groupID = (groupID)
cuts it down significantly, to the number of rows in customersSecondary with that groupID. But this is still often far larger than the number of rows actually being updated, since a single groupID can cover many rows. I think the WHERE needs improvement.
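Spelled out, the filtered version is a sketch like this (with :group_id standing in for the group currently being synced):
UPDATE customersPrimary
INNER JOIN customersSecondary ON
    (customersPrimary.groupID = customersSecondary.groupID
    AND customersPrimary.IDInGroup = customersSecondary.IDInGroup)
SET
    customersPrimary.name = customersSecondary.name,
    customersPrimary.address = customersSecondary.address
WHERE customersPrimary.groupID = :group_id; -- placeholder for the group being synced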
I can control table structure and add indexes. I will have to keep both tables.
Any suggestions would be helpful.
Your existing query requires a full table scan because you are saying update everything on the left based on the value on the right. Presumably the optimiser is choosing customersSecondary because it has fewer rows, or at least it thinks it does.
Is the full table scan causing you problems? Locking? Too slow? How long does it take? How frequently are the tables synced? How many records are there in each table? What is the rate of change in each of the tables?
You could add separate indices on name and address but that will take a good chunk of space. The better option is going to be to add an indexed updatedAt column and use that to track which records have been changed.
ALTER TABLE `customersPrimary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT '2000-01-01 00:00:00',
ADD INDEX `idx_customer_primary_updated` (`updatedAt`);
ALTER TABLE `customersSecondary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX `idx_customer_secondary_updated` (`updatedAt`);
And then you can add updatedAt to your join criteria and the WHERE clause -
UPDATE customersPrimary cp
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > :last_query_run_time;
For :last_query_run_time you could use the last run time if you are storing it. Otherwise, if you know you are running the query every hour you could use NOW() - INTERVAL 65 MINUTE. Notice I have used more than one hour to make sure records aren't missed if there is a slight delay for some reason. Another option would be to use SELECT MAX(updatedAt) FROM customersPrimary -
UPDATE customersPrimary cp
INNER JOIN (SELECT MAX(updatedAt) maxUpdatedAt FROM customersPrimary) t
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > t.maxUpdatedAt;
Plan A:
Something like this would first find the "new" rows, then add only those:
INSERT INTO primary (...)
    SELECT ...
        FROM secondary
        LEFT JOIN primary ON ...
        WHERE primary... IS NULL;
Might secondary have changes? If so, a variant of that would work.
Plan B:
Better yet is to TRUNCATE TABLE secondary after it is folded into primary.
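A minimal sketch of that fold-then-truncate step, assuming the customersPrimary/customersSecondary schemas from this question:
-- Fold staging rows into the primary table (insert new rows, update existing ones),
-- then clear the staging table.
INSERT INTO customersPrimary (groupID, IDInGroup, name, address)
SELECT groupID, IDInGroup, name, address
FROM customersSecondary
ON DUPLICATE KEY UPDATE      -- relies on the unique (groupID, IDInGroup) key
    name = VALUES(name),
    address = VALUES(address);

TRUNCATE TABLE customersSecondary;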
I have a MySQL table (TABLE1) with 400 thousand records
CREATE TABLE `TABLE1` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`NAME` varchar(255) NOT NULL,
`VALUE` varchar(255) NOT NULL,
`UID` varchar(255) NOT NULL,
`USER_ID` varchar(255) DEFAULT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `ukey1` (`VALUE`,`NAME`,`UID`),
UNIQUE KEY `ukey2` (`UID`,`NAME`,`VALUE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `TABLE2` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`UID` varchar(255) DEFAULT NULL,
`TABLE3ID` bigint(20) NOT NULL,
PRIMARY KEY (`ID`),
KEY `FKEY` (`TABLE3ID`),
CONSTRAINT `FKEY` FOREIGN KEY (`TABLE3ID`) REFERENCES `TABLE3` (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `TABLE3` (
`ID` bigint(20) NOT NULL AUTO_INCREMENT,
`TYPEID` bigint(20) NOT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The following query is very slow and takes hours and finally fails
delete t1 from TABLE1 t1
inner join TABLE2 t2 on t1.UID=t2.UID
inner join TABLE3 t3 on t2.TABLE3ID=t3.ID
where t3.TYPEID in (234,3434) and t1.USER_ID is not null and t1.USER_ID <> '12345';
Visual EXPLAIN shows the following (screenshot omitted), and adding an index on UID is not helping. How can I optimize the performance of this query?
Things I have tried:
- adding an index on TABLE1.UID
- converting the query into a subquery
Even a simple query like SELECT * FROM TABLE3 where UID="SOMEUID" takes 800+ ms to fetch data.
Change it to a JOIN.
DELETE t1
FROM TABLE1 AS t1
JOIN (SELECT uid FROM ...) AS t2 ON t1.uid = t2.uid
WHERE USER_ID is not null and USER_ID <> '12345';
I've found that MySQL implements WHERE uid IN (subquery) very poorly sometimes. Instead of getting all the results of the subquery and looking them up in the index of the table, it scans the table and performs the subquery for each row, then checks if the uid is in that result.
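To make that concrete, here is a sketch that fills the subquery with the joins from the question (the shape is what matters, not these exact names):
DELETE t1
FROM TABLE1 AS t1
JOIN (SELECT DISTINCT t2.UID          -- DISTINCT so each UID matches once
      FROM TABLE2 t2
      JOIN TABLE3 t3 ON t2.TABLE3ID = t3.ID
      WHERE t3.TYPEID IN (234, 3434)) AS matches
  ON t1.UID = matches.UID
WHERE t1.USER_ID IS NOT NULL
  AND t1.USER_ID <> '12345';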
First of all, make a backup of that table; that is the first rule before running delete queries, or you can ruin it. Take all the precautions you consider necessary beforehand.
( uid1,uid2,...uid45000)
What is the meaning of those values between the parentheses? Do you need to compare all the UID values in the list, or only some of them?
Because you can avoid putting all the UIDs in manually, like this (the derived table works around MySQL's restriction on selecting from the table being deleted):
delete from TABLE1 where UID in (SELECT UID FROM (SELECT T.UID FROM TABLE1 as T where T.UID is not NULL and T.USER_ID <> '12345') AS keep_uids);
Before doing this, please check what you want between the parentheses, and run the command in a TEST environment first with dummy values.
Take into consideration that the UID field is a varchar type; that is the reason this operation takes much longer than it would with integer values.
Another way is to create a new table, put into it the data you need to keep from the old table, then truncate the original table and reinsert the values from the new table into the old one.
Please, before running any solution, check all your restrictions with your teammates and test with dummy values.
I would split your UID filter list into chunks (100 per chunk, or another size; you need to test) and iterate or multithread over them.
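One hedged way to batch without maintaining the UID list by hand, reusing the question's criteria (the extra derived table sidesteps MySQL's restriction on selecting from the table being deleted):
-- Rerun until it affects 0 rows; each pass deletes at most 1000 rows.
DELETE FROM TABLE1
WHERE ID IN (
    SELECT ID FROM (
        SELECT t1.ID
        FROM TABLE1 t1
        JOIN TABLE2 t2 ON t1.UID = t2.UID
        JOIN TABLE3 t3 ON t2.TABLE3ID = t3.ID
        WHERE t3.TYPEID IN (234, 3434)
          AND t1.USER_ID IS NOT NULL
          AND t1.USER_ID <> '12345'
        LIMIT 1000
    ) AS batch
);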
I have two tables, users and points. Currently users has 84,263 rows, while points has 1,636,119 rows. Each user can have zero or more points, and I need to extract each user's most recently created point.
show create table users
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`password` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`remember_token` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
`role` varchar(15) COLLATE utf8_unicode_ci DEFAULT 'consument',
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`updated_at` timestamp NOT NULL DEFAULT current_timestamp(),
`deleted_at` timestamp NULL DEFAULT NULL,
`email_verified_at` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`email_verify_token` text COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `users_email_unique` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=84345 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
show create table points
CREATE TABLE `points` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`tablet_id` int(10) unsigned DEFAULT NULL,
`parent_company` int(10) unsigned NOT NULL,
`company_id` int(10) unsigned NOT NULL,
`points` int(10) unsigned NOT NULL,
`mutation_type` tinyint(3) unsigned NOT NULL,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`updated_at` timestamp NOT NULL DEFAULT current_timestamp(),
PRIMARY KEY (`id`),
KEY `points_user_id_foreign` (`user_id`),
KEY `points_company_id_foreign` (`company_id`),
KEY `points_parent_company_index` (`parent_company`),
KEY `points_tablet_id_index` (`tablet_id`),
KEY `points_mutation_type_company_id_created_at_index` (`mutation_type`,`company_id`,`created_at`),
KEY `created_at_user_id` (`created_at`,`user_id`),
CONSTRAINT `points_company_id_foreign` FOREIGN KEY (`company_id`) REFERENCES `companies` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `points_parent_company_foreign` FOREIGN KEY (`parent_company`) REFERENCES `parent_company` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
CONSTRAINT `points_tablet_id_foreign` FOREIGN KEY (`tablet_id`) REFERENCES `tablets` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT `points_user_id_foreign` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1798627 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Queries I tried, but are taking too long (we're talking in minutes, not seconds):
select
`users`.`id`,
`users`.`email`,
`users`.`role`,
`users`.`created_at`,
`users`.`updated_at`,
max(pt.created_at) as `last_transaction`
from `users`
left join points as pt on pt.user_id = users.id
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
group by users.id
select
`users`.`id`,
`users`.`email`,
`users`.`role`,
`users`.`created_at`,
`users`.`updated_at`,
pt.created_at as `last_transaction`
from `users`
left join (select points.user_id, points.created_at from points order by points.created_at desc) as pt on pt.user_id = users.id
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
group by users.id
Why am I not limiting the results and returning only 100 at a time? Because I am using Yajra DataTables for Laravel, and when I limit the results it returns only those rows and does not recognize that there are more. So instead of 84,263 rows, I get only 100 rows and that's it.
Basically your "users" table has a "role" column. It is not indexed. So your queries are doing full table scan on "users" table which has 84263 rows. One way to to optimize it would be to have an index on "role" column. But I can see "consument" is the default value & you are querying by that value. Now suppose 95% of users are having "consument" role. Then even adding index on "role" won't help much. You would have to add more condition to filter out the query & have an index for that condition.
Your first query is better, as it avoids the unnecessary inner query of the second one.
If you need to return all 84,263 rows, then that is a separate issue. Somehow you would have to introduce pagination, breaking your query into multiple queries. Suppose each call returns 500 users' data, sorted by id. In each subsequent call you ask for the next 500 where id is greater than the last id returned by the previous call (for the very first call, the last id value would be 0). The queries can then use id as an index.
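A sketch of that pattern, with :last_id as a placeholder (0 on the first call):
SELECT id, email, role, created_at, updated_at
FROM users
WHERE role = 'consument'
  AND deleted_at IS NULL
  AND id > :last_id      -- last id returned by the previous call
ORDER BY id
LIMIT 500;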
You can check the query plan using the EXPLAIN keyword to get a better understanding.
Edit
I tried adding an index on role on a users table with 1,000 users and 50,000 points; your first query took ~4 seconds, which is way too long.
So I tried this query, which took ~0.5 second, still too long:
select
`users`.`id`,
`users`.`email`,
`users`.`role`,
`users`.`created_at`,
`users`.`updated_at`,
pt.created_at as `last_transaction`
from `users`
left join points pt on pt.id = (select pt2.id from points pt2 WHERE pt2.user_id = users.id ORDER BY pt2.created_at DESC limit 1)
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
So I added an index on points.created_at, and now the query takes 0.05 second, which is more acceptable.
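For reference, the indexes described above could be created like this (the names are illustrative; a composite index on (user_id, created_at) would serve the correlated subquery even better):
CREATE INDEX idx_users_role ON users (role);
CREATE INDEX idx_points_created_at ON points (created_at);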
It looks like you want a result set with some columns from your users table, and the most recent created_at value from the points table for each user.
So-called compound covering indexes usually help speed these sorts of queries. So, let's start with what you need from points. This subquery gets it.
SELECT user_id, MAX(created_at) last_transaction
FROM points
GROUP BY user_id
This gives you a virtual table with each user_id and the created_at value you want. The following index
CREATE INDEX points_maxcreated ON points (user_id, created_at DESC);
will let MySQL satisfy the subquery with an almost miraculously fast loose index scan.
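You can verify the loose index scan with EXPLAIN; when it kicks in, the Extra column shows "Using index for group-by":
EXPLAIN
SELECT user_id, MAX(created_at) last_transaction
FROM points
GROUP BY user_id;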
Then, let's consider the rest of your query.
select
`users`.`id`,
`users`.`email`,
`users`.`role`,
`users`.`created_at`,
`users`.`updated_at`
from `users`
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
For this you want the following index
CREATE INDEX users_del_role_etc
ON users
(deleted_at, role, id, email, created_at, updated_at);
MySQL can satisfy your query directly from this index. Think of these indexes as being stored in order. MySQL random accesses the index to the first eligible row (null deleted_at, role = 'consument') and then reads the index, not the table, row by row to get the data you want.
Putting it all together, you get
select
`users`.`id`,
`users`.`email`,
`users`.`role`,
`users`.`created_at`,
`users`.`updated_at`,
`subquery`.`last_transaction`
from `users`
left join (
SELECT user_id, MAX(created_at) last_transaction
FROM points
GROUP BY user_id
) subquery ON users.id = subquery.user_id
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
This should be reasonably speedy for the query you gave us. Nevertheless, a query that you expect to return tens of thousands of rows should also be expected to take some time. There's no magic that makes SQL handle very large result sets fast; it's designed to retrieve small result sets fast from vast tables.
With respect, your understanding of how to paginate rows from your result set isn't quite right. It's hard to believe your user will actually examine tens of thousands of rows. Without an ORDER BY operation in your query, LIMIT is a very inexpensive operation. If you need ORDER BY ... LIMIT to paginate your results, ask another question, because that performance can also be managed.
There are some similar questions, but none of them matches my case.
SQL Optimization - Join different tables based on column value
How to JOIN on different tables based on column value
MySQL query to JOIN tables based on column values
MySQL: Use CASE/ELSE value as join parameter
MySQL query where JOIN depends on CASE
https://dba.stackexchange.com/questions/53301/mysql-getting-result-using-3-tables-and-case-statements
I have notifications table with this structure
CREATE TABLE `notifications` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`notification_type_id` int(11) DEFAULT NULL,
`table1_id` int(11) DEFAULT NULL,
`table2_id` int(11) DEFAULT NULL,
`table3_id` int(11) DEFAULT NULL,
`table4_id` int(11) DEFAULT NULL,
`table5_id` int(11) DEFAULT NULL,
`user_id` int(11) DEFAULT NULL,
`created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `userIdIndex` (`user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=17 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
and 5 tables, table1 through table5, with this structure (the others are the same; I set this up for testing. Not sure if it matters, but tables 1 to 5 also have other fields besides those posted; they just do not participate in the query, so for simplicity I skipped them):
CREATE TABLE `table1` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(300) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=34 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Each table*_id is a foreign key to the corresponding table (table1 - table5) in a one-to-many relationship.
I need to select notifications based on user_id. Depending on the notification type, the appropriate table*_id has a value and the other foreign keys are NULL (by the way, there are notification types where 2 or even 3 table*_id values can be non-NULL). The initial thought was to have a query that joins only those tables whose foreign key is non-NULL, using CASE WHEN, but as I learnt from the answer to this question,
MySQL query where JOIN depends on CASE
it cannot be used in this case.
Tables table1-table5 are going to be relatively big, with millions or dozens of millions of records, so I would prefer not to join 2-4 extra tables whose foreign keys are NULL. I also do not think it is any better to separate the query into two main parts, first getting the notifications and then finding the associated tables' values in a loop.
So the point is to join only those tables whose table*_id is not NULL, if that can be done in MySQL.
The main question is what would be the most efficient way to achieve this: getting the notification info together with its related tables' data.
The general query joining all the tables is a usual LEFT JOIN, something like this:
EXPLAIN SELECT
n.`id`,
n.`user_id`,
n.`table1_id`,
n.`table2_id`,
n.`table3_id`,
n.`table4_id`,
n.`table5_id`
-- other fields
FROM
notifications AS n
LEFT JOIN table1 AS t1
ON t1.`id` = n.`table1_id`
LEFT JOIN table2 AS t2
ON t2.`id` = n.`table2_id`
LEFT JOIN table3 AS t3
ON t3.`id` = n.`table3_id`
LEFT JOIN table4 AS t4
ON t4.`id` = n.`table4_id`
LEFT JOIN table5 AS t5
ON t5.`id` = n.`table5_id`
WHERE user_id = 5
Here is an SQL Fiddle with data:
http://sqlfiddle.com/#!2/3bf8f/1/0
Thanks
I think you are worrying over nothing. MySQL will handle your query, as it is, without any more effort from you.
You state:
I would prefer not to join 2-4 extra tables whose foreign keys are NULL.
Good news: MySQL won't.
It will see that the key is null in the notifications table, see that there are no records in the corresponding table you are joining to, and then just move on. I'm not even sure what you imagine it might be doing that you are trying to optimize away; your query is already optimized as it is.
If you are already running this query and have performance problems, your issue is likely elsewhere. Please provide more information in that case. In particular, the other fields you elided from the SELECT may actually affect things more than you think, depending on where those fields live.
Would it not make more sense to use a single ID as the foreign key, plus a column saying which table to query:
CREATE TABLE `notifications` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`notification_type_id` int(11) DEFAULT NULL,
`table_id` int(11) DEFAULT NULL,
`table_name` VARCHAR(10) DEFAULT NULL
...
Then you can select which table to query for the actual data you need.
SELECT `table_id`,`table_name` FROM `notifications`;
SELECT * FROM #table_name WHERE `id`=#table_id;
No expensive LEFT JOINs are necessary in this scenario, and the two queries (or a compound query as a stored procedure) would remove the need for a large index on each foreign key, simplifying the construct. It also has the advantage of being scalable: for example, what if you needed a 6th, 7th or 100th partition table?
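One caveat: a table name cannot be a bound parameter, so the second query needs dynamic SQL. A sketch using a prepared statement, assuming @table_name and @table_id hold the values fetched by the first query:
SET @sql = CONCAT('SELECT * FROM `', @table_name, '` WHERE `id` = ?');
PREPARE stmt FROM @sql;
EXECUTE stmt USING @table_id;
DEALLOCATE PREPARE stmt;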
Why not use a VIEW for this LEFT JOIN query?
Here's something more about views' performance: Is a view faster than a simple query?
Assuming that your query works fine, you could create a view from it:
CREATE VIEW view_myView AS
SELECT
n.`id`,
n.`user_id`,
n.`table1_id`,
n.`table2_id`,
n.`table3_id`,
n.`table4_id`,
n.`table5_id`
FROM
notifications AS n
LEFT JOIN table1 AS t1
ON t1.`id` = n.`table1_id`
LEFT JOIN table2 AS t2
ON t2.`id` = n.`table2_id`
LEFT JOIN table3 AS t3
ON t3.`id` = n.`table3_id`
LEFT JOIN table4 AS t4
ON t4.`id` = n.`table4_id`
LEFT JOIN table5 AS t5
ON t5.`id` = n.`table5_id`
WHERE user_id = 5
Then you access the data from this view simply by:
SELECT * FROM view_myView;
and it should be faster than calling the full query every time. It's also much shorter to write, as you can see.
How do I perform a MINUS in MySQL (5.5) between two SQL queries? I am not good with SQL.
What I want is to fetch only the keys that do not exist in the second query.
I am trying to use a LEFT JOIN to accomplish this. Can I use something like this:
SELECT target.component_id, target.asset_code FROM
(SELECT comp_files.* FROM stageSchema.components comp
INNER JOIN productionSchema.component_files comp_files
ON comp.id = comp_files.component_id) target
LEFT JOIN
(SELECT comp_files.* FROM stageSchema.components comp
INNER JOIN stageSchema.component_files comp_files
ON comp.id = comp_files.component_id) stage
ON target.component_id = stage.component_id
WHERE stage.component_id IS NULL;
I know the above query is not proper. Any ideas how to do this?
The structure of the component_files table is:
CREATE TABLE `component_files` (
`component_id` int(11) NOT NULL,
`asset_code` varchar(3) NOT NULL,
`file_name` varchar(40) NOT NULL,
`file_size` int(11) NOT NULL,
`video_size` varchar(10) DEFAULT NULL,
`bit_rate` varchar(10) NOT NULL,
`last_updated_date` datetime NOT NULL,
UNIQUE KEY `cf_uk` (`component_id`,`asset_code`) USING BTREE,
CONSTRAINT `cf_comp_fk` FOREIGN KEY (`component_id`) REFERENCES `components` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here are the results of EXPLAIN, both without and with the WHERE clause (screenshots omitted).
Your query looks correct. The reasons for the inefficiency could be missing indexes on the 2 tables whose design you haven't provided, and the MySQL optimizer not producing an efficient plan because of the derived tables.
Try writing the query without derived tables, with joins only:
SELECT
target.component_id, target.asset_code
FROM
stageSchema.components comp
INNER JOIN
productionSchema.component_files target
ON comp.id = target.component_id
LEFT JOIN
stageSchema.component_files stage
ON comp.id = stage.component_id
WHERE
stage.component_id IS NULL;
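For what it's worth, the same anti-join can also be phrased with NOT EXISTS, which is equivalent here and sometimes easier to read:
SELECT
    target.component_id, target.asset_code
FROM
    stageSchema.components comp
INNER JOIN
    productionSchema.component_files target
    ON comp.id = target.component_id
WHERE
    NOT EXISTS (SELECT 1
                FROM stageSchema.component_files stage
                WHERE stage.component_id = comp.id);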