SQL UPDATE, WHERE conditions limitation (index stops working) - MySQL

The index stops working when I have many OR'd pairs like ((ID = 5 AND TEST_DATE = '2019-01-17 05:56:19.0')).
Do SQL WHERE clauses have a limitation here? Does the optimizer simply decide to use a full scan? Is there some database query setting that restricts this?
I can split the SQL into smaller pieces if necessary.
EXPLAIN
UPDATE TEST_TABLE
SET MY_FLAG=1
WHERE (ID = 1 AND TEST_DATE = '2019-01-15 01:24:01.0') ||
(ID = 2 AND TEST_DATE = '2019-01-15 02:14:02.0') ||
(ID = 3 AND TEST_DATE = '2019-01-16 03:32:08.0') ||
(ID = 4 AND TEST_DATE = '2019-01-16 04:45:19.0') ||
(ID = 5 AND TEST_DATE = '2019-01-17 05:56:19.0')
EXPLAIN result 1, with more than 200 OR pairs:
(1, 'SIMPLE', 'TEST_TABLE', 'range', 'PRIMARY,test_date_index', 'PRIMARY', '8', NULL, 316, 'Using where');
EXPLAIN result 2, with more than 300 OR pairs:
(1, 'SIMPLE', 'TEST_TABLE', 'index', NULL, 'PRIMARY', '8', NULL, 51425278, 'Using where');
Table structure:
CREATE TABLE `TEST_TABLE` (
`ID` BIGINT(20) NOT NULL AUTO_INCREMENT,
`ATTR_ADDRESS` VARCHAR(255) NULL DEFAULT NULL,
`ATTR_CITY` VARCHAR(40) NULL DEFAULT NULL,
`ATTR_COUNTRY` VARCHAR(40) NULL DEFAULT NULL,
`TEST_DATE` DATETIME NOT NULL,
`MY_FLAG` BIT(1) NOT NULL DEFAULT b'0',
PRIMARY KEY (`ID`),
INDEX `test_date_index` (`TEST_DATE`),
INDEX `MY_FLAG` (`MY_FLAG`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;

The Optimizer does not have sufficient information to decide which way is better for performing the query. If you have found that, say, 200 is a safe limit for "faster" execution, then chunk it. That is, do only 200 rows at a time.
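Two optimizer settings are also worth checking, since the flip you observe around 200-300 pairs lines up with their defaults (assuming MySQL 5.6/5.7 or later; the question does not state a version): eq_range_index_dive_limit (default 200 in 5.7) switches row estimation from index dives to index statistics beyond that many equality ranges, and range_optimizer_max_mem_size (5.7+, default 8 MB) makes the optimizer abandon range access entirely when the OR tree grows too large.
SHOW VARIABLES LIKE 'eq_range_index_dive_limit';
SHOW VARIABLES LIKE 'range_optimizer_max_mem_size';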
Alternatively, you could try putting the ID, date pairs in another table, then do a "Multi-table UPDATE". It may run faster. Include the "composite" INDEX(test_date, id) in test_table.
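A sketch of that multi-table approach (the staging-table name pairs_to_flag is illustrative, not from the original post):
-- Staging table holding the (ID, TEST_DATE) pairs to flag:
CREATE TEMPORARY TABLE pairs_to_flag (
    id BIGINT NOT NULL,
    test_date DATETIME NOT NULL,
    PRIMARY KEY (id, test_date)
);
-- Load one chunk of pairs, e.g. 200 at a time:
INSERT INTO pairs_to_flag VALUES
    (1, '2019-01-15 01:24:01'),
    (2, '2019-01-15 02:14:02');
-- The join lets the optimizer drive the update from the small table:
UPDATE TEST_TABLE t
JOIN pairs_to_flag p ON p.id = t.ID AND p.test_date = t.TEST_DATE
SET t.MY_FLAG = 1;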

Related

SQL Views for single row (with where clause)

I'm working on a system that administrates courses, with multiple classes, with multiple lessons and multiple consumers on them. As the system grew, more data was required, and after running into performance issues I decided to go with SQL Views. We're using MySQL.
So I've replaced the old calls to the DB (for example, for a single lesson):
select * from `courses_classes_lessons` where `courses_classes_lessons`.`deleted_at` is null limit 1;
select count(consumer_id) as consumers_count from `courses_classes_lessons_consumers` where `lesson_id` = '448' limit 1;
select `max_consumers` from `courses_classes` where `id` = '65' limit 1;
select `id` from `courses_classes_lessons` left join `courses_classes_lessons_consumers` on `courses_classes_lessons_consumers`.`lesson_id` = `courses_classes_lessons`.`id` where `id` = '448' group by `courses_classes_lessons`.`id` having count(courses_classes_lessons_consumers.consumer_id) < '4' limit 1;
select courses_classes.max_consumers - LEAST(count(courses_classes_lessons_consumers.consumer_id), courses_classes.max_consumers) as available_spaces from `courses_classes_lessons` left join `courses_classes_lessons_consumers` on `courses_classes_lessons_consumers`.`lesson_id` = `courses_classes_lessons`.`id` left join `courses_classes` on `courses_classes_lessons`.`class_id` = `courses_classes`.`id` where `courses_classes_lessons`.`id` = '448' group by `courses_classes`.`id` limit 1;
The above took around 4-5 ms in total.
I replaced them with the following SQL View:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
(SELECT
max_consumers
FROM
courses_classes
WHERE
id = courses_classes_lessons.class_id
LIMIT 1) AS class_max_consumers,
(SELECT
count(consumer_id)
FROM
courses_classes_lessons_consumers
WHERE
lesson_id = courses_classes_lessons.id) AS consumers_count,
(SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
The problem I'm having is that it doesn't matter whether I load the whole View or a single row from it - it always takes about 6-9 s! But when I tried the equivalent query with a WHERE clause directly, it took about 500 μs. I'm new to SQL Views and confused - why are there no indexes/primary keys I could use to load a single row quickly? Am I doing something wrong?
EXPLAIN RESULT
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1, 'PRIMARY', 'courses_classes_lessons', NULL, 'ALL', NULL, NULL, NULL, NULL, 478832, 100.00, NULL
3, 'DEPENDENT SUBQUERY', 'courses_classes_lessons_consumers', NULL, 'ref', 'PRIMARY,courses_classes_lessons_consumers_lesson_id_index', 'courses_classes_lessons_consumers_lesson_id_index', '4', 'api.courses_classes_lessons.id', 3, 100.00, 'Using index'
2, 'DEPENDENT SUBQUERY', 'courses_classes', NULL, 'eq_ref', 'PRIMARY,courses_classes_id_parent_id_index', 'PRIMARY', '4', 'api.courses_classes_lessons.class_id', 1, 100.00, NULL
TABLE STRUCTURE
Lessons
CREATE TABLE `courses_classes_lessons` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`franchisee_id` int(10) unsigned NOT NULL,
`class_id` int(10) unsigned NOT NULL,
`instructor_id` int(10) unsigned NOT NULL,
`instructor_rate` int(10) unsigned NOT NULL DEFAULT '0',
`instructor_total` int(10) unsigned NOT NULL DEFAULT '0',
`instructor_paid` tinyint(1) NOT NULL DEFAULT '0',
`starts_at` timestamp NULL DEFAULT NULL,
`ends_at` timestamp NULL DEFAULT NULL,
`completed_at` timestamp NULL DEFAULT NULL,
`cancelled_at` timestamp NULL DEFAULT NULL,
`cancelled_reason` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`cancelled_reason_extra` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `courses_classes_lessons_franchisee_id_foreign` (`franchisee_id`),
KEY `courses_classes_lessons_class_id_foreign` (`class_id`),
KEY `courses_classes_lessons_instructor_id_foreign` (`instructor_id`),
KEY `courses_classes_lessons_starts_at_ends_at_index` (`starts_at`,`ends_at`),
KEY `courses_classes_lessons_completed_at_index` (`completed_at`),
KEY `courses_classes_lessons_cancelled_at_index` (`cancelled_at`),
KEY `courses_classes_lessons_class_id_deleted_at_index` (`class_id`,`deleted_at`),
KEY `courses_classes_lessons_deleted_at_index` (`deleted_at`),
KEY `class_ownership_index` (`class_id`,`starts_at`,`cancelled_at`,`deleted_at`),
CONSTRAINT `courses_classes_lessons_class_id_foreign` FOREIGN KEY (`class_id`) REFERENCES `courses_classes` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_franchisee_id_foreign` FOREIGN KEY (`franchisee_id`) REFERENCES `franchisees` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_instructor_id_foreign` FOREIGN KEY (`instructor_id`) REFERENCES `instructors` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=487853 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Lessons consumers
CREATE TABLE `courses_classes_lessons_consumers` (
`lesson_id` int(10) unsigned NOT NULL,
`consumer_id` int(10) unsigned NOT NULL,
`present` tinyint(1) DEFAULT NULL,
`plan_id` int(10) unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`lesson_id`,`consumer_id`),
KEY `courses_classes_lessons_consumers_consumer_id_foreign` (`consumer_id`),
KEY `courses_classes_lessons_consumers_plan_id_foreign` (`plan_id`),
KEY `courses_classes_lessons_consumers_lesson_id_index` (`lesson_id`),
KEY `courses_classes_lessons_consumers_present_index` (`present`),
CONSTRAINT `courses_classes_lessons_consumers_consumer_id_foreign` FOREIGN KEY (`consumer_id`) REFERENCES `customers_consumers` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_consumers_lesson_id_foreign` FOREIGN KEY (`lesson_id`) REFERENCES `courses_classes_lessons` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_consumers_plan_id_foreign` FOREIGN KEY (`plan_id`) REFERENCES `customers_plans` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
From the classes table the view only uses: max_consumers int(10) unsigned NOT NULL DEFAULT '0'
UPDATE 1
I've changed SQL View to the following one:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
lessons_consumers.consumers_count AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
JOIN (
SELECT
lesson_id,
count(*) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id) AS lessons_consumers ON lessons_consumers.lesson_id = courses_classes_lessons.id;
Even though the SELECT query by itself seems slower than the previous one, as a View it performs much better. It's still not as fast as I'd like, but it's a step forward.
Overall the time drops from 6-7 s to around 800 ms; the aim is in the area of 500 μs-1 ms. Any advice on how I can improve my SQL View further?
UPDATE 2
Ok, I've found the bottleneck! Again, it's similar to the last one (the SELECT query is fast for a single row, but the SQL VIEW tries to access the whole table every time).
My new lesson SQL VIEW:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
IFNULL(lessons_consumers.consumers_count,0) AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
IFNULL(class_max_consumers, 0) - LEAST(IFNULL(consumers_count,0), class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
LEFT JOIN courses_classes_lessons_consumers_view AS lessons_consumers ON lessons_consumers.lesson_id = courses_classes_lessons.id;
Another SQL View - this time for consumers:
CREATE OR REPLACE VIEW `courses_classes_lessons_consumers_view` AS
SELECT
lesson_id,
IFNULL(count(
consumer_id),0) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id;
And it looks like this one is the troublemaker! The consumers table is above, and here is the EXPLAIN for the view's SELECT query:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1, 'SIMPLE', 'courses_classes_lessons_consumers', NULL, 'index', 'PRIMARY,courses_classes_lessons_consumers_consumer_id_foreign,courses_classes_lessons_consumers_plan_id_foreign,courses_classes_lessons_consumers_lesson_id_index,courses_classes_lessons_consumers_present_index', 'courses_classes_lessons_consumers_lesson_id_index', '4', NULL, 1330649, 100.00, 'Using index'
Any idea how to speed up this count?
Consider writing a stored procedure; it may be able to get the 448 put into place in a way the optimizer can exploit.
If you know there will be only one row (such as when doing COUNT(*)), skip the LIMIT 1.
Unless consumer_id might be NULL, use COUNT(*) instead of COUNT(consumer_id).
A LIMIT without an ORDER BY leaves you getting a random row.
If courses_classes_lessons_consumers is a many-to-many mapping table, I will probably have some index advice after I see SHOW CREATE TABLE.
Which of the 5 SELECTs is the slowest?
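Also: the GROUP BY in your consumers view is what forces the full scan. A view containing GROUP BY cannot use the MERGE algorithm, so MySQL materializes the whole aggregate (all ~1.3M index entries) before the outer WHERE on lesson_id is applied. A sketch of the difference, using the lesson id 448 from your examples:
-- Through the view: aggregates every lesson first, then filters the result.
SELECT * FROM courses_classes_lessons_consumers_view WHERE lesson_id = 448;
-- Filter pushed inside: a ref lookup on the (lesson_id, consumer_id) PRIMARY KEY.
SELECT lesson_id, COUNT(*) AS consumers_count
FROM courses_classes_lessons_consumers
WHERE lesson_id = 448
GROUP BY lesson_id;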
After many attempts, it looks like the stored procedure is the best approach, so I won't be spending more time on the SQL Views.
Here's the procedure I wrote:
CREATE PROCEDURE `LessonData`(
IN lessonId INT(10)
)
BEGIN
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
IFNULL((SELECT
count(consumer_id) as consumers_count
FROM
courses_classes_lessons_consumers
WHERE
lesson_id = courses_classes_lessons.id
GROUP BY
courses_classes_lessons_consumers.lesson_id), 0) AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
WHERE courses_classes_lessons.id = lessonId;
END
And the execution time for it is around 500μs-1ms.
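For reference, a call for the lesson used throughout the question looks like:
CALL LessonData(448);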
Thank you all for your help!

MySQL UPDATE query partly updating with default values

I am running an update query (in MySQL 5.6) on 2 tables joined like the following:
UPDATE c_cache cc
JOIN p_cache pc USING (user_id, attribute_id, calculation_quarter)
JOIN batch_table bt USING (user_id, attribute_id, calculation_quarter, client_id, group_code, version_id)
SET cc.epop = SUBSTRING(bt.result, 1, 1),
cc.excl = SUBSTRING(bt.result, 2, 1),
cc.num_result = SUBSTRING(bt.result, 3, 20),
cc.status = 'FR',
pc.epop = IF(bt.enrolled = 2, SUBSTRING(bt.result, 1, 1), 1),
pc.excl = IF(bt.enrolled = 2, SUBSTRING(bt.result, 2, 1), 0),
pc.num_result = IF(bt.enrolled = 2, SUBSTRING(bt.result, 3, 20), REPEAT('0', 20)),
pc.status = IF(pc.status = 'FL2', 'S', 'FR');
The p_cache is being updated properly, but the 3 columns in c_cache are being set to NULL and status to 'S' (the default values of those columns).
The rows that are being missed out are usually contiguous (in chunks).
This query is within a loop in a stored procedure that runs till all 'S' (stale) status rows of p_cache are marked 'FR' (fresh), i.e. computed.
(All rows of p_cache are present in c_cache with a one-to-one correspondence).
The batch_table picks up rows in batches of 25000 per iteration, and gets updated with computed results in the result column through some stored functions.
This whole stored proc. is called from a MySQL event. Multiple events run simultaneously (each for an exclusive set of attributes) to find stale rows in the p_cache, and update both cache tables with computed results in batches using queries similar to this one.
This anomalous behavior happens only on c_cache, and only sometimes.
The schema definitions are:
CREATE TABLE c_cache (
user_id INT(11) NOT NULL DEFAULT '0',
attribute_id INT(11) NOT NULL DEFAULT '0',
calculation_quarter DATE NOT NULL DEFAULT '0000-00-00',
version_id INT(11) NOT NULL DEFAULT '0',
epop TINYINT(1) DEFAULT NULL,
excl TINYINT(1) DEFAULT NULL,
num_result CHAR(20) DEFAULT NULL,
status ENUM('FR','S','FL1','FL2') NOT NULL DEFAULT 'S',
PRIMARY KEY (user_id, attribute_id, calculation_quarter, version_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE p_cache (
user_id INT(11) NOT NULL DEFAULT '0',
attribute_id INT(11) NOT NULL DEFAULT '0',
calculation_quarter DATE NOT NULL DEFAULT '0000-00-00',
client_id INT(11) NOT NULL DEFAULT '0',
group_code CHAR(5) NOT NULL DEFAULT '',
epop TINYINT(1) DEFAULT NULL,
excl TINYINT(1) DEFAULT NULL,
num_result CHAR(20) DEFAULT NULL,
status ENUM('FR','S','FL1','FL2','S1','S2') NOT NULL DEFAULT 'S',
PRIMARY KEY (user_id,attribute_id,calculation_quarter,client_id,group_code),
KEY date_status_id_index (calculation_quarter,status,attribute_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Can anyone kindly explain why this is happening and suggest a way to avoid this?
Thanks in advance.

How to optimize this MySQL query - joins 3 tables?

This query is very slow. It is pretty simple and the 3 tables used are indexed on all columns in JOIN and WHERE clauses. How can I optimize my query, or my tables for this query?
This is the slow query. It takes 15-20 seconds to run.
SELECT
user.id,
user.name,
user.user_key,
user.secret,
account.id,
account.name,
account.admin,
setting.attribute,
setting.value
FROM user
INNER JOIN account ON account.id = user.account_id
INNER JOIN setting ON setting.user_id = user.id
AND setting.deleted = 0
WHERE user.deleted = 0
The issue is likely caused by the join on the setting table, because the two queries below take about 5 seconds total. Although 5 seconds still seems a little long?
SELECT
user.id,
user.name,
user.user_key,
user.secret,
account.id,
account.name,
account.admin
FROM user
INNER JOIN account ON account.id = user.account_id
WHERE user.deleted = 0
SELECT
setting.user_id,
setting.attribute,
setting.value
FROM setting
WHERE setting.deleted = 0
The explain for the slow query:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1, 'SIMPLE', 'user', 'ALL', 'PRIMARY,idx_id,idx_deleted', null, null, null, 600, 'Using where'
1, 'SIMPLE', 'account', 'eq_ref', 'PRIMARY', 'PRIMARY', '8', 'user.account_id', 1, null
1, 'SIMPLE', 'setting', 'ref', 'attribute_version_unique,idx_user_id,indx_deleted', 'attribute_version_unique', '8', 'user.id', 35, 'Using where'
The schema:
CREATE TABLE user
(
id BIGINT(20) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
name VARCHAR(45) NOT NULL,
user_key VARCHAR(45) NOT NULL,
secret VARCHAR(16),
account_id BIGINT(20) unsigned NOT NULL,
demo TINYINT(1) DEFAULT '0' NOT NULL,
details VARCHAR(4000),
date_created DATETIME NOT NULL,
date_modified DATETIME NOT NULL,
deleted TINYINT(1) DEFAULT '0' NOT NULL
);
CREATE INDEX idx_date_modified ON user (date_modified);
CREATE INDEX idx_deleted ON user (deleted);
CREATE INDEX idx_id ON user (id);
CREATE UNIQUE INDEX idx_name_unique ON user (user_key);
CREATE TABLE account
(
id BIGINT(20) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
display_name VARCHAR(100),
admin TINYINT(1) DEFAULT '0' NOT NULL,
visibility VARCHAR(15) DEFAULT 'public',
cost DOUBLE,
monthly_fee VARCHAR(300),
date_created DATETIME NOT NULL,
date_modified DATETIME NOT NULL,
deleted TINYINT(1) DEFAULT '0'
);
CREATE INDEX idx_date_modified ON account (date_modified);
CREATE TABLE setting
(
id BIGINT(20) unsigned PRIMARY KEY NOT NULL AUTO_INCREMENT,
user_id BIGINT(20) unsigned NOT NULL,
attribute VARCHAR(45) NOT NULL,
value VARCHAR(4000),
date_created DATETIME NOT NULL,
date_modified DATETIME NOT NULL,
deleted TINYINT(1) DEFAULT '0' NOT NULL
);
CREATE UNIQUE INDEX attribute_version_unique ON setting (user_id, attribute);
CREATE INDEX idx_user_id ON setting (user_id);
CREATE INDEX idx_date_modified ON setting (date_modified);
CREATE INDEX indx_deleted ON setting (deleted);
With respect, you've stumbled across a common antipattern. Indexing "all columns" ordinarily is a useless move. MySQL (as of late 2016) can exploit at most one index per table when satisfying a query. So the extra indexes are likely to help no queries, and definitely add overhead on INSERT and UPDATE operations.
This query might be improved by some purpose-designed compound covering indexes.
Try this index on your user table. It's a covering index: intended to contain all the columns necessary to satisfy the query. It's organized in an order that matches your WHERE clause.
CREATE INDEX idx_user_account_setting
ON user (deleted, account_id, id, name, user_key, secret);
This covering index might help on your setting table
CREATE INDEX idx_setting_user
ON setting (user_id, deleted , attribute, value);
Also try this one, switching the order of the first two columns, if the first one doesn't help.
CREATE INDEX idx_setting_user_alt
ON setting (deleted, user_id, attribute, value);
Finally try this one on account.
CREATE INDEX idx_account_user
ON account (id, name, admin);
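One caution: setting.value is VARCHAR(4000), which exceeds InnoDB's index key-length limit, so the two setting indexes above may need a prefix on value (or to omit it, losing the covering property). After creating whichever indexes apply, re-run EXPLAIN and look for the new index names in the key column and 'Using index' under Extra. A verification sketch, not output from the poster's system:
EXPLAIN
SELECT user.id, user.name, user.user_key, user.secret,
       account.id, account.name, account.admin,
       setting.attribute, setting.value
FROM user
INNER JOIN account ON account.id = user.account_id
INNER JOIN setting ON setting.user_id = user.id AND setting.deleted = 0
WHERE user.deleted = 0;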
Please, if these suggestions help leave a brief comment telling how much they helped.
Read this. http://use-the-index-luke.com/

specify conditions from outer query on a materialized subquery

I have the query below, which references a couple of views, 'goldedRunQueries' and 'currentGoldMarkings'. My issue seems to come from the view referenced in the subquery - currentGoldMarkings. During execution, MySQL first materializes this subquery and only then applies the WHERE clauses on 'queryCode' and 'runId', which results in an execution time of more than an hour, since the view refers to tables that have millions of rows of data. My question is: how do I enforce those two WHERE conditions on the subquery before it materializes?
SELECT goldedRunQueries.queryCode, goldedRunQueries.runId
FROM goldedRunQueries
LEFT OUTER JOIN
( SELECT measuredRunId, queryCode, COUNT(resultId) as c
FROM currentGoldMarkings
GROUP BY measuredRunId, queryCode
) AS accuracy ON accuracy.measuredRunId = goldedRunQueries.runId
AND accuracy.queryCode = goldedRunQueries.queryCode
WHERE goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
and goldedRunQueries.runid = 5000
ORDER BY goldedRunQueries.runId DESC, goldedRunQueries.queryCode;
Here are the two views. Both of these also get used in a standalone mode and so integrating any clauses into them is not possible.
CREATE VIEW currentGoldMarkings
AS
SELECT result.resultId, result.runId AS measuredRunId, result.documentId,
result.queryCode, result.queryValue AS measuredValue,
gold.queryValue AS goldValue,
CASE result.queryValue WHEN gold.queryValue THEN 1 ELSE 0 END AS correct
FROM results AS result
INNER JOIN gold ON gold.documentId = result.documentId
AND gold.queryCode = result.queryCode
WHERE gold.isCurrent = 1
CREATE VIEW goldedRunQueries
AS
SELECT runId, queryCode
FROM runQueries
WHERE EXISTS
( SELECT 1 AS Expr1
FROM runs
WHERE (runId = runQueries.runId)
AND (isManual = 0)
)
AND EXISTS
( SELECT 1 AS Expr1
FROM results
WHERE (runId = runQueries.runId)
AND (queryCode = runQueries.queryCode)
AND EXISTS
( SELECT 1 AS Expr1
FROM gold
WHERE (documentId = results.documentId)
AND (queryCode = results.queryCode)
)
)
Note: The above query reflects only a part of my actual query. There are 3 other left outer joins which are similar in nature to the above subquery which makes the problem far more worse.
EDIT: As suggested, here is the structure and some sample data for the tables
CREATE TABLE `results`(
`resultId` int auto_increment NOT NULL,
`runId` int NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
CONSTRAINT `PK_results` PRIMARY KEY
(
`resultId`
)
);
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'I', NULL)
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'S', NULL)
insert into results (runId, documentId, queryCode, queryValue, comment) values (150, 242301, 'AC005', 'I', 'abc')
insert into results (runId, documentId, queryCode, queryValue, comment) values (100, 242300, 'AC001', 'I', NULL)
insert into results (runId, documentId, queryCode, queryValue, comment) values (109, 242301, 'PQ001', 'S', 'zzz')
insert into results (runId, documentId, queryCode, queryValue, comment) values (400, 242400, 'DD006', 'I', NULL)
CREATE TABLE `gold`(
`goldId` int auto_increment NOT NULL,
`runDate` datetime NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
`isCurrent` tinyint(1) NOT NULL DEFAULT 0,
CONSTRAINT `PK_gold` PRIMARY KEY
(
`goldId`
)
);
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-02-20 00:00:00', 138904, 'CH001', 'N', NULL, 1)
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-05-20 00:00:00', 138904, 'CH001', 'N', 'aaa', 1)
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2016-02-20 00:00:00', 138905, 'CH002', 'N', NULL, 0)
insert into gold (runDate, documentId, queryCode, queryValue, comment, isCurrent) values ('2015-12-12 00:00:00', 138804, 'CH001', 'N', 'zzzz', 1)
CREATE TABLE `runQueries`(
`runId` int NOT NULL,
`queryCode` char(5) NOT NULL,
CONSTRAINT `PK_runQueries` PRIMARY KEY
(
`runId`,
`queryCode`
)
);
insert into runQueries values (100, 'AC001')
insert into runQueries values (109, 'PQ001')
insert into runQueries values (400, 'DD006')
CREATE TABLE `runs`(
`runId` int auto_increment NOT NULL,
`runName` varchar(63) NOT NULL,
`isManual` tinyint(1) NOT NULL,
`runDate` datetime NOT NULL,
`comment` varchar(1023) NULL,
`folderName` varchar(63) NULL,
`documentSetId` int NOT NULL,
`pipelineVersion` varchar(50) NULL,
`isArchived` tinyint(1) NOT NULL DEFAULT 0,
`pipeline` varchar(50) NULL,
CONSTRAINT `PK_runs` PRIMARY KEY
(
`runId`
)
);
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test1', 0, '2015-08-04 06:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test2', 1, '2015-12-04 12:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test3', 1, '2015-06-24 10:56:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
insert into runs (runName, isManual, runDate, comment, folderName, documentSetId, pipelineVersion, isArchived, pipeline) values ('test4', 1, '2016-05-04 11:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
First, let's try to improve the performance via indexes:
results: INDEX(runId, queryCode) -- in either order
gold: INDEX(documentId, queryCode, isCurrent) -- in that order
After that, update the CREATE TABLEs in the question and add the output of:
EXPLAIN EXTENDED SELECT ...;
SHOW WARNINGS;
What version are you running? You effectively have FROM ( SELECT ... ) JOIN ( SELECT ... ). Before 5.6, neither subquery had an index; with 5.6, an index is generated on the fly.
It is a shame that the query is built that way, since you know which one to use: and goldedRunQueries.runid = 5000.
Bottom Line: add the indexes; upgrade to 5.6 or 5.7; if that is not enough, then rethink the use of VIEWs.
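If the views themselves cannot change, one workaround is to duplicate the outer conditions inside the derived table by hand, since they are known when the query is written. A sketch (same result, but the aggregate now only scans rows for runId 5000):
SELECT goldedRunQueries.queryCode, goldedRunQueries.runId
FROM goldedRunQueries
LEFT OUTER JOIN
( SELECT measuredRunId, queryCode, COUNT(resultId) AS c
  FROM currentGoldMarkings
  WHERE measuredRunId = 5000
    AND queryCode IN ('CH001', 'CH002', 'CH003')
  GROUP BY measuredRunId, queryCode
) AS accuracy ON accuracy.measuredRunId = goldedRunQueries.runId
             AND accuracy.queryCode = goldedRunQueries.queryCode
WHERE goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
  AND goldedRunQueries.runId = 5000
ORDER BY goldedRunQueries.runId DESC, goldedRunQueries.queryCode;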

insert multiple rows in a table without inserting duplicate rows

I would like to merge into one table data that comes from two different databases.
I proceeded as described below:
1 - I made a dump of the source database table and got the following insert query:
INSERT INTO `t_vaccination` VALUES (242,NULL,NULL,53,1,'20030528','0','W5770-2',0,'DTP - REVAXIS','A 20130521170623','2013-05-21 17:06:23'),
(243,NULL,NULL,53,1,'20130525','0','',1,'DTP - ','A 20130521170623','2013-05-21 17:06:23'),
(1830,NULL,NULL,50,1,'20080502','3','',0,'DTP - REVAXIS','A 20130521170623','2013-05-21 17:06:23'),
(1831,NULL,NULL,50,1,'20130501','4','',1,'DTP - ','A 20130521170623','2013-05-21 17:06:23'),
(1832,NULL,NULL,50,1,'20080502','3','',0,'PAPILLOMAVIRUS - Gardasil','A 20130521170623','2013-05-21 17:06:23')
the structure of the t_vaccination table is:
CREATE TABLE `t_vaccination` (
`nIdVaccination` INT(10) UNSIGNED NOT NULL,
`nIdVaccin` INT(10) UNSIGNED NULL DEFAULT NULL,
`nIdVacProtocole` INT(10) UNSIGNED NULL DEFAULT NULL,
`nIdPatient` INT(10) UNSIGNED NOT NULL,
`nIdUtilisateur` INT(10) UNSIGNED NULL DEFAULT NULL,
`sDateInjection` VARCHAR(8) NOT NULL DEFAULT '',
`nNumInjection` VARCHAR(45) NOT NULL DEFAULT '0',
`sNumLot` VARCHAR(45) NOT NULL DEFAULT '',
`nRappel` TINYINT(4) NOT NULL DEFAULT '0',
`sLibelle` VARCHAR(255) NOT NULL DEFAULT '',
`sAction` VARCHAR(16) NOT NULL DEFAULT 'A 20080101000000',
`sDH_REPLIC` DATETIME NULL DEFAULT '2010-01-01 00:00:00',
PRIMARY KEY (`nIdVaccination`),
INDEX `NDX_t_vaccination_nIdUtilisateur` (`nIdUtilisateur`),
INDEX `NDX_t_vaccination_nIdVaccin` (`nIdVaccin`),
INDEX `NDX_t_vaccination_nIdVacProtocole` (`nIdVacProtocole`),
INDEX `NDX_t_vaccination_nIdPatient` (`nIdPatient`),
CONSTRAINT `FK_vaccination_nIdUtilisateur_utilisateur` FOREIGN KEY (`nIdUtilisateur`) REFERENCES `t_utilisateur` (`nIdUtilisateur`),
CONSTRAINT `FK_vaccination_nIdVaccin_vaccin` FOREIGN KEY (`nIdVaccin`) REFERENCES `t_vaccin` (`nIdVaccin`)
)
2 - I would like to insert all the rows into the t_vaccination table of the final database without inserting duplicate rows. The new query works when inserting one row:
INSERT INTO t_vaccination (nIdVaccination, nIdVaccin, nIdVacProtocole, nIdPatient, nIdUtilisateur, sDateInjection, nNumInjection, sNumLot, nRappel, sLibelle, sAction, sDH_REPLIC)
SELECT 251,41,4,53,1,'20030528','0','W5770-2',0,'DTP - REVAXIS','A 20130521170623','2013-05-21 17:06:23' FROM t_vaccination WHERE NOT EXISTS (SELECT nIdVaccin, nIdVacProtocole, nIdPatient, nIdUtilisateur FROM t_vaccination WHERE nIdVaccin = 41 and nIdVacProtocole = 4 and nIdPatient = 53 and nIdUtilisateur =1 ) LIMIT 1
3 - Is it possible to insert the rows as a group using INSERT ... WHERE NOT EXISTS? The attempts I have made failed. Here is an example of an insert that failed:
INSERT INTO t_vaccination (nIdVaccination, nIdVaccin, nIdVacProtocole, nIdPatient, nIdUtilisateur, sDateInjection, nNumInjection, sNumLot, nRappel, sLibelle, sAction, sDH_REPLIC)
SELECT 251,41,4,53,1,'20030528','0','W5770-2',0,'DTP - REVAXIS','A 20130521170623','2013-05-21 17:06:23' FROM t_vaccination WHERE NOT EXISTS (SELECT nIdVaccin, nIdVacProtocole, nIdPatient, nIdUtilisateur FROM t_vaccination WHERE nIdVaccin = 41 and nIdVacProtocole = 4 and nIdPatient = 53 and nIdUtilisateur =1 ) LIMIT 1,
SELECT 243,NULL,NULL,53,1,'20130525','0','',1,'DTP - ','A 20130521170623','2013-05-21 17:06:23' FROM t_vaccination WHERE NOT EXISTS (SELECT nIdVaccin, nIdVacProtocole, nIdPatient, nIdUtilisateur FROM t_vaccination WHERE nIdVaccin = NULL and nIdVacProtocole = NULL and nIdPatient = 53 and nIdUtilisateur =1 ) LIMIT 1
I hope for your help.
Regards
Motti
In my opinion, the simplest way to do what you want is to remove the unique keys/indexes and remove the dupes, or to create a temporary table without those keys. Let's assume you create a temp_t_vaccination table and import all your rows into it; you'll then just have to do:
INSERT INTO t_vaccination (field1, field2 ...) SELECT DISTINCT field1, field2 ... FROM temp_t_vaccination
ref : http://dev.mysql.com/doc/refman/5.0/en/insert-select.html?ff=nopfpls
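A sketch of an alternative staging-table route that keeps the existing keys: load the dump into a staging table (for example CREATE TABLE temp_t_vaccination LIKE t_vaccination, then run the dump's INSERT against it), and copy across only the rows whose natural key is absent, comparing nullable columns with the NULL-safe operator <=>. This avoids the trap in the failed attempt above, where nIdVaccin = NULL can never be true. The choice of key columns follows step 2 of the question; if the two databases can reuse nIdVaccination values, that id would need remapping first.
INSERT INTO t_vaccination
SELECT tmp.*
FROM temp_t_vaccination AS tmp
WHERE NOT EXISTS
  ( SELECT 1
    FROM t_vaccination AS t
    WHERE t.nIdVaccin <=> tmp.nIdVaccin        -- NULL-safe: NULL <=> NULL is true
      AND t.nIdVacProtocole <=> tmp.nIdVacProtocole
      AND t.nIdPatient = tmp.nIdPatient        -- NOT NULL column, plain = is fine
      AND t.nIdUtilisateur <=> tmp.nIdUtilisateur );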