Use PivotTable with Left Outer Join - mysql

I'm fairly new to MySQL.
Put simply, I want to turn each "meta_key" value into a column of the resulting table.
I've got two tables with the following schema:
CREATE TABLE `frmt_form_entry` (
`entry_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`entry_type` varchar(191) COLLATE utf8mb4_unicode_520_ci NOT NULL,
`form_id` bigint(20) unsigned NOT NULL,
`is_spam` tinyint(1) NOT NULL DEFAULT '0',
`date_created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`entry_id`),
KEY `entry_is_spam` (`is_spam`),
KEY `entry_type` (`entry_type`),
KEY `entry_form_id` (`form_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
CREATE TABLE `frmt_form_entry_meta` (
`meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`entry_id` bigint(20) unsigned NOT NULL,
`meta_key` varchar(191) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
`meta_value` longtext COLLATE utf8mb4_unicode_520_ci,
`date_created` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`date_updated` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`meta_id`),
KEY `meta_key` (`meta_key`),
KEY `meta_entry_id` (`entry_id`),
KEY `meta_key_object` (`entry_id`,`meta_key`)
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci;
There exist multiple "meta entries" per "form entry":
INSERT INTO `frmt_form_entry` (`entry_id`, `entry_type`, `form_id`, `is_spam`, `date_created`)
VALUES
(1, 'custom-forms', 3744, 0, '2020-08-14 13:00:32');
INSERT INTO `frmt_form_entry_meta` (`meta_id`, `entry_id`, `meta_key`, `meta_value`, `date_created`, `date_updated`)
VALUES
(1, 1, 'text-7', 'Foreign Legal Form', '2020-08-14 13:00:32', '0000-00-00 00:00:00'),
(2, 1, 'name-1', 'Test Name', '2020-08-14 13:00:32', '0000-00-00 00:00:00'),
(3, 1, 'address-1', 'a:6:{s:7:\"country\";s:0:\"\";s:4:\"city\";s:0:\"\";s:5:\"state\";s:0:\"\";s:3:\"zip\";s:0:\"\";s:14:\"street_address\";s:0:\"\";s:12:\"address_line\";s:0:\"\";}', '2020-08-14 13:00:32', '0000-00-00 00:00:00'),
(4, 1, 'address-2', 'a:6:{s:7:\"country\";s:0:\"\";s:4:\"city\";s:0:\"\";s:5:\"state\";s:0:\"\";s:3:\"zip\";s:0:\"\";s:14:\"street_address\";s:0:\"\";s:12:\"address_line\";s:0:\"\";}', '2020-08-14 13:00:32', '0000-00-00 00:00:00'),
(5, 1, 'address-3', 'a:6:{s:7:\"country\";s:0:\"\";s:4:\"city\";s:0:\"\";s:5:\"state\";s:0:\"\";s:3:\"zip\";s:0:\"\";s:14:\"street_address\";s:0:\"\";s:12:\"address_line\";s:0:\"\";}', '2020-08-14 13:00:32', '0000-00-00 00:00:00'),
(6, 1, '_forminator_user_ip', '8.8.8.8', '2020-08-14 13:00:32', '0000-00-00 00:00:00'),
(7, 1, 'stripe-1', 'a:6:{s:4:\"mode\";s:4:\"test\";s:6:\"status\";s:7:\"success\";s:6:\"amount\";s:2:\"0\";s:8:\"currency\";s:3:\"\";s:14:\"transaction_id\";s:27:\"\";s:16:\"transaction_link\";s:70:\"\";}', '2020-08-14 13:00:32', '0000-00-00 00:00:00');
I'd like to create a Pivot Table with the same number of rows as there are rows in the frmt_form_entry table (1 in this case).
I want each meta key to be a column name of that pivot table. Since each meta key can only exist once per entry_id, I won't need to concatenate the meta values of multiple fields for the same meta key.
This is the expected output:
entry_id entry_type form_id is_spam date_created text-7 name-1 address-1 address-2 address-3. _forminator_user_ip stripe-1
1 custom-forms 3744 0 2020-08-14 13:00:32 Foreign Legal Form Test Name [...]
What I've tried so far:
SELECT entries.*,
CASE WHEN meta.`meta_key`='stripe-1'
THEN meta.`meta_value`
ELSE NULL
END
AS stripe,
CASE WHEN meta.`meta_key`='text-7'
THEN meta.`meta_value`
ELSE NULL
END
AS text7,
CASE WHEN meta.`meta_key`='_forminator_user_ip'
THEN meta.`meta_value`
ELSE NULL
END
AS user_ip,
CASE WHEN meta.`meta_key`='address-1'
THEN meta.`meta_value`
ELSE NULL
END
AS address1,
CASE WHEN meta.`meta_key`='address-2'
THEN meta.`meta_value`
ELSE NULL
END
AS address2,
CASE WHEN meta.`meta_key`='address-3'
THEN meta.`meta_value`
ELSE NULL
END
AS address3,
CASE WHEN meta.`meta_key`='name-1'
THEN meta.`meta_value`
ELSE NULL
END
AS name1
FROM frmt_form_entry entries
LEFT OUTER JOIN frmt_form_entry_meta meta
ON entries.entry_id = meta.entry_id
However, this feels wrong and messy and it doesn't give the expected result (it creates one entry per meta value).
Is there an easy, better way to turn each "meta_key" value into a column of the resulting table?

Sorry, but I am going to be rather blunt. "Entry-Attribute-Value" is a great schema pattern for textbooks, but not for the real world.
EAV is especially inappropriate for columns that exist for all the items. If "every" location has an address, city, etc, then make them columns. Simpler. More efficient. (In the case of address_1 and an optional address_2, use NULL to indicate optionality.) Do you ever need the parts of an address for searching or filtering? If not, then simply have a single column for the whole "address".
Reserve "meta" for attributes that are usually not present. As a separate discussion we can discuss the benefits/drawbacks of using a single JSON column for such. Or how to do searching with FULLTEXT.
Do you really need the date_created and date_updated?
The 4 indexes you have on the meta table are quite inefficient. See this for tips on improving it: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#speeding_up_wp_postmeta

Related

SQL Views for single row (with where clause)

I'm working on a system that administrates courses, with multiple classes, with multiple lessons and multiple consumers on them. As the system grows more data were required so with some performance issues I've decided to go with SQL Views. We're using MySQL.
So I've replaced old calls to the DB (for example for the single lesson)
select * from `courses_classes_lessons` where `courses_classes_lessons`.`deleted_at` is null limit 1;
select count(consumer_id) as consumers_count from `courses_classes_lessons_consumers` where `lesson_id` = '448' limit 1;
select `max_consumers` from `courses_classes` where `id` = '65' limit 1;
select `id` from `courses_classes_lessons` left join `courses_classes_lessons_consumers` on `courses_classes_lessons_consumers`.`lesson_id` = `courses_classes_lessons`.`id` where `id` = '448' group by `courses_classes_lessons`.`id` having count(courses_classes_lessons_consumers.consumer_id) < '4' limit 1;
select courses_classes.max_consumers - LEAST(count(courses_classes_lessons_consumers.consumer_id), courses_classes.max_consumers) as available_spaces from `courses_classes_lessons` left join `courses_classes_lessons_consumers` on `courses_classes_lessons_consumers`.`lesson_id` = `courses_classes_lessons`.`id` left join `courses_classes` on `courses_classes_lessons`.`class_id` = `courses_classes`.`id` where `courses_classes_lessons`.`id` = '448' group by `courses_classes`.`id` limit 1;
The above took around 4-5ms
with the SQL View as follow:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
(SELECT
max_consumers
FROM
courses_classes
WHERE
id = courses_classes_lessons.class_id
LIMIT 1) AS class_max_consumers,
(SELECT
count(consumer_id)
FROM
courses_classes_lessons_consumers
WHERE
lesson_id = courses_classes_lessons.id) AS consumers_count,
(SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
The problem I'm having is that doesn't matter if I'm loading the whole View or a single row from it - it always takes about 6-9s to load! But when I've tried the same query with a WHERE clause it takes about 500μs. I'm new to SQL View and confused - why there are no indexes/primary keys that I could use to load a single row quickly? Am I doing something wrong?
EXPLAIN RESULT
INSERT INTO `courses_classes` (`id`, `select_type`, `table`, `partitions`, `type`, `possible_keys`, `key`, `key_len`, `ref`, `rows`, `filtered`, `Extra`) VALUES
(1, 'PRIMARY', 'courses_classes_lessons', NULL, 'ALL', NULL, NULL, NULL, NULL, 478832, 100.00, NULL),
(3, 'DEPENDENT SUBQUERY', 'courses_classes_lessons_consumers', NULL, 'ref', 'PRIMARY,courses_classes_lessons_consumers_lesson_id_index', 'courses_classes_lessons_consumers_lesson_id_index', '4', 'api.courses_classes_lessons.id', 3, 100.00, 'Using index'),
(2, 'DEPENDENT SUBQUERY', 'courses_classes', NULL, 'eq_ref', 'PRIMARY,courses_classes_id_parent_id_index', 'PRIMARY', '4', 'api.courses_classes_lessons.class_id', 1, 100.00, NULL);
TABLE STRUCTURE
Lessons
CREATE TABLE `courses_classes_lessons` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`franchisee_id` int(10) unsigned NOT NULL,
`class_id` int(10) unsigned NOT NULL,
`instructor_id` int(10) unsigned NOT NULL,
`instructor_rate` int(10) unsigned NOT NULL DEFAULT '0',
`instructor_total` int(10) unsigned NOT NULL DEFAULT '0',
`instructor_paid` tinyint(1) NOT NULL DEFAULT '0',
`starts_at` timestamp NULL DEFAULT NULL,
`ends_at` timestamp NULL DEFAULT NULL,
`completed_at` timestamp NULL DEFAULT NULL,
`cancelled_at` timestamp NULL DEFAULT NULL,
`cancelled_reason` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`cancelled_reason_extra` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `courses_classes_lessons_franchisee_id_foreign` (`franchisee_id`),
KEY `courses_classes_lessons_class_id_foreign` (`class_id`),
KEY `courses_classes_lessons_instructor_id_foreign` (`instructor_id`),
KEY `courses_classes_lessons_starts_at_ends_at_index` (`starts_at`,`ends_at`),
KEY `courses_classes_lessons_completed_at_index` (`completed_at`),
KEY `courses_classes_lessons_cancelled_at_index` (`cancelled_at`),
KEY `courses_classes_lessons_class_id_deleted_at_index` (`class_id`,`deleted_at`),
KEY `courses_classes_lessons_deleted_at_index` (`deleted_at`),
KEY `class_ownership_index` (`class_id`,`starts_at`,`cancelled_at`,`deleted_at`),
CONSTRAINT `courses_classes_lessons_class_id_foreign` FOREIGN KEY (`class_id`) REFERENCES `courses_classes` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_franchisee_id_foreign` FOREIGN KEY (`franchisee_id`) REFERENCES `franchisees` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_instructor_id_foreign` FOREIGN KEY (`instructor_id`) REFERENCES `instructors` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=487853 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Lessons consumers
CREATE TABLE `courses_classes_lessons_consumers` (
`lesson_id` int(10) unsigned NOT NULL,
`consumer_id` int(10) unsigned NOT NULL,
`present` tinyint(1) DEFAULT NULL,
`plan_id` int(10) unsigned DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`lesson_id`,`consumer_id`),
KEY `courses_classes_lessons_consumers_consumer_id_foreign` (`consumer_id`),
KEY `courses_classes_lessons_consumers_plan_id_foreign` (`plan_id`),
KEY `courses_classes_lessons_consumers_lesson_id_index` (`lesson_id`),
KEY `courses_classes_lessons_consumers_present_index` (`present`),
CONSTRAINT `courses_classes_lessons_consumers_consumer_id_foreign` FOREIGN KEY (`consumer_id`) REFERENCES `customers_consumers` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_consumers_lesson_id_foreign` FOREIGN KEY (`lesson_id`) REFERENCES `courses_classes_lessons` (`id`) ON DELETE CASCADE,
CONSTRAINT `courses_classes_lessons_consumers_plan_id_foreign` FOREIGN KEY (`plan_id`) REFERENCES `customers_plans` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
From classes it's only using max_consumers int(10) unsigned NOT NULL DEFAULT '0',
UPDATE 1
I've changed SQL View to the following one:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
lessons_consumers.consumers_count AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
JOIN (
SELECT
lesson_id,
count(*) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id) AS lessons_consumers ON lessons_consumers.lesson_id = courses_classes_lessons.id;
and even though the SELECT query itself seems to be way slower than the previous one then as the View it seems to perform way better. It's still not as fast as I wish it will be but it's a step forward.
Overall improvement jumps from 6-7s to around 800ms, the aim here is in the area of 500μs-1ms. Any adivces how I can improve my SQL View more?
UPDATE 2
Ok, I've found the bottleneck! Again - it's kinda similar to the last one (SELECT query works fast for a single row, but SQL VIEW is trying to access the whole table at once every time.
My new lesson SQL VIEW:
CREATE OR REPLACE VIEW `courses_classes_lessons_view` AS
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
IFNULL(lessons_consumers.consumers_count,0) AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END AS is_full) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
IFNULL(class_max_consumers, 0) - LEAST(IFNULL(consumers_count,0), class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
LEFT JOIN courses_classes_lessons_consumers_view AS lessons_consumers ON lessons_consumers.lesson_id = courses_classes_lessons.id;
Another SQL View - this time for consumers:
CREATE OR REPLACE VIEW `courses_classes_lessons_consumers_view` AS
SELECT
lesson_id,
IFNULL(count(
consumer_id),0) AS consumers_count
FROM
courses_classes_lessons_consumers
GROUP BY
courses_classes_lessons_consumers.lesson_id;
And looks like this one is the trouble maker! The consumers table is above, and here is the explain for the above SELECT query:
INSERT INTO `courses_classes_lessons_consumers` (`id`, `select_type`, `table`, `partitions`, `type`, `possible_keys`, `key`, `key_len`, `ref`, `rows`, `filtered`, `Extra`)
VALUES(1, 'SIMPLE', 'courses_classes_lessons_consumers', NULL, 'index', 'PRIMARY,courses_classes_lessons_consumers_consumer_id_foreign,courses_classes_lessons_consumers_plan_id_foreign,courses_classes_lessons_consumers_lesson_id_index,courses_classes_lessons_consumers_present_index', 'courses_classes_lessons_consumers_lesson_id_index', '4', NULL, 1330649, 100.00, 'Using index');
Any idea how to spread up this count?
Consider writing a Stored procedure; it may be able to get the 448 put into place to be better optimized.
If you know there will be only one row (such as when doing COUNT(*)), skip the LIMIT 1.
Unless consumer_id might be NULL, use COUNT(*) instead of COUNT(consumer_id).
A LIMIT without an ORDER BY leaves you getting a random row.
If courses_classes_lessons_consumers is a many-to-many mapping table, I will probably have some index advice after I see SHOW CREATE TABLE.
Which of the 5 SELECTs is the slowest?
After many attempts, it looks like that the Procedure way is the best approach and I won't be spending more time on the SQL Views
Here's the procedure I wrote:
CREATE PROCEDURE `LessonData`(
IN lessonId INT(10)
)
BEGIN
SELECT
courses_classes_lessons.id AS lesson_id,
courses_classes.max_consumers AS class_max_consumers,
IFNULL((SELECT
count(consumer_id) as consumers_count
FROM
courses_classes_lessons_consumers
WHERE
lesson_id = courses_classes_lessons.id
GROUP BY
courses_classes_lessons_consumers.lesson_id), 0) AS consumers_count,
(
SELECT
CASE WHEN consumers_count >= class_max_consumers THEN
TRUE
ELSE
FALSE
END) AS is_full,
(
CASE WHEN courses_classes_lessons.completed_at > NOW() THEN
'completed'
WHEN courses_classes_lessons.cancelled_at > NOW() THEN
'cancelled'
WHEN courses_classes_lessons.starts_at > NOW() THEN
'upcoming'
ELSE
'incomplete'
END) AS status,
(
SELECT
class_max_consumers - LEAST(consumers_count, class_max_consumers)) AS available_spaces
FROM
courses_classes_lessons
JOIN courses_classes ON courses_classes.id = courses_classes_lessons.class_id
WHERE courses_classes_lessons.id = lessonId;
END
And the execution time for it is around 500μs-1ms.
Thank you all for your help!

Simple SQL query but wrong results returned

I have a simple click tracking system that consists of three tables "tracking" (which holds unique views), "views" (which holds raw views) and "products" (which holds products).
Here's how it works: each time a user clicks on a tracking link, if the hash present in the link does not exist in the database, it will be saved in the "tracking" table as an unique view and also in the "views" table as a raw view. If the hash present in the link does exist in the database, then it will be saved only in the "views" table. So basically the number of "raw views" can not be smaller than the number of "unique views" because each "unique view" also counts as a "raw view".
I wrote a query to create reports based on products, but the number of "raw views" returned is not correct.
I've also created a fiddle which I hope it will give a better overview of my problem.
Here's the table structure:
CREATE TABLE `products` (
`id` int(10) UNSIGNED NOT NULL,
`name` varchar(128) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `products` (`id`, `name`) VALUES
(1, 'Test product');
CREATE TABLE `tracking` (
`id` int(10) UNSIGNED NOT NULL,
`product_id` int(11) NOT NULL,
`hash` varchar(32) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `tracking` (`id`, `product_id`, `hash`, `created`) VALUES
(1, 1, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:50:19'),
(2, 1, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:55:34');
CREATE TABLE `views` (
`id` int(10) UNSIGNED NOT NULL,
`hash` varchar(32) NOT NULL,
`created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `views` (`id`, `hash`, `created`) VALUES
(1, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:30'),
(2, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:30'),
(3, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:35'),
(4, '7ddf32e17a6ac5ce04a8ecbf782ca509', '2020-02-09 18:46:42'),
(5, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:56:31'),
(6, '00bb28eaf259ba0c932d67f649d90783', '2020-02-09 18:57:01');
And here's the query I wrote so far:
SELECT products.name AS `param`,
SUM(IF(tracking.product_id<>24, 1, 0)) AS `uniques`,
IF(SUM(IF(tracking.product_id<>24, 1, 0))=0, 0,
(SELECT COUNT(`hash`)
FROM `views` WHERE tracking.hash = views.hash)) AS `views`
FROM tracking
LEFT JOIN products ON products.id = tracking.product_id
WHERE tracking.created BETWEEN '2019-01-01 00:00:00' AND '2020-02-10 00:00:00'
GROUP BY products.name
As you can see I have 2 unique views and 6 raw views (4 for one hash and 2 for the other hash).
My expectation would be for the query result to be 2 uniques and 6 raw views for this given product, but instead I'm getting 2 uniques and 4 raw views. Like it's counting the views only for the first hash.
The next query can solve your situation:
SELECT
products.name,
COUNT(DISTINCT `tracking`.`hash`) AS `uniques`, -- count unique hashes
COUNT(*) AS `views` -- count total
FROM `tracking`
JOIN `views` ON `views`.hash = tracking.hash
LEFT JOIN products ON products.id = tracking.product_id
WHERE tracking.created BETWEEN '2019-01-01 00:00:00' AND '2020-02-10 00:00:00'
GROUP BY products.name;
;

MySQL UPDATE query partly updating with default values

I am running an update query (in MySQL 5.6) on 2 tables joined like the following:
UPDATE c_cache cc
JOIN p_cache pc USING (user_id, attribute_id, calculation_quarter)
JOIN batch_table bt USING (user_id, attribute_id, calculation_quarter, client_id, group_code, version_id)
SET cc.epop = SUBSTRING(bt.result, 1, 1),
cc.excl = SUBSTRING(bt.result, 2, 1),
cc.num_result = SUBSTRING(bt.result, 3, 20),
cc.status = 'FR',
pc.epop = IF(bt.enrolled = 2, SUBSTRING(bt.result, 1, 1), 1),
pc.excl = IF(bt.enrolled = 2, SUBSTRING(bt.result, 2, 1), 0),
pc.num_result = IF(bt.enrolled = 2, SUBSTRING(bt.result, 3, 20), REPEAT('0', 20)),
pc.status = IF(pc.status = 'FL2', 'S', 'FR');
The p_cache is being updated properly, but the 3 columns in the c_cache is being set to NULLs and status to 'S' (all default values of these columns).
The rows that are being missed out are usually contiguous (in chunks).
This query is within a loop in a stored procedure that runs till all 'S' (stale) status rows of p_cache are marked 'FR' (fresh), i.e. computed.
(All rows of p_cache are present in c_cache with a one-to-one correspondence).
The batch_table picks up rows in batches of 25000 rows per iteration, and gets updated with computed results in result column through some stored functions.
This whole stored proc. is called from a MySQL event. Multiple events run simultaneously (each for an exclusive set of attributes) to find stale rows in the p_cache, and update both cache tables with computed results in batches using queries similar to this one.
This anomalous behavior happens only on the c_cache, but only sometimes.
The schema definitions are:
CREATE TABLE c_cache (
user_id INT(11) NOT NULL DEFAULT '0',
attribute_id INT(11) NOT NULL DEFAULT '0',
calculation_quarter DATE NOT NULL DEFAULT '0000-00-00',
version_id INT(11) NOT NULL DEFAULT '0',
epop TINYINT(1) DEFAULT NULL,
excl TINYINT(1) DEFAULT NULL,
num_result CHAR(20) DEFAULT NULL,
status ENUM('FR','S','FL1','FL2') NOT NULL DEFAULT 'S',
PRIMARY KEY (user_id, attribute_id, calculation_quarter, version_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE p_cache (
user_id INT(11) NOT NULL DEFAULT '0',
attribute_id INT(11) NOT NULL DEFAULT '0',
calculation_quarter DATE NOT NULL DEFAULT '0000-00-00',
client_id INT(11) NOT NULL DEFAULT '0',
group_code CHAR(5) NOT NULL DEFAULT '',
epop TINYINT(1) DEFAULT NULL,
excl TINYINT(1) DEFAULT NULL,
num_result CHAR(20) DEFAULT NULL,
status ENUM('FR','S','FL1','FL2','S1','S2') NOT NULL DEFAULT 'S',
PRIMARY KEY (user_id,attribute_id,calculation_quarter,client_id,group_code),
KEY date_status_id_index (calculation_quarter,status,attribute_id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Can anyone kindly explain why this is happening and suggest a way to avoid this?
Thanks in advance.

specify conditions from outer query on a materialized subquery

i have got the below query which references couple of views 'goldedRunQueries' and 'currentGoldMarkings'. My issue seems to be from the view that is referred in the subquery - currentGoldMarkings. While execution, MySQL first materializes this subquery and then implements the where clauses of 'queryCode' and 'runId', which therefore results in execution time of more than hour as the view refers tables that has got millions of rows of data. My question is how do I enforce those two where conditions on the subquery before it materializes.
SELECT goldedRunQueries.queryCode, goldedRunQueries.runId
FROM goldedRunQueries
LEFT OUTER JOIN
( SELECT measuredRunId, queryCode, COUNT(resultId) as c
FROM currentGoldMarkings
GROUP BY measuredRunId, queryCode
) AS accuracy ON accuracy.measuredRunId = goldedRunQueries.runId
AND accuracy.queryCode = goldedRunQueries.queryCode
WHERE goldedRunQueries.queryCode IN ('CH001', 'CH002', 'CH003')
and goldedRunQueries.runid = 5000
ORDER BY goldedRunQueries.runId DESC, goldedRunQueries.queryCode;
Here are the two views. Both of these also get used in a standalone mode and so integrating any clauses into them is not possible.
CREATE VIEW currentGoldMarkings
AS
SELECT result.resultId, result.runId AS measuredRunId, result.documentId,
result.queryCode, result.queryValue AS measuredValue,
gold.queryValue AS goldValue,
CASE result.queryValue WHEN gold.queryValue THEN 1 ELSE 0 END AS correct
FROM results AS result
INNER JOIN gold ON gold.documentId = result.documentId
AND gold.queryCode = result.queryCode
WHERE gold.isCurrent = 1
CREATE VIEW goldedRunQueries
AS
SELECT runId, queryCode
FROM runQueries
WHERE EXISTS
( SELECT 1 AS Expr1
FROM runs
WHERE (runId = runQueries.runId)
AND (isManual = 0)
)
AND EXISTS
( SELECT 1 AS Expr1
FROM results
WHERE (runId = runQueries.runId)
AND (queryCode = runQueries.queryCode)
AND EXISTS
( SELECT 1 AS Expr1
FROM gold
WHERE (documentId = results.documentId)
AND (queryCode = results.queryCode)
)
)
Note: The above query reflects only a part of my actual query. There are 3 other left outer joins which are similar in nature to the above subquery which makes the problem far more worse.
EDIT: As suggested, here is the structure and some sample data for the tables
CREATE TABLE `results`(
`resultId` int auto_increment NOT NULL,
`runId` int NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
CONSTRAINT `PK_results` PRIMARY KEY
(
`resultId`
)
);
insert into results values (100, 242300, 'AC001', 'I', NULL)
insert into results values (100, 242300, 'AC001', 'S', NULL)
insert into results values (150, 242301, 'AC005', 'I', 'abc')
insert into results values (100, 242300, 'AC001', 'I', NULL)
insert into results values (109, 242301, 'PQ001', 'S', 'zzz')
insert into results values (400, 242400, 'DD006', 'I', NULL)
CREATE TABLE `gold`(
`goldId` int auto_increment NOT NULL,
`runDate` datetime NOT NULL,
`documentId` int NOT NULL,
`queryCode` char(5) NOT NULL,
`queryValue` char(1) NOT NULL,
`comment` varchar(255) NULL,
`isCurrent` tinyint(1) NOT NULL DEFAULT 0,
CONSTRAINT `PK_gold` PRIMARY KEY
(
`goldId`
)
);
insert into gold values ('2015-02-20 00:00:00', 138904, 'CH001', 'N', NULL, 1)
insert into gold values ('2015-05-20 00:00:00', 138904, 'CH001', 'N', 'aaa', 1)
insert into gold values ('2016-02-20 00:00:00', 138905, 'CH002', 'N', NULL, 0)
insert into gold values ('2015-12-12 00:00:00', 138804, 'CH001', 'N', 'zzzz', 1)
CREATE TABLE `runQueries`(
`runId` int NOT NULL,
`queryCode` char(5) NOT NULL,
CONSTRAINT `PK_runQueries` PRIMARY KEY
(
`runId`,
`queryCode`
)
);
insert into runQueries values (100, 'AC001')
insert into runQueries values (109, 'PQ001')
insert into runQueries values (400, 'DD006')
CREATE TABLE `runs`(
`runId` int auto_increment NOT NULL,
`runName` varchar(63) NOT NULL,
`isManual` tinyint(1) NOT NULL,
`runDate` datetime NOT NULL,
`comment` varchar(1023) NULL,
`folderName` varchar(63) NULL,
`documentSetId` int NOT NULL,
`pipelineVersion` varchar(50) NULL,
`isArchived` tinyint(1) NOT NULL DEFAULT 0,
`pipeline` varchar(50) NULL,
CONSTRAINT `PK_runs` PRIMARY KEY
(
`runId`
)
);
insert into runs values ('test1', 0, '2015-08-04 06:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
insert into runs values ('test2', 1, '2015-12-04 12:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
insert into runs values ('test3', 1, '2015-06-24 10:56:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
insert into runs values ('test4', 1, '2016-05-04 11:30:46.000000', 'zzzz', '2015-08-04_103046', 2, '2015-08-03', 0, NULL)
First, let's try to improve the performance via indexes:
results: INDEX(runId, queryCode) -- in either order
gold: INDEX(documentId, query_code, isCurrent) -- in that order
After that, update the CREATE TABLEs in the question and add the output of:
EXPLAIN EXTENDED SELECT ...;
SHOW WARNINGS;
What version are you running? You effectively have FROM ( SELECT ... ) JOIN ( SELECT ... ). Before 5.6, neither subquery had an index; with 5.6, an index is generated on the fly.
It is a shame that the query is built that way, since you know which one to use: and goldedRunQueries.runid = 5000.
Bottom Line: add the indexes; upgrade to 5.6 or 5.7; if that is not enough, then rethink the use of VIEWs.

optimising updating player scores table with results of queries on game results

I have a table containing the results of games played. I have another table for each player showing wins, losses, draws. I would like to update the player results table by analysing the games table. Currently calculation is done in php, and due to the number of games causes a delay in our database for about 4 seconds, which causes delays in general. I was thinking of moving the operation to a stored procedure to make it faster. Can anyone recommend a clever way of doing the calculation and subsequent updates to the player_chan_stats. I would like to do it entirely in mysql queries as this would probably be faster (assumption) than php.
This is an extract of our game result table
CREATE TABLE IF NOT EXISTS `temp_game_result` (
`gam_key` bigint(20) NOT NULL COMMENT 'the game key',
`gam_pla_1` bigint(20) NOT NULL COMMENT 'player 1',
`gam_pla_2` bigint(20) NOT NULL COMMENT 'player2',
`gam_to_play` tinyint(4) NOT NULL COMMENT 'who started',
`gam_start` datetime NOT NULL,
`gam_stop` datetime DEFAULT NULL,
`gam_status` enum('playing','win','draw','lose','error') NOT NULL COMMENT 'result with reference to gam_pla_1',
`mg_cleaned` tinyint(4) NOT NULL DEFAULT '0' COMMENT '0 if it has not passed thru cleanup, 1 otherwise',
`chn_key` bigint(20) NOT NULL COMMENT 'the tournament the game was for',
PRIMARY KEY (`gam_key`),
KEY `gam_status` (`gam_status`),
KEY `gam_start` (`gam_start`),
KEY `gam_stop` (`gam_stop`),
KEY `mg_cleaned` (`mg_cleaned`),
KEY `gam_pla_1` (`gam_pla_1`),
KEY `gam_pla_2` (`gam_pla_2`),
KEY `chn_key` (`chn_key`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `temp_game_result` (`gam_key`, `gam_pla_1`, `gam_pla_2`, `gam_to_play`, `gam_start`, `gam_stop`, `gam_status`, `mg_cleaned`, `chn_key`) VALUES
(1, 1, 2, 2, '2011-05-02 20:12:13', '2011-05-02 20:42:46', 'lose', 1, 1),
(2, 1, 2, 1, '2011-05-02 20:43:00', '2011-05-02 21:55:19', 'error', 1, 1),
(3, 2, 1, 1, '2011-05-03 21:13:18', '2011-05-03 21:14:21', 'win', 1, 1);
this is an extract of our player result table
CREATE TABLE IF NOT EXISTS `player_chan_stats` (
`pcs_key` bigint(20) NOT NULL AUTO_INCREMENT,
`pla_key` bigint(20) NOT NULL,
`chn_key` bigint(20) NOT NULL,
`pcs_seed` int(11) NOT NULL,
`pcs_rank` int(11) NOT NULL,
`pcs_games` int(11) NOT NULL DEFAULT '0',
`pcs_wins` int(11) NOT NULL DEFAULT '0',
`pcs_losses` int(11) NOT NULL DEFAULT '0',
`pcs_draws` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`pcs_key`),
UNIQUE KEY `pla_key_2` (`pla_key`,`chn_key`),
KEY `pla_key` (`pla_key`),
KEY `pcs_seed` (`pcs_seed`),
KEY `pcs_rank` (`pcs_rank`),
KEY `chn_key` (`chn_key`),
KEY `pcs_wins` (`pcs_wins`),
KEY `pcs_losses` (`pcs_losses`),
KEY `pcs_draws` (`pcs_draws`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COMMENT='Stats of player per channel' AUTO_INCREMENT=26354 ;
INSERT INTO `player_chan_stats` (`pcs_key`, `pla_key`, `chn_key`, `pcs_seed`, `pcs_rank`, `pcs_games`, `pcs_wins`, `pcs_losses`, `pcs_draws`) VALUES
(1, 1, 1, 1552, 1844, 325, 146, 176, 3),
(2, 2, 1, 1543, 2272, 93, 48, 43, 2);
Triggers may be your solution http://dev.mysql.com/doc/refman/5.0/en/triggers.html
a helpful trigger to you will be on insert (or update) in temp_game_result if gam_status is win update +1 to wins of the player...
will be (more or less)
CREATE TRIGGER update_wins AFTER UPDATE ON account
FOR EACH ROW
BEGIN
IF NEW.gam_status = 'win' THEN
update player_chan_stats set pcs_wins=pcs_wins+1 where psc_key=NEW.gam_pla_1;
update player_chan_stats set pcs_losses=pcs_losses +1 where psc_key=NEW.gam_pla_2;
ELSEIF NEW.gam_status = 'lose'
[...]
END IF;
END;