I have generated a calendar table containing every date from 2000-01-01 to 2050-12-31.
I also have a user table with the following columns:
id, created, is_profile_public
Lastly, I have a table called user_organisation that links each user to one or more organisations (this is optional: not every user is linked to an organisation).
I want to fetch statistics covering every date from the earliest user creation date until yesterday. Dates with no data should contain 0 in every column.
I have created this query:
SELECT c.datefield, DATE(u.created) AS created,
SUM(case when u.is_profile_public=1 AND uo.user_id is null then 1 else 0 end) as amount_public_volunteers,
SUM(case when u.is_profile_public=0 AND uo.user_id is null then 1 else 0 end) as amount_private_volunteers,
SUM(case when u.is_profile_public=1 AND uo.user_id is not null then 1 else 0 end) as amount_public_volunteers_admin,
SUM(case when u.is_profile_public=0 AND uo.user_id is not null then 1 else 0 end) as amount_private_volunteers_admin
FROM calendar AS c
LEFT OUTER JOIN user AS u ON c.datefield = DATE(u.created)
LEFT JOIN (select max(organisation_id), user_id from user_organisation group by user_id) AS uo on uo.user_id=u.id
WHERE u.id IN (87, 89, 172, 185, 186, 341, 342, 343, 344, 443, 444, 445,
446, 455, 459, 463, 20, 94, 61, 100, 101, 102, 109, 112,
113, 115, 132, 166, 184, 198, 199, 203, 205, 206, 207, 271,
272, 273, 274, 275, 276, 277, 280, 278, 279, 281, 284, 282,
283, 285, 288, 286, 287, 289, 292, 290, 291, 293, 294, 295,
302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313,
318, 314, 316, 315, 319, 317, 324, 325, 326, 328, 330, 332,
340, 358, 369, 383, 384, 391, 395, 396, 397, 398, 405, 399,
406, 400, 409, 401) AND (c.datefield BETWEEN (SELECT MIN(DATE(created)) FROM user) AND DATE(NOW()))
GROUP BY c.datefield
This returns only the dates on which users were created; it returns no rows at all for dates on which no users were created.
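A likely cause: the `u.id IN (...)` filter sits in the WHERE clause. On calendar dates with no matching user, every `u` column is NULL, `NULL IN (...)` is not true, and the row is discarded, so the LEFT JOIN behaves like an inner join. Moving that filter into the ON clause keeps the unmatched calendar rows. The sketch below shows the difference using Python's built-in sqlite3 module as a stand-in for MySQL; the tables are cut down to a few rows, and the dates, user 999, and the two-ID list are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (datefield TEXT PRIMARY KEY);
    CREATE TABLE user (id INTEGER PRIMARY KEY, created TEXT, is_profile_public INTEGER);
    INSERT INTO calendar VALUES ('2021-01-01'), ('2021-01-02'), ('2021-01-03');
    -- User 999 is not in the ID list, so 2021-01-02 has no matching listed user.
    INSERT INTO user VALUES (87, '2021-01-01 10:00:00', 1),
                            (89, '2021-01-03 12:30:00', 0),
                            (999, '2021-01-02 09:00:00', 1);
""")

ID_LIST = "(87, 89)"  # hypothetical, shortened stand-in for the question's u.id IN (...) list


def daily_public_counts(filter_in_on_clause):
    """Count public profiles per calendar day, with the ID filter in ON or in WHERE."""
    if filter_in_on_clause:
        # Filter is part of the join condition: unmatched days still yield a row of NULLs.
        join_and_where = f"""
            LEFT JOIN user AS u
                   ON c.datefield = DATE(u.created) AND u.id IN {ID_LIST}"""
    else:
        # Filter in WHERE: a NULL u.id fails IN (...), so unmatched days are dropped.
        join_and_where = f"""
            LEFT JOIN user AS u ON c.datefield = DATE(u.created)
            WHERE u.id IN {ID_LIST}"""
    sql = f"""
        SELECT c.datefield,
               SUM(CASE WHEN u.is_profile_public = 1 THEN 1 ELSE 0 END) AS amount_public
        FROM calendar AS c
        {join_and_where}
        GROUP BY c.datefield
        ORDER BY c.datefield"""
    return conn.execute(sql).fetchall()


print(daily_public_counts(False))  # only days that have a listed user
print(daily_public_counts(True))   # every calendar day, 0 where nothing matched
```

In the real query, the `c.datefield BETWEEN ...` condition can stay in WHERE because it only constrains the calendar side. Separately, `DATE(NOW())` includes today; `CURDATE() - INTERVAL 1 DAY` would stop at yesterday, as described above.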
I have a query similar to:
SELECT
ANY_VALUE(name) AS `name`,
100 * SUM(score) / SUM(sum(score)) OVER (PARTITION BY date(scores.created_at)) AS `average_score`,
ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10……)
AND date(`scores`.`created_at`) >= '2020-01-01'
GROUP BY
`name`,
date(scores.created_at)
ORDER BY
`shift_date` ASC;
The IN list in the WHERE clause can contain up to 2,000 IDs, which may not be sequential, and the created_at filter can reach back up to 14 months. At those levels the query currently takes 10-20 seconds to execute.
I'm trying to optimise this. I've tried adding an index on scores.created_at, but it had no effect. I also tried changing the date condition in the WHERE clause to:
AND `scores`.`created_at` >= '2020-01-01 00:00:00'
This again made no difference.
While reading up on the topic, I found recommendations to create a temporary table, but I can't see how that would help. I'm also not sure how to do it in a single query (is that even possible?).
The indexes on the scores table are: shift_id, employee_id, and (name, created_at), the last of which is used for another query. As I said, an index on created_at alone didn't help this one.
The shifts table has indexes on table_c_id and created_at.
Some sites suggest using CTEs (WITH clauses), but again, I'm not sure how this would work or whether performance would actually improve.
The schema for scores and shifts is:
DROP TABLE IF EXISTS `scores`;
CREATE TABLE `scores` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`shift_id` int unsigned NOT NULL,
`hash` varchar(40) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`score` double(8,2) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `scores_hash_index` (`hash`) USING BTREE,
KEY `scores_shift_id_index` (`shift_id`) USING BTREE,
KEY `scores_name_created_at_index` (`name`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=3140922 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
DROP TABLE IF EXISTS `shifts`;
CREATE TABLE `shifts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`table_c_id` int unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `shifts_table_c_id_index` (`table_c_id`),
KEY `shifts_created_at_index` (`created_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=536392 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Update
Using a lookup table for names (scores now references it through a name_id column):
names table: id INT UNSIGNED PRIMARY KEY, name VARCHAR
SELECT
names.name AS `name`,
100 * SUM(score) / SUM(sum(score)) OVER (PARTITION BY date(scores.created_at)) AS `average_score`,
ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
INNER JOIN `names` ON `names`.id = `scores`.`name_id`
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 
504, 505, 506)
AND `scores`.`created_at` >= '2019-04-03'
GROUP BY
`names`.`name`,
date(scores.created_at)
ORDER BY
`shift_date` ASC;
This gave no benefit. A composite index on scores (shift_id, name_id, created_at) hasn't helped either.
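Plan A: Avoid windowing functions (many of them are slower than one would think).
SELECT
ANY_VALUE(brand_name),
100 * SUM(score) / ANY_VALUE(init.tot) AS `average_score`,
DATE(scores.created_at) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
JOIN ( SELECT DATE(sc.created_at) AS d, SUM(sc.score) AS tot
FROM scores AS sc
INNER JOIN shifts AS sh ON sh.id = sc.shift_id
WHERE sh.table_c_id IN (...)
AND sc.created_at >= '2020-01-01'
GROUP BY d ) AS init ON init.d = DATE(scores.created_at)
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10……)
AND `scores`.`created_at` >= '2020-01-01'
GROUP BY
shift_date,
`brand_name`
ORDER BY
`shift_date` ASC;
Notes:
Several changes to the date syntax: filtering on the bare created_at column, rather than wrapping it in DATE(), lets an index on created_at be used.
I assumed that name and brand_name were the same.
Flipping the GROUP BY order to match the ORDER BY may avoid a second sort.
I used a derived table to compute each day's total (the equivalent of PARTITION BY date(scores.created_at)), thereby obviating the need for OVER. The totals come from scores joined to shifts, since shifts has no score column.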
This composite index on scores may help:
INDEX(created_at, shift_id)
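A minimal sketch of the Plan A idea, using Python's sqlite3 as a stand-in for MySQL with a few hypothetical rows and a shortened `table_c_id` list. It assumes the score column is `score` (as in the queries) and that each name's figure should be its share of that day's total, matching `PARTITION BY date(scores.created_at)` in the question; so the derived table produces one total per day and is joined on the day, rather than one grand total.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; schema trimmed to the columns the query uses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shifts (id INTEGER PRIMARY KEY, table_c_id INTEGER NOT NULL);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, shift_id INTEGER NOT NULL,
                         name TEXT NOT NULL, score REAL NOT NULL, created_at TEXT);
    INSERT INTO shifts VALUES (1, 1), (2, 2), (3, 99);
    INSERT INTO scores (shift_id, name, score, created_at) VALUES
        (1, 'alice', 30, '2020-01-01 08:00:00'),
        (2, 'bob',   10, '2020-01-01 09:00:00'),
        (1, 'alice', 20, '2020-01-02 08:00:00'),
        (3, 'carol', 50, '2020-01-02 10:00:00');  -- shift 3 (table_c_id 99) is filtered out
""")

# day_tot.tot is constant within each group; MAX() only satisfies strict GROUP BY modes.
SQL = """
SELECT s.name,
       100 * SUM(s.score) / MAX(day_tot.tot) AS average_score,
       DATE(s.created_at) AS shift_date
FROM scores AS s
JOIN shifts AS sh ON sh.id = s.shift_id
JOIN (SELECT DATE(s2.created_at) AS d, SUM(s2.score) AS tot
      FROM scores AS s2
      JOIN shifts AS sh2 ON sh2.id = s2.shift_id
      WHERE sh2.table_c_id IN (1, 2)
        AND s2.created_at >= '2020-01-01'
      GROUP BY DATE(s2.created_at)) AS day_tot
  ON day_tot.d = DATE(s.created_at)
WHERE sh.table_c_id IN (1, 2)
  AND s.created_at >= '2020-01-01'
GROUP BY DATE(s.created_at), s.name
ORDER BY shift_date, s.name
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

In MySQL with ONLY_FULL_GROUP_BY, `ANY_VALUE(day_tot.tot)` would work in place of `MAX()`. Whether this beats the window-function version on the real data is untested; running EXPLAIN ANALYZE (MySQL 8.0.18+) on both would tell.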
Plan B: Use a CTE to compute SUM(score), then finish the query.
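Plan B has no code; one possible reading, sketched with hypothetical data and Python's sqlite3 as a stand-in for MySQL: a first CTE filters and aggregates the scores per day and name once, a second CTE derives the day totals from that small result, and the final SELECT divides one by the other. CTEs need MySQL 8.0 or later, which the question's window function already implies.

```python
import sqlite3

# Same hypothetical data as a Plan A test bed; SQLite stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shifts (id INTEGER PRIMARY KEY, table_c_id INTEGER NOT NULL);
    CREATE TABLE scores (id INTEGER PRIMARY KEY, shift_id INTEGER NOT NULL,
                         name TEXT NOT NULL, score REAL NOT NULL, created_at TEXT);
    INSERT INTO shifts VALUES (1, 1), (2, 2), (3, 99);
    INSERT INTO scores (shift_id, name, score, created_at) VALUES
        (1, 'alice', 30, '2020-01-01 08:00:00'),
        (2, 'bob',   10, '2020-01-01 09:00:00'),
        (1, 'alice', 20, '2020-01-02 08:00:00'),
        (3, 'carol', 50, '2020-01-02 10:00:00');
""")

SQL = """
WITH per_name AS (          -- filter the base tables once, aggregate per day and name
    SELECT DATE(s.created_at) AS shift_date, s.name, SUM(s.score) AS total
    FROM scores AS s
    JOIN shifts AS sh ON sh.id = s.shift_id
    WHERE sh.table_c_id IN (1, 2)
      AND s.created_at >= '2020-01-01'
    GROUP BY DATE(s.created_at), s.name
),
per_day AS (                -- day totals come from the small CTE, not the base tables
    SELECT shift_date, SUM(total) AS tot
    FROM per_name
    GROUP BY shift_date
)
SELECT p.name, 100 * p.total / d.tot AS average_score, p.shift_date
FROM per_name AS p
JOIN per_day AS d ON d.shift_date = p.shift_date
ORDER BY p.shift_date, p.name
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

The intent is that the expensive filtered join over scores and shifts runs only once; whether MySQL materializes `per_name` once or re-evaluates it, and whether that is actually faster, should be checked with EXPLAIN on the real data.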
I'm having trouble with a JOIN and a GROUP_CONCAT. The query is concatenating additional data that should not be associated with the join.
Here's my table structure:
linkages
ID | table_name    | tag_id
1  | subcategories | 6
2  | categories    | 9
music
ID | artwork
1  | 5
2  | 4
artwork
ID | url_path
1  | /some/file/path
2  | /some/file/path
And here's my query:
SELECT music.*,
artwork.url_path AS artwork_url_path,
GROUP_CONCAT( linkages.tag_id ) AS tag_ids,
GROUP_CONCAT( linkages.table_name ) AS table_name
FROM music
LEFT JOIN artwork ON artwork.id = music.artwork
LEFT JOIN linkages ON music.id = linkages.track_id
WHERE music.id IN( '1356',
'1357',
'719',
'169',
'170',
'171',
'805' )
ORDER BY FIELD( music.id,
1356,
1357,
719,
169,
170,
171,
805 )
This is the result of the GROUP_CONCAT :
[tag_ids] => 3, 6, 9, 17, 19, 20, 26, 49, 63, 64, 53, 57, 63, 65, 67, 73, 79, 80, 85, 96, 98, 11, 53, 67, 3, 6, 15, 17, 26, 38, 50, 63, 74, 53, 56, 57, 62, 63, 65, 66, 67, 72, 85, 88, 98, 24, 69, 71, 3, 6, 15, 17, 26, 38, 50
The first portion of the result is correct:
[tag_ids] => 3, 6, 9, 17, 19, 20, 26, 49, 63, 64, 53, 57, 63, 65, 67, 73, 79, 80, 85, 96, 98, 11, 53, 67
Everything after the correct values seems random, and most of those values don't belong to this result in the database, yet they are still pulled in. It seems to repeat part of the correct result: in 3, 6, 15, 17, the 3, 6 and 17 are correct but 15 shouldn't be there, and the same goes for a number of other values (71, etc.). I can't use DISTINCT because I need to match up the tag_ids and table_name results as a multidimensional array.
Any thoughts as to why?
UPDATE:
I ended up solving it with the initial push from Gordon. It needed a GROUP BY clause; otherwise it was putting every result's tag IDs into each result. The final query ended up as follows:
SET SESSION group_concat_max_len = 1000000;
SELECT
music.*,
artwork.url_path as artwork_url_path,
GROUP_CONCAT(linkages.tag_id, ':', linkages.table_name) as tags
FROM music
LEFT JOIN artwork ON artwork.id = music.artwork
LEFT JOIN linkages ON music.id = linkages.track_id
WHERE music.id IN('1356', '1357', '719', '169', '170', '171', '805')
GROUP BY music.id
ORDER BY FIELD(music.id,1356,1357,719,169,170,171,805);
Your join is generating duplicate rows. I would suggest fixing the root cause of the problem, but a quick-and-dirty solution is to use GROUP_CONCAT(DISTINCT ...):
GROUP_CONCAT(DISTINCT linkages.tag_id) as tag_ids,
GROUP_CONCAT(DISTINCT linkages.table_name) as table_name
You can put the columns in a single field using GROUP_CONCAT():
GROUP_CONCAT(DISTINCT linkages.tag_id, ':', linkages.table_name) as tags
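A small reproduction of why the missing GROUP BY mattered, using Python's sqlite3 as a stand-in for MySQL. Without GROUP BY, an aggregate such as GROUP_CONCAT collapses all joined rows into a single row, so every track's tags end up in one list; with `GROUP BY music.id` you get one list per track. The data is hypothetical, and `linkages` is given the `track_id` column the query joins on. SQLite's `group_concat(expr, separator)` takes a single expression, so MySQL's `GROUP_CONCAT(linkages.tag_id, ':', linkages.table_name)` is written with the `||` concatenation operator here.

```python
import sqlite3

# Hypothetical data; linkages gets the track_id column the question's query joins on.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE music (id INTEGER PRIMARY KEY, artwork INTEGER);
    CREATE TABLE linkages (id INTEGER PRIMARY KEY, table_name TEXT,
                           tag_id INTEGER, track_id INTEGER);
    INSERT INTO music VALUES (719, 1), (805, 2);
    INSERT INTO linkages (table_name, tag_id, track_id) VALUES
        ('subcategories', 6, 719), ('categories', 9, 719), ('categories', 15, 805);
""")


def tags_by_track(with_group_by):
    """Return {music.id: sorted 'tag_id:table_name' strings} with or without GROUP BY."""
    # tag_id || ':' || table_name is the SQLite form of MySQL's multi-argument GROUP_CONCAT.
    sql = """
        SELECT music.id,
               group_concat(linkages.tag_id || ':' || linkages.table_name, ',') AS tags
        FROM music
        LEFT JOIN linkages ON music.id = linkages.track_id
        WHERE music.id IN (719, 805)
    """
    if with_group_by:
        sql += " GROUP BY music.id"
    # group_concat order is not guaranteed, so sort each list for a stable comparison.
    return {track: sorted(tags.split(",")) if tags else []
            for track, tags in conn.execute(sql)}


print(tags_by_track(False))  # one row: every track's tags lumped together
print(tags_by_track(True))   # one row per track
```

Because each `tag_id:table_name` pair travels together in one string, the two lists can no longer drift out of alignment, and DISTINCT on the combined expression would only drop exact duplicate pairs.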
This is probably something very simple, so forgive my blonde moment :)
I have a table 'album'
* albumId
* albumOwnerId (who created)
* albumCSD (create stamp date)
Now what I am trying to do is select the 10 most recently created albums (by albumCSD). However, I don't want 10 albums from the same person; I only want one album per unique person, i.e. 10 albums from 10 different people.
This is what I have below, but it is not working properly and I can't figure out why. Any ideas?
Thanks
SELECT DISTINCT(albumOwnerId), albumId
FROM album
ORDER BY albumCSD DESC
LIMIT 0,10
Here is some example data, followed by what I am trying to get. Hope this makes it clearer.
DATA:
albumOwnerID, albumId, albumCSD
18, 194, '2010-10-23 11:02:30'
23, 193, '2010-10-22 11:39:59'
22, 192, '2010-10-12 21:48:16'
21, 181, '2010-10-12 20:34:11'
21, 178, '2010-10-12 20:20:16'
19, 168, '2010-10-12 18:31:55'
18, 167, '2010-10-11 21:06:55'
20, 166, '2010-10-11 21:01:47'
18, 165, '2010-10-11 21:00:32'
20, 164, '2010-10-11 20:50:06'
17, 145, '2010-10-10 18:54:24'
17, 144, '2010-10-10 18:49:28'
17, 143, '2010-10-10 18:48:08'
17, 142, '2010-10-10 18:46:54'
16, 130, '2010-10-10 16:17:57'
16, 129, '2010-10-10 16:17:26'
16, 128, '2010-10-10 16:07:21'
15, 119, '2010-10-10 15:24:28'
15, 118, '2010-10-10 15:24:11'
14, 100, '2010-10-09 18:22:49'
14, 99, '2010-10-09 18:18:46'
11, 98, '2010-10-09 15:50:13'
11, 97, '2010-10-09 15:44:09'
11, 96, '2010-10-09 15:42:28'
11, 95, '2010-10-09 15:37:25'
DESIRED DATA:
18, 194, '2010-10-23 11:02:30'
23, 193, '2010-10-22 11:39:59'
22, 192, '2010-10-12 21:48:16'
21, 181, '2010-10-12 20:34:11'
19, 168, '2010-10-12 18:31:55'
17, 145, '2010-10-10 18:54:24'
16, 130, '2010-10-10 16:17:57'
15, 119, '2010-10-10 15:24:28'
14, 100, '2010-10-09 18:22:49'
11, 98, '2010-10-09 15:50:13'
I get the results you want with this query:
SELECT albumOwnerID, albumId, albumCSD
FROM album
WHERE albumCSD in
(SELECT Max(album.albumCSD) AS MaxvonalbumCSD
FROM album
GROUP BY album.albumOwnerID);
However, I tested this in MS Access.
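A sketch of a stricter version of the `albumCSD IN (...)` query above, run through Python's sqlite3 as a stand-in for MySQL on the question's sample data. It matches each album against its own owner's latest timestamp, so an identical timestamp belonging to a different owner cannot slip in, and adds the ORDER BY and LIMIT that a "top 10" needs. If one owner had two albums with exactly the same latest timestamp, both would still be returned. In MS Access, LIMIT 10 would be written as SELECT TOP 10 instead.

```python
import sqlite3

# The question's sample data: (albumOwnerID, albumId, albumCSD).
ALBUMS = [
    (18, 194, '2010-10-23 11:02:30'), (23, 193, '2010-10-22 11:39:59'),
    (22, 192, '2010-10-12 21:48:16'), (21, 181, '2010-10-12 20:34:11'),
    (21, 178, '2010-10-12 20:20:16'), (19, 168, '2010-10-12 18:31:55'),
    (18, 167, '2010-10-11 21:06:55'), (20, 166, '2010-10-11 21:01:47'),
    (18, 165, '2010-10-11 21:00:32'), (20, 164, '2010-10-11 20:50:06'),
    (17, 145, '2010-10-10 18:54:24'), (17, 144, '2010-10-10 18:49:28'),
    (17, 143, '2010-10-10 18:48:08'), (17, 142, '2010-10-10 18:46:54'),
    (16, 130, '2010-10-10 16:17:57'), (16, 129, '2010-10-10 16:17:26'),
    (16, 128, '2010-10-10 16:07:21'), (15, 119, '2010-10-10 15:24:28'),
    (15, 118, '2010-10-10 15:24:11'), (14, 100, '2010-10-09 18:22:49'),
    (14, 99, '2010-10-09 18:18:46'), (11, 98, '2010-10-09 15:50:13'),
    (11, 97, '2010-10-09 15:44:09'), (11, 96, '2010-10-09 15:42:28'),
    (11, 95, '2010-10-09 15:37:25'),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (albumOwnerID INTEGER, albumId INTEGER PRIMARY KEY, albumCSD TEXT)")
conn.executemany("INSERT INTO album VALUES (?, ?, ?)", ALBUMS)

# Latest album per owner: join each album to its owner's MAX(albumCSD)
# ('YYYY-MM-DD HH:MM:SS' strings sort chronologically), then keep the 10 newest.
SQL = """
SELECT a.albumOwnerID, a.albumId, a.albumCSD
FROM album AS a
JOIN (SELECT albumOwnerID, MAX(albumCSD) AS maxCSD
      FROM album
      GROUP BY albumOwnerID) AS latest
  ON latest.albumOwnerID = a.albumOwnerID
 AND latest.maxCSD = a.albumCSD
ORDER BY a.albumCSD DESC
LIMIT 10
"""

rows = conn.execute(SQL).fetchall()
for row in rows:
    print(row)
```

With this data there are 11 owners, and owner 20's latest album (166, '2010-10-11 21:01:47') is newer than album 98, so a strict top 10 returns 166 and drops 98; the DESIRED DATA listing in the question appears to have skipped owner 20.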
select albumOwnerID, albumID
from album
Group by albumOwnerID, albumID
Order by albumcsd desc
LIMIT 0,10
EDIT:
select albumOwnerID, albumID
from album
where albumOwnerID in (select distinct albumOwnerID from album order by albumCSD )
LIMIT 0,10