SQL NOT IN still includes rows that should be excluded - mysql

I have the following statement to find rows that include certain values but exclude others:
SELECT *
FROM tests
WHERE author = 4
OR id = -999
OR id = 276
OR id = 343
OR id = 197
OR id = 170
OR id = 1058
OR id = 1328
OR id = 1417
AND is_deleted = 0
AND id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26,
2457, 16, 25, 1639, 2224, 1804, 2308, 197, 461, 1442,
1594, 460, 1235, 1814, 2467, 168, 172, 170, 171, 2223, 2535, 2754)
However, I am still getting rows that should be excluded according to the NOT IN list. For example, the test with id 16 should be excluded even though its author is 4, but it is still returned by the query, which I don't want.
The statement is created programmatically depending on the situation.
Is there a syntax mistake that I'm making?

Have a look at SQL Server's operator precedence (MySQL follows the same rule here). You'll see that AND has a higher precedence than OR.
Say that you're looking for a fast car that is green or blue. If you write:
where speed = 'fast' and color = 'green' or color = 'blue'
SQL Server will read:
where (speed = 'fast' and color = 'green') or color = 'blue'
And in response to your query, SQL Server could return a slow blue car.
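To see the car example in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database (an assumption that is safe for this point, since SQLite, SQL Server and MySQL all bind AND tighter than OR). The `cars` table and its rows are made up for illustration:

```python
import sqlite3

# In-memory SQLite stand-in; the cars table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (name TEXT, speed TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("a", "fast", "green"), ("b", "fast", "blue"),
     ("c", "slow", "blue"), ("d", "fast", "red")],
)


def names(where):
    """Return the sorted names of the cars matched by a WHERE clause."""
    rows = conn.execute(f"SELECT name FROM cars WHERE {where} ORDER BY name")
    return [row[0] for row in rows]


# Read as (fast AND green) OR blue, so the slow blue car "c" slips in.
unparenthesized = names("speed = 'fast' AND color = 'green' OR color = 'blue'")
# Read as fast AND (green OR blue), so only fast cars come back.
parenthesized = names("speed = 'fast' AND (color = 'green' OR color = 'blue')")

print(unparenthesized)  # ['a', 'b', 'c']
print(parenthesized)    # ['a', 'b']
```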

Change your query to this:
SELECT *
FROM tests
WHERE (author = 4 OR id = -999 OR id = 276 OR id = 343 OR id = 197 OR id = 170 OR id = 1058 OR id = 1328 OR id = 1417)
AND is_deleted = 0
AND id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26, 2457, 16, 25, 1639, 2224, 1804, 2308, 197, 461, 1442, 1594, 460, 1235, 1814, 2467, 168, 172, 170, 171, 2223, 2535, 2754)
You have to put all of your OR conditions in parentheses.
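A minimal sketch of the question's bug and this fix, again using Python's sqlite3 module as a stand-in for MySQL (assumption: both give AND a higher precedence than OR, which is the only behaviour relied on here). The rows are made up and the id lists are trimmed for brevity; test 16 has author 4 and sits in the NOT IN list, as in the question:

```python
import sqlite3

# Hypothetical rows: (id, author, is_deleted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (id INTEGER PRIMARY KEY, author INTEGER, is_deleted INTEGER)")
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [(16, 4, 0), (197, 4, 0), (276, 7, 0), (500, 4, 1),
     (501, 4, 0), (1417, 7, 0), (600, 7, 0)],
)


def ids(where):
    """Return the sorted ids of the rows matched by a WHERE clause."""
    rows = conn.execute(f"SELECT id FROM tests WHERE {where} ORDER BY id")
    return [row[0] for row in rows]


# Question's shape (lists trimmed): the two AND filters only attach to "id = 1417".
original = ids("""author = 4 OR id = 276 OR id = 197 OR id = 1417
                  AND is_deleted = 0
                  AND id NOT IN (16, 197, 457)""")

# Answer's shape: the OR group is wrapped, so both filters apply to every row.
fixed = ids("""(author = 4 OR id = 276 OR id = 197 OR id = 1417)
               AND is_deleted = 0
               AND id NOT IN (16, 197, 457)""")

print(original)  # [16, 197, 276, 500, 501, 1417]
print(fixed)     # [276, 501, 1417]
```

In the unparenthesized version, tests 16 and 197 come back despite the NOT IN list, and the deleted test 500 comes back too, because `author = 4` alone is enough to satisfy the outer OR.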

Try this:
SELECT
*
FROM tests
WHERE
(author = 4
OR
id IN (-999, 276, 343, 197, 170, 1058, 1328, 1417))
AND is_deleted = 0
AND id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26, 2457, 16, 25, 1639, 2224, 1804, 2308, 197, 461, 1442, 1594, 460, 1235, 1814, 2467, 168, 172, 170, 171, 2223, 2535, 2754)
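Where `is_deleted = 0` sits relative to the parentheses matters: placed inside the OR group, it only constrains the `id IN (...)` branch, so a deleted test by author 4 would still come back. A small sqlite3 sketch with made-up rows and trimmed id lists (assumption: SQLite's AND/OR precedence matches MySQL's, which holds here):

```python
import sqlite3

# Hypothetical rows: (id, author, is_deleted); 502 is a deleted test by author 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (id INTEGER PRIMARY KEY, author INTEGER, is_deleted INTEGER)")
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [(16, 4, 0), (276, 7, 1), (343, 7, 0), (501, 4, 0), (502, 4, 1)],
)


def ids(where):
    """Return the sorted ids of the rows matched by a WHERE clause."""
    rows = conn.execute(f"SELECT id FROM tests WHERE {where} ORDER BY id")
    return [row[0] for row in rows]


# Inside the group: parsed as author = 4 OR (id IN (...) AND is_deleted = 0).
inside = ids("(author = 4 OR id IN (276, 343) AND is_deleted = 0) AND id NOT IN (16)")
# Outside the group: is_deleted = 0 applies to every row.
outside = ids("(author = 4 OR id IN (276, 343)) AND is_deleted = 0 AND id NOT IN (16)")

print(inside)   # [343, 501, 502]  (502 is deleted but still returned)
print(outside)  # [343, 501]
```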

Omair, you are misplacing the parentheses. First of all, be clear about which rows you want to select.
Suppose we need:
Tests having author = 4 or whose id is contained in -999, 343, 197, etc., whose deleted status = 0, and whose id must not be in 457, 2409, ... etc.
What you wrote was:
author = 4 OR id = -999 OR id = 276 ...
AND is_deleted = 0
AND id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26, ...)
Because AND binds more tightly than OR, this is interpreted as
(author = 4) OR (id = -999) OR (id = 276) ... OR (id = 1417
AND is_deleted = 0
AND id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26, ...)
)
so is_deleted = 0 and the NOT IN list only apply to the last id test.
Here, we just need to add parentheses so that the OR group is evaluated first, as we need:
((author = 4) OR (id = -999 OR id = 276 ...))
AND (is_deleted = 0)
AND (id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26, ...))
So you can change the SQL with proper brackets:
SELECT
*
FROM tests
WHERE
( (author = 4) OR id in (-999,276 ,343 ,197 ,170 ,1058 ,1328 ,1417) )
AND ( is_deleted = 0 )
AND ( id NOT IN (457, 2409, 173, 400, 167, 277, 163, 404, 2222, 24, 26, 2457, 16, 25, 1639, 2224, 1804, 2308, 197, 461, 1442, 1594, 460, 1235, 1814, 2467, 168, 172, 170, 171, 2223, 2535, 2754) )
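Since the statement is generated programmatically, one way to make this class of bug hard to reintroduce is to have the generator always wrap the OR alternatives in their own parentheses and pass the values as parameters. A minimal sketch: `build_tests_query` is a hypothetical helper, not code from the question, and it uses sqlite3's `?` placeholders (MySQL drivers such as PyMySQL and mysql-connector-python use `%s` instead):

```python
import sqlite3


def build_tests_query(author, include_ids, exclude_ids):
    """Hypothetical builder: returns (sql, params) with the OR group always parenthesized."""
    alternatives = ["author = ?"]
    params = [author]
    if include_ids:
        # One "?" placeholder per id, e.g. "id IN (?, ?)".
        alternatives.append("id IN (" + ", ".join("?" * len(include_ids)) + ")")
        params.extend(include_ids)
    # Wrapping the OR group means the ANDs below apply to every alternative.
    conditions = ["(" + " OR ".join(alternatives) + ")", "is_deleted = 0"]
    if exclude_ids:
        conditions.append("id NOT IN (" + ", ".join("?" * len(exclude_ids)) + ")")
        params.extend(exclude_ids)
    return "SELECT * FROM tests WHERE " + " AND ".join(conditions), params


sql, params = build_tests_query(4, [276, 343], [16, 197])
print(sql)
# SELECT * FROM tests WHERE (author = ? OR id IN (?, ?)) AND is_deleted = 0 AND id NOT IN (?, ?)

# Quick check against made-up rows: (id, author, is_deleted).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tests (id INTEGER PRIMARY KEY, author INTEGER, is_deleted INTEGER)")
conn.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [(16, 4, 0), (276, 7, 0), (343, 7, 1), (501, 4, 0)],
)
matched = [row[0] for row in conn.execute(sql + " ORDER BY id", params)]
print(matched)  # [276, 501]
```

Passing the ids as parameters rather than pasting them into the string also avoids quoting mistakes when the lists change.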

Related

Show missing dates when joining on calendar table and filtering on certain users

I have generated a calendar table, containing every date from 2000-01-01 until 2050-12-31.
Apart from that, I also have a user table, which contains the following columns:
id, created, is_profile_public
And lastly, I have a table called user_organisation, which links my users to one or more organisations (this is optional; not every user is linked to an organisation).
I want to fetch data for statistical purposes, covering every date from the earliest creation date of my users until yesterday. Missing dates should just contain 0 in every column.
I have created this query:
SELECT c.datefield, DATE(u.created) AS created,
SUM(case when u.is_profile_public=1 AND uo.user_id is null then 1 else 0 end) as amount_public_volunteers,
SUM(case when u.is_profile_public=0 AND uo.user_id is null then 1 else 0 end) as amount_private_volunteers,
SUM(case when u.is_profile_public=1 AND uo.user_id is not null then 1 else 0 end) as amount_public_volunteers_admin,
SUM(case when u.is_profile_public=0 AND uo.user_id is not null then 1 else 0 end) as amount_private_volunteers_admin
FROM calendar AS c
LEFT OUTER JOIN user AS u ON c.datefield = DATE(u.created)
LEFT JOIN (select max(organisation_id), user_id from user_organisation group by user_id) AS uo on uo.user_id=u.id
WHERE u.id IN (87, 89, 172, 185, 186, 341, 342, 343, 344, 443, 444, 445,
446, 455, 459, 463, 20, 94, 61, 100, 101, 102, 109, 112,
113, 115, 132, 166, 184, 198, 199, 203, 205, 206, 207, 271,
272, 273, 274, 275, 276, 277, 280, 278, 279, 281, 284, 282,
283, 285, 288, 286, 287, 289, 292, 290, 291, 293, 294, 295,
302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313,
318, 314, 316, 315, 319, 317, 324, 325, 326, 328, 330, 332,
340, 358, 369, 383, 384, 391, 395, 396, 397, 398, 405, 399,
406, 400, 409, 401) AND (c.datefield BETWEEN (SELECT MIN(DATE(created)) FROM user) AND DATE(NOW()))
GROUP BY c.datefield
This shows me only the dates on which the users have been created. But it does not give me any rows back on the dates where no users were created.
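One likely cause, offered as an assumption since the thread shows no answer: the `u.id IN (...)` test sits in the WHERE clause, and for calendar dates with no matching user, `u.id` is NULL, so those rows are filtered out and the LEFT JOIN behaves like an inner join. Moving the user filter into the ON clause keeps every calendar date. A minimal sqlite3 sketch with a made-up three-day calendar and the user_organisation join left out:

```python
import sqlite3

# Hypothetical data; user 90 is created on 2021-01-02 but is not in the id list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (datefield TEXT PRIMARY KEY);
    CREATE TABLE user (id INTEGER PRIMARY KEY, created TEXT, is_profile_public INTEGER);
    INSERT INTO calendar VALUES ('2021-01-01'), ('2021-01-02'), ('2021-01-03');
    INSERT INTO user VALUES (87, '2021-01-01 10:00:00', 1),
                            (89, '2021-01-03 09:30:00', 0),
                            (90, '2021-01-02 12:00:00', 1);
""")

PUBLIC = "SUM(CASE WHEN u.is_profile_public = 1 THEN 1 ELSE 0 END)"

# Filter in WHERE: dates whose joined user is NULL or unlisted are discarded.
in_where = conn.execute(f"""
    SELECT c.datefield, {PUBLIC}
    FROM calendar AS c
    LEFT JOIN user AS u ON c.datefield = DATE(u.created)
    WHERE u.id IN (87, 89)
    GROUP BY c.datefield ORDER BY c.datefield
""").fetchall()

# Filter in ON: unmatched dates survive with NULL user columns, which the CASE turns into 0.
in_on = conn.execute(f"""
    SELECT c.datefield, {PUBLIC}
    FROM calendar AS c
    LEFT JOIN user AS u ON c.datefield = DATE(u.created) AND u.id IN (87, 89)
    GROUP BY c.datefield ORDER BY c.datefield
""").fetchall()

print(in_where)  # [('2021-01-01', 1), ('2021-01-03', 0)]
print(in_on)     # [('2021-01-01', 1), ('2021-01-02', 0), ('2021-01-03', 0)]
```

Conditions on the calendar table itself, such as the BETWEEN on `c.datefield`, can stay in the WHERE clause, because they never see a NULL from the outer join.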

Optimising query with large WHERE IN and Date clause

I have a query similar to:
SELECT
ANY_VALUE(name) AS `name`,
100 * SUM(score) / SUM(sum(score)) OVER (PARTITION BY date(scores.created_at)) AS `average_score`,
ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10……)
AND date(`scores`.`created_at`) >= '2020-01-01'
GROUP BY
`name`,
date(scores.created_at)
ORDER BY
`shift_date` ASC;
The WHERE ... IN list can contain up to 2000 IDs, which may not be sequential, and the created_at condition can go back as far as 14 months. Currently, at those levels, the execution time is 10-20 seconds.
I'm trying to optimise this. I've tried adding an index on created_at on the scores table, but that had no effect. I also tried changing the date condition in the WHERE clause to:
AND `scores`.`created_at` >= '2020-01-01 00:00:00'
which again made no difference.
Having read up on the topic, I saw some recommendations to create a temporary table, but I can't see how this would have any benefit. I'm also not sure how to do this in a single query (is that even possible?).
The indexes on the scores table are: shift_id, employee_id, and (name, created_at) (used for another query). As I said, a created_at index didn't help this one.
The shifts table has indexes on table_c_id and created_at.
Some sites suggest using WITH and CTEs, but again, I'm not sure how this would work or whether the performance would actually improve.
The schema for scores and shifts is:
DROP TABLE IF EXISTS `scores`;
CREATE TABLE `scores` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`shift_id` int unsigned NOT NULL,
`hash` varchar(40) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`score` double(8,2) unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `scores_hash_index` (`hash`) USING BTREE,
KEY `scores_shift_id_index` (`shift_id`) USING BTREE,
KEY `scores_name_created_at_index` (`name`,`created_at`)
) ENGINE=InnoDB AUTO_INCREMENT=3140922 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
DROP TABLE IF EXISTS `shifts`;
CREATE TABLE `shifts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`table_c_id` int unsigned NOT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `shifts_table_c_id_index` (`table_c_id`),
KEY `shifts_created_at_index` (`created_at`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=536392 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Update
Using a lookup table for names:
names table: `id` int unsigned PRIMARY KEY, `name` varchar
SELECT
names.name AS `name`,
100 * SUM(score) / SUM(sum(score)) OVER (PARTITION BY date(scores.created_at)) AS `average_score`,
ANY_VALUE(DATE_FORMAT(scores.created_at, "%Y-%m-%d")) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
INNER JOIN `names` ON `names`.id = `scores`.`name_id`
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 
504, 505, 506)
AND `scores`.`created_at` >= '2019-04-03'
GROUP BY
`names`.`name`,
date(scores.created_at)
ORDER BY
`shift_date` ASC;
This has given no benefit. An index on the scores table on (shift_id, name_id, created_at) hasn't helped either.

Plan A: Avoid windowing functions (many of them are slower than one would think).
SELECT
ANY_VALUE(brand_name),
100 * SUM(score) / init.tot AS `average_score`,
DATE(scores.created_at) AS `shift_date`
FROM
`scores`
INNER JOIN `shifts` ON `shifts`.`id` = `scores`.`shift_id`
JOIN ( SELECT SUM(scores.score) AS tot
FROM scores
INNER JOIN shifts ON shifts.id = scores.shift_id
WHERE shifts.table_c_id IN (...)
AND scores.created_at >= '2020-01-01' ) AS init
WHERE
`shifts`.`table_c_id` in(1, 2, 3, 4, 5, 6, 7, 8, 9, 10……)
AND `scores`.`created_at` >= '2020-01-01'
GROUP BY
shift_date,
`brand_name`
ORDER BY
`shift_date` ASC;
Notes:
Several syntax changes, notably to the date expressions.
I assumed that name and brand_name were the same.
By flipping the GROUP BY order, it may avoid a second sort.
I used a derived table to compute the grand total, thereby obviating the need for OVER.
This composite index on scores may help:
INDEX(created_at, shift_id)
Plan B: Use a CTE to compute SUM(score), then finish the query.
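One point to check before adopting Plan A: the original query divides each name's daily sum by that day's total (`PARTITION BY date(scores.created_at)`), while the `init` derived table computes one grand total across all dates, so the percentages are not the same. If per-day percentages are what's wanted, a hedged option that still avoids OVER is to group the derived table by day and join it back on the day. A sketch with sqlite3 and made-up rows; the shifts join and the table_c_id filter are left out for brevity:

```python
import sqlite3

# Hypothetical scores rows; the shifts table is omitted in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (id INTEGER PRIMARY KEY, name TEXT, score REAL, created_at TEXT);
    INSERT INTO scores (name, score, created_at) VALUES
        ('alice', 30, '2020-01-01 08:00:00'),
        ('bob',   10, '2020-01-01 09:00:00'),
        ('alice', 20, '2020-01-02 08:00:00'),
        ('bob',   20, '2020-01-02 17:00:00');
""")

# Per-day totals are computed once in a derived table and joined back on the day,
# mirroring SUM(SUM(score)) OVER (PARTITION BY date(created_at)).
rows = conn.execute("""
    SELECT DATE(s.created_at) AS shift_date,
           s.name,
           100 * SUM(s.score) / day_totals.tot AS average_score
    FROM scores AS s
    JOIN (SELECT DATE(created_at) AS d, SUM(score) AS tot
          FROM scores
          WHERE created_at >= '2020-01-01'
          GROUP BY DATE(created_at)) AS day_totals
      ON day_totals.d = DATE(s.created_at)
    WHERE s.created_at >= '2020-01-01'
    GROUP BY shift_date, s.name, day_totals.tot
    ORDER BY shift_date, s.name
""").fetchall()

for row in rows:
    print(row)
```

With these rows each day totals 40, so alice's day-1 share is 75; divided by the two-day grand total of 80 it would be 37.5 instead.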

MySQL query SELECT inside IN

I'm new to SQL. I have a query (complex, for me) that picks only the single cheapest line from each cityId to each cityId:
SELECT cityIdFrom, cityIdTo, MIN(fromToPrice) fromToPrice_min
FROM pricing
WHERE cityIdFrom IN (91, 94, 95, 99)
AND cityIdTo IN (91, 94, 95, 99)
GROUP BY cityIdFrom, cityIdTo
So we take the cheapest from 91 to 94, the cheapest from 91 to 95, etc. How can I change this query to SELECT lines for cityIdFrom 91 only when the column Bull IS 1?
For example, if we have:
cityIdFrom - cityIdTo - fromToPrice - Bull
91, 94, 3000, 0
91, 94, 5000, 1
91, 95, 1000, 0
91, 99, 1500, 1
99, 95, 2000, 0
Our query will give us:
91, 94, 5000, 1
91, 99, 1500, 1
99, 95, 2000, 0
Thanks for the help!

It all came down to one line in my query (with outer parentheses, so the OR does not bypass the other WHERE conditions):
AND ((cityIdFrom = 91 AND Bull = 1) OR cityIdFrom <> 91)
Many thanks to #ADyson, he helped me so much!
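Without an outer pair of parentheses around the whole condition, a line like this hits the same AND/OR precedence issue as the NOT IN question: the trailing `OR cityIdFrom <> 91` escapes both IN filters. A sketch with sqlite3, using the question's sample rows plus one hypothetical row whose cityIdFrom (50) is outside the IN lists:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pricing (cityIdFrom INTEGER, cityIdTo INTEGER, "
             "fromToPrice INTEGER, Bull INTEGER)")
conn.executemany(
    "INSERT INTO pricing VALUES (?, ?, ?, ?)",
    [(91, 94, 3000, 0), (91, 94, 5000, 1), (91, 95, 1000, 0),
     (91, 99, 1500, 1), (99, 95, 2000, 0),
     (50, 94, 100, 0)],  # hypothetical extra row outside the IN lists
)


def cheapest(bull_filter):
    """Cheapest fromToPrice per (cityIdFrom, cityIdTo) with an extra Bull condition."""
    return conn.execute(f"""
        SELECT cityIdFrom, cityIdTo, MIN(fromToPrice) AS fromToPrice_min
        FROM pricing
        WHERE cityIdFrom IN (91, 94, 95, 99)
          AND cityIdTo IN (91, 94, 95, 99)
          {bull_filter}
        GROUP BY cityIdFrom, cityIdTo
        ORDER BY cityIdFrom, cityIdTo
    """).fetchall()


# Without outer parentheses the trailing OR escapes both IN filters.
loose = cheapest("AND (cityIdFrom = 91 AND Bull = 1) OR cityIdFrom <> 91")
# With them, the whole condition is ANDed with the IN filters.
tight = cheapest("AND ((cityIdFrom = 91 AND Bull = 1) OR cityIdFrom <> 91)")

print(loose)  # [(50, 94, 100), (91, 94, 5000), (91, 99, 1500), (99, 95, 2000)]
print(tight)  # [(91, 94, 5000), (91, 99, 1500), (99, 95, 2000)]
```

The parenthesized version reproduces the expected output from the question exactly.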

MySQL - group_concat pulling in additional incorrect data

I'm having trouble with a JOIN and a GROUP_CONCAT. The query is concatenating additional data that should not be associated with the join.
Here's my table structure:
linkages
ID table_name tag_id
1 subcategories 6
2 categories 9
music
ID artwork
1 5
2 4
artwork
ID url_path
1 /some/file/path
2 /some/file/path
And here's my query:
SELECT music.*,
artwork.url_path AS artwork_url_path,
GROUP_CONCAT( linkages.tag_id ) AS tag_ids,
GROUP_CONCAT( linkages.table_name ) AS table_name
FROM music
LEFT JOIN artwork ON artwork.id = music.artwork
LEFT JOIN linkages ON music.id = linkages.track_id
WHERE music.id IN( '1356',
'1357',
'719',
'169',
'170',
'171',
'805' )
ORDER BY FIELD( music.id,
1356,
1357,
719,
169,
170,
171,
805 )
This is the result of the GROUP_CONCAT :
[tag_ids] => 3, 6, 9, 17, 19, 20, 26, 49, 63, 64, 53, 57, 63, 65, 67, 73, 79, 80, 85, 96, 98, 11, 53, 67, 3, 6, 15, 17, 26, 38, 50, 63, 74, 53, 56, 57, 62, 63, 65, 66, 67, 72, 85, 88, 98, 24, 69, 71, 3, 6, 15, 17, 26, 38, 50
The first portion of the result is correct:
[tag_ids] => 3, 6, 9, 17, 19, 20, 26, 49, 63, 64, 53, 57, 63, 65, 67, 73, 79, 80, 85, 96, 98, 11, 53, 67
Everything after the correct values seems random: most of those values don't exist in the database for these tracks, but the query still pulls them in. It seems to repeat a portion of the correct result (3, 6, 15, 17: the 3, 6 and 17 are correct, but 15 shouldn't be there), and the same happens with a bunch of other numbers (71, etc.). I can't use DISTINCT because I need to match up the tag_ids and table_name results as a multidimensional array from the results.
Any thoughts as to why?
UPDATE:
I ended up solving it with the initial push from Gordon. It needed a GROUP BY clause; otherwise it was putting every result's tag IDs into each result. The final query ended up as this:
SET SESSION group_concat_max_len = 1000000;
SELECT
music.*,
artwork.url_path as artwork_url_path,
GROUP_CONCAT(linkages.tag_id, ':', linkages.table_name) as tags
FROM music
LEFT JOIN artwork ON artwork.id = music.artwork
LEFT JOIN linkages ON music.id = linkages.track_id
WHERE music.id IN('1356', '1357', '719', '169', '170', '171', '805')
GROUP BY music.id
ORDER BY FIELD(music.id,1356,1357,719,169,170,171,805);

Your join is generating duplicate rows. I would suggest that you fix the root cause of the problem, but a quick-and-dirty solution is to use GROUP_CONCAT(DISTINCT):
GROUP_CONCAT(DISTINCT linkages.tag_id) as tag_ids,
GROUP_CONCAT(DISTINCT linkages.table_name) as table_name
You can put the columns in a single field using GROUP_CONCAT():
GROUP_CONCAT(DISTINCT linkages.tag_id, ':', linkages.table_name) as tags
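To see why the missing GROUP BY mattered: an aggregate such as GROUP_CONCAT with no GROUP BY treats the whole joined result as a single group, so every track's tags end up in one list. A sketch with sqlite3, whose group_concat is a close but not identical stand-in (it takes a single value expression plus an optional separator, so the tag_id:table_name pair is built with `||` instead of MySQL's multi-argument form). The rows are made up:

```python
import sqlite3

# Hypothetical rows for two tracks.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE music (id INTEGER PRIMARY KEY);
    CREATE TABLE linkages (id INTEGER PRIMARY KEY, track_id INTEGER,
                           table_name TEXT, tag_id INTEGER);
    INSERT INTO music VALUES (169), (170);
    INSERT INTO linkages (track_id, table_name, tag_id) VALUES
        (169, 'subcategories', 6), (169, 'categories', 9), (170, 'categories', 3);
""")

# SQLite's group_concat takes one value expression, so the pair is built with ||.
PAIR = "linkages.tag_id || ':' || linkages.table_name"

# No GROUP BY: every joined row falls into one group, so all tags share one list.
no_group = conn.execute(f"""
    SELECT group_concat({PAIR})
    FROM music LEFT JOIN linkages ON music.id = linkages.track_id
""").fetchall()

# GROUP BY music.id: one row per track, holding only that track's tags.
grouped = conn.execute(f"""
    SELECT music.id, group_concat({PAIR})
    FROM music LEFT JOIN linkages ON music.id = linkages.track_id
    GROUP BY music.id ORDER BY music.id
""").fetchall()

print(no_group)  # a single row holding all three tags (order not guaranteed)
print(grouped)   # track 169 with tags 6 and 9, then (170, '3:categories')
```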

Getting top distinct records in MySQL

This is probably something very simple, so forgive my blonde moment :)
I have a table 'album'
* albumId
* albumOwnerId (who created)
* albumCSD (create stamp date)
Now what I am trying to do is to select the top 10 most recently updated albums. But I don't want 10 albums from the same person coming back; I only want one album per unique person, i.e. 10 albums from 10 different people.
So, this is what I have below, but it is not working properly and I just can't figure out why. Any ideas?
Thanks
SELECT DISTINCT(albumOwnerId), albumId
FROM album
ORDER BY albumCSD DESC
LIMIT 0,10
Here is some example data, followed by what I am trying to get. Hope this makes it clearer.
DATA:
albumOwnerID, albumId, albumCSD
18, 194, '2010-10-23 11:02:30'
23, 193, '2010-10-22 11:39:59'
22, 192, '2010-10-12 21:48:16'
21, 181, '2010-10-12 20:34:11'
21, 178, '2010-10-12 20:20:16'
19, 168, '2010-10-12 18:31:55'
18, 167, '2010-10-11 21:06:55'
20, 166, '2010-10-11 21:01:47'
18, 165, '2010-10-11 21:00:32'
20, 164, '2010-10-11 20:50:06'
17, 145, '2010-10-10 18:54:24'
17, 144, '2010-10-10 18:49:28'
17, 143, '2010-10-10 18:48:08'
17, 142, '2010-10-10 18:46:54'
16, 130, '2010-10-10 16:17:57'
16, 129, '2010-10-10 16:17:26'
16, 128, '2010-10-10 16:07:21'
15, 119, '2010-10-10 15:24:28'
15, 118, '2010-10-10 15:24:11'
14, 100, '2010-10-09 18:22:49'
14, 99, '2010-10-09 18:18:46'
11, 98, '2010-10-09 15:50:13'
11, 97, '2010-10-09 15:44:09'
11, 96, '2010-10-09 15:42:28'
11, 95, '2010-10-09 15:37:25'
DESIRED DATA:
18, 194, '2010-10-23 11:02:30'
23, 193, '2010-10-22 11:39:59'
22, 192, '2010-10-12 21:48:16'
21, 181, '2010-10-12 20:34:11'
19, 168, '2010-10-12 18:31:55'
17, 145, '2010-10-10 18:54:24'
16, 130, '2010-10-10 16:17:57'
15, 119, '2010-10-10 15:24:28'
14, 100, '2010-10-09 18:22:49'
11, 98, '2010-10-09 15:50:13'

I get the results you want with this query:
SELECT albumOwnerID, albumId, albumCSD
FROM album
WHERE albumCSD in
(SELECT Max(album.albumCSD) AS MaxvonalbumCSD
FROM album
GROUP BY album.albumOwnerID);
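A caveat about `albumCSD IN (...)`: the subquery returns bare timestamps, so an album whose timestamp equals some other owner's latest one would also match, and nothing limits the result to ten rows. A hedged alternative (a swapped-in correlated subquery, not the answer's exact query) ties each timestamp to its own owner and adds ORDER BY/LIMIT. Here it runs with sqlite3 on the question's sample data, storing albumCSD as ISO text so that string comparison follows time order:

```python
import sqlite3

# (albumOwnerId, albumId, albumCSD) from the question's sample data.
ALBUMS = [
    (18, 194, '2010-10-23 11:02:30'), (23, 193, '2010-10-22 11:39:59'),
    (22, 192, '2010-10-12 21:48:16'), (21, 181, '2010-10-12 20:34:11'),
    (21, 178, '2010-10-12 20:20:16'), (19, 168, '2010-10-12 18:31:55'),
    (18, 167, '2010-10-11 21:06:55'), (20, 166, '2010-10-11 21:01:47'),
    (18, 165, '2010-10-11 21:00:32'), (20, 164, '2010-10-11 20:50:06'),
    (17, 145, '2010-10-10 18:54:24'), (17, 144, '2010-10-10 18:49:28'),
    (17, 143, '2010-10-10 18:48:08'), (17, 142, '2010-10-10 18:46:54'),
    (16, 130, '2010-10-10 16:17:57'), (16, 129, '2010-10-10 16:17:26'),
    (16, 128, '2010-10-10 16:07:21'), (15, 119, '2010-10-10 15:24:28'),
    (15, 118, '2010-10-10 15:24:11'), (14, 100, '2010-10-09 18:22:49'),
    (14, 99, '2010-10-09 18:18:46'), (11, 98, '2010-10-09 15:50:13'),
    (11, 97, '2010-10-09 15:44:09'), (11, 96, '2010-10-09 15:42:28'),
    (11, 95, '2010-10-09 15:37:25'),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (albumOwnerId INTEGER, albumId INTEGER PRIMARY KEY, albumCSD TEXT)")
conn.executemany("INSERT INTO album VALUES (?, ?, ?)", ALBUMS)

# Keep only each owner's newest album (correlated on the owner), then the 10 newest overall.
top10 = conn.execute("""
    SELECT a.albumOwnerId, a.albumId, a.albumCSD
    FROM album AS a
    WHERE a.albumCSD = (SELECT MAX(b.albumCSD) FROM album AS b
                        WHERE b.albumOwnerId = a.albumOwnerId)
    ORDER BY a.albumCSD DESC
    LIMIT 10
""").fetchall()

for row in top10:
    print(row)
```

On this sample data the top ten includes owner 20's album 166 and leaves out owner 11, because album 166 (2010-10-11) is newer than album 145 and the rows after it; without LIMIT 10, all eleven owners come back.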
(However, I built this in MS Access.)

select albumOwnerID, albumID
from album
Group by albumOwnerID, albumID
Order by albumcsd desc
LIMIT 0,10
EDIT:
select albumOwnerID, albumID
from album
where albumOwnerID in (select distinct albumOwnerID from album order by albumCSD )
LIMIT 0,10
