SQL NOT IN [list of ids] (performance) - mysql

I'm just wondering if the number of IDs in a list will influence query performance.
Example query:
SELECT * FROM foos WHERE foos.ID NOT IN (2, 4, 5, 6, 7)
Where (2, 4, 5, 6, 7) is an indefinitely long list.
And how many is too many (in context of order)?
UPDATE: The reason I'm asking is that I have two databases. One of them (read-only) is the source of items, and the other contains items that have been processed by an operator. Every time the operator asks for a new item from the read-only database, I want to exclude the items that have already been processed.

Yes, the number of IDs in the list will affect performance. A network packet is only so big, for example, and the database has to parse the whole list and evaluate it as the logical equivalent of:
WHERE foos.ID <> 2
AND foos.ID <> 4
AND foos.ID <> 5
AND ...
You should consider other ways to let your query know about this set.
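One such way, sketched here under assumptions, is to load the IDs into a temporary table and anti-join against it. The table name `excluded_ids` is hypothetical, and the sketch assumes `foos.ID` is an integer column; whether it actually beats a long NOT IN list depends on your data and indexes, so measure before adopting it.

```sql
-- Hypothetical helper table holding the IDs to exclude.
CREATE TEMPORARY TABLE excluded_ids (
    id INT NOT NULL PRIMARY KEY
);

-- The list can be loaded in batches instead of one huge statement.
INSERT INTO excluded_ids (id) VALUES (2), (4), (5), (6), (7);

-- Anti-join: keep only the rows of foos that have no match in excluded_ids.
SELECT f.*
FROM foos AS f
LEFT JOIN excluded_ids AS e ON e.id = f.ID
WHERE e.id IS NULL;
```

The primary key on `excluded_ids.id` gives the join an index to probe, and the temporary table disappears when the session ends.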

Here is a wacky rewrite of that query that might perform a little better:
SELECT * FROM foos
LEFT JOIN
(
SELECT 2 id UNION
SELECT 4 UNION
SELECT 5 UNION
SELECT 6 UNION
SELECT 7
) NOT_IDS
USING (id) WHERE NOT_IDS.id IS NULL;
The NOT_IDS subquery does work as shown by the following:
mysql> SELECT * FROM
-> (
-> SELECT 2 id UNION
-> SELECT 4 UNION
-> SELECT 5 UNION
-> SELECT 6 UNION
-> SELECT 7
-> ) NOT_IDS;
+----+
| id |
+----+
|  2 |
|  4 |
|  5 |
|  6 |
|  7 |
+----+
5 rows in set (0.00 sec)
mysql>

Just for fun, and given your update, I'm going to suggest a different strategy:
You could join across tables like so ...
insert into db1.foos (cols)
select cols
from db2.foos src
left join db1.foos dst
on src.pk = dst.pk
where dst.pk is null -- no match in dst, so the row has not been copied yet
I'm not sure how the optimizer will handle this or if it's going to be faster (depends on your indexing strategy, I guess) than what you're doing.

Are the databases on the same server? If so, you can write a multi-database query with a left join and keep only the rows where the joined side is NULL (here is an example: Querying multiple databases at once). Otherwise, you can write a stored procedure, pass the IDs in as a string, and split them inside it with a regular expression. I have a similar problem, but between an in-memory database and a Postgres database. Luckily my situation is (In...)

Related

Union as sub query using MySQL 8

I want to optimize a query using a union as a subquery.
I'm not really sure how to construct the query, though.
I'm using MySQL 8.0.12.
Here is the original query:
---------------
| c1 | c2 |
---------------
| 18182 | 0 |
| 18015 | 0 |
---------------
2 rows in set (0.35 sec)
I'm sorry, but the question cannot be saved if I paste the SQL query as text and format it using Ctrl+K.
Output expected
---------------
| c1 | c2 |
---------------
| 18182 | 167 |
| 18015 | 0 |
---------------
As output, I would like to have the difference in the number of rows between the two tables in the UNION ALL.
I processed this question using the wizard https://stackoverflow.com/questions/ask
Since a parenthesized SELECT can be used almost anywhere an expression can go:
SELECT
ABS( (SELECT COUNT(*) FROM tbl_aaa) -
(SELECT COUNT(*) FROM tbl_bbb) ) AS diff;
Also, MySQL is happy to allow a SELECT without a FROM.
There are several ways to go for this, including UNION, but I wouldn't recommend it, as it is IMO a bit 'hacky'. Instead, I suggest you use subqueries or use CTEs.
With subqueries
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM (
SELECT
COUNT(*) as size
FROM tbl_aaa
) c_tbl_aaa
CROSS JOIN (
SELECT
COUNT(*) as size
FROM tbl_bbb
) c_tbl_bbb
With CTEs, also known as WITHs
WITH c_tbl_aaa AS (
SELECT
COUNT(*) as size
FROM tbl_aaa
), c_tbl_bbb AS (
SELECT
COUNT(*) as size
FROM tbl_bbb
)
SELECT
ABS(c_tbl_aaa.size - c_tbl_bbb.size) as diff
FROM c_tbl_aaa
CROSS JOIN c_tbl_bbb
In a practical sense, they are the same. Depending on your needs, you might want to join the two results on a key instead of cross joining them; in that case, you could select a constant as a "pseudo id" in each subquery and join on it.
Since you only want to know the differences, I used the ABS function, which returns the absolute value of a number.
Let me know if you want a solution with UNIONs anyway.
Edit: As @Rick James pointed out, COUNT(*) should be used in the subqueries to count the number of rows, as COUNT(id_***) will only count the rows with non-null values in that field.
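For illustration, the "pseudo id" variant could look like the sketch below. The column name `pseudo_id` and the aliases `a` and `b` are made up for this example; the constant 1 only exists to give the two single-row results something to join on.

```sql
-- Each subquery returns exactly one row; the constant acts as a join key.
SELECT ABS(a.size - b.size) AS diff
FROM (
    SELECT 1 AS pseudo_id, COUNT(*) AS size
    FROM tbl_aaa
) AS a
JOIN (
    SELECT 1 AS pseudo_id, COUNT(*) AS size
    FROM tbl_bbb
) AS b
    ON a.pseudo_id = b.pseudo_id;
```

With one row on each side this returns the same value as the CROSS JOIN versions; the explicit key only becomes useful if each side later produces several rows, for example one count per group.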

How to extract strings occurring after a certain character in MySQL?

If I have a string:
'#name#user#user2#laugh#cry'
I would like to print:
name
user
user2
laugh
cry
All the strings are different and have a different number of '#'.
I have tried using Regex but it's not working. What logic has to be applied for this query?
The first thing to say is that storing delimited list of values in text columns is, in many ways, not a good database design. You should basically rework your database structure, or prepare for a potential world of pain.
A quick and dirty solution is to use a numbers table, or an inline subquery, and to cross join it with the table. REGEXP_SUBSTR() (available in MySQL 8.0) lets you select a given occurrence of a particular pattern.
Here is a query that will extract up to 10 values from the column:
SELECT
REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) name
FROM
mytable t
INNER JOIN (
SELECT 1 n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
) numbers
ON REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) IS NOT NULL
Regexp [^#]+ means: as many consecutive characters as possible other than #.
This demo on DB Fiddle, when given the input string '#name#user#user2#laugh#cry', returns:
| name |
| ----- |
| name |
| user |
| user2 |
| laugh |
| cry |
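If writing out the numbers by hand is a concern, a recursive common table expression can generate them instead. This is only a sketch of the same query, assuming MySQL 8.0 and the same `mytable.val` column; the upper bound of 10 is an arbitrary choice carried over from the query above.

```sql
-- Generate the integers 1..10 instead of spelling them out with UNION ALL.
WITH RECURSIVE numbers (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM numbers WHERE n < 10
)
SELECT
    REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) AS name
FROM mytable t
INNER JOIN numbers
    ON REGEXP_SUBSTR(t.val, '[^#]+', 1, numbers.n) IS NOT NULL;
```

Raising the bound only requires changing the `n < 10` condition, up to the server's recursion limit (the `cte_max_recursion_depth` system variable).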

Joining pre-defined, possibly non-existing keys with table data

In MySQL (or SQL in general), is it possible to generate a list of pre-defined identifiers, joined with matching table data?
Take for instance the following table data, let's call it my_table:
id | value
---+------
1 | 'a'
3 | 'c'
Now, I have a list of possible id values and would like to get a full list of these values, together with joined data from the table above. With a list [1, 2, 3, 4], the desired result is:
item | id | value
-----+------+------
1 | 1 | 'a'
2 | NULL | NULL
3 | 3 | 'c'
4 | NULL | NULL
Obviously, a query like SELECT * FROM my_table WHERE id IN (1, 2, 3, 4) yields only results for two rows (values 'a' and 'c').
For a solution, I am thinking along the line of some form of temporary table, fed with the full list of id's ([1, 2, 3, 4]) and left joining that with the table data, such as
SELECT t1.`item`, t2.`id`, t2.`value`
FROM
...
AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
But how do I do that?
Is this even possible? Or is it really necessary to compare the result with the initial list in external code? (This would be possible, but not trivial as in my case, the identifiers are not integers)
(The ultimate idea of this, is that I would like a result set from the DB with all input id's so that I can easily identify the non-existing records)
Update: I guess it boils down to the question: how can I get a result set such as
id
---
1
2
3
4
from a (My)SQL server without having this as data in some table, but from setting the data in some query?
A new approach flashed into my mind... using a union.
SELECT t1.`item`, t2.`id`, t2.`value`
FROM (
select 1 as `item`
union select 2
union select 3
union select 4
) AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
It answers the question, but it remains to be seen whether this is the 'best' answer. It works as long as the list of items is not too long (which is the case for me).
Does anyone have a better solution?
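One possible alternative, offered as a sketch rather than a verified answer: on recent MySQL releases (8.0.19 or later, as far as I know) the VALUES table value constructor can supply the list directly. The derived-table alias `t1 (item)` names the single column; check that your server version accepts this syntax before relying on it.

```sql
-- The list of identifiers is written inline as rows of a derived table.
SELECT t1.`item`, t2.`id`, t2.`value`
FROM (
    VALUES ROW(1), ROW(2), ROW(3), ROW(4)
) AS t1 (`item`)
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`;
```

Because the rows are plain literals, the same shape should also work for non-integer identifiers, for example `ROW('abc')`.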

Performance loss when using UNION ALL to add an arbitrary tuple

I wanted to add an arbitrary tuple to a SELECT result containing 1.8M rows. I decided to use the UNION operator like this:
SELECT
id as id
FROM
user
UNION
SELECT
-1 as id
Which returns:
+-----+
| id  |
+-----+
| -1  |
| 01  |
| 02  |
| ... |
+-----+
However, the performance loss between the queries with and without the UNION operator is tremendous. I tried using a UNION ALL statement like this:
SELECT
id as id
FROM
user
UNION ALL
SELECT
-1 as id
I thought the duplicate removal performed by UNION could have been the reason for the performance loss, but the slowdown is still there with UNION ALL.
Am I missing something? I simply want to add an extra arbitrary tuple to the SELECT result.
Using UNION (or UNION ALL) appears to result in a temporary table being created. See this bug for reference.
Creating a temporary table with 1.8M rows is likely the cause of your slowdown.
The good news is that MySQL 5.7.3 appears to change this behavior in some situations. See the last post in the linked bug report.
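To see whether a temporary table is involved on your own server, you can compare the plans of the two queries with EXPLAIN. This is a sketch of what to run, not a guaranteed result: the exact output depends on the MySQL version.

```sql
-- Deduplicating UNION: look for a "UNION RESULT" row in the plan,
-- usually with "Using temporary" in the Extra column.
EXPLAIN
SELECT id FROM `user`
UNION
SELECT -1;

-- UNION ALL: compare this plan with the one above.
EXPLAIN
SELECT id FROM `user`
UNION ALL
SELECT -1;
```

On 5.7.3 and later I would expect the second plan to lack the UNION RESULT row when the temporary table is skipped, but treat that as something to verify rather than a given.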
Try using UNION ALL instead of UNION, because UNION removes duplicates:
SELECT id as id
FROM user
UNION ALL
SELECT -1 as id
If -1 could be a valid value -- and you only want it to appear once -- then you can do:
SELECT id as id
FROM user
UNION ALL
SELECT u.id
FROM (SELECT -1 as id) u
WHERE NOT EXISTS (SELECT 1 FROM user WHERE user.id = u.id)

Fetching linked list in MySQL database

I have a MySQL database table with this structure:
table
id INT NOT NULL PRIMARY KEY
data ..
next_id INT NULL
I need to fetch the data in order of the linked list. For example, given this data:
id | next_id
----+---------
1 | 2
2 | 4
3 | 9
4 | 3
9 | NULL
I need to fetch the rows for id=1, 2, 4, 3, 9, in that order. How can I do this with a database query? (I can do it on the client end. I am curious if this can be done on the database side. Thus, saying it's impossible is okay (given enough proof)).
It would be nice to have a termination point as well (e.g. stop after 10 fetches, or when some condition on the row turns true) but this is not a requirement (can be done on client side). I (hope I) do not need to check for circular references.
Some brands of database (e.g. Oracle, Microsoft SQL Server) support extra SQL syntax to run "recursive queries". MySQL had no such feature before version 8.0, which added recursive common table expressions (WITH RECURSIVE).
The problem you are describing is the same as representing a tree structure in a SQL database. You just have a long, skinny tree.
There are several solutions for storing and fetching this kind of data structure from an RDBMS. See some of the following questions:
"What is the most efficient/elegant way to parse a flat table into a tree?"
"Is it possible to make a recursive SQL query ?"
Since you mention that you'd like to limit the "depth" returned by the query, you can achieve this while querying the list this way:
SELECT * FROM mytable t1
LEFT JOIN mytable t2 ON (t1.next_id = t2.id)
LEFT JOIN mytable t3 ON (t2.next_id = t3.id)
LEFT JOIN mytable t4 ON (t3.next_id = t4.id)
LEFT JOIN mytable t5 ON (t4.next_id = t5.id)
LEFT JOIN mytable t6 ON (t5.next_id = t6.id)
LEFT JOIN mytable t7 ON (t6.next_id = t7.id)
LEFT JOIN mytable t8 ON (t7.next_id = t8.id)
LEFT JOIN mytable t9 ON (t8.next_id = t9.id)
LEFT JOIN mytable t10 ON (t9.next_id = t10.id);
It'll perform like molasses, and the result will come back all on one row (per linked list), but you'll get the result.
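On MySQL 8.0 or later, a recursive common table expression can walk the list and return one row per node instead. The sketch below assumes the list starts at id = 1 and uses a made-up counter column, `hop`, both to preserve the order and to stop after 10 rows; it does not otherwise guard against circular references.

```sql
WITH RECURSIVE chain (id, next_id, hop) AS (
    -- Anchor: the head of the list (assumed to be id = 1).
    SELECT id, next_id, 1
    FROM mytable
    WHERE id = 1
    UNION ALL
    -- Step: follow next_id to the following node, at most 10 nodes in total.
    SELECT t.id, t.next_id, c.hop + 1
    FROM chain AS c
    JOIN mytable AS t ON t.id = c.next_id
    WHERE c.hop < 10
)
SELECT id, next_id
FROM chain
ORDER BY hop;
```

The recursion ends on its own when it reaches the row whose next_id is NULL, because the join then finds no further match.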
If what you are trying to avoid is having several queries (one for each node) and you are able to add columns, then you could have a new column that links to the root node. That way you can pull in all the data at once by the root id, but you will still have to sort the list (or tree) on the client side.
So in this example you would have:
id | next_id | root_id
----+---------+---------
1 | 2 | 1
2 | 4 | 1
3 | 9 | 1
4 | 3 | 1
9 | NULL | 1
Of course, the disadvantage of this compared with traditional linked lists or trees is that the root cannot change without O(n) writes, where n is the number of nodes, because you have to update the root id of every node. Fortunately, you should always be able to do this in a single UPDATE query unless you are dividing a list/tree in the middle.
This is less a solution and more of a workaround but, for a linear list (rather than the tree Bill Karwin mentioned), it might be more efficient to use a sort column on your list. For example:
CREATE TABLE `schema`.`my_table` (
`id` INT NOT NULL PRIMARY KEY,
`sort_order` INT,
data ..,
INDEX `ix_order` (`sort_order` ASC)
);
Then:
SELECT * FROM `schema`.`my_table` ORDER BY `sort_order`;
This has the disadvantage of slower inserts (you have to reposition all sorted elements past the insertion point) but should be fast for retrieval because the sort_order column is indexed.
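To make the insert cost concrete, here is a rough sketch of inserting a new element at position 3. It assumes the sort column is named `sort_order`, that the remaining columns accept defaults, and that the new row's id (42) is just a placeholder value.

```sql
START TRANSACTION;

-- Make room: shift every element at or after the insertion point.
UPDATE `schema`.`my_table`
SET `sort_order` = `sort_order` + 1
WHERE `sort_order` >= 3;

-- Insert the new element into the gap.
INSERT INTO `schema`.`my_table` (`id`, `sort_order`)
VALUES (42, 3);

COMMIT;
```

The UPDATE touches every row past the insertion point, which is where the O(n) insert cost comes from; leaving gaps between sort values (10, 20, 30, ...) is a common way to reduce how often that shift is needed.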