Performance loss when using UNION ALL to add an arbitrary tuple - mysql

I wanted to add an arbitrary tuple to a SELECT result containing 1.8M rows. I decided to use the UNION operator like this :
SELECT
id as id
FROM
user
UNION
SELECT
-1 as id
Which returns :
+---+
| id|
+---+
| -1|
+---+
| 01|
+---+
| 02|
+---+
|...|
+---+
However the performance loss between the queries with and without the UNION operator is tremendous. I tried using a UNION ALL statement like this :
SELECT
id as id
FROM
user
UNION ALL
SELECT
-1 as id
Which - I thought - could have been the reason behind the perf loss but the performance impairment is still there.
Am I missing something ? I simply want to add an extra arbitrary tuple to the SELECT result.

Using UNION (or UNION ALL) appears to result in a temporary table being created. See this bug for reference.
Creating a temporary table with 1.8M rows is likely the cause of your slowdown.
In good news, 5.7.3 appears to change this behavior in some situations. See last post in the linked bug report.

Try using union all instead of union. union removes duplicates:
SELECT id as id
FROM user
UNION ALL
SELECT -1 as id
If -1 could be a valid value -- and you only want it to appear once -- then you can do:
SELECT id as id
FROM user
UNION ALL
SELECT u.id
FROM (SELECT -1 as id) u
WHERE NOT EXISTS (SELECT 1 FROM user WHERE user.id = u.id)

Related

Joining pre-defined, possibly non-existing keys with table data

In MySQL (or SQL in general), is it possible to generate a list of pre-defined identifiers, joined with matching table data?
Take for instance the following table data, let's call it my_table:
id | value
---+------
1 | 'a'
3 | 'c'
Now, I have a list of possible id values and would like to get a full list of these values, together with joined data from the table above. With a list [1, 2, 3, 4], the desired result is:
item | id | value
-----+------+------
1 | 1 | 'a'
2 | NULL | NULL
3 | 3 | 'c'
4 | NULL | NULL
Obviously, a query like SELECT * FROM my_table WHERE id IN (1, 2, 3, 4) yields only results for two rows (values 'a' and 'c').
For a solution, I am thinking along the line of some form of temporary table, fed with the full list of id's ([1, 2, 3, 4]) and left joining that with the table data, such as
SELECT t1.`item`, t2.`id`, t2.`value`
FROM
...
AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
But how do I do that?
Is this even possible? Or is it really necessary to compare the result with the initial list in external code? (This would be possible, but not trivial as in my case, the identifiers are not integers)
(The ultimate idea of this, is that I would like a result set from the DB with all input id's so that I can easily identify the non-existing records)
Update: I guess it boils down to the question: how can I get a result set such as
id
---
1
2
3
4
from a (My)SQL server without having this as data in some table, but from setting the data in some query?
A new approach flashed into my mind... using a union.
SELECT t1.`item`, t2.`id`, t2.`value`
FROM (
select 1 as `item`
union select 2
union select 3
union select 4
) AS t1
LEFT JOIN `my_table` AS t2 ON t2.`id` = t1.`item`
It answers the question, but it remains to be seen whether this is the 'best' answer. It works as long as the list of items is not too long (which is the case for me).
Anyone a better solution?

Does mysql (InnoDB) systematically build table for subquery? When won't he build it? [duplicate]

I'm trying to find out if a row exists in a table. Using MySQL, is it better to do a query like this:
SELECT COUNT(*) AS total FROM table1 WHERE ...
and check to see if the total is non-zero or is it better to do a query like this:
SELECT * FROM table1 WHERE ... LIMIT 1
and check to see if any rows were returned?
In both queries, the WHERE clause uses an index.
You could also try EXISTS:
SELECT EXISTS(SELECT * FROM table1 WHERE ...)
and per the documentation, you can SELECT anything.
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference.
I have made some researches on this subject recently. The way to implement it has to be different if the field is a TEXT field, a non unique field.
I have made some tests with a TEXT field. Considering the fact that we have a table with 1M entries. 37 entries are equal to 'something':
SELECT * FROM test WHERE text LIKE '%something%' LIMIT 1 with
mysql_num_rows() : 0.039061069488525s. (FASTER)
SELECT count(*) as count FROM test WHERE text LIKE '%something% :
16.028197050095s.
SELECT EXISTS(SELECT 1 FROM test WHERE text LIKE '%something%') :
0.87045907974243s.
SELECT EXISTS(SELECT 1 FROM test WHERE text LIKE '%something%' LIMIT 1) : 0.044898986816406s.
But now, with a BIGINT PK field, only one entry is equal to '321321' :
SELECT * FROM test2 WHERE id ='321321' LIMIT 1 with
mysql_num_rows() : 0.0089840888977051s.
SELECT count(*) as count FROM test2 WHERE id ='321321' : 0.00033879280090332s.
SELECT EXISTS(SELECT 1 FROM test2 WHERE id ='321321') : 0.00023889541625977s.
SELECT EXISTS(SELECT 1 FROM test2 WHERE id ='321321' LIMIT 1) : 0.00020313262939453s. (FASTER)
A short example of #ChrisThompson's answer
Example:
mysql> SELECT * FROM table_1;
+----+--------+
| id | col1 |
+----+--------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+----+--------+
3 rows in set (0.00 sec)
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 1);
+--------------------------------------------+
| EXISTS(SELECT 1 FROM table_1 WHERE id = 1) |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 9);
+--------------------------------------------+
| EXISTS(SELECT 1 FROM table_1 WHERE id = 9) |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
1 row in set (0.00 sec)
Using an alias:
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 1) AS mycheck;
+---------+
| mycheck |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)
In my research, I can find the result getting on following speed.
select * from table where condition=value
(1 total, Query took 0.0052 sec)
select exists(select * from table where condition=value)
(1 total, Query took 0.0008 sec)
select count(*) from table where condition=value limit 1)
(1 total, Query took 0.0007 sec)
select exists(select * from table where condition=value limit 1)
(1 total, Query took 0.0006 sec)
I feel it is worth pointing out, although it was touched on in the comments, that in this situation:
SELECT 1 FROM my_table WHERE *indexed_condition* LIMIT 1
Is superior to:
SELECT * FROM my_table WHERE *indexed_condition* LIMIT 1
This is because the first query can be satisfied by the index, whereas the second requires a row look up (unless possibly all the table's columns are in the index used).
Adding the LIMIT clause allows the engine to stop after finding any row.
The first query should be comparable to:
SELECT EXISTS(SELECT * FROM my_table WHERE *indexed_condition*)
Which sends the same signals to the engine (1/* makes no difference here), but I'd still write the 1 to reinforce the habit when using EXISTS:
SELECT EXISTS(SELECT 1 FROM my_table WHERE *indexed_condition*)
It may make sense to add the EXISTS wrapping if you require an explicit return when no rows match.
Suggest you not to use Count because count always makes extra loads for db use SELECT 1 and it returns 1 if your record right there otherwise it returns null and you can handle it.
At times it is quite handy to get the auto increment primary key (id) of the row if it exists and 0 if it doesn't.
Here's how this can be done in a single query:
SELECT IFNULL(`id`, COUNT(*)) FROM WHERE ...
A COUNT query is faster, although maybe not noticeably, but as far as getting the desired result, both should be sufficient.
For non-InnoDB tables you could also use the information schema tables:
http://dev.mysql.com/doc/refman/5.1/en/tables-table.html
I'd go with COUNT(1). It is faster than COUNT(*) because COUNT(*) tests to see if at least one column in that row is != NULL. You don't need that, especially because you already have a condition in place (the WHERE clause). COUNT(1) instead tests the validity of 1, which is always valid and takes a lot less time to test.
Or you can insert raw sql part to conditions
so I have
'conditions'=>array('Member.id NOT IN (SELECT Membership.member_id FROM memberships AS Membership)')
COUNT(*) are optimized in MySQL, so the former query is likely to be faster, generally speaking.

Best way to check if a row matching a condition exists at least once [duplicate]

I'm trying to find out if a row exists in a table. Using MySQL, is it better to do a query like this:
SELECT COUNT(*) AS total FROM table1 WHERE ...
and check to see if the total is non-zero or is it better to do a query like this:
SELECT * FROM table1 WHERE ... LIMIT 1
and check to see if any rows were returned?
In both queries, the WHERE clause uses an index.
You could also try EXISTS:
SELECT EXISTS(SELECT * FROM table1 WHERE ...)
and per the documentation, you can SELECT anything.
Traditionally, an EXISTS subquery starts with SELECT *, but it could
begin with SELECT 5 or SELECT column1 or anything at all. MySQL
ignores the SELECT list in such a subquery, so it makes no difference.
I have made some researches on this subject recently. The way to implement it has to be different if the field is a TEXT field, a non unique field.
I have made some tests with a TEXT field. Considering the fact that we have a table with 1M entries. 37 entries are equal to 'something':
SELECT * FROM test WHERE text LIKE '%something%' LIMIT 1 with
mysql_num_rows() : 0.039061069488525s. (FASTER)
SELECT count(*) as count FROM test WHERE text LIKE '%something% :
16.028197050095s.
SELECT EXISTS(SELECT 1 FROM test WHERE text LIKE '%something%') :
0.87045907974243s.
SELECT EXISTS(SELECT 1 FROM test WHERE text LIKE '%something%' LIMIT 1) : 0.044898986816406s.
But now, with a BIGINT PK field, only one entry is equal to '321321' :
SELECT * FROM test2 WHERE id ='321321' LIMIT 1 with
mysql_num_rows() : 0.0089840888977051s.
SELECT count(*) as count FROM test2 WHERE id ='321321' : 0.00033879280090332s.
SELECT EXISTS(SELECT 1 FROM test2 WHERE id ='321321') : 0.00023889541625977s.
SELECT EXISTS(SELECT 1 FROM test2 WHERE id ='321321' LIMIT 1) : 0.00020313262939453s. (FASTER)
A short example of #ChrisThompson's answer
Example:
mysql> SELECT * FROM table_1;
+----+--------+
| id | col1 |
+----+--------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+----+--------+
3 rows in set (0.00 sec)
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 1);
+--------------------------------------------+
| EXISTS(SELECT 1 FROM table_1 WHERE id = 1) |
+--------------------------------------------+
| 1 |
+--------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 9);
+--------------------------------------------+
| EXISTS(SELECT 1 FROM table_1 WHERE id = 9) |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
1 row in set (0.00 sec)
Using an alias:
mysql> SELECT EXISTS(SELECT 1 FROM table_1 WHERE id = 1) AS mycheck;
+---------+
| mycheck |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)
In my research, I can find the result getting on following speed.
select * from table where condition=value
(1 total, Query took 0.0052 sec)
select exists(select * from table where condition=value)
(1 total, Query took 0.0008 sec)
select count(*) from table where condition=value limit 1)
(1 total, Query took 0.0007 sec)
select exists(select * from table where condition=value limit 1)
(1 total, Query took 0.0006 sec)
I feel it is worth pointing out, although it was touched on in the comments, that in this situation:
SELECT 1 FROM my_table WHERE *indexed_condition* LIMIT 1
Is superior to:
SELECT * FROM my_table WHERE *indexed_condition* LIMIT 1
This is because the first query can be satisfied by the index, whereas the second requires a row look up (unless possibly all the table's columns are in the index used).
Adding the LIMIT clause allows the engine to stop after finding any row.
The first query should be comparable to:
SELECT EXISTS(SELECT * FROM my_table WHERE *indexed_condition*)
Which sends the same signals to the engine (1/* makes no difference here), but I'd still write the 1 to reinforce the habit when using EXISTS:
SELECT EXISTS(SELECT 1 FROM my_table WHERE *indexed_condition*)
It may make sense to add the EXISTS wrapping if you require an explicit return when no rows match.
Suggest you not to use Count because count always makes extra loads for db use SELECT 1 and it returns 1 if your record right there otherwise it returns null and you can handle it.
At times it is quite handy to get the auto increment primary key (id) of the row if it exists and 0 if it doesn't.
Here's how this can be done in a single query:
SELECT IFNULL(`id`, COUNT(*)) FROM WHERE ...
A COUNT query is faster, although maybe not noticeably, but as far as getting the desired result, both should be sufficient.
For non-InnoDB tables you could also use the information schema tables:
http://dev.mysql.com/doc/refman/5.1/en/tables-table.html
I'd go with COUNT(1). It is faster than COUNT(*) because COUNT(*) tests to see if at least one column in that row is != NULL. You don't need that, especially because you already have a condition in place (the WHERE clause). COUNT(1) instead tests the validity of 1, which is always valid and takes a lot less time to test.
Or you can insert raw sql part to conditions
so I have
'conditions'=>array('Member.id NOT IN (SELECT Membership.member_id FROM memberships AS Membership)')
COUNT(*) are optimized in MySQL, so the former query is likely to be faster, generally speaking.

SQL NOT IN [list of ids] (performance)

I'm just wondering if the amount of id's in a list will influence query performance.
query example:
SELECT * FROM foos WHERE foos.ID NOT IN (2, 4, 5, 6, 7)
Where (2, 4, 5, 6, 7) is an indefinitely long list.
And how many is too many (in context of order)?
UPDATE: The reason why i'm asking it because i have two db. On of it (read-only) is the source of items and another one contain items that is processed by operator. Every time when operator asking for new item from read-only db I want to exclude item that is already processed.
Yes, the amount of IDs in the list will impact performance. A network packet is only so big, for example, and the database has to parse all that noise and turn it into a series of:
WHERE foo.ID <> 2
AND foo.ID <> 4
AND foo.ID <> 5
AND ...
You should consider other ways to let your query know about this set.
Here is wacky rewrite of that query that might perform a little better
SELECT * FROM foos
LEFT JOIN
(
SELECT 2 id UNION
SELECT 4 UNION
SELECT 5 UNION
SELECT 6 UNION
SELECT 7
) NOT_IDS
USING (id) WHERE NOT_IDS.id IS NULL;
The NOT_IDS subquery does work as shown by the following:
mysql> SELECT * FROM
-> (
-> SELECT 2 id UNION
-> SELECT 4 UNION
-> SELECT 5 UNION
-> SELECT 6 UNION
-> SELECT 7
-> ) NOT_IDS;
+----+
| id |
+----+
| 2 |
| 4 |
| 5 |
| 6 |
| 7 |
+----+
5 rows in set (0.00 sec)
mysql>
Just for fun, and given your update, I'm going to suggest a different strategy:
You could join across tables like so ...
insert into db1.foos (cols)
select cols
from db2.foos src
left join db1.foos dst
on src.pk = dst.pk
where dst.othercolumn is null
I'm not sure how the optimizer will handle this or if it's going to be faster (depends on your indexing strategy, I guess) than what you're doing.
The db's are in the same server? If yes you can make a multi-db query with a left join and take the null ones. (here an example: Querying multiple databases at once ) . Otherwise you can make a stored procedure, pass the id's with a string, and split them inside with a regular expression. I have a similar problem, but within an in-memory db and a postgres db. Luckly my situation is (In...)

mysql fake select

My purpose is: to get multiple rows from a value list,like (1,2,3,4,5),('a','b','c','anything') and so on.
mysql> select id from accounts where id in (1,2,3,4,5,6);
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 5 |
| 6 |
+----+
5 rows in set (0.00 sec)
The above sql is surely ok,but my question is:is there a way to get the same result without
specifying a table?Because my purpose here is just to propagate rows by an id_set
another example:
mysql> select now() as column1;
+---------------------+
| column1 |
+---------------------+
| 2009-06-01 20:59:33 |
+---------------------+
1 row in set (0.00 sec)
mysql>
This example propagated a single row result without specifying a table,
but how to propagate multiple rows from a string like (1,2,3,4,5,6)?
Something like this should work:
SELECT 0 as id
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
Afterwards, you can select what you need from it by giving it an alias:
SELECT *
FROM (
SELECT 0 as id
UNION SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
) `table1`
MySQL has a dummy table: DUAL. but using DUAL doesn't change anything (it's just for convenience), and certainly doesn't make this query work.
I'm sure there's a better way to achieve what you're trying to do. We might be able to help if you explain your problem.
This does not answer your question exactly, but I believe this will fix your actual problem..
SET #counter = 0;
SELECT (#counter := #counter + 1 as counter) ... rest of your query
A simple and old fashioned way is to use a table which holds consecutive values.
DROP TABLE IF EXISTS `range10`;
CREATE TABLE IF NOT EXISTS `range10` (
`id` int(11) NOT NULL,
KEY `id` (`id`)
) ENGINE=MyISAM;
INSERT INTO `range10` (`id`) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
Once installed you can write queries as shown below.
get every second row:
select * from your_data_table where id in (
SELECT id*2 as id FROM `range10` WHERE id in(
select id from `range10`
)
)
get rows from 1101 to 1111:
select * from your_data_table where id in (
SELECT id+1100 as id FROM `range10` WHERE id in(
select id from `range10`
)
)
So if you are in the need of greater ranges, then just increase the size of the consecutive values in table range10.
Querying is simple, cost are low, no stored procedure or function needed.
Note:
You can create a table with consecutive char values, too. But varying the contents would not be so easy.
One technique I've found invaluable is an "integers table", which lets you easily do all kinds of neat things including this one (xaprb has written several blog posts on this technique and the closely related "mutex table" one).
Here a way to create custom rows directly with MySQL request SELECT :
SELECT ALL *
FROM (
VALUES
ROW ('1.1', '1.2', '1.3'),
ROW ('2.1', '2.2', '2.3'),
ROW ('3.1', '3.2', '3.3')
) AS dummy (c1, c2, c3)
Gives us a table dummy:
c1 c2 c3
-------------
1.1 1.2 1.3
2.1 2.2 2.3
3.1 3.2 3.3