What is the order of assignment in mysql?

set @rownum := 0;
explain select actor_id, first_name, @rownum as rownum
from actor
where @rownum <= 1
order by first_name, LEAST(0, @rownum := @rownum + 1);

For anybody else looking: this query spits out 2 rows with 2 row numbers. For example, given
+-----------+-------+
| studentid | fname |
+-----------+-------+
|       101 | NULL  |
|       103 | NULL  |
|       112 | NULL  |
+-----------+-------+
3 rows in set (0.00 sec)
and the query produces
+-----------+-------+--------+
| studentid | fname | rownum |
+-----------+-------+--------+
|       101 | NULL  |      1 |
|       103 | NULL  |      2 |
+-----------+-------+--------+
2 rows in set (0.00 sec)
with an explain plan of
+------+-------------+---------+------+---------------+------+---------+------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------+------+---------------+------+---------+------+------+----------------------------------------------+
| 1 | SIMPLE | student | ALL | NULL | NULL | NULL | NULL | 8 | Using where; Using temporary; Using filesort |
+------+-------------+---------+------+---------------+------+---------+------+------+----------------------------------------------+
1 row in set (0.00 sec)
The explain plan is in line with my expectation that ORDER BY comes pretty much last (https://blog.jooq.org/2016/12/09/a-beginners-guide-to-the-true-order-of-sql-operations/),
but what appears to be happening is that, because the ORDER BY increments a variable that is used in the SELECT, another WHERE check is invoked in the ACTUAL order of events, which is not reflected in the explain plan (the plan, I think, reflects the LOGICAL order of events).
I don't know what the OP is trying to do here (or whether the result is incorrect), but this is NOT the way row numbers would usually be assigned prior to version 8.
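For reference, the usual pre-8.0 idiom puts the increment in the SELECT list and forces the ordering inside a derived table, so the assignment happens once per row after the sort. A minimal sketch against the same actor table (evaluation order of variable assignments within one statement is officially undefined, which is why 8.0 added window functions):
SET @rownum := 0;
SELECT actor_id, first_name, (@rownum := @rownum + 1) AS rownum
FROM (SELECT actor_id, first_name FROM actor ORDER BY first_name) sorted;
-- In MySQL 8.0+ the deterministic replacement is:
-- SELECT actor_id, first_name, ROW_NUMBER() OVER (ORDER BY first_name) AS rownum FROM actor;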

Related

How to make an efficient UPDATE like my SELECT in MariaDB

Background
I made a small table of 10 rows (SavedAnimals) from a previous SELECT that I had already run.
I have a massive table (animals) which I would like to UPDATE using the rows with the same id as each row in my new table.
What I have tried so far
I can quickly SELECT the desired rows from the big table like this:
mysql> EXPLAIN SELECT * FROM animals WHERE ignored=0 and id IN (SELECT animal_id FROM SavedAnimals);
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
| 1 | PRIMARY | <subquery2> | ALL | distinct_key | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | animals | eq_ref | PRIMARY | PRIMARY | 8 | db_staging.SavedAnimals.animal_id | 1 | Using where |
| 2 | MATERIALIZED | SavedAnimals | ALL | NULL | NULL | NULL | NULL | 10 | |
+------+--------------+-------------------------------+--------+---------------+---------+---------+----------------------------------------------------------+------+-------------+
But the "same" command on the UPDATE is not quick:
mysql> EXPLAIN UPDATE animals SET ignored=1, ignored_when=CURRENT_TIMESTAMP WHERE ignored=0 and id IN (SELECT animal_id FROM SavedAnimals);
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
| 1 | PRIMARY | animals | index | NULL | PRIMARY | 8 | NULL | 34269464 | Using where |
| 2 | DEPENDENT SUBQUERY | SavedAnimals | ALL | NULL | NULL | NULL | NULL | 10 | Using where |
+------+--------------------+-------------------------------+-------+---------------+---------+---------+------+----------+-------------+
2 rows in set (0.00 sec)
The UPDATE command never finishes if I run it.
QUESTION
How do I make MariaDB use the MATERIALIZED select_type on the UPDATE like it does on the SELECT?
OR
Is there a totally separate way that I should approach this which would be quick?
Notes
Version: 10.3.23-MariaDB-log
Use JOIN rather than WHERE...IN. MySQL tends to optimize them better.
UPDATE animals AS a
JOIN SavedAnimals AS sa ON a.id = sa.animal_id
SET a.ignored=1, a.ignored_when=CURRENT_TIMESTAMP
WHERE a.ignored = 0
You should find an EXISTS clause more efficient than an IN clause. For example:
UPDATE animals a
SET a.ignored = 1,
a.ignored_when = CURRENT_TIMESTAMP
WHERE a.ignored = 0
AND EXISTS (SELECT * FROM SavedAnimals sa WHERE sa.animal_id = a.id)
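Either way, you can sanity-check the rewrite before letting it run: MariaDB (like MySQL 5.6+) accepts EXPLAIN on an UPDATE, as the question itself shows, so you can confirm the plan now starts from the small table. For example, for the JOIN form:
EXPLAIN UPDATE animals a
JOIN SavedAnimals sa ON a.id = sa.animal_id
SET a.ignored = 1, a.ignored_when = CURRENT_TIMESTAMP
WHERE a.ignored = 0;
If the join is being optimized like the SELECT was, you should expect an eq_ref lookup into animals driven by the 10-row SavedAnimals scan, rather than an index scan over all 34 million rows.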

Why does the output of EXPLAIN change after each SHOW index?

I was trying to improve performance on some queries through indexes using EXPLAIN, and I noticed that each time I used SHOW index FROM TableB; the output of the rows column in the EXPLAIN of a query changed.
Ex:
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 10561 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
mysql> show index from TableB;
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| TableB | 0 | PRIMARY | 1 | id | A | 7 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 2 | address | A | 21 | NULL | NULL | | BTREE | |
| TableB | 0 | PRIMARY | 3 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 1 | address | A | 1 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 2 | code | A | 10402 | NULL | NULL | | BTREE | |
| TableB | 1 | test_index | 3 | id | A | 10402 | NULL | NULL | | BTREE | |
+-----------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.03 sec)
and...
mysql> EXPLAIN Select A.id
From TableA A
Inner join TableB B
On A.address = B.address And A.code = B.code Group by A.id
Having count(distinct B.id) = 1;
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | B | index | test_index | PRIMARY | 518 | NULL | 9800 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | A | eq_ref | PRIMARY | PRIMARY | 514 | db.B.address,db.B.code | 1 | |
+----+-------------+-------+--------+---------------+---------+---------+---------------------------------------+-------+----------------------------------------------+
2 rows in set (0.00 sec)
Why does this happen?
The rows column should be taken as a rough estimate only. It's not a precise number.
It's based on statistical estimates of how many rows will be examined during a query. The actual number of rows cannot be known until you actually execute the query.
The statistics are based on samples read from the table periodically. These samples are re-read occasionally, for example after you run ANALYZE TABLE or certain INFORMATION_SCHEMA queries, or certain SHOW statements.
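You can trigger that refresh deliberately and watch the estimate move, using the same tables from the question:
ANALYZE TABLE TableB;  -- re-samples the index statistics
EXPLAIN Select A.id
From TableA A
Inner join TableB B On A.address = B.address And A.code = B.code
Group by A.id
Having count(distinct B.id) = 1;
-- the rows column will likely show a slightly different estimate than before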
I don't find 20% variation in statistics to be a big deal. In many situations, think of the cost graph as an upturned parabola, where you only need to know which side of the minimum point you are on. In complex queries, where the Optimizer is likely to goof, it needs a lot more than simple stats, such as the Histograms of MariaDB 10.0 / 10.1. (I don't have enough experience with those to say whether they make much headway.)
Your particular query is probably going to be performed in only one way, regardless of the statistics. An example of a complicated query would be a JOIN with WHERE clauses filtering each table. The optimizer has to decide which table to start with. Another case is a single table with a WHERE and ORDER BY and they cannot both be handled by a single index -- should it use an index to filter, but then have to sort? or should it use an index for ORDER BY, but then have to filter on the fly?
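To make that last dilemma concrete, here is a hedged sketch (the table t and its indexes are hypothetical, not from the question):
-- suppose t has INDEX(a) and INDEX(b)
SELECT * FROM t
WHERE a = 5   -- INDEX(a) filters cheaply, but the matching rows must then be sorted
ORDER BY b    -- INDEX(b) returns rows pre-sorted, but a = 5 must be checked row by row
LIMIT 10;
-- Which plan is cheaper depends on how selective a = 5 is, which is exactly
-- what the sampled statistics are trying to estimate.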

How to improve Limit clause in MySQL

I have a posts table with 10k rows and I want to paginate it. So I have the following query for that purpose:
SELECT post_id
FROM posts
LIMIT 0, 10;
When I EXPLAIN that query, the rows column shows 9976. So I don't understand why MySQL needs to iterate through 9976 rows to find the first 10. I will be very thankful if somebody helps me optimize this query.
I also know about the topic MySQL ORDER BY / LIMIT performance: late row lookups, but the problem still exists even if I modify the query to the following:
SELECT t.post_id
FROM (
    SELECT post_id
    FROM posts
    ORDER BY post_id
    LIMIT 0, 10
) q
JOIN posts t
    ON q.post_id = t.post_id
Update
#pala_'s solution works great for the simple case above, but now I am testing a more complex query with an inner join. My purpose is to join the comments table with the posts table, and unfortunately when I EXPLAIN the new query it still iterates through 9976 rows.
Select comm.comment_id
from comments as comm
inner join (
    SELECT post_id
    FROM posts
    ORDER BY post_id
    LIMIT 0, 10
) as paged_post on comm.post_id = paged_post.post_id;
Do you have any idea what the reason for this MySQL behavior is?
Try this:
SELECT post_id
FROM posts
ORDER BY post_id DESC
LIMIT 0, 10;
Pagination via LIMIT doesn't make much sense without ordering anyway, and it should fix your problem.
mysql> explain select * from foo;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | index | NULL | PRIMARY | 4 | NULL | 20 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain select * from foo limit 0, 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | index | NULL | PRIMARY | 4 | NULL | 20 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain select * from foo order by id desc limit 0, 10;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | foo | index | NULL | PRIMARY | 4 | NULL | 10 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
Regarding your last comment about the comment join: do you have an index on comments(post_id)? With my test data I'm getting the following results:
mysql> alter table comments add index pi (post_id);
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> explain select c.id from comments c inner join (select id from posts o order by id limit 0, 10) p on c.post_id = p.id;
+----+-------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | PRIMARY | c | ref | pi | pi | 5 | p.id | 4 | Using where; Using index |
| 2 | DERIVED | o | index | NULL | PRIMARY | 4 | NULL | 10 | Using index |
+----+-------------+------------+-------+---------------+---------+---------+------+------+--------------------------+
and for table size reference:
mysql> select count(*) from posts;
+----------+
| count(*) |
+----------+
| 15021 |
+----------+
1 row in set (0.01 sec)
mysql> select count(*) from comments;
+----------+
| count(*) |
+----------+
| 1000 |
+----------+
1 row in set (0.00 sec)

mysql column count doesn't match value count (redux): yes it does. or does it?

Is MySQL giving me grief because the nested SELECT in the insert statement uses the COUNT(*) function instead of selecting an actual column? So, what's the workaround?
Here's the story:
mysql> explain test;
+----------+----------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+----------------------+------+-----+---------+-------+
| language | varchar(50) | YES | | NULL | |
| count | smallint(5) unsigned | YES | | NULL | |
+----------+----------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
mysql> SELECT languages.name, COUNT(*) AS `total` FROM languages JOIN events ON languages.id = events.language_id GROUP BY name HAVING total > 250 ORDER BY total DESC;
+-----------+-------+
| name | total |
+-----------+-------+
| Spanish | 60079 |
| Foochow | 2838 |
| Mandarin | 2396 |
| Russian | 1675 |
| Arabic | 1410 |
| Cantonese | 1358 |
| Korean | 736 |
| French | 531 |
| Punjabi | 426 |
| Urdu | 408 |
| Hebrew | 276 |
| Pashto | 255 |
+-----------+-------+
12 rows in set (0.00 sec)
mysql> INSERT INTO test (`language`,`count`) VALUES ((SELECT languages.`name`, COUNT(*) AS `total` FROM languages JOIN events ON languages.id = events.language_id GROUP BY name HAVING total > 250 ORDER BY total DESC));
ERROR 1136 (21S01): Column count doesn't match value count at row 1
thanks.
MySQL doesn't support this sort of multiple-column-returning subquery, so the error message you're seeing is because the VALUES clause only contains one subquery, which is perforce (in a sense) only one column.
To fix it, you can skip the VALUES syntax, and just write:
INSERT
INTO test (`language`,`count`)
SELECT languages.`name`, COUNT(*) AS `total`
FROM languages
JOIN events
ON languages.id = events.language_id
GROUP
BY name
HAVING total > 250
ORDER
BY total DESC
;
(See §13.2.5.1 "INSERT ... SELECT Syntax" in the MySQL 5.6 Reference Manual.)
INSERT INTO test (`language`,`count`)
should be
INSERT INTO test (language,count)

Category with lots of pages (huge offsets) (how does stackoverflow work?)

I think that my question can be solved by just knowing how, for example, stackoverflow works.
For example, this page loads in a few ms (< 300 ms):
https://stackoverflow.com/questions?page=61440&sort=newest
The only query I can think of for that page is something like SELECT * FROM stuff ORDER BY date DESC LIMIT {pageNumber}*{stuffPerPage}, {stuffPerPage}.
A query like that might take several seconds to run, but the Stack Overflow page loads almost in no time. It can't be a cached query, since questions are posted over time, and rebuilding the cache every time a question is posted would be simply madness.
So, how does this work, in your opinion?
(To make the question easier, let's forget about the ORDER BY.)
Example (the table is fully cached in RAM and stored on an SSD drive):
mysql> select * from thread limit 1000000, 1;
1 row in set (1.61 sec)
mysql> select * from thread limit 10000000, 1;
1 row in set (16.75 sec)
mysql> describe select * from thread limit 1000000, 1;
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
| 1 | SIMPLE | thread | ALL | NULL | NULL | NULL | NULL | 64801163 | |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------+
mysql> select * from thread ORDER BY thread_date DESC limit 1000000, 1;
1 row in set (1 min 37.56 sec)
mysql> SHOW INDEXES FROM thread;
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| thread | 0 | PRIMARY | 1 | newsgroup_id | A | 102924 | NULL | NULL | | BTREE | | |
| thread | 0 | PRIMARY | 2 | thread_id | A | 47036298 | NULL | NULL | | BTREE | | |
| thread | 0 | PRIMARY | 3 | postcount | A | 47036298 | NULL | NULL | | BTREE | | |
| thread | 0 | PRIMARY | 4 | thread_date | A | 47036298 | NULL | NULL | | BTREE | | |
| thread | 1 | date | 1 | thread_date | A | 47036298 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)
Create a BTREE index on date column and the query will run in a breeze.
CREATE INDEX date ON stuff(date) USING BTREE
UPDATE: Here is a test I just did:
CREATE TABLE test( d DATE, i INT, INDEX(d) );
I filled the table with 2,000,000 rows, with different unique i's and d's.
mysql> SELECT * FROM test LIMIT 1000000, 1;
+------------+---------+
| d | i |
+------------+---------+
| 1897-07-22 | 1000000 |
+------------+---------+
1 row in set (0.66 sec)
mysql> SELECT * FROM test ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d | i |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (1.68 sec)
And here is an interesting observation:
mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 1000, 1;
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
| 1 | SIMPLE | test | index | NULL | d | 4 | NULL | 1001 | |
+----+-------------+-------+-------+---------------+------+---------+------+------+-------+
mysql> EXPLAIN SELECT * FROM test ORDER BY d LIMIT 10000, 1;
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | test | ALL | NULL | NULL | NULL | NULL | 2000343 | Using filesort |
+----+-------------+-------+------+---------------+------+---------+------+---------+----------------+
MySQL does use the index for OFFSET 1000 but not for 10000.
Even more interestingly, if I do FORCE INDEX, the query takes more time:
mysql> SELECT * FROM test FORCE INDEX(d) ORDER BY d LIMIT 1000000, 1;
+------------+--------+
| d | i |
+------------+--------+
| 1897-07-22 | 999980 |
+------------+--------+
1 row in set (2.21 sec)
I think StackOverflow doesn't need to reach the rows at offset 10000000. The query below should be fast enough if you have an index on date and the numbers in the LIMIT clause come from real-world examples, not millions :)
SELECT *
FROM stuff
ORDER BY date DESC
LIMIT {pageNumber}*{stuffPerPage}, {stuffPerPage}
UPDATE:
If records in a table are relatively rarely deleted (like in StackOverflow) then you can use the following solution:
SELECT *
FROM stuff
WHERE id between
{stuffCount}-{pageNumber}*{stuffPerPage}+1 AND
{stuffCount}-{pageNumber-1}*{stuffPerPage}
ORDER BY id DESC
Where {stuffCount} is:
SELECT MAX(id) FROM stuff
If some records have been deleted from the database then some pages will have fewer than {stuffPerPage} records, but that should not be a problem. StackOverflow uses some inaccurate algorithms too: for instance, go to the first page and then to the last page, and you'll see that both pages claim to return 30 records per page, which mathematically is nonsense.
Solutions designed to work with large databases often use hacks which are usually unnoticeable to regular users.
Nowadays paging through millions of records is out of fashion, because it's impractical. Currently it's popular to use infinite scrolling (automatic, or manual with a button click). It makes more sense, and pages load faster because they don't need to be reloaded. If you think old records can be useful for your users too, then it's a good idea to create a page with random records (with infinite scrolling too). This was my opinion :)
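Infinite scrolling is usually backed by keyset pagination (the "seek method"): instead of an offset, the client passes the last id it has seen, and the server continues from there. A minimal sketch in the same placeholder style as above ({lastSeenId} is a new placeholder, not from the question), assuming an indexed id:
SELECT *
FROM stuff
WHERE id < {lastSeenId}
ORDER BY id DESC
LIMIT {stuffPerPage};
Because this seeks directly into the index instead of counting off an offset, it stays fast at any scroll depth, and deleted records are simply skipped without producing short pages.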