I have a MyISAM table with 28,900 entires. I'm processing it in chunks of 1500, which a query like this:
SELECT * FROM table WHERE id>0 LIMIT $iStart,1500
Then I loop over this and increment $iStart by 1500 each time.
The problem is that the queries are returning the same rows in some cases. For example the LIMIT 0,1500 query returns some of the same rows as the LIMIT 28500,1500 query does.
If I don't ORDER the rows, can I not expect to use LIMIT for pagination?
(The table is static while these queries are happening, no other queries are going on that would alter its rows).
Like pretty much every other SQL engine out there, MySQL MyISAM tables make no guarantees at all about the order in which rows are returned unless you specify an ORDER BY clause. Typically the order they are returned in will be the order they were read off the filesystem in, which can change from query to query depending on updates, deletes, and even the state of cached selects.
If you want to avoid having the same row returned more than once then you must order by something, the primary key being the most obvious candidate.
You should use ORDER BY to ensure a consistent order. Otherwise the DBMS may return rows in an arbitrary order (however, one would assume that it's arbitrary but consistent between queries if the no rows are modified).
Related
I'm running the next queries:
1. select * from page LIMIT 10;
2. select page.id from page LIMIT 10;
"id" is the primary key in this table. How it can be possible that MySql returns different rows?
I'm using MySql 8.19
In the absence of an order by clause, row order is arbitrary, but typically whatever is the most convenient for the database engine - often in the order rows are laid out on disk, or whatever rows are in cache memory, etc.
However, for queries that refer only to indexed columns, which would be the case for the id column query, databases usually perform an index only scan. Without an order by clause, such a scan would typically return rows in the order the index values are laid out on disk, not the order the rows are laid out on disk. Hence the difference.
In PostgreSQL doc says:
The query optimizer takes LIMIT into account when generating query plans, so you are very likely to get different plans (yielding different row orders) depending on what you give for LIMIT and OFFSET. Thus, using different LIMIT/OFFSET values to select different subsets of a query result will give inconsistent results unless you enforce a predictable result ordering with ORDER BY. This is not a bug; it is an inherent consequence of the fact that SQL does not promise to deliver the results of a query in any particular order unless ORDER BY is used to constrain the order.
But in MySQL InnoDB table, they result will have been delivered in PRIMARY KEY.
Why the query give inconsistent results? What happens in the query?
What the Postgres documentation and your observations are telling you is that records in SQL tables have no internal order. Instead, database tables are modeled after unordered sets of records. Hence, in the following query which could run on either MySQL or Postgres:
SELECT *
FROM yourTable
LIMIT 5
The database is free to return whichever 5 records it wants. In the case of MySQL, if you are seeing an ordering based on primary key, it is only by coincidence, and MySQL does not offer any such contract that this would always happen.
To resolve this problem, you should always be using an ORDER BY clause when using LIMIT. So the following query is well-defined:
SELECT *
FROM yourTable
ORDER BY some_column
LIMIT 5
The default ordering ID of records in mysql is ASC (i.e. Rows that i insert goes down the table) but we'll be using only the latest information from the table (i.e. Rows that are below).
Will there be any performance improvements if we change the default ordering to DESC (i.e New records goes to the top of the table) and frequent information will be queried from top of the table.
I think it would be the opposite.
I'm basing this comment on how I understand indexes to work in SQL Server-I'll try to revise later if I get a chance to read up more on how they work in MySQL.
There could be a slight performance advantage to insert your rows in the same order as your index is sorted, versus inserting them in the opposite order.
If you insert in the same order, and your next row to insert is always greater in sort order than existing rows then you will always find the next available empty spot (when one exists) in your last page of rows data.
If you do the opposite, always have your next insert row lesser in sort order than existing rows then you will probably always have a collision in your first page of rows data and the engine will do a tad bit more work to shift the position of rows if the page has room for it.
As for your order by clause in the select statement:
1) there's nothing in the SQL standard about indexes, and nothing that guarantees your result set ordering except for the ORDER BY clause. Normally queries in SQL Server that use just one index will see results returned in the order of the index. But if the isolation level changes to "read uncommitted" (chaos?) then it will return rows in more likely in the order it finds them in memory or on disk which is not necessarily the order you want.
2) If the order by in your select statement is based on the exact same column criteria as the index, then your database server should perform the same with either the index order, or the opposite of the index order. This is pretty straightforward except perhaps if you have a multi-column index with mixed ASC-DESC declarations for different columns. You get away with equal performance with order by equal to index order and with order by equal to inverse index order where the inverse index order is determined by substituting the ASC and DESC declarations (explicit and implicit) in the index declaration with DESC and ASC in the order by clause.
Any performance change would be on querying the records, not inserting one.
For queries, I doubt this will have much affect as database queries by keys usually have similar speeds.
It also depends on your data so I would run some tests.
I am testing my database design under load and I need to retrieve only a fixed number of rows (5000)
I can specify a LIMIT to achieve this, however it seems that the query builds the result set of all rows that match and then returns only the number of rows specified in the limit. Is that how it is implemented?
Is there a for MySQL to read one row, read another one and basically stop when it retrieves the 5000th matching row?
MySQL is smart in that if you specify a LIMIT 5000 in your query, and it is possible to produce that result without generating the whole result set first, then it will not build the whole result.
For instance, the following query:
SELECT * FROM table ORDER BY column LIMIT 5000
This query will need to scan the whole table unless there is an index on column, in which case it does the smart thing and uses the index to find the rows with the smallest column.
SELECT * FROM `your_table` LIMIT 0, 5000
This will display the first 5000 results from the database.
SELECT * FROM `your_table` LIMIT 1001, 5000
This will show records from 1001 to 6000 (counting from 0).
Complexity of such query is O(LIMIT) (unless you specify order by).
It means that if 10000000 rows will match your query, and you specify limit equal to 5000, then the complexity will be O(5000).
#Jarosław Gomułka is right
If you use LIMIT with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. In either case, after the initial rows have been found, there is no need to sort any remainder of the result set, and MySQL does not do so.
if the set is not sorted it terminates the SELECT operation as soon as it's got enough rows to the result set.
The exact plan the query optimizer uses depends on your query (what fields are being selected, the LIMIT amount and whether there is an ORDER BY) and your table (keys, indexes, and number of rows in the table). Selecting an unindexed column and/or ordering by a non-key column is going to produce a different execution plan than selecting a column and ordering by the primary key column. The later will not even touch the table, and only process the number of rows specified in your LIMIT.
Each database defines its own way of limiting the result set size depends on the database you are using.
While the SQL:2008 specification defines a standard syntax for limiting a SQL query, MySQL 8 does not support it.
Therefore, on MySQL, you need to use the LIMIT clause to restrict the result set to the Top-N records:
SELECT
title
FROM
post
ORDER BY
id DESC
LIMIT 50
Notice that we are using an ORDER BY clause since, otherwise, there is no guarantee which are the first records to be included in the returning result set.
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can i get that without using "order-by". It is a very big table and using order by on entire table to get only "n" records is not so convincing.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemende as Top-N query supported by an appropriate index, that's running very fast because it doesn't need to sort the entire table, not even the selecte rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
without appropriate index it will be slow if you are having lot's of data. However, if you create an index like that:
create index test on table (A, creation_date );
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king--make sure to use them.
I few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before but try and create an index on the creation_date column. Which will automatically sort the rows is ascending order. Then your select query can use the orderby creation_date desc with the Limit 20 to get the first 20 records. The database engine should realize the index has already done the work sorting and wont actually need to sort, because the index has already sorted it on save. All it needs to do is read the last 20 records from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use Group By if your grouping some data and then Having clause to select specific records.