I need to fetch data in batches, for example rows 1 to 1000, then 1001 to 2000.
Query: Select * from Employee limit 1, 1000
Select * from Employee limit 1001, 1000
No ORDER BY is used in these queries. Will the second query return duplicate data, or does MySQL follow some sorting order?
This question was previously closed as a "duplicate" of The order of a SQL Select statement without Order By clause. That is inappropriate as a "duplicate" link because it refers to engines other than MySQL. However, its conclusion is correct: you must use ORDER BY; do not assume the table is in some order.
I brought this question back to life because of a more subtle part of the question, referring to a common cause of duplicates.
This
Select * from Employee limit 1001, 1000
has two problems:
LIMIT without an ORDER BY is asking for trouble (as discussed in the link)
You appear to be doing "pagination", and you mentioned "returns duplicate data". I bring this up because you can get duplicates even if you have an ORDER BY. To elaborate...
OFFSET is implemented by stepping over rows.
Between getting N rows and getting the next N rows, some rows could be INSERTed or DELETEd among the 'previous' rows. This shifts the OFFSET, leading to either "duplicate" or "missing" rows.
More discussion, plus an alternative to OFFSET: Pagination. It involves "remembering where you left off".
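The "remember where you left off" idea can be sketched in a few lines. This is a hypothetical illustration using Python's sqlite3 so it is runnable (table and column names are made up; the SQL shape is the same in MySQL): instead of an OFFSET, each batch resumes after the last id already seen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Employee (id, name) VALUES (?, ?)",
                 [(i, f"emp{i}") for i in range(1, 26)])

def fetch_batch(last_id, batch_size=10):
    # "Remember where you left off": seek past the last id seen
    # instead of using OFFSET, so INSERTs/DELETEs between batches
    # cannot shift the window and cause dups or gaps.
    return conn.execute(
        "SELECT id, name FROM Employee WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size)).fetchall()

batch1 = fetch_batch(0)              # first 10 rows, ids 1..10
batch2 = fetch_batch(batch1[-1][0])  # resumes right after id 10
```

Because the seek is on the PRIMARY KEY, each batch is one index range scan; nothing is stepped over.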
Specific to InnoDB:
The data's BTree is ordered by the PRIMARY KEY. That is predictable, but
The query does not necessarily use the "table" to fetch the rows. It might use a 'covering' INDEX, whose BTree is sorted by a secondary key!
For grins... MyISAM:
The data is initially ordered by when the rows were inserted.
That order may change as Inserts and Deletes, and even Updates, occur.
And the query may use a covering index (Primary or secondary).
Related
I have a MySQL database with several tables, but one table contains 1,400,000 rows. I need to get the 50,000 rows with the highest value in one field, a field that counts visitors.
SELECT uid, title, number, views
FROM ct
WHERE hidden = 0
AND deleted = 0
AND number > 0
AND status LIKE 'active'
ORDER BY views DESC
LIMIT 0, 50000
It is extremely slow. The database is InnoDB and title, number and views are all indexed.
How can I speed up the selection?
From EXPLAIN:
1 SIMPLE ct ALL number_index NULL NULL NULL 1465440 Using where; Using filesort
After adding an index on status:
From EXPLAIN:
1 SIMPLE ct range number_index,status status 302 NULL 732720 Using index condition; Using where; Using filesort
How many rows do you have matching
WHERE hidden = 0
AND deleted = 0
AND number > 0
AND status LIKE 'active'
?
If the answer is more than 70,000 or so, then the short answer is that there is nothing you can do to speed things up. If it is significantly less, you will get some improvement with an index on hidden, deleted, number, and status; how much of a speedup depends on the order of these attributes in the index and the cardinality of each attribute (hint: you want your highest-cardinality entries first).
This composite index may speed it up:
INDEX(hidden, deleted, status, number, views)
The real problem is shoveling 50K rows to the client. What the heck will the client do with that many rows?
However, "the 50,000 rows with the highest value in one field" is not what your query finds. Perhaps you first need to find "the highest value" in that field, then search for all rows with that value in that field?
Showing 50k rows is always going to be costly. Just transferring the result set over the network is going to take a while. So there's a limit to how much you can "optimize" the query if the result set is that large. I'd seriously reconsider a design that required a result of 50k rows.
As for the best index for this query, the usual rule applies: use a compound index, consisting of the columns in equality conditions first, then ONE column used in range conditions. In your case, I would suggest:
alter table ct add index (hidden, deleted, status, number)
The first three may be in any order, since they're all equality conditions. Then number because it's a range condition. There's no way to optimize the ORDER BY, because the range condition spoils that.
A comment asked about partitioning or other methods of optimizing. I don't think it's likely that partitioning pruning will help.
The other method of optimizing I'd use is archiving. How many of the 1.4 million rows are hidden or deleted? Why not move those rows to another table, or to cold storage, or simply delete them. That would keep the active table smaller, and easier to keep in the buffer pool.
Say you have a table with n rows, what is the most efficient way to get the first row ever recorded on that table without sorting?
This is guaranteed to work, but becomes slower as the number of records increases:
SELECT * FROM posts ORDER BY created_at DESC LIMIT 1;
UPDATE:
This is even better in case there are multiple records with the same created_at value, but still needs sorting:
SELECT * FROM posts ORDER BY id ASC LIMIT 1;
Imagine a ledger book with 1 million pages and 1 billion lines of records. To get the first ever record, you'd simply turn to the first page and read the topmost line, right? Regardless of the size of the ledger, you should get the first ever record with the same efficiency. I was hoping I could do the same in MySQL without doing any kind of sorting or ordering. For research purposes. I mean, why not? Why can't MySQL? Is it impossible by design?
This is possible in typical array structures in programming:
array = [1,2,3,4,5]
The first element is in array[0], the second in array[1] and so on. There is no sorting necessary. The last element is array[array_count(array)-1].
I can offer the following two queries to find the most recent record:
SELECT * FROM posts ORDER BY created_at DESC LIMIT 1
and
SELECT *
FROM posts
WHERE created_at = (SELECT MAX(created_at) FROM posts)
Both queries would suffer performance degradation as the table gets larger, because the sorting operation needed to find the most recent created date would take more time.
But in both cases, adding the following index should improve the performance of the query:
ALTER TABLE posts ADD INDEX created_idx (created_at)
MySQL can use an index both in the ORDER BY clause and when finding the max. See the documentation for more information.
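As a quick sanity check that the two forms agree, here is a hypothetical sketch using Python's sqlite3 (chosen only to make it runnable; the SQL matches the MySQL queries above, and the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX created_idx ON posts (created_at)")
conn.executemany("INSERT INTO posts (id, created_at) VALUES (?, ?)",
                 [(1, "2023-01-01"), (2, "2023-03-15"), (3, "2023-02-10")])

# Form 1: with created_idx, the engine can read one entry from the
# end of the index instead of sorting the whole table.
latest_a = conn.execute(
    "SELECT id FROM posts ORDER BY created_at DESC LIMIT 1").fetchone()

# Form 2: MAX() is answered by the same index lookup.
latest_b = conn.execute(
    "SELECT id FROM posts WHERE created_at = "
    "(SELECT MAX(created_at) FROM posts)").fetchone()
```

Both return the row with the greatest created_at; the second form additionally returns ties, if several rows share that timestamp.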
I am having a problem with the following task using MySQL. I have a table Records(id,enterprise, department, status). Where id is the primary key, and enterprise and department are foreign keys, and status is an integer value (0-CREATED, 1 - APPROVED, 2 - REJECTED).
Now, usually the application need to filter something for a concrete enterprise and department and status:
SELECT * FROM Records WHERE status = 0 AND enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
The order by is required, since I have to provide the user with the most recent records. For this query I have created an index (enterprise, department, status), and everything works fine. However, for some privileged users the status should be omitted:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
ORDER BY id desc LIMIT 0,10;
This obviously breaks the index - it's still good for filtering, but not for sorting. So, what should I do? I don't want create a separate index (enterprise, department), so what if I modify the query like this:
SELECT * FROM Records WHERE enterprise = 11 AND department = 21
AND status IN (0,1,2)
ORDER BY id desc LIMIT 0,10;
MySQL definitely does use the index now, since it's provided with values of status, but how quick will the sorting by primary key be? Will it take the 10 most recent values for each status and then merge them, or will it first merge the ids for all statuses together and only after that take the first ten (in which case it's going to be much slower, I guess)?
All of the queries will benefit from one composite index:
INDEX(enterprise, department, status, id)
enterprise and department can be swapped, but keep the rest of the columns in that order.
The first query will use that index for both the WHERE and the ORDER BY, and thereby be able to find the 10 rows without scanning the table or doing a sort.
The second query is missing status, so my index is less than perfect. This would be better:
INDEX(enterprise, department, id)
At that point, it works like above. (Note: If the table is InnoDB, then this 3-column index is identical to your 2-column INDEX(enterprise, department) -- the PK is silently included.)
The third query gets dicier because of the IN. Still, my 4-column index will be nearly the best. It will use the first 3 columns, but it cannot walk the index in id order, so it cannot consume the LIMIT either. Hence the EXPLAIN will say Using temporary and/or Using filesort. Don't worry, performance should still be nice.
My second index is not as good for the third query.
See my Index Cookbook.
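To make the IN case concrete, here is a hypothetical runnable sketch using Python's sqlite3 (table, index, and values follow the thread but the data is invented). Even though the engine may need a sort step, the result is still the correct newest-10:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Records (
    id INTEGER PRIMARY KEY, enterprise INTEGER,
    department INTEGER, status INTEGER)""")
conn.execute(
    "CREATE INDEX r_idx ON Records (enterprise, department, status, id)")
# 100 rows for one enterprise/department, statuses cycling 0,1,2.
conn.executemany("INSERT INTO Records VALUES (?, ?, ?, ?)",
                 [(i, 11, 21, i % 3) for i in range(1, 101)])

# The third query: all statuses, newest first.
rows = conn.execute(
    "SELECT id FROM Records WHERE enterprise = 11 AND department = 21 "
    "AND status IN (0, 1, 2) ORDER BY id DESC LIMIT 10").fetchall()
ids = [r[0] for r in rows]
```

Whether the engine merges per-status id ranges or sorts all matches first is a plan detail; either way the answer is ids 100 down to 91 here.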
"How quick will sorting by id be"? That depends on three things.
Whether the sort can be avoided (see above);
How many rows in the query without the LIMIT;
Whether you are selecting TEXT columns.
I was careful to say whether the INDEX is used all the way through the ORDER BY, in which case there is no sort, and the LIMIT is folded in. Otherwise, all the rows (after filtering) are written to a temp table, sorted, then 10 rows are peeled off.
The "temp table" I just mentioned is necessary for various complex queries, such as those with subqueries, GROUP BY, ORDER BY. (As I have already hinted, sometimes the temp table can be avoided.) Anyway, the temp table comes in 2 flavors: MEMORY and MyISAM. MEMORY is favorable because it is faster. However, TEXT (and several other things) prevent its use.
If MEMORY is used then Using filesort is a misnomer -- the sort is really an in-memory sort, hence quite fast. For 10 rows (or even 100) the time taken is insignificant.
By default, MySQL assigns ascending IDs to records as they are inserted (i.e. new rows go to the bottom of the table), but we will mostly use the latest information from the table (i.e. the rows at the bottom).
Would there be any performance improvement if we changed the default ordering to DESC (i.e. new records go to the top of the table), so that the frequently used information is queried from the top of the table?
I think it would be the opposite.
I'm basing this comment on how I understand indexes to work in SQL Server; I'll try to revise later if I get a chance to read up more on how they work in MySQL.
There could be a slight performance advantage to inserting your rows in the same order as your index is sorted, versus inserting them in the opposite order.
If you insert in the same order, and your next row to insert always sorts after the existing rows, then you will always find the next available empty spot (when one exists) in your last page of row data.
If you do the opposite, so that your next insert always sorts before the existing rows, then you will probably always have a collision in your first page of row data, and the engine will do a bit more work to shift the position of rows if the page has room for it.
As for your order by clause in the select statement:
1) There's nothing in the SQL standard about indexes, and nothing that guarantees your result-set ordering except the ORDER BY clause. Normally, queries in SQL Server that use just one index will see results returned in the order of the index. But if the isolation level changes to "read uncommitted" (chaos?), then it will more likely return rows in the order it finds them in memory or on disk, which is not necessarily the order you want.
2) If the ORDER BY in your SELECT statement is based on exactly the same column criteria as the index, then your database server should perform the same with either the index order or the opposite of the index order. This is pretty straightforward, except perhaps if you have a multi-column index with mixed ASC/DESC declarations for different columns. You still get equal performance when the ORDER BY equals the inverse index order, where the inverse index order is obtained by substituting the ASC and DESC declarations (explicit and implicit) of the index declaration with DESC and ASC in the ORDER BY clause.
Any performance change would be on querying the records, not inserting one.
For queries, I doubt this will have much effect, as database lookups by key usually have similar speeds.
It also depends on your data so I would run some tests.
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and using ORDER BY on the entire table to get only "n" records is not so convincing.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of order by too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. It runs very fast because it doesn't need to sort the entire table, not even the selected rows, because the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
Without an appropriate index it will be slow if you have lots of data. However, if you create an index like this:
create index test on table (A, creation_date);
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the where columns in the index, followed by the order by columns.
If there is no where clause, just put the order by into the index. The order by must match the index definition, especially if there are mixed asc/desc orders.
The indexed Top-N query is the performance king -- make sure to use it.
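The recipe above can be illustrated with a minimal runnable sketch, using Python's sqlite3 only so it executes (the index recipe is the same in MySQL; table name, column A, and the dates are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, creation_date TEXT)")
# Recipe: where-clause column(s) first, then the order-by column.
conn.execute("CREATE INDEX test ON t (A, creation_date)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i % 2, f"2020-01-{i + 1:02d}") for i in range(28)])

# Top-N: the engine can walk the (A, creation_date) index in order
# and stop as soon as the LIMIT is satisfied -- no sort of the table.
oldest = conn.execute(
    "SELECT creation_date FROM t WHERE A = 0 "
    "ORDER BY creation_date LIMIT 10").fetchall()
```

The rows come back already in creation_date order because the index delivers them that way; the scan stops after 10 rows.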
A few links for further reading (all mine):
How to use index efficiently in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which keeps the rows sorted in ascending order. Then your SELECT query can use ORDER BY creation_date DESC with LIMIT 20 to get the first 20 records. The database engine should realize the index has already done the sorting work and won't actually need to sort, because the index sorted the data on save. All it needs to do is read the last 20 records from the index.
Worth a try.
Create an index on creation_date and query by using order by creation_date asc|desc limit n and the response will be very fast (in fact it cannot be faster). For the "latest n" scenario you need to use desc.
If you want more constraints on this query (e.g where state='LIVE') then the query may become very slow and you'll need to reconsider the indexing strategy.
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.