Does the order of index creation matter - mysql

Assume I have a two-column index on table foo, indexed on (x,y).
If I query it as select * from foo where x=1 and y=2 or as select * from foo where y=2 and x=1, does the order really matter in MySQL?

Short answer - no, it doesn't matter. MySQL will try to pick the best index to use regardless of whether the column appears first or second in your WHERE clause.
You can prove this by running an EXPLAIN statement on each one to get more information about how MySQL will execute the query - it should show that the same index is used in both cases.
If you're talking about the order in which the columns appear in the index, (x,y) vs (y,x), it also doesn't matter in this case, since you're selecting on both columns. If you sometimes select on just one of the columns, though, that column should appear first in the index so that MySQL can use the leftmost prefix of the index to optimize the query when only that one value is provided.
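A quick way to see both points is to compare query plans. The sketch below is only an illustration: it uses SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, and the filler column z, the sample rows and the index name idx_foo_x_y are made up. The plan text differs from MySQL's EXPLAIN output, but the leftmost-prefix behaviour it shows is the same idea.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, z TEXT)")
conn.execute("CREATE INDEX idx_foo_x_y ON foo (x, y)")  # hypothetical index name
conn.executemany(
    "INSERT INTO foo (x, y, z) VALUES (?, ?, ?)",
    [(i % 50, i % 7, "row") for i in range(1000)],
)

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Same predicates, different order in the WHERE clause.
plan_xy = plan("SELECT * FROM foo WHERE x = 1 AND y = 2")
plan_yx = plan("SELECT * FROM foo WHERE y = 2 AND x = 1")
# Only one column supplied: x is the leading index column, y is not.
plan_x = plan("SELECT * FROM foo WHERE x = 1")
plan_y = plan("SELECT * FROM foo WHERE y = 2")

for label, text in [("x,y", plan_xy), ("y,x", plan_yx), ("x only", plan_x), ("y only", plan_y)]:
    print(label, "->", text)
```

On MySQL itself the equivalent check is to run EXPLAIN on each SELECT and compare the key column of the output.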

Index column order (ASC, DESC) is very useful in PostgreSQL for optimizing some queries, but MySQL just ignores it, and I don't know which version is going to support it.
I thought this was a Workbench bug, but the Workbench team answered me:
Our manual, http://dev.mysql.com/doc/refman/5.5/en/create-index.html,
says:
"An index_col_name specification can end with ASC or DESC. These
keywords are permitted for future extensions for specifying ascending
or descending index value storage. Currently, they are parsed but
ignored; index values are always stored in ascending order."
So, probably this is the reason for what you see in Workbench: it
allows to add that DESC option, but server itself ignores it.
Earlier comments can be viewed at http://bugs.mysql.com/65893

Indeterminate sort ordering - MySQL vs Postgres

I have the following query
SELECT *
FROM table_1
INNER JOIN table_2 ON table_1.orders = table_2.orders
ORDER BY table_2.purchasetime;
The result of the above query is indeterminate, i.e. the order of rows that share the same purchase time can change from one execution to the next, as the MySQL manual itself states. To overcome this, we add a unique column to the ORDER BY after the regular sort column.
The customer does not want to see different results on different page refreshes, so we have put in the above fix specifically for MySQL. It seems unnecessary, and it needs extra compound indexes for both asc and desc.
I am not sure whether the same applies to Postgres. So far I have not been able to reproduce the scenario there. I would appreciate it if someone could answer this for Postgres or point me in the right direction.
Edit 1: The sort column is indexed. So, assuming the data on disk has no ordering, might a constant ordering still be possible in Postgres because of the index (a B-tree data structure)?
No, it will not be different in PostgreSQL (or, in fact, in any other relational database that I know of).
See http://www.postgresql.org/docs/9.4/static/queries-order.html :
After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.
Even if by accident you manage to find a PostgreSQL version and index that guarantee the order in all the tests you run, please don't rely on it. Any database upgrade, data change or a change in the Maya calendar or the phase of the moon can suddenly upset your sorting order. And debugging it then is a true and terrible pain in the neck.
Your concern seems to be that order by table_2.purchasetime is indeterminate when there are multiple rows with the same value.
To fix this -- in any database or really any computer language -- you need a stable sort. You can turn any sort into a stable sort by adding a unique key. So, adding a unique column (typically an id of some sort) fixes this in both MySQL and Postgres (and any other database).
I should note that instability in sorts can be a very subtle problem, one that only shows up under certain circumstances. So, you could run the same query many times and it is fine. Then you insert or delete a record (perhaps even one not chosen by the query) and the order changes.
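To make the tie-breaker idea concrete, here is a small sketch. It assumes a hypothetical unique id column on table_2 and invented sample rows, and it uses SQLite through Python's sqlite3 module purely as a convenient stand-in; the same ORDER BY clauses are valid in MySQL and PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The id column is a hypothetical unique key added for illustration.
conn.execute(
    "CREATE TABLE table_2 (id INTEGER PRIMARY KEY, orders INTEGER, purchasetime TEXT)"
)
conn.executemany(
    "INSERT INTO table_2 (id, orders, purchasetime) VALUES (?, ?, ?)",
    [
        (1, 10, "2015-01-01 10:00:00"),
        (2, 11, "2015-01-01 10:00:00"),  # same purchasetime as ids 1 and 4
        (3, 12, "2015-01-01 09:00:00"),
        (4, 13, "2015-01-01 10:00:00"),
    ],
)

# Ties on purchasetime are resolved by the unique id, so the order is fully determined.
ascending = [row[0] for row in conn.execute(
    "SELECT id FROM table_2 ORDER BY purchasetime, id")]
descending = [row[0] for row in conn.execute(
    "SELECT id FROM table_2 ORDER BY purchasetime DESC, id DESC")]

print(ascending)
print(descending)
```

Because every row differs in at least one of the sort columns, no two rows compare as equal and the database has no freedom left in how it orders them.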

Should I avoid ORDER BY in queries for large tables?

In our application, we have a page that displays a set of data to the user (only a part of it, actually). It also allows the user to order it by a custom field. So in the end it all comes down to a query like this:
SELECT name, info, description FROM mytable
WHERE active = 1 -- Some filtering by indexed column
ORDER BY name LIMIT 0,50; -- Just a part of it
And this worked just fine as long as the table was relatively small (used only locally in our department). But now we have to scale this application. Let's assume the table has about a million records (we expect that to happen soon). What will happen with ordering? Do I understand correctly that, in order to run this query, MySQL will have to sort a million records each time and return just a part of them? This seems like a very resource-heavy operation.
My idea is simply to turn off that feature and not let users select a custom ordering (maybe just filtering), so that the order would be a natural one (by id in descending order; I believe the indexing can handle that).
Or is there a way to make this query work much faster with ordering?
UPDATE:
Here is what I read from the official MySQL developer page.
In some cases, MySQL cannot use indexes to resolve the ORDER BY,
although it still uses indexes to find the rows that match the WHERE
clause. These cases include the following:
....
The key used to
fetch the rows is not the same as the one used in the ORDER BY:
SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
So yes, it does seem like mysql will have a problem with such a query? So, what do I do - don't use an order part at all?
The 'problem' here seems to be that you have 2 requirements (in the example)
active = 1
order by name LIMIT 0, 50
The former you can easily solve by adding an index on the active field
The latter you can improve by adding an index on name
Since you do both in the same query, you'll need to combine this into an index that lets you resolve the active value quickly and then from there on fetches the first 50 names.
As such, I'd guess that something like this will help you out:
CREATE INDEX idx_test ON myTable (active, name)
(in theory, as always, try before you buy!)
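One way to "try before you buy" is to compare the query plan before and after creating the index. The sketch below is an illustration under stated assumptions: it runs against SQLite (Python's built-in sqlite3 module) rather than MySQL, the sample rows are invented, and the plan wording is SQLite's. In SQLite the line "USE TEMP B-TREE FOR ORDER BY" plays roughly the role that "Using filesort" plays in MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, active INTEGER, "
    "name TEXT, info TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (active, name, info, description) VALUES (?, ?, ?, ?)",
    [(i % 2, "name%05d" % i, "info", "description") for i in range(2000)],
)

QUERY = (
    "SELECT name, info, description FROM mytable "
    "WHERE active = 1 ORDER BY name LIMIT 50"
)

def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(QUERY)  # no usable index: full scan plus an explicit sort step
conn.execute("CREATE INDEX idx_test ON mytable (active, name)")
plan_after = plan(QUERY)   # index supplies the rows for active = 1 already in name order

print("before:", plan_before)
print("after: ", plan_after)
```

With the composite index in place the sort step disappears from the plan, which is what allows the engine to stop after the first 50 matching rows.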
Keep in mind though that there is no such a thing as a free lunch; you'll need to consider that adding an index also comes with downsides:
the index will make your INSERT/UPDATE/DELETE statements (slightly) slower, usually the effect is negligible but only testing will show
the index will require extra space in the database; think of it as an additional (hidden) special table sitting next to your actual data. The index will only hold the fields required + the PK of the originating table, which usually is a lot less data than the entire table, but for 'millions of rows' it can add up.
if your query selects one or more fields that are not part of the index, then the system will have to fetch the matching PK fields from the index first and then go look for the other fields in the actual table by means of the PK. This probably is still (a lot) faster than when not having the index, but keep this in mind when doing something like SELECT * FROM ... : do you really need all the fields?
In the example you use active and name, but from the text I gather that these might be 'dynamic', in which case you'd have to foresee all kinds of combinations. From a practical point of view this might not be feasible, as each index comes with the downsides above, and each index you add extends that list again (the costs are cumulative).
PS: I use PK for simplicity but in MSSQL it's actually the fields of the clustered index, which USUALLY is the same thing. I'm guessing MySQL works similarly.
Run EXPLAIN on your query and check whether it uses filesort.
If ORDER BY cannot use an index, or if the MySQL optimizer prefers not to use the existing index(es) for sorting, it falls back to filesort.
Now, if you're getting filesort, you should preferably either avoid ORDER BY or create appropriate index(es).
If the data is small enough, the sort is done in memory; otherwise it goes to disk.
So you may try changing the sort_buffer_size variable as well.
There are always tradeoffs. One way to improve the performance of an ORDER BY query is to set the sort buffer size before running the query:
set sort_buffer_size=100000;
If this size is increased further, performance will start to decrease.

How can I make this SQL non sargable?

I've used an online tool to analyse one of my SQL queries (the query took me ages to make).
My query takes a word (in this example the word is 'dog.') and tries to find it in the 'qa' table; when it does, it joins row data from the login table where login.pid = qa.u
SELECT login.pid,login.name,
qa.id,qa.end,qa.react,qa.win,qa.stock,qa.num,qa.ratio,qa.u,qa.t,qa.k,qa.swipes,qa.d
FROM login,qa WHERE login.pid=qa.u AND (qa.k LIKE '%dog.%' OR qa.k='.dog.')
ORDER BY qa.d DESC LIMIT 0,15
I understand what the tool is telling me:
Argument with leading wildcard
An argument has a leading wildcard character, such as "%foo". The predicate with
this argument is not sargable and cannot use an index if one exists.
but I don't know how to use an index inside the '()' without damaging or changing the results... could someone please explain how I could use an index in the middle of a query's conditions?
I take it that if this was non-sargable then the result would be faster?
First, learn to use modern join syntax:
SELECT login.pid, login.name,
qa.id, qa.end, qa.react, qa.win, qa.stock, qa.num, qa.ratio, qa.u, qa.t, qa.k, qa.swipes, qa.d
FROM login
JOIN qa ON login.pid = qa.u
WHERE (qa.k LIKE '%dog.%' OR qa.k = '.dog.')
ORDER BY qa.d DESC
LIMIT 0,15;
Basically "sargable" means that you can use an index on a particular expression (it is not an English word, it is an acronym). The expression on qa.k cannot use an index.
This may not make a difference, depending on the query plan for the query. For instance, if the engine decides to scan the login table and then lookup values in qa, the index wouldn't help. It helps going the other way, though.
The bad news is that you cannot make this expression sargable in MySQL. The good news is that you can use a full text index to do what you want and possibly more. You can read about them here. One small note is that the default settings ignore short words, up to three letters. So you need to change the default setting if you actually want to search for "dog".
By the way, the following expression can use an index on qa.k:
WHERE (qa.k LIKE 'dog.%' OR qa.k = '.dog.')
(I'm not sure if MySQL actually would use the index, because it sometimes gets confused by or.)
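Why a prefix pattern can use an index while a leading wildcard cannot can be illustrated without a database. In the sketch below, a sorted Python list stands in for the B-tree index on qa.k; this is an assumption made for illustration, as are the sample keys and the helper names. A prefix maps to one contiguous range that binary search can locate, whereas a substring match has to inspect every key.

```python
import bisect

# A sorted list as a stand-in for a B-tree index on qa.k (sample keys are invented).
index_keys = sorted([".dog.", "cat.", "dog.", "dog.house", "hotdog.", "zebra"])

def prefix_matches(sorted_keys, prefix):
    """LIKE 'prefix%': two binary searches bound one contiguous slice."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # Upper bound: assumes no key continues the prefix with a character above U+FFFF.
    hi = bisect.bisect_left(sorted_keys, prefix + "\uffff")
    return sorted_keys[lo:hi]

def substring_matches(keys, needle):
    """LIKE '%needle%': matches can sit anywhere, so every key is checked."""
    return [key for key in keys if needle in key]

by_prefix = prefix_matches(index_keys, "dog.")
by_substring = substring_matches(index_keys, "dog.")

print(by_prefix)
print(by_substring)
```

The substring search also finds ".dog." and "hotdog.", which lie at opposite ends of the sorted order; that scatter is why sort order alone cannot narrow the search.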

mysql IN clause not using possible keys

I have a fairly simple MySQL query which contains a few inner joins and then a WHERE clause. I have created indexes for all the columns that are used in the joins, as well as the primary keys. The WHERE clause contains an IN operator. When 5 or fewer ids are passed into the IN clause, the query optimizer uses one of my indexes and runs the query in a reasonable amount of time; EXPLAIN shows that the type is range and the key is PRIMARY. My issue is that if I use more than 5 ids in the IN clause, the optimizer ignores all the available indexes and the query runs extremely slowly; EXPLAIN then shows that the type is ALL and the key is NULL.
Could someone please shed some light on what is happening here and how I could fix this.
Thanks
Regardless of the "primary key" indexes on the tables to optimize the JOINs, you should also have an index based on common criteria you are applying a WHERE against. More info needed on columns of your query, but you should have an index on your WHERE criteria TOO.
You could also try using MySQL index hints. They let you specify which index should be used during query execution.
Examples:
SELECT * FROM table1 USE INDEX (col1_index,col2_index)
WHERE col1=1 AND col2=2 AND col3=3;
-
SELECT * FROM table1 IGNORE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
More Information here:
Mysql Index Hints
Found this while checking up on a similar problem I am having. Thought my findings might help anyone else with a similar issue in future.
I have a MyISAM table with about 30 rows (contains common typos of similar words for a search where both the possible original typo and the alternative word may be valid spellings, the table will slowly build up in size). However the cutoff for me is that if there are 4 items in the IN clause the index is used but when 5 are in the IN clause the index is ignored (note I haven't tried alternative words so the actual individual items in the IN clause might be a factor). So similar to the OP, but with a different number of words.
USE INDEX did not work; the index was still ignored. FORCE INDEX does work, although I would prefer to avoid naming indexes in queries (just in case someone deletes the index).
As a test, I padded out the table with an extra 1000 random unique rows; the query then used the relevant index even with 80 items in the IN clause.
So it seems MySQL decides whether to use the index based on the number of items in the IN clause compared to the number of rows in the table (probably with some other factors at play, though).

SQL: What is the default Order By of queries?

What is the default order of a query when no ORDER BY is used?
There is no such order present. Taken from http://forums.mysql.com/read.php?21,239471,239688#msg-239688
Do not depend on order when ORDER BY is missing.
Always specify ORDER BY if you want a particular order -- in some situations the engine can eliminate the ORDER BY because of how it
does some other step.
GROUP BY forces ORDER BY. (This is a violation of the standard. It can be avoided by using ORDER BY NULL.)
SELECT * FROM tbl -- this will do a "table scan". If the table has
never had any DELETEs/REPLACEs/UPDATEs, the records will happen to be
in the insertion order, hence what you observed.
If you had done the same statement with an InnoDB table, they would
have been delivered in PRIMARY KEY order, not INSERT order. Again,
this is an artifact of the underlying implementation, not something to
depend on.
There's none. Depending on what you query and how your query was optimised, you can get any order. There's even no guarantee that two queries which look the same will return results in the same order: if you don't specify it, you cannot rely on it.
I've found SQL Server to be almost random in its default order (depending on age and complexity of the data), which is good as it forces you to specify all ordering.
(I vaguely remember Oracle being similar to SQL Server in this respect.)
MySQL by default seems to order by the record structure on disk (which can include out-of-sequence entries due to deletions and optimisations), but it often initially fools developers into not bothering with ORDER BY clauses because the data appears to default to primary-key ordering, which is not the case!
I was surprised to discover today that MySQL 5.6 and 4.1 implicitly sub-order, in opposite directions, records that have been sorted on a column with limited resolution. Some of my results have identical sort values, and their overall order is unpredictable. E.g. in my case the rows were sorted DESC by a datetime column and some of the entries fell in the same second, so they couldn't be explicitly ordered. On MySQL 5.6 they are selected in one order (the order of insertion), but in 4.1 they are selected backwards! This led to a very annoying deployment bug.
I haven't found documentation on this change, but I found notes on implicit GROUP BY ordering in MySQL:
By default, MySQL sorts all GROUP BY col1, col2, ... queries as if you specified ORDER BY col1, col2, ... in the query as well.
However:
Relying on implicit GROUP BY sorting in MySQL 5.5 is deprecated. To achieve a specific sort order of grouped results, it is preferable to use an explicit ORDER BY clause.
So in agreement with the other answers - never rely on default or implicit ordering in any database.
The default ordering will depend on indexes used in the query and in what order they are used. It can change as the data/statistics change and the optimizer chooses different plans.
If you want the data in a specific order, use ORDER BY