Say I have a query that looks like this:
SELECT * FROM table WHERE category='5' and status='1' LIMIT 5
The table has 1 million rows.
To speed things up, I created an index on (status, category), i.e. a multiple-column index.
There are 600 categories but only 2 statuses (1 or 0). I'm wondering if there is any difference in performance if I create the index as (category, status) instead of (status, category).
Status first.
The trick is that if you only need to query by category, you can still write:
SELECT * from table where status in (1,0) and category = 'whatever'
and still get index support. Of course, if all your queries use both columns, it is the same either way. But with status first, queries that filter only on status are served much better, and queries that filter only on category are only slightly worse, if at all.
If you are looking at a lot of inserts as well, you want to minimize the number of indices, so this is your best bet rather than having multiple indices.
There shouldn't be any difference for this query, where both columns are compared with =. The selectivity of the index is identical whether you order it (category, status) or (status, category).
By the way, using LIMIT is often meaningless without also using ORDER BY. The order of rows returned by an SQL query is arbitrary unless you specify an order.
Re your comment: Yes, it's common to need five random rows, but arbitrary is not the same as random. It's not common to need five arbitrary rows.
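To check the "no difference" claim on real data, one option is to create both orderings and compare the plans. This is only a sketch: the table name products, the column types, and the index names are assumptions, not taken from the question.

```sql
-- Hypothetical table standing in for the asker's "table".
CREATE TABLE products (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    category SMALLINT UNSIGNED NOT NULL,
    status   TINYINT UNSIGNED NOT NULL,
    KEY idx_status_category (status, category),
    KEY idx_category_status (category, status)
);

-- Force each index in turn and compare the "rows" and "Extra" columns.
-- ORDER BY id is added so that LIMIT 5 returns a defined set of rows.
EXPLAIN SELECT * FROM products FORCE INDEX (idx_status_category)
WHERE category = 5 AND status = 1
ORDER BY id LIMIT 5;

EXPLAIN SELECT * FROM products FORCE INDEX (idx_category_status)
WHERE category = 5 AND status = 1
ORDER BY id LIMIT 5;
```

Once the comparison is done, keep only one of the two indexes; both are created here purely for the experiment.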
In my script, I have a lot of SQL WHERE clauses, e.g.:
SELECT * FROM cars WHERE active=1 AND model='A3';
SELECT * FROM cars WHERE active=1 AND year=2017;
SELECT * FROM cars WHERE active=1 AND brand='BMW';
I am using different SQL clauses on the same table because I need different data.
I would like to set index key on table cars, but I am not sure how to do it. Should I set separate keys for each column (active, model, year, brand) or should I set keys for groups (active,model and active,year and active,brand)?
WHERE a=1 AND y='m'
is best handled by INDEX(a,y) in either order. The optimal set of indexes is several pairs like that. However, I do not recommend having more than a few indexes. Try to limit it to queries that users actually make.
INDEX(a,b,c,d):
WHERE a=1 AND b=22 -- Index is useful
WHERE a=1 AND d=44 -- Index is less useful
Only the "left column(s)" of an index are used. Hence the second case uses a, but stops there because b is not in the WHERE.
You might be tempted to also have (active, year, model). That combination works well for active AND year, and for active AND year AND model, but not for active AND model without year.
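As a rough illustration of the "several pairs" advice for the cars table, here is a sketch. The column types and index names are assumptions; only the column names come from the question.

```sql
-- Hypothetical definition of the asker's cars table.
CREATE TABLE cars (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    active TINYINT NOT NULL,
    brand  VARCHAR(50) NOT NULL,
    model  VARCHAR(50) NOT NULL,
    year   SMALLINT UNSIGNED NOT NULL,
    KEY idx_active_model (active, model),
    KEY idx_active_year  (active, year),
    KEY idx_active_brand (active, brand)
);

-- Each query has a two-column index whose columns match its WHERE clause.
EXPLAIN SELECT * FROM cars WHERE active = 1 AND model = 'A3';
EXPLAIN SELECT * FROM cars WHERE active = 1 AND year = 2017;
EXPLAIN SELECT * FROM cars WHERE active = 1 AND brand = 'BMW';
```

The "key" column of each EXPLAIN row shows which index the optimizer actually picked; with little data in the table it may choose differently.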
More on creating indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Since model implies a make, there is little use to put both of those in the same composite index.
year is not very selective, and users might want a range of years. These make it difficult to get an effective index on year.
How many rows will you have? If it is millions, we need to work harder to avoid performance problems. I'm leaning toward this, but only because the lookup would be more compact.
We use single-column indexes when we want to query by just one column, as in your case, and multiple-column (composite) indexes when we have multiple conditions in the same WHERE clause.
Go for single-column indexes.
For a more detailed explanation, refer to this article: https://www.sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/
Given a table with 10+ million rows and three columns (one, two, three), and an SQL query like SELECT * FROM table ORDER BY one, two, three LIMIT 1, do I really need to create a multi-column index using all three columns?
I know for sure that if one and two match, there will be at most 10 rows with distinct values of three.
Is this enough for fast SELECTs?
CREATE INDEX MY_INDEX ON table (one, two);
With INDEX(one, two, three), the query will go straight down the BTree to the one (LIMIT 1) desired row.
With INDEX(one, two), the query will go straight down the BTree to the first such row, then scan forward over the up-to-10 rows, save them to a tmp table, sort them (because ORDER BY includes three; probably done in memory), and deliver the first one. Although this sounds more complex, it will not (in this example) be much slower.
It will not be a "table scan" ("ALL"), but perhaps a "range" scan. Use EXPLAIN SELECT ... to see.
If three is a bulky string, then the 3-col index will be bulkier; this has some impact on disk space and performance.
If you need only (one, two) for some other queries, then either index works reasonably well (barring the "bulky" comment).
If you do SELECT one, two, three FROM ..., the 3-part index will be better because it will be "covering". SELECT * won't have such a bonus.
Bottom line: Either index is "OK"; many other factors come into play, making it hard to say for sure what to do.
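A minimal sketch of the three-column variant, in case you want to check the plan yourself. The table name t, the column types, and the extra payload column are assumptions made for illustration.

```sql
-- Hypothetical table; payload stands in for any columns outside the index.
CREATE TABLE t (
    one     INT NOT NULL,
    two     INT NOT NULL,
    three   INT NOT NULL,
    payload VARCHAR(100),
    KEY idx_one_two_three (one, two, three)
);

-- ORDER BY matches the index column order exactly.
EXPLAIN SELECT * FROM t ORDER BY one, two, three LIMIT 1;

-- Only indexed columns are selected, so the index can be "covering";
-- look for "Using index" in the Extra column.
EXPLAIN SELECT one, two, three FROM t ORDER BY one, two, three LIMIT 1;
```

The actual plan depends on the MySQL version and on how much data the table holds, so run this against a realistic copy of the data.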
You might think MySQL is clever enough to read at most the first 10 rows using the index and then sort these. Unfortunately, it isn't (because the optimizer doesn't take the limit into account at this point). You can verify that by using EXPLAIN SELECT ...; it will show that MySQL does a full table scan ("ALL").
The documentation describes conditions to be able to use an index to optimize order by:
The index can also be used even if the ORDER BY does not match the index exactly, as long as all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause.
Your third column does not satisfy this. So this query will not use this index (which does not mean that the index might not be useful for something else).
Since MySQL 5.6, however, there is the so-called filesort priority queue optimization to accommodate the limit: while MySQL will still read the whole table, it will not sort the whole table (which would be a time-consuming process), but will stop when it knows what the first row will be, which makes your query acceptably fast.
But you can rewrite your query to do exactly what you are thinking of:
SELECT * FROM
(select * from table ORDER BY one, two LIMIT 10) sub
order by one, two, three limit 1;
This will read the first 10 rows using that index, and then just sort these. It will of course only work correctly if you are absolutely sure you will only have at most 10 rows.
A more general way to optimize your query independently from knowing the maximum number of possible rows would be e.g.
SELECT * FROM table
where one = (select min(one) from table)
order by one, two, three limit 1;
This will use the index to reduce the number of rows that have to be read and filesorted by looking up the lowest value for one first (using the index) and only considering these rows. You can similarly include a condition for two.
Or you can simply use all three columns in your index (although, depending on the size of your third column, it can make sense not to do this). These kinds of optimizations tend to catch up with you at some point. If you use the first method, for example, and in 2 years 11 rows become possible, you (or your successor) will have to remember that you have this implied condition in your code.
If I'm trying to increase the performance of a query that uses 4 different columns from a specific table, should I create 4 different indexes (one with each column individually) or should I create 1 index with all columns included?
One index with all 4 columns is, in my experience, the fastest. If you use a WHERE, try to put the columns in an order that makes the index useful for the WHERE.
An index with all four columns; the columns used in the WHERE should go first, and those compared for equality (=) should go first of all.
Sometimes, giving priority to integer columns gives better results; YMMV.
So for example,
SELECT title, count(*) FROM table WHERE class = 'post' AND topic_id = 17
AND date > ##BeginDate and date < ##EndDate;
would have an index on topic_id, class, date, and title, in this order.
The "title" in the index is only used so that the DB may find the value of "title" for the records matching the query without an extra access to the data table.
The more balanced the distribution of the records on the first fields, the better the results will be. In this example, say 10% of the rows have topic_id = 17: you would discard the other 90% without ever having to run a string comparison with 'post' (not that string comparisons are particularly costly). Depending on the data, you might find it better to index date first and class later, or even to partition the table by date (MySQL PARTITION).
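The index described above could be sketched as follows. The table name posts, the column types, the date literals (standing in for ##BeginDate and ##EndDate), and the GROUP BY title clause are all assumptions; the original query does not show a GROUP BY.

```sql
-- Hypothetical table for the example query.
CREATE TABLE posts (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    topic_id INT UNSIGNED NOT NULL,
    class    VARCHAR(20) NOT NULL,
    date     DATETIME NOT NULL,
    title    VARCHAR(200) NOT NULL,
    body     TEXT,
    -- Equality columns first, then the range column, then title for covering.
    KEY idx_topic_class_date_title (topic_id, class, date, title)
);

EXPLAIN
SELECT title, COUNT(*)
FROM posts
WHERE topic_id = 17
  AND class = 'post'
  AND date > '2024-01-01' AND date < '2024-02-01'
GROUP BY title;
```

If "Using index" appears in the Extra column, the query is being answered from the index alone, which is the point of including title.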
A single index is usually more effective than index merge, so if you have a condition like f1 = 1 AND f2 = 2 AND f3 = 3 AND f4 = 4, a single composite index would be the right decision.
To achieve the best performance, list the index fields in descending order of cardinality (count of distinct values); this helps reduce the number of rows analyzed.
An index on fewer than 4 fields can be more effective, as it requires less memory.
http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/
I have a table of items, each of which has an a_level, a b_level, and an item_id. Any b_level belongs to only one a_level (example: b_level 14 is a "child" of a_level 2 only).
Let's say we have millions of items, all of which are INSERTed once, and after that only SELECTs are requested.
If I SELECT an item based on item_id, then I need to index the item_id column. This will make MySQL look through all the millions of items, which is bad, since I already have the a_level and b_level information. So I guess that if I SELECT an item based on a specific level and I have an index on that column, MySQL will not have to look through all the millions of items, just the items with that particular level.
If I INDEX both a_level and b_level (and of course item_id) and SELECT WHERE a_level= b_level= item_id=, will it be bad? I guess that an INDEX on only b_level and item_id, with SELECT WHERE b_level= AND item_id=, will be enough, or even the best solution?
So, since I have a_level and b_level (where any b_level, as I said, is a "child" of only one a_level), what will be the most efficient SELECT and INDEXes for picking up an item?
You can certainly index every column. If you do, MySQL may use the index merge optimization to apply several indexes to a single query. However, for more efficiency, you might want to use composite indexes (a single index on multiple columns). MySQL uses composite indexes according to the left-prefix rule: if the SELECT statement is restricted by terms that form a left-prefix of an index, that index can be used. For example, if you have
SELECT * FROM t WHERE a_level = 1 AND b_level = 2
then the appropriate index would have to have a_level or b_level as its first column. In other words, an index on (a_level, b_level) could serve queries such as
SELECT * FROM t WHERE a_level = 1
SELECT * FROM t WHERE a_level = 1 AND b_level = 2
but not
SELECT * FROM t WHERE b_level = 2
because b_level is not a left-prefix of the index.
You'd probably first want to benchmark which of the selects you're performing most often and create indexes based on that, as long as they follow the left-prefix rule. You might want to use several indexes for a few of the different SELECT queries in order to keep from blanketing the entire table. It's not easy to perfectly answer this question without knowing the data and queries exactly.
However, if you're sure you're never going to write into the table again, you might as well cover the entire thing with an index, if space is not an issue.
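To see the left-prefix rule in action, something like the following can be run. It is a sketch under assumptions: the table name items, the column types, the extra name column, and the choice of item_id as primary key are not stated in the question.

```sql
-- Hypothetical table; name stands in for any non-indexed columns.
CREATE TABLE items (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY,
    a_level INT UNSIGNED NOT NULL,
    b_level INT UNSIGNED NOT NULL,
    name    VARCHAR(100),
    KEY idx_a_b (a_level, b_level)
);

-- a_level alone is a left-prefix of idx_a_b.
EXPLAIN SELECT * FROM items WHERE a_level = 1;

-- Both columns together also form a left-prefix.
EXPLAIN SELECT * FROM items WHERE a_level = 1 AND b_level = 2;

-- b_level alone is not a left-prefix, so idx_a_b is not expected
-- to be usable for a lookup here.
EXPLAIN SELECT * FROM items WHERE b_level = 2;
```

Compare the "key" and "type" columns across the three plans to see where the index is picked up.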
If you frequently select by a column or set of columns, then index that column or set of columns. Indexes don't look at all the millions of items; that's why they're indexes (without an index, MySQL would indeed look at all the millions of items).
A quick question. I have a simple table with this structure:
USERS_TABLE = {id}, name, number_field_1, number_field_2, number_field_3, number_field_4, number_field_5
Sorting is a major feature, one field at a time at the moment. Table can have up to 50 million records. Most selects are done using the "id" field, which is the primary key and it's indexed.
A sample query:
SELECT * FROM USERS_TABLE WHERE id=10 ORDER BY number_field_1 DESC LIMIT 100;
The question:
Do I create separate indexes for each "number_field_*" to optimize ORDER BY statements, or is there a better way to optimize? Thanks
There's no silver bullet here, you will have to test this for your data and your queries.
If all your queries (e.g. WHERE id=10) return few rows, it might not be worth indexing the ORDER BY columns. On the other hand, if they return a lot of rows, it might help greatly.
If you always order by at least one of the fields, consider creating an index on that column, or a compound index on some of these columns. Keep in mind that if you have a compound index on (num_field_1, num_field_2) and you order only by num_field_2, that index will not be used. You have to include the leftmost fields of the index to make use of it.
On the other hand, you seem to have a lot of different fields you order by. The drawback of creating an index on each and every one of them is that your indexes will be much larger, and your inserts/deletes/updates might start to get slower.
Basically, there is no shortcut. You'll have to play around to find which indexes work best for your queries and tune accordingly, while carefully analyzing your queries with EXPLAIN.
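The point about leftmost fields can be checked with a sketch like this one. The column types and the index name are assumptions, and only two of the five number fields are shown.

```sql
-- Hypothetical, trimmed-down version of the asker's table.
CREATE TABLE users_table (
    id             BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name           VARCHAR(100) NOT NULL,
    number_field_1 INT NOT NULL,
    number_field_2 INT NOT NULL,
    KEY idx_nf1_nf2 (number_field_1, number_field_2)
);

-- Ordering by the leftmost index column: rows can be read in index order.
EXPLAIN SELECT * FROM users_table
ORDER BY number_field_1 DESC LIMIT 100;

-- Ordering only by the second column: idx_nf1_nf2 cannot supply the order,
-- so "Using filesort" is expected in the Extra column.
EXPLAIN SELECT * FROM users_table
ORDER BY number_field_2 DESC LIMIT 100;
```

Run it on a populated table; on an empty or tiny table the optimizer may choose a different plan.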