I have a table of items, each of it has an a_level, b_level, and an item_id. Any b_level is dedicated to only one a_level (example: b_level 14 is "child" of a_level 2 only)
Lets say we have million of items all of them are INSERTed once and then only SELECTs are requested.
If i SELECT an item based on item_id, then i need to index the item_id column. This will make the MySQL to look all millions of items, which is bad, since i already have the a_level and b_level information. So i guess if i SELECT an item based on a specific level and i have an index on that column, the MySQL will not have to look all millions of items, just the items with that particular level.
If i INDEX both on a_level, b_level (and of course item_id) and SELECT WHERE a_level= b_level= item_id= will it be bad? I guess only INDEX on b_level and item_id and SELECT WHERE b_level= AND item_id= will be enough/the best solution?
So, since i have a_level and b_level (which any b_level as i said is "child" of only one a_level) what will be the most efficient SELECT and INDEXes created for picking up an item?
You can certainly index every column. If you do, MySQL will use index merge optimization to apply many indexes to a single query. However, for more efficiency, you might want to use composite indexes (single index on multiple columns). MySQL composite indexes are used in optimization by following the left-prefix rule. If the SELECT statement is restricted by terms that are included in a left-prefix of an index, that index is used. For example, if you have
SELECT * FROM t WHERE a_level = 1 AND b_level = 2
then the appropriate index would have to include a_level or b_level as the first columns. In other words, an index for (a_level, b_level) could index queries such as
SELECT * FROM t WHERE a_level = 1
SELECT * FROM t WHERE a_level = 1 AND b_level = 2
but not
SELECT * FROM t WHERE b_level = 2
because b_level is not a left-prefix of the index.
You'd probably first want to benchmark which of the selects you're performing most often and create indexes based on that, as long as they follow the left-prefix rule. You might want to use several indexes for a few of the different SELECT queries in order to keep from blanketing the entire table. It's not easy to perfectly answer this question without knowing the data and queries exactly.
However, if you're sure you're never going to write into the table again, you might as well cover the entire thing with an index, if space is not an issue.
if you do select by a column or set of colums frequently, then index that column or set of columns. Indexes don't look all millions of items, that's why they're indexes (without an index, it would indeed look all millions of items)
Related
In my script, I have a lot of SQL WHERE clauses, e.g.:
SELECT * FROM cars WHERE active=1 AND model='A3';
SELECT * FROM cars WHERE active=1 AND year=2017;
SELECT * FROM cars WHERE active=1 AND brand='BMW';
I am using different SQL clauses on same table because I need different data.
I would like to set index key on table cars, but I am not sure how to do it. Should I set separate keys for each column (active, model, year, brand) or should I set keys for groups (active,model and active,year and active,brand)?
WHERE a=1 AND y='m'
is best handled by INDEX(a,y) in either order. The optimal set of indexes is several pairs like that. However, I do not recommend having more than a few indexes. Try to limit it to queries that users actually make.
INDEX(a,b,c,d):
WHERE a=1 AND b=22 -- Index is useful
WHERE a=1 AND d=44 -- Index is less useful
Only the "left column(s)" of an index are used. Hence the second case, uses a, but stops because b is not in the WHERE.
You might be tempted to also have (active, year, model). That combination works well for active AND year, active AND year AND model, but not active AND model (but no year).
More on creating indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
Since model implies a make, there is little use to put both of those in the same composite index.
year is not very selective, and users might want a range of years. These make it difficult to get an effective index on year.
How many rows will you have? If it is millions, we need to work harder to avoid performance problems. I'm leaning toward this, but only because the lookup would be more compact.
We use single indexing when we want to query for just one column, same asin your case and multiple group indexing when we have multiple condition in the same where clause.
Go for single indexing.
For more detailed explanation, refer this article: https://www.sqlinthewild.co.za/index.php/2010/09/14/one-wide-index-or-multiple-narrow-indexes/
Having a 10+ million table with three columns: one, two, three and SQL query like SELECT * FROM table ORDER BY one, two, three LIMIT 1 - do I really need to create a multi-column index using all three columns?
I know for sure that if one and two matches, there would be max 10 rows with distinct three.
Is it enough for fast SELECTs? -
CREATE INDEX MY_INDEX ON table (one, two);
With INDEX(one, two, three), the query will go straight down the BTree to the one (LIMIT 1) desired row.
With INDEX(one, two), the query will go straight down the BTree to the first such row, then scan forward the up-to-10 rows, save them to a tmp table, sort them (ORDER BY includes three) (probably done in memory), and deliver the first one. Although this sounds more complex it will not (in this example) be much slower.
It will not be a "table scan" ("ALL"), but perhaps a "range" scan. Use EXPLAIN SELECT ... to see.
If three is a bulky string, then the 3-col index will be bulkier; this has some impact on disk space and performance.
If you need only (one, two) for some other queries, then either index works reasonably well (barring the "bulky" comment).
If you do SELECT one, two, three FROM ..., the 3-part index will be better because it will be "covering". SELECT * won't have such a bonus.
Bottom line: Either index is "OK", many other factors factor in, making it hard to say for sure what to do.
You might think MySQL is clever enough to only read at most the first 10 rows using the index and then sort these. Unfortunately, it isn't (because the optimizer doesn't regard the limit at this point). You can verify that by using explain select ..., it will show that MySQL will do a full table scan ("ALL").
The documentation describes conditions to be able to use an index to optimize order by:
The index can also be used even if the ORDER BY does not match the index exactly, as long as all unused portions of the index and all extra ORDER BY columns are constants in the WHERE clause.
Your third column does not satisfy this. So this query will not use this index (which does not mean that it might not be usefull for something else).
Since MySQL 5.6, there is however the so-called filesort priority queue optimization to accommodate the limit: while MySQL will still read the whole table, it will not sort the whole table (which would be a time consuming process), but will stop when it knows what the first row will be, which makes your query acceptable fast.
But you can rewrite your query to do exactly what you are thinking of:
SELECT * FROM
(select * from table ORDER BY one, two LIMIT 10) sub
order by one, two, three limit 1;
This will read the first 10 rows using that index, and then just sort these. It will of course only work correctly if you are absolutely sure you will only have at most 10 rows.
A more general way to optimize your query independently from knowing the maximum number of possible rows would be e.g.
SELECT * FROM table
where one = (select min(one) from table)
order by one, two, three limit 1;
This will use the index to reduce the number of rows that have to be read and filesorted by looking up the lowest value for one first (using the index) and only considering these rows. You can similarly include a condition for two.
Or you can simply can use all three columns in your index (although depending on the size of your third column, it can make sense to not do this). These kind of optimizations tend to catch up at one point. If you e.g. use the first method, and in 2 year there will be 11 rows possible, you (or your successor) will have to remember that you have this implied condition in your code.
If I'm trying to increase the performance of a query that uses 4 different columns from a specific table, should I create 4 different indexes (one with each column individually) or should I create 1 index with all columns included?
One index with all 4 values is by my experience the fastest. If you use a where, try to put the columns in an order that makes it useful for the where.
An index with all four columns; the columns used in the WHERE should go first, and those for which you do == compare should go first of all.
Sometimes, giving priority to integer columns gives better results; YMMV.
So for example,
SELECT title, count(*) FROM table WHERE class = 'post' AND topic_id = 17
AND date > ##BeginDate and date < ##EndDate;
would have an index on: topic_id, post, date, and title, in this order.
The "title" in the index is only used so that the DB may find the value of "title" for those records matching the query, without the extra access to the data table.
The more balanced the distribution of the records on the first fields, the best results you will have (in this example, say 10% of the rows have topic_id = 17, you would discard the other 90% without ever having to run a string comparison with 'post' -- not that string comparisons are particularly costly. Depending on the data, you might find it better to index date first and post later, or even use date first as a MySQL PARTITION.
Single index is usually more effective than index merge, so if you have condition like f1 = 1 AND f2 = 2 AND f3 = 3 AND f4 = 4 single index would right decision.
To achieve best performance enumerate index fields in descending order of cardinality (count of distinct values), this will help to reduce analyzed rows count.
Index of less than 4 fields can be more effective, as it requires less memory.
http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/
I'm deploying a Rails application that aggregates coupon data from various third-party providers into a searchable database. Searches are conducted across four fields for each coupon: headline, coupon code, description, and expiration date.
Because some of these third-party providers do a rather bad job of keeping their data sorted, and because I don't want duplicate coupons to creep into my database, I've implemented a unique compound index across those four columns. That prevents the same coupon from being inserted into my database more than once.
Given that I'm searching against these columns (via simple WHERE column LIKE %whatever% matching for the time being), I want these columns to each individually benefit from the speed gains to be had by indexing them.
So here's my question: will the compound index across all columns provide the same searching speed gains as if I had applied an individual index to each column? Or will it only guarantee uniqueness among the rows?
Complicating the matter somewhat is that I'm developing in Rails, so my question pertains both to SQLite3 and MySQL (and whatever we might port over to in the future), rather than one specific RDBMS.
My guess is that the indexes will speed up searching across individual columns, but I really don't have enough "under the hood" database expertise to feel confident in that judgement.
Thanks for lending your expertise.
will the compound index across all
columns provide the same searching
speed gains as if I had applied an
individual index to each column?
Nope. The order of the columns in the index is very important. Lets suppose you have an index like this: create unique index index_name on table_name (headline, coupon_code, description,expiration_date)
In this case these queries will use the index
select * from table_name where headline = 1
select * from table_name where headline = 1 and cupon_code = 2
and these queries wont use the unique index:
select * from table_name where coupon_code = 1
select * from table_name where description = 1 and cupon_code = 2
So the rule is something like this. When you have multiple fields indexed together, then you have to specify the first k field to be able to use the index.
So if you want to be able to search for any one of these fields then you should create on index on each of them separately (besides the combined unique index)
Also, be careful with the LIKE operator.
this will use index SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
and this will not SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
index usage http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
multiple column index http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html
Say if I have a query that look like this:
SELECT * FROM table WHERE category='5' and status='1' LIMIT 5
The table has 1 million rows.
To speed things up, I create index (status, category), i.e. multiple column index.
There are 600 categories but only 2 status (1 or 0). I'm wondering if there is any difference in performance if I create index (category, status) instead of index (status, category).
Status first.
The trick is then if you only need to query by category you can.
SELECT * from table where status in (1,0) and category = 'whatever'
and still get index support. Of course if your queries all use both columns it's the same either way, but in this case if you use only status it's much better, and only category only slightly worse if at all.
If you are looking at a lot of inserts as well, you want to minimize the number of indices, so this is your best bet rather than having multiple indices.
There shouldn't be any difference. The selectivity of the index is identical whether you order it (category, status) or (status, category).
By the way, using LIMIT is often meaningless without also using ORDER BY. The order of rows returned by an SQL query is arbitrary unless you specify an order.
Re your comment: Yes, it's common to need five random rows, but arbitrary is not the same as random. It's not common to need five arbitrary rows.